forked from Minki/linux
Small release, the most interesting stuff is x86 nested virt improvements.
x86: userspace can now hide nested VMX features from guests; nested VMX can now run Hyper-V in a guest; support for AVX512_4VNNIW and AVX512_FMAPS in KVM; infrastructure support for virtual Intel GPUs. PPC: support for KVM guests on POWER9; improved support for interrupt polling; optimizations and cleanups. s390: two small optimizations, more stuff is in flight and will be in 4.11. ARM: support for the GICv3 ITS on 32bit platforms. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQExBAABCAAbBQJYTkP0FBxwYm9uemluaUByZWRoYXQuY29tAAoJEL/70l94x66D lZIH/iT1n9OQXcuTpYYnQhuCenzI3GZZOIMTbCvK2i5bo0FIJKxVn0EiAAqZSXvO nO185FqjOgLuJ1AD1kJuxzye5suuQp4HIPWWgNHcexLuy43WXWKZe0IQlJ4zM2Xf u31HakpFmVDD+Cd1qN3yDXtDrRQ79/xQn2kw7CWb8olp+pVqwbceN3IVie9QYU+3 gCz0qU6As0aQIwq2PyalOe03sO10PZlm4XhsoXgWPG7P18BMRhNLTDqhLhu7A/ry qElVMANT7LSNLzlwNdpzdK8rVuKxETwjlc1UP8vSuhrwad4zM2JJ1Exk26nC2NaG D0j4tRSyGFIdx6lukZm7HmiSHZ0= =mkoB -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "Small release, the most interesting stuff is x86 nested virt improvements. x86: - userspace can now hide nested VMX features from guests - nested VMX can now run Hyper-V in a guest - support for AVX512_4VNNIW and AVX512_FMAPS in KVM - infrastructure support for virtual Intel GPUs. PPC: - support for KVM guests on POWER9 - improved support for interrupt polling - optimizations and cleanups. s390: - two small optimizations, more stuff is in flight and will be in 4.11. ARM: - support for the GICv3 ITS on 32bit platforms" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits) arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest KVM: arm/arm64: timer: Check for properly initialized timer on init KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs KVM: x86: Handle the kthread worker using the new API KVM: nVMX: invvpid handling improvements KVM: nVMX: check host CR3 on vmentry and vmexit KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry KVM: nVMX: propagate errors from prepare_vmcs02 KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM: nVMX: support restore of VMX capability MSRs KVM: nVMX: generate non-true VMX MSRs based on true versions KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. KVM: x86: Add kvm_skip_emulated_instruction and use it. KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 KVM: VMX: Reorder some skip_emulated_instruction calls KVM: x86: Add a return value to kvm_emulate_cpuid KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h ...
This commit is contained in:
commit
93173b5bf2
@ -6,6 +6,8 @@ cpuid.txt
|
||||
- KVM-specific cpuid leaves (x86).
|
||||
devices/
|
||||
- KVM_CAP_DEVICE_CTRL userspace API.
|
||||
halt-polling.txt
|
||||
- notes on halt-polling
|
||||
hypercalls.txt
|
||||
- KVM hypercalls.
|
||||
locking.txt
|
||||
|
@ -2034,6 +2034,8 @@ registers, find a list below:
|
||||
PPC | KVM_REG_PPC_WORT | 64
|
||||
PPC | KVM_REG_PPC_SPRG9 | 64
|
||||
PPC | KVM_REG_PPC_DBSR | 32
|
||||
PPC | KVM_REG_PPC_TIDR | 64
|
||||
PPC | KVM_REG_PPC_PSSCR | 64
|
||||
PPC | KVM_REG_PPC_TM_GPR0 | 64
|
||||
...
|
||||
PPC | KVM_REG_PPC_TM_GPR31 | 64
|
||||
@ -2050,6 +2052,7 @@ registers, find a list below:
|
||||
PPC | KVM_REG_PPC_TM_VSCR | 32
|
||||
PPC | KVM_REG_PPC_TM_DSCR | 64
|
||||
PPC | KVM_REG_PPC_TM_TAR | 64
|
||||
PPC | KVM_REG_PPC_TM_XER | 64
|
||||
| |
|
||||
MIPS | KVM_REG_MIPS_R0 | 64
|
||||
...
|
||||
@ -2209,7 +2212,7 @@ after pausing the vcpu, but before it is resumed.
|
||||
4.71 KVM_SIGNAL_MSI
|
||||
|
||||
Capability: KVM_CAP_SIGNAL_MSI
|
||||
Architectures: x86 arm64
|
||||
Architectures: x86 arm arm64
|
||||
Type: vm ioctl
|
||||
Parameters: struct kvm_msi (in)
|
||||
Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
|
||||
|
127
Documentation/virtual/kvm/halt-polling.txt
Normal file
127
Documentation/virtual/kvm/halt-polling.txt
Normal file
@ -0,0 +1,127 @@
|
||||
The KVM halt polling system
|
||||
===========================
|
||||
|
||||
The KVM halt polling system provides a feature within KVM whereby the latency
|
||||
of a guest can, under some circumstances, be reduced by polling in the host
|
||||
for some time period after the guest has elected to no longer run by cedeing.
|
||||
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
|
||||
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
|
||||
before giving up the cpu to the scheduler in order to let something else run.
|
||||
|
||||
Polling provides a latency advantage in cases where the guest can be run again
|
||||
very quickly by at least saving us a trip through the scheduler, normally on
|
||||
the order of a few micro-seconds, although performance benefits are workload
|
||||
dependant. In the event that no wakeup source arrives during the polling
|
||||
interval or some other task on the runqueue is runnable the scheduler is
|
||||
invoked. Thus halt polling is especially useful on workloads with very short
|
||||
wakeup periods where the time spent halt polling is minimised and the time
|
||||
savings of not invoking the scheduler are distinguishable.
|
||||
|
||||
The generic halt polling code is implemented in:
|
||||
|
||||
virt/kvm/kvm_main.c: kvm_vcpu_block()
|
||||
|
||||
The powerpc kvm-hv specific case is implemented in:
|
||||
|
||||
arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()
|
||||
|
||||
Halt Polling Interval
|
||||
=====================
|
||||
|
||||
The maximum time for which to poll before invoking the scheduler, referred to
|
||||
as the halt polling interval, is increased and decreased based on the perceived
|
||||
effectiveness of the polling in an attempt to limit pointless polling.
|
||||
This value is stored in either the vcpu struct:
|
||||
|
||||
kvm_vcpu->halt_poll_ns
|
||||
|
||||
or in the case of powerpc kvm-hv, in the vcore struct:
|
||||
|
||||
kvmppc_vcore->halt_poll_ns
|
||||
|
||||
Thus this is a per vcpu (or vcore) value.
|
||||
|
||||
During polling if a wakeup source is received within the halt polling interval,
|
||||
the interval is left unchanged. In the event that a wakeup source isn't
|
||||
received during the polling interval (and thus schedule is invoked) there are
|
||||
two options, either the polling interval and total block time[0] were less than
|
||||
the global max polling interval (see module params below), or the total block
|
||||
time was greater than the global max polling interval.
|
||||
|
||||
In the event that both the polling interval and total block time were less than
|
||||
the global max polling interval then the polling interval can be increased in
|
||||
the hope that next time during the longer polling interval the wake up source
|
||||
will be received while the host is polling and the latency benefits will be
|
||||
received. The polling interval is grown in the function grow_halt_poll_ns() and
|
||||
is multiplied by the module parameter halt_poll_ns_grow.
|
||||
|
||||
In the event that the total block time was greater than the global max polling
|
||||
interval then the host will never poll for long enough (limited by the global
|
||||
max) to wakeup during the polling interval so it may as well be shrunk in order
|
||||
to avoid pointless polling. The polling interval is shrunk in the function
|
||||
shrink_halt_poll_ns() and is divided by the module parameter
|
||||
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
|
||||
|
||||
It is worth noting that this adjustment process attempts to hone in on some
|
||||
steady state polling interval but will only really do a good job for wakeups
|
||||
which come at an approximately constant rate, otherwise there will be constant
|
||||
adjustment of the polling interval.
|
||||
|
||||
[0] total block time: the time between when the halt polling function is
|
||||
invoked and a wakeup source received (irrespective of
|
||||
whether the scheduler is invoked within that function).
|
||||
|
||||
Module Parameters
|
||||
=================
|
||||
|
||||
The kvm module has 3 tuneable module parameters to adjust the global max
|
||||
polling interval as well as the rate at which the polling interval is grown and
|
||||
shrunk. These variables are defined in include/linux/kvm_host.h and as module
|
||||
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
|
||||
powerpc kvm-hv case.
|
||||
|
||||
Module Parameter | Description | Default Value
|
||||
--------------------------------------------------------------------------------
|
||||
halt_poll_ns | The global max polling interval | KVM_HALT_POLL_NS_DEFAULT
|
||||
| which defines the ceiling value |
|
||||
| of the polling interval for | (per arch value)
|
||||
| each vcpu. |
|
||||
--------------------------------------------------------------------------------
|
||||
halt_poll_ns_grow | The value by which the halt | 2
|
||||
| polling interval is multiplied |
|
||||
| in the grow_halt_poll_ns() |
|
||||
| function. |
|
||||
--------------------------------------------------------------------------------
|
||||
halt_poll_ns_shrink | The value by which the halt | 0
|
||||
| polling interval is divided in |
|
||||
| the shrink_halt_poll_ns() |
|
||||
| function. |
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
These module parameters can be set from the debugfs files in:
|
||||
|
||||
/sys/module/kvm/parameters/
|
||||
|
||||
Note: that these module parameters are system wide values and are not able to
|
||||
be tuned on a per vm basis.
|
||||
|
||||
Further Notes
|
||||
=============
|
||||
|
||||
- Care should be taken when setting the halt_poll_ns module parameter as a
|
||||
large value has the potential to drive the cpu usage to 100% on a machine which
|
||||
would be almost entirely idle otherwise. This is because even if a guest has
|
||||
wakeups during which very little work is done and which are quite far apart, if
|
||||
the period is shorter than the global max polling interval (halt_poll_ns) then
|
||||
the host will always poll for the entire block time and thus cpu utilisation
|
||||
will go to 100%.
|
||||
|
||||
- Halt polling essentially presents a trade off between power usage and latency
|
||||
and the module parameters should be used to tune the affinity for this. Idle
|
||||
cpu time is essentially converted to host kernel time with the aim of decreasing
|
||||
latency when entering the guest.
|
||||
|
||||
- Halt polling will only be conducted by the host when no other tasks are
|
||||
runnable on that cpu, otherwise the polling will cease immediately and
|
||||
schedule will be invoked to allow that other task to run. Thus this doesn't
|
||||
allow a guest to denial of service the cpu.
|
@ -87,9 +87,11 @@ struct kvm_regs {
|
||||
/* Supported VGICv3 address types */
|
||||
#define KVM_VGIC_V3_ADDR_TYPE_DIST 2
|
||||
#define KVM_VGIC_V3_ADDR_TYPE_REDIST 3
|
||||
#define KVM_VGIC_ITS_ADDR_TYPE 4
|
||||
|
||||
#define KVM_VGIC_V3_DIST_SIZE SZ_64K
|
||||
#define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K)
|
||||
#define KVM_VGIC_V3_ITS_SIZE (2 * SZ_64K)
|
||||
|
||||
#define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
|
||||
#define KVM_ARM_VCPU_PSCI_0_2 1 /* CPU uses PSCI v0.2 */
|
||||
|
@ -34,6 +34,7 @@ config KVM
|
||||
select HAVE_KVM_IRQFD
|
||||
select HAVE_KVM_IRQCHIP
|
||||
select HAVE_KVM_IRQ_ROUTING
|
||||
select HAVE_KVM_MSI
|
||||
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
|
||||
---help---
|
||||
Support hosting virtualized guest machines.
|
||||
|
@ -32,5 +32,6 @@ obj-y += $(KVM)/arm/vgic/vgic-mmio.o
|
||||
obj-y += $(KVM)/arm/vgic/vgic-mmio-v2.o
|
||||
obj-y += $(KVM)/arm/vgic/vgic-mmio-v3.o
|
||||
obj-y += $(KVM)/arm/vgic/vgic-kvm-device.o
|
||||
obj-y += $(KVM)/arm/vgic/vgic-its.o
|
||||
obj-y += $(KVM)/irqchip.o
|
||||
obj-y += $(KVM)/arm/arch_timer.o
|
||||
|
@ -221,6 +221,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
case KVM_CAP_MAX_VCPUS:
|
||||
r = KVM_MAX_VCPUS;
|
||||
break;
|
||||
case KVM_CAP_MSI_DEVID:
|
||||
if (!kvm)
|
||||
r = -EINVAL;
|
||||
else
|
||||
r = kvm->arch.vgic.msis_require_devid;
|
||||
break;
|
||||
default:
|
||||
r = kvm_arch_dev_ioctl_check_extension(kvm, ext);
|
||||
break;
|
||||
|
@ -16,9 +16,6 @@ menuconfig VIRTUALIZATION
|
||||
|
||||
if VIRTUALIZATION
|
||||
|
||||
config KVM_ARM_VGIC_V3_ITS
|
||||
bool
|
||||
|
||||
config KVM
|
||||
bool "Kernel-based Virtual Machine (KVM) support"
|
||||
depends on OF
|
||||
@ -34,7 +31,6 @@ config KVM
|
||||
select KVM_VFIO
|
||||
select HAVE_KVM_EVENTFD
|
||||
select HAVE_KVM_IRQFD
|
||||
select KVM_ARM_VGIC_V3_ITS
|
||||
select KVM_ARM_PMU if HW_PERF_EVENTS
|
||||
select HAVE_KVM_MSI
|
||||
select HAVE_KVM_IRQCHIP
|
||||
|
@ -85,7 +85,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
|
||||
write_sysreg(val, hcr_el2);
|
||||
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
|
||||
write_sysreg(1 << 15, hstr_el2);
|
||||
/* Make sure we trap PMU access from EL0 to EL2 */
|
||||
/*
|
||||
* Make sure we trap PMU access from EL0 to EL2. Also sanitize
|
||||
* PMSELR_EL0 to make sure it never contains the cycle
|
||||
* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
|
||||
* EL1 instead of being trapped to EL2.
|
||||
*/
|
||||
write_sysreg(0, pmselr_el0);
|
||||
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
|
||||
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
|
||||
__activate_traps_arch()();
|
||||
|
@ -86,12 +86,6 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
case KVM_CAP_VCPU_ATTRIBUTES:
|
||||
r = 1;
|
||||
break;
|
||||
case KVM_CAP_MSI_DEVID:
|
||||
if (!kvm)
|
||||
r = -EINVAL;
|
||||
else
|
||||
r = kvm->arch.vgic.msis_require_devid;
|
||||
break;
|
||||
default:
|
||||
r = 0;
|
||||
}
|
||||
|
@ -70,7 +70,9 @@
|
||||
|
||||
#define HPTE_V_SSIZE_SHIFT 62
|
||||
#define HPTE_V_AVPN_SHIFT 7
|
||||
#define HPTE_V_COMMON_BITS ASM_CONST(0x000fffffffffffff)
|
||||
#define HPTE_V_AVPN ASM_CONST(0x3fffffffffffff80)
|
||||
#define HPTE_V_AVPN_3_0 ASM_CONST(0x000fffffffffff80)
|
||||
#define HPTE_V_AVPN_VAL(x) (((x) & HPTE_V_AVPN) >> HPTE_V_AVPN_SHIFT)
|
||||
#define HPTE_V_COMPARE(x,y) (!(((x) ^ (y)) & 0xffffffffffffff80UL))
|
||||
#define HPTE_V_BOLTED ASM_CONST(0x0000000000000010)
|
||||
@ -80,14 +82,16 @@
|
||||
#define HPTE_V_VALID ASM_CONST(0x0000000000000001)
|
||||
|
||||
/*
|
||||
* ISA 3.0 have a different HPTE format.
|
||||
* ISA 3.0 has a different HPTE format.
|
||||
*/
|
||||
#define HPTE_R_3_0_SSIZE_SHIFT 58
|
||||
#define HPTE_R_3_0_SSIZE_MASK (3ull << HPTE_R_3_0_SSIZE_SHIFT)
|
||||
#define HPTE_R_PP0 ASM_CONST(0x8000000000000000)
|
||||
#define HPTE_R_TS ASM_CONST(0x4000000000000000)
|
||||
#define HPTE_R_KEY_HI ASM_CONST(0x3000000000000000)
|
||||
#define HPTE_R_RPN_SHIFT 12
|
||||
#define HPTE_R_RPN ASM_CONST(0x0ffffffffffff000)
|
||||
#define HPTE_R_RPN_3_0 ASM_CONST(0x01fffffffffff000)
|
||||
#define HPTE_R_PP ASM_CONST(0x0000000000000003)
|
||||
#define HPTE_R_PPP ASM_CONST(0x8000000000000003)
|
||||
#define HPTE_R_N ASM_CONST(0x0000000000000004)
|
||||
@ -316,11 +320,42 @@ static inline unsigned long hpte_encode_avpn(unsigned long vpn, int psize,
|
||||
*/
|
||||
v = (vpn >> (23 - VPN_SHIFT)) & ~(mmu_psize_defs[psize].avpnm);
|
||||
v <<= HPTE_V_AVPN_SHIFT;
|
||||
if (!cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
|
||||
v |= ((unsigned long) ssize) << HPTE_V_SSIZE_SHIFT;
|
||||
return v;
|
||||
}
|
||||
|
||||
/*
|
||||
* ISA v3.0 defines a new HPTE format, which differs from the old
|
||||
* format in having smaller AVPN and ARPN fields, and the B field
|
||||
* in the second dword instead of the first.
|
||||
*/
|
||||
static inline unsigned long hpte_old_to_new_v(unsigned long v)
|
||||
{
|
||||
/* trim AVPN, drop B */
|
||||
return v & HPTE_V_COMMON_BITS;
|
||||
}
|
||||
|
||||
static inline unsigned long hpte_old_to_new_r(unsigned long v, unsigned long r)
|
||||
{
|
||||
/* move B field from 1st to 2nd dword, trim ARPN */
|
||||
return (r & ~HPTE_R_3_0_SSIZE_MASK) |
|
||||
(((v) >> HPTE_V_SSIZE_SHIFT) << HPTE_R_3_0_SSIZE_SHIFT);
|
||||
}
|
||||
|
||||
static inline unsigned long hpte_new_to_old_v(unsigned long v, unsigned long r)
|
||||
{
|
||||
/* insert B field */
|
||||
return (v & HPTE_V_COMMON_BITS) |
|
||||
((r & HPTE_R_3_0_SSIZE_MASK) <<
|
||||
(HPTE_V_SSIZE_SHIFT - HPTE_R_3_0_SSIZE_SHIFT));
|
||||
}
|
||||
|
||||
static inline unsigned long hpte_new_to_old_r(unsigned long r)
|
||||
{
|
||||
/* clear out B field */
|
||||
return r & ~HPTE_R_3_0_SSIZE_MASK;
|
||||
}
|
||||
|
||||
/*
|
||||
* This function sets the AVPN and L fields of the HPTE appropriately
|
||||
* using the base page size and actual page size.
|
||||
@ -341,12 +376,8 @@ static inline unsigned long hpte_encode_v(unsigned long vpn, int base_psize,
|
||||
* aligned for the requested page size
|
||||
*/
|
||||
static inline unsigned long hpte_encode_r(unsigned long pa, int base_psize,
|
||||
int actual_psize, int ssize)
|
||||
int actual_psize)
|
||||
{
|
||||
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
pa |= ((unsigned long) ssize) << HPTE_R_3_0_SSIZE_SHIFT;
|
||||
|
||||
/* A 4K page needs no special encoding */
|
||||
if (actual_psize == MMU_PAGE_4K)
|
||||
return pa & HPTE_R_RPN;
|
||||
|
@ -99,6 +99,7 @@
|
||||
#define BOOK3S_INTERRUPT_H_EMUL_ASSIST 0xe40
|
||||
#define BOOK3S_INTERRUPT_HMI 0xe60
|
||||
#define BOOK3S_INTERRUPT_H_DOORBELL 0xe80
|
||||
#define BOOK3S_INTERRUPT_H_VIRT 0xea0
|
||||
#define BOOK3S_INTERRUPT_PERFMON 0xf00
|
||||
#define BOOK3S_INTERRUPT_ALTIVEC 0xf20
|
||||
#define BOOK3S_INTERRUPT_VSX 0xf40
|
||||
|
@ -48,7 +48,7 @@
|
||||
#ifdef CONFIG_KVM_MMIO
|
||||
#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
|
||||
#endif
|
||||
#define KVM_HALT_POLL_NS_DEFAULT 500000
|
||||
#define KVM_HALT_POLL_NS_DEFAULT 10000 /* 10 us */
|
||||
|
||||
/* These values are internal and can be increased later */
|
||||
#define KVM_NR_IRQCHIPS 1
|
||||
@ -244,8 +244,10 @@ struct kvm_arch_memory_slot {
|
||||
struct kvm_arch {
|
||||
unsigned int lpid;
|
||||
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
|
||||
unsigned int tlb_sets;
|
||||
unsigned long hpt_virt;
|
||||
struct revmap_entry *revmap;
|
||||
atomic64_t mmio_update;
|
||||
unsigned int host_lpid;
|
||||
unsigned long host_lpcr;
|
||||
unsigned long sdr1;
|
||||
@ -408,6 +410,24 @@ struct kvmppc_passthru_irqmap {
|
||||
#define KVMPPC_IRQ_MPIC 1
|
||||
#define KVMPPC_IRQ_XICS 2
|
||||
|
||||
#define MMIO_HPTE_CACHE_SIZE 4
|
||||
|
||||
struct mmio_hpte_cache_entry {
|
||||
unsigned long hpte_v;
|
||||
unsigned long hpte_r;
|
||||
unsigned long rpte;
|
||||
unsigned long pte_index;
|
||||
unsigned long eaddr;
|
||||
unsigned long slb_v;
|
||||
long mmio_update;
|
||||
unsigned int slb_base_pshift;
|
||||
};
|
||||
|
||||
struct mmio_hpte_cache {
|
||||
struct mmio_hpte_cache_entry entry[MMIO_HPTE_CACHE_SIZE];
|
||||
unsigned int index;
|
||||
};
|
||||
|
||||
struct openpic;
|
||||
|
||||
struct kvm_vcpu_arch {
|
||||
@ -498,6 +518,8 @@ struct kvm_vcpu_arch {
|
||||
ulong tcscr;
|
||||
ulong acop;
|
||||
ulong wort;
|
||||
ulong tid;
|
||||
ulong psscr;
|
||||
ulong shadow_srr1;
|
||||
#endif
|
||||
u32 vrsave; /* also USPRG0 */
|
||||
@ -546,6 +568,7 @@ struct kvm_vcpu_arch {
|
||||
u64 tfiar;
|
||||
|
||||
u32 cr_tm;
|
||||
u64 xer_tm;
|
||||
u64 lr_tm;
|
||||
u64 ctr_tm;
|
||||
u64 amr_tm;
|
||||
@ -655,9 +678,11 @@ struct kvm_vcpu_arch {
|
||||
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
|
||||
struct kvm_vcpu_arch_shared shregs;
|
||||
|
||||
struct mmio_hpte_cache mmio_cache;
|
||||
unsigned long pgfault_addr;
|
||||
long pgfault_index;
|
||||
unsigned long pgfault_hpte[2];
|
||||
struct mmio_hpte_cache_entry *pgfault_cache;
|
||||
|
||||
struct task_struct *run_task;
|
||||
struct kvm_run *kvm_run;
|
||||
|
@ -483,9 +483,10 @@ extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
|
||||
unsigned long host_irq);
|
||||
extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
|
||||
unsigned long host_irq);
|
||||
extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
|
||||
struct kvmppc_irq_map *irq_map,
|
||||
struct kvmppc_passthru_irqmap *pimap);
|
||||
extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, __be32 xirr,
|
||||
struct kvmppc_irq_map *irq_map,
|
||||
struct kvmppc_passthru_irqmap *pimap,
|
||||
bool *again);
|
||||
extern int h_ipi_redirect;
|
||||
#else
|
||||
static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
|
||||
@ -509,6 +510,48 @@ static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
|
||||
{ return 0; }
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Prototypes for functions called only from assembler code.
|
||||
* Having prototypes reduces sparse errors.
|
||||
*/
|
||||
long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
|
||||
unsigned long ioba, unsigned long tce);
|
||||
long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
|
||||
unsigned long liobn, unsigned long ioba,
|
||||
unsigned long tce_list, unsigned long npages);
|
||||
long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
|
||||
unsigned long liobn, unsigned long ioba,
|
||||
unsigned long tce_value, unsigned long npages);
|
||||
long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
|
||||
unsigned int yield_count);
|
||||
long kvmppc_h_random(struct kvm_vcpu *vcpu);
|
||||
void kvmhv_commence_exit(int trap);
|
||||
long kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu);
|
||||
void kvmppc_subcore_enter_guest(void);
|
||||
void kvmppc_subcore_exit_guest(void);
|
||||
long kvmppc_realmode_hmi_handler(void);
|
||||
long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
long pte_index, unsigned long pteh, unsigned long ptel);
|
||||
long kvmppc_h_remove(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
unsigned long pte_index, unsigned long avpn);
|
||||
long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu);
|
||||
long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
unsigned long pte_index, unsigned long avpn,
|
||||
unsigned long va);
|
||||
long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
unsigned long pte_index);
|
||||
long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
unsigned long pte_index);
|
||||
long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
unsigned long pte_index);
|
||||
long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
|
||||
unsigned long slb_v, unsigned int status, bool data);
|
||||
unsigned long kvmppc_rm_h_xirr(struct kvm_vcpu *vcpu);
|
||||
int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
|
||||
unsigned long mfrr);
|
||||
int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr);
|
||||
int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
|
||||
|
||||
/*
|
||||
* Host-side operations we want to set up while running in real
|
||||
* mode in the guest operating on the xics.
|
||||
|
@ -214,6 +214,11 @@ extern u64 ppc64_rma_size;
|
||||
/* Cleanup function used by kexec */
|
||||
extern void mmu_cleanup_all(void);
|
||||
extern void radix__mmu_cleanup_all(void);
|
||||
|
||||
/* Functions for creating and updating partition table on POWER9 */
|
||||
extern void mmu_partition_table_init(void);
|
||||
extern void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
|
||||
unsigned long dw1);
|
||||
#endif /* CONFIG_PPC64 */
|
||||
|
||||
struct mm_struct;
|
||||
|
@ -220,9 +220,12 @@ int64_t opal_pci_set_power_state(uint64_t async_token, uint64_t id,
|
||||
int64_t opal_pci_poll2(uint64_t id, uint64_t data);
|
||||
|
||||
int64_t opal_int_get_xirr(uint32_t *out_xirr, bool just_poll);
|
||||
int64_t opal_rm_int_get_xirr(__be32 *out_xirr, bool just_poll);
|
||||
int64_t opal_int_set_cppr(uint8_t cppr);
|
||||
int64_t opal_int_eoi(uint32_t xirr);
|
||||
int64_t opal_rm_int_eoi(uint32_t xirr);
|
||||
int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
|
||||
int64_t opal_rm_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
|
||||
int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
|
||||
uint32_t pe_num, uint32_t tce_size,
|
||||
uint64_t dma_addr, uint32_t npages);
|
||||
|
@ -153,6 +153,8 @@
|
||||
#define PSSCR_EC 0x00100000 /* Exit Criterion */
|
||||
#define PSSCR_ESL 0x00200000 /* Enable State Loss */
|
||||
#define PSSCR_SD 0x00400000 /* Status Disable */
|
||||
#define PSSCR_PLS 0xf000000000000000 /* Power-saving Level Status */
|
||||
#define PSSCR_GUEST_VIS 0xf0000000000003ff /* Guest-visible PSSCR fields */
|
||||
|
||||
/* Floating Point Status and Control Register (FPSCR) Fields */
|
||||
#define FPSCR_FX 0x80000000 /* FPU exception summary */
|
||||
@ -236,6 +238,7 @@
|
||||
#define SPRN_TEXASRU 0x83 /* '' '' '' Upper 32 */
|
||||
#define TEXASR_FS __MASK(63-36) /* TEXASR Failure Summary */
|
||||
#define SPRN_TFHAR 0x80 /* Transaction Failure Handler Addr */
|
||||
#define SPRN_TIDR 144 /* Thread ID register */
|
||||
#define SPRN_CTRLF 0x088
|
||||
#define SPRN_CTRLT 0x098
|
||||
#define CTRL_CT 0xc0000000 /* current thread */
|
||||
@ -294,6 +297,7 @@
|
||||
#define SPRN_HSRR1 0x13B /* Hypervisor Save/Restore 1 */
|
||||
#define SPRN_LMRR 0x32D /* Load Monitor Region Register */
|
||||
#define SPRN_LMSER 0x32E /* Load Monitor Section Enable Register */
|
||||
#define SPRN_ASDR 0x330 /* Access segment descriptor register */
|
||||
#define SPRN_IC 0x350 /* Virtual Instruction Count */
|
||||
#define SPRN_VTB 0x351 /* Virtual Time Base */
|
||||
#define SPRN_LDBAR 0x352 /* LD Base Address Register */
|
||||
@ -305,6 +309,7 @@
|
||||
|
||||
/* HFSCR and FSCR bit numbers are the same */
|
||||
#define FSCR_LM_LG 11 /* Enable Load Monitor Registers */
|
||||
#define FSCR_MSGP_LG 10 /* Enable MSGP */
|
||||
#define FSCR_TAR_LG 8 /* Enable Target Address Register */
|
||||
#define FSCR_EBB_LG 7 /* Enable Event Based Branching */
|
||||
#define FSCR_TM_LG 5 /* Enable Transactional Memory */
|
||||
@ -320,6 +325,7 @@
|
||||
#define FSCR_DSCR __MASK(FSCR_DSCR_LG)
|
||||
#define SPRN_HFSCR 0xbe /* HV=1 Facility Status & Control Register */
|
||||
#define HFSCR_LM __MASK(FSCR_LM_LG)
|
||||
#define HFSCR_MSGP __MASK(FSCR_MSGP_LG)
|
||||
#define HFSCR_TAR __MASK(FSCR_TAR_LG)
|
||||
#define HFSCR_EBB __MASK(FSCR_EBB_LG)
|
||||
#define HFSCR_TM __MASK(FSCR_TM_LG)
|
||||
@ -358,6 +364,7 @@
|
||||
#define LPCR_PECE_HVEE ASM_CONST(0x0000400000000000) /* P9 Wakeup on HV interrupts */
|
||||
#define LPCR_MER ASM_CONST(0x0000000000000800) /* Mediated External Exception */
|
||||
#define LPCR_MER_SH 11
|
||||
#define LPCR_GTSE ASM_CONST(0x0000000000000400) /* Guest Translation Shootdown Enable */
|
||||
#define LPCR_TC ASM_CONST(0x0000000000000200) /* Translation control */
|
||||
#define LPCR_LPES 0x0000000c
|
||||
#define LPCR_LPES0 ASM_CONST(0x0000000000000008) /* LPAR Env selector 0 */
|
||||
@ -378,6 +385,12 @@
|
||||
#define PCR_VEC_DIS (1ul << (63-0)) /* Vec. disable (bit NA since POWER8) */
|
||||
#define PCR_VSX_DIS (1ul << (63-1)) /* VSX disable (bit NA since POWER8) */
|
||||
#define PCR_TM_DIS (1ul << (63-2)) /* Trans. memory disable (POWER8) */
|
||||
/*
|
||||
* These bits are used in the function kvmppc_set_arch_compat() to specify and
|
||||
* determine both the compatibility level which we want to emulate and the
|
||||
* compatibility level which the host is capable of emulating.
|
||||
*/
|
||||
#define PCR_ARCH_207 0x8 /* Architecture 2.07 */
|
||||
#define PCR_ARCH_206 0x4 /* Architecture 2.06 */
|
||||
#define PCR_ARCH_205 0x2 /* Architecture 2.05 */
|
||||
#define SPRN_HEIR 0x153 /* Hypervisor Emulated Instruction Register */
|
||||
@ -1219,6 +1232,7 @@
|
||||
#define PVR_ARCH_206 0x0f000003
|
||||
#define PVR_ARCH_206p 0x0f100003
|
||||
#define PVR_ARCH_207 0x0f000004
|
||||
#define PVR_ARCH_300 0x0f000005
|
||||
|
||||
/* Macros for setting and retrieving special purpose registers */
|
||||
#ifndef __ASSEMBLY__
|
||||
|
@ -573,6 +573,10 @@ struct kvm_get_htab_header {
|
||||
#define KVM_REG_PPC_SPRG9 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba)
|
||||
#define KVM_REG_PPC_DBSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb)
|
||||
|
||||
/* POWER9 registers */
|
||||
#define KVM_REG_PPC_TIDR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc)
|
||||
#define KVM_REG_PPC_PSSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbd)
|
||||
|
||||
/* Transactional Memory checkpointed state:
|
||||
* This is all GPRs, all VSX regs and a subset of SPRs
|
||||
*/
|
||||
@ -596,6 +600,7 @@ struct kvm_get_htab_header {
|
||||
#define KVM_REG_PPC_TM_VSCR (KVM_REG_PPC_TM | KVM_REG_SIZE_U32 | 0x67)
|
||||
#define KVM_REG_PPC_TM_DSCR (KVM_REG_PPC_TM | KVM_REG_SIZE_U64 | 0x68)
|
||||
#define KVM_REG_PPC_TM_TAR (KVM_REG_PPC_TM | KVM_REG_SIZE_U64 | 0x69)
|
||||
#define KVM_REG_PPC_TM_XER (KVM_REG_PPC_TM | KVM_REG_SIZE_U64 | 0x6a)
|
||||
|
||||
/* PPC64 eXternal Interrupt Controller Specification */
|
||||
#define KVM_DEV_XICS_GRP_SOURCES 1 /* 64-bit source attributes */
|
||||
|
@ -487,6 +487,7 @@ int main(void)
|
||||
|
||||
/* book3s */
|
||||
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
|
||||
DEFINE(KVM_TLB_SETS, offsetof(struct kvm, arch.tlb_sets));
|
||||
DEFINE(KVM_SDR1, offsetof(struct kvm, arch.sdr1));
|
||||
DEFINE(KVM_HOST_LPID, offsetof(struct kvm, arch.host_lpid));
|
||||
DEFINE(KVM_HOST_LPCR, offsetof(struct kvm, arch.host_lpcr));
|
||||
@ -548,6 +549,8 @@ int main(void)
|
||||
DEFINE(VCPU_TCSCR, offsetof(struct kvm_vcpu, arch.tcscr));
|
||||
DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
|
||||
DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
|
||||
DEFINE(VCPU_TID, offsetof(struct kvm_vcpu, arch.tid));
|
||||
DEFINE(VCPU_PSSCR, offsetof(struct kvm_vcpu, arch.psscr));
|
||||
DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
|
||||
DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
|
||||
DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
|
||||
@ -569,6 +572,7 @@ int main(void)
|
||||
DEFINE(VCPU_VRS_TM, offsetof(struct kvm_vcpu, arch.vr_tm.vr));
|
||||
DEFINE(VCPU_VRSAVE_TM, offsetof(struct kvm_vcpu, arch.vrsave_tm));
|
||||
DEFINE(VCPU_CR_TM, offsetof(struct kvm_vcpu, arch.cr_tm));
|
||||
DEFINE(VCPU_XER_TM, offsetof(struct kvm_vcpu, arch.xer_tm));
|
||||
DEFINE(VCPU_LR_TM, offsetof(struct kvm_vcpu, arch.lr_tm));
|
||||
DEFINE(VCPU_CTR_TM, offsetof(struct kvm_vcpu, arch.ctr_tm));
|
||||
DEFINE(VCPU_AMR_TM, offsetof(struct kvm_vcpu, arch.amr_tm));
|
||||
|
@ -174,7 +174,7 @@ __init_FSCR:
|
||||
__init_HFSCR:
|
||||
mfspr r3,SPRN_HFSCR
|
||||
ori r3,r3,HFSCR_TAR|HFSCR_TM|HFSCR_BHRB|HFSCR_PM|\
|
||||
HFSCR_DSCR|HFSCR_VECVSX|HFSCR_FP|HFSCR_EBB
|
||||
HFSCR_DSCR|HFSCR_VECVSX|HFSCR_FP|HFSCR_EBB|HFSCR_MSGP
|
||||
mtspr SPRN_HFSCR,r3
|
||||
blr
|
||||
|
||||
|
@ -88,6 +88,8 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
|
||||
/* 128 (2**7) bytes in each HPTEG */
|
||||
kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
|
||||
|
||||
atomic64_set(&kvm->arch.mmio_update, 0);
|
||||
|
||||
/* Allocate reverse map array */
|
||||
rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
|
||||
if (!rev) {
|
||||
@ -255,7 +257,7 @@ static void kvmppc_mmu_book3s_64_hv_reset_msr(struct kvm_vcpu *vcpu)
|
||||
kvmppc_set_msr(vcpu, msr);
|
||||
}
|
||||
|
||||
long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
|
||||
static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
|
||||
long pte_index, unsigned long pteh,
|
||||
unsigned long ptel, unsigned long *pte_idx_ret)
|
||||
{
|
||||
@ -312,7 +314,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
|
||||
struct kvmppc_slb *slbe;
|
||||
unsigned long slb_v;
|
||||
unsigned long pp, key;
|
||||
unsigned long v, gr;
|
||||
unsigned long v, orig_v, gr;
|
||||
__be64 *hptep;
|
||||
int index;
|
||||
int virtmode = vcpu->arch.shregs.msr & (data ? MSR_DR : MSR_IR);
|
||||
@ -337,10 +339,12 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
|
||||
return -ENOENT;
|
||||
}
|
||||
hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
|
||||
v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
|
||||
v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
|
||||
gr = kvm->arch.revmap[index].guest_rpte;
|
||||
|
||||
unlock_hpte(hptep, v);
|
||||
unlock_hpte(hptep, orig_v);
|
||||
preempt_enable();
|
||||
|
||||
gpte->eaddr = eaddr;
|
||||
@ -438,6 +442,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
{
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
unsigned long hpte[3], r;
|
||||
unsigned long hnow_v, hnow_r;
|
||||
__be64 *hptep;
|
||||
unsigned long mmu_seq, psize, pte_size;
|
||||
unsigned long gpa_base, gfn_base;
|
||||
@ -451,6 +456,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
unsigned int writing, write_ok;
|
||||
struct vm_area_struct *vma;
|
||||
unsigned long rcbits;
|
||||
long mmio_update;
|
||||
|
||||
/*
|
||||
* Real-mode code has already searched the HPT and found the
|
||||
@ -460,6 +466,19 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
*/
|
||||
if (ea != vcpu->arch.pgfault_addr)
|
||||
return RESUME_GUEST;
|
||||
|
||||
if (vcpu->arch.pgfault_cache) {
|
||||
mmio_update = atomic64_read(&kvm->arch.mmio_update);
|
||||
if (mmio_update == vcpu->arch.pgfault_cache->mmio_update) {
|
||||
r = vcpu->arch.pgfault_cache->rpte;
|
||||
psize = hpte_page_size(vcpu->arch.pgfault_hpte[0], r);
|
||||
gpa_base = r & HPTE_R_RPN & ~(psize - 1);
|
||||
gfn_base = gpa_base >> PAGE_SHIFT;
|
||||
gpa = gpa_base | (ea & (psize - 1));
|
||||
return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
|
||||
dsisr & DSISR_ISSTORE);
|
||||
}
|
||||
}
|
||||
index = vcpu->arch.pgfault_index;
|
||||
hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
|
||||
rev = &kvm->arch.revmap[index];
|
||||
@ -472,6 +491,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
unlock_hpte(hptep, hpte[0]);
|
||||
preempt_enable();
|
||||
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
hpte[0] = hpte_new_to_old_v(hpte[0], hpte[1]);
|
||||
hpte[1] = hpte_new_to_old_r(hpte[1]);
|
||||
}
|
||||
if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
|
||||
hpte[1] != vcpu->arch.pgfault_hpte[1])
|
||||
return RESUME_GUEST;
|
||||
@ -575,16 +598,22 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
*/
|
||||
if (psize < PAGE_SIZE)
|
||||
psize = PAGE_SIZE;
|
||||
r = (r & ~(HPTE_R_PP0 - psize)) | ((pfn << PAGE_SHIFT) & ~(psize - 1));
|
||||
r = (r & HPTE_R_KEY_HI) | (r & ~(HPTE_R_PP0 - psize)) |
|
||||
((pfn << PAGE_SHIFT) & ~(psize - 1));
|
||||
if (hpte_is_writable(r) && !write_ok)
|
||||
r = hpte_make_readonly(r);
|
||||
ret = RESUME_GUEST;
|
||||
preempt_disable();
|
||||
while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
|
||||
cpu_relax();
|
||||
if ((be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK) != hpte[0] ||
|
||||
be64_to_cpu(hptep[1]) != hpte[1] ||
|
||||
rev->guest_rpte != hpte[2])
|
||||
hnow_v = be64_to_cpu(hptep[0]);
|
||||
hnow_r = be64_to_cpu(hptep[1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
hnow_v = hpte_new_to_old_v(hnow_v, hnow_r);
|
||||
hnow_r = hpte_new_to_old_r(hnow_r);
|
||||
}
|
||||
if ((hnow_v & ~HPTE_V_HVLOCK) != hpte[0] || hnow_r != hpte[1] ||
|
||||
rev->guest_rpte != hpte[2])
|
||||
/* HPTE has been changed under us; let the guest retry */
|
||||
goto out_unlock;
|
||||
hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
|
||||
@ -615,6 +644,10 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
|
||||
}
|
||||
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
r = hpte_old_to_new_r(hpte[0], r);
|
||||
hpte[0] = hpte_old_to_new_v(hpte[0]);
|
||||
}
|
||||
hptep[1] = cpu_to_be64(r);
|
||||
eieio();
|
||||
__unlock_hpte(hptep, hpte[0]);
|
||||
@ -758,6 +791,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
|
||||
hpte_rpn(ptel, psize) == gfn) {
|
||||
hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
|
||||
kvmppc_invalidate_hpte(kvm, hptep, i);
|
||||
hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
|
||||
/* Harvest R and C */
|
||||
rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
|
||||
*rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
|
||||
@ -1165,7 +1199,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
|
||||
unsigned long *hpte, struct revmap_entry *revp,
|
||||
int want_valid, int first_pass)
|
||||
{
|
||||
unsigned long v, r;
|
||||
unsigned long v, r, hr;
|
||||
unsigned long rcbits_unset;
|
||||
int ok = 1;
|
||||
int valid, dirty;
|
||||
@ -1192,6 +1226,11 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
|
||||
while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
|
||||
cpu_relax();
|
||||
v = be64_to_cpu(hptp[0]);
|
||||
hr = be64_to_cpu(hptp[1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
v = hpte_new_to_old_v(v, hr);
|
||||
hr = hpte_new_to_old_r(hr);
|
||||
}
|
||||
|
||||
/* re-evaluate valid and dirty from synchronized HPTE value */
|
||||
valid = !!(v & HPTE_V_VALID);
|
||||
@ -1199,8 +1238,8 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
|
||||
|
||||
/* Harvest R and C into guest view if necessary */
|
||||
rcbits_unset = ~revp->guest_rpte & (HPTE_R_R | HPTE_R_C);
|
||||
if (valid && (rcbits_unset & be64_to_cpu(hptp[1]))) {
|
||||
revp->guest_rpte |= (be64_to_cpu(hptp[1]) &
|
||||
if (valid && (rcbits_unset & hr)) {
|
||||
revp->guest_rpte |= (hr &
|
||||
(HPTE_R_R | HPTE_R_C)) | HPTE_GR_MODIFIED;
|
||||
dirty = 1;
|
||||
}
|
||||
@ -1608,7 +1647,7 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
|
||||
return ret;
|
||||
}
|
||||
|
||||
ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
|
||||
static ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
|
||||
size_t len, loff_t *ppos)
|
||||
{
|
||||
return -EACCES;
|
||||
|
@ -39,7 +39,6 @@
|
||||
#include <asm/udbg.h>
|
||||
#include <asm/iommu.h>
|
||||
#include <asm/tce.h>
|
||||
#include <asm/iommu.h>
|
||||
|
||||
#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
|
||||
|
||||
|
@ -54,6 +54,9 @@
|
||||
#include <asm/dbell.h>
|
||||
#include <asm/hmi.h>
|
||||
#include <asm/pnv-pci.h>
|
||||
#include <asm/mmu.h>
|
||||
#include <asm/opal.h>
|
||||
#include <asm/xics.h>
|
||||
#include <linux/gfp.h>
|
||||
#include <linux/vmalloc.h>
|
||||
#include <linux/highmem.h>
|
||||
@ -62,6 +65,7 @@
|
||||
#include <linux/irqbypass.h>
|
||||
#include <linux/module.h>
|
||||
#include <linux/compiler.h>
|
||||
#include <linux/of.h>
|
||||
|
||||
#include "book3s.h"
|
||||
|
||||
@ -104,23 +108,6 @@ module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect,
|
||||
MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");
|
||||
#endif
|
||||
|
||||
/* Maximum halt poll interval defaults to KVM_HALT_POLL_NS_DEFAULT */
|
||||
static unsigned int halt_poll_max_ns = KVM_HALT_POLL_NS_DEFAULT;
|
||||
module_param(halt_poll_max_ns, uint, S_IRUGO | S_IWUSR);
|
||||
MODULE_PARM_DESC(halt_poll_max_ns, "Maximum halt poll time in ns");
|
||||
|
||||
/* Factor by which the vcore halt poll interval is grown, default is to double
|
||||
*/
|
||||
static unsigned int halt_poll_ns_grow = 2;
|
||||
module_param(halt_poll_ns_grow, int, S_IRUGO);
|
||||
MODULE_PARM_DESC(halt_poll_ns_grow, "Factor halt poll time is grown by");
|
||||
|
||||
/* Factor by which the vcore halt poll interval is shrunk, default is to reset
|
||||
*/
|
||||
static unsigned int halt_poll_ns_shrink;
|
||||
module_param(halt_poll_ns_shrink, int, S_IRUGO);
|
||||
MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll time is shrunk by");
|
||||
|
||||
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
|
||||
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
|
||||
|
||||
@ -146,12 +133,21 @@ static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc,
|
||||
|
||||
static bool kvmppc_ipi_thread(int cpu)
|
||||
{
|
||||
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
|
||||
|
||||
/* On POWER9 we can use msgsnd to IPI any cpu */
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
msg |= get_hard_smp_processor_id(cpu);
|
||||
smp_mb();
|
||||
__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
|
||||
return true;
|
||||
}
|
||||
|
||||
/* On POWER8 for IPIs to threads in the same core, use msgsnd */
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
|
||||
preempt_disable();
|
||||
if (cpu_first_thread_sibling(cpu) ==
|
||||
cpu_first_thread_sibling(smp_processor_id())) {
|
||||
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
|
||||
msg |= cpu_thread_in_core(cpu);
|
||||
smp_mb();
|
||||
__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
|
||||
@ -162,8 +158,12 @@ static bool kvmppc_ipi_thread(int cpu)
|
||||
}
|
||||
|
||||
#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
|
||||
if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
|
||||
xics_wake_cpu(cpu);
|
||||
if (cpu >= 0 && cpu < nr_cpu_ids) {
|
||||
if (paca[cpu].kvm_hstate.xics_phys) {
|
||||
xics_wake_cpu(cpu);
|
||||
return true;
|
||||
}
|
||||
opal_int_set_mfrr(get_hard_smp_processor_id(cpu), IPI_PRIORITY);
|
||||
return true;
|
||||
}
|
||||
#endif
|
||||
@ -299,41 +299,54 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
|
||||
vcpu->arch.pvr = pvr;
|
||||
}
|
||||
|
||||
/* Dummy value used in computing PCR value below */
|
||||
#define PCR_ARCH_300 (PCR_ARCH_207 << 1)
|
||||
|
||||
static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
|
||||
{
|
||||
unsigned long pcr = 0;
|
||||
unsigned long host_pcr_bit = 0, guest_pcr_bit = 0;
|
||||
struct kvmppc_vcore *vc = vcpu->arch.vcore;
|
||||
|
||||
/* We can (emulate) our own architecture version and anything older */
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
host_pcr_bit = PCR_ARCH_300;
|
||||
else if (cpu_has_feature(CPU_FTR_ARCH_207S))
|
||||
host_pcr_bit = PCR_ARCH_207;
|
||||
else if (cpu_has_feature(CPU_FTR_ARCH_206))
|
||||
host_pcr_bit = PCR_ARCH_206;
|
||||
else
|
||||
host_pcr_bit = PCR_ARCH_205;
|
||||
|
||||
/* Determine lowest PCR bit needed to run guest in given PVR level */
|
||||
guest_pcr_bit = host_pcr_bit;
|
||||
if (arch_compat) {
|
||||
switch (arch_compat) {
|
||||
case PVR_ARCH_205:
|
||||
/*
|
||||
* If an arch bit is set in PCR, all the defined
|
||||
* higher-order arch bits also have to be set.
|
||||
*/
|
||||
pcr = PCR_ARCH_206 | PCR_ARCH_205;
|
||||
guest_pcr_bit = PCR_ARCH_205;
|
||||
break;
|
||||
case PVR_ARCH_206:
|
||||
case PVR_ARCH_206p:
|
||||
pcr = PCR_ARCH_206;
|
||||
guest_pcr_bit = PCR_ARCH_206;
|
||||
break;
|
||||
case PVR_ARCH_207:
|
||||
guest_pcr_bit = PCR_ARCH_207;
|
||||
break;
|
||||
case PVR_ARCH_300:
|
||||
guest_pcr_bit = PCR_ARCH_300;
|
||||
break;
|
||||
default:
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
if (!cpu_has_feature(CPU_FTR_ARCH_207S)) {
|
||||
/* POWER7 can't emulate POWER8 */
|
||||
if (!(pcr & PCR_ARCH_206))
|
||||
return -EINVAL;
|
||||
pcr &= ~PCR_ARCH_206;
|
||||
}
|
||||
}
|
||||
|
||||
/* Check requested PCR bits don't exceed our capabilities */
|
||||
if (guest_pcr_bit > host_pcr_bit)
|
||||
return -EINVAL;
|
||||
|
||||
spin_lock(&vc->lock);
|
||||
vc->arch_compat = arch_compat;
|
||||
vc->pcr = pcr;
|
||||
/* Set all PCR bits for which guest_pcr_bit <= bit < host_pcr_bit */
|
||||
vc->pcr = host_pcr_bit - guest_pcr_bit;
|
||||
spin_unlock(&vc->lock);
|
||||
|
||||
return 0;
|
||||
@ -945,6 +958,7 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
|
||||
break;
|
||||
case BOOK3S_INTERRUPT_EXTERNAL:
|
||||
case BOOK3S_INTERRUPT_H_DOORBELL:
|
||||
case BOOK3S_INTERRUPT_H_VIRT:
|
||||
vcpu->stat.ext_intr_exits++;
|
||||
r = RESUME_GUEST;
|
||||
break;
|
||||
@ -1229,6 +1243,12 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
|
||||
case KVM_REG_PPC_WORT:
|
||||
*val = get_reg_val(id, vcpu->arch.wort);
|
||||
break;
|
||||
case KVM_REG_PPC_TIDR:
|
||||
*val = get_reg_val(id, vcpu->arch.tid);
|
||||
break;
|
||||
case KVM_REG_PPC_PSSCR:
|
||||
*val = get_reg_val(id, vcpu->arch.psscr);
|
||||
break;
|
||||
case KVM_REG_PPC_VPA_ADDR:
|
||||
spin_lock(&vcpu->arch.vpa_update_lock);
|
||||
*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
|
||||
@ -1288,6 +1308,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
|
||||
case KVM_REG_PPC_TM_CR:
|
||||
*val = get_reg_val(id, vcpu->arch.cr_tm);
|
||||
break;
|
||||
case KVM_REG_PPC_TM_XER:
|
||||
*val = get_reg_val(id, vcpu->arch.xer_tm);
|
||||
break;
|
||||
case KVM_REG_PPC_TM_LR:
|
||||
*val = get_reg_val(id, vcpu->arch.lr_tm);
|
||||
break;
|
||||
@ -1427,6 +1450,12 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
|
||||
case KVM_REG_PPC_WORT:
|
||||
vcpu->arch.wort = set_reg_val(id, *val);
|
||||
break;
|
||||
case KVM_REG_PPC_TIDR:
|
||||
vcpu->arch.tid = set_reg_val(id, *val);
|
||||
break;
|
||||
case KVM_REG_PPC_PSSCR:
|
||||
vcpu->arch.psscr = set_reg_val(id, *val) & PSSCR_GUEST_VIS;
|
||||
break;
|
||||
case KVM_REG_PPC_VPA_ADDR:
|
||||
addr = set_reg_val(id, *val);
|
||||
r = -EINVAL;
|
||||
@ -1498,6 +1527,9 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
|
||||
case KVM_REG_PPC_TM_CR:
|
||||
vcpu->arch.cr_tm = set_reg_val(id, *val);
|
||||
break;
|
||||
case KVM_REG_PPC_TM_XER:
|
||||
vcpu->arch.xer_tm = set_reg_val(id, *val);
|
||||
break;
|
||||
case KVM_REG_PPC_TM_LR:
|
||||
vcpu->arch.lr_tm = set_reg_val(id, *val);
|
||||
break;
|
||||
@ -1540,6 +1572,20 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
|
||||
return r;
|
||||
}
|
||||
|
||||
/*
|
||||
* On POWER9, threads are independent and can be in different partitions.
|
||||
* Therefore we consider each thread to be a subcore.
|
||||
* There is a restriction that all threads have to be in the same
|
||||
* MMU mode (radix or HPT), unfortunately, but since we only support
|
||||
* HPT guests on a HPT host so far, that isn't an impediment yet.
|
||||
*/
|
||||
static int threads_per_vcore(void)
|
||||
{
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
return 1;
|
||||
return threads_per_subcore;
|
||||
}
|
||||
|
||||
static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
|
||||
{
|
||||
struct kvmppc_vcore *vcore;
|
||||
@ -1554,7 +1600,7 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
|
||||
init_swait_queue_head(&vcore->wq);
|
||||
vcore->preempt_tb = TB_NIL;
|
||||
vcore->lpcr = kvm->arch.lpcr;
|
||||
vcore->first_vcpuid = core * threads_per_subcore;
|
||||
vcore->first_vcpuid = core * threads_per_vcore();
|
||||
vcore->kvm = kvm;
|
||||
INIT_LIST_HEAD(&vcore->preempt_list);
|
||||
|
||||
@ -1717,7 +1763,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
|
||||
int core;
|
||||
struct kvmppc_vcore *vcore;
|
||||
|
||||
core = id / threads_per_subcore;
|
||||
core = id / threads_per_vcore();
|
||||
if (core >= KVM_MAX_VCORES)
|
||||
goto out;
|
||||
|
||||
@ -1935,7 +1981,10 @@ static void kvmppc_wait_for_nap(void)
|
||||
{
|
||||
int cpu = smp_processor_id();
|
||||
int i, loops;
|
||||
int n_threads = threads_per_vcore();
|
||||
|
||||
if (n_threads <= 1)
|
||||
return;
|
||||
for (loops = 0; loops < 1000000; ++loops) {
|
||||
/*
|
||||
* Check if all threads are finished.
|
||||
@ -1943,17 +1992,17 @@ static void kvmppc_wait_for_nap(void)
|
||||
* and the thread clears it when finished, so we look
|
||||
* for any threads that still have a non-NULL vcore ptr.
|
||||
*/
|
||||
for (i = 1; i < threads_per_subcore; ++i)
|
||||
for (i = 1; i < n_threads; ++i)
|
||||
if (paca[cpu + i].kvm_hstate.kvm_vcore)
|
||||
break;
|
||||
if (i == threads_per_subcore) {
|
||||
if (i == n_threads) {
|
||||
HMT_medium();
|
||||
return;
|
||||
}
|
||||
HMT_low();
|
||||
}
|
||||
HMT_medium();
|
||||
for (i = 1; i < threads_per_subcore; ++i)
|
||||
for (i = 1; i < n_threads; ++i)
|
||||
if (paca[cpu + i].kvm_hstate.kvm_vcore)
|
||||
pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
|
||||
}
|
||||
@ -2019,7 +2068,7 @@ static void kvmppc_vcore_preempt(struct kvmppc_vcore *vc)
|
||||
|
||||
vc->vcore_state = VCORE_PREEMPT;
|
||||
vc->pcpu = smp_processor_id();
|
||||
if (vc->num_threads < threads_per_subcore) {
|
||||
if (vc->num_threads < threads_per_vcore()) {
|
||||
spin_lock(&lp->lock);
|
||||
list_add_tail(&vc->preempt_list, &lp->list);
|
||||
spin_unlock(&lp->lock);
|
||||
@ -2123,8 +2172,7 @@ static bool can_dynamic_split(struct kvmppc_vcore *vc, struct core_info *cip)
|
||||
cip->subcore_threads[sub] = vc->num_threads;
|
||||
cip->subcore_vm[sub] = vc->kvm;
|
||||
init_master_vcore(vc);
|
||||
list_del(&vc->preempt_list);
|
||||
list_add_tail(&vc->preempt_list, &cip->vcs[sub]);
|
||||
list_move_tail(&vc->preempt_list, &cip->vcs[sub]);
|
||||
|
||||
return true;
|
||||
}
|
||||
@ -2309,6 +2357,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
|
||||
unsigned long cmd_bit, stat_bit;
|
||||
int pcpu, thr;
|
||||
int target_threads;
|
||||
int controlled_threads;
|
||||
|
||||
/*
|
||||
* Remove from the list any threads that have a signal pending
|
||||
@ -2326,12 +2375,19 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
|
||||
init_master_vcore(vc);
|
||||
vc->preempt_tb = TB_NIL;
|
||||
|
||||
/*
|
||||
* Number of threads that we will be controlling: the same as
|
||||
* the number of threads per subcore, except on POWER9,
|
||||
* where it's 1 because the threads are (mostly) independent.
|
||||
*/
|
||||
controlled_threads = threads_per_vcore();
|
||||
|
||||
/*
|
||||
* Make sure we are running on primary threads, and that secondary
|
||||
* threads are offline. Also check if the number of threads in this
|
||||
* guest are greater than the current system threads per guest.
|
||||
*/
|
||||
if ((threads_per_core > 1) &&
|
||||
if ((controlled_threads > 1) &&
|
||||
((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
|
||||
for_each_runnable_thread(i, vcpu, vc) {
|
||||
vcpu->arch.ret = -EBUSY;
|
||||
@ -2347,7 +2403,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
|
||||
*/
|
||||
init_core_info(&core_info, vc);
|
||||
pcpu = smp_processor_id();
|
||||
target_threads = threads_per_subcore;
|
||||
target_threads = controlled_threads;
|
||||
if (target_smt_mode && target_smt_mode < target_threads)
|
||||
target_threads = target_smt_mode;
|
||||
if (vc->num_threads < target_threads)
|
||||
@ -2383,7 +2439,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
|
||||
smp_wmb();
|
||||
}
|
||||
pcpu = smp_processor_id();
|
||||
for (thr = 0; thr < threads_per_subcore; ++thr)
|
||||
for (thr = 0; thr < controlled_threads; ++thr)
|
||||
paca[pcpu + thr].kvm_hstate.kvm_split_mode = sip;
|
||||
|
||||
/* Initiate micro-threading (split-core) if required */
|
||||
@ -2493,7 +2549,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
|
||||
}
|
||||
|
||||
/* Let secondaries go back to the offline loop */
|
||||
for (i = 0; i < threads_per_subcore; ++i) {
|
||||
for (i = 0; i < controlled_threads; ++i) {
|
||||
kvmppc_release_hwthread(pcpu + i);
|
||||
if (sip && sip->napped[i])
|
||||
kvmppc_ipi_thread(pcpu + i);
|
||||
@ -2545,9 +2601,6 @@ static void grow_halt_poll_ns(struct kvmppc_vcore *vc)
|
||||
vc->halt_poll_ns = 10000;
|
||||
else
|
||||
vc->halt_poll_ns *= halt_poll_ns_grow;
|
||||
|
||||
if (vc->halt_poll_ns > halt_poll_max_ns)
|
||||
vc->halt_poll_ns = halt_poll_max_ns;
|
||||
}
|
||||
|
||||
static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
|
||||
@ -2558,7 +2611,8 @@ static void shrink_halt_poll_ns(struct kvmppc_vcore *vc)
|
||||
vc->halt_poll_ns /= halt_poll_ns_shrink;
|
||||
}
|
||||
|
||||
/* Check to see if any of the runnable vcpus on the vcore have pending
|
||||
/*
|
||||
* Check to see if any of the runnable vcpus on the vcore have pending
|
||||
* exceptions or are no longer ceded
|
||||
*/
|
||||
static int kvmppc_vcore_check_block(struct kvmppc_vcore *vc)
|
||||
@ -2657,16 +2711,18 @@ out:
|
||||
}
|
||||
|
||||
/* Adjust poll time */
|
||||
if (halt_poll_max_ns) {
|
||||
if (halt_poll_ns) {
|
||||
if (block_ns <= vc->halt_poll_ns)
|
||||
;
|
||||
/* We slept and blocked for longer than the max halt time */
|
||||
else if (vc->halt_poll_ns && block_ns > halt_poll_max_ns)
|
||||
else if (vc->halt_poll_ns && block_ns > halt_poll_ns)
|
||||
shrink_halt_poll_ns(vc);
|
||||
/* We slept and our poll time is too small */
|
||||
else if (vc->halt_poll_ns < halt_poll_max_ns &&
|
||||
block_ns < halt_poll_max_ns)
else if (vc->halt_poll_ns < halt_poll_ns &&
block_ns < halt_poll_ns)
grow_halt_poll_ns(vc);
if (vc->halt_poll_ns > halt_poll_ns)
vc->halt_poll_ns = halt_poll_ns;
} else
vc->halt_poll_ns = 0;

@@ -2973,6 +3029,15 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;

/*
* If we are making a new memslot, it might make
* some address that was previously cached as emulated
* MMIO be no longer emulated MMIO, so invalidate
* all the caches of emulated MMIO translations.
*/
if (npages)
atomic64_inc(&kvm->arch.mmio_update);

if (npages && old->npages) {
/*
* If modifying a memslot, reset all the rmap dirty bits.
@@ -3017,6 +3082,22 @@ static void kvmppc_mmu_destroy_hv(struct kvm_vcpu *vcpu)
return;
}

static void kvmppc_setup_partition_table(struct kvm *kvm)
{
unsigned long dw0, dw1;

/* PS field - page size for VRMA */
dw0 = ((kvm->arch.vrma_slb_v & SLB_VSID_L) >> 1) |
((kvm->arch.vrma_slb_v & SLB_VSID_LP) << 1);
/* HTABSIZE and HTABORG fields */
dw0 |= kvm->arch.sdr1;

/* Second dword has GR=0; other fields are unused since UPRT=0 */
dw1 = 0;

mmu_partition_table_set_entry(kvm->arch.lpid, dw0, dw1);
}

static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
{
int err = 0;
@@ -3068,17 +3149,20 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
psize == 0x1000000))
goto out_srcu;

/* Update VRMASD field in the LPCR */
senc = slb_pgsize_encoding(psize);
kvm->arch.vrma_slb_v = senc | SLB_VSID_B_1T |
(VRMA_VSID << SLB_VSID_SHIFT_1T);
/* the -4 is to account for senc values starting at 0x10 */
lpcr = senc << (LPCR_VRMASD_SH - 4);

/* Create HPTEs in the hash page table for the VRMA */
kvmppc_map_vrma(vcpu, memslot, porder);

kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
/* Update VRMASD field in the LPCR */
if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
/* the -4 is to account for senc values starting at 0x10 */
lpcr = senc << (LPCR_VRMASD_SH - 4);
kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
} else {
kvmppc_setup_partition_table(kvm);
}

/* Order updates to kvm->arch.lpcr etc. vs. hpte_setup_done */
smp_wmb();
@@ -3193,14 +3277,18 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
* Since we don't flush the TLB when tearing down a VM,
* and this lpid might have previously been used,
* make sure we flush on each core before running the new VM.
* On POWER9, the tlbie in mmu_partition_table_set_entry()
* does this flush for us.
*/
cpumask_setall(&kvm->arch.need_tlb_flush);
if (!cpu_has_feature(CPU_FTR_ARCH_300))
cpumask_setall(&kvm->arch.need_tlb_flush);

/* Start out with the default set of hcalls enabled */
memcpy(kvm->arch.enabled_hcalls, default_enabled_hcalls,
sizeof(kvm->arch.enabled_hcalls));

kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
if (!cpu_has_feature(CPU_FTR_ARCH_300))
kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);

/* Init LPCR for virtual RMA mode */
kvm->arch.host_lpid = mfspr(SPRN_LPID);
@@ -3213,8 +3301,28 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
/* On POWER8 turn on online bit to enable PURR/SPURR */
if (cpu_has_feature(CPU_FTR_ARCH_207S))
lpcr |= LPCR_ONL;
/*
* On POWER9, VPM0 bit is reserved (VPM0=1 behaviour is assumed)
* Set HVICE bit to enable hypervisor virtualization interrupts.
*/
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
lpcr &= ~LPCR_VPM0;
lpcr |= LPCR_HVICE;
}

kvm->arch.lpcr = lpcr;

/*
* Work out how many sets the TLB has, for the use of
* the TLB invalidation loop in book3s_hv_rmhandlers.S.
*/
if (cpu_has_feature(CPU_FTR_ARCH_300))
kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;	/* 256 */
else if (cpu_has_feature(CPU_FTR_ARCH_207S))
kvm->arch.tlb_sets = POWER8_TLB_SETS;	/* 512 */
else
kvm->arch.tlb_sets = POWER7_TLB_SETS;	/* 128 */

/*
* Track that we now have a HV mode VM active. This blocks secondary
* CPU threads from coming online.
@@ -3279,9 +3387,9 @@ static int kvmppc_core_check_processor_compat_hv(void)
!cpu_has_feature(CPU_FTR_ARCH_206))
return -EIO;
/*
* Disable KVM for Power9, untill the required bits merged.
* Disable KVM for Power9 in radix mode.
*/
if (cpu_has_feature(CPU_FTR_ARCH_300))
if (cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled())
return -EIO;

return 0;
@@ -3635,6 +3743,23 @@ static int kvmppc_book3s_init_hv(void)
if (r)
return r;

/*
* We need a way of accessing the XICS interrupt controller,
* either directly, via paca[cpu].kvm_hstate.xics_phys, or
* indirectly, via OPAL.
*/
#ifdef CONFIG_SMP
if (!get_paca()->kvm_hstate.xics_phys) {
struct device_node *np;

np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
if (!np) {
pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
return -ENODEV;
}
}
#endif

kvm_ops_hv.owner = THIS_MODULE;
kvmppc_hv_ops = &kvm_ops_hv;

@@ -3657,3 +3782,4 @@ module_exit(kvmppc_book3s_exit_hv);
MODULE_LICENSE("GPL");
MODULE_ALIAS_MISCDEV(KVM_MINOR);
MODULE_ALIAS("devname:kvm");
@@ -26,6 +26,8 @@
#include <asm/dbell.h>
#include <asm/cputhreads.h>
#include <asm/io.h>
#include <asm/opal.h>
#include <asm/smp.h>

#define KVM_CMA_CHUNK_ORDER 18

@@ -205,12 +207,18 @@ static inline void rm_writeb(unsigned long paddr, u8 val)
void kvmhv_rm_send_ipi(int cpu)
{
unsigned long xics_phys;
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);

/* On POWER8 for IPIs to threads in the same core, use msgsnd */
/* On POWER9 we can use msgsnd for any destination cpu. */
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
msg |= get_hard_smp_processor_id(cpu);
__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
return;
}
/* On POWER8 for IPIs to threads in the same core, use msgsnd. */
if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
cpu_first_thread_sibling(cpu) ==
cpu_first_thread_sibling(raw_smp_processor_id())) {
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
msg |= cpu_thread_in_core(cpu);
__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
return;
@@ -218,7 +226,11 @@ void kvmhv_rm_send_ipi(int cpu)

/* Else poke the target with an IPI */
xics_phys = paca[cpu].kvm_hstate.xics_phys;
rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
if (xics_phys)
rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
else
opal_rm_int_set_mfrr(get_hard_smp_processor_id(cpu),
IPI_PRIORITY);
}

/*
@@ -329,7 +341,7 @@ static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap,
* saved a copy of the XIRR in the PACA, it will be picked up by
* the host ICP driver.
*/
static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
static int kvmppc_check_passthru(u32 xisr, __be32 xirr, bool *again)
{
struct kvmppc_passthru_irqmap *pimap;
struct kvmppc_irq_map *irq_map;
@@ -348,11 +360,11 @@ static int kvmppc_check_passthru(u32 xisr, __be32 xirr)
/* We're handling this interrupt, generic code doesn't need to */
local_paca->kvm_hstate.saved_xirr = 0;

return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap);
return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap, again);
}

#else
static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr, bool *again)
{
return 1;
}
@@ -367,14 +379,31 @@ static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr)
* -1 if there was a guest wakeup IPI (which has now been cleared)
* -2 if there is PCI passthrough external interrupt that was handled
*/
static long kvmppc_read_one_intr(bool *again);

long kvmppc_read_intr(void)
{
long ret = 0;
long rc;
bool again;

do {
again = false;
rc = kvmppc_read_one_intr(&again);
if (rc && (ret == 0 || rc > ret))
ret = rc;
} while (again);
return ret;
}

static long kvmppc_read_one_intr(bool *again)
{
unsigned long xics_phys;
u32 h_xirr;
__be32 xirr;
u32 xisr;
u8 host_ipi;
int64_t rc;

/* see if a host IPI is pending */
host_ipi = local_paca->kvm_hstate.host_ipi;
@@ -383,8 +412,14 @@ long kvmppc_read_intr(void)

/* Now read the interrupt from the ICP */
xics_phys = local_paca->kvm_hstate.xics_phys;
if (unlikely(!xics_phys))
return 1;
if (!xics_phys) {
/* Use OPAL to read the XIRR */
rc = opal_rm_int_get_xirr(&xirr, false);
if (rc < 0)
return 1;
} else {
xirr = _lwzcix(xics_phys + XICS_XIRR);
}

/*
* Save XIRR for later. Since we get control in reverse endian
@@ -392,7 +427,6 @@ long kvmppc_read_intr(void)
* host endian. Note that xirr is the value read from the
* XIRR register, while h_xirr is the host endian version.
*/
xirr = _lwzcix(xics_phys + XICS_XIRR);
h_xirr = be32_to_cpu(xirr);
local_paca->kvm_hstate.saved_xirr = h_xirr;
xisr = h_xirr & 0xffffff;
@@ -411,8 +445,16 @@ long kvmppc_read_intr(void)
* If it is an IPI, clear the MFRR and EOI it.
*/
if (xisr == XICS_IPI) {
_stbcix(xics_phys + XICS_MFRR, 0xff);
_stwcix(xics_phys + XICS_XIRR, xirr);
if (xics_phys) {
_stbcix(xics_phys + XICS_MFRR, 0xff);
_stwcix(xics_phys + XICS_XIRR, xirr);
} else {
opal_rm_int_set_mfrr(hard_smp_processor_id(), 0xff);
rc = opal_rm_int_eoi(h_xirr);
/* If rc > 0, there is another interrupt pending */
*again = rc > 0;
}

/*
* Need to ensure side effects of above stores
* complete before proceeding.
@@ -429,7 +471,11 @@ long kvmppc_read_intr(void)
/* We raced with the host,
* we need to resend that IPI, bummer
*/
_stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
if (xics_phys)
_stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY);
else
opal_rm_int_set_mfrr(hard_smp_processor_id(),
IPI_PRIORITY);
/* Let side effects complete */
smp_mb();
return 1;
@@ -440,5 +486,5 @@ long kvmppc_read_intr(void)
return -1;
}

return kvmppc_check_passthru(xisr, xirr);
return kvmppc_check_passthru(xisr, xirr, again);
}
@@ -16,6 +16,7 @@
#include <asm/machdep.h>
#include <asm/cputhreads.h>
#include <asm/hmi.h>
#include <asm/kvm_ppc.h>

/* SRR1 bits for machine check on POWER7 */
#define SRR1_MC_LDSTERR (1ul << (63-42))
@@ -264,8 +264,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,

if (pa)
pteh |= HPTE_V_VALID;
else
else {
pteh |= HPTE_V_ABSENT;
ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
}

/*If we had host pte mapping then Check WIMG */
if (ptep && !hpte_cache_flags_ok(ptel, is_ci)) {
@@ -351,6 +353,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
/* inval in progress, write a non-present HPTE */
pteh |= HPTE_V_ABSENT;
pteh &= ~HPTE_V_VALID;
ptel &= ~(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
unlock_rmap(rmap);
} else {
kvmppc_add_revmap_chain(kvm, rev, rmap, pte_index,
@@ -361,6 +364,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
}
}

/* Convert to new format on P9 */
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
ptel = hpte_old_to_new_r(pteh, ptel);
pteh = hpte_old_to_new_v(pteh);
}
hpte[1] = cpu_to_be64(ptel);

/* Write the first HPTE dword, unlocking the HPTE and making it valid */
@@ -386,6 +394,13 @@ long kvmppc_h_enter(struct kvm_vcpu *vcpu, unsigned long flags,
#define LOCK_TOKEN (*(u32 *)(&get_paca()->paca_index))
#endif

static inline int is_mmio_hpte(unsigned long v, unsigned long r)
{
return ((v & HPTE_V_ABSENT) &&
(r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
(HPTE_R_KEY_HI | HPTE_R_KEY_LO));
}

static inline int try_lock_tlbie(unsigned int *lock)
{
unsigned int tmp, old;
@@ -409,13 +424,18 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
{
long i;

/*
* We use the POWER9 5-operand versions of tlbie and tlbiel here.
* Since we are using RIC=0 PRS=0 R=0, and P7/P8 tlbiel ignores
* the RS field, this is backwards-compatible with P7 and P8.
*/
if (global) {
while (!try_lock_tlbie(&kvm->arch.tlbie_lock))
cpu_relax();
if (need_sync)
asm volatile("ptesync" : : : "memory");
for (i = 0; i < npages; ++i)
asm volatile(PPC_TLBIE(%1,%0) : :
asm volatile(PPC_TLBIE_5(%0,%1,0,0,0) : :
"r" (rbvalues[i]), "r" (kvm->arch.lpid));
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
kvm->arch.tlbie_lock = 0;
@@ -423,7 +443,8 @@ static void do_tlbies(struct kvm *kvm, unsigned long *rbvalues,
if (need_sync)
asm volatile("ptesync" : : : "memory");
for (i = 0; i < npages; ++i)
asm volatile("tlbiel %0" : : "r" (rbvalues[i]));
asm volatile(PPC_TLBIEL(%0,%1,0,0,0) : :
"r" (rbvalues[i]), "r" (0));
asm volatile("ptesync" : : : "memory");
}
}
@@ -435,18 +456,23 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
__be64 *hpte;
unsigned long v, r, rb;
struct revmap_entry *rev;
u64 pte;
u64 pte, orig_pte, pte_r;

if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
pte = be64_to_cpu(hpte[0]);
pte = orig_pte = be64_to_cpu(hpte[0]);
pte_r = be64_to_cpu(hpte[1]);
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
pte = hpte_new_to_old_v(pte, pte_r);
pte_r = hpte_new_to_old_r(pte_r);
}
if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
((flags & H_ANDCOND) && (pte & avpn) != 0)) {
__unlock_hpte(hpte, pte);
__unlock_hpte(hpte, orig_pte);
return H_NOT_FOUND;
}

@@ -454,7 +480,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
v = pte & ~HPTE_V_HVLOCK;
if (v & HPTE_V_VALID) {
hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
rb = compute_tlbie_rb(v, pte_r, pte_index);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
/*
* The reference (R) and change (C) bits in a HPT
@@ -472,6 +498,9 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
note_hpte_modification(kvm, rev);
unlock_hpte(hpte, 0);

if (is_mmio_hpte(v, pte_r))
atomic64_inc(&kvm->arch.mmio_update);

if (v & HPTE_V_ABSENT)
v = (v & ~HPTE_V_ABSENT) | HPTE_V_VALID;
hpret[0] = v;
@@ -498,7 +527,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
int global;
long int ret = H_SUCCESS;
struct revmap_entry *rev, *revs[4];
u64 hp0;
u64 hp0, hp1;

global = global_invalidates(kvm, 0);
for (i = 0; i < 4 && ret == H_SUCCESS; ) {
@@ -531,6 +560,11 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
}
found = 0;
hp0 = be64_to_cpu(hp[0]);
hp1 = be64_to_cpu(hp[1]);
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
hp0 = hpte_new_to_old_v(hp0, hp1);
hp1 = hpte_new_to_old_r(hp1);
}
if (hp0 & (HPTE_V_ABSENT | HPTE_V_VALID)) {
switch (flags & 3) {
case 0: /* absolute */
@@ -561,13 +595,14 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
args[j] |= rcbits << (56 - 5);
hp[0] = 0;
if (is_mmio_hpte(hp0, hp1))
atomic64_inc(&kvm->arch.mmio_update);
continue;
}

/* leave it locked */
hp[0] &= ~cpu_to_be64(HPTE_V_VALID);
tlbrb[n] = compute_tlbie_rb(be64_to_cpu(hp[0]),
be64_to_cpu(hp[1]), pte_index);
tlbrb[n] = compute_tlbie_rb(hp0, hp1, pte_index);
indexes[n] = j;
hptes[n] = hp;
revs[n] = rev;
@@ -605,7 +640,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
__be64 *hpte;
struct revmap_entry *rev;
unsigned long v, r, rb, mask, bits;
u64 pte;
u64 pte_v, pte_r;

if (pte_index >= kvm->arch.hpt_npte)
return H_PARAMETER;
@@ -613,14 +648,16 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
pte = be64_to_cpu(hpte[0]);
if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
__unlock_hpte(hpte, pte);
v = pte_v = be64_to_cpu(hpte[0]);
if (cpu_has_feature(CPU_FTR_ARCH_300))
v = hpte_new_to_old_v(v, be64_to_cpu(hpte[1]));
if ((v & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
((flags & H_AVPN) && (v & ~0x7fUL) != avpn)) {
__unlock_hpte(hpte, pte_v);
return H_NOT_FOUND;
}

v = pte;
pte_r = be64_to_cpu(hpte[1]);
bits = (flags << 55) & HPTE_R_PP0;
bits |= (flags << 48) & HPTE_R_KEY_HI;
bits |= flags & (HPTE_R_PP | HPTE_R_N | HPTE_R_KEY_LO);
@@ -642,22 +679,26 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
* readonly to writable. If it should be writable, we'll
* take a trap and let the page fault code sort it out.
*/
pte = be64_to_cpu(hpte[1]);
r = (pte & ~mask) | bits;
if (hpte_is_writable(r) && !hpte_is_writable(pte))
r = (pte_r & ~mask) | bits;
if (hpte_is_writable(r) && !hpte_is_writable(pte_r))
r = hpte_make_readonly(r);
/* If the PTE is changing, invalidate it first */
if (r != pte) {
if (r != pte_r) {
rb = compute_tlbie_rb(v, r, pte_index);
hpte[0] = cpu_to_be64((v & ~HPTE_V_VALID) |
hpte[0] = cpu_to_be64((pte_v & ~HPTE_V_VALID) |
HPTE_V_ABSENT);
do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags),
true);
/* Don't lose R/C bit updates done by hardware */
r |= be64_to_cpu(hpte[1]) & (HPTE_R_R | HPTE_R_C);
hpte[1] = cpu_to_be64(r);
}
}
unlock_hpte(hpte, v & ~HPTE_V_HVLOCK);
unlock_hpte(hpte, pte_v & ~HPTE_V_HVLOCK);
asm volatile("ptesync" : : : "memory");
if (is_mmio_hpte(v, pte_r))
atomic64_inc(&kvm->arch.mmio_update);

return H_SUCCESS;
}
@ -681,6 +722,10 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
|
||||
hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
|
||||
v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
|
||||
r = be64_to_cpu(hpte[1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
v = hpte_new_to_old_v(v, r);
|
||||
r = hpte_new_to_old_r(r);
|
||||
}
|
||||
if (v & HPTE_V_ABSENT) {
|
||||
v &= ~HPTE_V_ABSENT;
|
||||
v |= HPTE_V_VALID;
|
||||
@ -798,10 +843,16 @@ void kvmppc_invalidate_hpte(struct kvm *kvm, __be64 *hptep,
|
||||
unsigned long pte_index)
|
||||
{
|
||||
unsigned long rb;
|
||||
u64 hp0, hp1;
|
||||
|
||||
hptep[0] &= ~cpu_to_be64(HPTE_V_VALID);
|
||||
rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
|
||||
pte_index);
|
||||
hp0 = be64_to_cpu(hptep[0]);
|
||||
hp1 = be64_to_cpu(hptep[1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
hp0 = hpte_new_to_old_v(hp0, hp1);
|
||||
hp1 = hpte_new_to_old_r(hp1);
|
||||
}
|
||||
rb = compute_tlbie_rb(hp0, hp1, pte_index);
|
||||
do_tlbies(kvm, &rb, 1, 1, true);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvmppc_invalidate_hpte);
|
||||
@ -811,9 +862,15 @@ void kvmppc_clear_ref_hpte(struct kvm *kvm, __be64 *hptep,
|
||||
{
|
||||
unsigned long rb;
|
||||
unsigned char rbyte;
|
||||
u64 hp0, hp1;
|
||||
|
||||
rb = compute_tlbie_rb(be64_to_cpu(hptep[0]), be64_to_cpu(hptep[1]),
|
||||
pte_index);
|
||||
hp0 = be64_to_cpu(hptep[0]);
|
||||
hp1 = be64_to_cpu(hptep[1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
hp0 = hpte_new_to_old_v(hp0, hp1);
|
||||
hp1 = hpte_new_to_old_r(hp1);
|
||||
}
|
||||
rb = compute_tlbie_rb(hp0, hp1, pte_index);
|
||||
rbyte = (be64_to_cpu(hptep[1]) & ~HPTE_R_R) >> 8;
|
||||
/* modify only the second-last byte, which contains the ref bit */
|
||||
*((char *)hptep + 14) = rbyte;
|
||||
@ -828,6 +885,37 @@ static int slb_base_page_shift[4] = {
|
||||
20, /* 1M, unsupported */
|
||||
};
|
||||
|
||||
static struct mmio_hpte_cache_entry *mmio_cache_search(struct kvm_vcpu *vcpu,
|
||||
unsigned long eaddr, unsigned long slb_v, long mmio_update)
|
||||
{
|
||||
struct mmio_hpte_cache_entry *entry = NULL;
|
||||
unsigned int pshift;
|
||||
unsigned int i;
|
||||
|
||||
for (i = 0; i < MMIO_HPTE_CACHE_SIZE; i++) {
|
||||
entry = &vcpu->arch.mmio_cache.entry[i];
|
||||
if (entry->mmio_update == mmio_update) {
|
||||
pshift = entry->slb_base_pshift;
|
||||
if ((entry->eaddr >> pshift) == (eaddr >> pshift) &&
|
||||
entry->slb_v == slb_v)
|
||||
return entry;
|
||||
}
|
||||
}
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static struct mmio_hpte_cache_entry *
|
||||
next_mmio_cache_entry(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned int index = vcpu->arch.mmio_cache.index;
|
||||
|
||||
vcpu->arch.mmio_cache.index++;
|
||||
if (vcpu->arch.mmio_cache.index == MMIO_HPTE_CACHE_SIZE)
|
||||
vcpu->arch.mmio_cache.index = 0;
|
||||
|
||||
return &vcpu->arch.mmio_cache.entry[index];
|
||||
}
|
||||
|
||||
/* When called from virtmode, this func should be protected by
|
||||
* preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK
|
||||
* can trigger deadlock issue.
|
||||
@ -842,7 +930,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
|
||||
unsigned long avpn;
|
||||
__be64 *hpte;
|
||||
unsigned long mask, val;
|
||||
unsigned long v, r;
|
||||
unsigned long v, r, orig_v;
|
||||
|
||||
/* Get page shift, work out hash and AVPN etc. */
|
||||
mask = SLB_VSID_B | HPTE_V_AVPN | HPTE_V_SECONDARY;
|
||||
@ -877,6 +965,8 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
|
||||
for (i = 0; i < 16; i += 2) {
|
||||
/* Read the PTE racily */
|
||||
v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
v = hpte_new_to_old_v(v, be64_to_cpu(hpte[i+1]));
|
||||
|
||||
/* Check valid/absent, hash, segment size and AVPN */
|
||||
if (!(v & valid) || (v & mask) != val)
|
||||
@ -885,8 +975,12 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
|
||||
/* Lock the PTE and read it under the lock */
|
||||
while (!try_lock_hpte(&hpte[i], HPTE_V_HVLOCK))
|
||||
cpu_relax();
|
||||
v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
|
||||
v = orig_v = be64_to_cpu(hpte[i]) & ~HPTE_V_HVLOCK;
|
||||
r = be64_to_cpu(hpte[i+1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
v = hpte_new_to_old_v(v, r);
|
||||
r = hpte_new_to_old_r(r);
|
||||
}
|
||||
|
||||
/*
|
||||
* Check the HPTE again, including base page size
|
||||
@ -896,7 +990,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
|
||||
/* Return with the HPTE still locked */
|
||||
return (hash << 3) + (i >> 1);
|
||||
|
||||
__unlock_hpte(&hpte[i], v);
|
||||
__unlock_hpte(&hpte[i], orig_v);
|
||||
}
|
||||
|
||||
if (val & HPTE_V_SECONDARY)
|
||||
@ -924,30 +1018,45 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
|
||||
{
|
||||
struct kvm *kvm = vcpu->kvm;
|
||||
long int index;
|
||||
unsigned long v, r, gr;
|
||||
unsigned long v, r, gr, orig_v;
|
||||
__be64 *hpte;
|
||||
unsigned long valid;
|
||||
struct revmap_entry *rev;
|
||||
unsigned long pp, key;
|
||||
struct mmio_hpte_cache_entry *cache_entry = NULL;
|
||||
long mmio_update = 0;
|
||||
|
||||
/* For protection fault, expect to find a valid HPTE */
|
||||
valid = HPTE_V_VALID;
|
||||
if (status & DSISR_NOHPTE)
|
||||
if (status & DSISR_NOHPTE) {
|
||||
valid |= HPTE_V_ABSENT;
|
||||
|
||||
index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
|
||||
if (index < 0) {
|
||||
if (status & DSISR_NOHPTE)
|
||||
return status; /* there really was no HPTE */
|
||||
return 0; /* for prot fault, HPTE disappeared */
|
||||
mmio_update = atomic64_read(&kvm->arch.mmio_update);
|
||||
cache_entry = mmio_cache_search(vcpu, addr, slb_v, mmio_update);
|
||||
}
|
||||
hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
|
||||
v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
|
||||
r = be64_to_cpu(hpte[1]);
|
||||
rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
|
||||
gr = rev->guest_rpte;
|
||||
if (cache_entry) {
|
||||
index = cache_entry->pte_index;
|
||||
v = cache_entry->hpte_v;
|
||||
r = cache_entry->hpte_r;
|
||||
gr = cache_entry->rpte;
|
||||
} else {
|
||||
index = kvmppc_hv_find_lock_hpte(kvm, addr, slb_v, valid);
|
||||
if (index < 0) {
|
||||
if (status & DSISR_NOHPTE)
|
||||
return status; /* there really was no HPTE */
|
||||
return 0; /* for prot fault, HPTE disappeared */
|
||||
}
|
||||
hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
|
||||
v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
|
||||
r = be64_to_cpu(hpte[1]);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
v = hpte_new_to_old_v(v, r);
|
||||
r = hpte_new_to_old_r(r);
|
||||
}
|
||||
rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
|
||||
gr = rev->guest_rpte;
|
||||
|
||||
unlock_hpte(hpte, v);
|
||||
unlock_hpte(hpte, orig_v);
|
||||
}
|
||||
|
||||
/* For not found, if the HPTE is valid by now, retry the instruction */
|
||||
if ((status & DSISR_NOHPTE) && (v & HPTE_V_VALID))
|
||||
@ -985,12 +1094,32 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
|
||||
vcpu->arch.pgfault_index = index;
|
||||
vcpu->arch.pgfault_hpte[0] = v;
|
||||
vcpu->arch.pgfault_hpte[1] = r;
|
||||
vcpu->arch.pgfault_cache = cache_entry;
|
||||
|
||||
/* Check the storage key to see if it is possibly emulated MMIO */
|
||||
if (data && (vcpu->arch.shregs.msr & MSR_IR) &&
|
||||
(r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
|
||||
(HPTE_R_KEY_HI | HPTE_R_KEY_LO))
|
||||
return -2; /* MMIO emulation - load instr word */
|
||||
if ((r & (HPTE_R_KEY_HI | HPTE_R_KEY_LO)) ==
|
||||
(HPTE_R_KEY_HI | HPTE_R_KEY_LO)) {
|
||||
if (!cache_entry) {
|
||||
unsigned int pshift = 12;
|
||||
unsigned int pshift_index;
|
||||
|
||||
if (slb_v & SLB_VSID_L) {
|
||||
pshift_index = ((slb_v & SLB_VSID_LP) >> 4);
|
||||
pshift = slb_base_page_shift[pshift_index];
|
||||
}
|
||||
cache_entry = next_mmio_cache_entry(vcpu);
|
||||
cache_entry->eaddr = addr;
|
||||
cache_entry->slb_base_pshift = pshift;
|
||||
cache_entry->pte_index = index;
|
||||
cache_entry->hpte_v = v;
|
||||
cache_entry->hpte_r = r;
|
||||
cache_entry->rpte = gr;
|
||||
cache_entry->slb_v = slb_v;
|
||||
cache_entry->mmio_update = mmio_update;
|
||||
}
|
||||
if (data && (vcpu->arch.shregs.msr & MSR_IR))
|
||||
return -2; /* MMIO emulation - load instr word */
|
||||
}
|
||||
|
||||
return -1; /* send fault up to host kernel mode */
|
||||
}
|
||||
|
@ -70,7 +70,11 @@ static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu)
|
||||
hcpu = hcore << threads_shift;
|
||||
kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu;
|
||||
smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION);
|
||||
icp_native_cause_ipi_rm(hcpu);
|
||||
if (paca[hcpu].kvm_hstate.xics_phys)
|
||||
icp_native_cause_ipi_rm(hcpu);
|
||||
else
|
||||
opal_rm_int_set_mfrr(get_hard_smp_processor_id(hcpu),
|
||||
IPI_PRIORITY);
|
||||
}
|
||||
#else
|
||||
static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { }
|
||||
@ -737,7 +741,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
|
||||
|
||||
unsigned long eoi_rc;
|
||||
|
||||
static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
|
||||
static void icp_eoi(struct irq_chip *c, u32 hwirq, __be32 xirr, bool *again)
|
||||
{
|
||||
unsigned long xics_phys;
|
||||
int64_t rc;
|
||||
@ -751,7 +755,12 @@ static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr)
|
||||
|
||||
/* EOI it */
|
||||
xics_phys = local_paca->kvm_hstate.xics_phys;
|
||||
_stwcix(xics_phys + XICS_XIRR, xirr);
|
||||
if (xics_phys) {
|
||||
_stwcix(xics_phys + XICS_XIRR, xirr);
|
||||
} else {
|
||||
rc = opal_rm_int_eoi(be32_to_cpu(xirr));
|
||||
*again = rc > 0;
|
||||
}
|
||||
}
|
||||
|
||||
static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu)
|
||||
@ -809,9 +818,10 @@ static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc)
|
||||
}
|
||||
|
||||
long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
|
||||
u32 xirr,
|
||||
__be32 xirr,
|
||||
struct kvmppc_irq_map *irq_map,
|
||||
struct kvmppc_passthru_irqmap *pimap)
|
||||
struct kvmppc_passthru_irqmap *pimap,
|
||||
bool *again)
|
||||
{
|
||||
struct kvmppc_xics *xics;
|
||||
struct kvmppc_icp *icp;
|
||||
@ -825,7 +835,8 @@ long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu,
|
||||
icp_rm_deliver_irq(xics, icp, irq);
|
||||
|
||||
/* EOI the interrupt */
|
||||
icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr);
|
||||
icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr,
|
||||
again);
|
||||
|
||||
if (check_too_hard(xics, icp) == H_TOO_HARD)
|
||||
return 2;
|
||||
|
@ -501,17 +501,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
cmpwi r0, 0
|
||||
beq 57f
|
||||
li r3, (LPCR_PECEDH | LPCR_PECE0) >> 4
|
||||
mfspr r4, SPRN_LPCR
|
||||
rlwimi r4, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
|
||||
mtspr SPRN_LPCR, r4
|
||||
isync
|
||||
std r0, HSTATE_SCRATCH0(r13)
|
||||
ptesync
|
||||
ld r0, HSTATE_SCRATCH0(r13)
|
||||
1: cmpd r0, r0
|
||||
bne 1b
|
||||
nap
|
||||
b .
|
||||
mfspr r5, SPRN_LPCR
|
||||
rlwimi r5, r3, 4, (LPCR_PECEDP | LPCR_PECEDH | LPCR_PECE0 | LPCR_PECE1)
|
||||
b kvm_nap_sequence
|
||||
|
||||
57: li r0, 0
|
||||
stbx r0, r3, r4
|
||||
@ -523,6 +515,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
* *
|
||||
*****************************************************************************/
|
||||
|
||||
/* Stack frame offsets */
|
||||
#define STACK_SLOT_TID (112-16)
|
||||
#define STACK_SLOT_PSSCR (112-24)
|
||||
|
||||
.global kvmppc_hv_entry
|
||||
kvmppc_hv_entry:
|
||||
|
||||
@ -581,12 +577,14 @@ kvmppc_hv_entry:
|
||||
ld r9,VCORE_KVM(r5) /* pointer to struct kvm */
|
||||
cmpwi r6,0
|
||||
bne 10f
|
||||
ld r6,KVM_SDR1(r9)
|
||||
lwz r7,KVM_LPID(r9)
|
||||
BEGIN_FTR_SECTION
|
||||
ld r6,KVM_SDR1(r9)
|
||||
li r0,LPID_RSVD /* switch to reserved LPID */
|
||||
mtspr SPRN_LPID,r0
|
||||
ptesync
|
||||
mtspr SPRN_SDR1,r6 /* switch to partition page table */
|
||||
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
|
||||
mtspr SPRN_LPID,r7
|
||||
isync
|
||||
|
||||
@ -607,12 +605,8 @@ kvmppc_hv_entry:
|
||||
stdcx. r7,0,r6
|
||||
bne 23b
|
||||
/* Flush the TLB of any entries for this LPID */
|
||||
/* use arch 2.07S as a proxy for POWER8 */
|
||||
BEGIN_FTR_SECTION
|
||||
li r6,512 /* POWER8 has 512 sets */
|
||||
FTR_SECTION_ELSE
|
||||
li r6,128 /* POWER7 has 128 sets */
|
||||
ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S)
|
||||
lwz r6,KVM_TLB_SETS(r9)
|
||||
li r0,0 /* RS for P9 version of tlbiel */
|
||||
mtctr r6
|
||||
li r7,0x800 /* IS field = 0b10 */
|
||||
ptesync
|
||||
@ -698,6 +692,14 @@ kvmppc_got_guest:
|
||||
mtspr SPRN_PURR,r7
|
||||
mtspr SPRN_SPURR,r8
|
||||
|
||||
/* Save host values of some registers */
|
||||
BEGIN_FTR_SECTION
|
||||
mfspr r5, SPRN_TIDR
|
||||
mfspr r6, SPRN_PSSCR
|
||||
std r5, STACK_SLOT_TID(r1)
|
||||
std r6, STACK_SLOT_PSSCR(r1)
|
||||
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
|
||||
|
||||
BEGIN_FTR_SECTION
|
||||
/* Set partition DABR */
|
||||
/* Do this before re-enabling PMU to avoid P7 DABR corruption bug */
|
||||
@ -750,14 +752,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
|
||||
BEGIN_FTR_SECTION
|
||||
ld r5, VCPU_MMCR + 24(r4)
|
||||
ld r6, VCPU_SIER(r4)
|
||||
mtspr SPRN_MMCR2, r5
|
||||
mtspr SPRN_SIER, r6
|
||||
BEGIN_FTR_SECTION_NESTED(96)
|
||||
lwz r7, VCPU_PMC + 24(r4)
|
||||
lwz r8, VCPU_PMC + 28(r4)
|
||||
ld r9, VCPU_MMCR + 32(r4)
|
||||
mtspr SPRN_MMCR2, r5
|
||||
mtspr SPRN_SIER, r6
|
||||
mtspr SPRN_SPMC1, r7
|
||||
mtspr SPRN_SPMC2, r8
|
||||
mtspr SPRN_MMCRS, r9
|
||||
END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
|
||||
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
mtspr SPRN_MMCR0, r3
|
||||
isync
|
||||
@ -813,20 +817,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
|
||||
mtspr SPRN_EBBHR, r8
|
||||
ld r5, VCPU_EBBRR(r4)
|
||||
ld r6, VCPU_BESCR(r4)
|
||||
ld r7, VCPU_CSIGR(r4)
|
||||
ld r8, VCPU_TACR(r4)
|
||||
mtspr SPRN_EBBRR, r5
|
||||
mtspr SPRN_BESCR, r6
|
||||
mtspr SPRN_CSIGR, r7
|
||||
mtspr SPRN_TACR, r8
|
||||
ld r5, VCPU_TCSCR(r4)
|
||||
ld r6, VCPU_ACOP(r4)
|
||||
lwz r7, VCPU_GUEST_PID(r4)
|
||||
ld r8, VCPU_WORT(r4)
|
||||
mtspr SPRN_TCSCR, r5
|
||||
mtspr SPRN_ACOP, r6
|
||||
mtspr SPRN_EBBRR, r5
|
||||
mtspr SPRN_BESCR, r6
|
||||
mtspr SPRN_PID, r7
|
||||
mtspr SPRN_WORT, r8
|
||||
BEGIN_FTR_SECTION
|
||||
/* POWER8-only registers */
|
||||
ld r5, VCPU_TCSCR(r4)
|
||||
ld r6, VCPU_ACOP(r4)
|
||||
ld r7, VCPU_CSIGR(r4)
|
||||
ld r8, VCPU_TACR(r4)
|
||||
mtspr SPRN_TCSCR, r5
|
||||
mtspr SPRN_ACOP, r6
|
||||
mtspr SPRN_CSIGR, r7
|
||||
mtspr SPRN_TACR, r8
|
||||
FTR_SECTION_ELSE
|
||||
/* POWER9-only registers */
|
||||
ld r5, VCPU_TID(r4)
|
||||
ld r6, VCPU_PSSCR(r4)
|
||||
oris r6, r6, PSSCR_EC@h /* This makes stop trap to HV */
|
||||
mtspr SPRN_TIDR, r5
|
||||
mtspr SPRN_PSSCR, r6
|
||||
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
|
||||
8:
|
||||
|
||||
/*
|
||||
@ -1341,20 +1355,29 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
|
||||
std r8, VCPU_EBBHR(r9)
|
||||
mfspr r5, SPRN_EBBRR
|
||||
mfspr r6, SPRN_BESCR
|
||||
mfspr r7, SPRN_CSIGR
|
||||
mfspr r8, SPRN_TACR
|
||||
std r5, VCPU_EBBRR(r9)
|
||||
std r6, VCPU_BESCR(r9)
|
||||
std r7, VCPU_CSIGR(r9)
|
||||
std r8, VCPU_TACR(r9)
|
||||
mfspr r5, SPRN_TCSCR
|
||||
mfspr r6, SPRN_ACOP
|
||||
mfspr r7, SPRN_PID
|
||||
mfspr r8, SPRN_WORT
|
||||
std r5, VCPU_TCSCR(r9)
|
||||
std r6, VCPU_ACOP(r9)
|
||||
std r5, VCPU_EBBRR(r9)
|
||||
std r6, VCPU_BESCR(r9)
|
||||
stw r7, VCPU_GUEST_PID(r9)
|
||||
std r8, VCPU_WORT(r9)
|
||||
BEGIN_FTR_SECTION
|
||||
mfspr r5, SPRN_TCSCR
|
||||
mfspr r6, SPRN_ACOP
|
||||
mfspr r7, SPRN_CSIGR
|
||||
mfspr r8, SPRN_TACR
|
||||
std r5, VCPU_TCSCR(r9)
|
||||
std r6, VCPU_ACOP(r9)
|
||||
std r7, VCPU_CSIGR(r9)
|
||||
std r8, VCPU_TACR(r9)
|
||||
FTR_SECTION_ELSE
|
||||
mfspr r5, SPRN_TIDR
|
||||
mfspr r6, SPRN_PSSCR
|
||||
std r5, VCPU_TID(r9)
|
||||
rldicl r6, r6, 4, 50 /* r6 &= PSSCR_GUEST_VIS */
|
||||
rotldi r6, r6, 60
|
||||
std r6, VCPU_PSSCR(r9)
|
||||
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
|
||||
/*
|
||||
* Restore various registers to 0, where non-zero values
|
||||
* set by the guest could disrupt the host.
|
||||
@ -1363,12 +1386,14 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
|
||||
mtspr SPRN_IAMR, r0
|
||||
mtspr SPRN_CIABR, r0
|
||||
mtspr SPRN_DAWRX, r0
|
||||
mtspr SPRN_TCSCR, r0
|
||||
mtspr SPRN_WORT, r0
|
||||
BEGIN_FTR_SECTION
|
||||
mtspr SPRN_TCSCR, r0
|
||||
/* Set MMCRS to 1<<31 to freeze and disable the SPMC counters */
|
||||
li r0, 1
|
||||
sldi r0, r0, 31
|
||||
mtspr SPRN_MMCRS, r0
|
||||
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
|
||||
8:
|
||||
|
||||
/* Save and reset AMR and UAMOR before turning on the MMU */
|
||||
@ -1502,15 +1527,17 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
stw r8, VCPU_PMC + 20(r9)
|
||||
BEGIN_FTR_SECTION
|
||||
mfspr r5, SPRN_SIER
|
||||
std r5, VCPU_SIER(r9)
|
||||
BEGIN_FTR_SECTION_NESTED(96)
|
||||
mfspr r6, SPRN_SPMC1
|
||||
mfspr r7, SPRN_SPMC2
|
||||
mfspr r8, SPRN_MMCRS
|
||||
std r5, VCPU_SIER(r9)
|
||||
stw r6, VCPU_PMC + 24(r9)
|
||||
stw r7, VCPU_PMC + 28(r9)
|
||||
std r8, VCPU_MMCR + 32(r9)
|
||||
lis r4, 0x8000
|
||||
mtspr SPRN_MMCRS, r4
|
||||
END_FTR_SECTION_NESTED(CPU_FTR_ARCH_300, 0, 96)
|
||||
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
22:
|
||||
/* Clear out SLB */
|
||||
@ -1519,6 +1546,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
slbia
|
||||
ptesync
|
||||
|
||||
/* Restore host values of some registers */
|
||||
BEGIN_FTR_SECTION
|
||||
ld r5, STACK_SLOT_TID(r1)
|
||||
ld r6, STACK_SLOT_PSSCR(r1)
|
||||
mtspr SPRN_TIDR, r5
|
||||
mtspr SPRN_PSSCR, r6
|
||||
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
|
||||
|
||||
/*
|
||||
* POWER7/POWER8 guest -> host partition switch code.
|
||||
* We don't have to lock against tlbies but we do
|
||||
@ -1552,12 +1587,14 @@ kvmhv_switch_to_host:
|
||||
beq 19f
|
||||
|
||||
/* Primary thread switches back to host partition */
|
||||
ld r6,KVM_HOST_SDR1(r4)
|
||||
lwz r7,KVM_HOST_LPID(r4)
|
||||
BEGIN_FTR_SECTION
|
||||
ld r6,KVM_HOST_SDR1(r4)
|
||||
li r8,LPID_RSVD /* switch to reserved LPID */
|
||||
mtspr SPRN_LPID,r8
|
||||
ptesync
|
||||
mtspr SPRN_SDR1,r6 /* switch to partition page table */
|
||||
mtspr SPRN_SDR1,r6 /* switch to host page table */
|
||||
END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
|
||||
mtspr SPRN_LPID,r7
|
||||
isync
|
||||
|
||||
@ -2211,6 +2248,21 @@ BEGIN_FTR_SECTION
|
||||
ori r5, r5, LPCR_PECEDH
|
||||
rlwimi r5, r3, 0, LPCR_PECEDP
|
||||
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
|
||||
kvm_nap_sequence: /* desired LPCR value in r5 */
|
||||
BEGIN_FTR_SECTION
|
||||
/*
|
||||
* PSSCR bits: exit criterion = 1 (wakeup based on LPCR at sreset)
|
||||
* enable state loss = 1 (allow SMT mode switch)
|
||||
* requested level = 0 (just stop dispatching)
|
||||
*/
|
||||
lis r3, (PSSCR_EC | PSSCR_ESL)@h
|
||||
mtspr SPRN_PSSCR, r3
|
||||
/* Set LPCR_PECE_HVEE bit to enable wakeup by HV interrupts */
|
||||
li r4, LPCR_PECE_HVEE@higher
|
||||
sldi r4, r4, 32
|
||||
or r5, r5, r4
|
||||
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
|
||||
mtspr SPRN_LPCR,r5
|
||||
isync
|
||||
li r0, 0
|
||||
@ -2219,7 +2271,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
|
||||
ld r0, HSTATE_SCRATCH0(r13)
|
||||
1: cmpd r0, r0
|
||||
bne 1b
|
||||
BEGIN_FTR_SECTION
|
||||
nap
|
||||
FTR_SECTION_ELSE
|
||||
PPC_STOP
|
||||
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
|
||||
b .
|
||||
|
||||
33: mr r4, r3
|
||||
@ -2600,11 +2656,13 @@ kvmppc_save_tm:
|
||||
mfctr r7
|
||||
mfspr r8, SPRN_AMR
|
||||
mfspr r10, SPRN_TAR
|
||||
mfxer r11
|
||||
std r5, VCPU_LR_TM(r9)
|
||||
stw r6, VCPU_CR_TM(r9)
|
||||
std r7, VCPU_CTR_TM(r9)
|
||||
std r8, VCPU_AMR_TM(r9)
|
||||
std r10, VCPU_TAR_TM(r9)
|
||||
std r11, VCPU_XER_TM(r9)
|
||||
|
||||
/* Restore r12 as trap number. */
|
||||
lwz r12, VCPU_TRAP(r9)
|
||||
@ -2697,11 +2755,13 @@ kvmppc_restore_tm:
|
||||
ld r7, VCPU_CTR_TM(r4)
|
||||
ld r8, VCPU_AMR_TM(r4)
|
||||
ld r9, VCPU_TAR_TM(r4)
|
||||
ld r10, VCPU_XER_TM(r4)
|
||||
mtlr r5
|
||||
mtcr r6
|
||||
mtctr r7
|
||||
mtspr SPRN_AMR, r8
|
||||
mtspr SPRN_TAR, r9
|
||||
mtxer r10
|
||||
|
||||
/*
|
||||
* Load up PPR and DSCR values but don't put them in the actual SPRs
|
||||
|
@ -536,7 +536,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
#ifdef CONFIG_PPC_BOOK3S_64
|
||||
case KVM_CAP_SPAPR_TCE:
|
||||
case KVM_CAP_SPAPR_TCE_64:
|
||||
case KVM_CAP_PPC_ALLOC_HTAB:
|
||||
case KVM_CAP_PPC_RTAS:
|
||||
case KVM_CAP_PPC_FIXUP_HCALL:
|
||||
case KVM_CAP_PPC_ENABLE_HCALL:
|
||||
@ -545,13 +544,20 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
|
||||
#endif
|
||||
r = 1;
|
||||
break;
|
||||
|
||||
case KVM_CAP_PPC_ALLOC_HTAB:
|
||||
r = hv_enabled;
|
||||
break;
|
||||
#endif /* CONFIG_PPC_BOOK3S_64 */
|
||||
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
|
||||
case KVM_CAP_PPC_SMT:
|
||||
if (hv_enabled)
|
||||
r = threads_per_subcore;
|
||||
else
|
||||
r = 0;
|
||||
r = 0;
|
||||
if (hv_enabled) {
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
r = 1;
|
||||
else
|
||||
r = threads_per_subcore;
|
||||
}
|
||||
break;
|
||||
case KVM_CAP_PPC_RMA:
|
||||
r = 0;
|
||||
|
@@ -449,7 +449,7 @@ TRACE_EVENT(kvmppc_vcore_wakeup,
__entry->tgid = current->tgid;
),

TP_printk("%s time %lld ns, tgid=%d",
TP_printk("%s time %llu ns, tgid=%d",
__entry->waited ? "wait" : "poll",
__entry->ns, __entry->tgid)
);
|
||||
|
@ -221,13 +221,18 @@ static long native_hpte_insert(unsigned long hpte_group, unsigned long vpn,
|
||||
return -1;
|
||||
|
||||
hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | HPTE_V_VALID;
|
||||
hpte_r = hpte_encode_r(pa, psize, apsize, ssize) | rflags;
|
||||
hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
|
||||
|
||||
if (!(vflags & HPTE_V_BOLTED)) {
|
||||
DBG_LOW(" i=%x hpte_v=%016lx, hpte_r=%016lx\n",
|
||||
i, hpte_v, hpte_r);
|
||||
}
|
||||
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
hpte_r = hpte_old_to_new_r(hpte_v, hpte_r);
|
||||
hpte_v = hpte_old_to_new_v(hpte_v);
|
||||
}
|
||||
|
||||
hptep->r = cpu_to_be64(hpte_r);
|
||||
/* Guarantee the second dword is visible before the valid bit */
|
||||
eieio();
|
||||
@ -295,6 +300,8 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
|
||||
vpn, want_v & HPTE_V_AVPN, slot, newpp);
|
||||
|
||||
hpte_v = be64_to_cpu(hptep->v);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
|
||||
/*
|
||||
* We need to invalidate the TLB always because hpte_remove doesn't do
|
||||
* a tlb invalidate. If a hash bucket gets full, we "evict" a more/less
|
||||
@ -309,6 +316,8 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp,
|
||||
native_lock_hpte(hptep);
|
||||
/* recheck with locks held */
|
||||
hpte_v = be64_to_cpu(hptep->v);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
|
||||
if (unlikely(!HPTE_V_COMPARE(hpte_v, want_v) ||
|
||||
!(hpte_v & HPTE_V_VALID))) {
|
||||
ret = -1;
|
||||
@ -350,6 +359,8 @@ static long native_hpte_find(unsigned long vpn, int psize, int ssize)
|
||||
for (i = 0; i < HPTES_PER_GROUP; i++) {
|
||||
hptep = htab_address + slot;
|
||||
hpte_v = be64_to_cpu(hptep->v);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
|
||||
|
||||
if (HPTE_V_COMPARE(hpte_v, want_v) && (hpte_v & HPTE_V_VALID))
|
||||
/* HPTE matches */
|
||||
@ -409,6 +420,8 @@ static void native_hpte_invalidate(unsigned long slot, unsigned long vpn,
|
||||
want_v = hpte_encode_avpn(vpn, bpsize, ssize);
|
||||
native_lock_hpte(hptep);
|
||||
hpte_v = be64_to_cpu(hptep->v);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
|
||||
|
||||
/*
|
||||
* We need to invalidate the TLB always because hpte_remove doesn't do
|
||||
@ -467,6 +480,8 @@ static void native_hugepage_invalidate(unsigned long vsid,
|
||||
want_v = hpte_encode_avpn(vpn, psize, ssize);
|
||||
native_lock_hpte(hptep);
|
||||
hpte_v = be64_to_cpu(hptep->v);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
hpte_v = hpte_new_to_old_v(hpte_v, be64_to_cpu(hptep->r));
|
||||
|
||||
/* Even if we miss, we need to invalidate the TLB */
|
||||
if (!HPTE_V_COMPARE(hpte_v, want_v) || !(hpte_v & HPTE_V_VALID))
|
||||
@ -504,6 +519,10 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
|
||||
/* Look at the 8 bit LP value */
|
||||
unsigned int lp = (hpte_r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
|
||||
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
|
||||
hpte_v = hpte_new_to_old_v(hpte_v, hpte_r);
|
||||
hpte_r = hpte_new_to_old_r(hpte_r);
|
||||
}
|
||||
if (!(hpte_v & HPTE_V_LARGE)) {
|
||||
size = MMU_PAGE_4K;
|
||||
a_size = MMU_PAGE_4K;
|
||||
@ -512,11 +531,7 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
|
||||
a_size = hpte_page_sizes[lp] >> 4;
|
||||
}
|
||||
/* This works for all page sizes, and for 256M and 1T segments */
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
*ssize = hpte_r >> HPTE_R_3_0_SSIZE_SHIFT;
|
||||
else
|
||||
*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
|
||||
|
||||
*ssize = hpte_v >> HPTE_V_SSIZE_SHIFT;
|
||||
shift = mmu_psize_defs[size].shift;
|
||||
|
||||
avpn = (HPTE_V_AVPN_VAL(hpte_v) & ~mmu_psize_defs[size].avpnm);
|
||||
@ -639,6 +654,9 @@ static void native_flush_hash_range(unsigned long number, int local)
|
||||
want_v = hpte_encode_avpn(vpn, psize, ssize);
|
||||
native_lock_hpte(hptep);
|
||||
hpte_v = be64_to_cpu(hptep->v);
|
||||
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
||||
hpte_v = hpte_new_to_old_v(hpte_v,
|
||||
be64_to_cpu(hptep->r));
|
||||
if (!HPTE_V_COMPARE(hpte_v, want_v) ||
|
||||
!(hpte_v & HPTE_V_VALID))
|
||||
native_unlock_hpte(hptep);
|
||||
|
@ -796,37 +796,17 @@ static void update_hid_for_hash(void)
|
||||
static void __init hash_init_partition_table(phys_addr_t hash_table,
|
||||
unsigned long htab_size)
|
||||
{
|
||||
unsigned long ps_field;
|
||||
unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
|
||||
mmu_partition_table_init();
|
||||
|
||||
/*
|
||||
* slb llp encoding for the page size used in VPM real mode.
|
||||
* We can ignore that for lpid 0
|
||||
* PS field (VRMA page size) is not used for LPID 0, hence set to 0.
|
||||
* For now, UPRT is 0 and we have no segment table.
|
||||
*/
|
||||
ps_field = 0;
|
||||
htab_size = __ilog2(htab_size) - 18;
|
||||
|
||||
BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
|
||||
partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
|
||||
MEMBLOCK_ALLOC_ANYWHERE));
|
||||
|
||||
/* Initialize the Partition Table with no entries */
|
||||
memset((void *)partition_tb, 0, patb_size);
|
||||
partition_tb->patb0 = cpu_to_be64(ps_field | hash_table | htab_size);
|
||||
/*
|
||||
* FIXME!! This should be done via update_partition table
|
||||
* For now UPRT is 0 for us.
|
||||
*/
|
||||
partition_tb->patb1 = 0;
|
||||
mmu_partition_table_set_entry(0, hash_table | htab_size, 0);
|
||||
pr_info("Partition table %p\n", partition_tb);
|
||||
if (cpu_has_feature(CPU_FTR_POWER9_DD1))
|
||||
update_hid_for_hash();
|
||||
/*
|
||||
* update partition table control register,
|
||||
* 64 K size.
|
||||
*/
|
||||
mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
|
||||
|
||||
}
|
||||
|
||||
static void __init htab_initialize(void)
|
||||
|
@ -177,23 +177,15 @@ redo:
|
||||
|
||||
static void __init radix_init_partition_table(void)
|
||||
{
|
||||
unsigned long rts_field;
|
||||
unsigned long rts_field, dw0;
|
||||
|
||||
mmu_partition_table_init();
|
||||
rts_field = radix__get_tree_size();
|
||||
dw0 = rts_field | __pa(init_mm.pgd) | RADIX_PGD_INDEX_SIZE | PATB_HR;
|
||||
mmu_partition_table_set_entry(0, dw0, 0);
|
||||
|
||||
BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 24), "Partition table size too large.");
|
||||
partition_tb = early_alloc_pgtable(1UL << PATB_SIZE_SHIFT);
|
||||
partition_tb->patb0 = cpu_to_be64(rts_field | __pa(init_mm.pgd) |
|
||||
RADIX_PGD_INDEX_SIZE | PATB_HR);
|
||||
pr_info("Initializing Radix MMU\n");
|
||||
pr_info("Partition table %p\n", partition_tb);
|
||||
|
||||
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
|
||||
/*
|
||||
* update partition table control register,
|
||||
* 64 K size.
|
||||
*/
|
||||
mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
|
||||
}
|
||||
|
||||
void __init radix_init_native(void)
|
||||
@ -378,6 +370,8 @@ void __init radix__early_init_mmu(void)
|
||||
radix_init_partition_table();
|
||||
}
|
||||
|
||||
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
|
||||
|
||||
radix_init_pgtable();
|
||||
}
|
||||
|
||||
|
@ -431,3 +431,37 @@ void pgtable_free_tlb(struct mmu_gather *tlb, void *table, int shift)
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_PPC_BOOK3S_64
|
||||
void __init mmu_partition_table_init(void)
|
||||
{
|
||||
unsigned long patb_size = 1UL << PATB_SIZE_SHIFT;
|
||||
|
||||
BUILD_BUG_ON_MSG((PATB_SIZE_SHIFT > 36), "Partition table size too large.");
|
||||
partition_tb = __va(memblock_alloc_base(patb_size, patb_size,
|
||||
MEMBLOCK_ALLOC_ANYWHERE));
|
||||
|
||||
/* Initialize the Partition Table with no entries */
|
||||
memset((void *)partition_tb, 0, patb_size);
|
||||
|
||||
/*
|
||||
* update partition table control register,
|
||||
* 64 K size.
|
||||
*/
|
||||
mtspr(SPRN_PTCR, __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
|
||||
}
|
||||
|
||||
void mmu_partition_table_set_entry(unsigned int lpid, unsigned long dw0,
|
||||
unsigned long dw1)
|
||||
{
|
||||
partition_tb[lpid].patb0 = cpu_to_be64(dw0);
|
||||
partition_tb[lpid].patb1 = cpu_to_be64(dw1);
|
||||
|
||||
/* Global flush of TLBs and partition table caches for this lpid */
|
||||
asm volatile("ptesync" : : : "memory");
|
||||
asm volatile(PPC_TLBIE_5(%0,%1,2,0,0) : :
|
||||
"r" (TLBIEL_INVAL_SET_LPID), "r" (lpid));
|
||||
asm volatile("eieio; tlbsync; ptesync" : : : "memory");
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(mmu_partition_table_set_entry);
|
||||
#endif /* CONFIG_PPC_BOOK3S_64 */
|
||||
|
@ -304,8 +304,11 @@ OPAL_CALL(opal_pci_get_presence_state, OPAL_PCI_GET_PRESENCE_STATE);
|
||||
OPAL_CALL(opal_pci_get_power_state, OPAL_PCI_GET_POWER_STATE);
|
||||
OPAL_CALL(opal_pci_set_power_state, OPAL_PCI_SET_POWER_STATE);
|
||||
OPAL_CALL(opal_int_get_xirr, OPAL_INT_GET_XIRR);
|
||||
OPAL_CALL_REAL(opal_rm_int_get_xirr, OPAL_INT_GET_XIRR);
|
||||
OPAL_CALL(opal_int_set_cppr, OPAL_INT_SET_CPPR);
|
||||
OPAL_CALL(opal_int_eoi, OPAL_INT_EOI);
|
||||
OPAL_CALL_REAL(opal_rm_int_eoi, OPAL_INT_EOI);
|
||||
OPAL_CALL(opal_int_set_mfrr, OPAL_INT_SET_MFRR);
|
||||
OPAL_CALL_REAL(opal_rm_int_set_mfrr, OPAL_INT_SET_MFRR);
|
||||
OPAL_CALL(opal_pci_tce_kill, OPAL_PCI_TCE_KILL);
|
||||
OPAL_CALL_REAL(opal_rm_pci_tce_kill, OPAL_PCI_TCE_KILL);
|
||||
|
@ -896,3 +896,5 @@ EXPORT_SYMBOL_GPL(opal_leds_get_ind);
|
||||
EXPORT_SYMBOL_GPL(opal_leds_set_ind);
|
||||
/* Export this symbol for PowerNV Operator Panel class driver */
|
||||
EXPORT_SYMBOL_GPL(opal_write_oppanel_async);
|
||||
/* Export this for KVM */
|
||||
EXPORT_SYMBOL_GPL(opal_int_set_mfrr);
|
||||
|
@ -63,7 +63,7 @@ static long ps3_hpte_insert(unsigned long hpte_group, unsigned long vpn,
|
||||
vflags &= ~HPTE_V_SECONDARY;
|
||||
|
||||
hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | HPTE_V_VALID;
|
||||
hpte_r = hpte_encode_r(ps3_mm_phys_to_lpar(pa), psize, apsize, ssize) | rflags;
|
||||
hpte_r = hpte_encode_r(ps3_mm_phys_to_lpar(pa), psize, apsize) | rflags;
|
||||
|
||||
spin_lock_irqsave(&ps3_htab_lock, flags);
|
||||
|
||||
|
@ -145,7 +145,7 @@ static long pSeries_lpar_hpte_insert(unsigned long hpte_group,
|
||||
hpte_group, vpn, pa, rflags, vflags, psize);
|
||||
|
||||
hpte_v = hpte_encode_v(vpn, psize, apsize, ssize) | vflags | HPTE_V_VALID;
|
||||
hpte_r = hpte_encode_r(pa, psize, apsize, ssize) | rflags;
|
||||
hpte_r = hpte_encode_r(pa, psize, apsize) | rflags;
|
||||
|
||||
if (!(vflags & HPTE_V_BOLTED))
|
||||
pr_devel(" hpte_v=%016lx, hpte_r=%016lx\n", hpte_v, hpte_r);
|
||||
|
@ -415,7 +415,7 @@ static int __write_machine_check(struct kvm_vcpu *vcpu,
|
||||
int rc;
|
||||
|
||||
mci.val = mchk->mcic;
|
||||
/* take care of lazy register loading via vcpu load/put */
|
||||
/* take care of lazy register loading */
|
||||
save_fpu_regs();
|
||||
save_access_regs(vcpu->run->s.regs.acrs);
|
||||
|
||||
|
@ -1812,22 +1812,7 @@ __u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu)
|
||||
|
||||
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
|
||||
{
|
||||
/* Save host register state */
|
||||
save_fpu_regs();
|
||||
vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc;
|
||||
vcpu->arch.host_fpregs.regs = current->thread.fpu.regs;
|
||||
|
||||
if (MACHINE_HAS_VX)
|
||||
current->thread.fpu.regs = vcpu->run->s.regs.vrs;
|
||||
else
|
||||
current->thread.fpu.regs = vcpu->run->s.regs.fprs;
|
||||
current->thread.fpu.fpc = vcpu->run->s.regs.fpc;
|
||||
if (test_fp_ctl(current->thread.fpu.fpc))
|
||||
/* User space provided an invalid FPC, let's clear it */
|
||||
current->thread.fpu.fpc = 0;
|
||||
|
||||
save_access_regs(vcpu->arch.host_acrs);
|
||||
restore_access_regs(vcpu->run->s.regs.acrs);
|
||||
gmap_enable(vcpu->arch.enabled_gmap);
|
||||
atomic_or(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
|
||||
if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
|
||||
@ -1844,16 +1829,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
|
||||
vcpu->arch.enabled_gmap = gmap_get_enabled();
|
||||
gmap_disable(vcpu->arch.enabled_gmap);
|
||||
|
||||
/* Save guest register state */
|
||||
save_fpu_regs();
|
||||
vcpu->run->s.regs.fpc = current->thread.fpu.fpc;
|
||||
|
||||
/* Restore host register state */
|
||||
current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
|
||||
current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
|
||||
|
||||
save_access_regs(vcpu->run->s.regs.acrs);
|
||||
restore_access_regs(vcpu->arch.host_acrs);
|
||||
}
|
||||
|
||||
static void kvm_s390_vcpu_initial_reset(struct kvm_vcpu *vcpu)
|
||||
@ -2243,7 +2218,6 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
|
||||
{
|
||||
memcpy(&vcpu->run->s.regs.acrs, &sregs->acrs, sizeof(sregs->acrs));
|
||||
memcpy(&vcpu->arch.sie_block->gcr, &sregs->crs, sizeof(sregs->crs));
|
||||
restore_access_regs(vcpu->run->s.regs.acrs);
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -2257,11 +2231,9 @@ int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
|
||||
|
||||
int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
|
||||
{
|
||||
/* make sure the new values will be lazily loaded */
|
||||
save_fpu_regs();
|
||||
if (test_fp_ctl(fpu->fpc))
|
||||
return -EINVAL;
|
||||
current->thread.fpu.fpc = fpu->fpc;
|
||||
vcpu->run->s.regs.fpc = fpu->fpc;
|
||||
if (MACHINE_HAS_VX)
|
||||
convert_fp_to_vx((__vector128 *) vcpu->run->s.regs.vrs,
|
||||
(freg_t *) fpu->fprs);
|
||||
@ -2279,7 +2251,7 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
|
||||
(__vector128 *) vcpu->run->s.regs.vrs);
|
||||
else
|
||||
memcpy(fpu->fprs, vcpu->run->s.regs.fprs, sizeof(fpu->fprs));
|
||||
fpu->fpc = current->thread.fpu.fpc;
|
||||
fpu->fpc = vcpu->run->s.regs.fpc;
|
||||
return 0;
|
||||
}
|
||||
|
||||
@ -2740,6 +2712,20 @@ static void sync_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
|
||||
if (riccb->valid)
|
||||
vcpu->arch.sie_block->ecb3 |= 0x01;
|
||||
}
|
||||
save_access_regs(vcpu->arch.host_acrs);
|
||||
restore_access_regs(vcpu->run->s.regs.acrs);
|
||||
/* save host (userspace) fprs/vrs */
|
||||
save_fpu_regs();
|
||||
vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc;
|
||||
vcpu->arch.host_fpregs.regs = current->thread.fpu.regs;
|
||||
if (MACHINE_HAS_VX)
|
||||
current->thread.fpu.regs = vcpu->run->s.regs.vrs;
|
||||
else
|
||||
current->thread.fpu.regs = vcpu->run->s.regs.fprs;
|
||||
current->thread.fpu.fpc = vcpu->run->s.regs.fpc;
|
||||
if (test_fp_ctl(current->thread.fpu.fpc))
|
||||
/* User space provided an invalid FPC, let's clear it */
|
||||
current->thread.fpu.fpc = 0;
|
||||
|
||||
kvm_run->kvm_dirty_regs = 0;
|
||||
}
|
||||
@ -2758,6 +2744,15 @@ static void store_regs(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
|
||||
kvm_run->s.regs.pft = vcpu->arch.pfault_token;
|
||||
kvm_run->s.regs.pfs = vcpu->arch.pfault_select;
|
||||
kvm_run->s.regs.pfc = vcpu->arch.pfault_compare;
|
||||
save_access_regs(vcpu->run->s.regs.acrs);
|
||||
restore_access_regs(vcpu->arch.host_acrs);
|
||||
/* Save guest register state */
|
||||
save_fpu_regs();
|
||||
vcpu->run->s.regs.fpc = current->thread.fpu.fpc;
|
||||
/* Restore will be done lazily at return */
|
||||
current->thread.fpu.fpc = vcpu->arch.host_fpregs.fpc;
|
||||
current->thread.fpu.regs = vcpu->arch.host_fpregs.regs;
|
||||
|
||||
}
|
||||
|
||||
int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
|
||||
@ -2874,7 +2869,7 @@ int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr)
|
||||
{
|
||||
/*
|
||||
* The guest FPRS and ACRS are in the host FPRS/ACRS due to the lazy
|
||||
* copying in vcpu load/put. Lets update our copies before we save
|
||||
* switch in the run ioctl. Let's update our copies before we save
|
||||
* it into the save area
|
||||
*/
|
||||
save_fpu_regs();
|
||||
|
@@ -191,6 +191,8 @@ enum {
#define PFERR_RSVD_BIT 3
#define PFERR_FETCH_BIT 4
#define PFERR_PK_BIT 5
#define PFERR_GUEST_FINAL_BIT 32
#define PFERR_GUEST_PAGE_BIT 33

#define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
#define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
@@ -198,6 +200,13 @@ enum {
#define PFERR_RSVD_MASK (1U << PFERR_RSVD_BIT)
#define PFERR_FETCH_MASK (1U << PFERR_FETCH_BIT)
#define PFERR_PK_MASK (1U << PFERR_PK_BIT)
#define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
#define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)

#define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK | \
PFERR_USER_MASK | \
PFERR_WRITE_MASK | \
PFERR_PRESENT_MASK)

/* apic attention bits */
#define KVM_APIC_CHECK_VAPIC 0
@@ -1062,6 +1071,7 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);

int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
bool pdptrs_changed(struct kvm_vcpu *vcpu);

int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
const void *val, int bytes);
@@ -1124,7 +1134,8 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
struct x86_emulate_ctxt;

int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port);
int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
int kvm_emulate_halt(struct kvm_vcpu *vcpu);
int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
@@ -1203,7 +1214,7 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);

int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
void *insn, int insn_len);
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu);
@@ -1358,7 +1369,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);

void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);

int kvm_is_in_guest(void);
@@ -25,6 +25,7 @@
#define VMX_H

#include <linux/bitops.h>
#include <linux/types.h>
#include <uapi/asm/vmx.h>
@@ -60,6 +61,7 @@
*/
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001
#define SECONDARY_EXEC_ENABLE_EPT 0x00000002
#define SECONDARY_EXEC_DESC 0x00000004
#define SECONDARY_EXEC_RDTSCP 0x00000008
#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE 0x00000010
#define SECONDARY_EXEC_ENABLE_VPID 0x00000020
@@ -110,6 +112,36 @@
#define VMX_MISC_SAVE_EFER_LMA 0x00000020
#define VMX_MISC_ACTIVITY_HLT 0x00000040

static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
{
return vmx_basic & GENMASK_ULL(30, 0);
}

static inline u32 vmx_basic_vmcs_size(u64 vmx_basic)
{
return (vmx_basic & GENMASK_ULL(44, 32)) >> 32;
}

static inline int vmx_misc_preemption_timer_rate(u64 vmx_misc)
{
return vmx_misc & VMX_MISC_PREEMPTION_TIMER_RATE_MASK;
}

static inline int vmx_misc_cr3_count(u64 vmx_misc)
{
return (vmx_misc & GENMASK_ULL(24, 16)) >> 16;
}

static inline int vmx_misc_max_msr(u64 vmx_misc)
{
return (vmx_misc & GENMASK_ULL(27, 25)) >> 25;
}

static inline int vmx_misc_mseg_revid(u64 vmx_misc)
{
return (vmx_misc & GENMASK_ULL(63, 32)) >> 32;
}

/* VMCS Encodings */
enum vmcs_field {
VIRTUAL_PROCESSOR_ID = 0x00000000,
@@ -399,10 +431,11 @@ enum vmcs_field {
#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_USER_MEM_SLOTS + 2)

#define VMX_NR_VPIDS (1 << 16)
#define VMX_VPID_EXTENT_INDIVIDUAL_ADDR 0
#define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
#define VMX_VPID_EXTENT_ALL_CONTEXT 2
#define VMX_VPID_EXTENT_SINGLE_NON_GLOBAL 3

#define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
#define VMX_EPT_EXTENT_CONTEXT 1
#define VMX_EPT_EXTENT_GLOBAL 2
#define VMX_EPT_EXTENT_SHIFT 24
@@ -419,8 +452,10 @@ enum vmcs_field {
#define VMX_EPT_EXTENT_GLOBAL_BIT (1ull << 26)

#define VMX_VPID_INVVPID_BIT (1ull << 0) /* (32 - 32) */
#define VMX_VPID_EXTENT_INDIVIDUAL_ADDR_BIT (1ull << 8) /* (40 - 32) */
#define VMX_VPID_EXTENT_SINGLE_CONTEXT_BIT (1ull << 9) /* (41 - 32) */
#define VMX_VPID_EXTENT_GLOBAL_CONTEXT_BIT (1ull << 10) /* (42 - 32) */
#define VMX_VPID_EXTENT_SINGLE_NON_GLOBAL_BIT (1ull << 11) /* (43 - 32) */

#define VMX_EPT_DEFAULT_GAW 3
#define VMX_EPT_MAX_GAW 0x4
@@ -65,6 +65,8 @@
#define EXIT_REASON_TPR_BELOW_THRESHOLD 43
#define EXIT_REASON_APIC_ACCESS 44
#define EXIT_REASON_EOI_INDUCED 45
#define EXIT_REASON_GDTR_IDTR 46
#define EXIT_REASON_LDTR_TR 47
#define EXIT_REASON_EPT_VIOLATION 48
#define EXIT_REASON_EPT_MISCONFIG 49
#define EXIT_REASON_INVEPT 50
@@ -113,6 +115,8 @@
{ EXIT_REASON_MCE_DURING_VMENTRY, "MCE_DURING_VMENTRY" }, \
{ EXIT_REASON_TPR_BELOW_THRESHOLD, "TPR_BELOW_THRESHOLD" }, \
{ EXIT_REASON_APIC_ACCESS, "APIC_ACCESS" }, \
{ EXIT_REASON_GDTR_IDTR, "GDTR_IDTR" }, \
{ EXIT_REASON_LDTR_TR, "LDTR_TR" }, \
{ EXIT_REASON_EPT_VIOLATION, "EPT_VIOLATION" }, \
{ EXIT_REASON_EPT_MISCONFIG, "EPT_MISCONFIG" }, \
{ EXIT_REASON_INVEPT, "INVEPT" }, \
@@ -129,6 +133,7 @@
{ EXIT_REASON_XRSTORS, "XRSTORS" }

#define VMX_ABORT_SAVE_GUEST_MSR_FAIL 1
#define VMX_ABORT_LOAD_HOST_PDPTE_FAIL 2
#define VMX_ABORT_LOAD_HOST_MSR_FAIL 4

#endif /* _UAPIVMX_H */
@@ -16,6 +16,7 @@
#include <linux/export.h>
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
#include <asm/processor.h>
#include <asm/user.h>
#include <asm/fpu/xstate.h>
#include "cpuid.h"
@@ -64,6 +65,11 @@ u64 kvm_supported_xcr0(void)

#define F(x) bit(X86_FEATURE_##x)

/* These are scattered features in cpufeatures.h. */
#define KVM_CPUID_BIT_AVX512_4VNNIW 2
#define KVM_CPUID_BIT_AVX512_4FMAPS 3
#define KF(x) bit(KVM_CPUID_BIT_##x)

int kvm_update_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -80,6 +86,10 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
best->ecx |= F(OSXSAVE);
}

best->edx &= ~F(APIC);
if (vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE)
best->edx |= F(APIC);

if (apic) {
if (best->ecx & F(TSC_DEADLINE_TIMER))
apic->lapic_timer.timer_mode_mask = 3 << 17;
@@ -374,6 +384,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;

/* cpuid 7.0.edx*/
const u32 kvm_cpuid_7_0_edx_x86_features =
KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);

/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();

@@ -456,12 +470,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled)
entry->ecx &= ~F(PKU);
entry->edx &= kvm_cpuid_7_0_edx_x86_features;
entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
entry->edx = 0;
}
entry->eax = 0;
entry->edx = 0;
break;
}
case 9:
@@ -861,17 +877,17 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
}
EXPORT_SYMBOL_GPL(kvm_cpuid);

void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
{
u32 function, eax, ebx, ecx, edx;
u32 eax, ebx, ecx, edx;

function = eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
eax = kvm_register_read(vcpu, VCPU_REGS_RAX);
ecx = kvm_register_read(vcpu, VCPU_REGS_RCX);
kvm_cpuid(vcpu, &eax, &ebx, &ecx, &edx);
kvm_register_write(vcpu, VCPU_REGS_RAX, eax);
kvm_register_write(vcpu, VCPU_REGS_RBX, ebx);
kvm_register_write(vcpu, VCPU_REGS_RCX, ecx);
kvm_register_write(vcpu, VCPU_REGS_RDX, edx);
kvm_x86_ops->skip_emulated_instruction(vcpu);
return kvm_skip_emulated_instruction(vcpu);
}
EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
@@ -158,9 +158,11 @@
#define Src2GS (OpGS << Src2Shift)
#define Src2Mask (OpMask << Src2Shift)
#define Mmx ((u64)1 << 40) /* MMX Vector instruction */
#define AlignMask ((u64)7 << 41)
#define Aligned ((u64)1 << 41) /* Explicitly aligned (e.g. MOVDQA) */
#define Unaligned ((u64)1 << 42) /* Explicitly unaligned (e.g. MOVDQU) */
#define Avx ((u64)1 << 43) /* Advanced Vector Extensions */
#define Unaligned ((u64)2 << 41) /* Explicitly unaligned (e.g. MOVDQU) */
#define Avx ((u64)3 << 41) /* Advanced Vector Extensions */
#define Aligned16 ((u64)4 << 41) /* Aligned to 16 byte boundary (e.g. FXSAVE) */
#define Fastop ((u64)1 << 44) /* Use opcode::u.fastop */
#define NoWrite ((u64)1 << 45) /* No writeback */
#define SrcWrite ((u64)1 << 46) /* Write back src operand */
@@ -446,6 +448,26 @@ FOP_END;
FOP_START(salc) "pushf; sbb %al, %al; popf \n\t" FOP_RET
FOP_END;

/*
* XXX: inoutclob user must know where the argument is being expanded.
* Relying on CC_HAVE_ASM_GOTO would allow us to remove _fault.
*/
#define asm_safe(insn, inoutclob...) \
({ \
int _fault = 0; \
\
asm volatile("1:" insn "\n" \
"2:\n" \
".pushsection .fixup, \"ax\"\n" \
"3: movl $1, %[_fault]\n" \
" jmp 2b\n" \
".popsection\n" \
_ASM_EXTABLE(1b, 3b) \
: [_fault] "+qm"(_fault) inoutclob ); \
\
_fault ? X86EMUL_UNHANDLEABLE : X86EMUL_CONTINUE; \
})

static int emulator_check_intercept(struct x86_emulate_ctxt *ctxt,
enum x86_intercept intercept,
enum x86_intercept_stage stage)
@@ -632,21 +654,26 @@ static void set_segment_selector(struct x86_emulate_ctxt *ctxt, u16 selector,
* depending on whether they're AVX encoded or not.
*
* Also included is CMPXCHG16B which is not a vector instruction, yet it is
* subject to the same check.
* subject to the same check. FXSAVE and FXRSTOR are checked here too as their
* 512 bytes of data must be aligned to a 16 byte boundary.
*/
static bool insn_aligned(struct x86_emulate_ctxt *ctxt, unsigned size)
static unsigned insn_alignment(struct x86_emulate_ctxt *ctxt, unsigned size)
{
if (likely(size < 16))
return false;
u64 alignment = ctxt->d & AlignMask;

if (ctxt->d & Aligned)
return true;
else if (ctxt->d & Unaligned)
return false;
else if (ctxt->d & Avx)
return false;
else
return true;
if (likely(size < 16))
return 1;

switch (alignment) {
case Unaligned:
case Avx:
return 1;
case Aligned16:
return 16;
case Aligned:
default:
return size;
}
}

static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
@@ -704,7 +731,7 @@ static __always_inline int __linearize(struct x86_emulate_ctxt *ctxt,
}
break;
}
if (insn_aligned(ctxt, size) && ((la & (size - 1)) != 0))
if (la & (insn_alignment(ctxt, size) - 1))
return emulate_gp(ctxt, 0);
return X86EMUL_CONTINUE;
bad:
@@ -3842,6 +3869,131 @@ static int em_movsxd(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}

static int check_fxsr(struct x86_emulate_ctxt *ctxt)
{
u32 eax = 1, ebx, ecx = 0, edx;

ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
if (!(edx & FFL(FXSR)))
return emulate_ud(ctxt);

if (ctxt->ops->get_cr(ctxt, 0) & (X86_CR0_TS | X86_CR0_EM))
return emulate_nm(ctxt);

/*
* Don't emulate a case that should never be hit, instead of working
* around a lack of fxsave64/fxrstor64 on old compilers.
*/
if (ctxt->mode >= X86EMUL_MODE_PROT64)
return X86EMUL_UNHANDLEABLE;

return X86EMUL_CONTINUE;
}

/*
* FXSAVE and FXRSTOR have 4 different formats depending on execution mode,
* 1) 16 bit mode
* 2) 32 bit mode
* - like (1), but FIP and FDP (foo) are only 16 bit. At least Intel CPUs
* preserve whole 32 bit values, though, so (1) and (2) are the same wrt.
* save and restore
* 3) 64-bit mode with REX.W prefix
* - like (2), but XMM 8-15 are being saved and restored
* 4) 64-bit mode without REX.W prefix
* - like (3), but FIP and FDP are 64 bit
*
* Emulation uses (3) for (1) and (2) and preserves XMM 8-15 to reach the
* desired result. (4) is not emulated.
*
* Note: Guest and host CPUID.(EAX=07H,ECX=0H):EBX[bit 13] (deprecate FPU CS
* and FPU DS) should match.
*/
static int em_fxsave(struct x86_emulate_ctxt *ctxt)
{
struct fxregs_state fx_state;
size_t size;
int rc;

rc = check_fxsr(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;

ctxt->ops->get_fpu(ctxt);

rc = asm_safe("fxsave %[fx]", , [fx] "+m"(fx_state));

ctxt->ops->put_fpu(ctxt);

if (rc != X86EMUL_CONTINUE)
return rc;

if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR)
size = offsetof(struct fxregs_state, xmm_space[8 * 16/4]);
else
size = offsetof(struct fxregs_state, xmm_space[0]);

return segmented_write(ctxt, ctxt->memop.addr.mem, &fx_state, size);
}

static int fxrstor_fixup(struct x86_emulate_ctxt *ctxt,
struct fxregs_state *new)
{
int rc = X86EMUL_CONTINUE;
struct fxregs_state old;

rc = asm_safe("fxsave %[fx]", , [fx] "+m"(old));
if (rc != X86EMUL_CONTINUE)
return rc;

/*
* 64 bit host will restore XMM 8-15, which is not correct on non-64
* bit guests. Load the current values in order to preserve 64 bit
* XMMs after fxrstor.
*/
#ifdef CONFIG_X86_64
/* XXX: accessing XMM 8-15 very awkwardly */
memcpy(&new->xmm_space[8 * 16/4], &old.xmm_space[8 * 16/4], 8 * 16);
#endif

/*
* Hardware doesn't save and restore XMM 0-7 without CR4.OSFXSR, but
* does save and restore MXCSR.
*/
if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_OSFXSR))
memcpy(new->xmm_space, old.xmm_space, 8 * 16);

return rc;
}

static int em_fxrstor(struct x86_emulate_ctxt *ctxt)
{
struct fxregs_state fx_state;
int rc;

rc = check_fxsr(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;

rc = segmented_read(ctxt, ctxt->memop.addr.mem, &fx_state, 512);
if (rc != X86EMUL_CONTINUE)
return rc;

if (fx_state.mxcsr >> 16)
return emulate_gp(ctxt, 0);

ctxt->ops->get_fpu(ctxt);

if (ctxt->mode < X86EMUL_MODE_PROT64)
rc = fxrstor_fixup(ctxt, &fx_state);

if (rc == X86EMUL_CONTINUE)
rc = asm_safe("fxrstor %[fx]", : [fx] "m"(fx_state));

ctxt->ops->put_fpu(ctxt);

return rc;
}

static bool valid_cr(int nr)
{
switch (nr) {
|
||||
@ -4194,7 +4346,9 @@ static const struct gprefix pfx_0f_ae_7 = {
|
||||
};
|
||||
|
||||
static const struct group_dual group15 = { {
|
||||
N, N, N, N, N, N, N, GP(0, &pfx_0f_ae_7),
|
||||
I(ModRM | Aligned16, em_fxsave),
|
||||
I(ModRM | Aligned16, em_fxrstor),
|
||||
N, N, N, N, N, GP(0, &pfx_0f_ae_7),
|
||||
}, {
|
||||
N, N, N, N, N, N, N, N,
|
||||
} };
|
||||
@ -5066,21 +5220,13 @@ static bool string_insn_completed(struct x86_emulate_ctxt *ctxt)
|
||||
|
||||
static int flush_pending_x87_faults(struct x86_emulate_ctxt *ctxt)
|
||||
{
|
||||
bool fault = false;
|
||||
int rc;
|
||||
|
||||
ctxt->ops->get_fpu(ctxt);
|
||||
asm volatile("1: fwait \n\t"
|
||||
"2: \n\t"
|
||||
".pushsection .fixup,\"ax\" \n\t"
|
||||
"3: \n\t"
|
||||
"movb $1, %[fault] \n\t"
|
||||
"jmp 2b \n\t"
|
||||
".popsection \n\t"
|
||||
_ASM_EXTABLE(1b, 3b)
|
||||
: [fault]"+qm"(fault));
|
||||
rc = asm_safe("fwait");
|
||||
ctxt->ops->put_fpu(ctxt);
|
||||
|
||||
if (unlikely(fault))
|
||||
if (unlikely(rc != X86EMUL_CONTINUE))
|
||||
return emulate_exception(ctxt, MF_VECTOR, 0, false);
|
||||
|
||||
return X86EMUL_CONTINUE;
|
||||
|
@ -291,7 +291,7 @@ static int synic_get_msr(struct kvm_vcpu_hv_synic *synic, u32 msr, u64 *pdata)
|
||||
return ret;
|
||||
}
|
||||
|
||||
int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
|
||||
static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
|
||||
{
|
||||
struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
|
||||
struct kvm_lapic_irq irq;
|
||||
|
@ -212,7 +212,7 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
|
||||
*/
|
||||
smp_mb();
|
||||
if (atomic_dec_if_positive(&ps->pending) > 0)
|
||||
kthread_queue_work(&pit->worker, &pit->expired);
|
||||
kthread_queue_work(pit->worker, &pit->expired);
|
||||
}
|
||||
|
||||
void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
|
||||
@ -272,7 +272,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
|
||||
if (atomic_read(&ps->reinject))
|
||||
atomic_inc(&ps->pending);
|
||||
|
||||
kthread_queue_work(&pt->worker, &pt->expired);
|
||||
kthread_queue_work(pt->worker, &pt->expired);
|
||||
|
||||
if (ps->is_periodic) {
|
||||
hrtimer_add_expires_ns(&ps->timer, ps->period);
|
||||
@ -667,10 +667,8 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
|
||||
pid_nr = pid_vnr(pid);
|
||||
put_pid(pid);
|
||||
|
||||
kthread_init_worker(&pit->worker);
|
||||
pit->worker_task = kthread_run(kthread_worker_fn, &pit->worker,
|
||||
"kvm-pit/%d", pid_nr);
|
||||
if (IS_ERR(pit->worker_task))
|
||||
pit->worker = kthread_create_worker(0, "kvm-pit/%d", pid_nr);
|
||||
if (IS_ERR(pit->worker))
|
||||
goto fail_kthread;
|
||||
|
||||
kthread_init_work(&pit->expired, pit_do_work);
|
||||
@ -713,7 +711,7 @@ fail_register_speaker:
|
||||
fail_register_pit:
|
||||
mutex_unlock(&kvm->slots_lock);
|
||||
kvm_pit_set_reinject(pit, false);
|
||||
kthread_stop(pit->worker_task);
|
||||
kthread_destroy_worker(pit->worker);
|
||||
fail_kthread:
|
||||
kvm_free_irq_source_id(kvm, pit->irq_source_id);
|
||||
fail_request:
|
||||
@ -730,8 +728,7 @@ void kvm_free_pit(struct kvm *kvm)
|
||||
kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->speaker_dev);
|
||||
kvm_pit_set_reinject(pit, false);
|
||||
hrtimer_cancel(&pit->pit_state.timer);
|
||||
kthread_flush_work(&pit->expired);
|
||||
kthread_stop(pit->worker_task);
|
||||
kthread_destroy_worker(pit->worker);
|
||||
kvm_free_irq_source_id(kvm, pit->irq_source_id);
|
||||
kfree(pit);
|
||||
}
|
||||
|
@ -44,8 +44,7 @@ struct kvm_pit {
|
||||
struct kvm_kpit_state pit_state;
|
||||
int irq_source_id;
|
||||
struct kvm_irq_mask_notifier mask_notifier;
|
||||
struct kthread_worker worker;
|
||||
struct task_struct *worker_task;
|
||||
struct kthread_worker *worker;
|
||||
struct kthread_work expired;
|
||||
};
|
||||
|
||||
|
@ -342,9 +342,11 @@ void __kvm_apic_update_irr(u32 *pir, void *regs)
|
||||
u32 i, pir_val;
|
||||
|
||||
for (i = 0; i <= 7; i++) {
|
||||
pir_val = xchg(&pir[i], 0);
|
||||
if (pir_val)
|
||||
pir_val = READ_ONCE(pir[i]);
|
||||
if (pir_val) {
|
||||
pir_val = xchg(&pir[i], 0);
|
||||
*((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
|
||||
}
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
|
||||
@ -1090,7 +1092,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
|
||||
|
||||
static u32 apic_get_tmcct(struct kvm_lapic *apic)
|
||||
{
|
||||
ktime_t remaining;
|
||||
ktime_t remaining, now;
|
||||
s64 ns;
|
||||
u32 tmcct;
|
||||
|
||||
@ -1101,7 +1103,8 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
|
||||
apic->lapic_timer.period == 0)
|
||||
return 0;
|
||||
|
||||
remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
|
||||
now = ktime_get();
|
||||
remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
|
||||
if (ktime_to_ns(remaining) < 0)
|
||||
remaining = ktime_set(0, 0);
|
||||
|
||||
@ -1332,7 +1335,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
|
||||
|
||||
local_irq_save(flags);
|
||||
|
||||
now = apic->lapic_timer.timer.base->get_time();
|
||||
now = ktime_get();
|
||||
guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
|
||||
if (likely(tscdeadline > guest_tsc)) {
|
||||
ns = (tscdeadline - guest_tsc) * 1000000ULL;
|
||||
@ -1347,6 +1350,79 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
|
||||
local_irq_restore(flags);
|
||||
}
|
||||
|
||||
static void start_sw_period(struct kvm_lapic *apic)
|
||||
{
|
||||
if (!apic->lapic_timer.period)
|
||||
return;
|
||||
|
||||
if (apic_lvtt_oneshot(apic) &&
|
||||
ktime_after(ktime_get(),
|
||||
apic->lapic_timer.target_expiration)) {
|
||||
apic_timer_expired(apic);
|
||||
return;
|
||||
}
|
||||
|
||||
hrtimer_start(&apic->lapic_timer.timer,
|
||||
apic->lapic_timer.target_expiration,
|
||||
HRTIMER_MODE_ABS_PINNED);
|
||||
}
|
||||
|
||||
static bool set_target_expiration(struct kvm_lapic *apic)
|
||||
{
|
||||
ktime_t now;
|
||||
u64 tscl = rdtsc();
|
||||
|
||||
now = ktime_get();
|
||||
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
|
||||
* APIC_BUS_CYCLE_NS * apic->divide_count;
|
||||
|
||||
if (!apic->lapic_timer.period)
|
||||
return false;
|
||||
|
||||
/*
|
||||
* Do not allow the guest to program periodic timers with small
|
||||
* interval, since the hrtimers are not throttled by the host
|
||||
* scheduler.
|
||||
*/
|
||||
if (apic_lvtt_period(apic)) {
|
||||
s64 min_period = min_timer_period_us * 1000LL;
|
||||
|
||||
if (apic->lapic_timer.period < min_period) {
|
||||
pr_info_ratelimited(
|
||||
"kvm: vcpu %i: requested %lld ns "
|
||||
"lapic timer period limited to %lld ns\n",
|
||||
apic->vcpu->vcpu_id,
|
||||
apic->lapic_timer.period, min_period);
|
||||
apic->lapic_timer.period = min_period;
|
||||
}
|
||||
}
|
||||
|
||||
apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
|
||||
PRIx64 ", "
|
||||
"timer initial count 0x%x, period %lldns, "
|
||||
"expire @ 0x%016" PRIx64 ".\n", __func__,
|
||||
APIC_BUS_CYCLE_NS, ktime_to_ns(now),
|
||||
kvm_lapic_get_reg(apic, APIC_TMICT),
|
||||
apic->lapic_timer.period,
|
||||
ktime_to_ns(ktime_add_ns(now,
|
||||
apic->lapic_timer.period)));
|
||||
|
||||
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
|
||||
nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
|
||||
apic->lapic_timer.target_expiration = ktime_add_ns(now, apic->lapic_timer.period);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static void advance_periodic_target_expiration(struct kvm_lapic *apic)
|
||||
{
|
||||
apic->lapic_timer.tscdeadline +=
|
||||
nsec_to_cycles(apic->vcpu, apic->lapic_timer.period);
|
||||
apic->lapic_timer.target_expiration =
|
||||
ktime_add_ns(apic->lapic_timer.target_expiration,
|
||||
apic->lapic_timer.period);
|
||||
}
|
||||
|
||||
bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (!lapic_in_kernel(vcpu))
|
||||
@ -1356,52 +1432,59 @@ bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_lapic_hv_timer_in_use);
|
||||
|
||||
static void cancel_hv_tscdeadline(struct kvm_lapic *apic)
|
||||
static void cancel_hv_timer(struct kvm_lapic *apic)
|
||||
{
|
||||
kvm_x86_ops->cancel_hv_timer(apic->vcpu);
|
||||
apic->lapic_timer.hv_timer_in_use = false;
|
||||
}
|
||||
|
||||
static bool start_hv_timer(struct kvm_lapic *apic)
|
||||
{
|
||||
u64 tscdeadline = apic->lapic_timer.tscdeadline;
|
||||
|
||||
if ((atomic_read(&apic->lapic_timer.pending) &&
|
||||
!apic_lvtt_period(apic)) ||
|
||||
kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
|
||||
if (apic->lapic_timer.hv_timer_in_use)
|
||||
cancel_hv_timer(apic);
|
||||
} else {
|
||||
apic->lapic_timer.hv_timer_in_use = true;
|
||||
hrtimer_cancel(&apic->lapic_timer.timer);
|
||||
|
||||
/* In case the sw timer triggered in the window */
|
||||
if (atomic_read(&apic->lapic_timer.pending) &&
|
||||
!apic_lvtt_period(apic))
|
||||
cancel_hv_timer(apic);
|
||||
}
|
||||
trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
|
||||
apic->lapic_timer.hv_timer_in_use);
|
||||
return apic->lapic_timer.hv_timer_in_use;
|
||||
}
|
||||
|
||||
void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
WARN_ON(!apic->lapic_timer.hv_timer_in_use);
|
||||
WARN_ON(swait_active(&vcpu->wq));
|
||||
cancel_hv_tscdeadline(apic);
|
||||
cancel_hv_timer(apic);
|
||||
apic_timer_expired(apic);
|
||||
|
||||
if (apic_lvtt_period(apic) && apic->lapic_timer.period) {
|
||||
advance_periodic_target_expiration(apic);
|
||||
if (!start_hv_timer(apic))
|
||||
start_sw_period(apic);
|
||||
}
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
|
||||
|
||||
static bool start_hv_tscdeadline(struct kvm_lapic *apic)
|
||||
{
|
||||
u64 tscdeadline = apic->lapic_timer.tscdeadline;
|
||||
|
||||
if (atomic_read(&apic->lapic_timer.pending) ||
|
||||
kvm_x86_ops->set_hv_timer(apic->vcpu, tscdeadline)) {
|
||||
if (apic->lapic_timer.hv_timer_in_use)
|
||||
cancel_hv_tscdeadline(apic);
|
||||
} else {
|
||||
apic->lapic_timer.hv_timer_in_use = true;
|
||||
hrtimer_cancel(&apic->lapic_timer.timer);
|
||||
|
||||
/* In case the sw timer triggered in the window */
|
||||
if (atomic_read(&apic->lapic_timer.pending))
|
||||
cancel_hv_tscdeadline(apic);
|
||||
}
|
||||
trace_kvm_hv_timer_state(apic->vcpu->vcpu_id,
|
||||
apic->lapic_timer.hv_timer_in_use);
|
||||
return apic->lapic_timer.hv_timer_in_use;
|
||||
}
|
||||
|
||||
void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
WARN_ON(apic->lapic_timer.hv_timer_in_use);
|
||||
|
||||
if (apic_lvtt_tscdeadline(apic))
|
||||
start_hv_tscdeadline(apic);
|
||||
start_hv_timer(apic);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_hv_timer);
|
||||
|
||||
@ -1413,62 +1496,28 @@ void kvm_lapic_switch_to_sw_timer(struct kvm_vcpu *vcpu)
|
||||
if (!apic->lapic_timer.hv_timer_in_use)
|
||||
return;
|
||||
|
||||
cancel_hv_tscdeadline(apic);
|
||||
cancel_hv_timer(apic);
|
||||
|
||||
if (atomic_read(&apic->lapic_timer.pending))
|
||||
return;
|
||||
|
||||
start_sw_tscdeadline(apic);
|
||||
if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic))
|
||||
start_sw_period(apic);
|
||||
else if (apic_lvtt_tscdeadline(apic))
|
||||
start_sw_tscdeadline(apic);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_lapic_switch_to_sw_timer);
|
||||
|
||||
static void start_apic_timer(struct kvm_lapic *apic)
|
||||
{
|
||||
ktime_t now;
|
||||
|
||||
atomic_set(&apic->lapic_timer.pending, 0);
|
||||
|
||||
if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
|
||||
/* lapic timer in oneshot or periodic mode */
|
||||
now = apic->lapic_timer.timer.base->get_time();
|
||||
apic->lapic_timer.period = (u64)kvm_lapic_get_reg(apic, APIC_TMICT)
|
||||
* APIC_BUS_CYCLE_NS * apic->divide_count;
|
||||
|
||||
if (!apic->lapic_timer.period)
|
||||
return;
|
||||
/*
|
||||
* Do not allow the guest to program periodic timers with small
|
||||
* interval, since the hrtimers are not throttled by the host
|
||||
* scheduler.
|
||||
*/
|
||||
if (apic_lvtt_period(apic)) {
|
||||
s64 min_period = min_timer_period_us * 1000LL;
|
||||
|
||||
if (apic->lapic_timer.period < min_period) {
|
||||
pr_info_ratelimited(
|
||||
"kvm: vcpu %i: requested %lld ns "
|
||||
"lapic timer period limited to %lld ns\n",
|
||||
apic->vcpu->vcpu_id,
|
||||
apic->lapic_timer.period, min_period);
|
||||
apic->lapic_timer.period = min_period;
|
||||
}
|
||||
}
|
||||
|
||||
hrtimer_start(&apic->lapic_timer.timer,
|
||||
ktime_add_ns(now, apic->lapic_timer.period),
|
||||
HRTIMER_MODE_ABS_PINNED);
|
||||
|
||||
apic_debug("%s: bus cycle is %" PRId64 "ns, now 0x%016"
|
||||
PRIx64 ", "
|
||||
"timer initial count 0x%x, period %lldns, "
|
||||
"expire @ 0x%016" PRIx64 ".\n", __func__,
|
||||
APIC_BUS_CYCLE_NS, ktime_to_ns(now),
|
||||
kvm_lapic_get_reg(apic, APIC_TMICT),
|
||||
apic->lapic_timer.period,
|
||||
ktime_to_ns(ktime_add_ns(now,
|
||||
apic->lapic_timer.period)));
|
||||
if (set_target_expiration(apic) &&
|
||||
!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
|
||||
start_sw_period(apic);
|
||||
} else if (apic_lvtt_tscdeadline(apic)) {
|
||||
if (!(kvm_x86_ops->set_hv_timer && start_hv_tscdeadline(apic)))
|
||||
if (!(kvm_x86_ops->set_hv_timer && start_hv_timer(apic)))
|
||||
start_sw_tscdeadline(apic);
|
||||
}
|
||||
}
|
||||
@ -1701,13 +1750,22 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu)
|
||||
* LAPIC interface
|
||||
*----------------------------------------------------------------------
|
||||
*/
|
||||
u64 kvm_get_lapic_target_expiration_tsc(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
if (!lapic_in_kernel(vcpu))
|
||||
return 0;
|
||||
|
||||
return apic->lapic_timer.tscdeadline;
|
||||
}
|
||||
|
||||
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
if (!lapic_in_kernel(vcpu) || apic_lvtt_oneshot(apic) ||
|
||||
apic_lvtt_period(apic))
|
||||
if (!lapic_in_kernel(vcpu) ||
|
||||
!apic_lvtt_tscdeadline(apic))
|
||||
return 0;
|
||||
|
||||
return apic->lapic_timer.tscdeadline;
|
||||
@ -1748,14 +1806,17 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
|
||||
u64 old_value = vcpu->arch.apic_base;
|
||||
struct kvm_lapic *apic = vcpu->arch.apic;
|
||||
|
||||
if (!apic) {
|
||||
if (!apic)
|
||||
value |= MSR_IA32_APICBASE_BSP;
|
||||
vcpu->arch.apic_base = value;
|
||||
return;
|
||||
}
|
||||
|
||||
vcpu->arch.apic_base = value;
|
||||
|
||||
if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE)
|
||||
kvm_update_cpuid(vcpu);
|
||||
|
||||
if (!apic)
|
||||
return;
|
||||
|
||||
/* update jump label if enable bit changes */
|
||||
if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
|
||||
if (value & MSR_IA32_APICBASE_ENABLE) {
|
||||
@ -1909,6 +1970,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
|
||||
apic_timer_expired(apic);
|
||||
|
||||
if (lapic_is_periodic(apic)) {
|
||||
advance_periodic_target_expiration(apic);
|
||||
hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
|
||||
return HRTIMER_RESTART;
|
||||
} else
|
||||
@ -1993,6 +2055,10 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
|
||||
kvm_apic_local_deliver(apic, APIC_LVTT);
|
||||
if (apic_lvtt_tscdeadline(apic))
|
||||
apic->lapic_timer.tscdeadline = 0;
|
||||
if (apic_lvtt_oneshot(apic)) {
|
||||
apic->lapic_timer.tscdeadline = 0;
|
||||
apic->lapic_timer.target_expiration = ktime_set(0, 0);
|
||||
}
|
||||
atomic_set(&apic->lapic_timer.pending, 0);
|
||||
}
|
||||
}
|
||||
|
@ -15,6 +15,7 @@
|
||||
struct kvm_timer {
|
||||
struct hrtimer timer;
|
||||
s64 period; /* unit: ns */
|
||||
ktime_t target_expiration;
|
||||
u32 timer_mode;
|
||||
u32 timer_mode_mask;
|
||||
u64 tscdeadline;
|
||||
@ -85,6 +86,7 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
|
||||
int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
|
||||
int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
|
||||
|
||||
u64 kvm_get_lapic_target_expiration_tsc(struct kvm_vcpu *vcpu);
|
||||
u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
|
||||
void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
|
||||
|
||||
|
@ -1660,17 +1660,9 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
|
||||
* This has some overhead, but not as much as the cost of swapping
|
||||
* out actively used pages or breaking up actively used hugepages.
|
||||
*/
|
||||
if (!shadow_accessed_mask) {
|
||||
/*
|
||||
* We are holding the kvm->mmu_lock, and we are blowing up
|
||||
* shadow PTEs. MMU notifier consumers need to be kept at bay.
|
||||
* This is correct as long as we don't decouple the mmu_lock
|
||||
* protected regions (like invalidate_range_start|end does).
|
||||
*/
|
||||
kvm->mmu_notifier_seq++;
|
||||
if (!shadow_accessed_mask)
|
||||
return kvm_handle_hva_range(kvm, start, end, 0,
|
||||
kvm_unmap_rmapp);
|
||||
}
|
||||
|
||||
return kvm_handle_hva_range(kvm, start, end, 0, kvm_age_rmapp);
|
||||
}
|
||||
@ -4509,7 +4501,7 @@ static void make_mmu_pages_available(struct kvm_vcpu *vcpu)
|
||||
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
|
||||
}
|
||||
|
||||
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
|
||||
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
|
||||
void *insn, int insn_len)
|
||||
{
|
||||
int r, emulation_type = EMULTYPE_RETRY;
|
||||
@ -4528,12 +4520,28 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
|
||||
return r;
|
||||
}
|
||||
|
||||
r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
|
||||
r = vcpu->arch.mmu.page_fault(vcpu, cr2, lower_32_bits(error_code),
|
||||
false);
|
||||
if (r < 0)
|
||||
return r;
|
||||
if (!r)
|
||||
return 1;
|
||||
|
||||
/*
|
||||
* Before emulating the instruction, check if the error code
|
||||
* was due to a RO violation while translating the guest page.
|
||||
* This can occur when using nested virtualization with nested
|
||||
* paging in both guests. If true, we simply unprotect the page
|
||||
* and resume the guest.
|
||||
*
|
||||
* Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
|
||||
* in PFERR_NEXT_GUEST_PAGE)
|
||||
*/
|
||||
if (error_code == PFERR_NESTED_GUEST_PAGE) {
|
||||
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
|
||||
return 1;
|
||||
}
|
||||
|
||||
if (mmio_info_in_cache(vcpu, cr2, direct))
|
||||
emulation_type = 0;
|
||||
emulate:
|
||||
@ -4967,7 +4975,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, struct kvm_memslots *slots)
|
||||
* zap all shadow pages.
|
||||
*/
|
||||
if (unlikely((slots->generation & MMIO_GEN_MASK) == 0)) {
|
||||
printk_ratelimited(KERN_DEBUG "kvm: zapping shadow pages for mmio generation wraparound\n");
|
||||
kvm_debug_ratelimited("kvm: zapping shadow pages for mmio generation wraparound\n");
|
||||
kvm_mmu_invalidate_zap_all_pages(kvm);
|
||||
}
|
||||
}
|
||||
|
@ -2074,7 +2074,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
|
||||
static int pf_interception(struct vcpu_svm *svm)
|
||||
{
|
||||
u64 fault_address = svm->vmcb->control.exit_info_2;
|
||||
u32 error_code;
|
||||
u64 error_code;
|
||||
int r = 1;
|
||||
|
||||
switch (svm->apf_reason) {
|
||||
@ -2270,7 +2270,7 @@ static int io_interception(struct vcpu_svm *svm)
|
||||
++svm->vcpu.stat.io_exits;
|
||||
string = (io_info & SVM_IOIO_STR_MASK) != 0;
|
||||
in = (io_info & SVM_IOIO_TYPE_MASK) != 0;
|
||||
if (string || in)
|
||||
if (string)
|
||||
return emulate_instruction(vcpu, 0) == EMULATE_DONE;
|
||||
|
||||
port = io_info >> 16;
|
||||
@ -2278,7 +2278,8 @@ static int io_interception(struct vcpu_svm *svm)
|
||||
svm->next_rip = svm->vmcb->control.exit_info_2;
|
||||
skip_emulated_instruction(&svm->vcpu);
|
||||
|
||||
return kvm_fast_pio_out(vcpu, size, port);
|
||||
return in ? kvm_fast_pio_in(vcpu, size, port)
|
||||
: kvm_fast_pio_out(vcpu, size, port);
|
||||
}
|
||||
|
||||
static int nmi_interception(struct vcpu_svm *svm)
|
||||
@ -3150,8 +3151,7 @@ static int skinit_interception(struct vcpu_svm *svm)
|
||||
|
||||
static int wbinvd_interception(struct vcpu_svm *svm)
|
||||
{
|
||||
kvm_emulate_wbinvd(&svm->vcpu);
|
||||
return 1;
|
||||
return kvm_emulate_wbinvd(&svm->vcpu);
|
||||
}
|
||||
|
||||
static int xsetbv_interception(struct vcpu_svm *svm)
|
||||
@ -3238,8 +3238,7 @@ static int task_switch_interception(struct vcpu_svm *svm)
|
||||
static int cpuid_interception(struct vcpu_svm *svm)
|
||||
{
|
||||
svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
|
||||
kvm_emulate_cpuid(&svm->vcpu);
|
||||
return 1;
|
||||
return kvm_emulate_cpuid(&svm->vcpu);
|
||||
}
|
||||
|
||||
static int iret_interception(struct vcpu_svm *svm)
|
||||
@ -3275,9 +3274,7 @@ static int rdpmc_interception(struct vcpu_svm *svm)
|
||||
return emulate_on_interception(svm);
|
||||
|
||||
err = kvm_rdpmc(&svm->vcpu);
|
||||
kvm_complete_insn_gp(&svm->vcpu, err);
|
||||
|
||||
return 1;
|
||||
return kvm_complete_insn_gp(&svm->vcpu, err);
|
||||
}
|
||||
|
||||
static bool check_selective_cr0_intercepted(struct vcpu_svm *svm,
|
||||
@ -3374,9 +3371,7 @@ static int cr_interception(struct vcpu_svm *svm)
|
||||
}
|
||||
kvm_register_write(&svm->vcpu, reg, val);
|
||||
}
|
||||
kvm_complete_insn_gp(&svm->vcpu, err);
|
||||
|
||||
return 1;
|
||||
return kvm_complete_insn_gp(&svm->vcpu, err);
|
||||
}
|
||||
|
||||
static int dr_interception(struct vcpu_svm *svm)
|
||||
|
arch/x86/kvm/vmx.c (1098 changes): file diff suppressed because it is too large.
@ -434,12 +434,14 @@ void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_requeue_exception);
|
||||
|
||||
void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
|
||||
int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
|
||||
{
|
||||
if (err)
|
||||
kvm_inject_gp(vcpu, 0);
|
||||
else
|
||||
kvm_x86_ops->skip_emulated_instruction(vcpu);
|
||||
return kvm_skip_emulated_instruction(vcpu);
|
||||
|
||||
return 1;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
|
||||
|
||||
@ -573,7 +575,7 @@ out:
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(load_pdptrs);
|
||||
|
||||
static bool pdptrs_changed(struct kvm_vcpu *vcpu)
|
||||
bool pdptrs_changed(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
u64 pdpte[ARRAY_SIZE(vcpu->arch.walk_mmu->pdptrs)];
|
||||
bool changed = true;
|
||||
@ -599,6 +601,7 @@ out:
|
||||
|
||||
return changed;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(pdptrs_changed);
|
||||
|
||||
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
|
||||
{
|
||||
@ -2178,7 +2181,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
break;
|
||||
case MSR_KVM_SYSTEM_TIME_NEW:
|
||||
case MSR_KVM_SYSTEM_TIME: {
|
||||
u64 gpa_offset;
|
||||
struct kvm_arch *ka = &vcpu->kvm->arch;
|
||||
|
||||
kvmclock_reset(vcpu);
|
||||
@ -2200,8 +2202,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
if (!(data & 1))
|
||||
break;
|
||||
|
||||
gpa_offset = data & ~(PAGE_MASK | 1);
|
||||
|
||||
if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
|
||||
&vcpu->arch.pv_time, data & ~1ULL,
|
||||
sizeof(struct pvclock_vcpu_time_info)))
|
||||
@ -2296,7 +2296,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
if (kvm_pmu_is_valid_msr(vcpu, msr))
|
||||
return kvm_pmu_set_msr(vcpu, msr_info);
|
||||
if (!ignore_msrs) {
|
||||
vcpu_unimpl(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
|
||||
vcpu_debug_ratelimited(vcpu, "unhandled wrmsr: 0x%x data 0x%llx\n",
|
||||
msr, data);
|
||||
return 1;
|
||||
} else {
|
||||
@ -2508,7 +2508,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
|
||||
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
|
||||
return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
|
||||
if (!ignore_msrs) {
|
||||
vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr_info->index);
|
||||
vcpu_debug_ratelimited(vcpu, "unhandled rdmsr: 0x%x\n",
|
||||
msr_info->index);
|
||||
return 1;
|
||||
} else {
|
||||
vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr_info->index);
|
||||
@ -2812,7 +2813,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
|
||||
}
|
||||
if (kvm_lapic_hv_timer_in_use(vcpu) &&
|
||||
kvm_x86_ops->set_hv_timer(vcpu,
|
||||
kvm_get_lapic_tscdeadline_msr(vcpu)))
|
||||
kvm_get_lapic_target_expiration_tsc(vcpu)))
|
||||
kvm_lapic_switch_to_sw_timer(vcpu);
|
||||
/*
|
||||
* On a host with synchronized TSC, there is no need to update
|
||||
@ -4832,7 +4833,7 @@ static void emulator_invlpg(struct x86_emulate_ctxt *ctxt, ulong address)
|
||||
kvm_mmu_invlpg(emul_to_vcpu(ctxt), address);
|
||||
}
|
||||
|
||||
int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
|
||||
static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
if (!need_emulate_wbinvd(vcpu))
|
||||
return X86EMUL_CONTINUE;
|
||||
@ -4852,8 +4853,8 @@ int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
|
||||
|
||||
int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
kvm_x86_ops->skip_emulated_instruction(vcpu);
|
||||
return kvm_emulate_wbinvd_noskip(vcpu);
|
||||
kvm_emulate_wbinvd_noskip(vcpu);
|
||||
return kvm_skip_emulated_instruction(vcpu);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd);
|
||||
|
||||
@ -5451,7 +5452,6 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflag
|
||||
kvm_run->exit_reason = KVM_EXIT_DEBUG;
|
||||
*r = EMULATE_USER_EXIT;
|
||||
} else {
|
||||
vcpu->arch.emulate_ctxt.eflags &= ~X86_EFLAGS_TF;
|
||||
/*
|
||||
* "Certain debug exceptions may clear bit 0-3. The
|
||||
* remaining contents of the DR6 register are never
|
||||
@ -5464,6 +5464,17 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflag
|
||||
}
|
||||
}
|
||||
|
||||
int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
|
||||
int r = EMULATE_DONE;
|
||||
|
||||
kvm_x86_ops->skip_emulated_instruction(vcpu);
|
||||
kvm_vcpu_check_singlestep(vcpu, rflags, &r);
|
||||
return r == EMULATE_DONE;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
|
||||
|
||||
static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r)
|
||||
{
|
||||
if (unlikely(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) &&
|
||||
@ -5649,6 +5660,49 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_fast_pio_out);
|
||||
|
||||
static int complete_fast_pio_in(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long val;
|
||||
|
||||
/* We should only ever be called with arch.pio.count equal to 1 */
|
||||
BUG_ON(vcpu->arch.pio.count != 1);
|
||||
|
||||
/* For size less than 4 we merge, else we zero extend */
|
||||
val = (vcpu->arch.pio.size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX)
|
||||
: 0;
|
||||
|
||||
/*
|
||||
* Since vcpu->arch.pio.count == 1 let emulator_pio_in_emulated perform
|
||||
* the copy and tracing
|
||||
*/
|
||||
emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, vcpu->arch.pio.size,
|
||||
vcpu->arch.pio.port, &val, 1);
|
||||
kvm_register_write(vcpu, VCPU_REGS_RAX, val);
|
||||
|
||||
return 1;
|
||||
}
|
||||
|
||||
int kvm_fast_pio_in(struct kvm_vcpu *vcpu, int size, unsigned short port)
|
||||
{
|
||||
unsigned long val;
|
||||
int ret;
|
||||
|
||||
/* For size less than 4 we merge, else we zero extend */
|
||||
val = (size < 4) ? kvm_register_read(vcpu, VCPU_REGS_RAX) : 0;
|
||||
|
||||
ret = emulator_pio_in_emulated(&vcpu->arch.emulate_ctxt, size, port,
|
||||
&val, 1);
|
||||
if (ret) {
|
||||
kvm_register_write(vcpu, VCPU_REGS_RAX, val);
|
||||
return ret;
|
||||
}
|
||||
|
||||
vcpu->arch.complete_userspace_io = complete_fast_pio_in;
|
||||
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_fast_pio_in);
|
||||
|
||||
static int kvmclock_cpu_down_prep(unsigned int cpu)
|
||||
{
|
||||
__this_cpu_write(cpu_tsc_khz, 0);
|
||||
@ -5998,8 +6052,12 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_halt);
|
||||
|
||||
int kvm_emulate_halt(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
kvm_x86_ops->skip_emulated_instruction(vcpu);
|
||||
return kvm_vcpu_halt(vcpu);
|
||||
int ret = kvm_skip_emulated_instruction(vcpu);
|
||||
/*
|
||||
* TODO: we might be squashing a GUESTDBG_SINGLESTEP-triggered
|
||||
* KVM_EXIT_DEBUG here.
|
||||
*/
|
||||
return kvm_vcpu_halt(vcpu) && ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(kvm_emulate_halt);
|
||||
|
||||
@ -6030,9 +6088,9 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
|
||||
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
unsigned long nr, a0, a1, a2, a3, ret;
|
||||
int op_64_bit, r = 1;
|
||||
int op_64_bit, r;
|
||||
|
||||
kvm_x86_ops->skip_emulated_instruction(vcpu);
|
||||
r = kvm_skip_emulated_instruction(vcpu);
|
||||
|
||||
if (kvm_hv_hypercall_enabled(vcpu->kvm))
|
||||
return kvm_hv_hypercall(vcpu);
|
||||
|
@ -295,10 +295,10 @@
|
||||
#define GITS_BASER_InnerShareable \
|
||||
GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
|
||||
#define GITS_BASER_PAGE_SIZE_SHIFT (8)
|
||||
#define GITS_BASER_PAGE_SIZE_4K (0UL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_16K (1UL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_64K (2UL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_MASK (3UL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_4K (0ULL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_16K (1ULL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_64K (2ULL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGE_SIZE_MASK (3ULL << GITS_BASER_PAGE_SIZE_SHIFT)
|
||||
#define GITS_BASER_PAGES_MAX 256
|
||||
#define GITS_BASER_PAGES_SHIFT (0)
|
||||
#define GITS_BASER_NR_PAGES(r) (((r) & 0xff) + 1)
|
||||
|
@ -438,6 +438,9 @@ struct kvm {
|
||||
pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
|
||||
#define kvm_debug(fmt, ...) \
|
||||
pr_debug("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
|
||||
#define kvm_debug_ratelimited(fmt, ...) \
|
||||
pr_debug_ratelimited("kvm [%i]: " fmt, task_pid_nr(current), \
|
||||
## __VA_ARGS__)
|
||||
#define kvm_pr_unimpl(fmt, ...) \
|
||||
pr_err_ratelimited("kvm [%i]: " fmt, \
|
||||
task_tgid_nr(current), ## __VA_ARGS__)
|
||||
@ -449,6 +452,9 @@ struct kvm {
|
||||
|
||||
#define vcpu_debug(vcpu, fmt, ...) \
|
||||
kvm_debug("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
|
||||
#define vcpu_debug_ratelimited(vcpu, fmt, ...) \
|
||||
kvm_debug_ratelimited("vcpu%i " fmt, (vcpu)->vcpu_id, \
|
||||
## __VA_ARGS__)
|
||||
#define vcpu_err(vcpu, fmt, ...) \
|
||||
kvm_err("vcpu%i " fmt, (vcpu)->vcpu_id, ## __VA_ARGS__)
|
||||
|
||||
@ -1108,6 +1114,10 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu)
|
||||
|
||||
extern bool kvm_rebooting;
|
||||
|
||||
extern unsigned int halt_poll_ns;
|
||||
extern unsigned int halt_poll_ns_grow;
|
||||
extern unsigned int halt_poll_ns_shrink;
|
||||
|
||||
struct kvm_device {
|
||||
struct kvm_device_ops *ops;
|
||||
struct kvm *kvm;
|
||||
|
@ -651,6 +651,9 @@ struct kvm_enable_cap {
|
||||
};
|
||||
|
||||
/* for KVM_PPC_GET_PVINFO */
|
||||
|
||||
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)
|
||||
|
||||
struct kvm_ppc_pvinfo {
|
||||
/* out */
|
||||
__u32 flags;
|
||||
@ -682,8 +685,6 @@ struct kvm_ppc_smmu_info {
|
||||
struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
|
||||
};
|
||||
|
||||
#define KVM_PPC_PVINFO_FLAGS_EV_IDLE (1<<0)
|
||||
|
||||
#define KVMIO 0xAE
|
||||
|
||||
/* machine type bits, to be used as argument to KVM_CREATE_VM */
|
||||
|
@ -425,6 +425,11 @@ int kvm_timer_hyp_init(void)
|
||||
info = arch_timer_get_kvm_info();
|
||||
timecounter = &info->timecounter;
|
||||
|
||||
if (!timecounter->cc) {
|
||||
kvm_err("kvm_arch_timer: uninitialized timecounter\n");
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
if (info->virtual_irq <= 0) {
|
||||
kvm_err("kvm_arch_timer: invalid virtual timer IRQ: %d\n",
|
||||
info->virtual_irq);
|
||||
@ -498,17 +503,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
|
||||
/*
|
||||
* There is a potential race here between VCPUs starting for the first
|
||||
* time, which may be enabling the timer multiple times. That doesn't
|
||||
* hurt though, because we're just setting a variable to the same
|
||||
* variable that it already was. The important thing is that all
|
||||
* VCPUs have the enabled variable set, before entering the guest, if
|
||||
* the arch timers are enabled.
|
||||
*/
|
||||
if (timecounter)
|
||||
timer->enabled = 1;
|
||||
timer->enabled = 1;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -632,21 +632,22 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, int id)
|
||||
int index;
|
||||
u64 indirect_ptr;
|
||||
gfn_t gfn;
|
||||
int esz = GITS_BASER_ENTRY_SIZE(baser);
|
||||
|
||||
if (!(baser & GITS_BASER_INDIRECT)) {
|
||||
phys_addr_t addr;
|
||||
|
||||
if (id >= (l1_tbl_size / GITS_BASER_ENTRY_SIZE(baser)))
|
||||
if (id >= (l1_tbl_size / esz))
|
||||
return false;
|
||||
|
||||
addr = BASER_ADDRESS(baser) + id * GITS_BASER_ENTRY_SIZE(baser);
|
||||
addr = BASER_ADDRESS(baser) + id * esz;
|
||||
gfn = addr >> PAGE_SHIFT;
|
||||
|
||||
return kvm_is_visible_gfn(its->dev->kvm, gfn);
|
||||
}
|
||||
|
||||
/* calculate and check the index into the 1st level */
|
||||
index = id / (SZ_64K / GITS_BASER_ENTRY_SIZE(baser));
|
||||
index = id / (SZ_64K / esz);
|
||||
if (index >= (l1_tbl_size / sizeof(u64)))
|
||||
return false;
|
||||
|
||||
@ -670,8 +671,8 @@ static bool vgic_its_check_id(struct vgic_its *its, u64 baser, int id)
|
||||
indirect_ptr &= GENMASK_ULL(51, 16);
|
||||
|
||||
/* Find the address of the actual entry */
|
||||
index = id % (SZ_64K / GITS_BASER_ENTRY_SIZE(baser));
|
||||
indirect_ptr += index * GITS_BASER_ENTRY_SIZE(baser);
|
||||
index = id % (SZ_64K / esz);
|
||||
indirect_ptr += index * esz;
|
||||
gfn = indirect_ptr >> PAGE_SHIFT;
|
||||
|
||||
return kvm_is_visible_gfn(its->dev->kvm, gfn);
|
||||
|
@ -221,11 +221,9 @@ int kvm_register_vgic_device(unsigned long type)
|
||||
ret = kvm_register_device_ops(&kvm_arm_vgic_v3_ops,
|
||||
KVM_DEV_TYPE_ARM_VGIC_V3);
|
||||
|
||||
#ifdef CONFIG_KVM_ARM_VGIC_V3_ITS
|
||||
if (ret)
|
||||
break;
|
||||
ret = kvm_vgic_register_its_device();
|
||||
#endif
|
||||
break;
|
||||
}
|
||||
|
||||
|
@ -129,6 +129,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
|
||||
unsigned long val)
|
||||
{
|
||||
u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
|
||||
u8 cpu_mask = GENMASK(atomic_read(&vcpu->kvm->online_vcpus) - 1, 0);
|
||||
int i;
|
||||
|
||||
/* GICD_ITARGETSR[0-7] are read-only */
|
||||
@ -141,7 +142,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
|
||||
|
||||
spin_lock(&irq->irq_lock);
|
||||
|
||||
irq->targets = (val >> (i * 8)) & 0xff;
|
||||
irq->targets = (val >> (i * 8)) & cpu_mask;
|
||||
target = irq->targets ? __ffs(irq->targets) : 0;
|
||||
irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
|
||||
|
||||
|
@ -42,7 +42,6 @@ u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
|
||||
return reg | ((u64)val << lower);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_KVM_ARM_VGIC_V3_ITS
|
||||
bool vgic_has_its(struct kvm *kvm)
|
||||
{
|
||||
struct vgic_dist *dist = &kvm->arch.vgic;
|
||||
@ -52,7 +51,6 @@ bool vgic_has_its(struct kvm *kvm)
|
||||
|
||||
return dist->has_its;
|
||||
}
|
||||
#endif
|
||||
|
||||
static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
|
||||
gpa_t addr, unsigned int len)
|
||||
|
@ -84,37 +84,11 @@ int vgic_v3_probe(const struct gic_kvm_info *info);
|
||||
int vgic_v3_map_resources(struct kvm *kvm);
|
||||
int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
|
||||
|
||||
#ifdef CONFIG_KVM_ARM_VGIC_V3_ITS
|
||||
int vgic_register_its_iodevs(struct kvm *kvm);
|
||||
bool vgic_has_its(struct kvm *kvm);
|
||||
int kvm_vgic_register_its_device(void);
|
||||
void vgic_enable_lpis(struct kvm_vcpu *vcpu);
|
||||
int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
|
||||
#else
|
||||
static inline int vgic_register_its_iodevs(struct kvm *kvm)
|
||||
{
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
static inline bool vgic_has_its(struct kvm *kvm)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
static inline int kvm_vgic_register_its_device(void)
|
||||
{
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
static inline void vgic_enable_lpis(struct kvm_vcpu *vcpu)
|
||||
{
|
||||
}
|
||||
|
||||
static inline int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
|
||||
{
|
||||
return -ENODEV;
|
||||
}
|
||||
#endif
|
||||
|
||||
int kvm_register_vgic_device(unsigned long type);
|
||||
int vgic_lazy_init(struct kvm *kvm);
|
||||
|
@ -70,16 +70,19 @@ MODULE_AUTHOR("Qumranet");
|
||||
MODULE_LICENSE("GPL");
|
||||
|
||||
/* Architectures should define their poll value according to the halt latency */
|
||||
static unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
|
||||
unsigned int halt_poll_ns = KVM_HALT_POLL_NS_DEFAULT;
|
||||
module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
|
||||
EXPORT_SYMBOL_GPL(halt_poll_ns);
|
||||
|
||||
/* Default doubles per-vcpu halt_poll_ns. */
|
||||
static unsigned int halt_poll_ns_grow = 2;
|
||||
unsigned int halt_poll_ns_grow = 2;
|
||||
module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
|
||||
EXPORT_SYMBOL_GPL(halt_poll_ns_grow);
|
||||
|
||||
/* Default resets per-vcpu halt_poll_ns . */
|
||||
static unsigned int halt_poll_ns_shrink;
|
||||
unsigned int halt_poll_ns_shrink;
|
||||
module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
|
||||
EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
|
||||
|
||||
/*
|
||||
* Ordering of locks:
|
||||
@ -595,7 +598,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
|
||||
stat_data->kvm = kvm;
|
||||
stat_data->offset = p->offset;
|
||||
kvm->debugfs_stat_data[p - debugfs_entries] = stat_data;
|
||||
if (!debugfs_create_file(p->name, 0444,
|
||||
if (!debugfs_create_file(p->name, 0644,
|
||||
kvm->debugfs_dentry,
|
||||
stat_data,
|
||||
stat_fops_per_vm[p->kind]))
|
||||
@ -3669,11 +3672,23 @@ static int vm_stat_get_per_vm(void *data, u64 *val)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vm_stat_clear_per_vm(void *data, u64 val)
|
||||
{
|
||||
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
|
||||
|
||||
if (val)
|
||||
return -EINVAL;
|
||||
|
||||
*(ulong *)((void *)stat_data->kvm + stat_data->offset) = 0;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vm_stat_get_per_vm_open(struct inode *inode, struct file *file)
|
||||
{
|
||||
__simple_attr_check_format("%llu\n", 0ull);
|
||||
return kvm_debugfs_open(inode, file, vm_stat_get_per_vm,
|
||||
NULL, "%llu\n");
|
||||
vm_stat_clear_per_vm, "%llu\n");
|
||||
}
|
||||
|
||||
static const struct file_operations vm_stat_get_per_vm_fops = {
|
||||
@ -3699,11 +3714,26 @@ static int vcpu_stat_get_per_vm(void *data, u64 *val)
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vcpu_stat_clear_per_vm(void *data, u64 val)
|
||||
{
|
||||
int i;
|
||||
struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
|
||||
struct kvm_vcpu *vcpu;
|
||||
|
||||
if (val)
|
||||
return -EINVAL;
|
||||
|
||||
kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
|
||||
*(u64 *)((void *)vcpu + stat_data->offset) = 0;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static int vcpu_stat_get_per_vm_open(struct inode *inode, struct file *file)
|
||||
{
|
||||
__simple_attr_check_format("%llu\n", 0ull);
|
||||
return kvm_debugfs_open(inode, file, vcpu_stat_get_per_vm,
|
||||
NULL, "%llu\n");
|
||||
vcpu_stat_clear_per_vm, "%llu\n");
|
||||
}
|
||||
|
||||
static const struct file_operations vcpu_stat_get_per_vm_fops = {
|
||||
@ -3738,7 +3768,26 @@ static int vm_stat_get(void *_offset, u64 *val)
|
||||
return 0;
|
||||
}
|
||||
|
||||
DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, NULL, "%llu\n");
|
||||
static int vm_stat_clear(void *_offset, u64 val)
|
||||
{
|
||||
unsigned offset = (long)_offset;
|
||||
struct kvm *kvm;
|
||||
struct kvm_stat_data stat_tmp = {.offset = offset};
|
||||
|
||||
if (val)
|
||||
return -EINVAL;
|
||||
|
||||
spin_lock(&kvm_lock);
|
||||
list_for_each_entry(kvm, &vm_list, vm_list) {
|
||||
stat_tmp.kvm = kvm;
|
||||
vm_stat_clear_per_vm((void *)&stat_tmp, 0);
|
||||
}
|
||||
spin_unlock(&kvm_lock);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
DEFINE_SIMPLE_ATTRIBUTE(vm_stat_fops, vm_stat_get, vm_stat_clear, "%llu\n");
|
||||
|
||||
static int vcpu_stat_get(void *_offset, u64 *val)
|
||||
{
|
||||
@ -3758,7 +3807,27 @@ static int vcpu_stat_get(void *_offset, u64 *val)
|
||||
return 0;
|
||||
}
|
||||
|
||||
DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, NULL, "%llu\n");
|
||||
static int vcpu_stat_clear(void *_offset, u64 val)
|
||||
{
|
||||
unsigned offset = (long)_offset;
|
||||
struct kvm *kvm;
|
||||
struct kvm_stat_data stat_tmp = {.offset = offset};
|
||||
|
||||
if (val)
|
||||
return -EINVAL;
|
||||
|
||||
spin_lock(&kvm_lock);
|
||||
list_for_each_entry(kvm, &vm_list, vm_list) {
|
||||
stat_tmp.kvm = kvm;
|
||||
vcpu_stat_clear_per_vm((void *)&stat_tmp, 0);
|
||||
}
|
||||
spin_unlock(&kvm_lock);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
DEFINE_SIMPLE_ATTRIBUTE(vcpu_stat_fops, vcpu_stat_get, vcpu_stat_clear,
|
||||
"%llu\n");
|
||||
|
||||
static const struct file_operations *stat_fops[] = {
|
||||
[KVM_STAT_VCPU] = &vcpu_stat_fops,
|
||||
@ -3776,7 +3845,7 @@ static int kvm_init_debug(void)
|
||||
|
||||
kvm_debugfs_num_entries = 0;
|
||||
for (p = debugfs_entries; p->name; ++p, kvm_debugfs_num_entries++) {
|
||||
if (!debugfs_create_file(p->name, 0444, kvm_debugfs_dir,
|
||||
if (!debugfs_create_file(p->name, 0644, kvm_debugfs_dir,
|
||||
(void *)(long)p->offset,
|
||||
stat_fops[p->kind]))
|
||||
goto out_dir;
|
||||
|