Commit Graph

1335 Commits

Author SHA1 Message Date
Marc Zyngier
cb5faa8c7d Merge branch kvm-arm64/pmu-fixes into kvmarm-master/next
Fixes for the PMUv3 emulation of PMCR_EL0:
- Don't spuriously reset the cycle counter when resetting other counters
- Force PMCR_EL0 to become effective after having restored it

* kvm-arm64/pmu-fixes:
  KVM: arm64: Restore PMU configuration on first run
  KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
2021-06-18 14:25:59 +01:00
Marc Zyngier
d0c94c4979 KVM: arm64: Restore PMU configuration on first run
Restoring a guest with an active virtual PMU results in no perf
counters being instantiated on the host side. Not quite what
you'd expect from a restore.

In order to fix this, force a writeback of PMCR_EL0 on the first
run of a vcpu (using a new request so that it happens once the
vcpu has been loaded). This will in turn create all the host-side
counters that were missing.

Reported-by: Jinank Jain <jinankj@amazon.de>
Tested-by: Jinank Jain <jinankj@amazon.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87wnrbylxv.wl-maz@kernel.org
Link: https://lore.kernel.org/r/b53dfcf9bbc4db7f96154b1cd5188d72b9766358.camel@amazon.de
2021-06-18 14:18:37 +01:00
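A minimal sketch of the request mechanism described above; the request
and helper names follow the upstream pattern but should be treated as
assumptions:

    /* On first run, ask for the PMU configuration to be reloaded... */
    if (kvm_vcpu_has_pmu(vcpu))
        kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu);

    /*
     * ...and service the request once the vcpu has been loaded:
     * writing PMCR_EL0 back recreates the missing host-side events.
     */
    if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
        kvm_pmu_handle_pmcr(vcpu, __vcpu_sys_reg(vcpu, PMCR_EL0));
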
Alexandru Elisei
2a71fabf6a KVM: arm64: Don't zero the cycle count register when PMCR_EL0.P is set
According to ARM DDI 0487G.a, page D13-3895, setting the PMCR_EL0.P bit to
1 has the following effect:

"Reset all event counters accessible in the current Exception level, not
including PMCCNTR_EL0, to zero."

Similar behaviour is described for AArch32 on page G8-7022. Make it so.

Fixes: c01d6a1802 ("KVM: arm64: pmu: Only handle supported event counters")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210618105139.83795-1-alexandru.elisei@arm.com
2021-06-18 13:23:50 +01:00
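The fixed behaviour can be sketched as below, assuming the usual KVM PMU
emulation helpers; the cycle counter (ARMV8_PMU_CYCLE_IDX) is masked out
before the event counters are zeroed:

    if (val & ARMV8_PMU_PMCR_P) {
        /* Reset the event counters, not including PMCCNTR_EL0 */
        unsigned long mask, i;

        mask = kvm_pmu_valid_counter_mask(vcpu);
        mask &= ~BIT(ARMV8_PMU_CYCLE_IDX);
        for_each_set_bit(i, &mask, 32)
            kvm_pmu_set_counter_value(vcpu, i, 0);
    }
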
Marc Zyngier
b88835a89d Merge branch kvm-arm64/mmu/stage2-cmos into kvmarm-master/next
Cache maintenance updates from Yanan Wang, moving the CMOs
down into the page-table code. This ensures that we only issue
them when actually performing a mapping rather than upfront.

* kvm-arm64/mmu/stage2-cmos:
  KVM: arm64: Move guest CMOs to the fault handlers
  KVM: arm64: Tweak parameters of guest cache maintenance functions
  KVM: arm64: Introduce mm_ops member for structure stage2_attr_data
  KVM: arm64: Introduce two cache maintenance callbacks
2021-06-18 12:36:30 +01:00
Yanan Wang
25aa28691b KVM: arm64: Move guest CMOs to the fault handlers
We currently uniformly perform CMOs of the D-cache and I-cache in the
function user_mem_abort() before calling the fault handlers. If we get
concurrent guest faults (e.g. translation faults, permission faults) or
some really unnecessary guest faults caused by BBM, the CMOs for the
first vCPU are necessary, while the later ones are not.

By moving the CMOs to the fault handlers, we can easily identify the
conditions where they are really needed and avoid the unnecessary ones.
Since performing CMOs is time consuming, especially when flushing a
block range, this solution reduces the load on KVM and improves the
efficiency of the stage-2 page table code.

We can imagine two specific scenarios that benefit greatly:
1) In a normal VM startup, this solution improves the efficiency of
handling guest page faults incurred by vCPUs when initially populating
the stage-2 page tables.
2) After live migration, the heavy workload is resumed on the
destination VM, but all the stage-2 page tables need to be rebuilt at
that point. This solution therefore eases the performance drop during
the resume stage.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210617105824.31752-5-wangyanan55@huawei.com
2021-06-18 12:35:35 +01:00
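In the page-table code this boils down to issuing the CMOs only when a
new, valid leaf PTE is about to be installed, roughly as sketched below
(callback names as introduced elsewhere in this series; the exact
placement is an assumption):

    /* Perform CMOs before installation of the guest stage-2 PTE */
    if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
        mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
                                       granule);

    if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
        mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
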
Yanan Wang
378e6a9c78 KVM: arm64: Tweak parameters of guest cache maintenance functions
Adjust the "kvm_pfn_t pfn" parameter of __clean_dcache_guest_page
and __invalidate_icache_guest_page to "void *va", which paves the
way for converting these two guest CMO functions into callbacks in
the kvm_pgtable_mm_ops structure. No functional change.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210617105824.31752-4-wangyanan55@huawei.com
2021-06-18 12:35:35 +01:00
Yanan Wang
a4d5ca5c7c KVM: arm64: Introduce mm_ops member for structure stage2_attr_data
Add an mm_ops member to the stage2_attr_data structure, since we
will move the I-cache maintenance for guest stage-2 to the permission
path and as a result will need mm_ops for some callbacks.

Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210617105824.31752-3-wangyanan55@huawei.com
2021-06-18 12:35:35 +01:00
Marc Zyngier
46c886220a Merge branch kvm-arm64/mmu/reduce-vmemmap-overhead into kvmarm-master/next
Host stage-2 optimisations from Quentin Perret

* kvm-arm64/mmu/reduce-vmemmap-overhead:
  KVM: arm64: Use less bits for hyp_page refcount
  KVM: arm64: Use less bits for hyp_page order
  KVM: arm64: Remove hyp_pool pointer from struct hyp_page
  KVM: arm64: Unify MMIO and mem host stage-2 pools
  KVM: arm64: Remove list_head from hyp_page
  KVM: arm64: Use refcount at hyp to check page availability
  KVM: arm64: Move hyp_pool locking out of refcount helpers
2021-06-11 13:26:36 +01:00
Will Deacon
2f6a49bbc0 KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support
If a vCPU is caught running 32-bit code on a system with mismatched
support at EL0, then we should kill it.

Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210608180313.11502-4-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-06-11 13:25:40 +01:00
Quentin Perret
6929586d8e KVM: arm64: Use less bits for hyp_page refcount
The hyp_page refcount is currently encoded in 4 bytes even though we
never need to count that many objects in a page. Make it 2 bytes to
save some space in the vmemmap.

As overflows become more likely as a result, make sure to catch them
with a BUG in the increment function.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-8-qperret@google.com
2021-06-11 13:24:12 +01:00
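Together with the 'order' change below, struct hyp_page can then be
sketched as two 16-bit fields, with the increment guarded against the
now easier-to-hit overflow (a sketch, not necessarily the exact
upstream layout):

    struct hyp_page {
        unsigned short refcount;
        unsigned short order;
    };

    static inline void hyp_page_ref_inc(struct hyp_page *p)
    {
        /* 16 bits overflow more easily, so fail loudly */
        BUG_ON(p->refcount == USHRT_MAX);
        p->refcount++;
    }
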
Quentin Perret
87ec060673 KVM: arm64: Use less bits for hyp_page order
The hyp_page order is currently encoded in 4 bytes even though it is
guaranteed to be smaller than this. Make it 2 bytes to reduce the hyp
vmemmap overhead.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-7-qperret@google.com
2021-06-11 13:24:11 +01:00
Quentin Perret
d978b9cfe6 KVM: arm64: Remove hyp_pool pointer from struct hyp_page
Each struct hyp_page currently contains a pointer to a hyp_pool struct
where the page should be freed if its refcount reaches 0. However, this
information can always be inferred from the context in the EL2 code, so
drop the pointer to save a few bytes in the vmemmap.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-6-qperret@google.com
2021-06-11 13:24:11 +01:00
Quentin Perret
7c350ea39e KVM: arm64: Unify MMIO and mem host stage-2 pools
We currently maintain two separate memory pools for the host stage-2:
one for pages used in the page-table when mapping memory regions, and
the other to map MMIO regions. The former is large enough to map all of
memory with page granularity and the latter can cover an arbitrary
portion of the IPA space, but allows pages to be 'recycled'.

However, this split makes accounting difficult to manage as pages at
intermediate levels of the page-table may be used to map both memory and
MMIO regions. Simplify the scheme by merging both pools into one. This
means we can now hit the -ENOMEM case in the memory abort path, but
we're still guaranteed forward-progress in the worst case by unmapping
MMIO regions. On the plus side this also means we can usually map a lot
more MMIO space at once if memory ranges happen to be mapped with block
mappings.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-5-qperret@google.com
2021-06-11 13:24:11 +01:00
Quentin Perret
914cde58a0 KVM: arm64: Remove list_head from hyp_page
The list_head member of struct hyp_page is only needed when the page is
attached to a free-list, which by definition implies the page is free.
As such, nothing prevents us from using the page itself to store the
list_head, hence reducing the size of the vmemmap.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-4-qperret@google.com
2021-06-11 13:24:11 +01:00
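The idea can be sketched as follows: since the page is free, its own
memory can hold the freelist node (the helper names here are
assumptions):

    /* Use the free page itself to store the list_head... */
    static void page_add_to_list(struct hyp_page *p, struct list_head *head)
    {
        struct list_head *node = hyp_page_to_virt(p);

        INIT_LIST_HEAD(node);
        list_add_tail(node, head);
    }

    /* ...and map a freelist node back to its struct hyp_page */
    static struct hyp_page *node_to_page(struct list_head *node)
    {
        return hyp_virt_to_page(node);
    }
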
Quentin Perret
581982decc KVM: arm64: Use refcount at hyp to check page availability
The hyp buddy allocator currently checks the struct hyp_page list node
to see whether a page is available for allocation when trying to
coalesce memory. Now that decrementing the refcount and attaching to
the buddy tree is done in the same critical section, we can rely on
the refcount of the buddy page being in sync, which allows us to
replace the list node check with a refcount check. This will ease
removing the list node from struct hyp_page later on.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-3-qperret@google.com
2021-06-11 13:24:11 +01:00
Quentin Perret
6cbf874e51 KVM: arm64: Move hyp_pool locking out of refcount helpers
The hyp_page refcount helpers currently rely on the hyp_pool lock for
serialization. However, this means the refcounts can't be changed from
the buddy allocator core as it already holds the lock, which means pages
have to go through odd transient states.

For example, when a page is freed, its refcount is set to 0, and the
lock is transiently released before the page can be attached to a free
list in the buddy tree. This is currently harmless as the allocator
checks the list node of each page to see if it is available for
allocation or not, but it means the page refcount can't be trusted to
represent the state of the page even if the pool lock is held.

In order to fix this, remove the pool locking from the refcount helpers,
and move all the logic to the buddy allocator. This will simplify the
removal of the list node from struct hyp_page in a later patch.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210608114518.748712-2-qperret@google.com
2021-06-11 13:24:11 +01:00
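After this change, the put path can be sketched as below: the refcount
decrement and the freelist attach happen in a single critical section,
with the lock taken by the allocator rather than by the refcount helper
(helper names are assumptions):

    void hyp_put_page(void *addr)
    {
        struct hyp_page *p = hyp_virt_to_page(addr);
        struct hyp_pool *pool = hyp_page_to_pool(p);

        hyp_spin_lock(&pool->lock);
        if (hyp_page_ref_dec_and_test(p))
            __hyp_attach_page(pool, p); /* same critical section */
        hyp_spin_unlock(&pool->lock);
    }
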
Marc Zyngier
32ab5a5e97 Merge branch kvm-arm64/mmu/MMIO-block-mapping into kvmarm-master/next
MMIO block mapping support from Keqian Zhu, allowing larger
(and lazy) mappings for devices assigned to guests.

* kvm-arm64/mmu/MMIO-block-mapping:
  KVM: arm64: Try stage2 block mapping for host device MMIO
  KVM: arm64: Remove the creation time's mapping of MMIO regions
2021-06-06 11:15:20 +01:00
Ingo Molnar
a9e906b71f Merge branch 'sched/urgent' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-06-03 19:00:49 +02:00
Keqian Zhu
2aa53d68ce KVM: arm64: Try stage2 block mapping for host device MMIO
The MMIO region of a device may be huge (GB level), so try to use
block mapping in stage 2 to speed up both map and unmap.

Compared to normal memory mapping, we should consider two more
points when trying block mapping for an MMIO region:

1. For normal memory mapping, the PA (host physical address) and
HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
the HVA to request a hugepage, so we don't need to consider PA
alignment when verifying the block mapping. But for device memory
mapping, the PA and HVA may have different alignments.

2. For normal memory mapping, we are sure the hugepage size properly
fits into the VMA, so we don't check whether the mapping size exceeds
the boundary of the VMA. But for device memory mapping, we should pay
attention to this.

This adds get_vma_page_shift() to get the page shift for both normal
memory and device MMIO regions, and checks these two points when
selecting the block mapping size for an MMIO region.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210507110322.23348-3-zhukeqian1@huawei.com
2021-06-01 12:01:47 +01:00
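A sketch of the two checks for a PMD-sized block (the PUD case is
analogous); the PA is derived from the VMA's pgoff, as the commit
describes:

    /* PA that the HVA maps in this VM_PFNMAP vma */
    pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);

    if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) && /* point 1 */
        ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&      /* point 2 */
        ALIGN(hva, PMD_SIZE) <= vma->vm_end)
            return PMD_SHIFT;

    return PAGE_SHIFT;
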
Keqian Zhu
fd6f17bade KVM: arm64: Remove the creation time's mapping of MMIO regions
MMIO regions may be unmapped for many reasons and can be remapped
by the stage-2 fault path. Mapping MMIO regions at creation time is
therefore only a minor optimization, and it makes these two mapping
paths hard to keep in sync.

Remove the mapping code while keeping the useful sanity check.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210507110322.23348-2-zhukeqian1@huawei.com
2021-06-01 12:01:40 +01:00
Marc Zyngier
5f59229680 KVM: arm64: timer: Add support for SW-based deactivation
In order to deal with the lack of active state, we need to use
the mask/unmask primitives (after all, the active state is just an
additional mask on top of the normal one).

To avoid adding a bunch of ugly conditionals in the timer and vgic
code, let's use a timer-specific irqdomain to deal with the state
conversion. Yes, this is an unexpected use of irqdomains, but
there is no reason not to be just as creative as the designers
of the HW...

This involves overloading the vcpu_affinity, set_irqchip_state
and eoi callbacks so that the rest of the KVM code can continue
ignoring the oddities of the underlying platform.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:46:00 +01:00
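The state conversion can be sketched as an irq_chip callback that turns
active-state requests into mask/unmask operations on the parent, using
the kernel's irq_chip_*_parent helpers (a sketch, not the full patch):

    static int timer_irq_set_irqchip_state(struct irq_data *d,
                                           enum irqchip_irq_state which,
                                           bool val)
    {
        if (which != IRQCHIP_STATE_ACTIVE)
            return irq_chip_set_parent_state(d, which, val);

        /* No active state: emulate it with an extra mask */
        if (val)
            irq_chip_mask_parent(d);
        else
            irq_chip_unmask_parent(d);

        return 0;
    }
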
Marc Zyngier
2f2f7e39db KVM: arm64: timer: Refactor IRQ configuration
As we are about to add some more things to the timer IRQ
configuration, move this code out of the main timer init code
into its own set of functions.

No functional changes.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:46:00 +01:00
Marc Zyngier
354920e794 KVM: arm64: vgic: Implement SW-driven deactivation
In order to deal with systems that do not offer HW-based
deactivation of interrupts, let's implement a SW-based approach:

- When the irq is queued into a LR, treat it as a pure virtual
  interrupt and set the EOI flag in the LR.

- When the interrupt state is read back from the LR, force a
  deactivation when the state is invalid (neither active nor
  pending).

Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:46:00 +01:00
Marc Zyngier
db75f1a33f KVM: arm64: vgic: move irq->get_input_level into an ops structure
We already have the option to attach a callback to an interrupt
to retrieve its pending state. As we are planning to expand this
facility, move this callback into its own data structure.

This will limit the size of individual interrupts as the ops
structures can be shared across multiple interrupts.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:59 +01:00
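A sketch of the resulting shape, reduced to what this commit describes:

    struct irq_ops {
        /* Callback used to retrieve the pending state of the line */
        bool (*get_input_level)(int vintid);
    };

    struct vgic_irq {
        /* ...existing fields... */
        struct irq_ops *ops; /* shared across interrupts */
    };
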
Marc Zyngier
f6c3e24fb7 KVM: arm64: vgic: Let an interrupt controller advertise lack of HW deactivation
The vGIC, as architected by ARM, allows a virtual interrupt to
trigger the deactivation of a physical interrupt. This allows
the following interrupt to be delivered without requiring an exit.

However, some implementations have chosen not to implement this,
meaning that we will need some unsavoury workarounds to deal with this.

On detecting such a case, taint the kernel and spit a nastygram.
We'll deal with this in later patches.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:59 +01:00
Marc Zyngier
669062d2a1 KVM: arm64: vgic: Be tolerant to the lack of maintenance interrupt masking
As it turns out, not all the interrupt controllers are able to
expose a vGIC maintenance interrupt that can be independently
enabled/disabled.

And to be fair, it doesn't really matter, as all we require is
for the interrupt to kick us out of guest mode one way or another.

To that effect, add gic_kvm_info.no_maint_irq_mask for an interrupt
controller to advertise the lack of masking.

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:59 +01:00
Marc Zyngier
74501499d4 KVM: arm64: Handle physical FIQ as an IRQ while running a guest
As we now entertain the possibility of FIQ being used on the host,
treat the signalling of a FIQ while running a guest as an IRQ,
causing an exit instead of a HYP panic.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:58 +01:00
Marc Zyngier
0e5cb77706 irqchip/gic: Split vGIC probing information from the GIC code
The vGIC advertising code is unsurprisingly very much tied to
the GIC implementations. However, we are about to extend the
support to lesser implementations.

Let's dissociate the vgic registration from the GIC code and
move it into KVM, where it makes a bit more sense. This also
allows us to mark the gic_kvm_info structures as __initdata.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-06-01 10:45:58 +01:00
Marc Zyngier
66e94d5caf KVM: arm64: Prevent mixed-width VM creation
It looks like we have tolerated creating mixed-width VMs since...
forever. However, that was never the intention, and we'd rather
not have to support that pointless complexity.

Forbid such a setup by making sure all the vcpus have the same
register width.

Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210524170752.1549797-1-maz@kernel.org
2021-05-27 10:34:33 +01:00
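The check can be sketched as a helper run at vcpu init time, rejecting
a vcpu whose requested register width differs from its siblings (close
to the upstream shape, details assumed):

    static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
    {
        bool is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
        struct kvm_vcpu *tmp;
        int i;

        /* The vcpus must either be all 32bit or all 64bit */
        kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
            if (vcpu_has_feature(tmp, KVM_ARM_VCPU_EL1_32BIT) != is32bit)
                return false;
        }

        return true;
    }
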
Zenghui Yu
e3e880bb15 KVM: arm64: Resolve all pending PC updates before immediate exit
Commit 26778aaa13 ("KVM: arm64: Commit pending PC adjustments before
returning to userspace") fixed the PC updating issue by forcing an explicit
synchronisation of the exception state on vcpu exit to userspace.

However, we forgot to take into account the case where immediate_exit is
set by userspace and KVM_RUN will exit immediately. Fix it by resolving all
pending PC updates before returning to userspace.

Since __kvm_adjust_pc() relies on a loaded vcpu context, I moved the
immediate_exit checking right after vcpu_load(). We will get some overhead
if immediate_exit is true (which should hopefully be rare).

Fixes: 26778aaa13 ("KVM: arm64: Commit pending PC adjustments before returning to userspace")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210526141831.1662-1-yuzenghui@huawei.com
Cc: stable@vger.kernel.org # 5.11
2021-05-27 10:33:06 +01:00
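The reordering can be sketched as below: the check now sits after
vcpu_load(), so the common exit path, which commits pending PC updates,
runs with a loaded vcpu context:

    vcpu_load(vcpu);

    if (run->immediate_exit) {
        ret = -EINTR;
        goto out; /* 'out' resolves any pending PC updates */
    }
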
Fuad Tabba
fade9c2c6e arm64: Rename arm64-internal cache maintenance functions
Although naming across the codebase isn't that consistent, it
tends to follow certain patterns. Moreover, the term "flush"
isn't defined in the Arm Architecture reference manual, and might
be interpreted to mean clean, invalidate, or both for a cache.

Rename arm64-internal functions to make the naming internally
consistent, as well as consistent with the Arm ARM, by specifying
whether the operation applies to the instruction cache, the data
cache, or both, and whether it is a clean, an invalidate, or both.
Also specify which point the operation applies to, i.e., to the
point of unification (PoU), coherency (PoC), or persistence
(PoP).

This commit applies the following sed transformation to all files
under arch/arm64:

"s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\
"s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\
"s/\binvalidate_icache_range\b/icache_inval_pou/g;"\
"s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\
"s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\
"s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\
"s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\
"s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\
"s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\
"s/\b__flush_icache_all\b/icache_inval_all_pou/g;"

Note that __clean_dcache_area_poc is deliberately missing a word
boundary check at the beginning in order to match the efistub
symbols in image-vars.h.

Also note that, despite its name, __flush_icache_range operates
on both instruction and data caches. The name change here
reflects that.

No functional change intended.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 19:27:49 +01:00
Fuad Tabba
814b186079 arm64: __flush_dcache_area to take end parameter instead of size
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.

No functional change intended.

Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-13-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 19:27:49 +01:00
Fuad Tabba
163d3f8069 arm64: dcache_by_line_op to take end parameter instead of size
To be consistent with other functions with similar names and
functionality in cacheflush.h, cache.S, and cachetlb.rst, change
to specify the range in terms of start and end, as opposed to
start and size.

No functional change intended.

Reported-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-12-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-05-25 19:27:49 +01:00
Marc Zyngier
cb853ded1d KVM: arm64: Fix debug register indexing
Commit 03fdfb2690 ("KVM: arm64: Don't write junk to sysregs on
reset") flipped the register number to 0 for all the debug registers
in the sysreg table, thereby indicating that these registers live
in a separate shadow structure.

However, the author of this patch failed to realise that all the
accessors are using that particular index instead of the register
encoding, resulting in all the registers hitting index 0. Not quite
a valid implementation of the architecture...

Address the issue by fixing all the accessors to use the CRm field
of the encoding, which contains the debug register index.

Fixes: 03fdfb2690 ("KVM: arm64: Don't write junk to sysregs on reset")
Reported-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
2021-05-15 10:27:59 +01:00
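In each accessor, the fix amounts to indexing the shadow debug state by
the CRm field of the encoding instead of the (now always zero) register
number, e.g. for the breakpoint value registers:

    /* rd->reg is 0 for all shadowed debug registers; CRm holds the index */
    u64 *dbg_reg = &vcpu->arch.vcpu_debug_state.dbg_bvr[rd->CRm];
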
Marc Zyngier
26778aaa13 KVM: arm64: Commit pending PC adjustments before returning to userspace
KVM currently updates PC (and the corresponding exception state)
using a two phase approach: first by setting a set of flags,
then by converting these flags into a state update when the vcpu
is about to enter the guest.

However, this creates a disconnect with userspace if the vcpu thread
returns there with any exception/PC flag set. In this case, the exposed
context is wrong, as userspace doesn't have access to these flags
(they aren't architectural). It also means that these flags are
preserved across a reset, which isn't expected.

To solve this problem, force an explicit synchronisation of the
exception state on vcpu exit to userspace. As an optimisation
for nVHE systems, only perform this when there is something pending.

Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org # 5.11
2021-05-15 10:27:59 +01:00
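On the return-to-userspace path this can be sketched as a conditional
hypercall, keeping the nVHE cost to cases where something is actually
pending (flag names follow the upstream pattern but are assumptions
here):

    /*
     * Commit pending PC/exception updates so that userspace sees a
     * consistent view; only bother EL2 when something is pending.
     */
    if (unlikely(vcpu->arch.flags & (KVM_ARM64_PENDING_EXCEPTION |
                                     KVM_ARM64_INCREMENT_PC)))
        kvm_call_hyp(__kvm_adjust_pc, vcpu);
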
Marc Zyngier
f5e3068061 KVM: arm64: Move __adjust_pc out of line
In order to make it easy to call __adjust_pc() from the EL1 code
(in the case of nVHE), rename it to __kvm_adjust_pc() and move
it out of line.

No expected functional change.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org # 5.11
2021-05-15 10:27:59 +01:00
Quentin Perret
3fdc15fe8c KVM: arm64: Mark the host stage-2 memory pools static
The host stage-2 memory pools are not used outside of mem_protect.c,
so mark them static.

Fixes: 1025c8c0c6 ("KVM: arm64: Wrap the host with a stage 2")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210514085640.3917886-3-qperret@google.com
2021-05-15 10:27:59 +01:00
Quentin Perret
eaa9b88dae KVM: arm64: Mark pkvm_pgtable_mm_ops static
It is not used outside of setup.c, so mark it static.

Fixes: f320bc742bc2 ("KVM: arm64: Prepare the creation of s1 mappings at EL2")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210514085640.3917886-2-qperret@google.com
2021-05-15 10:27:58 +01:00
kernel test robot
fcb8283920 KVM: arm64: Fix boolreturn.cocci warnings
arch/arm64/kvm/mmu.c:1114:9-10: WARNING: return of 0/1 in function 'kvm_age_gfn' with return type bool
arch/arm64/kvm/mmu.c:1084:9-10: WARNING: return of 0/1 in function 'kvm_set_spte_gfn' with return type bool
arch/arm64/kvm/mmu.c:1127:9-10: WARNING: return of 0/1 in function 'kvm_test_age_gfn' with return type bool
arch/arm64/kvm/mmu.c:1070:9-10: WARNING: return of 0/1 in function 'kvm_unmap_gfn_range' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: cd4c718352 ("KVM: arm64: Convert to the gfn-based MMU notifier callbacks")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210426223357.GA45871@cd4295a34ed8
2021-05-15 10:27:58 +01:00
Peter Zijlstra
63b3f96e1a kvm: Select SCHED_INFO instead of TASK_DELAY_ACCT
AFAICT KVM only relies on SCHED_INFO. Nothing uses the p->delays data
that belongs to TASK_DELAY_ACCT.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Link: https://lkml.kernel.org/r/20210505111525.187225172@infradead.org
2021-05-12 11:43:24 +02:00
Linus Torvalds
152d32aa84 ARM:

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "This is a large update by KVM standards, including AMD PSP (Platform
  Security Processor, aka "AMD Secure Technology") and ARM CoreSight
  (debug and trace) changes.

  ARM:

   - CoreSight: Add support for ETE and TRBE

   - Stage-2 isolation for the host kernel when running in protected
     mode

   - Guest SVE support when running in nVHE mode

   - Force W^X hypervisor mappings in nVHE mode

   - ITS save/restore for guests using direct injection with GICv4.1

   - nVHE panics now produce readable backtraces

   - Guest support for PTP using the ptp_kvm driver

   - Performance improvements in the S2 fault handler

  x86:

   - AMD PSP driver changes

   - Optimizations and cleanup of nested SVM code

   - AMD: Support for virtual SPEC_CTRL

   - Optimizations of the new MMU code: fast invalidation, zap under
     read lock, enable/disable dirty page logging under read lock

   - /dev/kvm API for AMD SEV live migration (guest API coming soon)

   - support SEV virtual machines sharing the same encryption context

   - support SGX in virtual machines

   - add a few more statistics

   - improved directed yield heuristics

   - Lots and lots of cleanups

  Generic:

   - Rework of MMU notifier interface, simplifying and optimizing the
     architecture-specific code

   - a handful of "Get rid of oprofile leftovers" patches

   - Some selftests improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
  KVM: selftests: Speed up set_memory_region_test
  selftests: kvm: Fix the check of return value
  KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
  KVM: SVM: Skip SEV cache flush if no ASIDs have been used
  KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
  KVM: SVM: Drop redundant svm_sev_enabled() helper
  KVM: SVM: Move SEV VMCB tracking allocation to sev.c
  KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
  KVM: SVM: Unconditionally invoke sev_hardware_teardown()
  KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
  KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
  KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
  KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
  KVM: SVM: Move SEV module params/variables to sev.c
  KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
  KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
  KVM: SVM: Zero out the VMCB array used to track SEV ASID association
  x86/sev: Drop redundant and potentially misleading 'sev_enabled'
  KVM: x86: Move reverse CPUID helpers to separate header file
  KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
  ...
2021-05-01 10:14:08 -07:00
Linus Torvalds
57fa2369ab CFI on arm64 series for v5.13-rc1

Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull CFI on arm64 support from Kees Cook:
 "This builds on last cycle's LTO work, and allows the arm64 kernels to
  be built with Clang's Control Flow Integrity feature. This feature has
  happily lived in Android kernels for almost 3 years[1], so I'm excited
  to have it ready for upstream.

  The wide diffstat is mainly due to the treewide fixing of mismatched
  list_sort prototypes. Other things in core kernel are to address
  various CFI corner cases. The largest code portion is the CFI runtime
  implementation itself (which will be shared by all architectures
  implementing support for CFI). The arm64 pieces are Acked by arm64
  maintainers rather than coming through the arm64 tree since carrying
  this tree over there was going to be awkward.

  CFI support for x86 is still under development, but is pretty close.
  There are a handful of corner cases on x86 that need some improvements
  to Clang and objtool, but otherwise works well.

  Summary:

   - Clean up list_sort prototypes (Sami Tolvanen)

   - Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)"

* tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  arm64: allow CONFIG_CFI_CLANG to be selected
  KVM: arm64: Disable CFI for nVHE
  arm64: ftrace: use function_nocfi for ftrace_call
  arm64: add __nocfi to __apply_alternatives
  arm64: add __nocfi to functions that jump to a physical address
  arm64: use function_nocfi with __pa_symbol
  arm64: implement function_nocfi
  psci: use function_nocfi for cpu_resume
  lkdtm: use function_nocfi
  treewide: Change list_sort to use const pointers
  bpf: disable CFI in dispatcher functions
  kallsyms: strip ThinLTO hashes from static functions
  kthread: use WARN_ON_FUNCTION_MISMATCH
  workqueue: use WARN_ON_FUNCTION_MISMATCH
  module: ensure __cfi_check alignment
  mm: add generic function_nocfi macro
  cfi: add __cficanonical
  add support for Clang CFI
2021-04-27 10:16:46 -07:00
Linus Torvalds
91552ab8ff The usual updates from the irq department:

Merge tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "The usual updates from the irq departement:

  Core changes:

   - Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can
     be cleaned up which either use a separate mechanism to prevent
     auto-enable at request time or have a racy mechanism which disables
     the interrupt right after request.

   - Get rid of the last usage of irq_create_identity_mapping() and
     remove the interface.

   - An overhaul of tasklet_disable().

     Most usage sites of tasklet_disable() are in task context and
     usually in cleanup, teardown code paths. tasklet_disable()
     spinwaits for a tasklet which is currently executed. That's not
     only a problem for PREEMPT_RT where this can lead to a live lock
     when the disabling task preempts the softirq thread. It's also
     problematic in context of virtualization when the vCPU which runs
     the tasklet is scheduled out and the disabling code has to spin
     wait until it's scheduled back in.

     There are a few code paths which invoke tasklet_disable() from
     non-sleepable context. For these a new disable variant which still
     spinwaits is provided, which allows switching tasklet_disable() to a
     sleep wait mechanism. For the atomic use cases this does not solve
     the live lock issue on PREEMPT_RT. That is mitigated by blocking on
     the RT specific softirq lock.

   - The PREEMPT_RT specific implementation of softirq processing and
     local_bh_disable/enable().

     On RT enabled kernels, soft interrupt processing always happens in
     task context and all interrupt handlers, which are not explicitly
     marked to be invoked in hard interrupt context are forced into task
     context as well. This allows to protect against softirq processing
     with a per CPU lock, which in turn allows to make BH disabled
     regions preemptible.

     Most of the softirq handling code is still shared. The RT/non-RT
     specific differences are addressed with a set of inline functions
     which provide the context specific functionality. The
     local_bh_disable() / local_bh_enable() mechanisms are obviously
     separate.

   - The usual set of small improvements and cleanups

  Driver changes:

   - New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt
     controllers

   - Extended functionality for MStar, STM32 and SC7280 irq chips

   - Enhanced robustness for ARM GICv3/4.1 drivers

   - The usual set of cleanups and improvements all over the place"

* tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP
  irqchip/gic-v3: Do not enable irqs when handling spurious interrupts
  dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller
  irqchip: Add support for IDT 79rc3243x interrupt controller
  irqdomain: Drop references to recursive irqdomain setup
  irqdomain: Get rid of irq_create_strict_mappings()
  irqchip/jcore-aic: Kill use of irq_create_strict_mappings()
  ARM: PXA: Kill use of irq_create_strict_mappings()
  irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection
  irqchip/tb10x: Use 'fallthrough' to eliminate a warning
  genirq: Reduce irqdebug cacheline bouncing
  kernel: Initialize cpumask before parsing
  irqchip/wpcm450: Drop COMPILE_TEST
  irqchip/irq-mst: Support polarity configuration
  irqchip: Add driver for WPCM450 interrupt controller
  dt-bindings: interrupt-controller: Add nuvoton,wpcm450-aic
  dt-bindings: qcom,pdc: Add compatible for sc7280
  irqchip/stm32: Add usart instances exti direct event support
  irqchip/gic-v3: Fix OF_BAD_ADDR error handling
  irqchip/sifive-plic: Mark two global variables __ro_after_init
  ...
2021-04-26 09:43:16 -07:00
Paolo Bonzini
c4f71901d5 KVM/arm64 updates for Linux 5.13

Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for Linux 5.13

New features:

- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)

Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocations in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
  oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
2021-04-23 07:41:17 -04:00
Paolo Bonzini
fd49e8ee70 Merge branch 'kvm-sev-cgroup' into HEAD 2021-04-22 13:19:01 -04:00
Lorenzo Pieralisi
46135d6f87 irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection
GIC CPU interfaces versions predating GIC v4.1 were not built to
accommodate vINTID within the vSGI range; as reported in the GIC
specifications (8.2 "Changes to the CPU interface"), it is
CONSTRAINED UNPREDICTABLE to deliver a vSGI to a PE with
ID_AA64PFR0_EL1.GIC < b0011.

Check the GIC CPUIF version by reading SYS_ID_AA64PFR0_EL1.

Disable vSGIs if a CPUIF version < 4.1 is detected to prevent using
vSGIs on systems where they may misbehave.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210317100719.3331-2-lorenzo.pieralisi@arm.com
2021-04-22 15:55:21 +01:00
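On arm64, the described check can be sketched as reading the sanitised
ID register and extracting the GIC field:

    static bool gic_cpuif_has_vsgi(void)
    {
        unsigned long fld;
        unsigned long reg = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);

        fld = cpuid_feature_extract_unsigned_field(reg,
                                                   ID_AA64PFR0_GIC_SHIFT);

        /* ID_AA64PFR0_EL1.GIC >= 0b0011 means a v4.1 CPU interface */
        return fld >= 0x3;
    }
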
Marc Zyngier
9a8aae605b Merge branch 'kvm-arm64/kill_oprofile_dependency' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-22 13:41:49 +01:00
Marc Zyngier
5421db1be3 KVM: arm64: Divorce the perf code from oprofile helpers
KVM/arm64 is the sole user of perf_num_counters(), and really
could do without it. Stop using the obsolete API by relying on
the existing probing code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210414134409.1266357-2-maz@kernel.org
2021-04-22 13:32:39 +01:00
Sean Christopherson
cd4c718352 KVM: arm64: Convert to the gfn-based MMU notifier callbacks
Move arm64 to the gfn-based MMU notifier APIs, which do the hva->gfn
lookup in common code.

No meaningful functional change intended, though the exact order of
operations is slightly different since the memslot lookups occur before
calling into arch code.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210402005658.3024832-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:07 -04:00
Paolo Bonzini
6c9dd6d262 KVM: constify kvm_arch_flush_remote_tlbs_memslot
memslots are stored in RCU and there should be no need to
change them.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:04 -04:00
Maxim Levitsky
fa18aca927 KVM: aarch64: implement KVM_CAP_SET_GUEST_DEBUG2
Move KVM_GUESTDBG_VALID_MASK to kvm_host.h
and use it to return the value of this capability.
Compile tested only.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210401135451.1004564-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:31:03 -04:00
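The capability check then reduces to returning the mask, e.g. in the
arm64 KVM_CHECK_EXTENSION handler (a sketch):

    case KVM_CAP_SET_GUEST_DEBUG2:
        return KVM_GUESTDBG_VALID_MASK;
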
Sean Christopherson
501b918525 KVM: Move arm64's MMU notifier trace events to generic code
Move arm64's MMU notifier trace events into common code in preparation
for doing the hva->gfn lookup in common code.  The alternative would be
to trace the gfn instead of hva, but that's not obviously better and
could also be done in common code.  Tracing the notifiers is also quite
handy for debug regardless of architecture.

Remove a completely redundant tracepoint from PPC e500.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210326021957.1424875-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-04-17 08:30:56 -04:00
Marc Zyngier
e629003215 Merge branch 'kvm-arm64/vlpi-save-restore' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:41:45 +01:00
Marc Zyngier
c90aad55c5 Merge branch 'kvm-arm64/vgic-5.13' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:41:33 +01:00
Marc Zyngier
d8f37d291c Merge branch 'kvm-arm64/ptp' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:41:22 +01:00
Marc Zyngier
bba8857feb Merge branch 'kvm-arm64/nvhe-wxn' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:41:08 +01:00
Marc Zyngier
5c92a7643b Merge branch 'kvm-arm64/nvhe-panic-info' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:38:03 +01:00
Marc Zyngier
ad569b70aa Merge branch 'kvm-arm64/misc-5.13' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:35:58 +01:00
Marc Zyngier
3d63ef4d52 Merge branch 'kvm-arm64/memslot-fixes' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:35:40 +01:00
Marc Zyngier
ac5ce2456e Merge branch 'kvm-arm64/host-stage2' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:35:09 +01:00
Marc Zyngier
fbb31e5f3a Merge branch 'kvm-arm64/debug-5.13' into kvmarm-master/next
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-13 15:34:15 +01:00
Eric Auger
94ac083539 KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read
When reading the base address of a REDIST region through
KVM_VGIC_V3_ADDR_TYPE_REDIST, we expect the redistributor region
list to be populated with a single element.

However, list_first_entry() expects the list to be non-empty.
Instead, we should use list_first_entry_or_null(), which effectively
returns NULL if the list is empty.

Fixes: dbd9733ab6 ("KVM: arm/arm64: Replace the single rdist region by a list")
Cc: <Stable@vger.kernel.org> # v4.18+
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reported-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210412150034.29185-1-eric.auger@redhat.com
2021-04-13 15:04:50 +01:00
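The fix can be sketched as below (variable names assumed):

    rdreg = list_first_entry_or_null(&vgic->rd_regions,
                                     struct vgic_redist_region, list);
    if (rdreg) /* an empty list now yields NULL, not a bogus pointer */
        addr = rdreg->base;
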
Alexandru Elisei
96f4f6809b KVM: arm64: Don't advertise FEAT_SPE to guests
Even though KVM sets up MDCR_EL2 to trap accesses to the SPE buffer and
sampling control registers and to inject an undefined exception, the
presence of FEAT_SPE is still advertised in the ID_AA64DFR0_EL1 register,
if the hardware supports it. Getting an undefined exception when accessing
a register usually happens for a hardware feature which is not implemented,
and indeed this is how PMU emulation is handled when the virtual machine
has been created without the KVM_ARM_VCPU_PMU_V3 feature. Let's be
consistent and never advertise FEAT_SPE, because KVM doesn't have support
for emulating it yet.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210409152154.198566-3-alexandru.elisei@arm.com
2021-04-11 09:46:13 +01:00
Alexandru Elisei
13611bc80d KVM: arm64: Don't print warning when trapping SPE registers
KVM sets up MDCR_EL2 to trap accesses to the SPE buffer and sampling
control registers and it relies on the fact that KVM injects an undefined
exception for unknown registers. This mechanism of injecting undefined
exceptions also prints a warning message for the host kernel; for example,
when a guest tries to access PMSIDR_EL1:

[    2.691830] kvm [142]: Unsupported guest sys_reg access at: 80009e78 [800003c5]
[    2.691830]  { Op0( 3), Op1( 0), CRn( 9), CRm( 9), Op2( 7), func_read },

This is unnecessary, because KVM has explicitly configured trapping of
those registers and is well aware of their existence. Prevent the warning
by adding the SPE registers to the list of registers that KVM emulates.
The access function will inject the undefined exception.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210409152154.198566-2-alexandru.elisei@arm.com
2021-04-11 09:46:12 +01:00
Marc Zyngier
85d7037461 KVM: arm64: Fully zero the vcpu state on reset
On vcpu reset, we expect all the registers to be brought back
to their initial state, which happens to be a bunch of zeroes.

However, some recent commit broke this, and is now leaving a bunch
of registers (such as the FP state) with whatever was left by the
guest. My bad.

Zero the rest of the state (32bit SPSRs and FPSIMD state).

Cc: stable@vger.kernel.org
Fixes: e47c2055c6 ("KVM: arm64: Make struct kvm_regs userspace-only")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-04-09 18:26:25 +01:00
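The reset path then zeroes the full register state explicitly, along
these lines (field names assumed to match the kvm_cpu_context layout;
pstate is computed earlier from the target mode):

    /* Reset core registers, including state userspace cannot see */
    memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
    memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs));
    vcpu->arch.ctxt.spsr_abt = 0;
    vcpu->arch.ctxt.spsr_und = 0;
    vcpu->arch.ctxt.spsr_irq = 0;
    vcpu->arch.ctxt.spsr_fiq = 0;
    vcpu_gp_regs(vcpu)->pstate = pstate;
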
Sami Tolvanen
67dfd72b3e KVM: arm64: Disable CFI for nVHE
Disable CFI for the nVHE code to avoid address space confusion.

Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-18-samitolvanen@google.com
2021-04-08 16:04:23 -07:00
Sami Tolvanen
4f0f586bf0 treewide: Change list_sort to use const pointers
list_sort() internally casts the comparison function passed to it
to a different type with constant struct list_head pointers, and
uses this pointer to call the functions, which trips indirect call
Control-Flow Integrity (CFI) checking.

Instead of removing the consts, this change defines the
list_cmp_func_t type and changes the comparison function types of
all list_sort() callers to use const pointers, thus avoiding type
mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
2021-04-08 16:04:22 -07:00
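The resulting prototype looks like this (slightly simplified from the
upstream declaration):

    typedef int (*list_cmp_func_t)(void *priv,
                                   const struct list_head *a,
                                   const struct list_head *b);

    void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp);
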
Alexandru Elisei
263d6287da KVM: arm64: Initialize VCPU mdcr_el2 before loading it
When a VCPU is created, the kvm_vcpu struct is initialized to zero in
kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
result is that KVM executes for a short time with MDCR_EL2 set to zero.

This has several unintended consequences:

* Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
  DDI 0487G.a, page D13-3820. The behavior specified by the architecture
  in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
  value less than or equal to PMCR_EL0.N, which means that an unknown
  number of counters are now disabled by MDCR_EL2.HPME, which is zero.

* The host configuration for the other debug features controlled by
  MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
  doesn't use the other fields, but that might change in the future.

Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
kvm_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
has a consistent value after each vcpu_load().

Fixes: d5a21bcc29 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210407144857.199746-3-alexandru.elisei@arm.com
2021-04-07 16:40:03 +01:00
Jianyong Wu
3bf725699b KVM: arm64: Add support for the KVM PTP service
Implement the hypervisor side of the KVM PTP interface.

The service offers wall time and cycle count from host to guest.
The caller must specify whether they want the host's view of
either the virtual or physical counter.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
2021-04-07 16:33:20 +01:00
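The host side can be sketched as a hypercall handler that snapshots
system time and picks the requested counter view (names follow the
upstream interface but should be treated as assumptions):

    struct system_time_snapshot snap;
    u64 cycles;

    ktime_get_snapshot(&snap);

    switch (feature) {
    case KVM_PTP_VIRT_COUNTER:
        /* the guest's virtual counter: apply CNTVOFF */
        cycles = snap.cycles - vcpu_read_sys_reg(vcpu, CNTVOFF_EL2);
        break;
    case KVM_PTP_PHYS_COUNTER:
        cycles = snap.cycles;
        break;
    default:
        return; /* unknown counter selector */
    }
    /* wall time (snap.real) and cycles are then packed into the
       SMCCC return registers */
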
Gavin Shan
10ba2d17d2 KVM: arm64: Don't retrieve memory slot again in page fault handler
We needn't retrieve the memory slot again in user_mem_abort() because
the corresponding memory slot has been passed from the caller. This
would save some CPU cycles. For example, the time needed to write 1GB
of memory, backed by 2MB hugetlb pages and write-protected, drops by
6.8%, from 928ms to 864ms.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-4-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Gavin Shan
c728fd4ce7 KVM: arm64: Use find_vma_intersection()
find_vma_intersection() already exists to search for an intersecting
vma. Use the function where applicable to simplify the code.
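
For illustration, a hypothetical before/after (not a verbatim hunk):

  /* before */
  vma = find_vma(current->mm, hva);
  if (!vma || vma->vm_start >= reg_end)
          break;

  /* after */
  vma = find_vma_intersection(current->mm, hva, reg_end);
  if (!vma)
          break;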

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-3-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Gavin Shan
eab6214847 KVM: arm64: Hide kvm_mmu_wp_memory_region()
We needn't expose the function, as it has only been used by mmu.c
since it was introduced by commit c64735554c ("KVM: arm: Add initial
dirty page locking support").

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210316041126.81860-2-gshan@redhat.com
2021-04-07 14:33:22 +01:00
Suzuki K Poulose
a1319260bf arm64: KVM: Enable access to TRBE support for host
For an nVHE host, EL2 must allow the EL1&0 translation
regime for the TraceBuffer (MDCR_EL2.E2TB == 0b11). This must
be saved/restored over a trip to the guest. Also, before
entering the guest, we must flush any trace data if the
TRBE was enabled, and we must prohibit the generation
of trace while we are in EL1 by clearing TRFCR_EL1.

For VHE, EL2 must prevent EL1 access to the Trace
Buffer.

The MDCR_EL2 bit definitions for TRBE are available here :

  https://developer.arm.com/documentation/ddi0601/2020-12/AArch64-Registers/

Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405164307.1720226-8-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-06 16:05:28 -06:00
Suzuki K Poulose
d2602bb4f5 KVM: arm64: Move SPE availability check to VCPU load
At the moment, we check the availability of SPE on the given
CPU (i.e, SPE is implemented and is allowed at the host) during
every guest entry. This can be optimized a bit by moving the
check to vcpu_load time and recording the availability of the
feature on the current CPU via a new flag. This will also be useful
for adding the TRBE support.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Alexandru Elisei <Alexandru.Elisei@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405164307.1720226-7-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-06 16:05:20 -06:00
Suzuki K Poulose
cc427cbb15 KVM: arm64: Handle access to TRFCR_EL1
Rather than falling back to an "unhandled access", inject an explicit
"undefined access" for TRFCR_EL1 accesses from the guest.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405164307.1720226-6-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2021-04-06 16:05:12 -06:00
Eric Auger
28e9d4bce3 KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspace
Commit 23bde34771 ("KVM: arm64: vgic-v3: Drop the
reporting of GICR_TYPER.Last for userspace") temporarily fixed
a bug identified when attempting to access the GICR_TYPER
register before the redistributor region setting, but dropped
the support of the LAST bit.

Emulating the GICR_TYPER.Last bit still makes sense for
architecture compliance though. This patch restores its support
(if the redistributor region was set) while keeping the code safe.

We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which
computes whether a redistributor is the highest one of a series
of redistributor contributor pages.

With this new implementation we do not need to have a uaccess
read accessor anymore.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com
2021-04-06 14:51:38 +01:00
Eric Auger
e5a3563546 kvm: arm64: vgic-v3: Introduce vgic_v3_free_redist_region()
To improve readability, introduce a new vgic_v3_free_redist_region()
helper, and rename vgic_v3_insert_redist_region() into
vgic_v3_alloc_redist_region().

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-8-eric.auger@redhat.com
2021-04-06 14:51:38 +01:00
Eric Auger
da38530976 KVM: arm64: Simplify argument passing to vgic_uaccess_[read|write]
vgic_uaccess() takes a struct vgic_io_device argument, converts it
to a struct kvm_io_device and passes it to the read/write accessor
functions, which convert it back to a struct vgic_io_device.
Avoid the indirection by passing the struct vgic_io_device argument
directly to vgic_uaccess_{read,write}.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-7-eric.auger@redhat.com
2021-04-06 14:51:38 +01:00
Eric Auger
3a52116127 KVM: arm/arm64: vgic: Reset base address on kvm_vgic_dist_destroy()
On vgic_dist_destroy(), the base addresses are not reset. However,
for kvm selftest purposes, resetting them allows test execution to
continue even after a failure when running KVM_RUN. So let's reset
the base addresses.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-5-eric.auger@redhat.com
2021-04-06 14:51:38 +01:00
Eric Auger
8542a8f95a KVM: arm64: vgic-v3: Fix error handling in vgic_v3_set_redist_base()
vgic_v3_insert_redist_region() may succeed while
vgic_register_all_redist_iodevs() fails. For example, this happens
when adding a redistributor region that overlaps a dist region. The
failure is only detected in vgic_register_all_redist_iodevs(), when
vgic_v3_check_base() gets called in vgic_register_redist_iodev().

In such a case, remove the newly added redistributor region and free
it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-4-eric.auger@redhat.com
2021-04-06 14:51:37 +01:00
Eric Auger
53b16dd6ba KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read
The doc says:
"The characteristics of a specific redistributor region can
 be read by presetting the index field in the attr data.
 Only valid for KVM_DEV_TYPE_ARM_VGIC_V3"

Unfortunately the existing code fails to read the input attr data.

Fixes: 04c1109322 ("KVM: arm/arm64: Implement KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION")
Cc: stable@vger.kernel.org#v4.17+
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-3-eric.auger@redhat.com
2021-04-06 14:51:37 +01:00
Eric Auger
d9b201e99c KVM: arm64: vgic-v3: Fix some error codes when setting RDIST base
KVM_DEV_ARM_VGIC_GRP_ADDR group doc says we should return
-EEXIST in case the base address of the redist is already set.
We currently return -EINVAL.

However, we need to return -EINVAL when an attempt is made to set a
legacy REDIST address while REDIST_REGIONS were already set. This
case is discriminated by looking at the count field.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405163941.510258-2-eric.auger@redhat.com
2021-04-06 14:51:37 +01:00
Wang Wensheng
52b9e265d2 KVM: arm64: Fix error return code in init_hyp_mode()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: eeeee7193d ("KVM: arm64: Bootstrap PSCI SMC handler in nVHE EL2")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210406121759.5407-1-wangwensheng4@huawei.com
2021-04-06 14:20:23 +01:00
Andrew Scull
aec0fae62e KVM: arm64: Log source when panicking from nVHE hyp
To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.

If the panic was due to a BUG(), look up the metadata to log the file
and line, if available, otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.

__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.

The following is an example of the logs generated by a BUG in nVHE hyp.

[   46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[   46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[   46.755824] Kernel panic - not syncing: HYP panic:
[   46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[   46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[   46.755824] VCPU:0000d93a880d0000
[   46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[   46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[   46.758366] Call trace:
[   46.758601]  dump_backtrace+0x0/0x1b0
[   46.758856]  show_stack+0x18/0x70
[   46.759057]  dump_stack+0xd0/0x12c
[   46.759236]  panic+0x16c/0x334
[   46.759426]  arm64_kernel_unmapped_at_el0+0x0/0x30
[   46.759661]  kvm_arch_vcpu_ioctl_run+0x134/0x750
[   46.759936]  kvm_vcpu_ioctl+0x2f0/0x970
[   46.760156]  __arm64_sys_ioctl+0xa8/0xec
[   46.760379]  el0_svc_common.constprop.0+0x60/0x120
[   46.760627]  do_el0_svc+0x24/0x90
[   46.760766]  el0_svc+0x2c/0x54
[   46.760915]  el0_sync_handler+0x1a4/0x1b0
[   46.761146]  el0_sync+0x170/0x180
[   46.761889] SMP: stopping secondary CPUs
[   46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[   46.763142] PHYS_OFFSET: 0xffffa9f680000000
[   46.763359] CPU features: 0x00240022,61806008
[   46.763651] Memory Limit: none
[   46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[   46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[   46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[   46.813867] VCPU:0000d93a880d0000 ]---

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com
2021-04-01 09:54:37 +01:00
Andrew Scull
f79e616f27 KVM: arm64: Use BUG and BUG_ON in nVHE hyp
hyp_panic() reports the address of the panic by using ELR_EL2, but this
isn't a useful address when hyp_panic() is called directly. Replace such
direct calls with BUG() and BUG_ON() which use BRK to trigger an
exception that then goes to hyp_panic() with the correct address. Also
remove the hyp_panic() declaration from the header file to avoid
accidental misuse.

Signed-off-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-5-ascull@google.com
2021-04-01 09:54:37 +01:00
David Brazdil
77e06b3001 KVM: arm64: Support PREL/PLT relocs in EL2 code
The gen-hyprel tool parses object files of the EL2 portion of KVM
and generates runtime relocation data. While it only filters for
R_AARCH64_ABS64 relocations in the input object files, it has an
allow-list of relocation types that are used for relative
addressing. Other, unexpected, relocation types are rejected and
cause the build to fail.

This allow-list did not include the position-relative relocation
types R_AARCH64_PREL64/32/16 and the recently introduced _PLT32.
While these have not been seen emitted by toolchains in the wild,
add them to the allow-list for completeness.

Fixes: 8c49b5d43d ("KVM: arm64: Generate hyp relocation data")
Cc: <stable@vger.kernel.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210331133048.63311-1-dbrazdil@google.com
2021-03-31 14:59:19 +01:00
Will Deacon
923961a7ff KVM: arm64: Advertise KVM UID to guests via SMCCC
We can advertise ourselves to guests as KVM and provide a basic features
bitmap for discoverability of future hypervisor services.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201209060932.212364-3-jianyong.wu@arm.com
2021-03-31 09:16:56 +01:00
Xu Jia
b1306fef1f KVM: arm64: Make symbol '_kvm_host_prot_finalize' static
The sparse tool complains as follows:

arch/arm64/kvm/arm.c:1900:6: warning:
 symbol '_kvm_host_prot_finalize' was not declared. Should it be static?

This symbol is not used outside of arm.c, so this
commit marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Xu Jia <xujia39@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1617176179-31931-1-git-send-email-xujia39@huawei.com
2021-03-31 09:14:16 +01:00
Marc Zyngier
7c4199375a KVM: arm64: Drop the CPU_FTR_REG_HYP_COPY infrastructure
Now that the read_ctr macro has been specialised for nVHE,
the whole CPU_FTR_REG_HYP_COPY infrastructure looks completely
overengineered.

Simplify it by populating the two u64 quantities (MMFR0 and 1)
that the hypervisor needs.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-25 11:01:03 +00:00
Marc Zyngier
755db23420 KVM: arm64: Generate final CTR_EL0 value when running in Protected mode
In protected mode, late CPUs are not allowed to boot (enforced by
the PSCI relay). We can thus specialise the read_ctr macro to
always return a pre-computed, sanitised value. Special care is
taken to prevent the use of this custom version outside of
the protected mode.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-25 11:00:33 +00:00
Shenming Lu
8082d50f48 KVM: arm64: GICv4.1: Give a chance to save VLPI state
Before GICv4.1, we don't have direct access to the VLPI state, so
we simply let it fail early when encountering any VLPI while saving.

But now, on GICv4.1, we don't have to return -EACCES directly. Let's
change the hard-coded behaviour and give the VLPI state a chance to
be saved (while preserving the UAPI).

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-7-lushenming@huawei.com
2021-03-24 18:12:21 +00:00
Zenghui Yu
12df742921 KVM: arm64: GICv4.1: Restore VLPI pending state to physical side
When setting the forwarding path of a VLPI (switching to the HW mode),
we can also transfer the pending state from irq->pending_latch to the
VPT (especially during migration, where the pending states of VLPIs
are first restored into kvm's vgic). We currently send "INT+VSYNC" to
trigger a VLPI into the pending state.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-6-lushenming@huawei.com
2021-03-24 18:12:21 +00:00
Shenming Lu
f66b7b151e KVM: arm64: GICv4.1: Try to save VLPI state in save_pending_tables
After pausing all vCPUs and devices capable of interrupting, in order
to save the states of all interrupts, besides flushing the states in
kvm's vgic, we also try to flush the states of VLPIs in the virtual
pending tables into guest RAM. This requires GICv4.1, and the vPEs
must be safely unmapped first.

The saving of VSGIs needs the vPEs to be mapped, and might therefore
conflict with the saving of VLPIs. However, since we map the vPEs back
at the end of save_pending_tables and both savings require the
kvm->lock to be held (and thus only happen serially), this works fine.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-5-lushenming@huawei.com
2021-03-24 18:12:21 +00:00
Shenming Lu
80317fe4a6 KVM: arm64: GICv4.1: Add function to get VLPI state
With GICv4.1 and the vPE unmapped, which indicates the invalidation
of any VPT caches associated with the vPE, we can get the VLPI state
by peeking at the VPT. So we add a function for this.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210322060158.1584-4-lushenming@huawei.com
2021-03-24 18:12:20 +00:00
Suzuki K Poulose
a354a64d91 KVM: arm64: Disable guest access to trace filter controls
Disable guest access to the Trace Filter control registers.
We do not advertise the Trace filter feature to the guest
(ID_AA64DFR0_EL1: TRACE_FILT is cleared) already, but the guest
can still access the TRFCR_EL1 unless we trap it.

This will also make sure that the guest cannot fiddle with
the filtering controls set by an nVHE host.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210323120647.454211-3-suzuki.poulose@arm.com
2021-03-24 17:26:38 +00:00
Marc Zyngier
af22df997d KVM: arm64: Fix CPU interface MMIO compatibility detection
In order to detect whether a GICv3 CPU interface is MMIO capable,
we switch ICC_SRE_EL1.SRE to 0 and check whether it sticks.

However, this is only possible if *ALL* of the HCR_EL2 interrupt
overrides are set, and the CPU is perfectly allowed to ignore
the write to ICC_SRE_EL1 otherwise. This leads KVM to pretend
that a whole bunch of ARMv8.0 CPUs aren't MMIO-capable, and
breaks VMs that should work correctly otherwise.

Fix this by setting IMO/FMO/AMO before touching ICC_SRE_EL1,
and clearing them afterwards. This allows us to reliably detect
the CPU interface capabilities.

Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Fixes: 9739f6ef05 ("KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-24 17:26:38 +00:00
Quentin Perret
90134ac9ca KVM: arm64: Protect the .hyp sections from the host
When KVM runs in nVHE protected mode, use the host stage 2 to unmap the
hypervisor sections by marking them as owned by the hypervisor itself.
The long-term goal is to ensure the EL2 code can remain robust
regardless of the host's state, so this starts by making sure the host
cannot e.g. write to the .hyp sections directly.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-39-qperret@google.com
2021-03-19 12:02:19 +00:00
Quentin Perret
9589a38cdf KVM: arm64: Disable PMU support in protected mode
The host currently writes directly in EL2 per-CPU data sections from
the PMU code when running in nVHE. In preparation for unmapping the EL2
sections from the host stage 2, disable PMU support in protected mode as
we currently do not have a use-case for it.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-38-qperret@google.com
2021-03-19 12:02:19 +00:00
Quentin Perret
1025c8c0c6 KVM: arm64: Wrap the host with a stage 2
When KVM runs in protected nVHE mode, make use of a stage 2 page-table
to give the hypervisor some control over the host memory accesses. The
host stage 2 is created lazily using large block mappings if possible,
and will default to page mappings in the absence of a better solution.

From this point on, memory accesses from the host to protected memory
regions (e.g. not 'owned' by the host) are fatal and lead to hyp_panic().

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-36-qperret@google.com
2021-03-19 12:02:18 +00:00
Quentin Perret
def1aaf9e0 KVM: arm64: Provide sanitized mmfr* registers at EL2
We will need to read sanitized values of mmfr{0,1}_el1 at EL2 soon, so
add them to the list of copied variables.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-35-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
8942a237c7 KVM: arm64: Introduce KVM_PGTABLE_S2_IDMAP stage 2 flag
Introduce a new stage 2 configuration flag to specify that all mappings
in a given page-table will be identity-mapped, as will be the case for
the host. This allows sanity checks to be introduced in the map path
and helps avoid programming errors.
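
Together with KVM_PGTABLE_S2_NOFWB (introduced in the patch just below
in this log), the flags could end up looking like:

  enum kvm_pgtable_stage2_flags {
          KVM_PGTABLE_S2_NOFWB = BIT(0),  /* disable FWB even if supported */
          KVM_PGTABLE_S2_IDMAP = BIT(1),  /* all mappings are identity-mapped */
  };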

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-34-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
bc224df155 KVM: arm64: Introduce KVM_PGTABLE_S2_NOFWB stage 2 flag
In order to further configure stage 2 page-tables, pass flags to the
init function using a new enum.

The first of these flags allows FWB to be disabled even if the hardware
supports it, as we will need to do so for the host stage 2.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-33-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
2fcb3a5940 KVM: arm64: Add kvm_pgtable_stage2_find_range()
Since the host stage 2 will be identity mapped, and since it will own
most of memory, it would be preferable for performance to try and use
large block mappings whenever possible. To ease this, introduce a new
helper in the KVM page-table code which allows searching for large
ranges of available IPA space. This will be used in the host memory
abort path to greedily idmap large portions of the PA space.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-32-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
3fab82347f KVM: arm64: Refactor the *_map_set_prot_attr() helpers
In order to ease their re-use in other code paths, refactor the
*_map_set_prot_attr() helpers to not depend on a map_data struct.
No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-31-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
807923e04a KVM: arm64: Use page-table to track page ownership
As the host stage 2 will be identity mapped, all the .hyp memory regions
and/or memory pages donated to protected guests will have to be marked
invalid in the host stage 2 page-table. At the same time, the hypervisor
will need a way to track the ownership of each physical page to ensure
memory sharing or donation between entities (host, guests, hypervisor) is
legal.

In order to enable this tracking at EL2, let's use the host stage 2
page-table itself. The idea is to use the top bits of invalid mappings
to store the unique identifier of the page owner. The page-table owner
(the host) gets identifier 0 such that, at boot time, it owns the entire
IPA space as the pgd starts zeroed.

Provide kvm_pgtable_stage2_set_owner() which allows to modify the
ownership of pages in the host stage 2. It re-uses most of the map()
logic, but ends up creating invalid mappings instead. This impacts
how we do refcounting, as we now need to count invalid mappings when
they are used for ownership tracking.
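
A sketch of the idea (the mask and helper are illustrative; the actual
bit layout may differ):

  /* Owner ID lives in the top bits of an invalid (bit[0] == 0) PTE */
  #define KVM_INVALID_PTE_OWNER_MASK      GENMASK(63, 56)

  static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
  {
          return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
  }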

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-30-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
f60ca2f932 KVM: arm64: Always zero invalid PTEs
kvm_set_invalid_pte() currently only clears bit 0 from a PTE because
stage2_map_walk_table_post() needs to be able to follow the anchor. In
preparation for re-using bits [63:1] of invalid PTEs, make sure to zero
the PTE entirely, by ensuring the anchor's child is cached upfront.

Acked-by: Will Deacon <will@kernel.org>
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-29-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
a14307f531 KVM: arm64: Sort the hypervisor memblocks
We will soon need to check if a Physical Address belongs to a memblock
at EL2, so sort them to make sure this can be done efficiently.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-28-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
04e5de0309 KVM: arm64: Reserve memory for host stage 2
Extend the memory pool allocated for the hypervisor to include enough
pages to map all of memory at page granularity for the host stage 2.
While at it, also reserve some memory for device mappings.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-27-qperret@google.com
2021-03-19 12:01:22 +00:00
Quentin Perret
e37f37a0e7 KVM: arm64: Make memcache anonymous in pgtable allocator
The current stage2 page-table allocator uses a memcache to get
pre-allocated pages when it needs any. To allow re-using this code at
EL2 which uses a concept of memory pools, make the memcache argument of
kvm_pgtable_stage2_map() anonymous, and let the mm_ops zalloc_page()
callbacks use it the way they need to.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-26-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
159b859bee KVM: arm64: Refactor __populate_fault_info()
Refactor __populate_fault_info() to introduce __get_fault_info() which
will be used once the host is wrapped in a stage 2.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-25-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
bcb25a2b86 KVM: arm64: Refactor kvm_arm_setup_stage2()
In order to re-use some of the stage 2 setup code at EL2, factor parts
of kvm_arm_setup_stage2() out into separate functions.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-23-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
734864c177 KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
Move the registers relevant to host stage 2 enablement to
kvm_nvhe_init_params to prepare the ground for enabling it in later
patches.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-22-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
cfb1a98de7 KVM: arm64: Use kvm_arch in kvm_s2_mmu
In order to make use of the stage 2 pgtable code for the host stage 2,
change kvm_s2_mmu to use a kvm_arch pointer in lieu of the kvm pointer,
as the host will have the former but not the latter.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-21-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
834cd93deb KVM: arm64: Use kvm_arch for stage 2 pgtable
In order to make use of the stage 2 pgtable code for the host stage 2,
use struct kvm_arch in lieu of struct kvm as the host will have the
former but not the latter.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-20-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
bfa79a8054 KVM: arm64: Elevate hypervisor mappings creation at EL2
Previous commits have introduced infrastructure to enable the EL2 code
to manage its own stage 1 mappings. However, this was preliminary work,
and none of it is currently in use.

Put all of this together by elevating the mapping creation at EL2 when
memory protection is enabled. In this case, the host kernel running
at EL1 still creates _temporary_ EL2 mappings, only used while
initializing the hypervisor, but frees them right after.

As such, all calls to create_hyp_mappings() after kvm init has finished
turn into hypercalls, as the host now has no 'legal' way to modify the
hypervisor page tables directly.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-19-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
f320bc742b KVM: arm64: Prepare the creation of s1 mappings at EL2
When memory protection is enabled, the EL2 code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to bootstrap a memory management system at EL2.

This leads to the following boot flow in nVHE Protected mode:

 1. the host allocates memory for the hypervisor very early on, using
    the memblock API;

 2. the host creates a set of stage 1 page-table for EL2, installs the
    EL2 vectors, and issues the __pkvm_init hypercall;

 3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
    and stores it in the memory pool provided by the host;

 4. the hypervisor then extends its stage 1 mappings to include a
    vmemmap in the EL2 VA space, hence allowing to use the buddy
    allocator introduced in a previous patch;

 5. the hypervisor jumps back in the idmap page, switches from the
    host-provided page-table to the new one, and wraps up its
    initialization by enabling the new allocator, before returning to
    the host.

 6. the host can free the now unused page-table created for EL2, and
    will now need to issue hypercalls to make changes to the EL2 stage 1
    mappings instead of modifying them directly.

Note that for the sake of simplifying the review, this patch focuses on
the hypervisor side of things. In other words, this only implements the
new hypercalls, but does not make use of them from the host yet. The
host-side changes will follow in a subsequent patch.

Credits to Will for __pkvm_init_switch_pgd.

Acked-by: Will Deacon <will@kernel.org>
Co-authored-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-18-qperret@google.com
2021-03-19 12:01:21 +00:00
Quentin Perret
bc1d2892e9 KVM: arm64: Factor out vector address calculation
In order to re-map the guest vectors at EL2 when pKVM is enabled,
refactor __kvm_vector_slot2idx() and kvm_init_vector_slot() to move all
the address calculation logic into a static inline function.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-16-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
d460df1292 KVM: arm64: Provide __flush_dcache_area at EL2
We will need to do cache maintenance at EL2 soon, so compile a copy of
__flush_dcache_area at EL2, and provide a copy of arm64_ftr_reg_ctrel0
as it is needed by the read_ctr macro.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-15-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
7a440cc783 KVM: arm64: Enable access to sanitized CPU features at EL2
Introduce the infrastructure in KVM to copy CPU feature registers
into EL2-owned data structures, allowing sanitised values to be read
directly at EL2 in nVHE.

Given that only a subset of these features are being read by the
hypervisor, the ones that need to be copied are to be listed under
<asm/kvm_cpufeature.h> together with the name of the nVHE variable that
will hold the copy. This introduces only the infrastructure enabling
this copy. The first users will follow shortly.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-14-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
8e17c66249 KVM: arm64: Introduce a Hyp buddy page allocator
When memory protection is enabled, the hyp code will require a basic
form of memory management in order to allocate and free memory pages at
EL2. This is needed for various use-cases, including the creation of hyp
mappings or the allocation of stage 2 page tables.

To address these use-cases, introduce a simple memory allocator in the
hyp code. The allocator is designed as a conventional 'buddy allocator',
working at page granularity. It allows physically contiguous pages to
be allocated and freed from memory 'pools', with a guaranteed order
alignment in the PA space. Each page in a memory pool is associated
with a struct hyp_page which holds the page's metadata, including its
refcount, as well as its current order, hence mimicking the kernel's
buddy system in the GFP infrastructure. The hyp_page metadata are made
accessible through a hyp_vmemmap, following the concept of
SPARSE_VMEMMAP in the kernel.
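
The per-page metadata might look roughly like this (a sketch, not
necessarily the exact layout):

  struct hyp_page {
          unsigned int refcount;
          unsigned int order;     /* current buddy order of the page */
          struct hyp_pool *pool;  /* pool the page belongs to */
          struct list_head node;  /* attachment to the order's free-list */
  };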

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-13-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
40d9e41e52 KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
In order to use the kernel list library at EL2, introduce stubs for the
CONFIG_DEBUG_LIST out-of-line calls.
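
A minimal sketch of such stubs, assuming the list-debug hook
signatures of this era:

  bool __list_add_valid(struct list_head *new, struct list_head *prev,
                        struct list_head *next)
  {
          return true;
  }

  bool __list_del_entry_valid(struct list_head *entry)
  {
          return true;
  }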

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-12-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
e759604087 KVM: arm64: Introduce an early Hyp page allocator
With nVHE, the host currently creates all stage 1 hypervisor mappings at
EL1 during boot, installs them at EL2, and extends them as required
(e.g. when creating a new VM). But in a world where the host is no
longer trusted, it cannot have full control over the code mapped in the
hypervisor.

In preparation for enabling the hypervisor to create its own stage 1
mappings during boot, introduce an early page allocator, with minimal
functionality. This allocator is designed to be used only during early
bootstrap of the hyp code when memory protection is enabled, which will
then switch to using a full-fledged page allocator after init.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-11-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
380e18ade4 KVM: arm64: Introduce a BSS section for use at Hyp
Currently, the hyp code cannot make full use of a bss, as the kernel
section is mapped read-only.

While this mapping could simply be changed to read-write, it would
intermingle even more the hyp and kernel state than they currently are.
Instead, introduce a __hyp_bss section, that uses reserved pages, and
create the appropriate RW hyp mappings during KVM init.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-8-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
7aef0cbcdc KVM: arm64: Factor memory allocation out of pgtable.c
In preparation for enabling the creation of page-tables at EL2, factor
all memory allocation out of the page-table code, hence making it
re-usable with any compatible memory allocator.

No functional changes intended.
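
The resulting callback table looks roughly like this (the exact set of
hooks shown is an assumption):

  struct kvm_pgtable_mm_ops {
          void            *(*zalloc_page)(void *arg);
          void            *(*zalloc_pages_exact)(size_t size);
          void            (*free_pages_exact)(void *addr, size_t size);
          void            (*get_page)(void *addr);
          void            (*put_page)(void *addr);
          int             (*page_count)(void *addr);
          void            *(*phys_to_virt)(phys_addr_t phys);
          phys_addr_t     (*virt_to_phys)(void *addr);
  };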

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-7-qperret@google.com
2021-03-19 12:01:20 +00:00
Quentin Perret
cc706a6389 KVM: arm64: Avoid free_page() in page-table allocator
Currently, the KVM page-table allocator uses a mix of put_page() and
free_page() calls depending on the context even though page-allocation
is always achieved using variants of __get_free_page().

Make the code consistent by using put_page() throughout, and reduce the
memory management API surface used by the page-table code. This will
ease factoring out page-allocation from pgtable.c, which is a
pre-requisite to creating page-tables at EL2.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-6-qperret@google.com
2021-03-19 12:01:19 +00:00
Quentin Perret
9cc7758145 KVM: arm64: Initialize kvm_nvhe_init_params early
Move the initialization of kvm_nvhe_init_params into a dedicated function
that is run early, and only once during KVM init, rather than every time
the KVM vectors are set and reset.

This also opens the opportunity for the hypervisor to change the init
structs during boot, hence simplifying the replacement of the
host-provided page-table with the one the hypervisor will create
for itself.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-5-qperret@google.com
2021-03-19 12:01:19 +00:00
Will Deacon
67c2d32633 arm64: kvm: Add standalone ticket spinlock implementation for use at hyp
We will soon need to synchronise multiple CPUs in the hyp text at EL2.
The qspinlock-based locking used by the host is overkill for this purpose
and relies on the kernel's "percpu" implementation for the MCS nodes.

Implement a simple ticket locking scheme based heavily on the code removed
by commit c11090474d ("arm64: locking: Replace ticket lock implementation
with qspinlock").

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-4-qperret@google.com
2021-03-19 12:01:19 +00:00
Will Deacon
7b4a7b5e6f KVM: arm64: Link position-independent string routines into .hyp.text
Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
code and ensure that we always execute the '__pi_' entry point, on the
off chance that it changes in future.

[ qperret: Commit title nits and added linker script alias ]

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-3-qperret@google.com
2021-03-19 12:01:19 +00:00
Marc Zyngier
a1baa01f76 Linux 5.12-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmBOgu4eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGUd0H/3Ey8aWjVAig9Pe+
 VQVZKwG+LXWH6UmUx5qyaTxophhmGnWLvkigJMn63qIg4eQtfp2gNFHK+T4OJNIP
 ybnkjFZ337x4J9zD6m8mt4Wmelq9iW2wNOS+3YZAyYiGlXfMGM7SlYRCQRQznTED
 2O/JCMsOoP+Z8tr5ah/bzs0dANsXmTZ3QqRP2uzb6irKTgFR3/weOhj+Ht1oJ4Aq
 V+bgdcwhtk20hJhlvVeqws+o74LR789tTDCknlz/YNMv9e6VPfyIQ5vJAcFmZATE
 Ezj9yzkZ4IU+Ux6ikAyaFyBU8d1a4Wqye3eHCZBsEo6tcSAhbTZ90eoU86vh6ajS
 LZjwkNw=
 =6y1u
 -----END PGP SIGNATURE-----

Merge tag 'v5.12-rc3' into kvm-arm64/host-stage2

Linux 5.12-rc3

Signed-off-by: Marc Zyngier <maz@kernel.org>

# gpg: Signature made Sun 14 Mar 2021 21:41:02 GMT
# gpg:                using RSA key ABAF11C65A2970B130ABE3C479BE3E4300411886
# gpg:                issuer "torvalds@linux-foundation.org"
# gpg: Can't check signature: No public key
2021-03-19 12:01:04 +00:00
Marc Zyngier
5b08709313 KVM: arm64: Fix host's ZCR_EL2 restore on nVHE
We re-enter the EL1 host with CPTR_EL2.TZ set in order to
be able to lazily restore ZCR_EL2 when required.

However, the same CPTR_EL2 configuration also leads to trapping
when ZCR_EL2 is accessed from EL2. Duh!

Clear CPTR_EL2.TZ *before* writing to ZCR_EL2.

Fixes: beed09067b ("KVM: arm64: Trap host SVE accesses when the FPSIMD state is dirty")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 18:30:26 +00:00
Marc Zyngier
fe2c8d1918 KVM: arm64: Turn SCTLR_ELx_FLAGS into INIT_SCTLR_EL2_MMU_ON
Only the nVHE EL2 code is using this define, so let's make it
plain that it is EL2 only, and refactor it to contain all the
bits we need when configuring the EL2 MMU, and only those.

Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 15:52:02 +00:00
Marc Zyngier
bc6ddaa67a KVM: arm64: Use INIT_SCTLR_EL2_MMU_OFF to disable the MMU on KVM teardown
Instead of doing a RMW on SCTLR_EL2 to disable the MMU, use the
existing define that loads the right set of bits.

Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 15:51:56 +00:00
Daniel Kiss
6e94095c55 KVM: arm64: Enable SVE support for nVHE
Now that KVM is equipped to deal with SVE on nVHE, remove the code
preventing it from being used as well as the bits of documentation
that were mentioning the incompatibility.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 14:23:19 +00:00
Marc Zyngier
8c8010d69c KVM: arm64: Save/restore SVE state for nVHE
Implement the SVE save/restore for nVHE, following a similar
logic to that of the VHE implementation:

- the SVE state is switched on trap from EL1 to EL2

- no further changes to ZCR_EL2 occur as long as the guest isn't
  preempted and doesn't exit to userspace

- ZCR_EL2 is reset to its default value on the first SVE access from
  the host EL1, and ZCR_EL1 restored to the default guest value in
  vcpu_put()

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 14:23:12 +00:00
Marc Zyngier
beed09067b KVM: arm64: Trap host SVE accesses when the FPSIMD state is dirty
ZCR_EL2 controls the upper bound for ZCR_EL1, and is set to
a potentially lower limit when the guest uses SVE. In order
to restore the SVE state on the EL1 host, we must first
reset ZCR_EL2 to its original value.

To make it as lazy as possible on the EL1 host side, set
the SVE trapping in place when exiting from the guest.
On the first EL1 access to SVE, ZCR_EL2 will be restored
to its full glory.
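
Conceptually (an illustrative sketch, not the actual hyp code):

  /* On return to the EL1 host: leave ZCR_EL2 alone, arm the trap */
  sysreg_clear_set(cptr_el2, 0, CPTR_EL2_TZ);

  /*
   * In the EL2 handler for the host's first SVE access: disable the
   * trap *before* touching ZCR_EL2 (see the ZCR_EL2 restore fix
   * earlier in this log), then restore the maximum vector length.
   */
  sysreg_clear_set(cptr_el2, CPTR_EL2_TZ, 0);
  write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2);
  isb();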

Suggested-by: Andrew Scull <ascull@google.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 14:22:31 +00:00
Marc Zyngier
b145a8437a KVM: arm64: Save guest's ZCR_EL1 before saving the FPSIMD state
Make sure the guest's ZCR_EL1 is saved before we save/flush the
state. This will be useful in later patches.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 13:57:54 +00:00
Marc Zyngier
0a9a98fda3 KVM: arm64: Map SVE context at EL2 when available
When running on nVHE, and when the vcpu supports SVE, map the
SVE state at EL2 so that KVM can access it.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 13:57:49 +00:00
Marc Zyngier
52029198c1 KVM: arm64: Rework SVE host-save/guest-restore
In order to keep the code readable, move the host-save/guest-restore
sequences in their own functions, with the following changes:
- the hypervisor ZCR is now set from C code
- ZCR_EL2 is always used as the EL2 accessor

This results in some minor assembler macro rework.
No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 13:57:37 +00:00
Marc Zyngier
468f3477ef KVM: arm64: Introduce vcpu_sve_vq() helper
The KVM code contains a number of "sve_vq_from_vl(vcpu->arch.sve_max_vl)"
instances, and we are about to add more.

Introduce vcpu_sve_vq() as a shorthand for this expression.
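
That is, roughly:

  #define vcpu_sve_vq(vcpu) sve_vq_from_vl((vcpu)->arch.sve_max_vl)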

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 11:24:10 +00:00
Marc Zyngier
83857371d4 KVM: arm64: Use {read,write}_sysreg_el1 to access ZCR_EL1
Switch to the unified EL1 accessors for ZCR_EL1, which will make
things easier for nVHE support.
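
For example (illustrative):

  /* Resolves to ZCR_EL1 on nVHE, and to ZCR_EL12 when run at VHE EL2 */
  u64 zcr = read_sysreg_el1(SYS_ZCR);

  write_sysreg_el1(zcr, SYS_ZCR);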

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 11:23:48 +00:00
Marc Zyngier
297b8603e3 KVM: arm64: Provide KVM's own save/restore SVE primitives
As we are about to change the way KVM deals with SVE, provide
KVM with its own save/restore SVE primitives.

No functional change intended.

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-03-18 11:23:14 +00:00
Linus Torvalds
9d0c8e793f More fixes for ARM and x86.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmBLsyoUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMpYgf/Zu1Byif+XZVdwm52wJN38ppUUVmn
 4u8HvQ8Ht+P0cGg1IaNx9D5QXGRgdn72qEpWUF5aH03ahTANAuf6zXw+evKmiub/
 RtJfxZWEcWeLdugLVHUSrR4MOox7uvFmCdcdht4sEPdjFdH/9JeceC3A1pZ/DYTR
 +eS+E3dMWQjXnd2Omo/5f5H1LTZjNLEditnkcHT5unwKKukc008V/avgs8xOAKJB
 xf3oqJF960IO+NYf8rRQb8WtyGeo0grrWjgeqvZ37gwGUaFB9ldVxchsVLsL66OR
 bJRIoSiTgL+TUYSMQ5mKG4tmmBnPHUHfgfNoOXlWMoJHIjFeQ9oM6eTHhA==
 =QTFW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More fixes for ARM and x86"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: LAPIC: Advancing the timer expiration on guest initiated write
  KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
  KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
  kvm: x86: annotate RCU pointers
  KVM: arm64: Fix exclusive limit for IPA size
  KVM: arm64: Reject VM creation when the default IPA size is unsupported
  KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
  KVM: arm64: Don't use cbz/adr with external symbols
  KVM: arm64: Fix range alignment when walking page tables
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Fix nVHE hyp panic host context restore
  KVM: arm64: Avoid corrupting vCPU context register in guest exit
  KVM: arm64: nvhe: Save the SPE context early
  kvm: x86: use NULL instead of using plain integer as pointer
  KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
  KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
2021-03-14 12:35:02 -07:00
Marc Zyngier
262b003d05 KVM: arm64: Fix exclusive limit for IPA size
When registering a memslot, we check the size and location of that
memslot against the IPA size to ensure that we can provide guest
access to the whole of the memory.

Unfortunately, this check rejects memslots that end up at the exact
limit of the addressing capability for a given IPA size. For example,
it refuses the creation of a 2GB memslot at 0x80000000 with a 32bit
IPA space.

Fix it by relaxing the check to accept a memslot reaching the
limit of the IPA space.

Fixes: c3058d5da2 ("arm/arm64: KVM: Ensure memslots are within KVM_PHYS_SIZE")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-3-maz@kernel.org
2021-03-12 15:43:22 +00:00
Marc Zyngier
7d717558dd KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM/arm64 has forever used a 40bit default IPA space, partially
due to its 32bit heritage (where the only choice is 40bit).

However, there are implementations in the wild that have a *cough*
much smaller *cough* IPA space, which leads to a misprogramming of
VTCR_EL2, and a guest that is stuck on its first memory access
if userspace dares to ask for the default IPA setting (which most
VMMs do).

Instead, bluntly reject the creation of such a VM, as we can't
satisfy the requirements from userspace (with a one-off warning).
Also clarify the boot warning, and document that the VM creation
will fail when an unsupported IPA size is provided.

Although this is an ABI change, it doesn't really change much
for userspace:

- the guest couldn't run before this change, but no error was
  returned. At least userspace knows what is happening.

- a memory slot that was accepted because it did fit the default
  IPA space now doesn't even get a chance to be registered.

The one thing left to do is to convince userspace to
actually use the IPA space setting instead of relying on the
antiquated default.

Fixes: 233a7cb235 ("kvm: arm64: Allow tuning the physical address size for VM")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210311100016.3830038-2-maz@kernel.org
2021-03-12 15:42:57 +00:00
James Morse
26f55386f9 arm64/mm: Fix __enable_mmu() for new TGRAN range values
As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K page sizes in particular) instead of just enabled or disabled
values. This changes the __enable_mmu() function to handle the complete
acceptable range of values (depending on whether the field is signed or
unsigned), now represented with the ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX]
pair. While here, also fix similar situations in the EFI stub and KVM.
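
The shape of the new check, for an unsigned field (a sketch; the
signed case compares sign-extended values instead):

  u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
  u32 tgran = cpuid_feature_extract_unsigned_field(mmfr0,
                                                   ID_AA64MMFR0_TGRAN_SHIFT);

  if (tgran < ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ||
      tgran > ID_AA64MMFR0_TGRAN_SUPPORTED_MAX)
          return false;   /* the kernel's page size is not supported */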

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-efi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1615355590-21102-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-03-10 11:01:57 +00:00
Marc Zyngier
01dc9262ff KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
It recently became apparent that the ARMv8 architecture has interesting
rules regarding attributes being used when fetching instructions
if the MMU is off at Stage-1.

In this situation, the CPU is allowed to fetch from the PoC and
allocate into the I-cache (unless the memory is mapped with
the XN attribute at Stage-2).

If we transpose this to vcpus sharing a single physical CPU,
it is possible for a vcpu running with its MMU off to influence
another vcpu running with its MMU on, as the latter is expected to
fetch from the PoU (and self-patching code doesn't flush below that
level).

In order to solve this, reuse the vcpu-private TLB invalidation
code to apply the same policy to the I-cache, nuking it every time
the vcpu runs on a physical CPU that ran another vcpu of the same
VM in the past.

This involves renaming __kvm_tlb_flush_local_vmid() to
__kvm_flush_cpu_context(), and inserting a local i-cache invalidation
there.

Cc: stable@vger.kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210303164505.68492-1-maz@kernel.org
2021-03-09 17:58:56 +00:00
Sami Tolvanen
dbaee836d6 KVM: arm64: Don't use cbz/adr with external symbols
allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following
linker errors:

  ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC):
  relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in
  [-1048576, 1048575]; references hyp_panic
  >>> defined in vmlinux.o

  ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0):
  relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in
  [-1048576, 1048575]; references hyp_panic
  >>> defined in vmlinux.o

This is because with LTO, the compiler ends up placing hyp_panic()
more than 1MB away from __guest_enter(). Use an unconditional branch
and adr_l instead to fix the issue.

Link: https://github.com/ClangBuiltLinux/linux/issues/1317
Reported-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210305202124.3768527-1-samitolvanen@google.com
2021-03-09 08:48:24 +00:00
Jia He
357ad203d4 KVM: arm64: Fix range alignment when walking page tables
When walking the page tables at a given level, and if the start
address for the range isn't aligned for that level, we propagate
the misalignment on each iteration at that level.

This results in the walker ignoring a number of entries (depending
on the original misalignment) on each subsequent iteration.

Properly aligning the address before the next iteration addresses
this issue.
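
Conceptually, the per-level stride changes from a raw increment to a
granule-aligned step (illustrative, not the verbatim hunk):

  /* before: a misaligned start propagates to every iteration */
  addr += kvm_granule_size(level);

  /* after: snap to the granule boundary before moving on */
  addr = ALIGN_DOWN(addr, kvm_granule_size(level)) + kvm_granule_size(level);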

Cc: stable@vger.kernel.org
Reported-by: Howard Zhang <Howard.Zhang@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jia He <justin.he@arm.com>
Fixes: b1e57de62c ("KVM: arm64: Add stand-alone page-table walker infrastructure")
[maz: rewrite commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210303024225.2591-1-justin.he@arm.com
Message-Id: <20210305185254.3730990-9-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:41 -05:00
Marc Zyngier
9739f6ef05 KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
It looks like we have broken firmware out there that wrongly advertises
a GICv2 compatibility interface, despite the CPUs not being able to deal
with it.

To work around this, check that the CPU initialising KVM is actually able
to switch to MMIO instead of system registers, and use that as a
precondition to enable GICv2 compatibility in KVM.

Note that the detection happens on a single CPU. If the firmware is
lying *and* the CPUs are asymmetric, all hope is lost anyway.

Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210305185254.3730990-8-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:41 -05:00
Marc Zyngier
b9d699e269 KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
As we are about to report a bit more information to the rest of
the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more
explicit __vgic_v3_get_gic_config().

No functional change.

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210305185254.3730990-7-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-03-06 04:18:41 -05:00