So far, we're using a complicated sequence of alternatives to
patch the kernel/hyp VA mask on non-VHE, and NOP out the
masking altogether when on VHE.
The newly introduced dynamic patching gives us the opportunity
to simplify that code by patching a single instruction with
the correct mask (instead of the mind bending cumulative masking
we have at the moment) or even a single NOP on VHE. This also
adds some initial code that will allow the patching callback
to switch to a more complex patching.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently we access the system registers array via the vcpu_sys_reg()
macro. However, we are about to change the behavior to some times
modify the register file directly, so let's change this to two
primitives:
* Accessor macros vcpu_write_sys_reg() and vcpu_read_sys_reg()
* Direct array access macro __vcpu_sys_reg()
The accessor macros should be used in places where the code needs to
access the currently loaded VCPU's state as observed by the guest. For
example, when trapping on cache related registers, a write to a system
register should go directly to the VCPU version of the register.
The direct array access macro can be used in places where the VCPU is
known to never be running (for example userspace access) or for
registers which are never context switched (for example all the PMU
system registers).
This rewrites all users of vcpu_sys_regs to one of the macros described
above.
No functional change.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.
Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ARM:
- Include icache invalidation optimizations, improving VM startup time
- Support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- A small fix for power-management notifiers, and some cosmetic changes
PPC:
- Add MMIO emulation for vector loads and stores
- Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- Improve the handling of escalation interrupts with the XIVE interrupt
controller
- Support decrement register migration
- Various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- Exitless interrupts for emulated devices
- Cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- Hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
features
- Show vcpu id in its anonymous inode name
- Many fixes and cleanups
- Per-VCPU MSR bitmaps (already merged through x86/pti branch)
- Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
/9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
=C/uD
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"ARM:
- icache invalidation optimizations, improving VM startup time
- support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- a small fix for power-management notifiers, and some cosmetic
changes
PPC:
- add MMIO emulation for vector loads and stores
- allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- improve the handling of escalation interrupts with the XIVE
interrupt controller
- support decrement register migration
- various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- exitless interrupts for emulated devices
- cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
AVX512 features
- show vcpu id in its anonymous inode name
- many fixes and cleanups
- per-VCPU MSR bitmaps (already merged through x86/pti branch)
- stable KVM clock when nesting on Hyper-V (merged through
x86/hyperv)"
* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
KVM: PPC: Book3S HV: Branch inside feature section
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
KVM: PPC: Book3S PR: Fix broken select due to misspelling
KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
KVM: PPC: Book3S HV: Drop locks before reading guest memory
kvm: x86: remove efer_reload entry in kvm_vcpu_stat
KVM: x86: AMD Processor Topology Information
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
kvm: embed vcpu id to dentry of vcpu anon inode
kvm: Map PFN-type memory regions as writable (if possible)
x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
KVM: arm/arm64: Fixup userspace irqchip static key optimization
KVM: arm/arm64: Fix userspace_irqchip_in_use counting
KVM: arm/arm64: Fix incorrect timer_is_pending logic
MAINTAINERS: update KVM/s390 maintainers
MAINTAINERS: add Halil as additional vfio-ccw maintainer
MAINTAINERS: add David as a reviewer for KVM/s390
...
Now that we have per-CPU vectors, let's plug then in the KVM/arm64 code.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The vcpu parameter isn't used for anything, and gets in the way of
further cleanups. Let's get rid of it.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
So far, we loose the Exec property whenever we take permission
faults, as we always reconstruct the PTE/PMD from scratch. This
can be counter productive as we can end-up with the following
fault sequence:
X -> RO -> ROX -> RW -> RWX
Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
new entry if it was already cleared in the old one, leadig to a much
nicer fault sequence:
X -> ROX -> RWX
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We've so far eagerly invalidated the icache, no matter how
the page was faulted in (data or prefetch abort).
But we can easily track execution by setting the XN bits
in the S2 page tables, get the prefetch abort at HYP and
perform the icache invalidation at that time only.
As for most VMs, the instruction working set is pretty
small compared to the data set, this is likely to save
some traffic (specially as the invalidation is broadcast).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We currently tightly couple dcache clean with icache invalidation,
but KVM could do without the initial flush to PoU, as we've
already flushed things to PoC.
Let's introduce invalidate_icache_range which is limited to
invalidating the icache from the linear mapping (and thus
has none of the userspace fault handling complexity), and
wire it in KVM instead of flush_icache_range.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
As we're about to introduce opportunistic invalidation of the icache,
let's split dcache and icache flushing.
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently, when using VA_BITS < 48, if the ID map text happens to be
placed in physical memory above VA_BITS, we increase the VA size (up to
48) and create a new table level, in order to map in the ID map text.
This is okay because the system always supports 48 bits of VA.
This patch extends the code such that if the system supports 52 bits of
VA, and the ID map text is placed that high up, then we increase the VA
size accordingly, up to 52.
One difference from the current implementation is that so far the
condition of VA_BITS < 48 has meant that the top level table is always
"full", with the maximum number of entries, and an extra table level is
always needed. Now, when VA_BITS = 48 (and using 64k pages), the top
level table is not full, and we simply need to increase the number of
entries in it, instead of creating a new table level.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: reduce arguments to __create_hyp_mappings()]
[catalin.marinas@arm.com: reworked/renamed __cpu_uses_extended_idmap_level()]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits
12..15 of a page table entry. Introduce macros to convert between a
physical address and its placement in a table entry, and change all
macros/functions that access PTEs to use them.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: some long lines wrapped]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Instead of open coding the generation of page table entries, use the
macros/functions that exist for this - pfn_p*d and p*d_populate. Most
code in the kernel already uses these macros, this patch tries to fix
up the few places that don't. This is useful for the next patch in this
series, which needs to change the page table entry logic, and it's
better to have that logic in one place.
The KVM extended ID map is special, since we're creating a level above
CONFIG_PGTABLE_LEVELS and the required function isn't available. Leave
it as is and add a comment to explain it. (The normal kernel ID map code
doesn't need this change because its page tables are created in assembly
(__create_page_tables)).
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The top 4 bits of a 52-bit physical address are positioned at bits 2..5
in the TTBR registers. Introduce a couple of macros to move the bits
there, and change all TTBR writers to use them.
Leave TTBR0 PAN code unchanged, to avoid complicating it. A system with
52-bit PA will have PAN anyway (because it's ARMv8.1 or later), and a
system without 52-bit PA can only use up to 48-bit PAs. A later patch in
this series will add a kconfig dependency to ensure PAN is configured.
In addition, when using 52-bit PA there is a special alignment
requirement on the top-level table. We don't currently have any VA_BITS
configuration that would violate the requirement, but one could be added
in the future, so add a compile-time BUG_ON to check for it.
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[catalin.marinas@arm.com: added TTBR_BADD_MASK_52 comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To take advantage of the LSE atomic instructions and also make the code
cleaner, convert the kvm_set_s2pte_readonly() function to use the more
generic cmpxchg().
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
support; virtual interrupt controller performance improvements; support
for userspace virtual interrupt controller (slower, but necessary for
KVM on the weird Broadcom SoCs used by the Raspberry Pi 3)
* MIPS: basic support for hardware virtualization (ImgTec
P5600/P6600/I6400 and Cavium Octeon III)
* PPC: in-kernel acceleration for VFIO
* s390: support for guests without storage keys; adapter interruption
suppression
* x86: usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits; emulation of CPL3 CPUID faulting
* generic: first part of VCPU thread request API; kvm_stat improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZEHUkAAoJEL/70l94x66DBeYH/09wrpJ2FjU4Rqv7FxmqgWfH
9WGi4wvn/Z+XzQSyfMJiu2SfZVzU69/Y67OMHudy7vBT6knB+ziM7Ntoiu/hUfbG
0g5KsDX79FW15HuvuuGh9kSjUsj7qsQdyPZwP4FW/6ZoDArV9mibSvdjSmiUSMV/
2wxaoLzjoShdOuCe9EABaPhKK0XCrOYkygT6Paz1pItDxaSn8iW3ulaCuWMprUfG
Niq+dFemK464E4yn6HVD88xg5j2eUM6bfuXB3qR3eTR76mHLgtwejBzZdDjLG9fk
32PNYKhJNomBxHVqtksJ9/7cSR6iNPs7neQ1XHemKWTuYqwYQMlPj1NDy0aslQU=
=IsiZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- HYP mode stub supports kexec/kdump on 32-bit
- improved PMU support
- virtual interrupt controller performance improvements
- support for userspace virtual interrupt controller (slower, but
necessary for KVM on the weird Broadcom SoCs used by the Raspberry
Pi 3)
MIPS:
- basic support for hardware virtualization (ImgTec P5600/P6600/I6400
and Cavium Octeon III)
PPC:
- in-kernel acceleration for VFIO
s390:
- support for guests without storage keys
- adapter interruption suppression
x86:
- usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits
- emulation of CPL3 CPUID faulting
generic:
- first part of VCPU thread request API
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
kvm: nVMX: Don't validate disabled secondary controls
KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
Revert "KVM: Support vCPU-based gfn->hva cache"
tools/kvm: fix top level makefile
KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
KVM: Documentation: remove VM mmap documentation
kvm: nVMX: Remove superfluous VMX instruction fault checks
KVM: x86: fix emulation of RSM and IRET instructions
KVM: mark requests that need synchronization
KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
KVM: add explicit barrier to kvm_vcpu_kick
KVM: perform a wake_up in kvm_make_all_cpus_request
KVM: mark requests that do not need a wakeup
KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
KVM: x86: always use kvm_make_request instead of set_bit
KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
s390: kvm: Cpu model support for msa6, msa7 and msa8
KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
kvm: better MWAIT emulation for guests
KVM: x86: virtualize cpuid faulting
...
With __cpu_reset_hyp_mode having become fairly dumb, there is no
need for kvm_get_idmap_start anymore.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
read_system_reg() can readily be confused with read_sysreg(),
whereas these are really quite different in their meaning.
This patches attempts to reduce the ambiguity be reserving "sysreg"
for the actual system register accessors.
read_system_reg() is instead renamed to read_sanitised_ftr_reg(),
to make it more obvious that the Linux-defined sanitised feature
register cache is being accessed here, not the underlying
architectural system registers.
cpufeature.c's internal __raw_read_system_reg() function is renamed
in line with its actual purpose: a form of read_sysreg() that
indexes on (non-compiletime-constant) encoding rather than symbolic
register name.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A VPIPT I-cache has two main properties:
1. Lines allocated into the cache are tagged by VMID and a lookup can
only hit lines that were allocated with the current VMID.
2. I-cache invalidation from EL1/0 only invalidates lines that match the
current VMID of the CPU doing the invalidation.
This can cause issues with non-VHE configurations, where the host runs
at EL1 and wants to invalidate I-cache entries for a guest running with
a different VMID. VHE is not affected, because the host runs at EL2 and
I-cache invalidation applies as expected.
This patch solves the problem by invalidating the I-cache when unmapping
a page at stage 2 on a system with a VPIPT I-cache but not running with
VHE enabled. Hopefully this is an obscure enough configuration that the
overhead isn't anything to worry about, although it does mean that the
by-range I-cache invalidation currently performed when mapping at stage
2 can be elided on such systems, because the I-cache will be clean for
the guest VMID following a rollover event.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
cachetype.h and cache.h are small and both obviously related to caches.
Merge them together to reduce clutter.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a recent change to ARMv8, ASID-tagged VIVT I-caches are removed
retrospectively from the architecture. Consequently, we don't need to
support them in Linux either.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
200 commits and noteworthy changes for most architectures.
* ARM:
- GICv3 save/restore
- cache flushing fixes
- working MSI injection for GICv3 ITS
- physical timer emulation
* MIPS:
- various improvements under the hood
- support for SMP guests
- a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers
to support copy-on-write, KSM, idle page tracking, swapping, ballooning
and everything else. KVM_CAP_READONLY_MEM is also supported, so that
writes to some memory regions can be treated as MMIO. The new MMU also
paves the way for hardware virtualization support.
* PPC:
- support for POWER9 using the radix-tree MMU for host and guest
- resizable hashed page table
- bugfixes.
* s390: expose more features to the guest
- more SIMD extensions
- instruction execution protection
- ESOP2
* x86:
- improved hashing in the MMU
- faster PageLRU tracking for Intel CPUs without EPT A/D bits
- some refactoring of nested VMX entry/exit code, preparing for live
migration support of nested hypervisors
- expose yet another AVX512 CPUID bit
- host-to-guest PTP support
- refactoring of interrupt injection, with some optimizations thrown in
and some duct tape removed.
- remove lazy FPU handling
- optimizations of user-mode exits
- optimizations of vcpu_is_preempted() for KVM guests
* generic:
- alternative signaling mechanism that doesn't pound on tsk->sighand->siglock
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJYral1AAoJEL/70l94x66DbNgH/Rx8YXuidFq2fe3RWOvld3RK
85OM/D5g38cTLpBE0/sJpcvX34iYN8U/l5foCZwpxB+83GHEk2Cr57JyfTogdaAJ
x8dBhHKQCA/HxSQUQLN6nFqRV+yT8WUR92Fhqx82+80BSen5Yzcfee/TDoW6T1IW
g8CYgX9FrRaGOX066ImAuUfdAdUVjyssfs9VttDTX+HiusPeuBPx/wsRe1ZEEPlH
vnltIJQb1ETV2GOZLUojKjzH6aZkjIl29XxjkYii9JTUornClG0DfW+5QT3uLrB5
gJ+G+Zmpsq8ZBx9jNDtAi7sFsoPY1Mzf+JPNCGXBra2sP2GrBAuXcxmgznRYltQ=
=8IIp
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"4.11 is going to be a relatively large release for KVM, with a little
over 200 commits and noteworthy changes for most architectures.
ARM:
- GICv3 save/restore
- cache flushing fixes
- working MSI injection for GICv3 ITS
- physical timer emulation
MIPS:
- various improvements under the hood
- support for SMP guests
- a large rewrite of MMU emulation. KVM MIPS can now use MMU
notifiers to support copy-on-write, KSM, idle page tracking,
swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
also supported, so that writes to some memory regions can be
treated as MMIO. The new MMU also paves the way for hardware
virtualization support.
PPC:
- support for POWER9 using the radix-tree MMU for host and guest
- resizable hashed page table
- bugfixes.
s390:
- expose more features to the guest
- more SIMD extensions
- instruction execution protection
- ESOP2
x86:
- improved hashing in the MMU
- faster PageLRU tracking for Intel CPUs without EPT A/D bits
- some refactoring of nested VMX entry/exit code, preparing for live
migration support of nested hypervisors
- expose yet another AVX512 CPUID bit
- host-to-guest PTP support
- refactoring of interrupt injection, with some optimizations thrown
in and some duct tape removed.
- remove lazy FPU handling
- optimizations of user-mode exits
- optimizations of vcpu_is_preempted() for KVM guests
generic:
- alternative signaling mechanism that doesn't pound on
tsk->sighand->siglock"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
x86/paravirt: Change vcp_is_preempted() arg type to long
KVM: VMX: use correct vmcs_read/write for guest segment selector/base
x86/kvm/vmx: Defer TR reload after VM exit
x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
x86/kvm/vmx: Simplify segment_base()
x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
x86/kvm/vmx: Don't fetch the TSS base from the GDT
x86/asm: Define the kernel TSS limit in a macro
kvm: fix page struct leak in handle_vmon
KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
KVM: Return an error code only as a constant in kvm_get_dirty_log()
KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
KVM: x86: remove code for lazy FPU handling
KVM: race-free exit from KVM_RUN without POSIX signals
KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
KVM: Support vCPU-based gfn->hva cache
KVM: use separate generations for each address space
...
Now that we unconditionally flush newly mapped pages to the PoC,
there is no need to care about the "uncached" status of individual
pages - they must all be visible all the way down.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When we fault in a page, we flush it to the PoC (Point of Coherency)
if the faulting vcpu has its own caches off, so that it can observe
the page we just brought it.
But if the vcpu has its caches on, we skip that step. Bad things
happen when *another* vcpu tries to access that page with its own
caches disabled. At that point, there is no garantee that the
data has made it to the PoC, and we access stale data.
The obvious fix is to always flush to PoC when a page is faulted
in, no matter what the state of the vcpu is.
Cc: stable@vger.kernel.org
Fixes: 2d58b733c8 ("arm64: KVM: force cache clean on page fault when caches are off")
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
__pa_symbol is technically the marcro that should be used for kernel
symbols. Switch to this as a pre-requisite for DEBUG_VIRTUAL which
will do bounds checking.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Architecturally, TLBs are private to the (physical) CPU they're
associated with. But when multiple vcpus from the same VM are
being multiplexed on the same CPU, the TLBs are not private
to the vcpus (and are actually shared across the VMID).
Let's consider the following scenario:
- vcpu-0 maps PA to VA
- vcpu-1 maps PA' to VA
If run on the same physical CPU, vcpu-1 can hit TLB entries generated
by vcpu-0 accesses, and access the wrong physical page.
The solution to this is to keep a per-VM map of which vcpu ran last
on each given physical CPU, and invalidate local TLBs when switching
to a different vcpu from the same VM.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
All architectures:
Move `make kvmconfig` stubs from x86; use 64 bits for debugfs stats.
ARM:
Important fixes for not using an in-kernel irqchip; handle SError
exceptions and present them to guests if appropriate; proxying of GICV
access at EL2 if guest mappings are unsafe; GICv3 on AArch32 on ARMv8;
preparations for GICv3 save/restore, including ABI docs; cleanups and
a bit of optimizations.
MIPS:
A couple of fixes in preparation for supporting MIPS EVA host kernels;
MIPS SMP host & TLB invalidation fixes.
PPC:
Fix the bug which caused guests to falsely report lockups; other minor
fixes; a small optimization.
s390:
Lazy enablement of runtime instrumentation; up to 255 CPUs for nested
guests; rework of machine check deliver; cleanups and fixes.
x86:
IOMMU part of AMD's AVIC for vmexit-less interrupt delivery; Hyper-V
TSC page; per-vcpu tsc_offset in debugfs; accelerated INS/OUTS in
nVMX; cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJX9iDrAAoJEED/6hsPKofoOPoIAIUlgojkb9l2l1XVDgsXdgQL
sRVhYSVv7/c8sk9vFImrD5ElOPZd+CEAIqFOu45+NM3cNi7gxip9yftUVs7wI5aC
eDZRWm1E4trDZLe54ZM9ThcqZzZZiELVGMfR1+ZndUycybwyWzafpXYsYyaXp3BW
hyHM3qVkoWO3dxBWFwHIoO/AUJrWYkRHEByKyvlC6KPxSdBPSa5c1AQwMCoE0Mo4
K/xUj4gBn9eMelNhg4Oqu/uh49/q+dtdoP2C+sVM8bSdquD+PmIeOhPFIcuGbGFI
B+oRpUhIuntN39gz8wInJ4/GRSeTuR2faNPxMn4E1i1u4LiuJvipcsOjPfe0a18=
=fZRB
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"All architectures:
- move `make kvmconfig` stubs from x86
- use 64 bits for debugfs stats
ARM:
- Important fixes for not using an in-kernel irqchip
- handle SError exceptions and present them to guests if appropriate
- proxying of GICV access at EL2 if guest mappings are unsafe
- GICv3 on AArch32 on ARMv8
- preparations for GICv3 save/restore, including ABI docs
- cleanups and a bit of optimizations
MIPS:
- A couple of fixes in preparation for supporting MIPS EVA host
kernels
- MIPS SMP host & TLB invalidation fixes
PPC:
- Fix the bug which caused guests to falsely report lockups
- other minor fixes
- a small optimization
s390:
- Lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check deliver
- cleanups and fixes
x86:
- IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
- Hyper-V TSC page
- per-vcpu tsc_offset in debugfs
- accelerated INS/OUTS in nVMX
- cleanups and fixes"
* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
KVM: MIPS: Drop dubious EntryHi optimisation
KVM: MIPS: Invalidate TLB by regenerating ASIDs
KVM: MIPS: Split kernel/user ASID regeneration
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
KVM: arm64: Require in-kernel irqchip for PMU support
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
KVM: PPC: BookE: Fix a sanity check
KVM: PPC: Book3S HV: Take out virtual core piggybacking code
KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
ARM: gic-v3: Work around definition of gic_write_bpr1
KVM: nVMX: Fix the NMI IDT-vectoring handling
KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
KVM: nVMX: Fix reload apic access page warning
kvmconfig: add virtio-gpu to config fragment
config: move x86 kvm_guest.config to a common location
arm64: KVM: Remove duplicating init code for setting VMID
ARM: KVM: Support vgic-v3
...
Make use of the new alternative_if and alternative_else_nop_endif and
get rid of our open-coded NOP sleds, making the code simpler to read.
Note that for __kvm_call_hyp the branch to __vhe_hyp_call has been moved
out of the alternative sequence, and in the default case there will be
four additional NOPs executed.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that 32-bit KVM no longer performs cache maintenance for page table
updates, we no longer need empty stubs for arm64. Remove them.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We have both KERN_TO_HYP and kern_hyp_va, which do the exact same
thing. Let's standardize on the latter.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We can now remove a number of dead #defines, thanks to the trampoline
code being gone.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
There is no way to free the boot PGD, because it doesn't exist
anymore as a standalone entity.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Since we now only have one set of page tables, the concept of
boot_pgd is useless and can be removed. We still keep it as
an element of the "extended idmap" thing.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
As we move towards a selectable HYP VA range, it is obvious that
we don't want to test a variable to find out if we need to use
the bottom VA range, the top VA range, or use the address as is
(for VHE).
Instead, we can expand our current helper to generate the right
mask or nop with code patching. We default to using the top VA
space, with alternatives to switch to the bottom one or to nop
out the instructions.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Define the two possible HYP VA regions in terms of VA_BITS,
and keep HYP_PAGE_OFFSET_MASK as a temporary compatibility
definition.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
HYP_PAGE_OFFSET is not massively useful. And the way we use it
in KERN_HYP_VA is inconsistent with the equivalent operation in
EL2, where we use a mask instead.
Let's replace the uses of HYP_PAGE_OFFSET with HYP_PAGE_OFFSET_MASK,
and get rid of the pointless macro.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Since dealing with VA ranges tends to hurt my brain badly, let's
start with a bit of documentation that will hopefully help
understanding what comes next...
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently, create_hyp_mappings applies a "one size fits all" page
protection (PAGE_HYP). As we're heading towards separate protections
for different sections, let's make this protection a parameter, and
let the callers pass their prefered protection (PAGE_HYP for everyone
for the time being).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
- x86: miscellaneous fixes, AVIC support (local APIC virtualization,
AMD version)
- s390: polling for interrupts after a VCPU goes to halted state is
now enabled for s390; use hardware provided information about facility
bits that do not need any hypervisor activity, and other fixes for
cpu models and facilities; improve perf output; floating interrupt
controller improvements.
- MIPS: miscellaneous fixes
- PPC: bugfixes only
- ARM: 16K page size support, generic firmware probing layer for
timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things outside
KVM, but they should all carry the necessary acks and it made the
merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com
"more formally and for documentation purposes".
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXPJjyAAoJEL/70l94x66DhioH/j4fwQ0FmfPSM9PArzaFHQdx
LNE3tU4+bobbsy1BJr4DiAaOUQn3DAgwUvGLWXdeLiOXtoWXBiFHKaxlqEsCA6iQ
xcTH1TgfxsVoqGQ6bT9X/2GCx70heYpcWG3f+zqBy7ZfFmQykLAC/HwOr52VQL8f
hUFi3YmTHcnorp0n5Xg+9r3+RBS4D/kTbtdn6+KCLnPJ0RcgNkI3/NcafTemoofw
Tkv8+YYFNvKV13qlIfVqxMa0GwWI3pP6YaNKhaS5XO8Pu16HuuF1JthJsUBDzwBa
RInp8R9MoXgsBYhLpz3jc9vWG7G9yDl5LehsD9KOUGOaFYJ7sQN+QZOusa6jFgA=
=llO5
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small release overall.
x86:
- miscellaneous fixes
- AVIC support (local APIC virtualization, AMD version)
s390:
- polling for interrupts after a VCPU goes to halted state is now
enabled for s390
- use hardware provided information about facility bits that do not
need any hypervisor activity, and other fixes for cpu models and
facilities
- improve perf output
- floating interrupt controller improvements.
MIPS:
- miscellaneous fixes
PPC:
- bugfixes only
ARM:
- 16K page size support
- generic firmware probing layer for timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things
outside KVM, but they should all carry the necessary acks and it
made the merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more
formally and for documentation purposes')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits)
KVM: MTRR: remove MSR 0x2f8
KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same
svm: Manage vcpu load/unload when enable AVIC
svm: Do not intercept CR8 when enable AVIC
svm: Do not expose x2APIC when enable AVIC
KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore
svm: Add VMEXIT handlers for AVIC
svm: Add interrupt injection via AVIC
KVM: x86: Detect and Initialize AVIC support
svm: Introduce new AVIC VMCB registers
KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks
KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg
KVM: x86: Misc LAPIC changes to expose helper functions
KVM: shrink halt polling even more for invalid wakeups
KVM: s390: set halt polling to 80 microseconds
KVM: halt_polling: provide a way to qualify wakeups during poll
KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
kvm: Conditionally register IRQ bypass consumer
...
The ARMv8.1 architecture extensions introduce support for hardware
updates of the access and dirty information in page table entries. With
VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
PTE_AF bit cleared in the stage 2 page table, instead of raising an
Access Flag fault to EL2 the CPU sets the actual page table entry bit
(10). To ensure that kernel modifications to the page table do not
inadvertently revert a bit set by hardware updates, certain Stage 2
software pte/pmd operations must be performed atomically.
The main user of the AF bit is the kvm_age_hva() mechanism. The
kvm_age_hva_handler() function performs a "test and clear young" action
on the pte/pmd. This needs to be atomic in respect of automatic hardware
updates of the AF bit. Since the AF bit is in the same position for both
Stage 1 and Stage 2, the patch reuses the existing
ptep_test_and_clear_young() functionality if
__HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
existing pte_young/pte_mkold mechanism is preserved.
The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
to perform atomic modifications in order to avoid a race with updates of
the AF bit. The arm64 implementation has been re-written using
exclusives.
Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
argument and modify the pte/pmd in place. However, these functions are
only used on local variables rather than actual page table entries, so
it makes more sense to follow the pte_mkwrite() approach for stage 1
attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
functions do not modify the actual page table entries.
The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
explicitly) do not need to be modified since hardware updates of the
dirty status are not supported by KVM, so there is no possibility of
losing such information.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shutdown a core in terms of
kvm. This prevents kexec from rebooting the system at EL2.
This patch adds a cpu tear-down function and also puts an existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need the arm64 specific cpu hotplug hook any more.
Since this patch modifies common code between arm and arm64, one stub
definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compilation errors.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[Rebase, added separate VHE init/exit path, changed resets use of
kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
guest-enter after teardown handling]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we don't have any fake page table levels for arm64,
cleanup the common code to get rid of the dead code.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
On arm64, the hardware supports concatenation of upto 16 tables,
at entry level for stage2 translations and we make use that whenever
possible. This could lead to reduced number of translation levels than
the normal (stage1 table) table. Also, since the IPA(40bit) is smaller
than the some of the supported VA_BITS (e.g, 48bit), there could be
different number of levels in stage-1 vs stage-2 tables. To reuse the
kernel host page table walker for stage2 we have been using a fake
software page table level, not known to the hardware. But with 16K
translations, there could be upto 2 fake software levels (with 48bit VA
and 40bit IPA), which complicates the code. Hence, we want to get rid of
the hack.
Now that we have explicit accessors for hyp vs stage2 page tables,
define the stage2 walker helpers accordingly based on the actual
table used by the hardware.
Once we know the number of translation levels used by the hardware,
it is merely a job of defining the helpers based on whether a
particular level is folded or not, looking at the number of levels.
Some facts before we calculate the translation levels:
1) Smallest page size supported by arm64 is 4K.
2) The minimum number of bits resolved at any page table level
is (PAGE_SHIFT - 3) at intermediate levels.
Both of them implies, minimum number of bits required for a level
change is 9.
Since we can concatenate upto 16 tables at stage2 entry, the total
number of page table levels used by the hardware for resolving N bits
is same as that for (N - 4) bits (with concatenation), as there cannot
be a level in between (N, N-4) as per the above rules.
Hence, we have
STAGE2_PGTABLE_LEVELS = PGTABLE_LEVELS(KVM_PHYS_SHIFT - 4)
With the current IPA limit (40bit), for all supported translations
and VA_BITS, we have the following condition (even for 36bit VA with
16K page size):
CONFIG_PGTABLE_LEVELS >= STAGE2_PGTABLE_LEVELS.
So, for e.g, if PUD is present in stage2, it is present in the hyp(host).
Hence, we fall back to the host definition if we find that a level is not
folded. Otherwise we redefine it accordingly. A build time check is added
to make sure the above condition holds. If this condition breaks in future,
we can rearrange the host level helpers and fix our code easily.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Now that we have switched to explicit page table routines,
get rid of the obsolete kvm_* wrappers.
Also, kvm_tlb_flush_vmid_by_ipa is now called only on stage2
page tables, hence get rid of the redundant check.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Introduce hyp_pxx_table_empty helpers for checking whether
a given table entry is empty. This will be used explicitly
once we switch to explicit routines for hyp page table walk.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Introduce stage2 page table helpers for arm64. With the fake
page table level still in place, the stage2 table has the same
number of levels as that of the host (and hyp), so they all
fallback to the host version.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Rearrange the code for fake pgd handling, which is applicable
only for arm64. This will later be removed once we introduce
the stage2 page table walker macros.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture requires
break-before-make in such cases to avoid TLB conflicts but that's not
always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked to
the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is provided
by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this allows
uaccess functions (get_user etc.) to be implemented using LDTR/STTR
instructions. Such instructions, when run by the kernel, perform
unprivileged accesses adding an extra level of protection. The
set_fs() macro is used to "upgrade" such instruction to privileged
accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the sigcontext
information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
LepccTgykiUBqW5TRzPz
=/oS+
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Here are the main arm64 updates for 4.6. There are some relatively
intrusive changes to support KASLR, the reworking of the kernel
virtual memory layout and initial page table creation.
Summary:
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture
requires break-before-make in such cases to avoid TLB conflicts but
that's not always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked
to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
of the vmalloc space, allowing the kernel to be loaded (nearly)
anywhere in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is
provided by UEFI (efi_get_random_bytes() patches merged via the
arm64 tree, acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
but actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this
allows uaccess functions (get_user etc.) to be implemented using
LDTR/STTR instructions. Such instructions, when run by the kernel,
perform unprivileged accesses adding an extra level of protection.
The set_fs() macro is used to "upgrade" such instruction to
privileged accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the
sigcontext information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Fix misspellings in comments.
arm64: efi: add missing frame pointer assignment
arm64: make mrs_s prefixing implicit in read_cpuid
arm64: enable CONFIG_DEBUG_RODATA by default
arm64: Rework valid_user_regs
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: KVM: Move kvm_call_hyp back to its original localtion
arm64: mm: treat memstart_addr as a signed quantity
arm64: mm: list kernel sections in order
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
arm64: kconfig: add submenu for 8.2 architectural features
arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
arm64: Add support for Half precision floating point
arm64: Remove fixmap include fragility
arm64: Add workaround for Cavium erratum 27456
arm64: mm: Mark .rodata as RO
...
The kern_hyp_va macro is pretty meaninless with VHE, as there is
only one mapping - the kernel one.
In order to keep the code readable and efficient, use runtime
patching to replace the 'and' instruction used to compute the VA
with a 'nop'.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we have a clear understanding of the sign of a feature,
rename the routines to reflect the sign, so that it is not misused.
The cpuid_feature_extract_field() now accepts a 'sign' parameter.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>