KVM precomputes the hyp VA of __kvm_hyp_host_vector, essentially a
constant (minus ASLR), before passing it to __kvm_hyp_init.
Now that we have alternatives for converting kimg VA to hyp VA, replace
this with computing the constant inside __kvm_hyp_init, thus removing
the need for an argument.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-10-dbrazdil@google.com
When a CPU is booted in EL2, the kernel checks for VHE support and
initializes the CPU core accordingly. For nVHE it also installs the stub
vectors and drops down to EL1.
Once KVM gains the ability to boot cores without going through the
kernel entry point, it will need to initialize the CPU the same way.
Extract the relevant bits of el2_setup into an init_el2_state macro
with an argument specifying whether to initialize for VHE or nVHE.
The following ifdefs are removed:
* CONFIG_ARM_GIC_V3 - always selected on arm64
* CONFIG_COMPAT - hstr_el2 can be set even without 32-bit support
No functional change intended. Size of el2_setup increased by
148 bytes due to duplication.
Signed-off-by: David Brazdil <dbrazdil@google.com>
[maz: reworked to fit the new PSTATE initial setup code]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-9-dbrazdil@google.com
CPU index should never be negative. Change the signature of
(set_)cpu_logical_map to take an unsigned int.
This still works even if the users treat the CPU index as an int,
and will allow the hypervisor's implementation to check that the index
is valid with a single upper-bound check.
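For illustration, a minimal sketch of the kind of check this enables
(hypothetical helper name, not the hypervisor's actual code):

  u64 hyp_read_cpu_logical_map(unsigned int cpu)
  {
          /* One unsigned comparison also rejects would-be negative values. */
          if (cpu >= NR_CPUS)
                  return INVALID_HWID;
          return cpu_logical_map(cpu);
  }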
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-8-dbrazdil@google.com
Add an early parameter that allows users to select the mode of operation
for KVM/arm64.
For now, the only supported value is "protected". By passing this flag
users opt into the hypervisor placing additional restrictions on the
host kernel. These allow the hypervisor to spawn guests whose state is
kept private from the host. Restrictions will include stage-2 address
translation to prevent the host from accessing guest memory, filtering its
SMC calls, etc.
Without this parameter, the default behaviour remains selecting VHE/nVHE
based on hardware support and CONFIG_ARM64_VHE.
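As a rough sketch of the plumbing (simplified; the enum and variable
names here are illustrative rather than final), the flag can be parsed
with an early_param() hook:

  enum kvm_mode { KVM_MODE_DEFAULT, KVM_MODE_PROTECTED };
  static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;

  static int __init early_kvm_mode_cfg(char *arg)
  {
          if (!arg)
                  return -EINVAL;

          if (strcmp(arg, "protected") == 0) {
                  kvm_mode = KVM_MODE_PROTECTED;
                  return 0;
          }

          return -EINVAL;
  }
  early_param("kvm-arm.mode", early_kvm_mode_cfg);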
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-2-dbrazdil@google.com
ARM64 has non-pagetable-aligned large page support via PTE_CONT: when
this bit is set, the page is part of a super-page. Match the hugetlb
code and support these super pages for PTE and PMD levels.
This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate
pagetable leaf sizes.
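The resulting helpers are along these lines (sketch; treat the macro
names as indicative of the series rather than authoritative):

  /* A contiguous mapping spans CONT_PTE_SIZE / CONT_PMD_SIZE. */
  #define pte_leaf_size(pte)  (pte_cont(pte) ? CONT_PTE_SIZE : PAGE_SIZE)
  #define pmd_leaf_size(pmd)  (pmd_cont(pmd) ? CONT_PMD_SIZE : PMD_SIZE)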
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20201126125747.GG2414@hirez.programming.kicks-ass.net
Pull arm64 fixes from Will Deacon:
"I'm sad to say that we've got an unusually large arm64 fixes pull for
rc7 which addresses numerous significant instrumentation issues with
our entry code.
Without these patches, lockdep is hopelessly unreliable in some
configurations [1,2] and syzkaller is therefore not a lot of use
because it's so noisy.
Although much of this has always been broken, it appears to have been
exposed more readily by other changes such as 044d0d6de9 ("lockdep:
Only trace IRQ edges") and general lockdep improvements around IRQ
tracing and NMIs.
Fixing this properly required moving much of the instrumentation hooks
from our entry assembly into C, which Mark has been working on for the
last few weeks. We're not quite ready to move to the recently added
generic functions yet, but the code here has been deliberately written
to mimic that closely so we can look at cleaning things up once we
have a bit more breathing room.
Having said all that, the second version of these patches was posted
last week and I pushed it into our CI (kernelci and cki) along with a
commit which forced on PROVE_LOCKING, NOHZ_FULL and
CONTEXT_TRACKING_FORCE. The result? We found a real bug in the
md/raid10 code [3].
Oh, and there's also a really silly typo patch that's unrelated.
Summary:
- Fix numerous issues with instrumentation and exception entry
- Fix hideous typo in unused register field definition"
[1] https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
[2] https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
[3] https://lore.kernel.org/r/94c76d5e-466a-bc5f-e6c2-a11b65c39f83@redhat.com
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mte: Fix typo in macro definition
arm64: entry: fix EL1 debug transitions
arm64: entry: fix NMI {user, kernel}->kernel transitions
arm64: entry: fix non-NMI kernel<->kernel transitions
arm64: ptrace: prepare for EL1 irq/rcu tracking
arm64: entry: fix non-NMI user<->kernel transitions
arm64: entry: move el1 irq/nmi logic to C
arm64: entry: prepare ret_to_user for function call
arm64: entry: move enter_from_user_mode to entry-common.c
arm64: entry: mark entry code as noinstr
arm64: mark idle code as noinstr
arm64: syscall: exit userspace before unmasking exceptions
Now that set_fs() is gone, addr_limit_user_check() is redundant. Remove
the checks and associated thread flag.
To ensure that _TIF_WORK_MASK can be used as an immediate value in an
AND instruction (as it is in `ret_to_user`), TIF_MTE_ASYNC_FAULT is
renumbered to keep the constituent bits of _TIF_WORK_MASK contiguous.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the uaccess primitives don't take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.
We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.
Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.
Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.
As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch separates arm64's user and kernel memory access primitives
into distinct routines, adding new __{get,put}_kernel_nofault() helpers
to access kernel memory, upon which core code builds larger copy
routines.
The kernel access routines (using LDR/STR) are not affected by PAN (when
legitimately accessing kernel memory), nor are they affected by UAO.
Switching to KERNEL_DS may set UAO, but this does not adversely affect
the kernel access routines.
The user access routines (using LDTR/STTR) are not affected by PAN (when
legitimately accessing user memory), but are affected by UAO. As these
are only legitimate to use under USER_DS with UAO clear, this should not
be problematic.
Routines performing atomics to user memory (futex and deprecated
instruction emulation) still need to transiently clear PAN, and these
are left as-is. These are never used on kernel memory.
Subsequent patches will refactor the uaccess helpers to remove redundant
code, and will also remove the redundant PAN/UAO manipulation.
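For reference, core code can then probe kernel memory safely through
wrappers such as copy_from_kernel_nofault(); a hedged usage sketch:

  /* Read one word from a possibly-unmapped kernel address; returns
   * -EFAULT on a fault instead of bringing the kernel down. */
  static long read_kernel_word(const void *addr, unsigned long *out)
  {
          return copy_from_kernel_nofault(out, addr, sizeof(*out));
  }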
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As a step towards implementing __{get,put}_kernel_nofault(), this patch
splits most user-memory specific logic out of __{get,put}_user(), with
the memory access and fault handling in new __{raw_get,put}_mem()
helpers.
For now the LDR/LDTR patching is left within the *get_mem() helpers, and
will be removed in a subsequent patch.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We currently have many uaccess_*{enable,disable}*() variants, which
subsequent patches will cut down as part of removing set_fs() and
friends. Once this simplification is made, most uaccess routines will
only need to ensure that the user page tables are mapped in TTBR0, as is
currently dealt with by uaccess_ttbr0_{enable,disable}().
The existing uaccess_{enable,disable}() routines ensure that user page
tables are mapped in TTBR0, and also disable PAN protections. Disabling
PAN is necessary to use atomics on user memory, but it also permits
unrelated privileged accesses to user memory.
As preparatory step, let's rename uaccess_{enable,disable}() to
uaccess_{enable,disable}_privileged(), highlighting this caveat and
discouraging wider misuse. Subsequent patches can reuse the
uaccess_{enable,disable}() naming for the common case of ensuring the
user page tables are mapped in TTBR0.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for removing addr_limit and set_fs() we must decouple the
SDEI PAN/UAO manipulation from the uaccess code, and explicitly
reinitialize these as required.
SDEI enters the kernel with a non-architectural exception, and prior to
the most recent revision of the specification (ARM DEN 0054B), PSTATE
bits (e.g. PAN, UAO) are not manipulated in the same way as for
architectural exceptions. Notably, older versions of the spec can be
read ambiguously as to whether PSTATE bits are inherited unchanged from
the interrupted context or whether they are generated from scratch, with
TF-A doing the latter.
We have three cases to consider:
1) The existing TF-A implementation of SDEI will clear PAN and clear UAO
(along with other bits in PSTATE) when delivering an SDEI exception.
2) In theory, implementations of SDEI prior to revision B could inherit
PAN and UAO (along with other bits in PSTATE) unchanged from the
interrupted context. However, in practice such implementations do not
exist.
3) Going forward, new implementations of SDEI must clear UAO, and
depending on SCTLR_ELx.SPAN must either inherit or set PAN.
As we can ignore (2) we can assume that upon SDEI entry, UAO is always
clear, though PAN may be clear, inherited, or set per SCTLR_ELx.SPAN.
Therefore, we must explicitly initialize PAN, but do not need to do
anything for UAO.
Considering what we need to do:
* When set_fs() is removed, force_uaccess_begin() will have no HW
side-effects. As this only clears UAO, which we can assume has already
been cleared upon entry, this is not a problem. We do not need to add
code to manipulate UAO explicitly.
* PAN may be cleared upon entry (in case 1 above), so where a kernel is
built to use PAN and this is supported by all CPUs, the kernel must
set PAN upon entry to ensure expected behaviour.
* PAN may be inherited from the interrupted context (in case 3 above),
and so where a kernel is not built to use PAN or where PAN support is
not uniform across CPUs, the kernel must clear PAN to ensure expected
behaviour.
This patch reworks the SDEI code accordingly, explicitly setting PAN to
the expected state in all cases. To cater for the cases where the kernel
does not use PAN or this is not uniformly supported by hardware we add a
new cpu_has_pan() helper which can be used regardless of whether the
kernel is built to use PAN.
The existing system_uses_ttbr0_pan() is redefined in terms of
system_uses_hw_pan() both for clarity and as a minor optimization when
HW PAN is not selected.
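A sketch of the helpers described above (simplified; names as in
mainline at the time of writing, so best checked against the tree):

  static inline bool system_uses_hw_pan(void)
  {
          return IS_ENABLED(CONFIG_ARM64_PAN) &&
                 cpus_have_const_cap(ARM64_HAS_PAN);
  }

  static inline bool system_uses_ttbr0_pan(void)
  {
          return IS_ENABLED(CONFIG_ARM64_SW_TTBR0_PAN) &&
                 !system_uses_hw_pan();
  }

  /* Usable regardless of whether the kernel is built to use PAN. */
  static inline bool cpu_has_pan(void)
  {
          u64 mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);

          return cpuid_feature_extract_unsigned_field(mmfr1,
                                                      ID_AA64MMFR1_PAN_SHIFT);
  }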
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As with SCTLR_ELx and other control registers, some PSTATE bits are
UNKNOWN out-of-reset, and we may not be able to rely on hardware or
firmware to initialize them to our liking prior to entry to the kernel,
e.g. in the primary/secondary boot paths and return from idle/suspend.
It would be more robust (and easier to reason about) if we consistently
initialized PSTATE to a default value, as we do with control registers.
This will ensure that the kernel is not adversely affected by bits it is
not aware of, e.g. when support for a feature such as PAN/UAO is
disabled.
This patch ensures that PSTATE is consistently initialized at boot time
via an ERET. This is not intended to relax the existing requirements
(e.g. DAIF bits must still be set prior to entering the kernel). For
features detected dynamically (which may require system-wide support),
it is still necessary to subsequently modify PSTATE.
As ERET is not always a Context Synchronization Event, an ISB is placed
before each exception return to ensure updates to control registers have
taken effect. This handles the kernel being entered with SCTLR_ELx.EOS
clear (or any future control bits being in an UNKNOWN state).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When a FSC_PERM fault is taken, we currently use only (logging_active &&
writable) to decide whether to call kvm_pgtable_stage2_map(). There are
two further cases we should consider.
(1) After logging_active has been switched back from true to false: if
we then take a FSC_PERM fault with write_fault set and a hugepage
adjustment is needed, the tables should be merged back into a block
entry. Today this case is ignored and kvm_pgtable_stage2_relax_perms()
is still called, which leads to an endless loop and a guest panic due to
a soft lockup.
(2) We use (FSC_PERM && logging_active && writable) to decide to
collapse a block entry into a table by calling kvm_pgtable_stage2_map().
But sometimes we only need to relax permissions, namely when writing to
a page rather than a block; in that case kvm_pgtable_stage2_relax_perms()
is sufficient.
The ISS field bits[1:0] in the ESR_EL2 register indicate the stage-2
lookup level at which a data or instruction abort occurred. By comparing
the granule of the faulting lookup level with vma_pagesize, we can
strictly distinguish when to call kvm_pgtable_stage2_relax_perms() and
when to call kvm_pgtable_stage2_map(), covering both cases above.
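The resulting decision is roughly the following (sketch; argument lists
abbreviated relative to the real kvm_pgtable API):

  fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);

  if (fault_status == FSC_PERM && fault_granule == vma_pagesize) {
          /* Same granule as the existing mapping: relax permissions only. */
          ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
  } else {
          /* Different granule: (re)create the mapping at the new size,
           * collapsing a table or dissolving a block as needed. */
          ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
                                       __pfn_to_phys(pfn), prot, memcache);
  }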
Suggested-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201201201034.116760-4-wangyanan55@huawei.com
Cores that predate the introduction of ID_AA64PFR0_EL1.CSV3 to
the ARMv8 architecture have this field set to 0, even if some of
them are not affected by the vulnerability.
The kernel maintains a list of unaffected cores (A53, A55 and a few
others) so that it doesn't impose an expensive mitigation unnecessarily.
As we do for CSV2, let's expose the CSV3 property to guests that run
on HW that is effectively not vulnerable. This can be reset to zero
by writing to the ID register from userspace, ensuring that VMs can
be migrated despite the new property being set.
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Exceptions which can be taken at (almost) any time are considered to be
NMIs. On arm64 that includes:
* SDEI events
* GICv3 Pseudo-NMIs
* Kernel stack overflows
* Unexpected/unhandled exceptions
... but currently debug exceptions (BRKs, breakpoints, watchpoints,
single-step) are not considered NMIs.
As these can be taken at any time, kernel features (lockdep, RCU,
ftrace) may not be in a consistent kernel state. For example, we may
take an NMI from the idle code or partway through an entry/exit path.
While nmi_enter() and nmi_exit() handle most of this state, notably they
don't save/restore the lockdep state across an NMI being taken and
handled. When interrupts are enabled and an NMI is taken, lockdep may
see interrupts become disabled within the NMI code, but not see
interrupts become enabled when returning from the NMI, leaving lockdep
believing interrupts are disabled when they are actually enabled.
The x86 code handles this in idtentry_{enter,exit}_nmi(), which will
shortly be moved to the generic entry code. As we can't use either yet,
we copy the x86 approach in arm64-specific helpers. All the NMI
entrypoints are marked as noinstr to prevent any instrumentation
handling code being invoked before the state has been corrected.
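A sketch of the entry helper (closely mirroring the x86 approach;
simplified, and relying on the pt_regs field added earlier in the
series to stash the lockdep state):

  asmlinkage void noinstr arm64_enter_nmi(struct pt_regs *regs)
  {
          /* Save lockdep's view so it can be restored on exit. */
          regs->lockdep_hardirqs = lockdep_hardirqs_enabled();

          __nmi_enter();
          lockdep_hardirqs_off(CALLER_ADDR0);
          lockdep_hardirq_enter();
          rcu_nmi_enter();

          trace_hardirqs_off_finish();
          ftrace_nmi_enter();
  }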
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
There are periods in kernel mode when RCU is not watching and/or the
scheduler tick is disabled, but we can still take exceptions such as
interrupts. The arm64 exception handlers do not account for this, and
it's possible that RCU is not watching while an exception handler runs.
The x86/generic entry code handles this by ensuring that all (non-NMI)
kernel exception handlers call irqentry_enter() and irqentry_exit(),
which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
the generic entry code, and already handle the user<->kernel transitions
elsewhere, so we add new kernel<->kernel transition helpers along the
lines of the generic entry code.
Since we now track interrupts becoming masked when an exception is
taken, local_daif_inherit() is modified to track interrupts becoming
re-enabled when the original context is inherited. To balance the
entry/exit paths, each handler masks all DAIF exceptions before
exit_to_kernel_mode().
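For illustration, the entry side ends up along these lines (simplified
sketch):

  static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
  {
          regs->exit_rcu = false;

          if (!IS_ENABLED(CONFIG_TINY_RCU) && is_idle_task(current)) {
                  /* RCU was not watching (idle): behave like irqentry_enter(). */
                  lockdep_hardirqs_off(CALLER_ADDR0);
                  rcu_irq_enter();
                  trace_hardirqs_off_finish();

                  regs->exit_rcu = true;
                  return;
          }

          lockdep_hardirqs_off(CALLER_ADDR0);
          rcu_irq_enter_check_tick();
          trace_hardirqs_off_finish();
  }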
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Exceptions from EL1 may be taken when RCU isn't watching (e.g. in idle
sequences), or when lockdep's hardirq state is transiently out-of-sync with
the hardware state (e.g. in the middle of local_irq_enable()). To
correctly handle these cases, we'll need to save/restore this state
across some exceptions taken from EL1.
A series of subsequent patches will update EL1 exception handlers to
handle this. In preparation for this, and to avoid dependencies between
those patches, this patch adds two new fields to struct pt_regs so that
exception handlers can track this state.
Note that this is placed in pt_regs as some entry/exit sequences such as
el1_irq are invoked from assembly, which makes it very difficult to add
a separate structure as with the irqentry_state used by x86. We can
separate this once more of the exception logic is moved to C. While the
fields only need to be bool, they are both made u64 to keep pt_regs
16-byte aligned.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-9-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
When the kernel is built with PROVE_LOCKING, NO_HZ_FULL, and
CONTEXT_TRACKING_FORCE, it will WARN() at boot time that interrupts are
enabled when we call
context_tracking_user_enter(), despite the DAIF flags indicating that
IRQs are masked.
The problem is that we're not tracking IRQ flag changes accurately, and
so lockdep believes interrupts are enabled when they are not (and
vice-versa). We can shuffle things to make this more accurate. For
kernel->user transitions there are a number of constraints we need to
consider:
1) When we call __context_tracking_user_enter() HW IRQs must be disabled
and lockdep must be up-to-date with this.
2) Userspace should be treated as having IRQs enabled from the PoV of
both lockdep and tracing.
3) As context_tracking_user_enter() stops RCU from watching, we cannot
use RCU after calling it.
4) IRQ flag tracing and lockdep have state that must be manipulated
before RCU is disabled.
... with similar constraints applying for user->kernel transitions, with
the ordering reversed.
The generic entry code has enter_from_user_mode() and
exit_to_user_mode() helpers to handle this. We can't use those directly,
so we add arm64 copies for now (without the instrumentation markers
which aren't used on arm64). These replace the existing user_exit() and
user_exit_irqoff() calls spread throughout handlers, and the exception
unmasking is left as-is.
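Concretely, the exit path has to perform its steps in this order
(sketch of the arm64 copy, without the instrumentation markers):

  static __always_inline void exit_to_user_mode(void)
  {
          trace_hardirqs_on_prepare();                /* may still use RCU */
          lockdep_hardirqs_on_prepare(CALLER_ADDR0);
          user_enter_irqoff();                        /* RCU stops watching */
          lockdep_hardirqs_on(CALLER_ADDR0);          /* lockdep only, no RCU */
  }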
Note that:
* The accounting for debug exceptions from userspace now happens in
el0_dbg() and ret_to_user(), so this is removed from
debug_exception_enter() and debug_exception_exit(). As
user_exit_irqoff() wakes RCU, the userspace-specific check is removed.
* The accounting for syscalls now happens in el0_svc(),
el0_svc_compat(), and ret_to_user(), so this is removed from
el0_svc_common(). This does not adversely affect the workaround for
erratum 1463225, as this does not depend on any of the state tracking.
* In ret_to_user() we mask interrupts with local_daif_mask(), and so we
need to inform lockdep and tracing. Here a trace_hardirqs_off() is
sufficient and safe as we have not yet exited kernel context and RCU
is usable.
* As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
needs to check for the latter.
* EL0 SError handling will be dealt with in a subsequent patch, as this
needs to be treated as an NMI.
Prior to this patch, booting an appropriately-configured kernel would
result in splats as below:
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
| Modules linked in:
| CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : check_flags.part.54+0x1dc/0x1f0
| lr : check_flags.part.54+0x1dc/0x1f0
| sp : ffff80001003bd80
| x29: ffff80001003bd80 x28: ffff66ce801e0000
| x27: 00000000ffffffff x26: 00000000000003c0
| x25: 0000000000000000 x24: ffffc31842527258
| x23: ffffc31842491368 x22: ffffc3184282d000
| x21: 0000000000000000 x20: 0000000000000001
| x19: ffffc318432ce000 x18: 0080000000000000
| x17: 0000000000000000 x16: ffffc31840f18a78
| x15: 0000000000000001 x14: ffffc3184285c810
| x13: 0000000000000001 x12: 0000000000000000
| x11: ffffc318415857a0 x10: ffffc318406614c0
| x9 : ffffc318415857a0 x8 : ffffc31841f1d000
| x7 : 647261685f706564 x6 : ffffc3183ff7c66c
| x5 : ffff66ce801e0000 x4 : 0000000000000000
| x3 : ffffc3183fe00000 x2 : ffffc31841500000
| x1 : e956dc24146b3500 x0 : 0000000000000000
| Call trace:
| check_flags.part.54+0x1dc/0x1f0
| lock_is_held_type+0x10c/0x188
| rcu_read_lock_sched_held+0x70/0x98
| __context_tracking_enter+0x310/0x350
| context_tracking_enter.part.3+0x5c/0xc8
| context_tracking_user_enter+0x6c/0x80
| finish_ret_to_user+0x2c/0x13c
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for reworking the EL1 irq/nmi entry code, move the
existing logic to C. We no longer need the asm_nmi_enter() and
asm_nmi_exit() wrappers, so these are removed. The new C functions are
marked noinstr, which prevents compiler instrumentation and runtime
probing.
In subsequent patches we'll want the new C helpers to be called in all
cases, so we don't bother wrapping the calls with ifdeferry. Even when
the new C functions are stubs the trivial calls are unlikely to have a
measurable impact on the IRQ or NMI paths anyway.
Prototypes are added to <asm/exception.h> as otherwise (in some
configurations) GCC will complain about the lack of a forward
declaration. We already do this for existing functions, e.g.
enter_from_user_mode().
The new helpers are marked as noinstr (which prevents all
instrumentation, tracing, and kprobes). Otherwise, there should be no
functional change as a result of this patch.
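The new helpers are essentially (sketch):

  asmlinkage void noinstr enter_el1_irq_or_nmi(struct pt_regs *regs)
  {
          if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
                  nmi_enter();
  }

  asmlinkage void noinstr exit_el1_irq_or_nmi(struct pt_regs *regs)
  {
          if (IS_ENABLED(CONFIG_ARM64_PSEUDO_NMI) && !interrupts_enabled(regs))
                  nmi_exit();
  }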
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Our Meltdown mitigation state isn't exposed outside of cpufeature.c,
contrary to the rest of the Spectre mitigation state. As we are going
to use it in KVM, expose an arm64_get_meltdown_state() helper which
returns the same possible values as arm64_get_spectre_v?_state().
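A sketch of the helper, mirroring the spectre state accessors
(simplified):

  enum mitigation_state arm64_get_meltdown_state(void)
  {
          if (__meltdown_safe)
                  return SPECTRE_UNAFFECTED;

          return arm64_kernel_unmapped_at_el0() ? SPECTRE_MITIGATED
                                                : SPECTRE_VULNERABLE;
  }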
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull arm64 fixes from Will Deacon:
"The main changes are relating to our handling of access/dirty bits,
where our low-level page-table helpers could lead to stale young
mappings and loss of the dirty bit in some cases (the latter has not
been observed in practice, but could happen when clearing "soft-dirty"
if we enabled that). These were posted as part of a larger series, but
the rest of that is less urgent and needs a v2 which I'll get to
shortly.
In other news, we've now got a set of fixes to resolve the
lockdep/tracing problems that have been plaguing us for a while, but
they're still a bit "fresh" and I plan to send them to you next week
after we've got some more confidence in them (although initial CI
results look good).
Summary:
- Fix kerneldoc warnings generated by ACPI IORT code
- Fix pte_accessible() so that access flag is ignored
- Fix missing header #include
- Fix loss of software dirty bit across pte_wrprotect() when HW DBM
is enabled"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()
arm64: pgtable: Fix pte_accessible()
ACPI/IORT: Fix doc warnings in iort.c
arm64/fpsimd: add <asm/insn.h> to <asm/kprobes.h> to fix fpsimd build
There are a number of places where we check for the KVM_ARM_VCPU_PMU_V3
feature. Wrap this check into a new kvm_vcpu_has_pmu(), and use
it at the existing locations.
No functional change.
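The wrapper boils down to (sketch):

  #define kvm_vcpu_has_pmu(vcpu) \
          (test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))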
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Move the setting of SSBS directly into the HVC handler, using
the C helpers rather than the inline assembly code.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Directly using the kimage_voffset variable is fine for now, but
will become more problematic as we start distrusting EL1.
Instead, patch the kimage_voffset into the HYP text, ensuring
we don't have to load an untrusted value later on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
With hardware dirty bit management, calling pte_wrprotect() on a writable,
dirty PTE will lose the dirty state and return a read-only, clean entry.
Move the logic from ptep_set_wrprotect() into pte_wrprotect() to ensure that
the dirty bit is preserved for writable entries, as this is required for
soft-dirty bit management if we enable it in the future.
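With the logic moved, pte_wrprotect() ends up along these lines
(sketch):

  static inline pte_t pte_wrprotect(pte_t pte)
  {
          /*
           * If hardware DBM marked the entry dirty, transfer that state
           * into the software dirty bit before making the entry read-only,
           * so it is not lost across wrprotect.
           */
          if (pte_hw_dirty(pte))
                  pte = pte_mkdirty(pte);

          pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
          pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
          return pte;
  }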
Cc: <stable@vger.kernel.org>
Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
pte_accessible() is used by ptep_clear_flush() to figure out whether TLB
invalidation is necessary when unmapping pages for reclaim. Although our
implementation is correct according to the architecture, returning true
only for valid, young ptes in the absence of racing page-table
modifications, this is in fact flawed due to lazy invalidation of old
ptes in ptep_clear_flush_young() where we elide the expensive DSB
instruction for completing the TLB invalidation.
Rather than penalise the aging path, adjust pte_accessible() to return
true for any valid pte, even if the access flag is cleared.
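The relaxed check is then simply (sketch):

  /* Any valid pte may have a TLB entry, regardless of the access flag. */
  #define pte_accessible(mm, pte) \
          (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))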
Cc: <stable@vger.kernel.org>
Fixes: 76c714be0e ("arm64: pgtable: implement pte_accessible()")
Reported-by: Yu Zhao <yuzhao@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201120143557.6715-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Adding <asm/exception.h> brought in <asm/kprobes.h> which uses
<asm/probes.h>, which uses 'pstate_check_t', so the latter needs to
#include <asm/insn.h> for this typedef.
Fixes this build error:
In file included from arch/arm64/include/asm/kprobes.h:24,
from arch/arm64/include/asm/exception.h:11,
from arch/arm64/kernel/fpsimd.c:35:
arch/arm64/include/asm/probes.h:16:2: error: unknown type name 'pstate_check_t'
16 | pstate_check_t *pstate_cc;
Fixes: c6b90d5cf6 ("arm64/fpsimd: Fix missing-prototypes in fpsimd.c")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Tian Tao <tiantao6@hisilicon.com>
Link: https://lore.kernel.org/r/20201123044510.9942-1-rdunlap@infradead.org
Signed-off-by: Will Deacon <will@kernel.org>
To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for arm64.
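The mapping amounts to a handful of constants (sketch; the macro names
follow the seccomp bitmap series and should be checked against it):

  #define SECCOMP_ARCH_NATIVE          AUDIT_ARCH_AARCH64
  #define SECCOMP_ARCH_NATIVE_NR       NR_syscalls
  #define SECCOMP_ARCH_NATIVE_NAME     "aarch64"
  #ifdef CONFIG_COMPAT
  # define SECCOMP_ARCH_COMPAT         AUDIT_ARCH_ARM
  # define SECCOMP_ARCH_COMPAT_NR      __NR_compat_syscalls
  # define SECCOMP_ARCH_COMPAT_NAME    "arm"
  #endif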
Signed-off-by: Kees Cook <keescook@chromium.org>