Fix and subsequently rewrite Spectre mitigations, including the addition
of support for PR_SPEC_DISABLE_NOEXEC.
(Will Deacon and Marc Zyngier)
* for-next/ghostbusters: (22 commits)
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
arm64: Rewrite Spectre-v4 mitigation code
arm64: Move SSBD prctl() handler alongside other spectre mitigation code
arm64: Rename ARM64_SSBD to ARM64_SPECTRE_V4
arm64: Treat SSBS as a non-strict system feature
arm64: Group start_thread() functions together
KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2
arm64: Rewrite Spectre-v2 mitigation code
arm64: Introduce separate file for spectre mitigations and reporting
arm64: Rename ARM64_HARDEN_BRANCH_PREDICTOR to ARM64_SPECTRE_V2
KVM: arm64: Simplify install_bp_hardening_cb()
KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE
arm64: Remove Spectre-related CONFIG_* options
arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs
...
Remove unused functions and parameters from ACPI IORT code.
(Zenghui Yu via Lorenzo Pieralisi)
* for-next/acpi:
ACPI/IORT: Remove the unused inline functions
ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
Remove redundant code and fix documentation of caching behaviour for the
HVC_SOFT_RESTART hypercall.
(Pingfan Liu)
* for-next/boot:
Documentation/kvm/arm: improve description of HVC_SOFT_RESTART
arm64/relocate_kernel: remove redundant code
Improve reporting of unexpected kernel traps due to BPF JIT failure.
(Will Deacon)
* for-next/bpf:
arm64: Improve diagnostics when trapping BRK with FAULT_BRK_IMM
Improve robustness of user-visible HWCAP strings and their corresponding
numerical constants.
(Anshuman Khandual)
* for-next/cpuinfo:
arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions
Cleanups to handling of SVE and FPSIMD register state in preparation
for potential future optimisation of handling across syscalls.
(Julien Grall)
* for-next/fpsimd:
arm64/sve: Implement a helper to load SVE registers from FPSIMD state
arm64/sve: Implement a helper to flush SVE registers
arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
arm64/signal: Update the comment in preserve_sve_context
arm64/fpsimd: Update documentation of do_sve_acc
Miscellaneous changes.
(Tian Tao and others)
* for-next/misc:
arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE
arm64: mm: Fix missing-prototypes in pageattr.c
arm64/fpsimd: Fix missing-prototypes in fpsimd.c
arm64: hibernate: Remove unused including <linux/version.h>
arm64/mm: Refactor {pgd, pud, pmd, pte}_ERROR()
arm64: Remove the unused include statements
arm64: get rid of TEXT_OFFSET
arm64: traps: Add str of description to panic() in die()
Memory management updates and cleanups.
(Anshuman Khandual and others)
* for-next/mm:
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64/mm: Unify CONT_PMD_SHIFT
arm64/mm: Unify CONT_PTE_SHIFT
arm64/mm: Remove CONT_RANGE_OFFSET
arm64/mm: Enable THP migration
arm64/mm: Change THP helpers to comply with generic MM semantics
arm64/mm/ptdump: Add address markers for BPF regions
Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
(Clint Sbisa)
* for-next/pci:
arm64: Enable PCI write-combine resources under sysfs
Perf/PMU driver updates.
(Julien Thierry and others)
* for-next/perf:
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm_pmu: arm64: Use NMIs for PMU
arm_pmu: Introduce pmu_irq_ops
KVM: arm64: pmu: Make overflow handler NMI safe
arm64: perf: Defer irq_work to IPI_IRQ_WORK
arm64: perf: Remove PMU locking
arm64: perf: Avoid PMXEV* indirection
arm64: perf: Add missing ISB in armv8pmu_enable_counter()
perf: Add Arm CMN-600 PMU driver
perf: Add Arm CMN-600 DT binding
arm64: perf: Add support caps under sysfs
drivers/perf: thunderx2_pmu: Fix memory resource error handling
drivers/perf: xgene_pmu: Fix uninitialized resource struct
perf: arm_dsu: Support DSU ACPI devices
arm64: perf: Remove unnecessary event_idx check
drivers/perf: hisi: Add missing include of linux/module.h
arm64: perf: Add general hardware LLC events for PMUv3
Support for the Armv8.3 Pointer Authentication enhancements.
(By Amit Daniel Kachhap)
* for-next/ptrauth:
arm64: kprobe: clarify the comment of steppable hint instructions
arm64: kprobe: disable probe of fault prone ptrauth instruction
arm64: cpufeature: Modify address authentication cpufeature to exact
arm64: ptrauth: Introduce Armv8.3 pointer authentication enhancements
arm64: traps: Allow force_signal_inject to pass esr error code
arm64: kprobe: add checks for ARMv8.3-PAuth combined instructions
Tonnes of cleanup to the SDEI driver.
(Gavin Shan)
* for-next/sdei:
firmware: arm_sdei: Remove _sdei_event_unregister()
firmware: arm_sdei: Remove _sdei_event_register()
firmware: arm_sdei: Introduce sdei_do_local_call()
firmware: arm_sdei: Cleanup on cross call function
firmware: arm_sdei: Remove while loop in sdei_event_unregister()
firmware: arm_sdei: Remove while loop in sdei_event_register()
firmware: arm_sdei: Remove redundant error message in sdei_probe()
firmware: arm_sdei: Remove duplicate check in sdei_get_conduit()
firmware: arm_sdei: Unregister driver on error in sdei_init()
firmware: arm_sdei: Avoid nested statements in sdei_init()
firmware: arm_sdei: Retrieve event number from event instance
firmware: arm_sdei: Common block for failing path in sdei_event_create()
firmware: arm_sdei: Remove sdei_is_err()
Selftests for Pointer Authentication and FPSIMD/SVE context-switching.
(Mark Brown and Boyan Karatotev)
* for-next/selftests:
selftests: arm64: Add build and documentation for FP tests
selftests: arm64: Add wrapper scripts for stress tests
selftests: arm64: Add utility to set SVE vector lengths
selftests: arm64: Add stress tests for FPSMID and SVE context switching
selftests: arm64: Add test for the SVE ptrace interface
selftests: arm64: Test case for enumeration of SVE vector lengths
kselftests/arm64: add PAuth tests for single threaded consistency and differently initialized keys
kselftests/arm64: add PAuth test for whether exec() changes keys
kselftests/arm64: add nop checks for PAuth tests
kselftests/arm64: add a basic Pointer Authentication test
Implementation of ARCH_STACKWALK for unwinding.
(Mark Brown)
* for-next/stacktrace:
arm64: Move console stack display code to stacktrace.c
arm64: stacktrace: Convert to ARCH_STACKWALK
arm64: stacktrace: Make stack walk callback consistent with generic code
stacktrace: Remove reliable argument from arch_stack_walk() callback
Support for ASID pinning, which is required when sharing page-tables with
the SMMU.
(Jean-Philippe Brucker)
* for-next/svm:
arm64: cpufeature: Export symbol read_sanitised_ftr_reg()
arm64: mm: Pin down ASIDs for sharing mm with devices
Rely on firmware tables for establishing CPU topology.
(Valentin Schneider)
* for-next/topology:
arm64: topology: Stop using MPIDR for topology information
Spelling fixes.
(Xiaoming Ni and Yanfei Xu)
* for-next/tpyos:
arm64/numa: Fix a typo in comment of arm64_numa_init
arm64: fix some spelling mistakes in the comments by codespell
vDSO cleanups.
(Will Deacon)
* for-next/vdso:
arm64: vdso: Fix unusual formatting in *setup_additional_pages()
arm64: vdso32: Remove a bunch of #ifdef CONFIG_COMPAT_VDSO guards
The node type field is an enum type, so print it as a 32-bit quantity
rather than as an unsigned short.
Link: https://lore.kernel.org/r/202009302350.QIzfkx62-lkp@intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Will Deacon <will@kernel.org>
Ensure that the 'irq' field of 'struct arm_cmn_dtc' is a signed int
so that it can be compared '< 0'.
Link: https://lore.kernel.org/r/20200929170835.GA15956@embeddedor
Addresses-Coverity-ID: 1497488 ("Unsigned compared against 0")
Fixes: 0ba64770a2 ("perf: Add Arm CMN-600 PMU driver")
Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
TCR_EL1.HD is permitted to be cached in a TLB, so invalidate the local
TLB after setting the bit when detected support for the feature. Although
this isn't strictly necessary, since we can happily operate with the bit
effectively clear, the current code uses an ISB in a half-hearted attempt
to make the change effective, so let's just fix that up.
Link: https://lore.kernel.org/r/20201001110405.18617-1-will@kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Our use of broadcast TLB maintenance means that spurious page-faults
that have been handled already by another CPU do not require additional
TLB maintenance.
Make flush_tlb_fix_spurious_fault() a no-op and rely on the existing TLB
invalidation instead. Add an explicit flush_tlb_page() when making a page
dirty, as the TLB is permitted to cache the old read-only entry.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200728092220.GA21800@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
The PR_SPEC_DISABLE_NOEXEC option to the PR_SPEC_STORE_BYPASS prctl()
allows the SSB mitigation to be enabled only until the next execve(),
at which point the state will revert back to PR_SPEC_ENABLE and the
mitigation will be disabled.
Add support for PR_SPEC_DISABLE_NOEXEC on arm64.
Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
The kbuild robot reports that we're relying on an implicit inclusion to
get a definition of task_stack_page() in the Spectre-v4 mitigation code,
which is not always in place for some configurations:
| arch/arm64/kernel/proton-pack.c:329:2: error: implicit declaration of function 'task_stack_page' [-Werror,-Wimplicit-function-declaration]
| task_pt_regs(task)->pstate |= val;
| ^
| arch/arm64/include/asm/processor.h:268:36: note: expanded from macro 'task_pt_regs'
| ((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
| ^
| arch/arm64/kernel/proton-pack.c:329:2: note: did you mean 'task_spread_page'?
Add the missing include to fix the build error.
Fixes: a44acf477220 ("arm64: Move SSBD prctl() handler alongside other spectre mitigation code")
Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/202009260013.Ul7AD29w%lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Patching the EL2 exception vectors is integral to the Spectre-v2
workaround, where it can be necessary to execute CPU-specific sequences
to nobble the branch predictor before running the hypervisor text proper.
Remove the dependency on CONFIG_RANDOMIZE_BASE and allow the EL2 vectors
to be patched even when KASLR is not enabled.
Fixes: 7a132017e7a5 ("KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/202009221053.Jv1XsQUZ%lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
Convert the KVM WA2 code to using the Spectre infrastructure,
making the code much more readable. It also allows us to
take SSBS into account for the mitigation.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Owing to the fact that the host kernel is always mitigated, we can
drastically simplify the WA2 handling by keeping the mitigation
state ON when entering the guest. This means the guest is either
unaffected or not mitigated.
This results in a nice simplification of the mitigation space,
and the removal of a lot of code that was never really used anyway.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Rewrite the Spectre-v4 mitigation handling code to follow the same
approach as that taken by Spectre-v2.
For now, report to KVM that the system is vulnerable (by forcing
'ssbd_state' to ARM64_SSBD_UNKNOWN), as this will be cleared up in
subsequent steps.
Signed-off-by: Will Deacon <will@kernel.org>
As part of the spectre consolidation effort to shift all of the ghosts
into their own proton pack, move all of the horrible SSBD prctl() code
out of its own 'ssbd.c' file.
Signed-off-by: Will Deacon <will@kernel.org>
In a similar manner to the renaming of ARM64_HARDEN_BRANCH_PREDICTOR
to ARM64_SPECTRE_V2, rename ARM64_SSBD to ARM64_SPECTRE_V4. This isn't
_entirely_ accurate, as we also need to take into account the interaction
with SSBS, but that will be taken care of in subsequent patches.
Signed-off-by: Will Deacon <will@kernel.org>
If all CPUs discovered during boot have SSBS, then spectre-v4 will be
considered to be "mitigated". However, we still allow late CPUs without
SSBS to be onlined, albeit with a "SANITY CHECK" warning. This is
problematic for userspace because it means that the system can quietly
transition to "Vulnerable" at runtime.
Avoid this by treating SSBS as a non-strict system feature: if all of
the CPUs discovered during boot have SSBS, then late arriving secondaries
better have it as well.
Signed-off-by: Will Deacon <will@kernel.org>
The is_ttbrX_addr() functions have somehow ended up in the middle of
the start_thread() functions, so move them out of the way to keep the
code readable.
Signed-off-by: Will Deacon <will@kernel.org>
If the system is not affected by Spectre-v2, then advertise to the KVM
guest that it is not affected, without the need for a safelist in the
guest.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
The Spectre-v2 mitigation code is pretty unwieldy and hard to maintain.
This is largely due to it being written hastily, without much clue as to
how things would pan out, and also because it ends up mixing policy and
state in such a way that it is very difficult to figure out what's going
on.
Rewrite the Spectre-v2 mitigation so that it clearly separates state from
policy and follows a more structured approach to handling the mitigation.
Signed-off-by: Will Deacon <will@kernel.org>
The spectre mitigation code is spread over a few different files, which
makes it both hard to follow, but also hard to remove it should we want
to do that in future.
Introduce a new file for housing the spectre mitigations, and populate
it with the spectre-v1 reporting code to start with.
Signed-off-by: Will Deacon <will@kernel.org>
For better or worse, the world knows about "Spectre" and not about
"Branch predictor hardening". Rename ARM64_HARDEN_BRANCH_PREDICTOR to
ARM64_SPECTRE_V2 as part of moving all of the Spectre mitigations into
their own little corner.
Signed-off-by: Will Deacon <will@kernel.org>
Use is_hyp_mode_available() to detect whether or not we need to patch
the KVM vectors for branch hardening, which avoids the need to take the
vector pointers as parameters.
Signed-off-by: Will Deacon <will@kernel.org>
The removal of CONFIG_HARDEN_BRANCH_PREDICTOR means that
CONFIG_KVM_INDIRECT_VECTORS is synonymous with CONFIG_RANDOMIZE_BASE,
so replace it.
Signed-off-by: Will Deacon <will@kernel.org>
The spectre mitigations are too configurable for their own good, leading
to confusing logic trying to figure out when we should mitigate and when
we shouldn't. Although the plethora of command-line options need to stick
around for backwards compatibility, the default-on CONFIG options that
depend on EXPERT can be dropped, as the mitigations only do anything if
the system is vulnerable, a mitigation is available and the command-line
hasn't disabled it.
Remove CONFIG_HARDEN_BRANCH_PREDICTOR and CONFIG_ARM64_SSBD in favour of
enabling this code unconditionally.
Signed-off-by: Will Deacon <will@kernel.org>
Commit 606f8e7b27 ("arm64: capabilities: Use linear array for
detection and verification") changed the way we deal with per-CPU errata
by only calling the .matches() callback until one CPU is found to be
affected. At this point, .matches() stop being called, and .cpu_enable()
will be called on all CPUs.
This breaks the ARCH_WORKAROUND_2 handling, as only a single CPU will be
mitigated.
In order to address this, forcefully call the .matches() callback from a
.cpu_enable() callback, which brings us back to the original behaviour.
Fixes: 606f8e7b27 ("arm64: capabilities: Use linear array for detection and verification")
Cc: <stable@vger.kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
The SMMUv3 driver would like to read the MMFR0 PARANGE field in order to
share CPU page tables with devices. Allow the driver to be built as
module by exporting the read_sanitized_ftr_reg() cpufeature symbol.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20200918101852.582559-7-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
To enable address space sharing with the IOMMU, introduce
arm64_mm_context_get() and arm64_mm_context_put(), that pin down a
context and ensure that it will keep its ASID after a rollover. Export
the symbols to let the modular SMMUv3 driver use them.
Pinning is necessary because a device constantly needs a valid ASID,
unlike tasks that only require one when running. Without pinning, we would
need to notify the IOMMU when we're about to use a new ASID for a task,
and it would get complicated when a new task is assigned a shared ASID.
Consider the following scenario with no ASID pinned:
1. Task t1 is running on CPUx with shared ASID (gen=1, asid=1)
2. Task t2 is scheduled on CPUx, gets ASID (1, 2)
3. Task tn is scheduled on CPUy, a rollover occurs, tn gets ASID (2, 1)
We would now have to immediately generate a new ASID for t1, notify
the IOMMU, and finally enable task tn. We are holding the lock during
all that time, since we can't afford having another CPU trigger a
rollover. The IOMMU issues invalidation commands that can take tens of
milliseconds.
It gets needlessly complicated. All we wanted to do was schedule task tn,
that has no business with the IOMMU. By letting the IOMMU pin tasks when
needed, we avoid stalling the slow path, and let the pinning fail when
we're out of shareable ASIDs.
After a rollover, the allocator expects at least one ASID to be available
in addition to the reserved ones (one per CPU). So (NR_ASIDS - NR_CPUS -
1) is the maximum number of ASIDs that can be shared with the IOMMU.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20200918101852.582559-5-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
_sdei_event_unregister() is called by sdei_event_unregister() and
sdei_device_freeze(). _sdei_event_unregister() covers the shared
and private events, but sdei_device_freeze() only covers the shared
events. So the logic to cover the private events isn't needed by
sdei_device_freeze().
sdei_event_unregister sdei_device_freeze
_sdei_event_unregister sdei_unregister_shared
_sdei_event_unregister
This removes _sdei_event_unregister(). Its logic is moved to its
callers accordingly. This shouldn't cause any logical changes.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-14-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
The function _sdei_event_register() is called by sdei_event_register()
and sdei_device_thaw() as the following functional call chain shows.
_sdei_event_register() covers the shared and private events, but
sdei_device_thaw() only covers the shared events. So the logic to
cover the private events in _sdei_event_register() isn't needed by
sdei_device_thaw().
Similarly, sdei_reregister_event_llocked() covers the shared and
private events in the regard of reenablement. The logic to cover
the private events isn't needed by sdei_device_thaw() either.
sdei_event_register sdei_device_thaw
_sdei_event_register sdei_reregister_shared
sdei_reregister_event_llocked
_sdei_event_register
This removes _sdei_event_register() and sdei_reregister_event_llocked().
Their logic is moved to sdei_event_register() and sdei_reregister_shared().
This shouldn't cause any logical changes.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-13-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
During the CPU hotplug, the private events are registered, enabled
or unregistered on the specific CPU. It repeats the same steps:
initializing cross call argument, make function call on local CPU,
check the returned error.
This introduces sdei_do_local_call() to cover the first steps. The
other benefit is to make CROSSCALL_INIT and struct sdei_crosscall_args
are only visible to sdei_do_{cross, local}_call().
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-12-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
This applies cleanup on the cross call functions, no functional
changes are introduced:
* Wrap the code block of CROSSCALL_INIT inside "do { } while (0)"
as linux kernel usually does. Otherwise, scripts/checkpatch.pl
reports warning regarding this.
* Use smp_call_func_t for @fn argument in sdei_do_cross_call()
as the function is called on target CPU(s).
* Remove unnecessary space before @event in sdei_do_cross_call()
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-11-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
This removes the unnecessary while loop in sdei_event_unregister()
because of the following two reasons. This shouldn't cause any
functional changes.
* The while loop is executed for once, meaning it's not needed
in theory.
* With the while loop removed, the nested statements can be
avoid to make the code a bit cleaner.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20200922130423.10173-10-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
This removes the unnecessary while loop in sdei_event_register()
because of the following two reasons. This shouldn't cause any
functional changes.
* The while loop is executed for once, meaning it's not needed
in theory.
* With the while loop removed, the nested statements can be
avoid to make the code a bit cleaner.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20200922130423.10173-9-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
This removes the redundant error message in sdei_probe() because
the case can be identified from the errno in next error message.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-8-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
The following two checks are duplicate because @acpi_disabled doesn't
depend on CONFIG_ACPI. So the duplicate check (IS_ENABLED(CONFIG_ACPI))
can be dropped. More details is provided to keep the commit log complete:
* @acpi_disabled is defined in arch/arm64/kernel/acpi.c when
CONFIG_ACPI is enabled.
* @acpi_disabled in defined in include/acpi.h when CONFIG_ACPI
is disabled.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-7-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
The SDEI platform device is created from device-tree node or ACPI
(SDEI) table. For the later case, the platform device is created
explicitly by this module. It'd better to unregister the driver on
failure to create the device to keep the symmetry. The driver, owned
by this module, isn't needed if the device isn't existing.
Besides, the errno (@ret) should be updated accordingly in this
case.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-6-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
In sdei_init(), the nested statements can be avoided by bailing
on error from platform_driver_register() or absent ACPI SDEI table.
With it, the code looks a bit more readable.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-5-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
In sdei_event_create(), the event number is retrieved from the
variable @event_num for the shared event. The event number was
stored in the event instance. So we can fetch it from the event
instance, similar to what we're doing for the private event.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-4-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
There are multiple calls of kfree(event) in the failing path of
sdei_event_create() to free the SDEI event. It means we need to
call it again when adding more code in the failing path. It's
prone to miss doing that and introduce memory leakage.
This introduces common block for failing path in sdei_event_create()
to resolve the issue. This shouldn't cause functional changes.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200922130423.10173-3-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
sdei_is_err() is only called by sdei_to_linux_errno(). The logic of
checking on the error number is common to them. They can be combined
finely.
This removes sdei_is_err() and its logic is combined to the function
sdei_to_linux_errno(). Also, the assignment of @err to zero is also
dropped in invoke_sdei_fn() because it's always overridden afterwards.
This shouldn't cause functional changes.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20200922130423.10173-2-gshan@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
Add required PMU interrupt operations for NMIs. Request interrupt lines as
NMIs when possible, otherwise fall back to normal interrupts.
NMIs are only supported on the arm64 architecture with a GICv3 irqchip.
[Alexandru E.: Added that NMIs only work on arm64 + GICv3, print message
when PMU is using NMIs]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently the PMU interrupt can either be a normal irq or a percpu irq.
Supporting NMI will introduce two cases for each existing one. It becomes
a mess of 'if's when managing the interrupt.
Define sets of callbacks for operations commonly done on the interrupt. The
appropriate set of callbacks is selected at interrupt request time and
simplifies interrupt enabling/disabling and freeing.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
NMI context, defer waking the vcpu to an irq_work queue.
A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
running the irq_work for a non-existent vcpu by calling irq_work_sync() on
the PMU destroy path.
[Alexandru E.: Added irq_work_sync()]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
When handling events, armv8pmu_handle_irq() calls perf_event_overflow(),
and subsequently calls irq_work_run() to handle any work queued by
perf_event_overflow(). As perf_event_overflow() raises IPI_IRQ_WORK when
queuing the work, this isn't strictly necessary and the work could be
handled as part of the IPI_IRQ_WORK handler.
In the common case the IPI handler will run immediately after the PMU IRQ
handler, and where the PE is heavily loaded with interrupts other handlers
may run first, widening the window where some counters are disabled.
In practice this window is unlikely to be a significant issue, and removing
the call to irq_work_run() would make the PMU IRQ handler NMI safe in
addition to making it simpler, so let's do that.
[Alexandru E.: Reworded commit message]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
The PMU is disabled and enabled, and the counters are programmed from
contexts where interrupts or preemption is disabled.
The functions to toggle the PMU and to program the PMU counters access the
registers directly and don't access data modified by the interrupt handler.
That, and the fact that they're always called from non-preemptible
contexts, means that we don't need to disable interrupts or use a spinlock.
[Alexandru E.: Explained why locking is not needed, removed WARN_ONs]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently we access the counter registers and their respective type
registers indirectly. This requires us to write to PMSELR, issue an ISB,
then access the relevant PMXEV* registers.
This is unfortunate, because:
* Under virtualization, accessing one register requires two traps to
the hypervisor, even though we could access the register directly with
a single trap.
* We have to issue an ISB which we could otherwise avoid the cost of.
* When we use NMIs, the NMI handler will have to save/restore the select
register in case the code it preempted was attempting to access a
counter or its type register.
We can avoid these issues by directly accessing the relevant registers.
This patch adds helpers to do so.
In armv8pmu_enable_event() we still need the ISB to prevent the PE from
reordering the write to PMINTENSET_EL1 register. If the interrupt is
enabled before we disable the counter and the new event is configured,
we might get an interrupt triggered by the previously programmed event
overflowing, but which we wrongly attribute to the event that we are
enabling. Execute an ISB after we disable the counter.
In the process, remove the comment that refers to the ARMv7 PMU.
[Julien T.: Don't inline read/write functions to avoid big code-size
increase, remove unused read_pmevtypern function,
fix counter index issue.]
[Alexandru E.: Removed comment, removed trailing semicolons in macros,
added ISB]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
armv8pmu_enable_event(), the PE can reorder configuring the event type
after we have enabled the counter and the interrupt. This can lead to an
interrupt being asserted because of the previous event type that we were
counting using the same counter, not the one that we've just configured.
The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
can reorder enabling the interrupt at any point in the future after we have
enabled the event.
Prevent both situations from happening by adding an ISB just before we
enable the event counter.
Fixes: 030896885a ("arm64: Performance counters support")
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Initial driver for PMU event counting on the Arm CMN-600 interconnect.
CMN sports an obnoxiously complex distributed PMU system as part of
its debug and trace features, which can do all manner of things like
sampling, cross-triggering and generating CoreSight trace. This driver
covers the PMU functionality, plus the relevant aspects of watchpoints
for simply counting matching flits.
Tested-by: Tsahi Zidenberg <tsahee@amazon.com>
Tested-by: Tuan Phan <tuanphan@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Document the requirements for the CMN-600 DT binding. The internal
topology is almost entirely discoverable by walking a tree of ID
registers, but sadly both the starting point for that walk and the
exact format of those registers are configuration-dependent and not
discoverable from some sane fixed location. Oh well.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>