Commit Graph

1058963 Commits

Author SHA1 Message Date
Kees Cook
3364c6ce23 arm64: atomics: lse: Dereference matching size
When building with -Warray-bounds, the following warning is generated:

In file included from ./arch/arm64/include/asm/lse.h:16,
                 from ./arch/arm64/include/asm/cmpxchg.h:14,
                 from ./arch/arm64/include/asm/atomic.h:16,
                 from ./include/linux/atomic.h:7,
                 from ./include/asm-generic/bitops/atomic.h:5,
                 from ./arch/arm64/include/asm/bitops.h:25,
                 from ./include/linux/bitops.h:33,
                 from ./include/linux/kernel.h:22,
                 from kernel/printk/printk.c:22:
./arch/arm64/include/asm/atomic_lse.h:247:9: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'atomic_t[1]' [-Warray-bounds]
  247 |         asm volatile(                                                   \
      |         ^~~
./arch/arm64/include/asm/atomic_lse.h:266:1: note: in expansion of macro '__CMPXCHG_CASE'
  266 | __CMPXCHG_CASE(w,  , acq_, 32,  a, "memory")
      | ^~~~~~~~~~~~~~
kernel/printk/printk.c:3606:17: note: while referencing 'printk_cpulock_owner'
 3606 | static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
      |                 ^~~~~~~~~~~~~~~~~~~~

This is due to the compiler seeing an unsigned long * cast against
something (atomic_t) that is int sized. Replace the cast with the
matching size cast. This results in no change in binary output.

Note that __ll_sc__cmpxchg_case_##name##sz already uses the same
constraint:

	[v] "+Q" (*(u##sz *)ptr

Which is why only the LSE form needs updating and not the
LL/SC form, so this change is unlikely to be problematic.

Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220112202259.3950286-1-keescook@chromium.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-20 09:13:48 +00:00
Xiongfeng Wang
440323b6cf asm-generic: Add missing brackets for io_stop_wc macro
After using io_stop_wc(), drivers reports following compile error when
compiled on X86.

  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: In function ‘hns3_tx_push_bd’:
  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:2058:12: error: expected ‘;’ before ‘(’ token
    io_stop_wc();
              ^
It is because I missed to add the brackets after io_stop_wc macro. So
let's add the missing brackets.

Fixes: d5624bb29f ("asm-generic: introduce io_stop_wc() and add implementation for ARM64")
Reported-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/20220114105857.126300-1-wangxiongfeng2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-20 09:12:04 +00:00
Catalin Marinas
945409a6ef Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...

* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE

* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available

* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()

* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP

* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently

* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

* for-next/sve:
  : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME

* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information

* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN
2022-01-05 18:14:32 +00:00
Huacai Chen
daa149dd8c arm64: Use correct method to calculate nomap region boundaries
Nomap regions are treated as "reserved". When region boundaries are not
page aligned, we usually increase the "reserved" regions rather than
decrease them. So, we should use memblock_region_reserved_base_pfn()/
memblock_region_reserved_end_pfn() instead of memblock_region_memory_
base_pfn()/memblock_region_memory_base_pfn() to calculate boundaries.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20211022070646.41923-1-chenhuacai@loongson.cn
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-05 14:46:06 +00:00
Kees Cook
89d30b1150 arm64: Drop outdated links in comments
As started by commit 05a5f51ca5 ("Documentation: Replace lkml.org links
with lore"), an effort was made to replace lkml.org links with lore to
better use a single source that's more likely to stay available long-term.
However, it seems these links don't offer much value here, so just
remove them entirely.

Cc: Joe Perches <joe@perches.com>
Suggested-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/20210211100213.GA29813@willie-the-truck/
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211215191835.1420010-1-keescook@chromium.org
[catalin.marinas@arm.com: removed the arch/arm changes]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-01-05 11:51:01 +00:00
Will Deacon
3da4390bcd arm64: perf: Don't register user access sysctl handler multiple times
Commit e201260081 ("arm64: perf: Add userspace counter access disable
switch") introduced a new 'perf_user_access' sysctl file to enable and
disable direct userspace access to the PMU counters. Sadly, Geert
reports that on his big.LITTLE SoC ('Renesas Salvator-XS w/ R-Car H3'),
the file is created for each PMU type probed, resulting in a splat
during boot:

  | hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
  | sysctl duplicate entry: /kernel//perf_user_access
  | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3-arm64-renesas-00003-ge2012600810c #1420
  | Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x190
  |  show_stack+0x14/0x20
  |  dump_stack_lvl+0x88/0xb0
  |  dump_stack+0x14/0x2c
  |  __register_sysctl_table+0x384/0x818
  |  register_sysctl+0x20/0x28
  |  armv8_pmu_init.constprop.0+0x118/0x150
  |  armv8_a57_pmu_init+0x1c/0x28
  |  arm_pmu_device_probe+0x1b4/0x558
  |  armv8_pmu_device_probe+0x18/0x20
  |  platform_probe+0x64/0xd0
  |  hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available

Introduce a state variable to track creation of the sysctl file and
ensure that it is only created once.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: e201260081 ("arm64: perf: Add userspace counter access disable switch")
Link: https://lore.kernel.org/r/CAMuHMdVcDxR9sGzc5pcnORiotonERBgc6dsXZXMd6wTvLGA9iw@mail.gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
2022-01-04 14:57:14 +00:00
Dan Carpenter
2da56881a7 drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
The devm_ioremap() function does not return error pointers.  It returns
NULL.

Fixes: 036a7584be ("drivers: perf: Add LLC-TAD perf counter support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211217145907.GA16611@kili
Signed-off-by: Will Deacon <will@kernel.org>
2022-01-04 13:58:17 +00:00
Will Deacon
527a7f5252 perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
The kbuild robot reports that building the SMMUv3 PMU driver with
CONFIG_OF=n results in a warning for W=1 builds:

>> drivers/perf/arm_smmuv3_pmu.c:889:34: warning: unused variable 'smmu_pmu_of_match' [-Wunused-const-variable]
   static const struct of_device_id smmu_pmu_of_match[] = {
                                    ^

Guard the match table with #ifdef CONFIG_OF.

Link: https://lore.kernel.org/r/202201041700.01KZEzhb-lkp@intel.com
Fixes: 3f7be43561 ("perf/smmuv3: Add devicetree support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Will Deacon <will@kernel.org>
2022-01-04 13:38:16 +00:00
D Scott Phillips
38e0257e0e arm64: errata: Fix exec handling in erratum 1418040 workaround
The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
when executing compat threads. The workaround is applied when switching
between tasks, but the need for the workaround could also change at an
exec(), when a non-compat task execs a compat binary or vice versa. Apply
the workaround in arch_setup_new_exec().

This leaves a small window of time between SET_PERSONALITY and
arch_setup_new_exec where preemption could occur and confuse the old
workaround logic that compares TIF_32BIT between prev and next. Instead, we
can just read cntkctl to make sure it's in the state that the next task
needs. I measured cntkctl read time to be about the same as a mov from a
general-purpose register on N1. Update the workaround logic to examine the
current value of cntkctl instead of the previous task's compat state.

Fixes: d49f7d7376 ("arm64: Move handling of erratum 1418040 into C code")
Cc: <stable@vger.kernel.org> # 5.9.x
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211220234114.3926-1-scott@os.amperecomputing.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-22 15:18:19 +00:00
Guilherme G. Piccoli
31e833b203 arm64: Unhash early pointer print plus improve comment
When facing a really early issue on DT parsing we have currently
a message that shows both the physical and virtual address of the FDT.
The printk pointer modifier for the virtual address shows a hashed
address there unless the user provides "no_hash_pointers" parameter in
the command-line. The situation in which this message shows-up is a bit
more serious though: the boot process is broken, nothing can be done
(even an oops is too much for this early stage) so we have this message
as a last resort in order to help debug bootloader issues, for example.
Hence, we hereby change that to "%px" in order to make debugging easy,
there's not much information leak risk in such early boot failure.

Also, we tried to improve a bit the commenting on that function, given
that if kernel fails there, it just hangs forever in a cpu_relax() loop.
The reason we cannot BUG/panic is that is too early to do so; thanks to
Mark Brown for pointing that on IRC and thanks Robin Murphy for the good
pointer hash discussion in the mailing-list.

Cc: Mark Brown <broonie@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20211221155230.1532850-1-gpiccoli@igalia.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-22 11:06:11 +00:00
Xiongfeng Wang
d5624bb29f asm-generic: introduce io_stop_wc() and add implementation for ARM64
For memory accesses with write-combining attributes (e.g. those returned
by ioremap_wc()), the CPU may wait for prior accesses to be merged with
subsequent ones. But in some situation, such wait is bad for the
performance.

We introduce io_stop_wc() to prevent the merging of write-combining
memory accesses before this macro with those after it.

We add implementation for ARM64 using DGH instruction and provide NOP
implementation for other architectures.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211221035556.60346-1-wangxiongfeng2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-22 10:44:53 +00:00
Catalin Marinas
dd73d18e7f arm64: Ensure that the 'bti' macro is defined where linkage.h is included
Not all .S files include asm/assembler.h, however the SYM_FUNC_*
definitions invoke the 'bti' macro. Include asm/assembler.h in
asm/linkage.h.

Fixes: 9be34be87c ("arm64: Add macro version of the BTI instruction")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-17 16:20:45 +00:00
Mark Rutland
c2c529b27c arm64: remove __dma_*_area() aliases
The __dma_inv_area() and __dma_clean_area() aliases make cache.S harder
to navigate, but don't gain us anything in practice.

For clarity, let's remove them along with their redundant comments. The
only users are __dma_map_area() and __dma_unmap_area(), which need to be
position independent, and can call __pi_dcache_inval_poc() and
__pi_dcache_clean_poc() directly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211206124715.4101571-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-15 11:19:41 +00:00
Yanteng Si
f2cefc0c2d docs/arm64: delete a space from tagged-address-abi
Since e71e2ace5721("userfaultfd: do not untag user pointers") which
introduced a warning:

linux/Documentation/arm64/tagged-address-abi.rst:52: WARNING: Unexpected indentation.

Let's fix it.

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20211209091922.560979-1-siyanteng@loongson.cn
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 19:01:37 +00:00
Kefeng Wang
dd03762ab6 arm64: Enable KCSAN
This patch enables KCSAN for arm64, with updates to build rules
to not use KCSAN for several incompatible compilation units.

Recent GCC version(at least GCC10) made outline-atomics as the
default option(unlike Clang), which will cause linker errors
for kernel/kcsan/core.o. Disables the out-of-line atomics by
no-outline-atomics to fix the linker errors.

Meanwhile, as Mark said[1], some latent issues are needed to be
fixed which isn't just a KCSAN problem, we make the KCSAN depends
on EXPERT for now.

Tested selftest and kcsan_test(built with GCC11 and Clang 13),
and all passed.

[1] https://lkml.kernel.org/r/YadiUPpJ0gADbiHQ@FVFF77S0Q05N

Acked-by: Marco Elver <elver@google.com> # kernel/kcsan
Tested-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20211211131734.126874-1-wangkefeng.wang@huawei.com
[catalin.marinas@arm.com: added comment to justify EXPERT]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:54:34 +00:00
Mark Brown
2c94ebedc8 kselftest/arm64: Add pidbench for floating point syscall cases
Since it's likely to be useful for performance work with SVE let's have a
pidbench that gives us some numbers for consideration. In order to ensure
that we test exactly the scenario we want this is written in assembly - if
system libraries use SVE this would stop us exercising the case where the
process has never used SVE.

We exercise three cases:

 - Never having used SVE.
 - Having used SVE once.
 - Using SVE after each syscall.

by spinning running getpid() for a fixed number of iterations with the
time measured using CNTVCT_EL0 reported on the console. This is obviously
a totally unrealistic benchmark which will show the extremes of any
performance variation but equally given the potential gotchas with use of
FP instructions by system libraries it's good to have some concrete code
shared to make it easier to compare notes on results.

Testing over multiple SVE vector lengths will need to be done with vlset
currently, the test could be extended to iterate over all of them if
desired.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211202165107.1075259-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:41:56 +00:00
Mark Brown
12b792e5e2 arm64/fp: Add comments documenting the usage of state restore functions
Add comments to help people figure out when fpsimd_bind_state_to_cpu() and
fpsimd_update_current_state() are used.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211207163250.1373542-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:36:19 +00:00
Mark Brown
b77e995e3b kselftest/arm64: Add a test program to exercise the syscall ABI
Currently we don't have any coverage of the syscall ABI so let's add a very
dumb test program which sets up register patterns, does a sysscall and then
checks that the register state after the syscall matches what we expect.
The program is written in an extremely simplistic fashion with the goal of
making it easy to verify that it's doing what it thinks it's doing, it is
not a model of how one should write actual code.

Currently we validate the general purpose, FPSIMD and SVE registers. There
are other thing things that could be covered like FPCR and flags registers,
these can be covered incrementally - my main focus at the minute is
covering the ABI for the SVE registers.

The program repeats the tests for all possible SVE vector lengths in case
some vector length specific optimisation causes issues, as well as testing
FPSIMD only. It tries two syscalls, getpid() and sched_yield(), in an
effort to cover both immediate return to userspace and scheduling another
task though there are no guarantees which cases will be hit.

A new test directory "abi" is added to hold the test, it doesn't seem to
fit well into any of the existing directories.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-7-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:35:10 +00:00
Mark Brown
9331a60485 kselftest/arm64: Allow signal tests to trigger from a function
Currently we have the facility to specify custom code to trigger a signal
but none of the tests use it and for some reason the framework requires us
to also specify a signal to send as a trigger in order to make use of a
custom trigger. This doesn't seem to make much sense, instead allow the
use of a custom trigger function without specifying a signal to inject.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:35:10 +00:00
Mark Brown
18edbb6b32 kselftest/arm64: Parameterise ptrace vector length information
SME introduces a new mode called streaming mode in which the SVE registers
have a different vector length. Since the ptrace interface for this is
based on the existing SVE interface prepare for supporting this by moving
the regset specific configuration into  struct and passing that around,
allowing these tests to be reused for streaming mode. As we will also have
to verify the interoperation of the SVE and streaming SVE regsets don't
just iterate over an array.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:35:10 +00:00
Mark Brown
aed34d9e52 arm64/sve: Minor clarification of ABI documentation
As suggested by Luis for the SME version of this explicitly say that the
vector length should be extracted from the return value of a set vector
length prctl() with a bitwise and rather than just any old and.

Suggested-by: Luis Machado <Luis.Machado@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:33:44 +00:00
Mark Brown
30c43e73b3 arm64/sve: Generalise vector length configuration prctl() for SME
In preparation for adding SME support update the bulk of the implementation
for the vector length configuration prctl() calls to be independent of
vector type.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:33:44 +00:00
Mark Brown
97bcbee404 arm64/sve: Make sysctl interface for SVE reusable by SME
The vector length configuration for SME is very similar to that for SVE
so in order to allow reuse refactor the SVE configuration so that it takes
the vector type from the struct ctl_table. Since there's no dedicated space
for this we repurpose the extra1 field to store the vector type, this is
otherwise unused for integer sysctls.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:33:44 +00:00
Will Deacon
1609c22a8a Merge branch 'for-next/perf-cpu' into for-next/perf
* for-next/perf-cpu:
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
2021-12-14 18:13:25 +00:00
Mark Brown
742a15b1a2 arm64: Use BTI C directly and unconditionally
Now we have a macro for BTI C that looks like a regular instruction change
all the users of the current BTI_C macro to just emit a BTI C directly and
remove the macro.

This does mean that we now unconditionally BTI annotate all assembly
functions, meaning that they are worse in this respect than code generated
by the compiler. The overhead should be minimal for implementations with a
reasonable HINT implementation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:12:58 +00:00
Mark Brown
481ee45ce9 arm64: Unconditionally override SYM_FUNC macros
Currently we only override the SYM_FUNC macros when we need to insert
BTI C into them, do this unconditionally to make it more likely that we'll
notice bugs in our override.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:12:58 +00:00
Mark Brown
9be34be87c arm64: Add macro version of the BTI instruction
BTI is only available from v8.5 so we need to encode it using HINT in
generic code and for older toolchains. Add an assembler macro based on
one written by Mark Rutland which lets us use the mnemonic and update
the existing users.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:12:58 +00:00
Catalin Marinas
580b536b50 Merge 'arm64/for-next/fixes' into for-next/bti
Needed for the arch/arm64/kernel/entry-ftrace.S fix.

* commit 'arm64/for-next/fixes^^':
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel
2021-12-14 18:11:52 +00:00
Robin Murphy
893c34b60a arm64: perf: Support new DT compatibles
Wire up the new DT compatibles so we can present appropriate
PMU names to userspace for the latest and greatest CPUs.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/62d14ba12d847ec7f1fba7cb0b3b881b437e1cc5.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 18:05:11 +00:00
Robin Murphy
6ac9f30bd4 arm64: perf: Simplify registration boilerplate
With the trend for per-core events moving to userspace JSON, registering
names for PMUv3 implementations is increasingly a pure boilerplate
exercise. Let's wrap things a step further so we can generate the basic
PMUv3 init function with a macro invocation, and reduce further new
addition to just 2 lines each.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b79477ea3b97f685d00511d4ecd2f686184dca34.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 18:05:11 +00:00
Thierry Reding
d4c4844a9b arm64: perf: Support Denver and Carmel PMUs
Add support for the NVIDIA Denver and Carmel PMUs using the generic
PMUv3 event map for now.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
[ rm: reorder entries alphabetically ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/5f0f69d47acca78a9e479501aa4d8b429e23cf11.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 18:05:11 +00:00
Will Deacon
8bd09b41b8 Merge branch 'for-next/perf-user-counter-access' into for-next/perf
* for-next/perf-user-counter-access:
  Documentation: arm64: Document PMU counters access from userspace
  arm64: perf: Enable PMU counter userspace access for perf event
  arm64: perf: Add userspace counter access disable switch
  perf: Add a counter for number of user access events in context
  x86: perf: Move RDPMC event flag to a common definition
2021-12-14 13:42:22 +00:00
Will Deacon
1879a61f4a Merge branch 'for-next/perf-smmu' into for-next/perf
* for-next/perf-smmu:
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
2021-12-14 13:42:17 +00:00
Will Deacon
8330904fed Merge branch 'for-next/perf-hisi' into for-next/perf
* for-next/perf-hisi:
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
2021-12-14 13:42:05 +00:00
Will Deacon
e73bc4fd78 Merge branch 'for-next/perf-cn10k' into for-next/perf
* for-next/perf-cn10k:
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
2021-12-14 13:41:58 +00:00
Will Deacon
fc369f925f Merge branch 'for-next/perf-cmn' into for-next/perf
* for-next/perf-cmn:
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  perf/arm-cmn: Optimise DTM counter reads
  perf/arm-cmn: Refactor DTM handling
  perf/arm-cmn: Streamline node iteration
  perf/arm-cmn: Refactor node ID handling
  perf/arm-cmn: Drop compile-test restriction
  perf/arm-cmn: Account for NUMA affinity
  perf/arm-cmn: Fix CPU hotplug unregistration
2021-12-14 13:41:42 +00:00
Mark Rutland
053f58bab3 arm64: atomics: lse: define RETURN ops in terms of FETCH ops
The FEAT_LSE atomic instructions include LD* instructions which return
the original value of a memory location can be used to directly
implement FETCH opertations. Each RETURN op is implemented as a copy of
the corresponding FETCH op with a trailing instruction to generate the
new value of the memory location. We only directly implement
*_fetch_add*(), for which we have a trailing `add` instruction.

As the compiler has no visibility of the `add`, this leads to less than
optimal code generation when consuming the result.

For example, the compiler cannot constant-fold the addition into later
operations, and currently GCC 11.1.0 will compile:

       return __lse_atomic_sub_return(1, v) == 0;

As:

	mov     w1, #0xffffffff
	ldaddal w1, w2, [x0]
	add     w1, w1, w2
	cmp     w1, #0x0
	cset    w0, eq  // eq = none
	ret

This patch improves this by replacing the `add` with C addition after
the inline assembly block, e.g.

	ret += i;

This allows the compiler to manipulate `i`. This permits the compiler to
merge the `add` and `cmp` for the above, e.g.

	mov     w1, #0xffffffff
	ldaddal w1, w1, [x0]
	cmp     w1, #0x1
	cset    w0, eq  // eq = none
	ret

With this change the assembly for each RETURN op is identical to the
corresponding FETCH op (including barriers and clobbers) so I've removed
the inline assembly and rewritten each RETURN op in terms of the
corresponding FETCH op, e.g.

| static inline void __lse_atomic_add_return(int i, atomic_t *v)
| {
|       return __lse_atomic_fetch_add(i, v) + i
| }

The new construction does not adversely affect the common case, and
before and after this patch GCC 11.1.0 can compile:

	__lse_atomic_add_return(i, v)

As:

	ldaddal w0, w2, [x1]
	add     w0, w0, w2

... while having the freedom to do better elsewhere.

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 13:00:33 +00:00
Mark Rutland
8a578a759a arm64: atomics: lse: improve constraints for simple ops
We have overly conservative assembly constraints for the basic FEAT_LSE
atomic instructions, and using more accurate and permissive constraints
will allow for better code generation.

The FEAT_LSE basic atomic instructions have come in two forms:

	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
	ST{op}{order}{size} <Rs>, [<Rn>]

The ST* forms are aliases of the LD* forms where:

	ST{op}{order}{size} <Rs>, [<Rn>]
Is:
	LD{op}{order}{size} <Rs>, XZR, [<Rn>]

For either form, both <Rs> and <Rn> are read but not written back to,
and <Rt> is written with the original value of the memory location.
Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
other register value(s) are consumed. There are no UNPREDICTABLE or
CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
<Rn> are the same register.

Our current inline assembly always uses <Rs> == <Rt>, treating this
register as both an input and an output (using a '+r' constraint). This
forces the compiler to do some unnecessary register shuffling and/or
redundant value generation.

For example, the compiler cannot reuse the <Rs> value, and currently GCC
11.1.0 will compile:

	__lse_atomic_add(1, a);
	__lse_atomic_add(1, b);
	__lse_atomic_add(1, c);

As:

	mov     w3, #0x1
	mov     w4, w3
	stadd   w4, [x0]
	mov     w0, w3
	stadd   w0, [x1]
	stadd   w3, [x2]

We can improve this with more accurate constraints, separating <Rs> and
<Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
output-only value ('=r'). As <Rt> is written back after <Rs> is
consumed, it does not need to be earlyclobber ('=&r'), leaving the
compiler free to use the same register for both <Rs> and <Rt> where this
is desirable.

At the same time, the redundant 'r' constraint for `v` is removed, as
the `+Q` constraint is sufficient.

With this change, the above example becomes:

	mov     w3, #0x1
	stadd   w3, [x0]
	stadd   w3, [x1]
	stadd   w3, [x2]

I've made this change for the non-value-returning and FETCH ops. The
RETURN ops have a multi-instruction sequence for which we cannot use the
same constraints, and a subsequent patch will rewrite hte RETURN ops in
terms of the FETCH ops, relying on the ability for the compiler to reuse
the <Rs> value.

This is intended as an optimization.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 13:00:23 +00:00
Mark Rutland
5e9e43c987 arm64: atomics: lse: define ANDs in terms of ANDNOTs
The FEAT_LSE atomic instructions include atomic bit-clear instructions
(`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT
operations. Each AND op is implemented as a copy of the corresponding
ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the
`i` argument.

As the compiler has no visibility of the `mvn`, this leads to less than
optimal code generation when generating `i` into a register. For
example, __lse_atomic_fetch_and(0xf, v) can be compiled to:

	mov     w1, #0xf
	mvn     w1, w1
	ldclral w1, w1, [x2]

This patch improves this by replacing the `mvn` with NOT in C before the
inline assembly block, e.g.

	i = ~i;

This allows the compiler to generate `i` into a register more optimally,
e.g.

	mov     w1, #0xfffffff0
	ldclral w1, w1, [x2]

With this change the assembly for each AND op is identical to the
corresponding ANDNOT op (including barriers and clobbers), so I've
removed the inline assembly and rewritten each AND op in terms of the
corresponding ANDNOT op, e.g.

| static inline void __lse_atomic_and(int i, atomic_t *v)
| {
| 	return __lse_atomic_andnot(~i, v);
| }

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 13:00:23 +00:00
Mark Rutland
ef53245060 arm64: atomics lse: define SUBs in terms of ADDs
The FEAT_LSE atomic instructions include atomic ADD instructions
(`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so
we must build all of the SUB operations using the ADD instructions. We
open-code these today, with each SUB op implemented as a copy of the
corresponding ADD op with a leading `neg` instruction in the inline
assembly to negate the `i` argument.

As the compiler has no visibility of the `neg`, this leads to less than
optimal code generation when generating `i` into a register. For
example, __les_atomic_fetch_sub(1, v) can be compiled to:

	mov     w1, #0x1
	neg     w1, w1
	ldaddal w1, w1, [x2]

This patch improves this by replacing the `neg` with negation in C
before the inline assembly block, e.g.

	i = -i;

This allows the compiler to generate `i` into a register more optimally,
e.g.

	mov     w1, #0xffffffff
	ldaddal w1, w1, [x2]

With this change the assembly for each SUB op is identical to the
corresponding ADD op (including barriers and clobbers), so I've removed
the inline assembly and rewritten each SUB op in terms of the
corresponding ADD op, e.g.

| static inline void __lse_atomic_sub(int i, atomic_t *v)
| {
| 	__lse_atomic_add(-i, v);
| }

For clarity I've moved the definition of each SUB op immediately after
the corresponding ADD op, and used a single macro to create the RETURN
forms of both ops.

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 13:00:23 +00:00
Mark Rutland
8e6082e94a arm64: atomics: format whitespace consistently
The code for the atomic ops is formatted inconsistently, and while this
is not a functional problem it is rather distracting when working on
them.

Some have ops have consistent indentation, e.g.

| #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)                           \
| static inline int __lse_atomic_add_return##name(int i, atomic_t *v)     \
| {                                                                       \
|         u32 tmp;                                                        \
|                                                                         \
|         asm volatile(                                                   \
|         __LSE_PREAMBLE                                                  \
|         "       ldadd" #mb "    %w[i], %w[tmp], %[v]\n"                 \
|         "       add     %w[i], %w[i], %w[tmp]"                          \
|         : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)        \
|         : "r" (v)                                                       \
|         : cl);                                                          \
|                                                                         \
|         return i;                                                       \
| }

While others have negative indentation for some lines, and/or have
misaligned trailing backslashes, e.g.

| static inline void __lse_atomic_##op(int i, atomic_t *v)                        \
| {                                                                       \
|         asm volatile(                                                   \
|         __LSE_PREAMBLE                                                  \
| "       " #asm_op "     %w[i], %[v]\n"                                  \
|         : [i] "+r" (i), [v] "+Q" (v->counter)                           \
|         : "r" (v));                                                     \
| }

This patch makes the indentation consistent and also aligns the trailing
backslashes. This makes the code easier to read for those (like myself)
who are easily distracted by these inconsistencies.

This is intended as a cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 13:00:23 +00:00
Qi Liu
8404b0fbc7 drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported
to sample bandwidth, latency, buffer occupation etc.

Each PMU RCiEP device monitors multiple Root Ports, and each RCiEP is
registered as a PMU in /sys/bus/event_source/devices, so users can
select target PMU, and use filter to do further sets.

Filtering options contains:
event     - select the event.
port      - select target Root Ports. Information of Root Ports are
            shown under sysfs.
bdf       - select requester_id of target EP device.
trig_len  - set trigger condition for starting event statistics.
trig_mode - set trigger mode. 0 means starting to statistic when bigger
            than trigger condition, and 1 means smaller.
thr_len   - set threshold for statistics.
thr_mode  - set threshold mode. 0 means count when bigger than threshold,
            and 1 means smaller.

Acked-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20211202080633.2919-3-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:30:26 +00:00
Qi Liu
c8602008e2 docs: perf: Add description for HiSilicon PCIe PMU driver
PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported on
HiSilicon HIP09 platform. Document it to provide guidance on how to
use it.

Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20211202080633.2919-2-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:30:26 +00:00
Bhaskara Budiredla
4cbf47728f dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
Add device tree bindings for Last-level-cache Tag-and-data
(LLC-TAD) unit PMU for Marvell CN10K SoCs.

Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211115043506.6679-3-bbudiredla@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:23:01 +00:00
Bhaskara Budiredla
036a7584be drivers: perf: Add LLC-TAD perf counter support
This driver adds support for Last-level cache tag-and-data unit
(LLC-TAD) PMU that is featured in some of the Marvell's CN10K
infrastructure silicons.

The LLC is divided into 2N slices distributed across N Mesh tiles
in a single-socket configuration. The driver always configures the
same counter for all of the TADs. The user would end up effectively
reserving one of eight counters in every TAD to look across all TADs.
The occurrences of events are aggregated and presented to the user
at the end of an application run. The driver does not provide a way
for the user to partition TADs so that different TADs are used for
different applications.

The event counters are zeroed to start event counting to avoid any
rollover issues. TAD perf counters are 64-bit, so it's not currently
possible to overflow event counters at current mesh and core
frequencies.

To measure tad pmu events use perf tool stat command. For instance:

perf stat -e tad_dat_msh_in_dss,tad_req_msh_out_any <application>
perf stat -e tad_alloc_any,tad_hit_any,tad_tag_rd <application>

Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com>
Link: https://lore.kernel.org/r/20211115043506.6679-2-bbudiredla@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:23:01 +00:00
Ard Biesheuvel
2c54b423cf arm64/xor: use EOR3 instructions when available
Use the EOR3 instruction to implement xor_blocks() if the instruction is
available, which is the case if the CPU implements the SHA-3 extension.
This is about 20% faster on Apple M1 when using the 5-way version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211213140252.2856053-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 12:14:26 +00:00
Robin Murphy
df457ca973 perf/smmuv3: Synthesize IIDR from CoreSight ID registers
The SMMU_PMCG_IIDR register was not present in older revisions of the
Arm SMMUv3 spec. On Arm Ltd. implementations, the IIDR value consists of
fields from several PIDR registers, allowing us to present a
standardized identifier to userspace.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20211117144844.241072-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:09:52 +00:00
Jean-Philippe Brucker
3f7be43561 perf/smmuv3: Add devicetree support
Add device-tree support to the SMMUv3 PMCG driver.

Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211117144844.241072-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:09:52 +00:00
Jean-Philippe Brucker
2704e75943 dt-bindings: Add Arm SMMUv3 PMCG binding
Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is
placed as a sibling node of the SMMU. Although the PMCGs registers may
be within the SMMU MMIO region, they are separate devices, and there can
be multiple PMCG devices for each SMMU (for example one for the TCU and
one for each TBU).

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211117144844.241072-2-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:09:52 +00:00
Robin Murphy
a88fa6c28b perf/arm-cmn: Add debugfs topology info
In general, detailed performance analysis will require knoweldge of the
the SoC beyond the CMN itself - e.g. which actual CPUs/peripherals/etc.
are connected to each node. However for certain development and bringup
tasks it can be useful to have a quick overview of the CMN internal
topology to hand too. Add a debugfs file to map this out.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/159fd4d7e19fb3c8801a8cb64ee73ec50f55903c.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 12:09:28 +00:00