3071f13d75
This adds a new dynamic PMU to the Perf Events framework to program and
control the L3 cache PMUs in some Qualcomm Technologies SoCs.

The driver supports a distributed cache architecture where the overall
cache for a socket is composed of multiple slices, each with its own PMU.
Access to each individual PMU is provided even though all CPUs share all
the slices. User space needs to aggregate the individual counts to provide
a global picture.

The driver exports formatting and event information to sysfs so it can
be used by the perf user space tools with the syntaxes:

  perf stat -a -e l3cache_0_0/read-miss/
  perf stat -a -e l3cache_0_0/event=0x21/

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
[will: fixed sparse issues]
Signed-off-by: Will Deacon <will.deacon@arm.com>

Qualcomm Datacenter Technologies L3 Cache Performance Monitoring Unit (PMU)
===========================================================================

This driver supports the L3 cache PMUs found in Qualcomm Datacenter Technologies
Centriq SoCs. The L3 cache on these SoCs is composed of multiple slices, shared
by all cores within a socket. Each slice is exposed as a separate uncore perf
PMU with device name l3cache_<socket>_<instance>. User space is responsible
for aggregating across slices.

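Because each slice reports its own count, a socket-wide figure has to be
summed in user space. The following is a minimal sketch of that step, not
part of the driver. It assumes the counts were collected with "perf stat -x ';'"
(separator-delimited output whose first three fields are value, unit and event
name); a semicolon is used because event specifications may themselves contain
commas. The helper name aggregate_slices and the sample counts are hypothetical.

```python
import re
from collections import defaultdict

# Matches event names such as "l3cache_0_1/read-miss/" and captures the
# socket number and the event string between the slashes.
_EVENT_RE = re.compile(r"^l3cache_(\d+)_\d+/([^/]*)/$")


def aggregate_slices(lines, sep=";"):
    """Sum per-slice counts into one total per (socket, event).

    Assumes "perf stat -x SEP" style lines: value SEP unit SEP event ...
    Lines whose value is not a number (for example "<not counted>") and
    events from other PMUs are skipped.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        match = _EVENT_RE.match(event)
        if match is None or not value.isdigit():
            continue
        socket, name = int(match.group(1)), match.group(2)
        totals[(socket, name)] += int(value)
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample output for two slices on socket 0.
    sample = [
        "100;;l3cache_0_0/read-miss/;1000;100.00;;",
        "250;;l3cache_0_1/read-miss/;1000;100.00;;",
    ]
    print(aggregate_slices(sample))
```

Keying the totals by (socket, event) keeps sockets separate, so a caller can
still add them together if a system-wide number is wanted.
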
The driver provides a description of its available events and configuration
options in sysfs, see /sys/devices/l3cache*. Given that these are uncore PMUs,
the driver also exposes a "cpumask" sysfs attribute which contains a mask
consisting of one CPU per socket which will be used to handle all the PMU
events on that socket.

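As an illustration of how user space might inspect these attributes, the
sketch below lists the l3cache* PMU directories and reads each "cpumask"
file. It assumes the file uses the usual kernel CPU list format (for example
"0" or "0-3,8"), which the text above does not state; the function names are
hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a CPU list such as "0-3,8" into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_l3cache_cpumasks(sysfs_root="/sys/devices"):
    """Map each l3cache* PMU directory name to the CPUs in its cpumask."""
    root = Path(sysfs_root)
    masks = {}
    if not root.is_dir():
        return masks
    for pmu_dir in sorted(root.glob("l3cache*")):
        mask_file = pmu_dir / "cpumask"
        if mask_file.is_file():
            masks[pmu_dir.name] = parse_cpulist(mask_file.read_text())
    return masks


if __name__ == "__main__":
    # Prints nothing on systems without these PMUs.
    for name, cpus in read_l3cache_cpumasks().items():
        print(name, cpus)
```
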
The hardware implements 32-bit event counters and has a flat 8-bit event space
exposed via the "event" format attribute. In addition to the 32-bit physical
counters, the driver supports virtual 64-bit hardware counters by using
hardware counter chaining. This feature is exposed via the "lc" (long counter)
format flag. E.g.:

  perf stat -e l3cache_0_0/read-miss,lc/

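To give a feel for what chaining buys, the sketch below compares how quickly
a 32-bit and a 64-bit counter wrap at a constant event rate, and shows one way
two 32-bit halves can form a 64-bit value. This is an illustrative model only:
the text above does not describe the register layout, so the split into a low
half plus a half that counts its overflows is an assumption, and the event
rate is an arbitrary figure rather than a hardware number.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def seconds_until_wrap(bits, events_per_second):
    """Time for a counter of the given width to wrap at a constant rate."""
    return (1 << bits) / events_per_second


def chained_value(low, high):
    """Combine two 32-bit halves into one 64-bit count (assumed layout)."""
    return ((high & COUNTER_MASK) << COUNTER_BITS) | (low & COUNTER_MASK)


if __name__ == "__main__":
    rate = 1_000_000_000  # assumed event rate, not a hardware figure
    print("32-bit wrap (s):", seconds_until_wrap(32, rate))
    print("64-bit wrap (s):", seconds_until_wrap(64, rate))
    # One more event after the low half is full carries into the high half.
    print(chained_value(COUNTER_MASK, 0) + 1 == chained_value(0, 1))
```
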
Given that these are uncore PMUs, the driver does not support sampling;
therefore "perf record" will not work. Per-task perf sessions are not
supported.
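
Since only counting is supported, measurements are taken with "perf stat".
The sketch below builds such a command line for several slices at once, using
the event syntax shown earlier and the system-wide "-a" option. The helper
names are hypothetical, and the list of (socket, instance) pairs is an input
the caller must supply, for instance from the l3cache* entries in sysfs.

```python
def l3cache_event(socket, instance, event, long_counter=False):
    """Format one event spec, e.g. l3cache_0_0/read-miss,lc/."""
    terms = [event] + (["lc"] if long_counter else [])
    return "l3cache_{}_{}/{}/".format(socket, instance, ",".join(terms))


def perf_stat_args(slices, event, long_counter=False):
    """Build a system-wide "perf stat" argument list for the given slices.

    slices is an iterable of (socket, instance) pairs.
    """
    args = ["perf", "stat", "-a"]
    for socket, instance in slices:
        args += ["-e", l3cache_event(socket, instance, event, long_counter)]
    return args


if __name__ == "__main__":
    # Hypothetical system with two slices on socket 0.
    print(" ".join(perf_stat_args([(0, 0), (0, 1)], "read-miss", True)))
```

The returned list can be passed to a process-spawning API as is, which avoids
shell quoting of the slashes and commas in the event specifications.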