Commit Graph

20 Commits

Author SHA1 Message Date
Will Deacon
6dbc002970 ARM: perf: prepare for moving CPU PMU code into separate file
The CPU PMU code is tightly coupled with generic ARM PMU handling code.
This makes it cumbersome when trying to add support for other ARM PMUs
(e.g. interconnect, L2 cache controller, bus) as the generic parts of
the code are not readily reusable.

This patch cleans up perf_event.c so that reusable code is exposed via
header files to other potential PMU drivers. The CPU code is
consistently named to identify it as such and also to prepare for moving
it into a separate file.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-08-23 11:35:52 +01:00
Will Deacon
04236f9fe0 ARM: perf: probe devicetree in preference to current CPU
The CPU PMU is probed using the current cpuid information as part of the
early_initcall initialising the architecture perf backend. For
architectures without NMI (such as ARM), this does not need to be
performed early and can be deferred to the driver probe callback. This
also allows us to probe the devicetree in preference to parsing the
current cpuid, which may be invalid on a big.LITTLE multi-cluster
system.

This patch defers the PMU probing and uses the devicetree information
when available.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2012-08-23 11:35:52 +01:00
Will Deacon
4295b898f5 ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration
In order to provide PMU name strings compatible with the OProfile
user ABI, an enumeration of all PMUs is currently used by perf to
identify each PMU uniquely. Unfortunately, this does not scale well
in the presence of multiple PMUs and creates a single, global namespace
across all PMUs in the system.

This patch removes the enumeration and instead uses the name string
for the PMU to map onto the OProfile variant. perf_pmu_name is
implemented for CPU PMUs, which is all that OProfile cares about anyway.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-07-09 17:41:10 +01:00
Robert Richter
fd0d000b2c perf: Pass last sampling period to perf_sample_data_init()
We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modifiyng the function interface with the required period
as argument. So basically a pattern like this:

        perf_sample_data_init(&data, ~0ULL);
        data.period = event->hw.last_period;

will now be like that:

        perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

Avoids unininitialized data.period and simplifies code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-09 15:23:12 +02:00
Will Deacon
3f31ae1213 ARM: 7357/1: perf: fix overflow handling for xscale2 PMUs
xscale2 PMUs indicate overflow not via the PMU control register, but by
a separate overflow FLAG register instead.

This patch fixes the xscale2 PMU code to use this register to detect
to overflow and ensures that we clear any pending overflow when
disabling a counter.

Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-07 09:40:49 +00:00
Will Deacon
f6f5a30c83 ARM: 7356/1: perf: check that we have an event in the PMU IRQ handlers
The PMU IRQ handlers in perf assume that if a counter has overflowed
then perf must be responsible. In the paranoid world of crazy hardware,
this could be false, so check that we do have a valid event before
attempting to dereference NULL in the interrupt path.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-07 09:40:49 +00:00
Will Deacon
5727347180 ARM: 7354/1: perf: limit sample_period to half max_period in non-sampling mode
On ARM, the PMU does not stop counting after an overflow and therefore
IRQ latency affects the new counter value read by the kernel. This is
significant for non-sampling runs where it is possible for the new value
to overtake the previous one, causing the delta to be out by up to
max_period events.

Commit a737823d ("ARM: 6835/1: perf: ensure overflows aren't missed due
to IRQ latency") attempted to fix this problem by allowing interrupt
handlers to pass an overflow flag to the event update function, causing
the overflow calculation to assume that the counter passed through zero
when going from prev to new. Unfortunately, this doesn't work when
overflow occurs on the perf_task_tick path because we have the flag
cleared and end up computing a large negative delta.

This patch removes the overflow flag from armpmu_event_update and
instead limits the sample_period to half of the max_period for
non-sampling profiling runs.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-07 09:40:48 +00:00
Will Deacon
0445e7a58e ARM: perf: add support for stalled cycle ABI events
Commit 8f622422 ("perf events: Add generic front-end and back-end
stalled cycle event definitions") added two new ABI events for counting
stalled cycles.

This patch adds support for these new events to the ARM perf
implementation.

Cc: Jamie Iles <jamie@jamieiles.com>
Cc: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-12-02 15:16:16 +00:00
Mark Rutland
8be3f9a238 ARM: perf: remove cpu-related misnomers
Currently struct cpu_hw_events stores data on events running on a
PMU associated with a CPU. As this data is general enough to be used
for system PMUs, this name is a misnomer, and may cause confusion when
it is used for system PMUs.

Additionally, 'armpmu' is commonly used as a parameter name for an
instance of struct arm_pmu. The name is also used for a global instance
which represents the CPU's PMU.

As cpu_hw_events is now not tied to CPU PMUs, it is renamed to
pmu_hw_events, with instances of it renamed similarly. As the global
'armpmu' is CPU-specfic, it is renamed to cpu_pmu. This should make it
clearer which code is generic, and which is coupled with the CPU.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-08-31 10:50:12 +01:00
Mark Rutland
e1f431b57e ARM: perf: refactor event mapping
Currently mapping an event type to a hardware configuration value
depends on the data being pointed to from struct arm_pmu. These fields
(cache_map, event_map, raw_event_mask) are currently specific to CPU
PMUs, and do not serve the general case well.

This patch replaces the event map pointers on struct arm_pmu with a new
'map_event' function pointer. Small shim functions are used to reuse
the existing common code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-08-31 10:50:09 +01:00
Mark Rutland
0f78d2d5cc ARM: perf: lock PMU registers per-CPU
Currently, a single lock serialises access to CPU PMU registers. This
global locking is unnecessary as PMU registers are local to the CPU
they monitor.

This patch replaces the global lock with a per-CPU lock. As the lock is
in struct cpu_hw_events, PMUs providing a single cpu_hw_events instance
can be locked globally.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-08-31 10:50:07 +01:00
Mark Rutland
c47f8684ba ARM: perf: remove active_mask
Currently, pmu_hw_events::active_mask is used to keep track of which
events are active in hardware. As we can stop counters and their
interrupts, this is unnecessary.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-08-31 10:50:02 +01:00
Will Deacon
d2b41f7456 ARM: perf: index Xscale and ARMv6 event counters starting from zero
Now that the ARMv7 PMU backend indexes event counters from zero, follow
suit and do the same for ARMv6 and Xscale.

Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-08-31 10:18:00 +01:00
Mark Rutland
a6c93afed3 ARM: perf: de-const struct arm_pmu
This patch removes const qualifiers from instances of struct arm_pmu,
and functions initialising them, in preparation for generalising
arm_pmu usage to system (AKA uncore) PMUs.

This will allow for dynamically modifiable structures (locks,
struct pmu) to be added as members of struct arm_pmu.

Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-08-31 10:16:58 +01:00
Peter Zijlstra
89d6c0b5bd perf, arch: Add generic NODE cache events
Add a NODE level to the generic cache events which is used to measure
local vs remote memory accesses. Like all other cache events, an
ACCESS is HIT+MISS, if there is no way to distinguish between reads
and writes do reads only etc..

The below needs filling out for !x86 (which I filled out with
unsupported events).

I'm fairly sure ARM can leave it like that since it doesn't strike me as
an architecture that even has NUMA support. SH might have something since
it does appear to have some NUMA bits.

Sparc64, PowerPC and MIPS certainly want a good look there since they
clearly are NUMA capable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Anton Blanchard <anton@samba.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:38 +02:00
Peter Zijlstra
a8b0ca17b8 perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

  - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
  - tracepoint: nmi=0; since tracepoint could be from NMI context.
  - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:35 +02:00
Will Deacon
a737823d37 ARM: 6835/1: perf: ensure overflows aren't missed due to IRQ latency
If a counter overflows during a perf stat profiling run it may overtake
the last known value of the counter:

    0        prev     new                0xffffffff
    |----------|-------|----------------------|

In this case, the number of events that have occurred is
(0xffffffff - prev) + new. Unfortunately, the event update code will
not realise an overflow has occurred and will instead report the event
delta as (new - prev) which may be considerably smaller than the real
count.

This patch adds an extra argument to armpmu_event_update which indicates
whether or not an overflow has occurred. If an overflow has occurred
then we use the maximum period of the counter to calculate the elapsed
events.

Acked-by: Jamie Iles <jamie@jamieiles.com>
Reported-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-03-26 10:06:09 +00:00
Will Deacon
961ec6daa7 ARM: 6521/1: perf: use raw_spinlock_t for pmu_lock
For kernels built with PREEMPT_RT, critical sections protected
by standard spinlocks are preemptible. This is not acceptable
on perf as (a) we may be scheduled onto a different CPU whilst
reading/writing banked PMU registers and (b) the latency when
reading the PMU registers becomes unpredictable.

This patch upgrades the pmu_lock spinlock to a raw_spinlock
instead.

Reported-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-12-04 11:18:08 +00:00
Will Deacon
4d6b7a779b ARM: 6512/1: perf: fix warnings generated by sparse
Russell reported a number of warnings coming from sparse when
checking the ARM perf_event.c files:

| perf_event.c seems to also have problems too:
|
|   CHECK   arch/arm/kernel/perf_event.c
|   arch/arm/kernel/perf_event.c:37:1: warning: symbol 'pmu_lock' was not declared. Should it be static?
|   arch/arm/kernel/perf_event.c:70:1: warning: symbol 'cpu_hw_events' was not declared. Should it be static?
|   arch/arm/kernel/perf_event.c:1006:1: warning: symbol 'armv6pmu_enable_event' was not declared. Should it be static?
|   arch/arm/kernel/perf_event.c:1113:1: warning: symbol 'armv6pmu_stop' was not declared. Should it be static?
|   arch/arm/kernel/perf_event.c:1956:6: warning: symbol 'armv7pmu_enable_event' was not declared. Should it be static?
|   arch/arm/kernel/perf_event.c:3072:14: warning: incorrect type in argument 1 (different address spaces)
|   arch/arm/kernel/perf_event.c:3072:14:    expected void const volatile [noderef] <asn:1>*<noident>
|   arch/arm/kernel/perf_event.c:3072:14:    got struct frame_tail *tail
|   arch/arm/kernel/perf_event.c:3074:49: warning: incorrect type in argument 2 (different address spaces)
|   arch/arm/kernel/perf_event.c:3074:49:    expected void const [noderef] <asn:1>*from
|   arch/arm/kernel/perf_event.c:3074:49:    got struct frame_tail *tail

This patch resolves these issues so we can live in silence
again.

Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-12-04 11:17:44 +00:00
Will Deacon
43eab87828 ARM: perf: separate PMU backends into multiple files
The ARM perf_event.c file contains all PMU backends and, as new PMUs
are introduced, will continue to grow.

This patch follows the example of x86 and splits the PMU implementations
into separate files which are then #included back into the main
file. Compile-time guards are added to each PMU file to avoid compiling
in code that is not relevant for the version of the architecture which
we are targetting.

Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2010-11-25 16:52:08 +00:00