linux/arch/x86/events/intel
Jiri Olsa df6c3db8d3 perf/x86/intel: Add proper condition to run sched_task callbacks
We have 2 functions using the same sched_task callback:

  - PEBS drain for free running counters
  - LBR save/restore

Both of them are called from intel_pmu_sched_task() and
either of them can be unintentionally triggered when the
other one is configured to run.

Say PEBS drain is configured in the sched_task callback
for the event. The callback itself (intel_pmu_sched_task())
then also runs the LBR save/restore code, which we did not
ask for, because intel_pmu_sched_task() does not check
which work was requested.

This can cost extra cycles in some perf monitoring, for
example when we monitor a PEBS event without LBR data.

  # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000

  (We need a PEBS, non-freq/non-timestamp event to enable
   the sched_task callback)

The perf stat of cycles and msr:write_msr for the above
command before the change:
  ...
  Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                 ./perf bench sched pipe -l 1000000' (5 runs):

    18,519,557,441      cycles:k
        91,195,527      msr:write_msr

      29.334476406 seconds time elapsed

And after the change:
  ...
  Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                 ./perf bench sched pipe -l 1000000' (5 runs):

    18,704,973,540      cycles:k
        27,184,720      msr:write_msr

      16.977875900 seconds time elapsed

There's no effect on cycles:k because the sched_task happens
with events switched off. However, the msr:write_msr tracepoint
count (down about 70%) together with the roughly 42% drop in
elapsed time shows the improvement.
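
The percentages can be rechecked from the two perf stat
outputs above. A minimal sketch (pct_change() is a helper
introduced here for illustration, not part of perf or the
kernel):

```c
/*
 * Relative change from 'before' to 'after', in percent.
 * A negative result means the value went down.
 */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the numbers above, pct_change(91195527, 27184720) is
about -70.2 for msr:write_msr, pct_change(29.334476406,
16.977875900) is about -42.1 for elapsed time, and
pct_change(18519557441, 18704973540) is about +1.0 for
cycles:k.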

Monitoring an LBR event with the extra PEBS drain processing
in the sched_task callback showed just a little speedup, because
the drain function does not do much extra work when there
is no PEBS data.

Add conditions to recognize the configured work that needs
to be done in the x86_pmu's sched_task callback.
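
The idea can be sketched in plain userspace C. This is an
illustrative model only, not the kernel code: struct pmu_state,
its fields and the three functions are hypothetical stand-ins,
and the counters replace the real drain and save/restore work.
Each helper checks whether its own work was requested before
doing anything, so a shared callback no longer runs both.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU PMU state (not the kernel's struct). */
struct pmu_state {
	int pebs_cb_users;	/* events that asked for a PEBS drain */
	int lbr_users;		/* events that use LBR */
	int pebs_drains;	/* times the drain ran (illustration only) */
	int lbr_switches;	/* times LBR save/restore ran */
};

static void pebs_sched_task(struct pmu_state *s, bool sched_in)
{
	/* Drain only on sched-out, and only if some event requested it. */
	if (!sched_in && s->pebs_cb_users > 0)
		s->pebs_drains++;
}

static void lbr_sched_task(struct pmu_state *s, bool sched_in)
{
	(void)sched_in;		/* the model treats both directions alike */

	/* Skip the save/restore entirely when no event uses LBR. */
	if (s->lbr_users == 0)
		return;
	s->lbr_switches++;
}

/* Shared callback: each helper decides for itself whether to run. */
static void sched_task(struct pmu_state *s, bool sched_in)
{
	pebs_sched_task(s, sched_in);
	lbr_sched_task(s, sched_in);
}
```

With only pebs_cb_users set, one sched-out followed by one
sched-in leaves pebs_drains at 1 and lbr_switches at 0, which
is the PEBS-without-LBR case measured above.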

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-21 09:58:39 +02:00
bts.c perf/core: Keep AUX flags in the output handle 2017-03-16 09:51:10 +01:00
core.c perf/x86/intel: Add proper condition to run sched_task callbacks 2017-07-21 09:58:39 +02:00
cqm.c perf/x86/intel/cqm: Use cpuhp_setup_state_cpuslocked() 2017-05-26 10:10:40 +02:00
cstate.c perf/x86/intel: Enable C-state residency events for Apollo Lake 2017-07-18 14:13:40 +02:00
ds.c perf/x86/intel: Add proper condition to run sched_task callbacks 2017-07-21 09:58:39 +02:00
knc.c
lbr.c perf/x86/intel: Add proper condition to run sched_task callbacks 2017-07-21 09:58:39 +02:00
Makefile x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl 2016-07-06 12:51:59 +02:00
p4.c perf/x86/intel/p4: Trival indentation fix, remove space 2016-05-20 09:18:22 +02:00
p6.c
pt.c perf/x86/intel/pt: Allow the disabling of branch tracing 2017-03-30 09:53:49 +02:00
pt.h perf/x86/intel/pt: Allow the disabling of branch tracing 2017-03-30 09:53:49 +02:00
rapl.c perf/x86: Fix Broadwell-EP DRAM RAPL events 2017-05-03 14:40:37 +02:00
uncore_nhmex.c
uncore_snb.c perf/x86/uncore: Fix crash by removing bogus event_list[] handling for SNB client uncore IMC 2016-11-16 09:46:35 +01:00
uncore_snbep.c perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code 2017-01-11 12:13:21 +01:00
uncore.c perf/x86/intel/uncore: Fix wrong box pointer check 2017-06-29 21:28:13 +02:00
uncore.h x86/events: Remove last remnants of old filenames 2017-03-01 11:27:26 +01:00