linux/tools/perf/Documentation
Stephane Eranian d99c22eabe perf record: Add num-synthesize-threads option
Add an option to control the degree of parallelism of the synthesize_mmap()
code, which scans /proc/PID/task/PID/maps and can be time consuming.
This mimics the way perf top handles the option.
If not specified, it defaults to 1 thread, i.e. the behavior before
this option was added.

On a desktop computer the processing of /proc/PID/task/PID/maps isn't
slow enough to warrant parallel processing and the thread creation has
some cost, hence the default of 1. On a loaded server with
>100 cores it is possible to see synthesis times on the order of
seconds, and in this case having the option is desirable.
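
As an illustration of the idea described above, the following is a minimal Python sketch of splitting a list of PIDs across a configurable number of worker threads, with a default of 1 that keeps the serial behavior. It is not the perf implementation: the names `list_pids`, `count_maps_lines`, `split_evenly` and `synthesize` are hypothetical, the per-PID work is reduced to counting lines in /proc/PID/task/PID/maps, and the even contiguous split is an assumption about how the work could be partitioned. Python threads will not show the same speedup as C threads; the sketch only shows the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_pids(proc_root="/proc"):
    """Return the numeric entries of proc_root (empty if it does not exist)."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    return sorted(int(e.name) for e in root.iterdir() if e.name.isdigit())


def count_maps_lines(pid, proc_root="/proc"):
    """Hypothetical stand-in for per-process synthesis: count maps entries."""
    try:
        # Binary mode avoids decode errors on unusual path names.
        with open(f"{proc_root}/{pid}/task/{pid}/maps", "rb") as f:
            return sum(1 for _ in f)
    except OSError:
        return 0  # process exited or is not readable


def split_evenly(items, nr_threads):
    """Split items into nr_threads contiguous chunks differing by at most one."""
    base, extra = divmod(len(items), nr_threads)
    chunks, start = [], 0
    for i in range(nr_threads):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def synthesize(pids, nr_threads=1, scan=count_maps_lines):
    """Run scan over all pids; nr_threads=1 is the plain serial path."""
    if nr_threads <= 1:
        return sum(scan(pid) for pid in pids)
    with ThreadPoolExecutor(max_workers=nr_threads) as pool:
        partial = pool.map(lambda chunk: sum(scan(p) for p in chunk),
                           split_evenly(pids, nr_threads))
        return sum(partial)
```

On a Linux machine, `synthesize(list_pids(), nr_threads=4)` would scan the readable maps files using four workers; the result should match the single-threaded call as long as no process starts or exits in between.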

As the processing is a synchronization point, it is legitimate to worry if
Amdahl's law will apply to this patch. Profiling with this patch in
place:
https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/
shows:
...
      - 32.59% __perf_event__synthesize_threads
         - 32.54% __event__synthesize_thread
            + 22.13% perf_event__synthesize_mmap_events
            + 6.68% perf_event__get_comm_ids.constprop.0
            + 1.49% process_synthesized_event
            + 1.29% __GI___readdir64
            + 0.60% __opendir
...
That is, the processing (process_synthesized_event) is only 1.49% of
execution time, so there is plenty left to make parallel. This is shown in
the benchmark in this patch:

https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/

  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
   Number of synthesis threads: 1
     Average synthesis took: 127729.000 usec (+- 3372.880 usec)
     Average num. events: 21548.600 (+- 0.306)
     Average time per event 5.927 usec
   Number of synthesis threads: 2
     Average synthesis took: 88863.500 usec (+- 385.168 usec)
     Average num. events: 21552.800 (+- 0.327)
     Average time per event 4.123 usec
   Number of synthesis threads: 3
     Average synthesis took: 83257.400 usec (+- 348.617 usec)
     Average num. events: 21553.200 (+- 0.327)
     Average time per event 3.863 usec
   Number of synthesis threads: 4
     Average synthesis took: 75093.000 usec (+- 422.978 usec)
     Average num. events: 21554.200 (+- 0.200)
     Average time per event 3.484 usec
   Number of synthesis threads: 5
     Average synthesis took: 64896.600 usec (+- 353.348 usec)
     Average num. events: 21558.000 (+- 0.000)
     Average time per event 3.010 usec
   Number of synthesis threads: 6
     Average synthesis took: 59210.200 usec (+- 342.890 usec)
     Average num. events: 21560.000 (+- 0.000)
     Average time per event 2.746 usec
   Number of synthesis threads: 7
     Average synthesis took: 54093.900 usec (+- 306.247 usec)
     Average num. events: 21562.000 (+- 0.000)
     Average time per event 2.509 usec
   Number of synthesis threads: 8
     Average synthesis took: 48938.700 usec (+- 341.732 usec)
     Average num. events: 21564.000 (+- 0.000)
     Average time per event 2.269 usec

The average time per synthesized event goes from 5.927 usec with 1
thread to 2.269 usec with 8, a speedup of roughly 2.6x. This isn't a linear
speedup, as not all of the synthesis code has been made parallel. If the
synthesis time were about 10 seconds, then using 8 threads may bring this
down to less than 4 seconds.
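
The arithmetic behind these figures can be checked with a short Python sketch. The timings are copied from the benchmark output above; the function names are hypothetical, and fitting a single parallel fraction with Amdahl's law is a simplifying assumption, not a claim made by the patch.

```python
# Average synthesis time in usec per thread count, from the benchmark above.
SYNTHESIS_USEC = {
    1: 127729.000,
    2: 88863.500,
    3: 83257.400,
    4: 75093.000,
    5: 64896.600,
    6: 59210.200,
    7: 54093.900,
    8: 48938.700,
}


def speedup(threads):
    """Measured speedup relative to the single-threaded run."""
    return SYNTHESIS_USEC[1] / SYNTHESIS_USEC[threads]


def parallel_fraction(threads):
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / n).

    Solving for p gives p = (1 - 1/S) / (1 - 1/n). Assumes threads > 1.
    """
    s = speedup(threads)
    return (1 - 1 / s) / (1 - 1 / threads)


def projected_seconds(baseline_seconds, threads):
    """Scale a hypothetical single-threaded time by the measured speedup."""
    return baseline_seconds / speedup(threads)


if __name__ == "__main__":
    for n in sorted(SYNTHESIS_USEC):
        print(f"{n} thread(s): speedup {speedup(n):.2f}x")
    print(f"implied parallel fraction at 8 threads: {parallel_fraction(8):.2f}")
    print(f"10 s baseline with 8 threads: {projected_seconds(10, 8):.2f} s")
```

Under these assumptions the 8-thread run is about 2.6 times faster than the serial one, which is consistent with a 10 second synthesis dropping below 4 seconds.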

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Jones <tonyj@suse.de>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-23 11:10:41 -03:00
android.txt perf tools: Update android build documentation 2016-07-04 20:27:27 -03:00
asciidoc.conf perf docs: Allow man page date to be specified 2019-09-27 09:26:14 -03:00
asciidoctor-extensions.rb perf Documentation: Support for asciidoctor 2018-04-26 13:47:10 -03:00
build-xed.txt perf script: Add --insn-trace for instruction decoding 2018-10-24 15:29:50 -03:00
Build.txt perf tools: Add doc about how to build perf with Asan and UBSan 2019-03-19 16:52:04 -03:00
callchain-overhead-calculation.txt perf tools: Document --children option in more detail 2015-04-29 10:38:06 -03:00
db-export.txt perf db-export: Add brief documentation 2019-06-05 09:47:57 -03:00
examples.txt perf record: Remove -f/--force option 2013-07-08 17:37:25 -03:00
intel-bts.txt perf tools: Add Intel BTS support 2015-08-21 11:34:10 -03:00
intel-pt.txt perf intel-pt: Update intel-pt.txt file with new location of the documentation 2020-03-11 11:00:33 -03:00
itrace.txt perf auxtrace: Add an option to synthesize callchains for regular events 2020-04-16 12:19:15 -03:00
jit-interface.txt perf symbols: Add description of JIT interface 2012-08-13 14:55:02 -03:00
jitdump-specification.txt perf docs: Correct and clarify jitdump spec 2019-09-30 17:29:51 -03:00
Makefile perf doc: allow ASCIIDOC_EXTRA to be an argument 2020-04-18 09:05:00 -03:00
manpage-1.72.xsl
manpage-base.xsl
manpage-bold-literal.xsl
manpage-normal.xsl
manpage-suppress-sp.xsl
perf-annotate.txt perf tools: Support --prefix/--prefix-strip 2020-01-14 12:02:19 -03:00
perf-archive.txt perf archive: Remove duplicated 'runs' in man page 2013-12-09 15:21:45 -03:00
perf-bench.txt perf bench: Add event synthesis benchmark 2020-04-16 12:19:12 -03:00
perf-buildid-cache.txt perf buildid-cache: Support --purge-all option 2018-04-26 09:30:26 -03:00
perf-buildid-list.txt perf report: Accept fifos as input file 2011-12-23 17:01:03 -02:00
perf-c2c.txt perf c2c: Add option to enable the LBR stitching approach 2020-04-18 09:05:01 -03:00
perf-config.txt perf callchain: Update docs regarding kernel/user space unwinding 2020-03-25 16:13:21 -03:00
perf-data.txt perf tools: Correct title markers for asciidoctor 2018-03-07 10:26:32 -03:00
perf-diff.txt perf diff: Report noisy for cycles diff 2019-10-11 10:57:00 -03:00
perf-evlist.txt perf evlist: Document missing --force option 2017-11-16 14:50:07 -03:00
perf-ftrace.txt perf tools: Correct title markers for asciidoctor 2018-03-07 10:26:32 -03:00
perf-help.txt
perf-inject.txt perf intel-pt: Add Intel PT man page references 2020-03-11 11:00:09 -03:00
perf-intel-pt.txt perf intel-pt: Add Intel PT man page references 2020-03-11 11:00:09 -03:00
perf-kallsyms.txt perf tools: Correct title markers for asciidoctor 2018-03-07 10:26:32 -03:00
perf-kmem.txt perf kmem: Document a missing option & an argument 2018-02-16 14:55:42 -03:00
perf-kvm.txt perf kvm: Clarify the 'perf kvm' -i and -o command line options 2019-12-02 15:38:59 -03:00
perf-list.txt perf parser: Add support to specify rXXX event with pmu 2020-04-18 09:05:00 -03:00
perf-lock.txt perf lock: Document missing options 2017-11-16 14:50:04 -03:00
perf-mem.txt perf mem/c2c: Fix perf_mem_events to support powerpc 2019-02-04 11:32:14 -03:00
perf-probe.txt perf-probe: Add user memory access attribute support 2019-05-25 23:04:42 -04:00
perf-record.txt perf record: Add num-synthesize-threads option 2020-04-23 11:10:41 -03:00
perf-report.txt perf report: Add option to enable the LBR stitching approach 2020-04-18 09:05:01 -03:00
perf-sched.txt perf sched timehist: Add support for filtering on CPU 2020-01-06 11:46:09 -03:00
perf-script-perl.txt perf tools: Correct title markers for asciidoctor 2018-03-07 10:26:32 -03:00
perf-script-python.txt perf script python: Add dict fields introduction to Documentation 2018-06-06 15:40:10 -03:00
perf-script.txt perf script: Add option to enable the LBR stitching approach 2020-04-18 09:05:01 -03:00
perf-stat.txt perf stat: Improve runtime stat for interval mode 2020-04-23 11:03:46 -03:00
perf-test.txt perf test: Add -F/--dont-fork option 2016-06-30 18:27:45 -03:00
perf-timechart.txt perf timechart: Document missing --force option 2017-11-16 14:50:06 -03:00
perf-top.txt perf top: Add option to enable the LBR stitching approach 2020-04-18 09:05:01 -03:00
perf-trace.txt perf trace: Introduce --errno-summary 2019-10-15 13:03:49 -03:00
perf-version.txt perf version: Add man page 2018-04-02 13:52:23 -03:00
perf.data-directory-format.txt perf record: Put a copy of kcore into the perf.data directory 2019-11-06 15:43:05 -03:00
perf.data-file-format.txt perf header: Support CPU PMU capabilities 2020-04-18 09:05:00 -03:00
perf.txt perf tool: Provide an option to print perf_event_open args and return value 2019-11-12 08:32:27 -03:00
perfconfig.example perf config: Show default report configuration in example and docs 2016-09-01 09:44:13 -03:00
tips.txt perf tools: Fix typos / broken sentences 2019-07-02 16:08:16 -03:00