linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-21 02:21:36 +00:00

History

Stephane Eranian d99c22eabe perf record: Add num-synthesize-threads option

To control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. Mimic perf top way of handling the option. If not specified will default to 1 thread, i.e. default behavior before this option.

On a desktop computer the processing of /proc/PID/task/PID/maps isn't slow enough to warrant parallel processing and the thread creation has some cost - hence the default of 1. On a loaded server with >100 cores it is possible to see synthesis times in the order of seconds and in this case having the option is desirable.

As the processing is a synchronization point, it is legitimate to worry if Amdahl's law will apply to this patch. Profiling with this patch in place:

https://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.com/

shows:

...
  - 32.59% __perf_event__synthesize_threads
     - 32.54% __event__synthesize_thread
        + 22.13% perf_event__synthesize_mmap_events
        + 6.68% perf_event__get_comm_ids.constprop.0
        + 1.49% process_synthesized_event
        + 1.29% __GI___readdir64
        + 0.60% __opendir
...

That is the processing is 1.49% of execution time and there is plenty to make parallel. This is shown in the benchmark in this patch:

https://lore.kernel.org/lkml/20200415054050.31645-2-irogers@google.com/

  Computing performance of multi threaded perf event synthesis by
  synthesizing events on CPU 0:
   Number of synthesis threads: 1
     Average synthesis took: 127729.000 usec (+- 3372.880 usec)
     Average num. events: 21548.600 (+- 0.306)
     Average time per event 5.927 usec
   Number of synthesis threads: 2
     Average synthesis took: 88863.500 usec (+- 385.168 usec)
     Average num. events: 21552.800 (+- 0.327)
     Average time per event 4.123 usec
   Number of synthesis threads: 3
     Average synthesis took: 83257.400 usec (+- 348.617 usec)
     Average num. events: 21553.200 (+- 0.327)
     Average time per event 3.863 usec
   Number of synthesis threads: 4
     Average synthesis took: 75093.000 usec (+- 422.978 usec)
     Average num. events: 21554.200 (+- 0.200)
     Average time per event 3.484 usec
   Number of synthesis threads: 5
     Average synthesis took: 64896.600 usec (+- 353.348 usec)
     Average num. events: 21558.000 (+- 0.000)
     Average time per event 3.010 usec
   Number of synthesis threads: 6
     Average synthesis took: 59210.200 usec (+- 342.890 usec)
     Average num. events: 21560.000 (+- 0.000)
     Average time per event 2.746 usec
   Number of synthesis threads: 7
     Average synthesis took: 54093.900 usec (+- 306.247 usec)
     Average num. events: 21562.000 (+- 0.000)
     Average time per event 2.509 usec
   Number of synthesis threads: 8
     Average synthesis took: 48938.700 usec (+- 341.732 usec)
     Average num. events: 21564.000 (+- 0.000)
     Average time per event 2.269 usec

Where average time per synthesized event goes from 5.927 usec with 1 thread to 2.269 usec with 8. This isn't a linear speed up as not all of synthesize code has been made parallel. If the synthesis time was about 10 seconds then using 8 threads may bring this down to less than 4.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Jones <tonyj@suse.de>
Cc: yuzhoujian <yuzhoujian@didichuxing.com>
Link: http://lore.kernel.org/lkml/20200422155038.9380-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2020-04-23 11:10:41 -03:00
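
The speedup figures quoted in the commit message can be checked with a little arithmetic. The sketch below is not part of the patch; it copies the "Average synthesis took" values from the benchmark output and derives the speedup per thread count, plus a projection for a 10 second baseline. The function names are hypothetical, and the Amdahl's law fit assumes a simple two-part model (a serial part plus a perfectly parallel part), which the real synthesis code may not follow.

```python
# Average synthesis times in usec, copied from the benchmark output
# in the commit message (thread count -> time).
SYNTHESIS_USEC = {
    1: 127729.000,
    2: 88863.500,
    3: 83257.400,
    4: 75093.000,
    5: 64896.600,
    6: 59210.200,
    7: 54093.900,
    8: 48938.700,
}


def speedup(threads):
    """Measured speedup relative to the single-threaded run."""
    return SYNTHESIS_USEC[1] / SYNTHESIS_USEC[threads]


def amdahl_parallel_fraction(measured_speedup, threads):
    """Parallel fraction p implied by Amdahl's law, S = 1 / ((1 - p) + p / n).

    Assumption: the workload splits cleanly into a serial part and a
    perfectly parallel part. Solving for p gives (1 - 1/S) / (1 - 1/n).
    """
    return (1.0 - 1.0 / measured_speedup) / (1.0 - 1.0 / threads)


def projected_seconds(baseline_seconds, threads):
    """Scale a hypothetical single-threaded time by the measured speedup."""
    return baseline_seconds / speedup(threads)


for n in sorted(SYNTHESIS_USEC):
    print(f"{n} thread(s): speedup {speedup(n):.2f}x")

print(f"implied parallel fraction at 8 threads: "
      f"{amdahl_parallel_fraction(speedup(8), 8):.2f}")
print(f"10 s baseline with 8 threads: {projected_seconds(10.0, 8):.2f} s")
```

With 8 threads the measured speedup is roughly 2.6x, which is where the commit message's estimate of a 10 second synthesis dropping below 4 seconds comes from.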
..
android.txt	perf tools: Update android build documentation	2016-07-04 20:27:27 -03:00
asciidoc.conf	perf docs: Allow man page date to be specified	2019-09-27 09:26:14 -03:00
asciidoctor-extensions.rb	perf Documentation: Support for asciidoctor	2018-04-26 13:47:10 -03:00
build-xed.txt	perf script: Add --insn-trace for instruction decoding	2018-10-24 15:29:50 -03:00
Build.txt	perf tools: Add doc about how to build perf with Asan and UBSan	2019-03-19 16:52:04 -03:00
callchain-overhead-calculation.txt	perf tools: Document --children option in more detail	2015-04-29 10:38:06 -03:00
db-export.txt	perf db-export: Add brief documentation	2019-06-05 09:47:57 -03:00
examples.txt	perf record: Remove -f/--force option	2013-07-08 17:37:25 -03:00
intel-bts.txt	perf tools: Add Intel BTS support	2015-08-21 11:34:10 -03:00
intel-pt.txt	perf intel-pt: Update intel-pt.txt file with new location of the documentation	2020-03-11 11:00:33 -03:00
itrace.txt	perf auxtrace: Add an option to synthesize callchains for regular events	2020-04-16 12:19:15 -03:00
jit-interface.txt	perf symbols: Add description of JIT interface	2012-08-13 14:55:02 -03:00
jitdump-specification.txt	perf docs: Correct and clarify jitdump spec	2019-09-30 17:29:51 -03:00
Makefile	perf doc: allow ASCIIDOC_EXTRA to be an argument	2020-04-18 09:05:00 -03:00
manpage-1.72.xsl
manpage-base.xsl
manpage-bold-literal.xsl
manpage-normal.xsl
manpage-suppress-sp.xsl
perf-annotate.txt	perf tools: Support --prefix/--prefix-strip	2020-01-14 12:02:19 -03:00
perf-archive.txt	perf archive: Remove duplicated 'runs' in man page	2013-12-09 15:21:45 -03:00
perf-bench.txt	perf bench: Add event synthesis benchmark	2020-04-16 12:19:12 -03:00
perf-buildid-cache.txt	perf buildid-cache: Support --purge-all option	2018-04-26 09:30:26 -03:00
perf-buildid-list.txt	perf report: Accept fifos as input file	2011-12-23 17:01:03 -02:00
perf-c2c.txt	perf c2c: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
perf-config.txt	perf callchain: Update docs regarding kernel/user space unwinding	2020-03-25 16:13:21 -03:00
perf-data.txt	perf tools: Correct title markers for asciidoctor	2018-03-07 10:26:32 -03:00
perf-diff.txt	perf diff: Report noisy for cycles diff	2019-10-11 10:57:00 -03:00
perf-evlist.txt	perf evlist: Document missing --force option	2017-11-16 14:50:07 -03:00
perf-ftrace.txt	perf tools: Correct title markers for asciidoctor	2018-03-07 10:26:32 -03:00
perf-help.txt
perf-inject.txt	perf intel-pt: Add Intel PT man page references	2020-03-11 11:00:09 -03:00
perf-intel-pt.txt	perf intel-pt: Add Intel PT man page references	2020-03-11 11:00:09 -03:00
perf-kallsyms.txt	perf tools: Correct title markers for asciidoctor	2018-03-07 10:26:32 -03:00
perf-kmem.txt	perf kmem: Document a missing option & an argument	2018-02-16 14:55:42 -03:00
perf-kvm.txt	perf kvm: Clarify the 'perf kvm' -i and -o command line options	2019-12-02 15:38:59 -03:00
perf-list.txt	perf parser: Add support to specify rXXX event with pmu	2020-04-18 09:05:00 -03:00
perf-lock.txt	perf lock: Document missing options	2017-11-16 14:50:04 -03:00
perf-mem.txt	perf mem/c2c: Fix perf_mem_events to support powerpc	2019-02-04 11:32:14 -03:00
perf-probe.txt	perf-probe: Add user memory access attribute support	2019-05-25 23:04:42 -04:00
perf-record.txt	perf record: Add num-synthesize-threads option	2020-04-23 11:10:41 -03:00
perf-report.txt	perf report: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
perf-sched.txt	perf sched timehist: Add support for filtering on CPU	2020-01-06 11:46:09 -03:00
perf-script-perl.txt	perf tools: Correct title markers for asciidoctor	2018-03-07 10:26:32 -03:00
perf-script-python.txt	perf script python: Add dict fields introduction to Documentation	2018-06-06 15:40:10 -03:00
perf-script.txt	perf script: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
perf-stat.txt	perf stat: Improve runtime stat for interval mode	2020-04-23 11:03:46 -03:00
perf-test.txt	perf test: Add -F/--dont-fork option	2016-06-30 18:27:45 -03:00
perf-timechart.txt	perf timechart: Document missing --force option	2017-11-16 14:50:06 -03:00
perf-top.txt	perf top: Add option to enable the LBR stitching approach	2020-04-18 09:05:01 -03:00
perf-trace.txt	perf trace: Introduce --errno-summary	2019-10-15 13:03:49 -03:00
perf-version.txt	perf version: Add man page	2018-04-02 13:52:23 -03:00
perf.data-directory-format.txt	perf record: Put a copy of kcore into the perf.data directory	2019-11-06 15:43:05 -03:00
perf.data-file-format.txt	perf header: Support CPU PMU capabilities	2020-04-18 09:05:00 -03:00
perf.txt	perf tool: Provide an option to print perf_event_open args and return value	2019-11-12 08:32:27 -03:00
perfconfig.example	perf config: Show default report configuration in example and docs	2016-09-01 09:44:13 -03:00
tips.txt	perf tools: Fix typos / broken sentences	2019-07-02 16:08:16 -03:00