linux/tools
Alexey Budankov 2566349648 perf record: Enable LBR callstack capture jointly with thread stack
Enable '-j stack' applicability together with '--call-graph dwarf'
option so thread stack data and LBR call stack could be captured
jointly:

  $ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test

Collected LBR call stack can be used to augment DWARF call stack
calculated from the raw thread stack data and to provide more
comprehensive call stack information for cases when collected SIZE is
not enough to cover complete thread stack.

Such cases are typical for workloads that allocate large arrays of data
on its threads stacks or the possible SIZE to collect can't be large
enough due to workload nature or system configuration and this is where
hardware captured LBR call stacks can provide missing stack frames.
Possible DWARF plus LBR call stacks consolidation algorithm description
follows.

With this patch set perf report command UI currently ignores collected
LBR call stack data and still provides DWARF based call stacks
information.

  ===========================================================================

  Overview:

   Legend:

   THS - thread stack
   CTX - thread register context
   SWS - software stack
   SSF - skipped stack frames
   PSS - Perf sample stack

   ip,sp,bp - HW registers values
   d        - allocated stack regions
   kip      - ip address in the kernel space
   K        - captured thread stack size

        THS

       -----
       |   |<-stack bottom
        ...
       |---|
       |ip4|
       |---|         PSS = SWS(THS(K))
       |   |
   --> |   |
   |   |d3 |                  user/
   |   |---|         user PSS kernel PSS
   |   |ip3|         ------   ------
   |   |---|         |SSF |   |SSF |
   |   |   |          ....     ....
   |   |   |         ------   ------
   |   |d2 |         | -1 |   | -1 |
       |---|   user  ------   ------
   K   |ip2|   CTX   |ip3 |   |ip3 |
       |---|         |----|   |----|
   |   |d1 |   ...   |ip2 | , |ip2 |
   |   |---|  |---|  |----|   |----|
   |   |ip1|  |bp0|  |ip1 |   |ip1 |
   |   |---|  |---|  |----|   |----|
   |   |   |  |ip0|->|ip0 |   |ip0 |<-user stack top
   |   |   |  |---|  ------   ------
   |   |   |<-|sp0|<-stack    |kip0|<-kernel stack bottom
   --> -----  -----   top     |----|
                              |kip1|
                              |----|
		              |kip2|
		              |----|
                               ....
			      |    |<-kernel stack top
                              ------

  Algorithm details:

   Legend:

   HWS - hardware stack
   K-SWS - kernel software stack

			 BRANCH
			 TABLE

		 HWS      ip   ip
			  from to
		 ------  -----------
		 |ip7`|  |ip7`|    |
		 |----|  |----|----|
		 |ip6`|  |ip6`|    |
	user PSS |----|  |----|----|
		 |ip5`|  |ip5`|    |
	------   |----|  |----|----|
	| -1 |   |ip4`|  |ip4`|    |
	------   |----|  |----|----|
	|ip3 |~~~|ip3`|  |ip3`|    |
	|----|   |----|  |----|----|
	|ip2 |~~~|ip2`|  |ip2`|    |
	|----| 	 |----|  |----|----|
	|ip1 |~~~|ip1`|  |ip1`|ip0`|
	|----| 	 |----|  -----------
	|ip0 |~~~|ip0`|<---------'
	------   ------

	1. if (sym(ipj) == sym(ipj`)), j=0-3 ===> user PSS
	2. ipj`                      , j=4-7 ===> user PSS

  Augmented PSS = A_SWS(SWS(THS(K)), HWS):

	         user/
       user PSS  kernel PSS

	------   ------
	|ip7`|   |ip7`|<-user PSS bottom
	|----|   |----|
	|ip6`|   |ip6`|
	|----|   |----|
    HWS	|ip5`|   |ip5`|
	|----|   |----|
	|ip4`|   |ip4`|
	------   ------
	|ip3 |   |ip3 |
	|----|   |----|
    SWS |ip2 |   |ip2 |
	|----|   |----|
	|ip1 |   |ip1 |
	|----|   |----|
	|ip0 |   |ip0 |<-user PSS top
	------   ------
		 |kip0|<-kernel PSS bottom
		 |----|
		 |kip1|
	   K-SWS |----|
		 |kip2|
		 |----|
		 |kip3|<-kernel PSS top
		 ------

                  APSS

Committer testing:

Before:

  # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
  unknown branch filter stack, check man page

   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -j, --branch-filter <branch filter mask>
                            branch stack filter modes
  # perf record -g --call-graph dwarf,1024 -j u ls > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.054 MB perf.data (12 samples) ]
  # perf evlist -v
  cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY, sample_regs_user: 0xff0fff, sample_stack_user: 1024
   #

After:

  # perf record -g --call-graph dwarf,1024 -j stack,u ls > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.044 MB perf.data (11 samples) ]
  [root@quaco ~]# perf evlist -v
  cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|BRANCH_STACK|REGS_USER|STACK_USER|DATA_SRC, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: USER|CALL_STACK, sample_regs_user: 0xff0fff, sample_stack_user: 1024
  #

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/e9e00090-66fb-d2a4-c90f-1d12344f7788@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-20 12:18:58 -03:00
..
accounting
arch tools arch x86: Sync asm/cpufeatures.h with the with the kernel 2019-08-20 12:17:09 -03:00
bpf tools: bpftool: add raw_tracepoint_writable prog type to header 2019-07-12 15:14:42 +02:00
build tools build: Add capability-related feature detection 2019-08-12 17:14:14 -03:00
cgroup
crypto
debugging
firewire treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
firmware Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
gpio Bulk GPIO changes for the v5.3 kernel cycle: 2019-07-09 09:07:00 -07:00
hv treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 280 2019-06-05 17:36:36 +02:00
iio Second set of IIO device support, features, cleanups and minor fixes for 5.3. 2019-07-01 10:58:13 +02:00
include tools headers: Synchronize linux/bits.h with the kernel sources 2019-08-20 12:09:21 -03:00
io_uring
kvm/kvm_stat treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 2019-06-19 17:09:53 +02:00
laptop treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 2019-06-19 17:11:22 +02:00
leds
lib tools lib traceevent: Fix "robust" test of do_generate_dynamic_list_file 2019-08-20 12:18:54 -03:00
memory-model tools/memory-model: Improve data-race detection 2019-06-24 09:08:54 -07:00
nfsd
objtool objtool: Improve UACCESS coverage 2019-07-25 08:36:39 +02:00
pci pci-v5.3-changes 2019-07-15 20:44:49 -07:00
pcmcia treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 247 2019-06-19 17:09:08 +02:00
perf perf record: Enable LBR callstack capture jointly with thread stack 2019-08-20 12:18:58 -03:00
power kbuild: create *.mod with full directory path and remove MODVERDIR 2019-07-18 02:19:31 +09:00
scripts perf build: Do not use -Wshadow on gcc < 4.8 2019-07-23 09:04:54 -03:00
spi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 178 2019-05-30 11:29:19 -07:00
testing Bugfixes (arm and x86) and cleanups. 2019-08-09 15:46:29 -07:00
thermal/tmon
time treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 282 2019-06-05 17:36:37 +02:00
usb treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
virtio treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 2019-06-19 17:09:52 +02:00
vm tools/vm/slabinfo: add sorting info to help menu 2019-07-12 11:05:46 -07:00
wmi treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 2019-06-19 17:09:55 +02:00
Makefile tools: Keep list of tools in alphabetical order 2019-08-14 10:59:59 -03:00