Commit Graph

22723 Commits

Author SHA1 Message Date
Jiri Olsa
bb91a073ed perf tools: Allow to build with -ltcmalloc
By using "make TCMALLOC=1" you can enable perf to be build for usage
with libtcmalloc.so (gperftools).

Get heap profile (tools/perf directory):

  $ <install gperftools>
  $ make TCMALLOC=1 DEBUG=1
  $ HEAPPROFILE=/tmp/heapprof ./perf ...
  $ pprof ./perf /tmp/heapprof.000*
  (pprof) top
  Total: 2335.5 MB
    1735.1  74.3%  74.3%   1735.1  74.3% memdup
     402.0  17.2%  91.5%    402.0  17.2% zalloc
     140.2   6.0%  97.5%    145.8   6.2% map__new
      33.6   1.4%  98.9%     33.6   1.4% symbol__new
      12.4   0.5%  99.5%     12.4   0.5% alloc_event
       6.2   0.3%  99.7%      6.2   0.3% nsinfo__new
       5.5   0.2% 100.0%      5.5   0.2% nsinfo__copy
       0.3   0.0% 100.0%      0.3   0.0% dso__new
       0.1   0.0% 100.0%      0.1   0.0% do_read_string
       0.0   0.0% 100.0%      0.0   0.0% __GI__IO_file_doallocate

See callstack:
  $ pprof --pdf ./perf /tmp/heapprof.00* > callstack.pdf
  $ pprof --web ./perf /tmp/heapprof.00*

Committer testing:

Install gperftools, on fedora:

  # dnf install gperftools-devel

Then build:

 $ make TCMALLOC=1 DEBUG=1 -C tools/perf O=/tmp/build/perf install-bin

Verify that it linked against the right library:

  $ ldd ~/bin/perf | grep tcma
	libtcmalloc.so.4 => /lib64/libtcmalloc.so.4 (0x00007fb2953a7000)
  $

Run 'perf trace' system wide for 1 minute:

  # HEAPPROFILE=/tmp/heapprof perf trace -a sleep 1m
  <SNIP>
   59985.524 ( 0.006 ms): Web Content/20354 recvmsg(fd: 9<socket:[1762817]>, msg: 0x7ffee5fdafb0) = -1 EAGAIN (Resource temporarily unavailable)
   59985.536 ( 0.005 ms): Web Content/20354 recvmsg(fd: 9<socket:[1762817]>, msg: 0x7ffee5fdafc0) = -1 EAGAIN (Resource temporarily unavailable)
   59981.956 (10.143 ms): SCTP timer/21716  ... [continued]: select())                            = 0 (Timeout)
   59985.549 (         ): Web Content/20354 poll(ufds: 0x7f1df38af180, nfds: 3, timeout_msecs: 4294967295) ...
       0.926 (59999.481 ms): sleep/29764  ... [continued]: nanosleep())                           = 0
   59992.133 (         ): SCTP timer/21716 select(tvp: 0x7ff5bf7fee80)                            ...
   60000.477 ( 0.009 ms): sleep/29764 close(fd: 1)                                                = 0
   60000.493 ( 0.005 ms): sleep/29764 close(fd: 2)                                                = 0
   60000.514 (         ): sleep/29764 exit_group()                                                = ?
  Dumping heap profile to /tmp/heapprof.0001.heap (Exiting, 3 MB in use)
[root@quaco ~]#

Install pprof:

  # dnf install pprof

And run it:

  # pprof ~/bin/perf /tmp/heapprof.0001.heap
  Using local file /root/bin/perf.
  Using local file /tmp/heapprof.0001.heap.
  Welcome to pprof!  For help, type 'help'.
  (pprof) top
  Total: 4.0 MB
       1.7  42.0%  42.0%      2.2  54.1% map__new
       0.9  23.3%  65.3%      0.9  23.3% zalloc
       0.5  11.4%  76.7%      0.5  11.4% dso__new
       0.2   5.6%  82.3%      0.3   8.5% trace__sys_enter
       0.2   4.9%  87.2%      0.2   4.9% __GI___strdup
       0.2   3.8%  91.0%      0.2   3.8% new_term
       0.1   2.2%  93.2%      0.4  10.1% __perf_pmu__new_alias
       0.0   1.0%  94.3%      0.0   1.2% event_read_fields
       0.0   0.8%  95.1%      0.0   0.8% nsinfo__new
       0.0   0.7%  95.8%      0.1   3.2% trace__read_syscall_info
  (pprof)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191013151427.11941-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-15 08:36:22 -03:00
Christian Kellner
2def297ec7 pidfd: add tests for NSpid info in fdinfo
Add a test that checks that if pid namespaces are configured the fdinfo
file of a pidfd contains an NSpid: entry containing the process id in
the current and additionally all nested namespaces. In the case that
a pidfd is from a pid namespace not in the same namespace hierarchy as
the process accessing the fdinfo file, ensure the 'NSpid' shows 0 for
that pidfd, analogous to the 'Pid' entry.

Signed-off-by: Christian Kellner <christian@kellner.me>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20191014162034.2185-2-ckellner@redhat.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-10-15 12:17:11 +02:00
Prarit Bhargava
1aa7177cdc tools/power/x86/intel-speed-select: Implement base-freq commands on CascadeLake-N
Add functionality for base-freq info|enable|disable info on CascadeLake-N.

Sample output:
Intel(R) Speed Select Technology
Executing on CPU model:85[0x55]
 package-0
  die-0
    cpu-0
      speed-select-base-freq
        high-priority-base-frequency(MHz):2700000
        high-priority-cpu-mask:00000000,0000e8c0
        high-priority-cpu-list:6,7,11,13,14,15
        low-priority-base-frequency(MHz):2100000
 package-1
  die-0
    cpu-20
      speed-select-base-freq
        high-priority-base-frequency(MHz):2700000
        high-priority-cpu-mask:0000000e,8c000000
        high-priority-cpu-list:26,27,31,33,34,35
        low-priority-base-frequency(MHz):2100000

The enable command always returns success, and the disable command always
returns failed because SST-BF cannot be enabled or disabled from the OS on
CascadeLake-N.

Enable command also have support for --auto|-a option, which sets cpufreq
scaling_min to max, so that the high priority base frequency can be the
required minimum for high priority cores. Disable command with -a/--auto
option reset the setting back to the min frequency.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:30 +03:00
Prarit Bhargava
062e4aac92 tools/power/x86/intel-speed-select: Implement 'perf-profile info' on CascadeLake-N
Add functionality for "perf-profile info" on CascadeLake-N.

Sample output:
intel-speed-select perf-profile info
Intel(R) Speed Select Technology
Executing on CPU model:85[0x55]
 package-0
  die-0
    cpu-0
      perf-profile-level-0
        cpu-count:20
        enable-cpu-mask:00000000,000fffff
        enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
        thermal-design-power-ratio:23
        base-frequency(MHz):2300
        speed-select-turbo-freq:unsupported
        speed-select-base-freq:enabled
        speed-select-base-freq
          high-priority-base-frequency(MHz):2700000
          high-priority-cpu-mask:00000000,0000e8c0
          high-priority-cpu-list:6,7,11,13,14,15
          low-priority-base-frequency(MHz):2100000
 package-1
  die-0
    cpu-20
      perf-profile-level-0
        cpu-count:20
        enable-cpu-mask:000000ff,fff00000
        enable-cpu-list:20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
        thermal-design-power-ratio:23
        base-frequency(MHz):2300
        speed-select-turbo-freq:unsupported
        speed-select-base-freq:enabled
        speed-select-base-freq
          high-priority-base-frequency(MHz):2700000
          high-priority-cpu-mask:0000000e,8c000000
          high-priority-cpu-list:26,27,31,33,34,35
          low-priority-base-frequency(MHz):2100000

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:30 +03:00
Prarit Bhargava
c829f0ef7b tools/power/x86/intel-speed-select: Implement CascadeLake-N help and command functions structures
CascadeLake-N only supports SST-BF and needs some of the perf-profile
commands, and the base-freq commands.

Add help functions, and create an empty command structures (the functions
will be implemented later in this patchset).  Call these functions
when running on CascadeLake-N.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Prarit Bhargava
1c1d935c84 tools/power/x86/intel-speed-select: Add check for CascadeLake-N models
Three CascadeLake-N models (6252N, 6230N, and 5218N) have SST-PBF support.

Return an error if the CascadeLake processor is not one of these specific
models.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Prarit Bhargava
210369dc73 tools/power/x86/intel-speed-select: Make process_command generic
Make the process_command take any help command and command list.  This
will make it easier to help commands and a command list for CascadeLake-N.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Prarit Bhargava
ce1326a2f9 tools/power/x86/intel-speed-select: Add int argument to command functions
The current code structure has similar but separate command functions for
the enable and disable operations.  This can be improved by adding an int
argument to the command function structure, and interpreting 1 as enable
and 0 as disable.  This change results in the removal of the disable
command functions.

Add int argument to the command function structure.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Srinivas Pandruvada
4e26fabfe1 tools/power/x86/intel-speed-select: Refuse to disable core-power when getting used
The turbo-freq feature is dependent on the core-power feature. If the
core-power feature is disabled while the turbo-freq feature is enabled,
this will break the turbo-freq feature. This is a firmware limitation,
where they can't return error under this scenario.

So when trying to disable core-power, make sure that the turbo-freq
feature is not enabled. If it enabled, return error if user is trying to
disable the core-power feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Srinivas Pandruvada
a6a82f9bcd tools/power/x86/intel-speed-select: Turbo-freq feature auto mode
Introduce --auto|-a option to turbo-freq enable feature, so that it
does in one step for users who are OK by setting all passed target cores
as high priority and set in CLOS 0 and remaining in CLOS 3. In this way,
users don't have to take multiple steps to enable turbo-freq feature. For
users who want more fine grain control, they can always use core-power
feature to set custom CLOS configuration and assignment.

While here also print the error to output when clos configuration fails.

For example
intel-speed-select -c 0-4 turbo-freq enable --auto

The above command will enable turbo-freq and core-power feature. Also
mark CPU 0 to CPU 4 as high priority.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Srinivas Pandruvada
354bd06f40 tools/power/x86/intel-speed-select: Base-freq feature auto mode
Introduce --auto|-a option to base-freq enable feature, so that it
does in one step for users who are OK by setting all cores with higher
base frequency to be set in CLOS 0 and remaining in CLOS 3. This option
also sets corresponding clos.min to CLOS 0 and CLOS3. In this way, users
don't have to take multiple steps to enable base-freq feature. For users
who want more fine grain control, they can always use core-power feature
to set custom CLOS configuration and assignment.

Also adjust cpufreq/scaling_min_freq for higher and lower priority cores.

For example user can use:
intel-speed-select base-freq enable --auto

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:29 +03:00
Srinivas Pandruvada
abd120e3bd tools/power/x86/intel-speed-select: Remove warning for unused result
Fix warning for:
isst-config.c: In function ‘set_cpu_online_offline’:
isst-config.c:221:3: warning: ignoring return value of ‘write’,
declared with attribute warn_unused_result [-Wunused-result]
   write(fd, "1\n", 2);

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2019-10-15 11:02:28 +03:00
Ingo Molnar
39b656ee9f Merge tag 'perf-core-for-mingo-5.5-20191011' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf trace:

  Arnaldo Carvalho de Melo:

  - Reuse the strace-like syscall_arg_fmt->scnprintf() beautification routines
    (convert integer arguments into strings, like open flags, etc) in tracepoint
    arguments.

    For now the type based scnprintf routines (pid_t, umode_t, etc) and the
    ones based in well known arg name based ("fd", etc) gets associated with
    tracepoint args of that type.

    A tracepoint only arg, "msr", for the msr:{write,read}_msr gets added as
    an initial step.

  - Introduce syscall_arg_fmt->strtoul() methods to be the reverse operation
    of ->scnprintf(), i.e. to go from a string to an integer.

  - Implement --filter, just like in 'perf record', that affects the tracepoint
    events specied thus far in the command line, use the ->strtoul() methods
    to allow strings in tables associated with beautifiers to the integers
    the in-kernel tracepoint (eBPF later) filters expect, e.g.:

     # perf trace --max-events 1 -e sched:*ipi --filter="cpu==1 || cpu==2"
      0.000 as/24630 sched:sched_wake_idle_without_ipi(cpu: 1)
     #

     # perf trace --max-events 1 --max-stack=32 -e msr:* --filter="msr==IA32_TSC_DEADLINE"
      207.000 cc1/19963 msr:write_msr(msr: IA32_TSC_DEADLINE, val: 5442316760822)
                                        do_trace_write_msr ([kernel.kallsyms])
                                        do_trace_write_msr ([kernel.kallsyms])
                                        lapic_next_deadline ([kernel.kallsyms])
                                        clockevents_program_event ([kernel.kallsyms])
                                        hrtimer_interrupt ([kernel.kallsyms])
                                        smp_apic_timer_interrupt ([kernel.kallsyms])
                                        apic_timer_interrupt ([kernel.kallsyms])
                                        [0x6ff66c] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x7047c3] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x707708] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        execute_one_pass (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x4f3d37] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x4f3d49] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        execute_pass_list (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        cgraph_node::expand (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x2625b4] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        symbol_table::finalize_compilation_unit (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x5ae8b9] (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        toplev::main (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        main (/usr/lib/gcc-cross/alpha-linux-gnu/8/cc1)
                                        [0x26b6a] (/usr/lib/x86_64-linux-gnu/libc-2.29.so)
     #
     # perf trace --max-events 8 -e msr:* --filter="msr==IA32_SPEC_CTRL"
         0.000 :13281/13281 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
         0.063 migration/3/25 msr:write_msr(msr: IA32_SPEC_CTRL)
         0.217 kworker/u16:1-/4826 msr:write_msr(msr: IA32_SPEC_CTRL)
         0.687 rcu_sched/11 msr:write_msr(msr: IA32_SPEC_CTRL)
         0.696 :13280/13280 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
         0.305 :13281/13281 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
         0.355 :13274/13274 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
         2.743 kworker/u16:0-/6711 msr:write_msr(msr: IA32_SPEC_CTRL)
     #
     # perf trace --max-events 8 --cpu 1 -e msr:* --filter="msr!=IA32_SPEC_CTRL && msr!=IA32_TSC_DEADLINE && msr != FS_BASE"
           0.000 mtr-packet/30819 msr:write_msr(msr: 0x830, val: 68719479037)
           0.096 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
         238.925 mtr-packet/30819 msr:write_msr(msr: 0x830, val: 8589936893)
         511.010 :0/0 msr:write_msr(msr: 0x830, val: 68719479037)
        1005.052 :0/0 msr:read_msr(msr: IA32_TSC_ADJUST)
        1235.131 CPU 0/KVM/3750 msr:write_msr(msr: 0x830, val: 4294969595)
        1235.195 CPU 0/KVM/3750 msr:read_msr(msr: IA32_SYSENTER_ESP, val: -2199023037952)
        1235.201 CPU 0/KVM/3750 msr:read_msr(msr: IA32_APICBASE, val: 4276096000)
     #

  - Default to not using libtraceevent and its plugins for beautifying
    tracepoint arguments, since now we're reusing the strace-like beatufiers.
    Use --libtraceevent_print (using just --libtrace is unambiguous and can
    be used as a short hand) to go back to those beautifiers.

    This will help in the transition, as can be seen in some of the sched tracepoints
    that still need some work in the libbeauty based mode:

    # trace --no-inherit -e msr:*,*sleep,sched:* sleep 1
         0.000 (         ): sched:sched_waking(comm: "trace", pid: 3319 (trace), prio: 120, success: 1)
         0.006 (         ): sched:sched_wakeup(comm: "trace", pid: 3319 (trace), prio: 120, success: 1)
         0.348 (         ): sched:sched_process_exec(filename: 140212596720100, pid: 3319 (sleep), old_pid: 3319 (sleep))
         0.490 (         ): msr:write_msr(msr: FS_BASE, val: 139631189321088)
         0.670 (         ): nanosleep(rqtp: 0x7ffc52c23bc0)                                    ...
         0.674 (         ): sched:sched_stat_runtime(comm: "sleep", pid: 3319 (sleep), runtime: 659259, vruntime: 78942418342)
         0.675 (         ): sched:sched_switch(prev_comm: "sleep", prev_pid: 3319 (sleep), prev_prio: 120, prev_state: 1, next_comm: "swapper/0", next_prio: 120)
      1001.059 (         ): sched:sched_waking(comm: "sleep", pid: 3319 (sleep), prio: 120, success: 1)
      1001.098 (         ): sched:sched_wakeup(comm: "sleep", pid: 3319 (sleep), prio: 120, success: 1)
         0.670 (1000.504 ms):  ... [continued]: nanosleep())                                        = 0
      1001.456 (         ): sched:sched_process_exit(comm: "sleep", pid: 3319 (sleep), prio: 120)
    # trace --libtrace --no-inherit -e msr:*,*sleep,sched:* sleep 1
    # trace --libtrace --no-inherit -e msr:*,*sleep,sched:* sleep 1
         0.000 (         ): sched:sched_waking(comm=trace pid=3323 prio=120 target_cpu=000)
         0.007 (         ): sched:sched_wakeup(comm=trace pid=3323 prio=120 target_cpu=000)
         0.382 (         ): sched:sched_process_exec(filename=/usr/bin/sleep pid=3323 old_pid=3323)
         0.525 (         ): msr:write_msr(c0000100, value 7f5d508a0580)
         0.713 (         ): nanosleep(rqtp: 0x7fff487fb4a0)                                    ...
         0.717 (         ): sched:sched_stat_runtime(comm=sleep pid=3323 runtime=617722 [ns] vruntime=78957731636 [ns])
         0.719 (         ): sched:sched_switch(prev_comm=sleep prev_pid=3323 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120)
      1001.117 (         ): sched:sched_waking(comm=sleep pid=3323 prio=120 target_cpu=000)
      1001.157 (         ): sched:sched_wakeup(comm=sleep pid=3323 prio=120 target_cpu=000)
         0.713 (1000.522 ms):  ... [continued]: nanosleep())                                        = 0
      1001.538 (         ): sched:sched_process_exit(comm=sleep pid=3323 prio=120)
    #

  - Make -v (verbose) mode be honoured for .perfconfig based trace.add_events,
    to help in diagnosing problems with building eBPF events (-e source.c).

  - When using eBPF syscall payload augmentation do not show strace-like
    syscalls when all the user specified was some tracepoint event, bringing
    the behaviour in line with that of when not using eBPF augmentation.

Intel PT:

  exported-sql-viewer GUI:

  Adrian Hunter:

  - Add LookupModel, HBoxLayout, VBoxLayout, global time range calculations
    so as to add a time chart by CPU.

perf script:

  Andi Kleen:

  - Allow --time (to specify a time span of interest) with --reltime

perf diff:

  Jin Yao:

  - Report noise for cycles diff, i.e. a histogram + stddev.
    (timestamps relative to start).

perf annotate:

  Arnaldo Carvalho de Melo:

  - Initialize env->cpuid when running in live mode (perf top), as it
    is used in some of the per arch annotation init routines.

samples bpf:

  Björn Töpel:

  - Fixup fallout of using tools/perf/perf-sys. from outside tools/perf.

Core:

  Ian Rogers:

  - Avoid 'sample_reg_masks' being const + weak, as this breaks with some
    compilers that constant-propagate from the weak symbol.

libperf:

  - First part of moving the perf_mmap class from tools/perf to libperf.

  - Propagate CFLAGS to libperf from the tools/perf Makefile.

Vendor events:

  John Garry:

  - Add entry in MAINTAINERS with reviewers for the for perf tool arm64
    pmu-events files.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-15 07:19:55 +02:00
David S. Miller
a98d62c3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2019-10-14

The following pull-request contains BPF updates for your *net-next* tree.

12 days of development and
85 files changed, 1889 insertions(+), 1020 deletions(-)

The main changes are:

1) auto-generation of bpf_helper_defs.h, from Andrii.

2) split of bpf_helpers.h into bpf_{helpers, helper_defs, endian, tracing}.h
   and move into libbpf, from Andrii.

3) Track contents of read-only maps as scalars in the verifier, from Andrii.

4) small x86 JIT optimization, from Daniel.

5) cross compilation support, from Ivan.

6) bpf flow_dissector enhancements, from Jakub and Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 12:17:21 -07:00
Michael S. Tsirkin
edc5774c09 tools/virtio: xen stub
Fixes test module build.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-13 09:38:27 -04:00
Andrii Nakryiko
598dc04fa0 selftests/bpf: Remove obsolete pahole/BTF support detection
Given lots of selftests won't work without recent enough Clang/LLVM that
fully supports BTF, there is no point in maintaining outdated BTF
support detection and fall-back to pahole logic. Just assume we have
everything we need.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191011220146.3798961-3-andriin@fb.com
2019-10-12 16:15:10 -07:00
Andrii Nakryiko
3fbe31ae7e selftests/bpf: Enforce libbpf build before BPF programs are built
Given BPF programs rely on libbpf's bpf_helper_defs.h, which is
auto-generated during libbpf build, libbpf build has to happen before
we attempt progs/*.c build. Enforce it as order-only dependency.

Fixes: 24f25763d6 ("libbpf: auto-generate list of BPF helper definitions")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191011220146.3798961-2-andriin@fb.com
2019-10-12 16:15:10 -07:00
Ivan Khoronzhuk
793a349cd8 libbpf: Add C/LDFLAGS to libbpf.so and test_libpf targets
In case of C/LDFLAGS there is no way to pass them correctly to build
command, for instance when --sysroot is used or external libraries
are used, like -lelf, wich can be absent in toolchain. This can be
used for samples/bpf cross-compiling allowing to get elf lib from
sysroot.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191011002808.28206-13-ivan.khoronzhuk@linaro.org
2019-10-12 16:08:59 -07:00
Ivan Khoronzhuk
5c26f9a783 libbpf: Don't use cxx to test_libpf target
No need to use C++ for test_libbpf target when libbpf is on C and it
can be tested with C, after this change the CXXFLAGS in makefiles can
be avoided, at least in bpf samples, when sysroot is used, passing
same C/LDFLAGS as for lib.

Add "return 0" in test_libbpf to avoid warn, but also remove spaces at
start of the lines to keep same style and avoid warns while apply.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191011002808.28206-12-ivan.khoronzhuk@linaro.org
2019-10-12 16:08:59 -07:00
Linus Torvalds
465a7e291f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also a couple of updates for new Intel
  models (which are technically hw-enablement, but to users it's a fix
  to perf behavior on those new CPUs - hope this is fine), an AUX
  inheritance fix, event time-sharing fix, and a fix for lost non-perf
  NMI events on AMD systems"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf/x86/cstate: Add Tiger Lake CPU support
  perf/x86/msr: Add Tiger Lake CPU support
  perf/x86/intel: Add Tiger Lake CPU support
  perf/x86/cstate: Update C-state counters for Ice Lake
  perf/x86/msr: Add new CPU model numbers for Ice Lake
  perf/x86/cstate: Add Comet Lake CPU support
  perf/x86/msr: Add Comet Lake CPU support
  perf/x86/intel: Add Comet Lake CPU support
  perf/x86/amd: Change/fix NMI latency mitigation to use a timestamp
  perf/core: Fix corner case in perf_rotate_context()
  perf/core: Rework memory accounting in perf_mmap()
  perf/core: Fix inheritance of aux_output groups
  perf annotate: Don't return -1 for error when doing BPF disassembly
  perf annotate: Return appropriate error code for allocation failures
  perf annotate: Fix arch specific ->init() failure errors
  perf annotate: Propagate the symbol__annotate() error return
  perf annotate: Fix the signedness of failure returns
  perf annotate: Propagate perf_env__arch() error
  perf evsel: Fall back to global 'perf_env' in perf_evsel__env()
  perf tools: Propagate get_cpuid() error
  ...
2019-10-12 15:15:17 -07:00
Linus Torvalds
db60a5a035 Merge tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "Fix a kernel crash in spufs_create_root() on Cell machines, since the
  new mount API went in.

  Fix a regression in our KVM code caused by our recent PCR changes.

  Avoid a warning message about a failing hypervisor API on systems that
  don't have that API.

  A couple of minor build fixes.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Desnes A. Nunes do
  Rosario, Emmanuel Nicolet, Jordan Niethe, Laurent Dufour, Stephen
  Rothwell"

* tag 'powerpc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  spufs: fix a crash in spufs_create_root()
  powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_host
  selftests/powerpc: Fix compile error on tlbie_test due to newer gcc
  powerpc/pseries: Remove confusing warning message.
  powerpc/64s/radix: Fix build failure with RADIX_MMU=n
2019-10-12 14:13:55 -07:00
David S. Miller
8caf8a91f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2019-10-12

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) a bunch of small fixes. Nothing critical.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-12 11:21:56 -07:00
Jiri Pirko
9b88fc5496 selftests: add netdevsim devlink health tests
Add basic tests to verify functionality of netdevsim reporters.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-11 21:04:28 -07:00
Andrii Nakryiko
e78dcbf414 libbpf: Handle invalid typedef emitted by old GCC
Old GCC versions are producing invalid typedef for __gnuc_va_list
pointing to void. Special-case this and emit valid:

typedef __builtin_va_list __gnuc_va_list;

Reported-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191011032901.452042-1-andriin@fb.com
2019-10-11 23:11:55 +02:00
Andrii Nakryiko
409017847d libbpf: Generate more efficient BPF_CORE_READ code
Existing BPF_CORE_READ() macro generates slightly suboptimal code. If
there are intermediate pointers to be read, initial source pointer is
going to be assigned into a temporary variable and then temporary
variable is going to be uniformly used as a "source" pointer for all
intermediate pointer reads. Schematically (ignoring all the type casts),
BPF_CORE_READ(s, a, b, c) is expanded into:
({
	const void *__t = src;
	bpf_probe_read(&__t, sizeof(*__t), &__t->a);
	bpf_probe_read(&__t, sizeof(*__t), &__t->b);

	typeof(s->a->b->c) __r;
	bpf_probe_read(&__r, sizeof(*__r), &__t->c);
})

This initial `__t = src` makes calls more uniform, but causes slightly
less optimal register usage sometimes when compiled with Clang. This can
cascase into, e.g., more register spills.

This patch fixes this issue by generating more optimal sequence:
({
	const void *__t;
	bpf_probe_read(&__t, sizeof(*__t), &src->a); /* <-- src here */
	bpf_probe_read(&__t, sizeof(*__t), &__t->b);

	typeof(s->a->b->c) __r;
	bpf_probe_read(&__r, sizeof(*__r), &__t->c);
})

Fixes: 7db3822ab9 ("libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191011023847.275936-1-andriin@fb.com
2019-10-11 22:35:46 +02:00
Jakub Sitnicki
f97eea1756 selftests/bpf: Check that flow dissector can be re-attached
Make sure a new flow dissector program can be attached to replace the old
one with a single syscall. Also check that attaching the same program twice
is prohibited.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20191011082946.22695-3-jakub@cloudflare.com
2019-10-11 22:26:22 +02:00
Jin Yao
cebf7d51a6 perf diff: Report noisy for cycles diff
This patch prints the stddev and hist for the cycles diff of program
block. It can help us to understand if the cycles is noisy or not.

This patch is inspired by Andi Kleen's patch:

  https://lwn.net/Articles/600471/

We create new option '--cycles-hist'.

Example:

  perf record -b ./div
  perf record -b ./div
  perf diff -c cycles

  # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
  # ........  .......................................................... ....  .................  ............................
  #
      46.72%                                      [div.c:40 -> div.c:40]    0  div                [.] main
      46.72%                                      [div.c:42 -> div.c:44]    0  div                [.] main
      46.72%                                      [div.c:42 -> div.c:39]    0  div                [.] main
      20.54%                          [random_r.c:357 -> random_r.c:394]    1  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -> random_r.c:380]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:388]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:391]    0  libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -> random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -> random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -> random.c:293]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -> random.c:298]    0  libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -> div.c:25]    0  div                [.] compute_flag
       8.40%                                      [div.c:27 -> div.c:28]    0  div                [.] compute_flag
       5.14%                                    [rand.c:26 -> rand.c:27]    0  libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -> rand.c:28]    0  libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -> rand@plt+0]    0  div                [.] rand@plt
       0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -> do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -> do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -> do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

  perf diff -c cycles --cycles-hist

  # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
  # ........  .......................................................... ....  .................  .................  ............................
  #
      46.72%                                      [div.c:40 -> div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
      46.72%                                      [div.c:42 -> div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
      46.72%                                      [div.c:42 -> div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
      20.54%                          [random_r.c:357 -> random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -> random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:388]    0                     libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -> random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -> random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -> random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -> random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -> random.c:295]    0                     libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -> random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -> div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
       8.40%                                      [div.c:27 -> div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
       5.14%                                    [rand.c:26 -> rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -> rand.c:28]    0                     libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -> rand@plt+0]    0                     div                [.] rand@plt
       0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -> do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -> do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -> do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -> native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -> native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr

 v8:
 ---
 Rebase to perf/core branch

 v7:
 ---
 1. v6 got Jiri's ACK.
 2. Rebase to latest perf/core branch.

 v6:
 ---
 1. Jiri provides better code for using data__hpp_register() in ui_init().
    Use this code in v6.

 v5:
 ---
 1. Refine the use of data__hpp_register() in ui_init() according to
    Jiri's suggestion.

 v4:
 ---
 1. Rename the new option from '--noisy' to '--cycles-hist'
 2. Remove the option '-n'.
 3. Only update the spark value and stats when '--cycles-hist' is enabled.
 4. Remove the code of printing '..'.

 v3:
 ---
 1. Move the histogram to a separate column
 2. Move the svals[] out of struct stats

 v2:
 ---
 Jiri got a compile error,

  CC       builtin-diff.o
  builtin-diff.c: In function ‘compute_cycles_diff’:
  builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
  712 |          labs(pair->block_info->cycles_spark[i] -
      |          ^~~~

 Because the result of u64 - u64 is still u64. Now we change the type of
 cycles_spark[] to s64.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-11 10:57:00 -03:00
Jiri Olsa
55542113c6 perf tools: Propagate CFLAGS to libperf
Andi reported that 'make DEBUG=1' does not propagate to the libbperf
code. It's true also for the other flags. Changing the code to propagate
the global build flags to libperf compilation.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191011122155.15738-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-11 10:55:22 -03:00
Michael S. Tsirkin
c461e8df0c tools/virtio: more stubs
fix test module build.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-11 09:27:27 -04:00
Haishuang Yan
176a52043a selftests: netfilter: add ipvs tunnel test case
Test virtual server via ipip tunnel.

Tested:
# selftests: netfilter: ipvs.sh
# Testing DR mode...
# Testing NAT mode...
# Testing Tunnel mode...
# ipvs.sh: PASS
ok 6 selftests: netfilter: ipvs.sh

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2019-10-11 10:05:27 +02:00
Haishuang Yan
0ed1546206 selftests: netfilter: add ipvs nat test case
Test virtual server via NAT.

Tested:
# selftests: netfilter: ipvs.sh
# Testing DR mode...
# Testing NAT mode...
# ipvs.sh: PASS

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2019-10-11 10:05:24 +02:00
Haishuang Yan
867d219079 selftests: netfilter: add ipvs test script
Test virutal server via directing routing for IPv4.

Tested:

# selftests: netfilter: ipvs.sh
# Testing DR mode...
# ipvs.sh: PASS
ok 6 selftests: netfilter: ipvs.sh

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2019-10-11 10:05:20 +02:00
Andrii Nakryiko
666b2c10ee selftests/bpf: Add read-only map values propagation tests
Add tests checking that verifier does proper constant propagation for
read-only maps. If constant propagation didn't work, skipp_loop and
part_loop BPF programs would be rejected due to BPF verifier otherwise
not being able to prove they ever complete. With constant propagation,
though, they are succesfully validated as properly terminating loops.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191009201458.2679171-3-andriin@fb.com
2019-10-11 01:49:15 +02:00
Roman Mashak
71229c84aa tc-testing: updated pedit test cases
Added test case for layered IP operation for a single source IP4/IP6
address and a single destination IP4/IP6 address.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-10 16:46:04 -07:00
Christian Brauner
0eebfed295 seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
Test whether a syscall can be performed after having been intercepted by
the seccomp notifier. The test uses dup() and kcmp() since it allows us to
nicely test whether the dup() syscall actually succeeded by comparing whether
the fds refer to the same underlying struct file.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Tycho Andersen <tycho@tycho.ws>
CC: Tyler Hicks <tyhicks@canonical.com>
Cc: stable@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20190920083007.11475-4-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2019-10-10 14:45:51 -07:00
Christian Brauner
223e660bc7 seccomp: avoid overflow in implicit constant conversion
USER_NOTIF_MAGIC is assigned to int variables in this test so set it to INT_MAX
to avoid warnings:

seccomp_bpf.c: In function ‘user_notification_continue’:
seccomp_bpf.c:3088:26: warning: overflow in implicit constant conversion [-Woverflow]
 #define USER_NOTIF_MAGIC 116983961184613L
                          ^
seccomp_bpf.c:3572:15: note: in expansion of macro ‘USER_NOTIF_MAGIC’
  resp.error = USER_NOTIF_MAGIC;
               ^~~~~~~~~~~~~~~~

Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: stable@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Link: https://lore.kernel.org/r/20190920083007.11475-3-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2019-10-10 14:35:48 -07:00
Jiri Olsa
84227cb11f libperf: Adopt perf_evlist__filter_pollfd() from tools/perf
Introduce the perf_evlist__filter_pollfd function and export it in the
perf/evlist.h header, so that libperf users can check if the descriptor
is still alive.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-27-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:58:45 -03:00
Jiri Olsa
696f27c994 libperf: Introduce perf_evlist__purge()
Add a static perf_evlist__purge() function to purge evsels from a evlist.

Add also perf_evlist__for_each_entry_safe() which is used by
perf_evlist__purge().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-26-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:57:22 -03:00
Jiri Olsa
93dd6e2831 libperf: Introduce perf_evlist__exit()
Add the perf_evlist__exit() function, so far it's not exported and added
only for internal use for perf and libperf.

USe it to release cpus/threads and pollfd array.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-25-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:56:01 -03:00
Jiri Olsa
230662e15e libperf: Move the pollfd allocation from tools/perf to libperf
It's needed in libperf only, so move it to the perf_evlist__mmap_ops()
function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-24-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:54:35 -03:00
Jiri Olsa
285aaeac8c libperf: Centralize map refcnt setting
Currently when a new map is mmapped we set its refcnt to 2 in the
perf_evlist_mmap_ops::mmap callback.

Every mmap gets its refcnt set to 2 when it's first mmaped:

  - 1 for the current user, which will be taken out by a call to
    perf_evlist__munmap_filtered(), where we find out there's
    no more data comming from kernel to this mmap.

  - 1 for the drain code where in perf_mmap__consume() the mmap
    is released if it is empty.

Move this common setup into libperf's generic code before the mmap
callback is called.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-23-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:52:41 -03:00
Jiri Olsa
923d0f1868 perf evlist: Switch to libperf's mmap interface
Switch to the libperf mmap interface by calling directly
perf_evlist__mmap_ops() and removing perf's evlist__mmap_per_*
functions.

By switching to libperf perf_evlist__mmap() we need to operate over
'struct perf_mmap' in evlist__add_pollfd, so make the related changes
there.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-22-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:46:04 -03:00
Jiri Olsa
b80132b12a perf evlist: Introduce perf_evlist__mmap_cb_mmap()
Add the perf_evlist__mmap_cb_mmap() function to call perf specific
mmap__mmap() function during perf_evlist__mmap_ops() call.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-21-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:44:59 -03:00
Jiri Olsa
bb1b1885e2 perf evlist: Introduce perf_evlist__mmap_cb_get()
Add the perf_evlist__mmap_cb_get() function to return 'struct perf_mmap'
object during perf_evlist__mmap_ops() call.

The array of 'struct mmap' is allocated via evlist__alloc_mmap(), in
this callback we simply returns pointer to the base object.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-20-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:30:21 -03:00
Jiri Olsa
9abd2ab237 perf tools: Introduce perf_evlist__mmap_cb_idx()
Add perf_evlist__mmap_cb_idx function to call auxtrace_mmap_params__set_idx()
on each new index during perf_evlist__mmap_ops call.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-19-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:23:52 -03:00
Jiri Olsa
b5911e7ac2 libperf: Introduce perf_evlist_mmap_ops::mmap callback
Add the perf_evlist_mmap_ops::mmap callback to be called in
mmap_per_evsel() to actually mmap the map.

Add libperf's perf_evlist__mmap_cb_mmap() function as libperf's mmap
callback.

New mmaped map gets refcount set to 2 in mmap__mmap(), we follow that in
mmap callback. We will move this to common place after we switch to
perf_evlist__mmap().

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-18-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:22:21 -03:00
Jiri Olsa
3a8bb58121 libperf: Add perf_evlist_mmap_ops::get callback
Add the perf_evlist_mmap_ops::get callback to be called in
mmap_per_evsel() to get/allocate the 'struct perf_mmap' object.

Add the libperf's perf_evlist__mmap_cb_get() function as libperf's get
callback.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-17-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:21:11 -03:00
Jiri Olsa
1fcbb75cc5 libperf: Introduce perf_evlist_mmap_ops::idx callback
Add the perf_evlist_mmap_ops::idx callback to be called in
mmap_per_cpu() and mmap_per_thread() with current cpu and thread
indexes.

It's used by current aux code, so perf will use this callback to set the
aux index.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-16-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:20:08 -03:00
Jiri Olsa
0b5ea10d4c libperf: Introduce perf_evlist__mmap_ops()
To be able to pass specific callbacks to evlist's mmap.

There will be a specific call to this function from perf's
evlist__mmap() and libperf's perf_evlist__mmap() functions in following
changes.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-15-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
2019-10-10 12:18:00 -03:00
Jiri Olsa
d1a177595b libperf: Adopt perf_evlist__mmap()/munmap() from tools/perf
Add libperf's version of perf_evlist__mmap()/munmap() functions and
exporting them in the perf/evlist.h header.

It's the backbone of what we have in perf code. The following changes
will add needed callbacks and then we'll finally switch the perf code to
use libperf's version.

Add mmap/mmap_ovw 'struct perf_mmap' object arrays to hold maps for
libperf's evlist.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191007125344.14268-14-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-10-10 12:15:58 -03:00