Commit Graph

753424 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
9ac94e31ca perf tools: Read the cache line size lazily
It is not read as commonly as 'page_size', so it makes sense to read it
lazily, caching its value when it is first read.

Less files open unconditionally at startup.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-35xhrq91u94uc1djtclek1ie@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 16:03:34 -03:00
Arnaldo Carvalho de Melo
6e1690c4c0 tools include compiler-gcc: Add __pure attribute helper
Adopt it from the kernel sources, will be used soon.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-oubheiqj8edo5rzewt11cbn0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 15:17:21 -03:00
Arnaldo Carvalho de Melo
789e465058 tools lib api fs tracing_path: Make tracing_events_path private
Not anymore accessed outside this library, keep it private.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-wg1m07flfrg1rm06jjzie8si@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 14:51:23 -03:00
Arnaldo Carvalho de Melo
7014e0e3bf tools lib api fs tracing_path: Introduce opendir() method
That takes care of using the right call to get the tracing_path
directory, the one that will end up calling tracing_path_set() to figure
out where tracefs is mounted.

One more step in doing just lazy reading of system structures to reduce
the number of operations done unconditionaly at 'perf' start.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-42zzi0f274909bg9mxzl81bu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 14:50:38 -03:00
Arnaldo Carvalho de Melo
25a7d91427 perf parse-events: Use get/put_events_file()
Instead of accessing the trace_events_path variable directly, that may
not have been properly initialized wrt detecting where tracefs is
mounted.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-id7hzn1ydgkxbumeve5wapqz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 14:49:36 -03:00
Arnaldo Carvalho de Melo
c02cab228e perf tools: Reuse the path to the tracepoint /events/ directory
When using for_each_event() we needlessly rebuild the whole path to
the tracepoint directory, reuse the dir_path instead, saving some cycles
and reducing the size of the next patch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-54bcs15n0cp6gwcgpc4hptyc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 14:25:07 -03:00
Arnaldo Carvalho de Melo
40c3c0c9ac tools lib api fs tracing_path: Introduce get/put_events_file() helpers
To make reading events files a tad more compact than with
get_tracing_files("events/foo").

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-do6xgtwpmfl8zjs1euxsd2du@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-17 12:01:50 -03:00
Arnaldo Carvalho de Melo
17c257e867 tools lib api: Unexport 'tracing_path' variable
One should use tracing_path_mount() instead, so more things get done
lazily instead of at every 'perf' tool call startup.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-fci4yll35idd9yuslp67vqc2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-16 16:27:14 -03:00
Arnaldo Carvalho de Melo
00a6270361 tools lib api: The tracing_mnt variable doesn't need to be global
Its only used in the file it is defined, so just make it static.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-p5x29u6mq2ml3mtnbg9844ad@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-16 16:20:12 -03:00
Arnaldo Carvalho de Melo
d01bd1ac92 perf config: Call perf_config__init() lazily
We check what perf_config__init() does at each perf_config() call,
namely if the static perf_config instance was created, so instead of
bailing out in that case, try to allocate it, bailing if it fails.

Now to get the perf_config() call out of the start of perf's main()
function, doing it also lazily.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4bo45k6ivsmbxpfpdte4orsg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-16 16:11:09 -03:00
Ingo Molnar
5aafae8d09 perf/core improvements and fixes:
- Add '-e intel_pt//u' test to the 'parse-events' 'perf test' entry,
   to help avoiding regressions in the events parser such as one
   that caused a revert in v4.17-rc (Arnaldo Carvalho de Melo)
 
 - Fix NULL return handling in bpf__prepare_load() (YueHaibing)
 
 - Warn about 'perf buildid-cache --purge-all' failures (Ravi Bangoria)
 
 - Add infrastructure to help in writing eBPF C programs to be used
   with '-e name.c' type events in tools such as 'record' and 'trace',
   with headers for common constructs and an examples directory that
   will get populated as we add more such helpers and the 'perf bpf'
   branch that Jiri Olsa has been working on (Arnaldo Carvalho de Melo)
 
 - Handle uncore event aliases in small groups properly (Kan Liang)
 
 - Use the "_stest" symbol to identify the kernel map when loading kcore (Adrian Hunter)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlr8Pz8ACgkQ1lAW81NS
 qkAmYw/8CyEMpCeqFTwo1tF22EiEI3mWyWGB0U12MWipNtkvMtJRR5Lv5Oga3kyU
 56aZuKGcfGfnjpLjHQ61xZWhdHhuMfKLnnudm3ouIy67EDSkWJi/hnF27hhEH+w+
 V24ptj7vFY+q/4TVyMn2mWpajp/GSwZq8LIJxfPKWbz2nKebLol6CLp0/+dY+dDL
 pE2aDatc2HbvK07McnuiJ0dMnfqdPSR2c9fu/e59L7OO9rM5dHfwdTsT1aoDKS5v
 2FAqDGgNJRtrLOj3fMaUISC9iSKBwqCuuVm0Ub1QUIwdB0EtpyOwGTYw5Ek0yBPB
 8Xz6t/5FYFNk4VvLfRn1XV1sxb2W+jgi1c2sNThNRh4TY0oPMqL2VVyKFY3Rb8xz
 mDS4IVhnswyd66WeIdvA3QeRSS4f46fAijmrGhqo9ew0JiaQznH0OlGxvLKVZjWw
 C7Odu7F/gfdmAQIZrO5XWce8ItLHkYwiDj1yagusoGI41xDPZlaDfjksEDd5R2JA
 uiEq6v5RNHUYCyFqIZNLGX2yJpfIsppEHdaQ3gpb+bjMedLG2kbfwIDvFXpNlWri
 LUxGz16bYxbDCI2z/D46RtBOKtnkluIdJjiXkDUxjfbxyB1jAdeiaV0jYLPoV7fN
 H59NpgqHSYjUcig0pM5rKB1XS43S46+qsxn940c0r0fkpZFgUZI=
 =oM4D
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.18-20180516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

- Add '-e intel_pt//u' test to the 'parse-events' 'perf test' entry,
  to help avoiding regressions in the events parser such as one
  that caused a revert in v4.17-rc (Arnaldo Carvalho de Melo)

- Fix NULL return handling in bpf__prepare_load() (YueHaibing)

- Warn about 'perf buildid-cache --purge-all' failures (Ravi Bangoria)

- Add infrastructure to help in writing eBPF C programs to be used
  with '-e name.c' type events in tools such as 'record' and 'trace',
  with headers for common constructs and an examples directory that
  will get populated as we add more such helpers and the 'perf bpf'
  branch that Jiri Olsa has been working on (Arnaldo Carvalho de Melo)

- Handle uncore event aliases in small groups properly (Kan Liang)

- Use the "_stest" symbol to identify the kernel map when loading kcore (Adrian Hunter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-16 17:56:43 +02:00
YueHaibing
7a36a287de perf bpf: Fix NULL return handling in bpf__prepare_load()
bpf_object__open()/bpf_object__open_buffer can return error pointer or
NULL, check the return values with IS_ERR_OR_NULL() in bpf__prepare_load
and bpf__prepare_load_buffer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/n/tip-psf4xwc09n62al2cb9s33v9h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-16 10:01:55 -03:00
Kan Liang
3cdc5c2cb9 perf parse-events: Handle uncore event aliases in small groups properly
Perf stat doesn't count the uncore event aliases from the same uncore
block in a group, for example:

  perf stat -e '{unc_m_cas_count.all,unc_m_clockticks}' -a -I 1000
  #           time             counts unit events
       1.000447342      <not counted>      unc_m_cas_count.all
       1.000447342      <not counted>      unc_m_clockticks
       2.000740654      <not counted>      unc_m_cas_count.all
       2.000740654      <not counted>      unc_m_clockticks

The output is very misleading. It gives a wrong impression that the
uncore event doesn't work.

An uncore block could be composed by several PMUs. An uncore event alias
is a joint name which means the same event runs on all PMUs of a block.
Perf doesn't support mixed events from different PMUs in the same group.
It is wrong to put uncore event aliases in a big group.

The right way is to split the big group into multiple small groups which
only include the events from the same PMU.

Only uncore event aliases from the same uncore block should be specially
handled here. It doesn't make sense to mix the uncore events with other
uncore events from different blocks or even core events in a group.

With the patch:
  #           time             counts unit events
     1.001557653            140,833      unc_m_cas_count.all
     1.001557653      1,330,231,332      unc_m_clockticks
     2.002709483             85,007      unc_m_cas_count.all
     2.002709483      1,429,494,563      unc_m_clockticks

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1525727623-19768-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-16 10:01:54 -03:00
Adrian Hunter
5654997838 perf tools: Use the "_stest" symbol to identify the kernel map when loading kcore
The first symbol is not necessarily in the kernel text.  Instead of
using the first symbol, use the _stest symbol to identify the kernel map
when loading kcore.

This allows for the introduction of symbols to identify the x86_64 PTI
entry trampolines.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1525866228-30321-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:25 -03:00
Arnaldo Carvalho de Melo
d8fc764d0b perf bpf: Add probe() helper to reduce kprobes boilerplate
So that kprobe definitions become:

  int probe(function, variables)(void *ctx, int err, var1, var2, ...)

The existing 5sec.c, got converted and goes from:

  SEC("func=hrtimer_nanosleep rqtp->tv_sec")
  int func(void *ctx, int err, long sec)
  {
  }

To:

  int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec)
  {
  }

If we decide to add tv_nsec as well, then it becomes:

  $ cat tools/perf/examples/bpf/5sec.c
  #include <bpf.h>

  int probe(hrtimer_nanosleep, rqtp->tv_sec rqtp->tv_nsec)(void *ctx, int err, long sec, long nsec)
  {
	  return sec == 5;
  }

  license(GPL);
  $

And if we run it, system wide as before and run some 'sleep' with values
for the tv_nsec field, we get:

  # perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c
     0.000 perf_bpf_probe:hrtimer_nanosleep:(ffffffff9811b5f0) tv_sec=5 tv_nsec=100000000
  9641.650 perf_bpf_probe:hrtimer_nanosleep:(ffffffff9811b5f0) tv_sec=5 tv_nsec=123450001
  ^C#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1v9r8f6ds5av0w9pcwpeknyl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:24 -03:00
Arnaldo Carvalho de Melo
1f477305ab perf bpf: Add license(NAME) helper
To further reduce boilerplate.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-vst6hj335s0ebxzqltes3nsc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:24 -03:00
Arnaldo Carvalho de Melo
7542b767b0 perf bpf: Add kprobe example to catch 5s naps
Description:

. Disable strace like syscall tracing (--no-syscalls), or try tracing
  just some (-e *sleep).

. Attach a filter function to a kernel function, returning when it should
  be considered, i.e. appear on the output:

  $ cat tools/perf/examples/bpf/5sec.c
  #include <bpf.h>

  SEC("func=hrtimer_nanosleep rqtp->tv_sec")
  int func(void *ctx, int err, long sec)
  {
	  return sec == 5;
  }

  char _license[] SEC("license") = "GPL";
  int _version SEC("version") = LINUX_VERSION_CODE;
  $

. Run it system wide, so that any sleep of >= 5 seconds and < than 6
  seconds gets caught.

. Ask for callgraphs using DWARF info, so that userspace can be unwound

. While this is running, run something like "sleep 5s".

  # perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c/call-graph=dwarf/
     0.000 perf_bpf_probe:func:(ffffffff9811b5f0) tv_sec=5
                                       hrtimer_nanosleep ([kernel.kallsyms])
                                       __x64_sys_nanosleep ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       entry_SYSCALL_64 ([kernel.kallsyms])
                                       __GI___nanosleep (/usr/lib64/libc-2.26.so)
                                       rpl_nanosleep (/usr/bin/sleep)
                                       xnanosleep (/usr/bin/sleep)
                                       main (/usr/bin/sleep)
                                       __libc_start_main (/usr/lib64/libc-2.26.so)
                                       _start (/usr/bin/sleep)
  ^C#

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2nmxth2l2h09f9gy85lyexcq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:24 -03:00
Arnaldo Carvalho de Melo
dd8e4ead6e perf bpf: Add bpf.h to be used in eBPF proggies
So, the first helper is the one shortening a variable/function section
attribute, from, for instance:

  char _license[] __attribute__((section("license"), used)) = "GPL";

to:

  char _license[] SEC("license") = "GPL";

Convert empty.c to that and it becomes:

  # cat ~acme/lib/examples/perf/bpf/empty.c
  #include <bpf.h>

  char _license[] SEC("license") = "GPL";
  int _version SEC("version") = LINUX_VERSION_CODE;
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zmeg52dlvy51rdlhyumfl5yf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:23 -03:00
Arnaldo Carvalho de Melo
8f12a2ff00 perf bpf: Add 'examples' directories
The first one is the bare minimum that bpf infrastructure accepts before
it expects actual events to be set up:

  $ cat tools/perf/examples/bpf/empty.c
  char _license[] __attribute__((section("license"), used)) = "GPL";
  int _version __attribute__((section("version"), used)) = LINUX_VERSION_CODE;
  $

If you remove that "version" line, then it will be refused with:

  # perf trace -e tools/perf/examples/bpf/empty.c
  event syntax error: 'tools/perf/examples/bpf/empty.c'
                       \___ Failed to load tools/perf/examples/bpf/empty.c from source: 'version' section incorrect or lost

  (add -v to see detail)
  Run 'perf list' for a list of valid events

   Usage: perf trace [<options>] [<command>]
      or: perf trace [<options>] -- <command> [<options>]
      or: perf trace record [<options>] [<command>]
      or: perf trace record [<options>] -- <command> [<options>]

      -e, --event <event>   event/syscall selector. use 'perf list' to list available events
  #

The next ones will, step by step, show simple filters, then the needs
for headers will be made clear, it will be put in place and tested with
new examples, rinse, repeat.

Back to using this first one to test the perf+bpf infrastructure:

If we run it will fail, as no functions are present connecting with,
say, a tracepoint or a function using the kprobes or uprobes
infrastructure:

  # perf trace -e tools/perf/examples/bpf/empty.c
  WARNING: event parser found nothing
  invalid or unsupported event: 'tools/perf/examples/bpf/empty.c'
  Run 'perf list' for a list of valid events

   Usage: perf trace [<options>] [<command>]
      or: perf trace [<options>] -- <command> [<options>]
      or: perf trace record [<options>] [<command>]
      or: perf trace record [<options>] -- <command> [<options>]

      -e, --event <event>   event/syscall selector. use 'perf list' to list available events
  #

But, if we set things up to dump the generated object file to a file,
and do this after having run 'make install', still on the developer's
$HOME directory:

  # cat ~/.perfconfig
  [llvm]

	dump-obj = true
  #
  # perf trace -e ~acme/lib/examples/perf/bpf/empty.c
  LLVM: dumping /home/acme/lib/examples/perf/bpf/empty.o
  WARNING: event parser found nothing
  invalid or unsupported event: '/home/acme/lib/examples/perf/bpf/empty.c'
  <SNIP>
  #

We can look at the dumped object file:

  # ls -la ~acme/lib/examples/perf/bpf/empty.o
  -rw-r--r--. 1 root root 576 May  4 12:10 /home/acme/lib/examples/perf/bpf/empty.o
  # file ~acme/lib/examples/perf/bpf/empty.o
  /home/acme/lib/examples/perf/bpf/empty.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), not stripped
  # readelf -sw ~acme/lib/examples/perf/bpf/empty.o

  Symbol table '.symtab' contains 3 entries:
     Num:    Value          Size Type    Bind   Vis      Ndx Name
       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
       1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 _license
       2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    4 _version
  #
  # tools/bpf/bpftool/bpftool --pretty ~acme/lib/examples/perf/bpf/empty.o
  null
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-y7dkhakejz3013o0w21n98xd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:23 -03:00
Arnaldo Carvalho de Melo
1b16fffa38 perf llvm-utils: Add bpf include path to clang command line
We'll start putting headers for helpers to be used in eBPF proggies in
there:

  # perf trace -v --no-syscalls -e empty.c |& grep "llvm compiling command : "
  llvm compiling command : /usr/lib64/ccache/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41100   -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -I/home/acme/lib/include/perf/bpf -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.17.0-rc3-00034-gf4ef6a438cee/build -c /home/acme/bpf/empty.c -target bpf -O2 -o -
  #

Notice the "-I/home/acme/lib/include/perf/bpf"

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6xq94xro8xlb5s9urznh3f9k@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 14:31:17 -03:00
Ravi Bangoria
d8ed87bc17 perf buildid-cache: Warn --purge-all failures
Warn perf buildid-cache --purge-all failures in non verbose mode.

Ex.:

  $ sudo chown root:root /home/ravi/.debug -R
  $ sudo chmod 700 /home/ravi/.debug/ -R
  $ ./perf buildid-cache -P
    Couldn't remove some caches. Error: Permission denied.

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20180510043651.12189-1-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 10:32:16 -03:00
Arnaldo Carvalho de Melo
b3f58c8da6 perf tests parse-events: Add intel_pt parse test
To avoid regressions such as the one fixed by 4a35a9027f ("Revert
"perf pmu: Fix pmu events parsing rule""), where '-e intel_pt//u' got
broken, with this new entry in this 'perf tests' subtest, we would have
caught it before pushing upstream.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-kw62fys9bwdgsp722so2ln1l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 10:31:59 -03:00
Arnaldo Carvalho de Melo
291c161f6c Merge remote-tracking branch 'tip/perf/urgent' into perf/core
To pick up fixes, notably the revert for the intel_pt//u regression.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-05-15 10:30:17 -03:00
Ingo Molnar
f3903c9161 perf/urgent fixes:
- Fix segfault when processing unknown threads in cs-etm (Leo Yan)
 
 - Fix "perf test inet_pton" on s390 failing due to missing inline (Thomas Richter)
 
 - Display all available events on 'perf annotate --stdio' (Jin Yao)
 
 - Add missing newline when parsing empty BPF proggie (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlr5fx4ACgkQ1lAW81NS
 qkCmfQ//ZbpL9ezoBuAfX61jv2n44ksqUV9NK+MZMfXr7hMQwY0ny2i8C9r/yhv4
 W5pEDLcCyfEGRryEQgJSGw7w792WPFt2CT4NG/0dBqlKjzo+yaCXtbCrzfBqPgzs
 G8x95JhONDVe5bwlRHscI7k5dEp/Rics1x46gtCGNgeHrHaTm/RfGxXdR/Fp09id
 JaiO0czT2JxrtaLqim5v2Rm7k47Vc0i34BQwVQ/Tx0eRlGpyyUGMEM/6EWP355gO
 wG2lTtY24RMX9wEPBiEpxUz5mCZibZSOFSnn2AWYnhV1pDoyrG7j3A3PG4nIzZ4/
 UuOsKQWA6ojgpbnk5S7Z7RUIZLy+OpzYAvExsF4zCleoNZ+BdcxyxZHwKDB1wEW1
 9YJ3mWdeE4fJqWnJaRxzUpPChdW9VCOTcks6lIlFKfABjdB3+nfCjShbiSg7l/bU
 V8umQCFKxe5j+ZJomrZXGZldX6k1CslXuipYB+aAaGlR8wUl0Z50KW9cWWBHN5pJ
 3tVKLpedxm2KAsFG119m+aA/stpq3z2idtOMU+A6k/bjCHYplKn/NLnDWPbXti5N
 ac09fDrjBaPPerHhAcZ742H3Ttt8hJgWe7COR4Rv5q9PZoU+NT//38eFR/8NtAIY
 1ngk0KlrNM7bR6Muzgqr9zINX7aB5vi3fCeWTB2yKZXo9sZ9sqk=
 =FXHV
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.17-20180514' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix segfault when processing unknown threads in cs-etm (Leo Yan)

- Fix "perf test inet_pton" on s390 failing due to missing inline (Thomas Richter)

- Display all available events on 'perf annotate --stdio' (Jin Yao)

- Add missing newline when parsing empty BPF proggie (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-15 08:20:45 +02:00
Linus Torvalds
67b8d5c708 Linux 4.17-rc5 2018-05-13 16:15:17 -07:00
Linus Torvalds
66e1c94db3 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "A mixed bag of fixes and updates for the ghosts which are hunting us.

  The scheduler fixes have been pulled into that branch to avoid
  conflicts.

   - A set of fixes to address a khread_parkme() race which caused lost
     wakeups and loss of state.

   - A deadlock fix for stop_machine() solved by moving the wakeups
     outside of the stopper_lock held region.

   - A set of Spectre V1 array access restrictions. The possible
     problematic spots were discuvered by Dan Carpenters new checks in
     smatch.

   - Removal of an unused file which was forgotten when the rest of that
     functionality was removed"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Remove unused file
  perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
  perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
  perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
  perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
  perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
  sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
  sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
  sched/core: Introduce set_special_state()
  kthread, sched/wait: Fix kthread_parkme() completion issue
  kthread, sched/wait: Fix kthread_parkme() wait-loop
  sched/fair: Fix the update of blocked load when newly idle
  stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock
2018-05-13 10:53:08 -07:00
Linus Torvalds
86a4ac433b Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "Revert the new NUMA aware placement approach which turned out to
  create more problems than it solved"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"
2018-05-13 10:46:53 -07:00
Linus Torvalds
baeda7131f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
 "Another small set of perf tooling fixes and updates:

   - Revert "perf pmu: Fix pmu events parsing rule", as it broke Intel
     PT event description parsing (Arnaldo Carvalho de Melo)

   - Sync x86's cpufeatures.h and kvm UAPI headers with the kernel
     sources, suppressing the ABI drift warnings (Arnaldo Carvalho de
     Melo)

   - Remove duplicated entry for westmereep-dp in Intel's mapfile.csv
     (William Cohen)

   - Fix typo in 'perf bench numa' options description (Yisheng Xie)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "perf pmu: Fix pmu events parsing rule"
  tools headers kvm: Sync ARM UAPI headers with the kernel sources
  tools headers kvm: Sync uapi/linux/kvm.h with the kernel sources
  tools headers: Sync x86 cpufeatures.h with the kernel sources
  perf vendor events intel: Remove duplicated entry for westmereep-dp in mapfile.csv
  perf bench numa: Fix typo in options
2018-05-13 10:44:32 -07:00
Linus Torvalds
0503fd658d another dma-mapping fix for 4.17-rc:
- just one little fix from Jean to avoid a harmless but very annoying
    warning, especially for the drm code
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlr34fULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYN3hRAAn3GXCRs4g9Mzn9+AcdgOicLsMbOGDnxY8xxYdAUL
 S6dc2c9Jyh/oPL6A9VvyS5rnbO62i9IUQQBo7tnd9Mw6wqk4jT1zdxCPhRxgnoHt
 MZyV4rH2aJaOXA1idg/TSW/rxzH/JeHV7RYnyvRcrM1rI0kO3fQXBAqAbfLjAJi+
 o/rSOCP+DsS2K+RxPBdczZ7LeeJ+Md5SYNJ9VAUh+kUHr9k7aD2edhiyjphdgZ6k
 moa0vkLJHDUD11zJDu5v201j05KI04/lK2XExWtFhx0NaSx1L8DdvJVgIj3Z2dDD
 +XX5hJHfKymR9QVPspZz2n3q75qM/H3x4CxqKcudOctbnAvxjGCrMkA0d3j2cXA/
 3zyBgr4Fv+aXr5EqcH2EkpDcmoe1OVxkVeMGXZvXv8fPdAKZNkSAnscBTlAwXze1
 eWI+xTjDjGyknLcGvnJx+NeNW0XnU9epehpWp7yKtlgkA9ND9a8hPELKR0n+gHQ+
 rMfsl1oqStHnRluDnr1wPKNo3rQnZj7ClKKiyTf4Jks6hBTLECoKeDoGU/XyYURw
 Klf0382V5hjVwCJziUuNDU+ztb9fzja5nCFo5EGtzL82JPQP8rvOLFPYdPEpoZqu
 +Fp/j47n6ZJx/irR3jCPkIcTO8ztE73AqiB6uN2K1eqYl1+GdJ383PSjdVuZQxGp
 ohI=
 =dvUb
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Just one little fix from Jean to avoid a harmless but very annoying
  warning, especially for the drm code"

* tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: silent unwanted warning "buffer is full"
2018-05-13 10:28:53 -07:00
Linus Torvalds
ccda3c4b77 some small SMB3 fixes for 4.17-rc5, some for stable
-----BEGIN PGP SIGNATURE-----
 
 iQHHBAABCgAxFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAlr04iwTHHNtZnJlbmNo
 QGdtYWlsLmNvbQAKCRCKLL1wB3JPUaQ/C/4xelt5SEsr3AEj9SUJFwjHZJ09pSCI
 T1e/ztJLVorE2VAKhVub9bN5hP4vrAfUzWTHDeDB02eQy8BLUaUKFMxhN2UjRfix
 pdL9vLIGMXNCuGROWnO3UHbE99b1Dht8pdpmt+tTevSwYnHj7ACbyoZJsESOhged
 5WDwUCM2FTmyu4L3UqD8ZDp5jrvx2q6qTOpck53v3lLbnkRx/rNLLdamFXo1IRd3
 cypXIck+v+g1pv0mbs5cYdtHRQqiiP7RbY7bmvNJrV0G95iNGgonuWaCwiMOLqmZ
 R9aw9ktGZepu2aWlhhiV1nNxI1zo7pvMSoRYT2Pd1ZaxUbTb3LvBwkviAFgz8Dft
 a10MzvLMrxY79q/4FdWf+FdsQ0QHh98XwsMdAFuraI0zlNN+9/0UPDYd8X/LFgoK
 3GOZDNyQ6LPvCsJ4ZmzugneuCRzWk/HU4y1uo67vjqUODKjPPZ8qXGOz0sNSJfnp
 gjpdw6G2GW4PLH4q3oppbNfetAaqhm24pMw=
 =qAFz
 -----END PGP SIGNATURE-----

Merge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Some small SMB3 fixes for 4.17-rc5, some for stable"

* tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: directory sync should not return an error
  cifs: smb2ops: Fix listxattr() when there are no EAs
  cifs: smbd: Enable signing with smbdirect
  cifs: Allocate validate negotiation request through kmalloc
2018-05-12 18:49:53 -07:00
Linus Torvalds
427fbe8926 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal fixes from Zhang Rui:

 - fix NULL pointer dereference on module load/probe for int3403_thermal
   driver

 - fix an emergency shutdown issue on exynos thermal driver

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: exynos: Propagate error value from tmu_read()
  thermal: exynos: Reading temperature makes sense only when TMU is turned on
  thermal: int3403_thermal: Fix NULL pointer deref on module load / probe
2018-05-12 10:58:57 -07:00
Linus Torvalds
0d4cafd12f for-linus-20180511
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJa9gfhAAoJEPfTWPspceCm1vkP/3sNyf+FJy0Jz+EEHN01wJZP
 z77IRN+r5dKIi2bnOpfVSRTpEyCP9QTOAmv1waNygMFG6OcVmDFsgFoe/fnqGVo8
 VZjiQ+rpVrP8rEXzi1nMJunuImmx3JIV8E44+IKWKulipxHy38WXy73tWSEBrkoN
 uvL2UGJZOaKMoxC8XRsqDK/JVrhFSif44hNp8STPAo3e3YBPs95xJpV5qeIlJ8i1
 K2CXIiLVUUyM17vnW7dZR+nMoXHMXbQWwWpbWvgZ/ZDpPREHHddG+9NRkrinkGZI
 pLIyyPcamFifhiFvbV+3oWDiG/f/2CXFo45iO/uuxxW8JQgCMyQq4TukiTYrlbCt
 ciayjj0MQy8lF0IRIuFSgTyxp8iJrBYJEkaqBZcIRNdMTEFnE/ig43G7i7LqmZon
 7YjCC1EBZ6UNKgXaazqX+573wFUScCHMcmHcTWqE4Yfj93Q8mVTVedyYaJTwFSIq
 bNM2dspae+4iBxQa3z4w66dIcDC5ngJyKq2w2YdcO3H4KnT3FpTp/yv8GGW9y8lZ
 JiP2MYA2j4G3JtJLE40L6L2FnmI70APbHSX1JiLsS22J4UpLGElo8NUe5uwYPd/u
 jy7J6VAu/sl6e/+PlBIRV01pOGXzPSg3Fis9jZFtZJy2zddZVw+grapI6cs3o1Hq
 DYZnS0nFd2GmvnKmTI9p
 =Oq/z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20180511' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just a few NVMe fixes this round - one fixing a use-after-free, one
  fixes the return value after controller reset, and the last one fixes
  an issue where some drives will spuriously EIO. We should get these
  into 4.17"

* tag 'for-linus-20180511' of git://git.kernel.dk/linux-block:
  nvme: add quirk to force medium priority for SQ creation
  nvme: Fix sync controller reset return
  nvme: fix use-after-free in nvme_free_ns_head
2018-05-12 10:55:48 -07:00
Jean Delvare
05e13bb57e swiotlb: silent unwanted warning "buffer is full"
If DMA_ATTR_NO_WARN is passed to swiotlb_alloc_buffer(), it should be
passed further down to swiotlb_tbl_map_single(). Otherwise we escape
half of the warnings but still log the other half.

This is one of the multiple causes of spurious warnings reported at:
https://bugs.freedesktop.org/show_bug.cgi?id=104082

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 0176adb004 ("swiotlb: refactor coherent buffer allocation")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: stable@vger.kernel.org # v4.16
2018-05-12 11:57:37 +02:00
Mel Gorman
789ba28013 Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"
This reverts commit 7347fc87df.

Srikar Dronamra pointed out that while the commit in question did show
a performance improvement on ppc64, it did so at the cost of disabling
active CPU migration by automatic NUMA balancing which was not the intent.
The issue was that a serious flaw in the logic failed to ever active balance
if SD_WAKE_AFFINE was disabled on scheduler domains. Even when it's enabled,
the logic is still bizarre and against the original intent.

Investigation showed that fixing the patch in either the way he suggested,
using the correct comparison for jiffies values or introducing a new
numa_migrate_deferred variable in task_struct all perform similarly to a
revert with a mix of gains and losses depending on the workload, machine
and socket count.

The original intent of the commit was to handle a problem whereby
wake_affine, idle balancing and automatic NUMA balancing disagree on the
appropriate placement for a task. This was particularly true for cases where
a single task was a massive waker of tasks but where wake_wide logic did
not apply.  This was particularly noticeable when a futex (a barrier) woke
all worker threads and tried pulling the wakees to the waker nodes. In that
specific case, it could be handled by tuning MPI or openMP appropriately,
but the behavior is not illogical and was worth attempting to fix. However,
the approach was wrong. Given that we're at rc4 and a fix is not obvious,
it's better to play safe, revert this commit and retry later.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: ggherdovich@suse.cz
Cc: hpa@zytor.com
Cc: matt@codeblueprint.co.uk
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/20180509163115.6fnnyeg4vdm2ct4v@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-12 08:37:56 +02:00
Linus Torvalds
f0ab773f5c Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rbtree: include rcu.h
  scripts/faddr2line: fix error when addr2line output contains discriminator
  ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
  mm, oom: fix concurrent munlock and oom reaper unmap, v3
  mm: migrate: fix double call of radix_tree_replace_slot()
  proc/kcore: don't bounds check against address 0
  mm: don't show nr_indirectly_reclaimable in /proc/vmstat
  mm: sections are not offlined during memory hotremove
  z3fold: fix reclaim lock-ups
  init: fix false positives in W+X checking
  lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
  KASAN: prohibit KASAN+STRUCTLEAK combination
  MAINTAINERS: update Shuah's email address
2018-05-11 18:04:12 -07:00
Sebastian Andrzej Siewior
2075b16e32 rbtree: include rcu.h
Since commit c1adf20052 ("Introduce rb_replace_node_rcu()")
rbtree_augmented.h uses RCU related data structures but does not include
the header file.  It works as long as it gets somehow included before
that and fails otherwise.

Link: http://lkml.kernel.org/r/20180504103159.19938-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Changbin Du
78eb0c6356 scripts/faddr2line: fix error when addr2line output contains discriminator
When addr2line output contains discriminator, the current awk script
cannot parse it.  This patch fixes it by extracting key words using
regex which is more reliable.

  $ scripts/faddr2line vmlinux tlb_flush_mmu_free+0x26
  tlb_flush_mmu_free+0x26/0x50:
  tlb_flush_mmu_free at mm/memory.c:258 (discriminator 3)
  scripts/faddr2line: eval: line 173: unexpected EOF while looking for matching `)'

Link: http://lkml.kernel.org/r/1525323379-25193-1-git-send-email-changbin.du@intel.com
Fixes: 6870c0165f ("scripts/faddr2line: show the code context")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Ashish Samant
e438302920 ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
While reflinking an inode, we create a new inode in orphan directory,
then take EX lock on it, reflink the original inode to orphan inode and
release EX lock.  Once the lock is released another node could request
it in EX mode from ocfs2_recover_orphans() which causes downconvert of
the lock, on this node, to NL mode.

Later we attempt to initialize security acl for the orphan inode and
move it to the reflink destination.  However, while doing this we dont
take EX lock on the inode.  This could potentially cause problems
because we could be starting transaction, accessing journal and
modifying metadata of the inode while holding NL lock and with another
node holding EX lock on the inode.

Fix this by taking orphan inode cluster lock in EX mode before
initializing security and moving orphan inode to reflink destination.
Use the __tracker variant while taking inode lock to avoid recursive
locking in the ocfs2_init_security_and_acl() call chain.

Link: http://lkml.kernel.org/r/1523475107-7639-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
David Rientjes
27ae357fa8 mm, oom: fix concurrent munlock and oom reaper unmap, v3
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.

Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP.  The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap().  It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.

This issue fixes CVE-2018-1000200.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Naoya Horiguchi
013567be19 mm: migrate: fix double call of radix_tree_replace_slot()
radix_tree_replace_slot() is called twice for head page, it's obviously
a bug.  Let's fix it.

Link: http://lkml.kernel.org/r/20180423072101.GA12157@hori1.linux.bs1.fc.nec.co.jp
Fixes: e71769ae52 ("mm: enable thp migration for shmem thp")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <zi.yan@sent.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Laura Abbott
3955333df9 proc/kcore: don't bounds check against address 0
The existing kcore code checks for bad addresses against __va(0) with
the assumption that this is the lowest address on the system.  This may
not hold true on some systems (e.g.  arm64) and produce overflows and
crashes.  Switch to using other functions to validate the address range.

It's currently only seen on arm64 and it's not clear if anyone wants to
use that particular combination on a stable release.  So this is not
urgent for stable.

Link: http://lkml.kernel.org/r/20180501201143.15121-1-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Tested-by: Dave Anderson <anderson@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Roman Gushchin
7aaf772723 mm: don't show nr_indirectly_reclaimable in /proc/vmstat
Don't show nr_indirectly_reclaimable in /proc/vmstat, because there is
no need to export this vm counter to userspace, and some changes are
expected in reclaimable object accounting, which can alter this counter.

Link: http://lkml.kernel.org/r/20180425191422.9159-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Pavel Tatashin
27227c7338 mm: sections are not offlined during memory hotremove
Memory hotplug and hotremove operate with per-block granularity.  If the
machine has a large amount of memory (more than 64G), the size of a
memory block can span multiple sections.  By mistake, during hotremove
we set only the first section to offline state.

The bug was discovered because kernel selftest started to fail:
  https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop

After commit, "mm/memory_hotplug: optimize probe routine".  But, the bug
is older than this commit.  In this optimization we also added a check
for sections to be in a proper state during hotplug operation.

Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
Fixes: 2d070eab2e ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Vitaly Wool
6098d7e136 z3fold: fix reclaim lock-ups
Do not try to optimize in-page object layout while the page is under
reclaim.  This fixes lock-ups on reclaim and improves reclaim
performance at the same time.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: <Oleksiy.Avramchenko@sony.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Jeffrey Hugo
ae646f0b9c init: fix false positives in W+X checking
load_module() creates W+X mappings via __vmalloc_node_range() (from
layout_and_allocate()->move_module()->module_alloc()) by using
PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
"call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().

This is a problem because call_rcu_sched() queues work, which can be run
after debug_checkwx() is run, resulting in a race condition.  If hit,
the race results in a nasty splat about insecure W+X mappings, which
results in a poor user experience as these are not the mappings that
debug_checkwx() is intended to catch.

This issue is observed on multiple arm64 platforms, and has been
artificially triggered on an x86 platform.

Address the race by flushing the queued work before running the
arch-defined mark_rodata_ro() which then calls debug_checkwx().

Link: http://lkml.kernel.org/r/1525103946-29526-1-git-send-email-jhugo@codeaurora.org
Fixes: e1a58320a3 ("x86/mm: Warn on W^X mappings")
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reported-by: Timur Tabi <timur@codeaurora.org>
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Yury Norov
4ba281d5bd lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
test_find_first_bit() is intentionally sub-optimal, and may cause soft
lockup due to long time of run on some systems.  So decrease length of
bitmap to traverse to avoid lockup.

With the change below, time of test execution doesn't exceed 0.2 seconds
on my testing system.

Link: http://lkml.kernel.org/r/20180420171949.15710-1-ynorov@caviumnetworks.com
Fixes: 4441fca0a2 ("lib: test module for find_*_bit() functions")
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Dmitry Vyukov
c9cf87ea6a KASAN: prohibit KASAN+STRUCTLEAK combination
Currently STRUCTLEAK inserts initialization out of live scope of variables
from KASAN point of view.  This leads to KASAN false positive reports.
Prohibit this combination for now.

Link: http://lkml.kernel.org/r/20180419172451.104700-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Shuah Khan (Samsung OSG)
1d1c8e5f0d MAINTAINERS: update Shuah's email address
Update email address in MAINTAINERS file due to IT infrastructure changes
at Samsung.

Link: http://lkml.kernel.org/r/20180501212815.25911-1-shuah@kernel.org
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-05-11 17:28:45 -07:00
Linus Torvalds
4bc871984f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
    Easton.

 2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.

 3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
    Silva.

 4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
    grateful for this fix.

 5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
    do appreciate.

 6) Use after free in TLS code. Once again we are blessed by the
    honorable Eric Dumazet with this fix.

 7) Fix regression in TLS code causing stalls on partial TLS records.
    This fix is bestowed upon us by Andrew Tomt.

 8) Deal with too small MTUs properly in LLC code, another great gift
    from Eric Dumazet.

 9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
    Hangbin Liu we give thanks for this.

10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
    Paolo Abeni, he gave us this.

11) Range check coalescing parameters in mlx4 driver, thank you Moshe
    Shemesh.

12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
    David Howells.

13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
    you're the best!

14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
    Benerjee saved us!

15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
    is now water tight, thanks to Andrey Ignatov.

16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
    everywhere!

17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
    without you!

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
  net sched actions: fix refcnt leak in skbmod
  net: sched: fix error path in tcf_proto_create() when modules are not configured
  net sched actions: fix invalid pointer dereferencing if skbedit flags missing
  ixgbe: fix memory leak on ipsec allocation
  ixgbevf: fix ixgbevf_xmit_frame()'s return type
  ixgbe: return error on unsupported SFP module when resetting
  ice: Set rq_last_status when cleaning rq
  ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
  mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
  bonding: send learning packets for vlans on slave
  bonding: do not allow rlb updates to invalid mac
  net/mlx5e: Err if asked to offload TC match on frag being first
  net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
  net/mlx5: Free IRQs in shutdown path
  rxrpc: Trace UDP transmission failure
  rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Fix error reception on AF_INET6 sockets
  rxrpc: Fix missing start of call timeout
  qed: fix spelling mistake: "taskelt" -> "tasklet"
  ...
2018-05-11 14:14:46 -07:00
Linus Torvalds
a1f45efbb9 NFS client fixes for Linux 4.17-rc4
Bugfixes:
 - Fix a possible NFSoRDMA list corruption during recovery
 - Fix sunrpc tracepoint crashes
 
 Other change:
 - Update Trond's email in the MAINTAINERS file
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlr2ABEACgkQ18tUv7Cl
 QOvLew//WipZ1of+dZpiGa95pqVKBrIxq5R1y8LACmEKaiyfHOOoFcaopI7YDU1r
 OkBRZkldMLKOSGZsQ9xEjh3OOPgW60oInFZ2sD2qjnph23x09IcDbiCp8iJ0PTFI
 iD9ioUKc3h7FSl0pJQSjIo9+9fFsTZzIioxP7tDZt2Kog5OMIZeWAqRIj1xmgu5i
 TX793gTFJ+SfMSkvWZM5oOHVEmW/oXAgWsgaVXEqkdjK2JI6KYKqAgMj0CLvvNIo
 S2eeJjbyd9Hl59lDo50NzrZEQESlPYod6ZDfEOmF50mxC3MCLlmtAgwXKknVaY1N
 1L4tFuBoXBLV0jctBztuqMIDKXncoNlsCvr38WqkBaFxikKpK8dFqeByh+wCTdtz
 pwMPHFDQmQB1mIwqzQa+O6MAZ5n3a/cgyWQtoymlq5ddQU3roB2euWXRmaoXPudY
 SnmEVYxq839Ukw16qNa1HkKkroy8Zzqr5+sS30w/l916U9/S3ZolXF+XU5ux+6hQ
 Mlu9aW5SCP4S5QresaAcjPcBdLvbjN8/h/I8bdCmPRCGVKSkcxSz2MZYUli8UxAq
 tht4tQtuCY1XInQPnuf20egnJnrhpgQjb8Xx5BvTtcEkFvz9F36lzK4ot0lqQzTo
 tGDDW8gpeskt0Z1PC4eD1gq/E+FSywP7gg/g32AMdk2GpCewBog=
 =xfe9
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "These patches fix both a possible corruption during NFSoRDMA MR
  recovery, and a sunrpc tracepoint crash.

  Additionally, Trond has a new email address to put in the MAINTAINERS
  file"

* tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  Change Trond's email address in MAINTAINERS
  sunrpc: Fix latency trace point crashes
  xprtrdma: Fix list corruption / DMAR errors during MR recovery
2018-05-11 13:56:43 -07:00