Commit Graph

4495 Commits

Author SHA1 Message Date
Peter Zijlstra
c86e9b987c lockdep: Prepare for noinstr sections
Force inlining and prevent instrumentation of all sorts by marking the
functions which are invoked from low level entry code with 'noinstr'.

Split the irqflags tracking into two parts. One which does the heavy
lifting while RCU is watching and the final one which can be invoked after
RCU is turned off.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200505134100.484532537@linutronix.de
2020-05-19 15:47:21 +02:00
Thomas Gleixner
0995a5dfbe tracing: Provide lockdep less trace_hardirqs_on/off() variants
trace_hardirqs_on/off() is only partially safe vs. RCU idle. The tracer
core itself is safe, but the resulting tracepoints can be utilized by
e.g. BPF which is unsafe.

Provide variants which do not contain the lockdep invocation so the lockdep
and tracer invocations can be split at the call site and placed
properly. This is required because lockdep needs to be aware of the state
before switching away from RCU idle and after switching to RCU idle because
these transitions can take locks.

As these code pathes are going to be non-instrumentable the tracer can be
invoked after RCU is turned on and before the switch to RCU idle. So for
these new variants there is no need to invoke the rcuidle aware tracer
functions.

Name them so they match the lockdep counterparts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.270771162@linutronix.de
2020-05-19 15:47:21 +02:00
Jan Kara
870c153cf0 blktrace: Report pid with note messages
Currently informational messages within block trace do not have PID
information of the process reporting the message included. With BFQ it
is sometimes useful to have the information and there's no good reason
to omit the information from the trace. So just fill in pid information
when generating note message.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-16 14:29:39 -06:00
David S. Miller
da07f52d3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Move the bpf verifier trace check into the new switch statement in
HEAD.

Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 13:48:59 -07:00
Linus Torvalds
f85c1598dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix sk_psock reference count leak on receive, from Xiyu Yang.

 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.

 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
    from Maciej Żenczykowski.

 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
    Abeni.

 5) Fix netprio cgroup v2 leak, from Zefan Li.

 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.

 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.

 8) Various bpf probe handling fixes, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
  selftests: mptcp: pm: rm the right tmp file
  dpaa2-eth: properly handle buffer size restrictions
  bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
  bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
  bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  MAINTAINERS: Mark networking drivers as Maintained.
  ipmr: Add lockdep expression to ipmr_for_each_table macro
  ipmr: Fix RCU list debugging warning
  drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
  net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
  tcp: fix error recovery in tcp_zerocopy_receive()
  MAINTAINERS: Add Jakub to networking drivers.
  MAINTAINERS: another add of Karsten Graul for S390 networking
  drivers: ipa: fix typos for ipa_smp2p structure doc
  pppoe: only process PADT targeted at local interfaces
  selftests/bpf: Enforce returning 0 for fentry/fexit programs
  bpf: Enforce returning 0 for fentry/fexit progs
  net: stmmac: fix num_por initialization
  security: Fix the default value of secid_to_secctx hook
  libbpf: Fix register naming in PT_REGS s390 macros
  ...
2020-05-15 13:10:06 -07:00
Alexei Starovoitov
2c78ee898d bpf: Implement CAP_BPF
Implement permissions as stated in uapi/linux/capability.h
In order to do that the verifier allow_ptr_leaks flag is split
into four flags and they are set as:
  env->allow_ptr_leaks = bpf_allow_ptr_leaks();
  env->bypass_spec_v1 = bpf_bypass_spec_v1();
  env->bypass_spec_v4 = bpf_bypass_spec_v4();
  env->bpf_capable = bpf_capable();

The first three currently equivalent to perfmon_capable(), since leaking kernel
pointers and reading kernel memory via side channel attacks is roughly
equivalent to reading kernel memory with cap_perfmon.

'bpf_capable' enables bounded loops, precision tracking, bpf to bpf calls and
other verifier features. 'allow_ptr_leaks' enable ptr leaks, ptr conversions,
subtraction of pointers. 'bypass_spec_v1' disables speculative analysis in the
verifier, run time mitigations in bpf array, and enables indirect variable
access in bpf programs. 'bypass_spec_v4' disables emission of sanitation code
by the verifier.

That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN
will have speculative checks done by the verifier and other spectre mitigation
applied. Such networking BPF program will not be able to leak kernel pointers
and will not be able to access arbitrary kernel memory.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-3-alexei.starovoitov@gmail.com
2020-05-15 17:29:41 +02:00
Daniel Borkmann
b2a5212fb6 bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
Usage of plain %s conversion specifier in bpf_trace_printk() suffers from the
very same issue as bpf_probe_read{,str}() helpers, that is, it is broken on
archs with overlapping address ranges.

While the helpers have been addressed through work in 6ae08ae3de ("bpf: Add
probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers"), we need
an option for bpf_trace_printk() as well to fix it.

Similarly as with the helpers, force users to make an explicit choice by adding
%pks and %pus specifier to bpf_trace_printk() which will then pick the corresponding
strncpy_from_unsafe*() variant to perform the access under KERNEL_DS or USER_DS.
The %pk* (kernel specifier) and %pu* (user specifier) can later also be extended
for other objects aside strings that are probed and printed under tracing, and
reused out of other facilities like bpf_seq_printf() or BTF based type printing.

Existing behavior of %s for current users is still kept working for archs where it
is not broken and therefore gated through CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
For archs not having this property we fall-back to pick probing under KERNEL_DS as
a sensible default.

Fixes: 8d3b7dce86 ("bpf: add support for %s specifier to bpf_trace_printk()")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Link: https://lore.kernel.org/bpf/20200515101118.6508-4-daniel@iogearbox.net
2020-05-15 08:10:36 -07:00
Daniel Borkmann
0ebeea8ca8 bpf: Restrict bpf_probe_read{, str}() only to archs where they work
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.

To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3de ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").

Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.

However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).

For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15 08:10:36 -07:00
Linus Torvalds
f44d5c4890 Various tracing fixes:
- Fix a crash when having function tracing and function stack tracing on
    the command line. The ftrace trampolines are created as executable and
    read only. But the stack tracer tries to modify them with text_poke()
    which expects all kernel text to still be writable at boot.
    Keep the trampolines writable at boot, and convert them to read-only
    with the rest of the kernel.
 
  - A selftest was triggering in the ring buffer iterator code, that
    is no longer valid with the update of keeping the ring buffer
    writable while a iterator is reading. Just bail after three failed
    attempts to get an event and remove the warning and disabling of the
    ring buffer.
 
  - While modifying the ring buffer code, decided to remove all the
    unnecessary BUG() calls.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXr1CDhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsXcAQCoL229SBrtHsn4DUO7eAQRppUT3hNw
 RuKzvQ56+1GccQEAh8VGCeg89uMSK6imrTujEl6VmOUdbgrD5R96yiKoGQw=
 =vi+k
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a crash when having function tracing and function stack tracing
     on the command line.

     The ftrace trampolines are created as executable and read only. But
     the stack tracer tries to modify them with text_poke() which
     expects all kernel text to still be writable at boot. Keep the
     trampolines writable at boot, and convert them to read-only with
     the rest of the kernel.

   - A selftest was triggering in the ring buffer iterator code, that is
     no longer valid with the update of keeping the ring buffer writable
     while a iterator is reading.

     Just bail after three failed attempts to get an event and remove
     the warning and disabling of the ring buffer.

   - While modifying the ring buffer code, decided to remove all the
     unnecessary BUG() calls"

* tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Remove all BUG() calls
  ring-buffer: Don't deactivate the ring buffer on failed iterator reads
  x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
2020-05-14 11:46:52 -07:00
Steven Rostedt (VMware)
da4d401a6b ring-buffer: Remove all BUG() calls
There's a lot of checks to make sure the ring buffer is working, and if an
anomaly is detected, it safely shuts itself down. But there's a few cases
that it will call BUG(), which defeats the point of being safe (it crashes
the kernel when an anomaly is found!). There's no reason for them. Switch
them all to either WARN_ON_ONCE() (when no ring buffer descriptor is present),
or to RB_WARN_ON() (when a ring buffer descriptor is present).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-14 08:51:02 -04:00
Steven Rostedt (VMware)
3d2353de81 ring-buffer: Don't deactivate the ring buffer on failed iterator reads
If the function tracer is running and the trace file is read (which uses the
ring buffer iterator), the iterator can get in sync with the writes, and
caues it to fail to find a page with content it can read three times. This
causes a warning and deactivation of the ring buffer code.

Looking at the other cases of failure to get an event, it appears that
there's a chance that the writer could cause them too. Since the iterator is
a "best effort" to read the ring buffer if there's an active writer (the
consumer reader is made for this case "see trace_pipe"), if it fails to get
an event after three tries, simply give up and return NULL. Don't warn, nor
disable the ring buffer on this failure.

Link: https://lore.kernel.org/r/20200429090508.GG5770@shao2-debian

Reported-by: kernel test robot <lkp@intel.com>
Fixes: ff84c50cfb ("ring-buffer: Do not die if rb_iter_peek() fails more than thrice")
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-14 08:50:51 -04:00
Steven Rostedt (VMware)
59566b0b62 x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
Booting one of my machines, it triggered the following crash:

 Kernel/User page tables isolation: enabled
 ftrace: allocating 36577 entries in 143 pages
 Starting tracer 'function'
 BUG: unable to handle page fault for address: ffffffffa000005c
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
 Oops: 0003 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ #24
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 RIP: 0010:text_poke_early+0x4a/0x58
 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
0 41 57 49
 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
 RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
 FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
 Call Trace:
  text_poke_bp+0x27/0x64
  ? mutex_lock+0x36/0x5d
  arch_ftrace_update_trampoline+0x287/0x2d5
  ? ftrace_replace_code+0x14b/0x160
  ? ftrace_update_ftrace_func+0x65/0x6c
  __register_ftrace_function+0x6d/0x81
  ftrace_startup+0x23/0xc1
  register_ftrace_function+0x20/0x37
  func_set_flag+0x59/0x77
  __set_tracer_option.isra.19+0x20/0x3e
  trace_set_options+0xd6/0x13e
  apply_trace_boot_options+0x44/0x6d
  register_tracer+0x19e/0x1ac
  early_trace_init+0x21b/0x2c9
  start_kernel+0x241/0x518
  ? load_ucode_intel_bsp+0x21/0x52
  secondary_startup_64+0xa4/0xb0

I was able to trigger it on other machines, when I added to the kernel
command line of both "ftrace=function" and "trace_options=func_stack_trace".

The cause is the "ftrace=function" would register the function tracer
and create a trampoline, and it will set it as executable and
read-only. Then the "trace_options=func_stack_trace" would then update
the same trampoline to include the stack tracer version of the function
tracer. But since the trampoline already exists, it updates it with
text_poke_bp(). The problem is that text_poke_bp() called while
system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
the page mapping, as it would think that the text is still read-write.
But in this case it is not, and we take a fault and crash.

Instead, lets keep the ftrace trampolines read-write during boot up,
and then when the kernel executable text is set to read-only, the
ftrace trampolines get set to read-only as well.

Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a ("x86/ftrace: Use text_poke()")
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-12 18:24:34 -04:00
Linus Torvalds
24085f70a6 Tracing fixes to previous fixes:
Unfortunately, the last set of fixes introduced some minor bugs:
 
  - The bootconfig apply_xbc() leak fix caused the application to return
    a positive number on success, when it should have returned zero.
 
  - The preempt_irq_delay_thread fix to make the creation code
    wait for the kthread to finish to prevent it from executing after
    module unload, can now cause the kthread to exit before it even
    executes (preventing it to run its tests).
 
  - The fix to the bootconfig that fixed the initrd to remove the
    bootconfig from causing the kernel to panic, now prints a warning
    that the bootconfig is not found, even when bootconfig is not
    on the command line.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXrq2ehQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrdjAQDGNaJa7Ft13KTDTNTioKmOorOi38vF
 ava4E3uBHl3StQD/anJmVq7Kk4WJFKGYemV6usbjDqy510PCFu/VQ1AbGQc=
 =hJvk
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Fixes to previous fixes.

  Unfortunately, the last set of fixes introduced some minor bugs:

   - The bootconfig apply_xbc() leak fix caused the application to
     return a positive number on success, when it should have returned
     zero.

   - The preempt_irq_delay_thread fix to make the creation code wait for
     the kthread to finish to prevent it from executing after module
     unload, can now cause the kthread to exit before it even executes
     (preventing it to run its tests).

   - The fix to the bootconfig that fixed the initrd to remove the
     bootconfig from causing the kernel to panic, now prints a warning
     that the bootconfig is not found, even when bootconfig is not on
     the command line"

* tag 'trace-v5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: Fix to prevent warning message if no bootconfig option
  tracing: Wait for preempt irq delay thread to execute
  tools/bootconfig: Fix apply_xbc() to return zero on success
2020-05-12 11:06:26 -07:00
Steven Rostedt (VMware)
8b1fac2e73 tracing: Wait for preempt irq delay thread to execute
A bug report was posted that running the preempt irq delay module on a slow
machine, and removing it quickly could lead to the thread created by the
modlue to execute after the module is removed, and this could cause the
kernel to crash. The fix for this was to call kthread_stop() after creating
the thread to make sure it finishes before allowing the module to be
removed.

Now this caused the opposite problem on fast machines. What now happens is
the kthread_stop() can cause the kthread never to execute and the test never
to run. To fix this, add a completion and wait for the kthread to execute,
then wait for it to end.

This issue caused the ftracetest selftests to fail on the preemptirq tests.

Link: https://lore.kernel.org/r/20200510114210.15d9e4af@oasis.local.home

Cc: stable@vger.kernel.org
Fixes: d16a8c3107 ("tracing: Wait for preempt irq delay thread to finish")
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-11 17:00:34 -04:00
Yonghong Song
492e639f0c bpf: Add bpf_seq_printf and bpf_seq_write helpers
Two helpers bpf_seq_printf and bpf_seq_write, are added for
writing data to the seq_file buffer.

bpf_seq_printf supports common format string flag/width/type
fields so at least I can get identical results for
netlink and ipv6_route targets.

For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW
specifically indicates a write failure due to overflow, which
means the object will be repeated in the next bpf invocation
if object collection stays the same. Note that if the object
collection is changed, depending how collection traversal is
done, even if the object still in the collection, it may not
be visited.

For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to
read kernel memory. Reading kernel memory may fail in
the following two cases:
  - invalid kernel address, or
  - valid kernel address but requiring a major fault
If reading kernel memory failed, the %s string will be
an empty string and %p{i,I}{4,6} will be all 0.
Not returning error to bpf program is consistent with
what bpf_trace_printk() does for now.

bpf_seq_printf may return -EBUSY meaning that internal percpu
buffer for memory copy of strings or other pointees is
not available. Bpf program can return 1 to indicate it
wants the same object to be repeated. Right now, this should not
happen on no-RT kernels since migrate_disable(), which guards
bpf prog call, calls preempt_disable().

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
2020-05-09 17:05:26 -07:00
Linus Torvalds
78a5255ffb Stop the ad-hoc games with -Wno-maybe-initialized
We have some rather random rules about when we accept the
"maybe-initialized" warnings, and when we don't.

For example, we consider it unreliable for gcc versions < 4.9, but also
if -O3 is enabled, or if optimizing for size.  And then various kernel
config options disabled it, because they know that they trigger that
warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).

And now gcc-10 seems to be introducing a lot of those warnings too, so
it falls under the same heading as 4.9 did.

At the same time, we have a very straightforward way to _enable_ that
warning when wanted: use "W=2" to enable more warnings.

So stop playing these ad-hoc games, and just disable that warning by
default, with the known and straight-forward "if you want to work on the
extra compiler warnings, use W=123".

Would it be great to have code that is always so obvious that it never
confuses the compiler whether a variable is used initialized or not?
Yes, it would.  In a perfect world, the compilers would be smarter, and
our source code would be simpler.

That's currently not the world we live in, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-09 13:57:10 -07:00
Zou Wei
192b7993b3 tracing: Make tracing_snapshot_instance_cond() static
Fix the following sparse warning:

kernel/trace/trace.c:950:6: warning: symbol 'tracing_snapshot_instance_cond'
was not declared. Should it be static?

Link: http://lkml.kernel.org/r/1587614905-48692-1-git-send-email-zou_wei@huawei.com

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:58 -04:00
Steven Rostedt (VMware)
11f5efc3ab tracing: Add a vmalloc_sync_mappings() for safe measure
x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu
areas can be complex, to say the least. Mappings may happen at boot up, and
if nothing synchronizes the page tables, those page mappings may not be
synced till they are used. This causes issues for anything that might touch
one of those mappings in the path of the page fault handler. When one of
those unmapped mappings is touched in the page fault handler, it will cause
another page fault, which in turn will cause a page fault, and leave us in
a loop of page faults.

Commit 763802b53a ("x86/mm: split vmalloc_sync_all()") split
vmalloc_sync_all() into vmalloc_sync_unmappings() and
vmalloc_sync_mappings(), as on system exit, it did not need to do a full
sync on x86_64 (although it still needed to be done on x86_32). By chance,
the vmalloc_sync_all() would synchronize the page mappings done at boot up
and prevent the per cpu area from being a problem for tracing in the page
fault handler. But when that synchronization in the exit of a task became a
nop, it caused the problem to appear.

Link: https://lore.kernel.org/r/20200429054857.66e8e333@oasis.local.home

Cc: stable@vger.kernel.org
Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Reported-by: "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com>
Suggested-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:57 -04:00
Steven Rostedt (VMware)
d16a8c3107 tracing: Wait for preempt irq delay thread to finish
Running on a slower machine, it is possible that the preempt delay kernel
thread may still be executing if the module was immediately removed after
added, and this can cause the kernel to crash as the kernel thread might be
executing after its code has been removed.

There's no reason that the caller of the code shouldn't just wait for the
delay thread to finish, as the thread can also be created by a trigger in
the sysfs code, which also has the same issues.

Link: http://lore.kernel.org/r/5EA2B0C8.2080706@cn.fujitsu.com

Cc: stable@vger.kernel.org
Fixes: 793937236d ("lib: Add module for testing preemptoff/irqsoff latency tracers")
Reported-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-07 13:32:40 -04:00
Masami Hiramatsu
5b4dcd2d20 tracing/kprobes: Reject new event if loc is NULL
Reject the new event which has NULL location for kprobes.
For kprobes, user must specify at least the location.

Link: http://lkml.kernel.org/r/158779376597.6082.1411212055469099461.stgit@devnote2

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 2a588dd1d5 ("tracing: Add kprobe event command generation functions")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Masami Hiramatsu
da0f1f4167 tracing/boottime: Fix kprobe event API usage
Fix boottime kprobe events to use API correctly for
multiple events.

For example, when we set a multiprobe kprobe events in
bootconfig like below,

  ftrace.event.kprobes.myevent {
  	probes = "vfs_read $arg1 $arg2", "vfs_write $arg1 $arg2"
  }

This cause an error;

  trace_boot: Failed to add probe: p:kprobes/myevent (null)  vfs_read $arg1 $arg2  vfs_write $arg1 $arg2

This shows the 1st argument becomes NULL and multiprobes
are merged to 1 probe.

Link: http://lkml.kernel.org/r/158779375766.6082.201939936008972838.stgit@devnote2

Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 29a1548105 ("tracing: Change trace_boot to use kprobe_event interface")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Masami Hiramatsu
dcbd21c9fc tracing/kprobes: Fix a double initialization typo
Fix a typo that resulted in an unnecessary double
initialization to addr.

Link: http://lkml.kernel.org/r/158779374968.6082.2337484008464939919.stgit@devnote2

Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Fixes: c7411a1a12 ("tracing/kprobe: Check whether the non-suffixed symbol is notrace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-06 09:04:11 -04:00
Daniel Borkmann
0b54142e4b Merge branch 'work.sysctl' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull in Christoph Hellwig's series that changes the sysctl's ->proc_handler
methods to take kernel pointers instead. It gets rid of the set_fs address
space overrides used by BPF. As per discussion, pull in the feature branch
into bpf-next as it relates to BPF sysctl progs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200427071508.GV23230@ZenIV.linux.org.uk/T/
2020-04-28 21:23:38 +02:00
Paul E. McKenney
e5a971d76d ftrace: Use synchronize_rcu_tasks_rude() instead of ftrace_sync()
This commit replaces the schedule_on_each_cpu(ftrace_sync) instances
with synchronize_rcu_tasks_rude().

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
[ paulmck: Make Kconfig adjustments noted by kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2020-04-27 11:03:53 -07:00
Christoph Hellwig
32927393dc sysctl: pass kernel pointers to ->proc_handler
Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from  userspace in common code.  This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-27 02:07:40 -04:00
Maciej Żenczykowski
71d1921477 bpf: add bpf_ktime_get_boot_ns()
On a device like a cellphone which is constantly suspending
and resuming CLOCK_MONOTONIC is not particularly useful for
keeping track of or reacting to external network events.
Instead you want to use CLOCK_BOOTTIME.

Hence add bpf_ktime_get_boot_ns() as a mirror of bpf_ktime_get_ns()
based around CLOCK_BOOTTIME instead of CLOCK_MONOTONIC.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-04-26 09:43:05 -07:00
Jason Yan
d013496f99 tracing: Convert local functions in tracing_map.c to static
Fix the following sparse warning:

kernel/trace/tracing_map.c:286:6: warning: symbol
'tracing_map_array_clear' was not declared. Should it be static?
kernel/trace/tracing_map.c:297:6: warning: symbol
'tracing_map_array_free' was not declared. Should it be static?
kernel/trace/tracing_map.c:319:26: warning: symbol
'tracing_map_array_alloc' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20200410073312.38855-1-yanaijie@huawei.com

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 22:07:26 -04:00
Steven Rostedt (VMware)
353da87921 ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()
kmemleak reported the following:

unreferenced object 0xffff90d47127a920 (size 32):
  comm "modprobe", pid 1766, jiffies 4294792031 (age 162.568s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 22 01 00 00 00 00 ad de  ........".......
    00 78 12 a7 ff ff ff ff 00 00 b6 c0 ff ff ff ff  .x..............
  backtrace:
    [<00000000bb79e72e>] register_ftrace_direct+0xcb/0x3a0
    [<00000000295e4f79>] do_one_initcall+0x72/0x340
    [<00000000873ead18>] do_init_module+0x5a/0x220
    [<00000000974d9de5>] load_module+0x2235/0x2550
    [<0000000059c3d6ce>] __do_sys_finit_module+0xc0/0x120
    [<000000005a8611b4>] do_syscall_64+0x60/0x230
    [<00000000a0cdc49e>] entry_SYSCALL_64_after_hwframe+0x49/0xb3

The entry used to save the direct descriptor needs to be freed
when unregistering.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 22:04:27 -04:00
Vamshi K Sthambamkadi
9da73974eb tracing: Fix memory leaks in trace_events_hist.c
kmemleak report 1:
    [<9092c50b>] kmem_cache_alloc_trace+0x138/0x270
    [<05a2c9ed>] create_field_var+0xcf/0x180
    [<528a2d68>] action_create+0xe2/0xc80
    [<63f50b61>] event_hist_trigger_func+0x15b5/0x1920
    [<28ea5d3d>] trigger_process_regex+0x7b/0xc0
    [<3138e86f>] event_trigger_write+0x4d/0xb0
    [<ffd66c19>] __vfs_write+0x30/0x200
    [<4f424a0d>] vfs_write+0x96/0x1b0
    [<da59a290>] ksys_write+0x53/0xc0
    [<3717101a>] __ia32_sys_write+0x15/0x20
    [<c5f23497>] do_fast_syscall_32+0x70/0x250
    [<46e2629c>] entry_SYSENTER_32+0xaf/0x102

This is because save_vars[] of struct hist_trigger_data are
not destroyed

kmemleak report 2:
    [<9092c50b>] kmem_cache_alloc_trace+0x138/0x270
    [<6e5e97c5>] create_var+0x3c/0x110
    [<de82f1b9>] create_field_var+0xaf/0x180
    [<528a2d68>] action_create+0xe2/0xc80
    [<63f50b61>] event_hist_trigger_func+0x15b5/0x1920
    [<28ea5d3d>] trigger_process_regex+0x7b/0xc0
    [<3138e86f>] event_trigger_write+0x4d/0xb0
    [<ffd66c19>] __vfs_write+0x30/0x200
    [<4f424a0d>] vfs_write+0x96/0x1b0
    [<da59a290>] ksys_write+0x53/0xc0
    [<3717101a>] __ia32_sys_write+0x15/0x20
    [<c5f23497>] do_fast_syscall_32+0x70/0x250
    [<46e2629c>] entry_SYSENTER_32+0xaf/0x102

struct hist_field allocated through create_var() do not initialize
"ref" field to 1. The code in __destroy_hist_field() does not destroy
object if "ref" is initialized to zero, the condition
if (--hist_field->ref > 1) always passes since unsigned int wraps.

kmemleak report 3:
    [<f8666fcc>] __kmalloc_track_caller+0x139/0x2b0
    [<bb7f80a5>] kstrdup+0x27/0x50
    [<39d70006>] init_var_ref+0x58/0xd0
    [<8ca76370>] create_var_ref+0x89/0xe0
    [<f045fc39>] action_create+0x38f/0xc80
    [<7c146821>] event_hist_trigger_func+0x15b5/0x1920
    [<07de3f61>] trigger_process_regex+0x7b/0xc0
    [<e87daf8f>] event_trigger_write+0x4d/0xb0
    [<19bf1512>] __vfs_write+0x30/0x200
    [<64ce4d27>] vfs_write+0x96/0x1b0
    [<a6f34170>] ksys_write+0x53/0xc0
    [<7d4230cd>] __ia32_sys_write+0x15/0x20
    [<8eadca00>] do_fast_syscall_32+0x70/0x250
    [<235cf985>] entry_SYSENTER_32+0xaf/0x102

hist_fields (system & event_name) are not freed

Link: http://lkml.kernel.org/r/20200422061503.GA5151@cosmos

Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-22 21:24:00 -04:00
Ingo Molnar
87cfeb1920 perf/core fixes and improvements:
kernel + tools/perf:
 
   Alexey Budankov:
 
   - Introduce CAP_PERFMON to kernel and user space.
 
 callchains:
 
   Adrian Hunter:
 
   - Allow using Intel PT to synthesize callchains for regular events.
 
   Kan Liang:
 
   - Stitch LBR records from multiple samples to get deeper backtraces,
     there are caveats, see the csets for details.
 
 perf script:
 
   Andreas Gerstmayr:
 
   - Add flamegraph.py script
 
 BPF:
 
   Jiri Olsa:
 
   - Synthesize bpf_trampoline/dispatcher ksymbol events.
 
 perf stat:
 
   Arnaldo Carvalho de Melo:
 
   - Honour --timeout for forked workloads.
 
   Stephane Eranian:
 
   - Force error in fallback on :k events, to avoid counting nothing when
     the user asks for kernel events but is not allowed to.
 
 perf bench:
 
   Ian Rogers:
 
   - Add event synthesis benchmark.
 
 tools api fs:
 
   Stephane Eranian:
 
  - Make xxx__mountpoint() more scalable
 
 libtraceevent:
 
   He Zhe:
 
   - Handle return value of asprintf.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+
 J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ
 qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU=
 =FHm4
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:

kernel + tools/perf:

  Alexey Budankov:

  - Introduce CAP_PERFMON to kernel and user space.

callchains:

  Adrian Hunter:

  - Allow using Intel PT to synthesize callchains for regular events.

  Kan Liang:

  - Stitch LBR records from multiple samples to get deeper backtraces,
    there are caveats, see the csets for details.

perf script:

  Andreas Gerstmayr:

  - Add flamegraph.py script

BPF:

  Jiri Olsa:

  - Synthesize bpf_trampoline/dispatcher ksymbol events.

perf stat:

  Arnaldo Carvalho de Melo:

  - Honour --timeout for forked workloads.

  Stephane Eranian:

  - Force error in fallback on :k events, to avoid counting nothing when
    the user asks for kernel events but is not allowed to.

perf bench:

  Ian Rogers:

  - Add event synthesis benchmark.

tools api fs:

  Stephane Eranian:

 - Make xxx__mountpoint() more scalable

libtraceevent:

  He Zhe:

  - Handle return value of asprintf.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-22 14:08:28 +02:00
Alexey Budankov
031258da05 trace/bpf_trace: Open access for CAP_PERFMON privileged process
Open access to bpf_trace monitoring for CAP_PERFMON privileged process.
Providing the access under CAP_PERFMON capability singly, without the
rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the
credentials and makes operation more secure.

CAP_PERFMON implements the principle of least privilege for performance
monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
principle of least privilege: A security design principle that states
that a process or program be granted only those privileges (e.g.,
capabilities) necessary to accomplish its legitimate function, and only
for the time that such privileges are actually required)

For backward compatibility reasons access to bpf_trace monitoring
remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN
usage for secure bpf_trace monitoring is discouraged with respect to
CAP_PERFMON capability.

Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Igor Lubashev <ilubashe@akamai.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Cc: linux-doc@vger.kernel.org
Cc: linux-man@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: selinux@vger.kernel.org
Link: http://lore.kernel.org/lkml/c0a0ae47-8b6e-ff3e-416b-3cd1faaf71c0@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-16 12:19:08 -03:00
Xiao Yang
0bbe7f7199 tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
Traced event can trigger 'snapshot' operation(i.e. calls snapshot_trigger()
or snapshot_count_trigger()) when register_snapshot_trigger() has completed
registration but doesn't allocate buffer for 'snapshot' event trigger.  In
the rare case, 'snapshot' operation always detects the lack of allocated
buffer so make register_snapshot_trigger() allocate buffer first.

trigger-snapshot.tc in kselftest reproduces the issue on slow vm:
-----------------------------------------------------------
cat trace
...
ftracetest-3028  [002] ....   236.784290: sched_process_fork: comm=ftracetest pid=3028 child_comm=ftracetest child_pid=3036
     <...>-2875  [003] ....   240.460335: tracing_snapshot_instance_cond: *** SNAPSHOT NOT ALLOCATED ***
     <...>-2875  [003] ....   240.460338: tracing_snapshot_instance_cond: *** stopping trace here!   ***
-----------------------------------------------------------

Link: http://lkml.kernel.org/r/20200414015145.66236-1-yangx.jy@cn.fujitsu.com

Cc: stable@vger.kernel.org
Fixes: 93e31ffbf4 ("tracing: Add 'snapshot' event trigger command")
Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-14 22:02:10 -04:00
Ingo Molnar
3b02a051d2 Linux 5.7-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl6TbaUeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGhgkH/iWpiKvosA20HJjC
 rBqYeJPxQsgZTuBieWJ+MeVxbpcF7RlM4c+glyvg3QJhHwIEG58dl6LBrQbAyBAR
 aFHNojr1iAYOruVCGnU3pA008YZiwUIDv/ZQ4DF8fmIU2vI2mJ6qHBv3XDl4G2uR
 Nwz8Eu9AgIwZM5coomVOSmoWyFy7Vxmb7W+3t5VmKsvOWx4ib9kyQtOIkvQDEl7j
 XCbWfI0xDQr6LFOm4jnCi5R/LhJ2LIqqIvHHrunbpszM8IwK797jCXz4im+dmd5Y
 +km46N7a8pDqri36xXz1gdBAU3eG7Pt1NyvfjwRVTdX4GquQ2MT0GoojxbLxUP3y
 3pEsQuE=
 =whbL
 -----END PGP SIGNATURE-----

Merge tag 'v5.7-rc1' into locking/kcsan, to resolve conflicts and refresh

Resolve these conflicts:

	arch/x86/Kconfig
	arch/x86/kernel/Makefile

Do a minor "evil merge" to move the KCSAN entry up a bit by a few lines
in the Kconfig to reduce the probability of future conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-04-13 09:44:39 +02:00
Linus Torvalds
aa1a8ce533 New tracing features:
- The ring buffer is no longer disabled when reading the trace file.
    The trace_pipe file was made to be used for live tracing and reading
    as it acted like the normal producer/consumer. As the trace file
    would not consume the data, the easy way of handling it was to just
    disable writes to the ring buffer. This came to a surprise to the
    BPF folks who complained about lost events due to reading.
    This is no longer an issue. If someone wants to keep the old disabling
    there's a new option "pause-on-trace" that can be set.
 
  - New set_ftrace_notrace_pid file. PIDs in this file will not be traced
    by the function tracer. Similar to set_ftrace_pid, which makes the
    function tracer only trace those tasks with PIDs in the file, the
    set_ftrace_notrace_pid does the reverse.
 
  - New set_event_notrace_pid file. PIDs in this file will cause events
    not to be traced if triggered by a task with a matching PID.
    Similar to the set_event_pid file but will not be traced.
    Note, sched_waking and sched_switch events may still be trace if
    one of the tasks referenced by those events contains a PID that
    is allowed to be traced.
 
 Tracing related features:
 
  - New bootconfig option, that is attached to the initrd file.
    If bootconfig is on the command line, then the initrd file
    is searched looking for a bootconfig appended at the end.
 
  - New GPU tracepoint infrastructure to help the gfx drivers to get
    off debugfs (acked by Greg Kroah-Hartman)
 
 Other minor updates and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXokgWRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qgrHAP0UkKs/52JY4oWa3OIh/OqK+vnCrIwz
 zGvDFOYM0fKbwgD9FZWgzlcaYK5m2Cxlhp4VoraZveHMLJUhnEHtdX6X0wk=
 =Rebj
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New tracing features:

   - The ring buffer is no longer disabled when reading the trace file.

     The trace_pipe file was made to be used for live tracing and
     reading as it acted like the normal producer/consumer. As the trace
     file would not consume the data, the easy way of handling it was to
     just disable writes to the ring buffer.

     This came to a surprise to the BPF folks who complained about lost
     events due to reading. This is no longer an issue. If someone wants
     to keep the old disabling there's a new option "pause-on-trace"
     that can be set.

   - New set_ftrace_notrace_pid file. PIDs in this file will not be
     traced by the function tracer.

     Similar to set_ftrace_pid, which makes the function tracer only
     trace those tasks with PIDs in the file, the set_ftrace_notrace_pid
     does the reverse.

   - New set_event_notrace_pid file. PIDs in this file will cause events
     not to be traced if triggered by a task with a matching PID.

     Similar to the set_event_pid file but will not be traced. Note,
     sched_waking and sched_switch events may still be traced if one of
     the tasks referenced by those events contains a PID that is allowed
     to be traced.

  Tracing related features:

   - New bootconfig option, that is attached to the initrd file.

     If bootconfig is on the command line, then the initrd file is
     searched looking for a bootconfig appended at the end.

   - New GPU tracepoint infrastructure to help the gfx drivers to get
     off debugfs (acked by Greg Kroah-Hartman)

  And other minor updates and fixes"

* tag 'trace-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits)
  tracing: Do not allocate buffer in trace_find_next_entry() in atomic
  tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid
  selftests/ftrace: Add test to test new set_event_notrace_pid file
  selftests/ftrace: Add test to test new set_ftrace_notrace_pid file
  tracing: Create set_event_notrace_pid to not trace tasks
  ftrace: Create set_ftrace_notrace_pid to not trace tasks
  ftrace: Make function trace pid filtering a bit more exact
  ftrace/kprobe: Show the maxactive number on kprobe_events
  tracing: Have the document reflect that the trace file keeps tracing enabled
  ring-buffer/tracing: Have iterator acknowledge dropped events
  tracing: Do not disable tracing when reading the trace file
  ring-buffer: Do not disable recording when there is an iterator
  ring-buffer: Make resize disable per cpu buffer instead of total buffer
  ring-buffer: Optimize rb_iter_head_event()
  ring-buffer: Do not die if rb_iter_peek() fails more than thrice
  ring-buffer: Have rb_iter_head_event() handle concurrent writer
  ring-buffer: Add page_stamp to iterator for synchronization
  ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()
  ring-buffer: Have ring_buffer_empty() not depend on tracing stopped
  tracing: Save off entry when peeking at next entry
  ...
2020-04-05 10:36:18 -07:00
Steven Rostedt (VMware)
8e99cf91b9 tracing: Do not allocate buffer in trace_find_next_entry() in atomic
When dumping out the trace data in latency format, a check is made to peek
at the next event to compare its timestamp to the current one, and if the
delta is of a greater size, it will add a marker showing so. But to do this,
it needs to save the current event otherwise peeking at the next event will
remove the current event. To save the event, a temp buffer is used, and if
the event is bigger than the temp buffer, the temp buffer is freed and a
bigger buffer is allocated.

This allocation is a problem when called in atomic context. The only way
this gets called via atomic context is via ftrace_dump(). Thus, use a static
buffer of 128 bytes (which covers most events), and if the event is bigger
than that, simply return NULL. The callers of trace_find_next_entry() need
to handle a NULL case, as that's what would happen if the allocation failed.

Link: https://lore.kernel.org/r/20200326091256.GR11705@shao2-debian

Fixes: ff895103a8 ("tracing: Save off entry when peeking at next entry")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-04-03 11:30:50 -04:00
David S. Miller
ed52f2c608 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 19:52:37 -07:00
KP Singh
fc611f47f2 bpf: Introduce BPF_PROG_TYPE_LSM
Introduce types and configs for bpf programs that can be attached to
LSM hooks. The programs can be enabled by the config option
CONFIG_BPF_LSM.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
2020-03-30 01:34:00 +02:00
Steven Rostedt (VMware)
2768362603 tracing: Create set_event_notrace_pid to not trace tasks
There's currently a way to select a task that should only have its events
traced, but there's no way to select a task not to have itsevents traced.
Add a set_event_notrace_pid file that acts the same as set_event_pid (and is
also affected by event-fork), but the task pids in this file will not be
traced even if they are listed in the set_event_pid file. This makes it easy
for tools like trace-cmd to "hide" itself from beint traced by events when
it is recording other tasks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:02 -04:00
Steven Rostedt (VMware)
b3b1e6eded ftrace: Create set_ftrace_notrace_pid to not trace tasks
There's currently a way to select a task that should only be traced by
functions, but there's no way to select a task not to be traced by the
function tracer. Add a set_ftrace_notrace_pid file that acts the same as
set_ftrace_pid (and is also affected by function-fork), but the task pids in
this file will not be traced even if they are listed in the set_ftrace_pid
file. This makes it easy for tools like trace-cmd to "hide" itself from the
function tracer when it is recording other tasks.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:02 -04:00
Steven Rostedt (VMware)
717e3f5ebc ftrace: Make function trace pid filtering a bit more exact
The set_ftrace_pid file is used to filter function tracing to only trace
tasks that are listed in that file. Instead of testing the pids listed in
that file (it's a bitmask) at each function trace event, the logic is done
via a sched_switch hook. A flag is set when the next task to run is in the
list of pids in the set_ftrace_pid file. But the sched_switch hook is not at
the exact location of when the task switches, and the flag gets set before
the task to be traced actually runs. This leaves a residue of traced
functions that do not belong to the pid that should be filtered on.

By changing the logic slightly, where instead of having  a boolean flag to
test, record the pid that should be traced, with special values for not to
trace and always trace. Then at each function call, a check will be made to
see if the function should be ignored, or if the current pid matches the
function that should be traced, and only trace if it matches (or if it has
the special value to always trace).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:02 -04:00
Masami Hiramatsu
6a13a0d7b4 ftrace/kprobe: Show the maxactive number on kprobe_events
Show maxactive parameter on kprobe_events.
This allows user to save the current configuration and
restore it without losing maxactive parameter.

Link: http://lkml.kernel.org/r/4762764a-6df7-bc93-ed60-e336146dce1f@gmail.com
Link: http://lkml.kernel.org/r/158503528846.22706.5549974121212526020.stgit@devnote2

Cc: stable@vger.kernel.org
Fixes: 696ced4fb1 ("tracing/kprobes: expose maxactive for kretprobe in kprobe_events")
Reported-by: Taeung Song <treeze.taeung@gmail.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:02 -04:00
Steven Rostedt (VMware)
c9b7a4a72f ring-buffer/tracing: Have iterator acknowledge dropped events
Have the ring_buffer_iterator set a flag if events were dropped as it were
to go and peek at the next event. Have the trace file display this fact if
it happened with a "LOST EVENTS" message.

Link: http://lkml.kernel.org/r/20200317213417.045858900@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:01 -04:00
Steven Rostedt (VMware)
06e0a548ba tracing: Do not disable tracing when reading the trace file
When opening the "trace" file, it is no longer necessary to disable tracing.

Note, a new option is created called "pause-on-trace", when set, will cause
the trace file to emulate its original behavior.

Link: http://lkml.kernel.org/r/20200317213416.903351225@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:39:01 -04:00
Steven Rostedt (VMware)
1039221cc2 ring-buffer: Do not disable recording when there is an iterator
Now that the iterator can handle a concurrent writer, do not disable writing
to the ring buffer when there is an iterator present.

Link: http://lkml.kernel.org/r/20200317213416.759770696@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:38:58 -04:00
Steven Rostedt (VMware)
07b8b10ec9 ring-buffer: Make resize disable per cpu buffer instead of total buffer
When the ring buffer becomes writable for even when the trace file is read,
it must still not be resized. But since tracers can be activated while the
trace file is being read, the irqsoff tracer can modify the per CPU buffers,
and this can cause the reader of the trace file to update the wrong buffer's
resize disable bit, as the irqsoff tracer swaps out cpu buffers.

By making the resize disable per cpu_buffer, it makes the update follow the
per cpu_buffer even if it's swapped out with the snapshot buffer and keeps
the release of the trace file modifying the same data as the open did.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-27 16:21:22 -04:00
David S. Miller
9fb16955fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Overlapping header include additions in macsec.c

A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c

Overlapping test additions in selftests Makefile

Overlapping PCI ID table adjustments in iwlwifi driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25 18:58:11 -07:00
Marco Elver
f5d2313bd3 kcsan, trace: Make KCSAN compatible with tracing
Previously the system would lock up if ftrace was enabled together with
KCSAN. This is due to recursion on reporting if the tracer code is
instrumented with KCSAN.

To avoid this for all types of tracing, disable KCSAN instrumentation
for all of kernel/trace.

Furthermore, since KCSAN relies on udelay() to introduce delay, we have
to disable ftrace for udelay() (currently done for x86) in case KCSAN is
used together with lockdep and ftrace. The reason is that it may corrupt
lockdep IRQ flags tracing state due to a peculiar case of recursion
(details in Makefile comment).

Reported-by: Qian Cai <cai@lca.pw>
Tested-by: Qian Cai <cai@lca.pw>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-03-21 09:44:41 +01:00
Steven Rostedt (VMware)
153368ce1b ring-buffer: Optimize rb_iter_head_event()
As it is fine to perform several "peeks" of event data in the ring buffer
via the iterator before moving it forward, do not re-read the event, just
return what was read before. Otherwise, it can cause inconsistent results,
especially when testing multiple CPU buffers to interleave them.

Link: http://lkml.kernel.org/r/20200317213416.592032170@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 19:11:20 -04:00
Steven Rostedt (VMware)
ff84c50cfb ring-buffer: Do not die if rb_iter_peek() fails more than thrice
As the iterator will be reading a live buffer, and if the event being read
is on a page that a writer crosses, it will fail and try again, the
condition in rb_iter_peek() that only allows a retry to happen three times
is no longer valid. Allow rb_iter_peek() to retry more than three times
without killing the ring buffer, but only if rb_iter_head_event() had failed
at least once.

Link: http://lkml.kernel.org/r/20200317213416.452888193@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 19:11:19 -04:00
Steven Rostedt (VMware)
785888c544 ring-buffer: Have rb_iter_head_event() handle concurrent writer
Have the ring_buffer_iter structure have a place to store an event, such
that it can not be overwritten by a writer, and load it in such a way via
rb_iter_head_event() that it will return NULL and reset the iter to the
start of the current page if a writer updated the page.

Link: http://lkml.kernel.org/r/20200317213416.306959216@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 19:11:19 -04:00
Steven Rostedt (VMware)
28e3fc56a4 ring-buffer: Add page_stamp to iterator for synchronization
Have the ring_buffer_iter structure contain a page_stamp, such that it can
be used to see if the writer entered the page the iterator is on. When going
to a new page, the iterator will record the time stamp of that page. When
reading events, it can copy the event to an internal buffer on the iterator
(to be implemented later), then check the page's time stamp with its own to
see if the writer entered the page. If so, it will need to try to read the
event again.

Link: http://lkml.kernel.org/r/20200317213416.163549674@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 19:11:19 -04:00
Steven Rostedt (VMware)
bc1a72afdc ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()
When the ring buffer was first created, the iterator followed the normal
producer/consumer operations where it had both a peek() operation, that just
returned the event at the current location, and a read(), that would return
the event at the current location and also increment the iterator such that
the next peek() or read() will return the next event.

The only use of the ring_buffer_read() is currently to move the iterator to
the next location and nothing now actually reads the event it returns.
Rename this function to its actual use case to ring_buffer_iter_advance(),
which also adds the "iter" part to the name, which is more meaningful. As
the timestamp returned by ring_buffer_read() was never used, there's no
reason that this new version should bother having returning it. It will also
become a void function.

Link: http://lkml.kernel.org/r/20200317213416.018928618@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 19:11:19 -04:00
Steven Rostedt (VMware)
ead6ecfdde ring-buffer: Have ring_buffer_empty() not depend on tracing stopped
It was complained about that when the trace file is read, that the tracing
is disabled, as the iterator expects writing to the buffer it reads is not
updated. Several steps are needed to make the iterator handle a writer,
by testing if things have changed as it reads.

This step is to make ring_buffer_empty() expect the buffer to be changing.
Note if the current location of the iterator is overwritten, then it will
return false as new data is being added. Note, that this means that data
will be skipped.

Link: http://lkml.kernel.org/r/20200317213415.870741809@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 19:11:19 -04:00
Steven Rostedt (VMware)
ff895103a8 tracing: Save off entry when peeking at next entry
In order to have the iterator read the buffer even when it's still updating,
it requires that the ring buffer iterator saves each event in a separate
location outside the ring buffer such that its use is immutable.

There's one use case that saves off the event returned from the ring buffer
interator and calls it again to look at the next event, before going back to
use the first event. As the ring buffer iterator will only have a single
copy, this use case will no longer be supported.

Instead, have the one use case create its own buffer to store the first
event when looking at the next event. This way, when looking at the first
event again, it wont be corrupted by the second read.

Link: http://lkml.kernel.org/r/20200317213415.722539921@goodmis.org

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 17:48:36 -04:00
Nathan Chancellor
bf2cbe044d tracing: Use address-of operator on section symbols
Clang warns:

../kernel/trace/trace.c:9335:33: warning: array comparison always
evaluates to true [-Wtautological-compare]
        if (__stop___trace_bprintk_fmt != __start___trace_bprintk_fmt)
                                       ^
1 warning generated.

These are not true arrays, they are linker defined symbols, which are
just addresses. Using the address of operator silences the warning and
does not change the runtime result of the check (tested with some print
statements compiled in with clang + ld.lld and gcc + ld.bfd in QEMU).

Link: http://lkml.kernel.org/r/20200220051011.26113-1-natechancellor@gmail.com

Link: https://github.com/ClangBuiltLinux/linux/issues/893
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-19 16:27:41 -04:00
David S. Miller
44ef976ab3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-03-13

The following pull-request contains BPF updates for your *net-next* tree.

We've added 86 non-merge commits during the last 12 day(s) which contain
a total of 107 files changed, 5771 insertions(+), 1700 deletions(-).

The main changes are:

1) Add modify_return attach type which allows to attach to a function via
   BPF trampoline and is run after the fentry and before the fexit programs
   and can pass a return code to the original caller, from KP Singh.

2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher
   objects to be visible in /proc/kallsyms so they can be annotated in
   stack traces, from Jiri Olsa.

3) Extend BPF sockmap to allow for UDP next to existing TCP support in order
   in order to enable this for BPF based socket dispatch, from Lorenz Bauer.

4) Introduce a new bpftool 'prog profile' command which attaches to existing
   BPF programs via fentry and fexit hooks and reads out hardware counters
   during that period, from Song Liu. Example usage:

   bpftool prog profile id 337 duration 3 cycles instructions llc_misses

        4228 run_cnt
     3403698 cycles                                              (84.08%)
     3525294 instructions   #  1.04 insn per cycle               (84.05%)
          13 llc_misses     #  3.69 LLC misses per million isns  (83.50%)

5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition
   of a new bpf_link abstraction to keep in particular BPF tracing programs
   attached even when the applicaion owning them exits, from Andrii Nakryiko.

6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering
   and which returns the PID as seen by the init namespace, from Carlos Neira.

7) Refactor of RISC-V JIT code to move out common pieces and addition of a
   new RV32G BPF JIT compiler, from Luke Nelson.

8) Add gso_size context member to __sk_buff in order to be able to know whether
   a given skb is GSO or not, from Willem de Bruijn.

9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output
   implementation but can be called from tracepoint programs, from Eelco Chaudron.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 20:52:03 -07:00
David S. Miller
242a6df688 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-03-12

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 8 day(s) which contain
a total of 12 files changed, 161 insertions(+), 15 deletions(-).

The main changes are:

1) Andrii fixed two bugs in cgroup-bpf.

2) John fixed sockmap.

3) Luke fixed x32 jit.

4) Martin fixed two issues in struct_ops.

5) Yonghong fixed bpf_send_signal.

6) Yoshiki fixed BTF enum.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-13 11:13:45 -07:00
David S. Miller
1d34357931 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor overlapping changes, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 22:34:48 -07:00
Eelco Chaudron
d831ee84bf bpf: Add bpf_xdp_output() helper
Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
2020-03-12 17:47:38 -07:00
Carlos Neira
b4490c5c4e bpf: Added new helper bpf_get_ns_current_pid_tgid
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
2020-03-12 17:33:11 -07:00
Linus Torvalds
36feb99630 Have ftrace lookup_rec() return a consistent record otherwise it
can break live patching.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXmj5ExQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6quQGAQDO35RBAQDGmpxnSCQPNwrzqokM8p8d
 1e1xshwOVnwqgAEA7csC4u1n5Z8ncIl5Pd8ygt4nXeqw4AenHLeZIdhfegc=
 =+AeW
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Have ftrace lookup_rec() return a consistent record otherwise it can
  break live patching"

* tag 'trace-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Return the first found result in lookup_rec()
2020-03-11 09:54:59 -07:00
Artem Savkov
d9815bff6b ftrace: Return the first found result in lookup_rec()
It appears that ip ranges can overlap so. In that case lookup_rec()
returns whatever results it got last even if it found nothing in last
searched page.

This breaks an obscure livepatch late module patching usecase:
  - load livepatch
  - load the patched module
  - unload livepatch
  - try to load livepatch again

To fix this return from lookup_rec() as soon as it found the record
containing searched-for ip. This used to be this way prior lookup_rec()
introduction.

Link: http://lkml.kernel.org/r/20200306174317.21699-1-asavkov@redhat.com

Cc: stable@vger.kernel.org
Fixes: 7e16f581a8 ("ftrace: Separate out functionality from ftrace_location_range()")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-11 10:37:12 -04:00
Linus Torvalds
5dfcc13902 block-5.6-2020-03-07
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl5j8hwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnjID/4/XVrqtVNUzVoVOtkOyxyesBrJVMHEQEpJ
 PZssv835IStw0ENhxQJfGjPaIFc9Ff6PMkeN5KRAlMoEc+NkrJShF3owGf+6Bps7
 rxpblPxaw+CJFa31YBDZVjMCvbVkDm40G5SsJh+xzdIjlWz7MppkkMPdrErPwY8V
 0vnrIc+mKBKfBMZTwVkycYtp17LVgfXguledoWzxM1y47IW5UasKh8jdzhbu8Hvt
 zztdQrigUdb+9XnLGCZIY0JQOyrhJ5zQpZ40FzbvxdYrQZXOoYT8L7iFu/z0Wi7K
 p3a+G+B4WowtLYW78me4Uut5RrHq2XOehSypfujanQlpgXPGjS3TdHT3an2T8XPQ
 NyGsZsn/eLm3btNbhGUd8vqpQy5EmWhqmwvYk9tFAoSFLiLcvCC624b/TCYPL+gk
 3ZiI7mXBMjHnUZ0J/RF6kZWTAZDvr/tE7UZt1f8r1eEr8VDzCNp5Pst+HCVIguYD
 g9eWF8oH6wYoj39UKf1k+vW2GjXGFsnfivObaxhyz03sAPXK2wQlzAe/4jZ24XNr
 TRtOXh97c3CbLAwdUHehlzzdR3U7h0n2KsmrTC5AGmLABmR79s7BJ0+pexuZituO
 LwU8+gpf7AugHTrLg1eNXAmBHW44I1ticXYiWcT4iSPn99kNIhlW+Jb1iTGoiu7n
 nXyS3b5SCw==
 =xwKl
 -----END PGP SIGNATURE-----

Merge tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Here are a few fixes that should go into this release. This contains:

   - Revert of a bad bcache patch from this merge window

   - Removed unused function (Daniel)

   - Fixup for the blktrace fix from Jan from this release (Cengiz)

   - Fix of deeper level bfqq overwrite in BFQ (Carlo)"

* tag 'block-5.6-2020-03-07' of git://git.kernel.dk/linux-block:
  block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
  blktrace: fix dereference after null check
  Revert "bcache: ignore pending signals when creating gc and allocator thread"
  block: Remove used kblockd_schedule_work_on()
2020-03-07 14:14:38 -06:00
KP Singh
3e7c67d90e bpf: Fix bpf_prog_test_run_tracing for !CONFIG_NET
test_run.o is not built when CONFIG_NET is not set and
bpf_prog_test_run_tracing being referenced in bpf_trace.o causes the
linker error:

ld: kernel/trace/bpf_trace.o:(.rodata+0x38): undefined reference to
 `bpf_prog_test_run_tracing'

Add a __weak function in bpf_trace.c to handle this.

Fixes: da00d2f117 ("bpf: Add test ops for BPF_PROG_TYPE_TRACING")
Signed-off-by: KP Singh <kpsingh@google.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200305220127.29109-1-kpsingh@chromium.org
2020-03-05 15:14:58 -08:00
Yonghong Song
1bc7896e9e bpf: Fix deadlock with rq_lock in bpf_send_signal()
When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153f2b ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153f2b, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <pthread.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd < 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc < 2)
                return 0;

        filename = argv[1];

        for (i = 0; i < THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i < THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
     --- a/tools/trace.py
     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -> #1 (&p->pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -> #0 (&(&sighand->siglock)->rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &(&sighand->siglock)->rlock --> &p->pi_lock --> &rq->lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&rq->lock);
  [   32.856671]                                lock(&p->pi_lock);
  [   32.857332]                                lock(&rq->lock);
  [   32.857999]   lock(&(&sighand->siglock)->rlock);

  Deadlock happens on CPU0 when it tries to acquire &sighand->siglock
  but it has been held by CPU1 and CPU1 tries to grab &rq->lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
2020-03-05 14:02:22 -08:00
Cengiz Can
153031a301 blktrace: fix dereference after null check
There was a recent change in blktrace.c that added a RCU protection to
`q->blk_trace` in order to fix a use-after-free issue during access.

However the change missed an edge case that can lead to dereferencing of
`bt` pointer even when it's NULL:

Coverity static analyzer marked this as a FORWARD_NULL issue with CID
1460458.

```
/kernel/trace/blktrace.c: 1904 in sysfs_blk_trace_attr_store()
1898            ret = 0;
1899            if (bt == NULL)
1900                    ret = blk_trace_setup_queue(q, bdev);
1901
1902            if (ret == 0) {
1903                    if (attr == &dev_attr_act_mask)
>>>     CID 1460458:  Null pointer dereferences  (FORWARD_NULL)
>>>     Dereferencing null pointer "bt".
1904                            bt->act_mask = value;
1905                    else if (attr == &dev_attr_pid)
1906                            bt->pid = value;
1907                    else if (attr == &dev_attr_start_lba)
1908                            bt->start_lba = value;
1909                    else if (attr == &dev_attr_end_lba)
```

Added a reassignment with RCU annotation to fix the issue.

Fixes: c780e86dd4 ("blktrace: Protect q->blk_trace with RCU")
Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Cengiz Can <cengiz@kernel.wtf>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-05 13:42:40 -07:00
KP Singh
da00d2f117 bpf: Add test ops for BPF_PROG_TYPE_TRACING
The current fexit and fentry tests rely on a different program to
exercise the functions they attach to. Instead of doing this, implement
the test operations for tracing which will also be used for
BPF_MODIFY_RETURN in a subsequent patch.

Also, clean up the fexit test to use the generated skeleton.

Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200304191853.1529-7-kpsingh@chromium.org
2020-03-04 13:41:06 -08:00
Steven Rostedt (VMware)
5412e0b763 tracing: Remove unused TRACE_BUFFER bits
Commit 567cd4da54 ("ring-buffer: User context bit recursion checking")
added the TRACE_BUFFER bits to be used in the current task's trace_recursion
field. But the final submission of the logic removed the use of those bits,
but never removed the bits themselves (they were never used in upstream
Linux). These can be safely removed.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-03 17:34:02 -05:00
Steven Rostedt (VMware)
b396bfdebf tracing: Have hwlat ts be first instance and record count of instances
The hwlat tracer runs a loop of width time during a given window. It then
reports the max latency over a given threshold and records a timestamp. But
this timestamp is the time after the width has finished, and not the time it
actually triggered.

Record the actual time when the latency was greater than the threshold as
well as the number of times it was greater in a given width per window.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-03 17:33:43 -05:00
David S. Miller
9f0ca0c1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-02-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 41 non-merge commits during the last 7 day(s) which contain
a total of 49 files changed, 1383 insertions(+), 499 deletions(-).

The main changes are:

1) BPF and Real-Time nicely co-exist.

2) bpftool feature improvements.

3) retrieve bpf_sk_storage via INET_DIAG.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-29 15:53:35 -08:00
Linus Torvalds
2edc78b9a4 block-5.6-2020-02-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl5ZXl0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmltEACSA4yxdvWsVMYRCijjm/FzBEq7C8PSsWNK
 H8KPmjQiNpbSiZSi1uMVsHMlhBmBM8ZQ6Zc+gbZSs6xMqa4yP/iRtmzxnGonC7TB
 f5Ne2QuC0+TKMFJJTG8cCTzrgEOrWYkFKkmabzDml7HtloJtuzgArrmPzRj2sUfY
 J+d0osdp1b4U4sqhhAnxSm/zYJkGrQb+9UgNdVjhZCUzaX6oCcuK8xUwu2reLGlM
 qPkSKOywnl3WHCSCJXsCrNLKX0QWtIfMzlWDr40GYgHauPBbWfa8+1yHR1/lWP4R
 zyxGk63I9f6/+iQSUC72wP77bAVWKW674c53jgd7r1pNL9TiuK+a3E4lgf7eU+rl
 ymA/rM6Iy3SjTgiLT57PPOecsILJns3cwZ6mhvSRs0+zpao7LOQZXWdu9V0+Fyqo
 jur+7Ll/Qfdv/CLlM94DeBJtwhaTWiHTfDoaDHlG9p1/vvcWWXTUTIVPwAD+YGbj
 geio/bIWECnQxDtZL5Jikf5zsC76aQ46vvxK4F6RJlXj6jaugIbN3mWLsg17sUVf
 Y4h+IEVtQr0zA0LkPrfVdAS9IqVlTrMRDCkrrlhsDt7FI0orCOag7JOcmN2/nPn/
 2H22nl6i02b0gdGrScU5pyBswSPaImddH5tqE9uL2rK4hrFe6oKxL5EicTFDZmTh
 tHnukoc+Yg==
 =1bzv
 -----END PGP SIGNATURE-----

Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Passthrough insertion fix (Ming)

 - Kill off some unused arguments (John)

 - blktrace RCU fix (Jan)

 - Dead fields removal for null_blk (Dongli)

 - NVMe polled IO fix (Bijan)

* tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
  nvme-pci: Hold cq_poll_lock while completing CQEs
  blk-mq: Remove some unused function arguments
  null_blk: remove unused fields in 'nullb_cmd'
  blktrace: Protect q->blk_trace with RCU
  blk-mq: insert passthrough request into hctx->dispatch directly
2020-02-28 11:43:30 -08:00
David S. Miller
9f6e055907 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The mptcp conflict was overlapping additions.

The SMC conflict was an additional and removal happening at the same
time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-27 18:31:39 -08:00
Linus Torvalds
91ad64a84e Tracing updates:
Change in API of bootconfig (before it comes live in a release)
   - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig exists
   - Set CONFIG_BOOT_CONFIG to 'n' by default
   - Show error if "bootconfig" on cmdline but not compiled in
   - Prevent redefining the same value
   - Have a way to append values
   - Added a SELECT BLK_DEV_INITRD to fix a build failure
 
  Synthetic event fixes:
   - Switch to raw_smp_processor_id() for recording CPU value in preempt
     section. (No care for what the value actually is)
   - Fix samples always recording u64 values
   - Fix endianess
   - Check number of values matches number of fields
   - Fix a printing bug
 
  Fix of trace_printk() breaking postponed start up tests
 
  Make a function static that is only used in a single file.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXlW4vxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtioAP0WLEm3dWO0z3321h/a0DSshC+Bslu3
 HDPTsGVGrXmvggEA/lr1ikRHd8PsO7zW8BfaZMxoXaTqXiuSrzEWxnMlFw0=
 =O8PM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing and bootconfig updates:
 "Fixes and changes to bootconfig before it goes live in a release.

  Change in API of bootconfig (before it comes live in a release):
  - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
    exists
  - Set CONFIG_BOOT_CONFIG to 'n' by default
  - Show error if "bootconfig" on cmdline but not compiled in
  - Prevent redefining the same value
  - Have a way to append values
  - Added a SELECT BLK_DEV_INITRD to fix a build failure

  Synthetic event fixes:
  - Switch to raw_smp_processor_id() for recording CPU value in preempt
    section. (No care for what the value actually is)
  - Fix samples always recording u64 values
  - Fix endianess
  - Check number of values matches number of fields
  - Fix a printing bug

  Fix of trace_printk() breaking postponed start up tests

  Make a function static that is only used in a single file"

* tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
  bootconfig: Add append value operator support
  bootconfig: Prohibit re-defining value on same key
  bootconfig: Print array as multiple commands for legacy command line
  bootconfig: Reject subkey and value on same parent key
  tools/bootconfig: Remove unneeded error message silencer
  bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
  bootconfig: Set CONFIG_BOOT_CONFIG=n by default
  tracing: Clear trace_state when starting trace
  bootconfig: Mark boot_config_checksum() static
  tracing: Disable trace_printk() on post poned tests
  tracing: Have synthetic event test use raw_smp_processor_id()
  tracing: Fix number printing bug in print_synth_event()
  tracing: Check that number of vals matches number of synth event fields
  tracing: Make synth_event trace functions endian-correct
  tracing: Make sure synth_event_trace() example always uses u64
2020-02-26 10:34:42 -08:00
Masami Hiramatsu
2910b5aa6f bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
Since commit d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
default") also changed the CONFIG_BOOTTIME_TRACING to select
CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu,
it introduced wrong dependencies with BLK_DEV_INITRD as below.

WARNING: unmet direct dependencies detected for BOOT_CONFIG
  Depends on [n]: BLK_DEV_INITRD [=n]
  Selected by [y]:
  - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]

This makes the CONFIG_BOOT_CONFIG selects CONFIG_BLK_DEV_INITRD to
fix this error and make CONFIG_BOOTTIME_TRACING=n by default, so
that both boot-time tracing and boot configuration off but those
appear on the menu list.

Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2

Fixes: d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-25 19:07:58 -05:00
Jan Kara
c780e86dd4 blktrace: Protect q->blk_trace with RCU
KASAN is reporting that __blk_add_trace() has a use-after-free issue
when accessing q->blk_trace. Indeed the switching of block tracing (and
thus eventual freeing of q->blk_trace) is completely unsynchronized with
the currently running tracing and thus it can happen that the blk_trace
structure is being freed just while __blk_add_trace() works on it.
Protect accesses to q->blk_trace by RCU during tracing and make sure we
wait for the end of RCU grace period when shutting down tracing. Luckily
that is rare enough event that we can afford that. Note that postponing
the freeing of blk_trace to an RCU callback should better be avoided as
it could have unexpected user visible side-effects as debugfs files
would be still existing for a short while block tracing has been shut
down.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=205711
CC: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: Tristan Madani <tristmd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-25 08:40:07 -07:00
Thomas Gleixner
b0a81b94cc bpf/trace: Remove redundant preempt_disable from trace_call_bpf()
Similar to __bpf_trace_run this is redundant because __bpf_trace_run() is
invoked from a trace point via __DO_TRACE() which already disables
preemption _before_ invoking any of the functions which are attached to a
trace point.

Remove it and add a cant_sleep() check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.059995527@linutronix.de
2020-02-24 16:18:20 -08:00
Alexei Starovoitov
70ed0706a4 bpf: disable preemption for bpf progs attached to uprobe
trace_call_bpf() no longer disables preemption on its own.
All callers of this function has to do it explicitly.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2020-02-24 16:17:14 -08:00
Thomas Gleixner
1b7a51a63b bpf/trace: Remove EXPORT from trace_call_bpf()
All callers are built in. No point to export this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-24 16:16:38 -08:00
Thomas Gleixner
f03efe49bd bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run()
__bpf_trace_run() disables preemption around the BPF_PROG_RUN() invocation.

This is redundant because __bpf_trace_run() is invoked from a trace point
via __DO_TRACE() which already disables preemption _before_ invoking any of
the functions which are attached to a trace point.

Remove it and add a cant_sleep() check.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145642.847220186@linutronix.de
2020-02-24 16:12:20 -08:00
Masami Hiramatsu
d8a953ddde bootconfig: Set CONFIG_BOOT_CONFIG=n by default
Set CONFIG_BOOT_CONFIG=n by default. This also warns
user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
in the kernel command line.

Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 17:52:12 -05:00
Masami Hiramatsu
7ab215f22d tracing: Clear trace_state when starting trace
Clear trace_state data structure when starting trace
in __synth_event_trace_start() internal function.

Currently trace_state is initialized only in the
synth_event_trace_start() API, but the trace_state
in synth_event_trace() and synth_event_trace_array()
are on the stack without initialization.
This means those APIs will see wrong parameters and
wil skip closing process in __synth_event_trace_end()
because trace_state->disabled may be !0.

Link: http://lkml.kernel.org/r/158193315899.8868.1781259176894639952.stgit@devnote2

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 17:48:59 -05:00
Steven Rostedt (VMware)
78041c0c9e tracing: Disable trace_printk() on post poned tests
The tracing seftests checks various aspects of the tracing infrastructure,
and one is filtering. If trace_printk() is active during a self test, it can
cause the filtering to fail, which will disable that part of the trace.

To keep the selftests from failing because of trace_printk() calls,
trace_printk() checks the variable tracing_selftest_running, and if set, it
does not write to the tracing buffer.

As some tracers were registered earlier in boot, the selftest they triggered
would fail because not all the infrastructure was set up for the full
selftest. Thus, some of the tests were post poned to when their
infrastructure was ready (namely file system code). The postpone code did
not set the tracing_seftest_running variable, and could fail if a
trace_printk() was added and executed during their run.

Cc: stable@vger.kernel.org
Fixes: 9afecfbb95 ("tracing: Postpone tracer start-up tests till the system is more robust")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 17:43:58 -05:00
Steven Rostedt (VMware)
3c18a9be7c tracing: Have synthetic event test use raw_smp_processor_id()
The test code that tests synthetic event creation pushes in as one of its
test fields the current CPU using "smp_processor_id()". As this is just
something to see if the value is correctly passed in, and the actual CPU
used does not matter, use raw_smp_processor_id(), otherwise with debug
preemption enabled, a warning happens as the smp_processor_id() is called
without preemption enabled.

Link: http://lkml.kernel.org/r/20200220162950.35162579@gandalf.local.home

Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 17:43:41 -05:00
Tom Zanussi
784bd0847e tracing: Fix number printing bug in print_synth_event()
Fix a varargs-related bug in print_synth_event() which resulted in
strange output and oopses on 32-bit x86 systems. The problem is that
trace_seq_printf() expects the varargs to match the format string, but
print_synth_event() was always passing u64 values regardless.  This
results in unspecified behavior when unpacking with va_arg() in
trace_seq_printf().

Add a function that takes the size into account when calling
trace_seq_printf().

Before:

  modprobe-1731  [003] ....   919.039758: gen_synth_test: next_pid_field=777(null)next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=3(null)my_string_field=thneed my_int_field=598(null)

After:

 insmod-1136  [001] ....    36.634590: gen_synth_test: next_pid_field=777 next_comm_field=hula hoops ts_ns=1000000 ts_ms=1000 cpu=1 my_string_field=thneed my_int_field=598

Link: http://lkml.kernel.org/r/a9b59eb515dbbd7d4abe53b347dccf7a8e285657.1581720155.git.zanussi@kernel.org

Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 16:24:17 -05:00
Tom Zanussi
3843083772 tracing: Check that number of vals matches number of synth event fields
Commit 7276531d4036('tracing: Consolidate trace() functions')
inadvertently dropped the synth_event_trace() and
synth_event_trace_array() checks that verify the number of values
passed in matches the number of fields in the synthetic event being
traced, so add them back.

Link: http://lkml.kernel.org/r/32819cac708714693669e0dfe10fe9d935e94a16.1581720155.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 16:24:17 -05:00
Tom Zanussi
1d9d4c9019 tracing: Make synth_event trace functions endian-correct
synth_event_trace(), synth_event_trace_array() and
__synth_event_add_val() write directly into the trace buffer and need
to take endianness into account, like trace_event_raw_event_synth()
does.

Link: http://lkml.kernel.org/r/2011354355e405af9c9d28abba430d1f5ff7771a.1581720155.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 16:24:17 -05:00
Tom Zanussi
279eef0531 tracing: Make sure synth_event_trace() example always uses u64
synth_event_trace() is the varargs version of synth_event_trace_array(),
which takes an array of u64, as do synth_event_add_val() et al.

To not only be consistent with those, but also to address the fact
that synth_event_trace() expects every arg to be of the same type
since it doesn't also pass in e.g. a format string, the caller needs
to make sure all args are of the same type, u64.  u64 is used because
it needs to accomodate the largest type available in synthetic events,
which is u64.

This fixes the bug reported by the kernel test robot/Rong Chen.

Link: https://lore.kernel.org/lkml/20200212113444.GS12867@shao2-debian/
Link: http://lkml.kernel.org/r/894c4e955558b521210ee0642ba194a9e603354c.1581720155.git.zanussi@kernel.org

Fixes: 9fe41efaca ("tracing: Add synth event generation test module")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-20 16:24:17 -05:00
Daniel Xu
fff7b64355 bpf: Add bpf_read_branch_records() helper
Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200218030432.4600-2-dxu@dxuuu.xyz
2020-02-19 14:37:36 -08:00
Song Liu
b80b033bed bpf: Allow bpf_perf_event_read_value in all BPF programs
bpf_perf_event_read_value() is NMI safe. Enable it for all BPF programs.
This can be used in fentry/fexit to profile BPF program and individual
kernel function with hardware counters.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200214234146.2910011-1-songliubraving@fb.com
2020-02-18 16:08:27 +01:00
Linus Torvalds
61a7595403 Various fixes:
- Fix an uninitialized variable
 
  - Fix compile bug to bootconfig userspace tool (in tools directory)
 
  - Suppress some error messages of bootconfig userspace tool
 
  - Remove unneded CONFIG_LIBXBC from bootconfig
 
  - Allocate bootconfig xbc_nodes dynamically.
    To ease complaints about taking up static memory at boot up
 
  - Use of parse_args() to parse bootconfig instead of strstr() usage
    Prevents issues of double quotes containing the interested string
 
  - Fix missing ring_buffer_nest_end() on synthetic event error path
 
  - Return zero not -EINVAL on soft disabled synthetic event
    (soft disabling must be the same as hard disabling, which returns zero)
 
  - Consolidate synthetic event code (remove duplicate code)
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXkMTwRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnJQAQD5aKiD6jx+zGtLsFTuZMcEGvhhUuJ6
 oUaSvhKO3UqezwD/V5avKuuC0wWt//gOWDY+0+4QNjmvn1GLnaNYohs6fg0=
 =NLT9
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Various fixes:

   - Fix an uninitialized variable

   - Fix compile bug to bootconfig userspace tool (in tools directory)

   - Suppress some error messages of bootconfig userspace tool

   - Remove unneded CONFIG_LIBXBC from bootconfig

   - Allocate bootconfig xbc_nodes dynamically. To ease complaints about
     taking up static memory at boot up

   - Use of parse_args() to parse bootconfig instead of strstr() usage
     Prevents issues of double quotes containing the interested string

   - Fix missing ring_buffer_nest_end() on synthetic event error path

   - Return zero not -EINVAL on soft disabled synthetic event (soft
     disabling must be the same as hard disabling, which returns zero)

   - Consolidate synthetic event code (remove duplicate code)"

* tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Consolidate trace() functions
  tracing: Don't return -EINVAL when tracing soft disabled synth events
  tracing: Add missing nest end to synth_event_trace_start() error case
  tools/bootconfig: Suppress non-error messages
  bootconfig: Allocate xbc_nodes array dynamically
  bootconfig: Use parse_args() to find bootconfig and '--'
  tracing/kprobe: Fix uninitialized variable bug
  bootconfig: Remove unneeded CONFIG_LIBXBC
  tools/bootconfig: Fix wrong __VA_ARGS__ usage
2020-02-11 16:39:18 -08:00
Tom Zanussi
7276531d40 tracing: Consolidate trace() functions
Move the checking, buffer reserve and buffer commit code in
synth_event_trace_start/end() into inline functions
__synth_event_trace_start/end() so they can also be used by
synth_event_trace() and synth_event_trace_array(), and then have all
those functions use them.

Also, change synth_event_trace_state.enabled to disabled so it only
needs to be set if the event is disabled, which is not normally the
case.

Link: http://lkml.kernel.org/r/b1f3108d0f450e58192955a300e31d0405ab4149.1581374549.git.zanussi@kernel.org

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 22:00:21 -05:00
Tom Zanussi
0c62f6cd9e tracing: Don't return -EINVAL when tracing soft disabled synth events
There's no reason to return -EINVAL when tracing a synthetic event if
it's soft disabled - treat it the same as if it were hard disabled and
return normally.

Have synth_event_trace() and synth_event_trace_array() just return
normally, and have synth_event_trace_start set the trace state to
disabled and return.

Link: http://lkml.kernel.org/r/df5d02a1625aff97c9866506c5bada6a069982ba.1581374549.git.zanussi@kernel.org

Fixes: 8dcc53ad95 ("tracing: Add synth_event_trace() and related functions")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 22:00:13 -05:00
Tom Zanussi
d090409abb tracing: Add missing nest end to synth_event_trace_start() error case
If the ring_buffer reserve in synth_event_trace_start() fails, the
matching ring_buffer_nest_end() should be called in the error code,
since nothing else will ever call it in this case.

Link: http://lkml.kernel.org/r/20abc444b3eeff76425f895815380abe7aa53ff8.1581374549.git.zanussi@kernel.org

Fixes: 8dcc53ad95 ("tracing: Add synth_event_trace() and related functions")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 21:58:19 -05:00
Gustavo A. R. Silva
10f129cb59 tracing/kprobe: Fix uninitialized variable bug
There is a potential execution path in which variable *ret* is returned
without being properly initialized, previously.

Fix this by initializing variable *ret* to 0.

Link: http://lkml.kernel.org/r/20200205223404.GA3379@embeddedor

Addresses-Coverity-ID: 1491142 ("Uninitialized scalar variable")
Fixes: 2a588dd1d5 ("tracing: Add kprobe event command generation functions")
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-10 12:07:42 -05:00
Linus Torvalds
e310396bb8 Tracing updates:
- Added new "bootconfig".
    Looks for a file appended to initrd to add boot config options.
    This has been discussed thoroughly at Linux Plumbers.
    Very useful for adding kprobes at bootup.
    Only enabled if "bootconfig" is on the real kernel command line.
 
  - Created dynamic event creation.
    Merges common code between creating synthetic events and
      kprobe events.
 
  - Rename perf "ring_buffer" structure to "perf_buffer"
 
  - Rename ftrace "ring_buffer" structure to "trace_buffer"
    Had to rename existing "trace_buffer" to "array_buffer"
 
  - Allow trace_printk() to work withing (some) tracing code.
 
  - Sort of tracing configs to be a little better organized
 
  - Fixed bug where ftrace_graph hash was not being protected properly
 
  - Various other small fixes and clean ups
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXjtAURQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qshOAQDzopQmvAVrrI6oogghr8JQA30Z2yqT
 i+Ld7vPWL2MV9wEA1S+zLGDSYrj8f/vsCq6BxRYT1ApO+YtmY6LTXiUejwg=
 =WNds
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Added new "bootconfig".

   This looks for a file appended to initrd to add boot config options,
   and has been discussed thoroughly at Linux Plumbers.

   Very useful for adding kprobes at bootup.

   Only enabled if "bootconfig" is on the real kernel command line.

 - Created dynamic event creation.

   Merges common code between creating synthetic events and kprobe
   events.

 - Rename perf "ring_buffer" structure to "perf_buffer"

 - Rename ftrace "ring_buffer" structure to "trace_buffer"

   Had to rename existing "trace_buffer" to "array_buffer"

 - Allow trace_printk() to work withing (some) tracing code.

 - Sort of tracing configs to be a little better organized

 - Fixed bug where ftrace_graph hash was not being protected properly

 - Various other small fixes and clean ups

* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
  bootconfig: Show the number of nodes on boot message
  tools/bootconfig: Show the number of bootconfig nodes
  bootconfig: Add more parse error messages
  bootconfig: Use bootconfig instead of boot config
  ftrace: Protect ftrace_graph_hash with ftrace_sync
  ftrace: Add comment to why rcu_dereference_sched() is open coded
  tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
  tracing: Annotate ftrace_graph_hash pointer with __rcu
  bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
  tracing: Use seq_buf for building dynevent_cmd string
  tracing: Remove useless code in dynevent_arg_pair_add()
  tracing: Remove check_arg() callbacks from dynevent args
  tracing: Consolidate some synth_event_trace code
  tracing: Fix now invalid var_ref_vals assumption in trace action
  tracing: Change trace_boot to use synth_event interface
  tracing: Move tracing selftests to bottom of menu
  tracing: Move mmio tracer config up with the other tracers
  tracing: Move tracing test module configs together
  tracing: Move all function tracing configs together
  tracing: Documentation for in-kernel synthetic event API
  ...
2020-02-06 07:12:11 +00:00
Steven Rostedt (VMware)
54a16ff6f2 ftrace: Protect ftrace_graph_hash with ftrace_sync
As function_graph tracer can run when RCU is not "watching", it can not be
protected by synchronize_rcu() it requires running a task on each CPU before
it can be freed. Calling schedule_on_each_cpu(ftrace_sync) needs to be used.

Link: https://lore.kernel.org/r/20200205131110.GT2935@paulmck-ThinkPad-P72

Cc: stable@vger.kernel.org
Fixes: b9b0c831be ("ftrace: Convert graph filter to use hash tables")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:16:42 -05:00
Steven Rostedt (VMware)
16052dd5bd ftrace: Add comment to why rcu_dereference_sched() is open coded
Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).

Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:15:57 -05:00
Amol Grover
fd0e6852c4 tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
Fix following instances of sparse error
kernel/trace/ftrace.c:5667:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5813:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5868:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5870:25: error: incompatible types in comparison

Use rcu_dereference_protected to dereference the newly annotated pointer.

Link: http://lkml.kernel.org/r/20200205055701.30195-1-frextrite@gmail.com

Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:14:37 -05:00
Amol Grover
24a9729f83 tracing: Annotate ftrace_graph_hash pointer with __rcu
Fix following instances of sparse error
kernel/trace/ftrace.c:5664:29: error: incompatible types in comparison
kernel/trace/ftrace.c:5785:21: error: incompatible types in comparison
kernel/trace/ftrace.c:5864:36: error: incompatible types in comparison
kernel/trace/ftrace.c:5866:25: error: incompatible types in comparison

Use rcu_dereference_protected to access the __rcu annotated pointer.

Link: http://lkml.kernel.org/r/20200201072703.17330-1-frextrite@gmail.com

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Amol Grover <frextrite@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-05 17:14:26 -05:00
Linus Torvalds
72f582ff85 Merge branch 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs recursive removal updates from Al Viro:
 "We have quite a few places where synthetic filesystems do an
  equivalent of 'rm -rf', with varying amounts of code duplication,
  wrong locking, etc. That really ought to be a library helper.

  Only debugfs (and very similar tracefs) are converted here - I have
  more conversions, but they'd never been in -next, so they'll have to
  wait"

* 'work.recursive_removal' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems
2020-02-05 05:09:46 +00:00