Commit Graph

34208 Commits

Author SHA1 Message Date
Alan Maguire
c4d0bfb450 bpf: Add bpf_snprintf_btf helper
A helper is added to support tracing kernel type information in BPF
using the BPF Type Format (BTF).  Its signature is

long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
		      u32 btf_ptr_size, u64 flags);

struct btf_ptr * specifies

- a pointer to the data to be traced
- the BTF id of the type of data pointed to
- a flags field is provided for future use; these flags
  are not to be confused with the BTF_F_* flags
  below that control how the btf_ptr is displayed; the
  flags member of the struct btf_ptr may be used to
  disambiguate types in kernel versus module BTF, etc;
  the main distinction is the flags relate to the type
  and information needed in identifying it; not how it
  is displayed.

For example a BPF program with a struct sk_buff *skb
could do the following:

	static struct btf_ptr b = { };

	b.ptr = skb;
	b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
	bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0);

Default output looks like this:

(struct sk_buff){
 .transport_header = (__u16)65535,
 .mac_header = (__u16)65535,
 .end = (sk_buff_data_t)192,
 .head = (unsigned char *)0x000000007524fd8b,
 .data = (unsigned char *)0x000000007524fd8b,
 .truesize = (unsigned int)768,
 .users = (refcount_t){
  .refs = (atomic_t){
   .counter = (int)1,
  },
 },
}

Flags modifying display are as follows:

- BTF_F_COMPACT:	no formatting around type information
- BTF_F_NONAME:		no struct/union member names/types
- BTF_F_PTR_RAW:	show raw (unobfuscated) pointer values;
			equivalent to %px.
- BTF_F_ZERO:		show zero-valued struct/union members;
			they are not displayed by default

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Alan Maguire
31d0bc8163 bpf: Move to generic BTF show support, apply it to seq files/strings
generalize the "seq_show" seq file support in btf.c to support
a generic show callback of which we support two instances; the
current seq file show, and a show with snprintf() behaviour which
instead writes the type data to a supplied string.

Both classes of show function call btf_type_show() with different
targets; the seq file or the string to be written.  In the string
case we need to track additional data - length left in string to write
and length to return that we would have written (a la snprintf).

By default show will display type information, field members and
their types and values etc, and the information is indented
based upon structure depth. Zeroed fields are omitted.

Show however supports flags which modify its behaviour:

BTF_SHOW_COMPACT - suppress newline/indent.
BTF_SHOW_NONAME - suppress show of type and member names.
BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
BTF_SHOW_ZERO - show zeroed values (by default they are not shown).

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Alan Maguire
76654e67f3 bpf: Provide function to get vmlinux BTF information
It will be used later for BPF structure display support

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1601292670-1616-2-git-send-email-alan.maguire@oracle.com
2020-09-28 18:26:58 -07:00
Toke Høiland-Jørgensen
f7b12b6fea bpf: verifier: refactor check_attach_btf_id()
The check_attach_btf_id() function really does three things:

1. It performs a bunch of checks on the program to ensure that the
   attachment is valid.

2. It stores a bunch of state about the attachment being requested in
   the verifier environment and struct bpf_prog objects.

3. It allocates a trampoline for the attachment.

This patch splits out (1.) and (3.) into separate functions which will
perform the checks, but return the computed values instead of directly
modifying the environment. This is done in preparation for reusing the
checks when the actual attachment is happening, which will allow tracing
programs to have multiple (compatible) attachments.

This also fixes a bug where a bunch of checks were skipped if a trampoline
already existed for the tracing target.

Fixes: 6ba43b761c ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Fixes: 1e6c62a882 ("bpf: Introduce sleepable BPF programs")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-28 17:10:34 -07:00
Toke Høiland-Jørgensen
efc68158c4 bpf: change logging calls from verbose() to bpf_log() and use log pointer
In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.

Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-28 17:09:59 -07:00
Toke Høiland-Jørgensen
1af9270e90 bpf: disallow attaching modify_return tracing functions to other BPF programs
From the checks and commit messages for modify_return, it seems it was
never the intention that it should be possible to attach a tracing program
with expected_attach_type == BPF_MODIFY_RETURN to another BPF program.
However, check_attach_modify_return() will only look at the function name,
so if the target function starts with "security_", the attach will be
allowed even for bpf2bpf attachment.

Fix this oversight by also blocking the modification if a target program is
supplied.

Fixes: 18644cec71 ("bpf: Fix use-after-free in fmod_ret check")
Fixes: 6ba43b761c ("bpf: Attachment verification for BPF_MODIFY_RETURN")
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-28 17:08:07 -07:00
Lorenz Bauer
6550f2dddf bpf: sockmap: Enable map_update_elem from bpf_iter
Allow passing a pointer to a BTF struct sock_common* when updating
a sockmap or sockhash. Since BTF pointers can fault and therefore be
NULL at runtime we need to add an additional !sk check to
sock_map_update_elem. Since we may be passed a request or timewait
socket we also need to check sk_fullsock. Doing this allows calling
map_update_elem on sockmap from bpf_iter context, which uses
BTF pointers.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com
2020-09-28 16:40:46 -07:00
Lorenzo Bianconi
efa90b5093 bpf, cpumap: Remove rcpu pointer from cpu_map_build_skb signature
Get rid of bpf_cpu_map_entry pointer in cpu_map_build_skb routine
signature since it is no longer needed.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/33cb9b7dc447de3ea6fd6ce713ac41bca8794423.1601292015.git.lorenzo@kernel.org
2020-09-28 23:30:42 +02:00
Song Liu
1b4d60ec16 bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
the target program on a specific CPU. This is achieved by a new flag in
bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
BPF programs that handle perf_event and other percpu resources, as the
program can access these resource locally.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
2020-09-28 21:52:36 +02:00
John Fastabend
4fbb38a3b2 bpf, verifier: Remove redundant var_off.value ops in scalar known reg cases
In BPF_AND and BPF_OR alu cases we have this pattern when the src and dst
tnum is a constant.

 1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off)
 2 scalar32_min_max_[op]
 3       if (known) return
 4 scalar_min_max_[op]
 5       if (known)
 6          __mark_reg_known(dst_reg,
                   dst_reg->var_off.value [op] src_reg.var_off.value)

The result is in 1 we calculate the var_off value and store it in the
dst_reg. Then in 6 we duplicate this logic doing the op again on the
value.

The duplication comes from the the tnum_[op] handlers because they have
already done the value calcuation. For example this is tnum_and().

 struct tnum tnum_and(struct tnum a, struct tnum b)
 {
	u64 alpha, beta, v;

	alpha = a.value | a.mask;
	beta = b.value | b.mask;
	v = a.value & b.value;
	return TNUM(v, alpha & beta & ~v);
 }

So lets remove the redundant op calculation. Its confusing for readers
and unnecessary. Its also not harmful because those ops have the
property, r1 & r1 = r1 and r1 | r1 = r1.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-25 16:47:21 -07:00
Martin KaFai Lau
592a349864 bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON
This patch changes the bpf_sk_storage_*() to take
ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer
returned by the bpf_skc_to_*() helpers also.

A micro benchmark has been done on a "cgroup_skb/egress" bpf program
which does a bpf_sk_storage_get().  It was driven by netperf doing
a 4096 connected UDP_STREAM test with 64bytes packet.
The stats from "kernel.bpf_stats_enabled" shows no meaningful difference.

The sk_storage_get_btf_proto, sk_storage_delete_btf_proto,
btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are
no longer needed, so they are removed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Martin KaFai Lau
1df8f55a37 bpf: Enable bpf_skc_to_* sock casting helper to networking prog type
There is a constant need to add more fields into the bpf_tcp_sock
for the bpf programs running at tc, sock_ops...etc.

A current workaround could be to use bpf_probe_read_kernel().  However,
other than making another helper call for reading each field and missing
CO-RE, it is also not as intuitive to use as directly reading
"tp->lsndtime" for example.  While already having perfmon cap to do
bpf_probe_read_kernel(), it will be much easier if the bpf prog can
directly read from the tcp_sock.

This patch tries to do that by using the existing casting-helpers
bpf_skc_to_*() whose func_proto returns a btf_id.  For example, the
func_proto of bpf_skc_to_tcp_sock returns the btf_id of the
kernel "struct tcp_sock".

These helpers are also added to is_ptr_cast_function().
It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id.
That will keep the ref-tracking works properly.

The bpf_skc_to_* helpers are made available to most of the bpf prog
types in filter.c. The bpf_skc_to_* helpers will be limited by
perfmon cap.

This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON.  The helper accepting
this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON])
or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON).  The bpf_skc_to_*()
helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that
they will accept pointer obtained from skb->sk.

Instead of specifying both arg_type and arg_btf_id in the same func_proto
which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of
the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the
compatible_reg_types[] in verifier.c.  The reason is the arg_btf_id is
always the same.  Discussion in this thread:
https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/

The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is
expecting a PTR_TO_BTF_ID which could be NULL.  This is the same
behavior as the existing helper taking ARG_PTR_TO_BTF_ID.

The _SOCK_COMMON part means the helper is also expecting the legacy
SOCK_COMMON pointer.

By excluding the _OR_NULL part, the bpf prog cannot call helper
with a literal NULL which doesn't make sense in most cases.
e.g. bpf_skc_to_tcp_sock(NULL) will be rejected.  All PTR_TO_*_OR_NULL
reg has to do a NULL check first before passing into the helper or else
the bpf prog will be rejected.  This behavior is nothing new and
consistent with the current expectation during bpf-prog-load.

[ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace
  ARG_PTR_TO_SOCK* of other existing helpers later such that
  those existing helpers can take the PTR_TO_BTF_ID returned by
  the bpf_skc_to_*() helpers.

  The only special case is bpf_sk_lookup_assign() which can accept a
  literal NULL ptr.  It has to be handled specially in another follow
  up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL
  to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ]

[ When converting the older helpers that take ARG_PTR_TO_SOCK* in
  the later patch, if the kernel does not support BTF,
  ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON
  because no reg->type could have PTR_TO_BTF_ID in this case.

  It is not a concern for the newer-btf-only helper like the bpf_skc_to_*()
  here though because these helpers must require BTF vmlinux to begin
  with. ]

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Martin KaFai Lau
a968d5e277 bpf: Move the PTR_TO_BTF_ID check to check_reg_type()
check_reg_type() checks whether a reg can be used as an arg of a
func_proto.  For PTR_TO_BTF_ID, the check is actually not
completely done until the reg->btf_id is pointing to a
kernel struct that is acceptable by the func_proto.

Thus, this patch moves the btf_id check into check_reg_type().
"arg_type" and "arg_btf_id" are passed to check_reg_type() instead of
"compatible".  The compatible_reg_types[] usage is localized in
check_reg_type() now.

The "if (!btf_id) verbose(...); " is also removed since it won't happen.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200925000344.3854828-1-kafai@fb.com
2020-09-25 13:58:01 -07:00
Alexei Starovoitov
182bf3f3dd Merge branch 'rtt-speedup.2020.09.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into bpf-next
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-23 19:32:09 -07:00
Alexei Starovoitov
f00f2f7fe8 Revert "bpf: Fix potential call bpf_link_free() in atomic context"
This reverts commit 31f23a6a18.

This change made many selftests/bpf flaky: flow_dissector, sk_lookup, sk_assign and others.
There was no issue in the code.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-23 19:14:11 -07:00
David S. Miller
6d772f328d Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-09-23

The following pull-request contains BPF updates for your *net-next* tree.

We've added 95 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 4211 insertions(+), 2040 deletions(-).

The main changes are:

1) Full multi function support in libbpf, from Andrii.

2) Refactoring of function argument checks, from Lorenz.

3) Make bpf_tail_call compatible with functions (subprograms), from Maciej.

4) Program metadata support, from YiFei.

5) bpf iterator optimizations, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-23 13:11:11 -07:00
Lorenz Bauer
0789e13bc3 bpf: Explicitly size compatible_reg_types
Arrays with designated initializers have an implicit length of the highest
initialized value plus one. I used this to ensure that newly added entries
in enum bpf_reg_type get a NULL entry in compatible_reg_types.

This is difficult to understand since it requires knowledge of the
peculiarities of designated initializers. Use __BPF_ARG_TYPE_MAX to size
the array instead.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200923160156.80814-1-lmb@cloudflare.com
2020-09-23 11:46:46 -07:00
David S. Miller
3ab0a7a0c3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Two minor conflicts:

1) net/ipv4/route.c, adding a new local variable while
   moving another local variable and removing it's
   initial assignment.

2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
   One pretty prints the port mode differently, whilst another
   changes the driver to try and obtain the port mode from
   the port node rather than the switch node.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-22 16:45:34 -07:00
Linus Torvalds
d3017135c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:

 - fix failure to add bond interfaces to a bridge, the offload-handling
   code was too defensive there and recent refactoring unearthed that.
   Users complained (Ido)

 - fix unnecessarily reflecting ECN bits within TOS values / QoS marking
   in TCP ACK and reset packets (Wei)

 - fix a deadlock with bpf iterator. Hopefully we're in the clear on
   this front now... (Yonghong)

 - BPF fix for clobbering r2 in bpf_gen_ld_abs (Daniel)

 - fix AQL on mt76 devices with FW rate control and add a couple of AQL
   issues in mac80211 code (Felix)

 - fix authentication issue with mwifiex (Maximilian)

 - WiFi connectivity fix: revert IGTK support in ti/wlcore (Mauro)

 - fix exception handling for multipath routes via same device (David
   Ahern)

 - revert back to a BH spin lock flavor for nsid_lock: there are paths
   which do require the BH context protection (Taehee)

 - fix interrupt / queue / NAPI handling in the lantiq driver (Hauke)

 - fix ife module load deadlock (Cong)

 - make an adjustment to netlink reply message type for code added in
   this release (the sole change touching uAPI here) (Michal)

 - a number of fixes for small NXP and Microchip switches (Vladimir)

[ Pull request acked by David: "you can expect more of this in the
  future as I try to delegate more things to Jakub" ]

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (167 commits)
  net: mscc: ocelot: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  net: dsa: seville: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  net: dsa: felix: fix some key offsets for IP4_TCP_UDP VCAP IS2 entries
  inet_diag: validate INET_DIAG_REQ_PROTOCOL attribute
  net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU
  net: Update MAINTAINERS for MediaTek switch driver
  net/mlx5e: mlx5e_fec_in_caps() returns a boolean
  net/mlx5e: kTLS, Avoid kzalloc(GFP_KERNEL) under spinlock
  net/mlx5e: kTLS, Fix leak on resync error flow
  net/mlx5e: kTLS, Add missing dma_unmap in RX resync
  net/mlx5e: kTLS, Fix napi sync and possible use-after-free
  net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported
  net/mlx5e: Fix using wrong stats_grps in mlx5e_update_ndo_stats()
  net/mlx5e: Fix multicast counter not up-to-date in "ip -s"
  net/mlx5e: Fix endianness when calculating pedit mask first bit
  net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported
  net/mlx5e: CT: Fix freeing ct_label mapping
  net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready
  net/mlx5e: Use synchronize_rcu to sync with NAPI
  net/mlx5e: Use RCU to protect rq->xdp_prog
  ...
2020-09-22 14:43:50 -07:00
Linus Torvalds
eff48ddeab Tracing fixes:
- Check kprobe is enabled before unregistering from ftrace as it isn't
   registered when disabled.
 
 - Remove kprobes enabled via command-line that is on init text when freed.
 
 - Add missing RCU synchronization for ftrace trampoline symbols removed
   from kallsyms.
 
 - Free trampoline on error path if ftrace_startup() fails.
 
 - Give more space for the longer PID numbers in trace output.
 
 - Fix a possible double free in the histogram code.
 
 - A couple of fixes that were discovered by sparse.
 -----BEGIN PGP SIGNATURE-----
 
 iIkEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCX2lEchQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qokAAQDphHFTOlgjKi7lF7bc5V1bl/MT1bVo
 bJRHV8w2agtXMgD49ElFOl6znXqid3X++0dYZ5/AQgOZXf1rsYS05Pj0Dw==
 =+cVX
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Check kprobe is enabled before unregistering from ftrace as it isn't
   registered when disabled.

 - Remove kprobes enabled via command-line that is on init text when
   freed.

 - Add missing RCU synchronization for ftrace trampoline symbols removed
   from kallsyms.

 - Free trampoline on error path if ftrace_startup() fails.

 - Give more space for the longer PID numbers in trace output.

 - Fix a possible double free in the histogram code.

 - A couple of fixes that were discovered by sparse.

* tag 'trace-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: init: make xbc_namebuf static
  kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
  tracing: fix double free
  ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
  tracing: Make the space reserved for the pid wider
  ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
  ftrace: Free the trampoline when ftrace_startup() fails
  kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
2020-09-22 09:08:33 -07:00
Jiri Olsa
a8a717963f selftests/bpf: Fix stat probe in d_path test
Some kernels builds might inline vfs_getattr call within fstat
syscall code path, so fentry/vfs_getattr trampoline is not called.

Add security_inode_getattr to allowlist and switch the d_path test stat
trampoline to security_inode_getattr.

Keeping dentry_open and filp_close, because they are in their own
files, so unlikely to be inlined, but in case they are, adding
security_file_open.

Adding flags that indicate trampolines were called and failing
the test if any of them got missed, so it's easier to identify
the issue next time.

Fixes: e4d1af4b16 ("selftests/bpf: Add test for d_path helper")
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200918112338.2618444-1-jolsa@kernel.org
2020-09-21 16:18:00 -07:00
Lorenz Bauer
f79e7ea571 bpf: Use a table to drive helper arg type checks
The mapping between bpf_arg_type and bpf_reg_type is encoded in a big
hairy if statement that is hard to follow. The debug output also leaves
to be desired: if a reg_type doesn't match we only print one of the
options, instead printing all the valid ones.

Convert the if statement into a table which is then used to drive type
checking. If none of the reg_types match we print all options, e.g.:

    R2 type=rdonly_buf expected=fp, pkt, pkt_meta, map_value

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-12-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
fd1b0d604c bpf: Hoist type checking for nullable arg types
check_func_arg has a plethora of weird if statements with empty branches.
They work around the fact that *_OR_NULL argument types should accept a
SCALAR_VALUE register, as long as it's value is 0. These statements make
it difficult to reason about the type checking logic.

Instead, skip more detailed type checking logic iff the register is 0,
and the function expects a nullable type. This allows simplifying the type
checking itself.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-11-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
c18f0b6aee bpf: Check ARG_PTR_TO_SPINLOCK register type in check_func_arg
Move the check for PTR_TO_MAP_VALUE to check_func_arg, where all other
checking is done as well. Move the invocation of process_spin_lock away
from the register type checking, to allow a future refactoring.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-10-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
a2bbe7cc90 bpf: Set meta->raw_mode for pointers close to use
If we encounter a pointer to memory, we set meta->raw_mode depending
on the type of memory we point at. What isn't obvious is that this
information is only used when the next memory size argument is
encountered.

Move the assignment closer to where it's used, and add a comment that
explains what is going on.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-9-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
feec704016 bpf: Make context access check generic
Always check context access if the register we're operating on is
PTR_TO_CTX, rather than relying on ARG_PTR_TO_CTX. This allows
simplifying the arg_type checking section of the function.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-8-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
02f7c9585d bpf: Make reference tracking generic
Instead of dealing with reg->ref_obj_id individually for every arg type that
needs it, rely on the fact that ref_obj_id is zero if the register is not
reference tracked.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-7-lmb@cloudflare.com
2020-09-21 15:00:41 -07:00
Lorenz Bauer
d7b9454a4f bpf: Make BTF pointer type checking generic
Perform BTF type checks if the register we're working on contains a BTF
pointer, rather than if the argument is for a BTF pointer. This is easier
to understand, and allows removing the code from the arg_type checking
section of the function.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-6-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
Lorenz Bauer
9436ef6e86 bpf: Allow specifying a BTF ID per argument in function protos
Function prototypes using ARG_PTR_TO_BTF_ID currently use two ways to signal
which BTF IDs are acceptable. First, bpf_func_proto.btf_id is an array of
IDs, one for each argument. This array is only accessed up to the highest
numbered argument that uses ARG_PTR_TO_BTF_ID and may therefore be less than
five arguments long. It usually points at a BTF_ID_LIST. Second, check_btf_id
is a function pointer that is called by the verifier if present. It gets the
actual BTF ID of the register, and the argument number we're currently checking.
It turns out that the only user check_arg_btf_id ignores the argument, and is
simply used to check whether the BTF ID has a struct sock_common at it's start.

Replace both of these mechanisms with an explicit BTF ID for each argument
in a function proto. Thanks to btf_struct_ids_match this is very flexible:
check_arg_btf_id can be replaced by requiring struct sock_common.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-5-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
Lorenz Bauer
0d004c020b bpf: Check scalar or invalid register in check_helper_mem_access
Move the check for a NULL or zero register to check_helper_mem_access. This
makes check_stack_boundary easier to understand.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-3-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
Lorenz Bauer
2af30f115d btf: Make btf_set_contains take a const pointer
bsearch doesn't modify the contents of the array, so we can take a const pointer.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200921121227.255763-2-lmb@cloudflare.com
2020-09-21 15:00:40 -07:00
Linus Torvalds
9847774063 Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
 "This contains a single commit that fixes a bug that was introduced in
  the last merge window. This bug causes a compiler warning complaining
  about show_rcu_tasks_classic_gp_kthread() being an unused static
  function in !SMP kernels.

  The fix is straightforward, just adding an 'inline' to make this a
  static inline function, thus avoiding the warning.

  This bug was reported by Laurent Pinchart, who would like it fixed
  sooner rather than later"

* 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Prevent complaints of unused show_rcu_tasks_classic_gp_kthread()
2020-09-21 12:42:31 -07:00
Muchun Song
31f23a6a18 bpf: Fix potential call bpf_link_free() in atomic context
The in_atomic() macro cannot always detect atomic context, in particular,
it cannot know about held spinlocks in non-preemptible kernels. Although,
there is no user call bpf_link_put() with holding spinlock now, be on the
safe side, so we can avoid this in the future.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200917074453.20621-1-songmuchun@bytedance.com
2020-09-21 21:20:17 +02:00
Linus Torvalds
e2bff391ca Fix the seccomp syscall rewriting so that trace and audit see the
rewritten syscall number, from Kees Cook.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl9ntRoACgkQEsHwGGHe
 VUojcQ//SuRLaKH0JSSfMCKWtnqfhE4bhExaQyj9hqwDI60mdbnm+uTCI5+mWUPr
 B2C5+wxBJ+wHKZgPHOHP5ZW2h9bEkHVBtTjLAJv3VdCNtckANZJDyVe7b91kG0Ku
 +qHKtQU5zkVjAWo6BNBG84SpDBFDZP7E2Dk/085QwLHdJfYlXB2KlpsP3QAovYq7
 abj0C9OxZKABhfuf/bOXHF3aJ/3Yq+HHiqP+GLFjzlGj82uuAMQwwSyUw3Qj8Mb2
 J9XXI+7U11EBDEeedT3+YLtp0qqLz4v0Ijg3EQS3Pii7fwpeI/Cz23dbv1gLkqUY
 qv/uY/Lbr4UplL09P4MFXH98cfjqutC7phNlhrpQqckNzNcVUxBPEGN87nW4KE46
 wKYP7I0Os1KogqVvxTmz3FWzYl1ciUHBaINFzto/8EZXLuX89w1ASmuTzS6qP2EW
 w7IvvJNVaQNui6Vfk8EOJ7HKudoMHIQb3Y4XmJpM1/CdaPW1dt/9ZG1llt1n716r
 5v3SkAGRQgZcbyQQgDeAd6KgDsWhPINmCpaUaLytZLuQRTMc9R06dkvOMIKoq3gY
 vEKSO9jAQnVqJSr5WYYJxYbecjcVIQ4+HSqGU+4Tt0J/CkJ4cIlgzHv33B8OSJEH
 xrdn3oUXZl02jImRyljZCL5ZnQ9fVFs2XrCFC7ygE5VBesTV2VQ=
 =qMUA
 -----END PGP SIGNATURE-----

Merge tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull syscall tracing fix from Borislav Petkov:
 "Fix the seccomp syscall rewriting so that trace and audit see the
  rewritten syscall number, from Kees Cook"

* tag 'core_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  core/entry: Report syscall correctly for trace and audit
2020-09-20 15:37:15 -07:00
Linus Torvalds
3d491679b8 * Fix lockdep's detection of "USED" <- "IN-NMI" inversions, from Peter
Zijlstra.
 
 * Make percpu-rwsem operations on the semaphore's ->read_count IRQ-safe
   because it can be used in an IRQ context, from Hou Tao.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl9nsLQACgkQEsHwGGHe
 VUo0+g/7B9JzDtRSgchT095VcD8w+YcZTyiJM58q9I9OZMxi1zdJZPyoQZ2xZjnG
 aczDJN5H6P6OcBm949EUCHhmDEDfoZpC7Y5FEHe9dJitPmC7rRilGJuz4Im8td3N
 DsLhpFe8KUSqRVyygjvjM393md8tw+m4Jq+syjWri4/1wj1Fs4jHdhKWgz6b2cup
 JXbfjgVOkHVOTloMHnmgdHOPvkh60/LoG8r/5gzLbD8Z/FTD3BSOCTgw+8L8EB+b
 yiWWuEcR5LDjy7sNY2xVhTB+nkHJXe4o+HVufZoAzd1j7FDLfDVSNaZHJVp2m7Gp
 W47xRrzIKIJl6DUp5E4TGi9aZvkO/h5kqSOZ4MhfaeEqw9DLW/KxbherJjzVY3bb
 Nt74N+N+WOq6riTVx5wJkDRmtT5RaeW1kKJUaSeGMl3fxCken5CKE1WyBbae0GiU
 kq3tCn4t0OKCWhuiixFNg6RxuXokjfzXDiHr9aUt3x5musc9P+5YUIdOMsaHwoAH
 0YrAZoRCPyE2ABUgKC8hoTc3Y+TIwji7c0B9S2GoHRTm1LZTRURlLOUQyJ4oU9Ez
 Vt5TILwre9lX9tTW7BlxlOCYkgRQn78vLBSOy1CEUTsORbdYnv/kCy5S5yHEILpT
 f3E345wzdd8zrXGQfX2jsVgNAx0F+81Fb9H9FMJSCFgjIya+SRM=
 =70QQ
 -----END PGP SIGNATURE-----

Merge tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:
 "Two fixes from the locking/urgent pile:

   - Fix lockdep's detection of "USED" <- "IN-NMI" inversions (Peter
     Zijlstra)

   - Make percpu-rwsem operations on the semaphore's ->read_count
     IRQ-safe because it can be used in an IRQ context (Hou Tao)"

* tag 'locking_urgent_for_v5.9_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/percpu-rwsem: Use this_cpu_{inc,dec}() for read_count
  locking/lockdep: Fix "USED" <- "IN-NMI" inversions
2020-09-20 15:25:33 -07:00
Linus Torvalds
325d0eab4f Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "15 patches.

  Subsystems affected by this patch series: mailmap, mm/hotfixes,
  mm/thp, mm/memory-hotplug, misc, kcsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'
  fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
  stackleak: let stack_erasing_sysctl take a kernel pointer buffer
  ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
  mm/memory_hotplug: drain per-cpu pages again during memory offline
  selftests/vm: fix display of page size in map_hugetlb
  mm/thp: fix __split_huge_pmd_locked() for migration PMD
  kprobes: fix kill kprobe which has been marked as gone
  tmpfs: restore functionality of nr_inodes=0
  mlock: fix unevictable_pgs event counts on THP
  mm: fix check_move_unevictable_pages() on THP
  mm: migration of hugetlbfs page skip memcg
  ksm: reinstate memcg charge on copied pages
  mailmap: add older email addresses for Kees Cook
2020-09-19 18:18:37 -07:00
Tobias Klauser
4773ef33fc stackleak: let stack_erasing_sysctl take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of stack_erasing_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
kernel/stackleak.c:31:50:    expected void *
kernel/stackleak.c:31:50:    got void [noderef] __user *buffer

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:39 -07:00
Tobias Klauser
7bb82ac30c ftrace: let ftrace_enable_sysctl take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:39 -07:00
Muchun Song
b0399092cc kprobes: fix kill kprobe which has been marked as gone
If a kprobe is marked as gone, we should not kill it again.  Otherwise, we
can disarm the kprobe more than once.  In that case, the statistics of
kprobe_ftrace_enabled can unbalance which can lead to that kprobe do not
work.

Fixes: e8386a0cb2 ("kprobes: support probing module __exit function")
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Song Liu <songliubraving@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:38 -07:00
Linus Torvalds
eb5f95f159 s390 fixes for 5.9-rc6
- Fix order in trace_hardirqs_off_caller() to make locking state
   consistent even if the IRQ tracer calls into lockdep again.
   Touches common code. Acked-by Peter Zijlstra.
 
 - Correctly handle secure storage violation exception to avoid kernel
   panic triggered by user space misbehaviour.
 
 - Switch the idle->seqcount over to using raw_write_*() to avoid
   "suspicious RCU usage".
 
 - Fix memory leaks on hard unplug in pci code.
 
 - Use kvmalloc instead of kmalloc for larger allocations in zcrypt.
 
 - Add few missing __init annotations to static functions to avoid section
   mismatch complains when functions are not inlined.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAl9lQEoACgkQjYWKoQLX
 FBhFRQf+I2F63h/HjZj/84CITUjDyCgcPrtq9eXrxU8E4kjqyvt7BulPO2xQt5KG
 WHivnZp6m5lCbYyMF+FivfhLOlZBf8VrpVTqyc6PTEVGAaP1G+g/w8PAlRBg+er3
 Th4YwMvRY5iNqaD2c0FjFy0aT5/r5RjQX54AG/6aXmNvdmnNkW0/oJ0fX/hBxfSv
 734rPbwBFaHMn9y01oJiIQhhhdVz+uDBgjO/A5PeO0HWDeUJq9NJvS9mfa2fN45+
 7L3korQSEv1htYwDQIK/nAPC4YT4td++58Dwi0Ae/w9bA/qT0amkV8gEkRm9lyqE
 lypav0b8slm2eqlVKUHzJmnYpvZfwA==
 =AuGe
 -----END PGP SIGNATURE-----

Merge tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

 - Fix order in trace_hardirqs_off_caller() to make locking state
   consistent even if the IRQ tracer calls into lockdep again. Touches
   common code. Acked-by Peter Zijlstra.

 - Correctly handle secure storage violation exception to avoid kernel
   panic triggered by user space misbehaviour.

 - Switch the idle->seqcount over to using raw_write_*() to avoid
  "suspicious RCU usage".

 - Fix memory leaks on hard unplug in pci code.

 - Use kvmalloc instead of kmalloc for larger allocations in zcrypt.

 - Add few missing __init annotations to static functions to avoid
   section mismatch complains when functions are not inlined.

* tag 's390-5.9-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: add 3f program exception handler
  lockdep: fix order in trace_hardirqs_off_caller()
  s390/pci: fix leak of DMA tables on hard unplug
  s390/init: add missing __init annotations
  s390/zcrypt: fix kmalloc 256k failure
  s390/idle: fix suspicious RCU usage
2020-09-18 18:51:08 -07:00
Martin KaFai Lau
70b971118e bpf: Use hlist_add_head_rcu when linking to local_storage
The local_storage->list will be traversed by rcu reader in parallel.
Thus, hlist_add_head_rcu() is needed in bpf_selem_link_storage_nolock().
This patch fixes it.

This part of the code has recently been refactored in bpf-next
and this patch makes changes to the new file "bpf_local_storage.c".
Instead of using the original offending commit in the Fixes tag,
the commit that created the file "bpf_local_storage.c" is used.

A separate fix has been provided to the bpf tree.

Fixes: 450af8d0f6 ("bpf: Split bpf_local_storage to bpf_sk_storage")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200916204453.2003915-1-kafai@fb.com
2020-09-19 01:12:35 +02:00
Masami Hiramatsu
82d083ab60 kprobes: tracing/kprobes: Fix to kill kprobes on initmem after boot
Since kprobe_event= cmdline option allows user to put kprobes on the
functions in initmem, kprobe has to make such probes gone after boot.
Currently the probes on the init functions in modules will be handled
by module callback, but the kernel init text isn't handled.
Without this, kprobes may access non-exist text area to disable or
remove it.

Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2

Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 14:27:24 -04:00
Tom Rix
46bbe5c671 tracing: fix double free
clang static analyzer reports this problem

trace_events_hist.c:3824:3: warning: Attempt to free
  released memory
    kfree(hist_data->attrs->var_defs.name[i]);

In parse_var_defs() if there is a problem allocating
var_defs.expr, the earlier var_defs.name is freed.
This free is duplicated by free_var_defs() which frees
the rest of the list.

Because free_var_defs() has to run anyway, remove the
second free fom parse_var_defs().

Link: https://lkml.kernel.org/r/20200907135845.15804-1-trix@redhat.com

Cc: stable@vger.kernel.org
Fixes: 30350d65ac ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 13:16:40 -04:00
Tobias Klauser
54fa9ba564 ftrace: Let ftrace_enable_sysctl take a kernel pointer buffer
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43:    expected void *
kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 13:15:56 -04:00
Sebastian Andrzej Siewior
795d6379a4 tracing: Make the space reserved for the pid wider
For 64bit CONFIG_BASE_SMALL=0 systems PID_MAX_LIMIT is set by default to
4194304. During boot the kernel sets a new value based on number of CPUs
but no lower than 32768. It is 1024 per CPU so with 128 CPUs the default
becomes 131072 which needs six digits.
This value can be increased during run time but must not exceed the
initial upper limit.

Systemd sometime after v241 sets it to the upper limit during boot. The
result is that when the pid exceeds five digits, the trace output is a
little hard to read because it is no longer properly padded (same like
on big iron with 98+ CPUs).

Increase the pid padding to seven digits.

Link: https://lkml.kernel.org/r/20200904082331.dcdkrr3bkn3e4qlg@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 12:42:11 -04:00
Adrian Hunter
478ece9573 ftrace: Fix missing synchronize_rcu() removing trampoline from kallsyms
Add synchronize_rcu() after list_del_rcu() in
ftrace_remove_trampoline_from_kallsyms() to protect readers of
ftrace_ops_trampoline_list (in ftrace_get_trampoline_kallsym)
which is used when kallsyms is read.

Link: https://lkml.kernel.org/r/20200901091617.31837-1-adrian.hunter@intel.com

Fixes: fc0ea795f5 ("ftrace: Add symbols for ftrace trampolines")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 12:22:42 -04:00
Miroslav Benes
d5e47505e0 ftrace: Free the trampoline when ftrace_startup() fails
Commit fc0ea795f5 ("ftrace: Add symbols for ftrace trampolines")
missed to remove ops from new ftrace_ops_trampoline_list in
ftrace_startup() if ftrace_hash_ipmodify_enable() fails there. It may
lead to BUG if such ops come from a module which may be removed.

Moreover, the trampoline itself is not freed in this case.

Fix it by calling ftrace_trampoline_free() during the rollback.

Link: https://lkml.kernel.org/r/20200831122631.28057-1-mbenes@suse.cz

Fixes: fc0ea795f5 ("ftrace: Add symbols for ftrace trampolines")
Fixes: f8b8be8a31 ("ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict")
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 12:19:08 -04:00
Masami Hiramatsu
3031313eb3 kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()
Commit 0cb2f1372b ("kprobes: Fix NULL pointer dereference at
kprobe_ftrace_handler") fixed one bug but not completely fixed yet.
If we run a kprobe_module.tc of ftracetest, kernel showed a warning
as below.

# ./ftracetest test.d/kprobe/kprobe_module.tc
=== Ftrace unit tests ===
[1] Kprobe dynamic event - probing module
...
[   22.400215] ------------[ cut here ]------------
[   22.400962] Failed to disarm kprobe-ftrace at trace_printk_irq_work+0x0/0x7e [trace_printk] (-2)
[   22.402139] WARNING: CPU: 7 PID: 200 at kernel/kprobes.c:1091 __disarm_kprobe_ftrace.isra.0+0x7e/0xa0
[   22.403358] Modules linked in: trace_printk(-)
[   22.404028] CPU: 7 PID: 200 Comm: rmmod Not tainted 5.9.0-rc2+ #66
[   22.404870] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[   22.406139] RIP: 0010:__disarm_kprobe_ftrace.isra.0+0x7e/0xa0
[   22.406947] Code: 30 8b 03 eb c9 80 3d e5 09 1f 01 00 75 dc 49 8b 34 24 89 c2 48 c7 c7 a0 c2 05 82 89 45 e4 c6 05 cc 09 1f 01 01 e8 a9 c7 f0 ff <0f> 0b 8b 45 e4 eb b9 89 c6 48 c7 c7 70 c2 05 82 89 45 e4 e8 91 c7
[   22.409544] RSP: 0018:ffffc90000237df0 EFLAGS: 00010286
[   22.410385] RAX: 0000000000000000 RBX: ffffffff83066024 RCX: 0000000000000000
[   22.411434] RDX: 0000000000000001 RSI: ffffffff810de8d3 RDI: ffffffff810de8d3
[   22.412687] RBP: ffffc90000237e10 R08: 0000000000000001 R09: 0000000000000001
[   22.413762] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88807c478640
[   22.414852] R13: ffffffff8235ebc0 R14: ffffffffa00060c0 R15: 0000000000000000
[   22.415941] FS:  00000000019d48c0(0000) GS:ffff88807d7c0000(0000) knlGS:0000000000000000
[   22.417264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.418176] CR2: 00000000005bb7e3 CR3: 0000000078f7a000 CR4: 00000000000006a0
[   22.419309] Call Trace:
[   22.419990]  kill_kprobe+0x94/0x160
[   22.420652]  kprobes_module_callback+0x64/0x230
[   22.421470]  notifier_call_chain+0x4f/0x70
[   22.422184]  blocking_notifier_call_chain+0x49/0x70
[   22.422979]  __x64_sys_delete_module+0x1ac/0x240
[   22.423733]  do_syscall_64+0x38/0x50
[   22.424366]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   22.425176] RIP: 0033:0x4bb81d
[   22.425741] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e0 ff ff ff f7 d8 64 89 01 48
[   22.428726] RSP: 002b:00007ffc70fef008 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
[   22.430169] RAX: ffffffffffffffda RBX: 00000000019d48a0 RCX: 00000000004bb81d
[   22.431375] RDX: 0000000000000000 RSI: 0000000000000880 RDI: 00007ffc70fef028
[   22.432543] RBP: 0000000000000880 R08: 00000000ffffffff R09: 00007ffc70fef320
[   22.433692] R10: 0000000000656300 R11: 0000000000000246 R12: 00007ffc70fef028
[   22.434635] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[   22.435682] irq event stamp: 1169
[   22.436240] hardirqs last  enabled at (1179): [<ffffffff810df542>] console_unlock+0x422/0x580
[   22.437466] hardirqs last disabled at (1188): [<ffffffff810df19b>] console_unlock+0x7b/0x580
[   22.438608] softirqs last  enabled at (866): [<ffffffff81c0038e>] __do_softirq+0x38e/0x490
[   22.439637] softirqs last disabled at (859): [<ffffffff81a00f42>] asm_call_on_stack+0x12/0x20
[   22.440690] ---[ end trace 1e7ce7e1e4567276 ]---
[   22.472832] trace_kprobe: This probe might be able to register after target module is loaded. Continue.

This is because the kill_kprobe() calls disarm_kprobe_ftrace() even
if the given probe is not enabled. In that case, ftrace_set_filter_ip()
fails because the given probe point is not registered to ftrace.

Fix to check the given (going) probe is enabled before invoking
disarm_kprobe_ftrace().

Link: https://lkml.kernel.org/r/159888672694.1411785.5987998076694782591.stgit@devnote2

Fixes: 0cb2f1372b ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-09-18 11:50:51 -04:00
Alexei Starovoitov
09b28d76ea bpf: Add abnormal return checks.
LD_[ABS|IND] instructions may return from the function early. bpf_tail_call
pseudo instruction is either fallthrough or return. Allow them in the
subprograms only when subprograms are BTF annotated and have scalar return
types. Allow ld_abs and tail_call in the main program even if it calls into
subprograms. In the past that was not ok to do for ld_abs, since it was JITed
with special exit sequence. Since bpf_gen_ld_abs() was introduced the ld_abs
looks like normal exit insn from JIT point of view, so it's safe to allow them
in the main program.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:56:07 -07:00
Maciej Fijalkowski
e411901c0b bpf: allow for tailcalls in BPF subprograms for x64 JIT
Relax verifier's restriction that was meant to forbid tailcall usage
when subprog count was higher than 1.

Also, do not max out the stack depth of program that utilizes tailcalls.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-09-17 19:56:06 -07:00