Commit Graph

35371 Commits

Author SHA1 Message Date
Linus Torvalds
17b6c49da3 Merge tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Add a new Intel model number for Alder Lake

 - Differentiate which aspects of the FPU state get saved/restored when
   the FPU is used in-kernel and fix a boot crash on K7 due to early
   MXCSR access before CR4.OSFXSR is even set.

 - A couple of noinstr annotation fixes

 - Correct die ID setting on AMD for users of topology information which
   need the correct die ID

 - A SEV-ES fix to handle string port IO to/from kernel memory properly

* tag 'x86_urgent_for_v5.11_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add another Alder Lake CPU to the Intel family
  x86/mmx: Use KFPU_387 for MMX string operations
  x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
  x86/topology: Make __max_die_per_package available unconditionally
  x86: __always_inline __{rd,wr}msr()
  x86/mce: Remove explicit/superfluous tracing
  locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
  locking/lockdep: Cure noinstr fail
  x86/sev: Fix nonistr violation
  x86/entry: Fix noinstr fail
  x86/cpu/amd: Set __max_die_per_package on AMD
  x86/sev-es: Handle string port IO to kernel memory properly
2021-01-24 09:46:05 -08:00
Linus Torvalds
c509ce2378 Merge tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull misc fixes from Christian Brauner:

 - Jann reported sparse complaints because of a missing __user
   annotation in a helper we added way back when we added
   pidfd_send_signal() to avoid compat syscall handling. Fix it.

 - Yanfei replaces a reference in a comment to the _do_fork() helper I
   removed a while ago with a reference to the new kernel_clone()
   replacement

 - Alexander Guril added a simple coding style fix

* tag 'for-linus-2021-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  kthread: remove comments about old _do_fork() helper
  Kernel: fork.c: Fix coding style: Do not use {} around single-line statements
  signal: Add missing __user annotation to copy_siginfo_from_user_any
2021-01-24 09:35:28 -08:00
Pan Bian
b9557caaf8 bpf, inode_storage: Put file handler if no storage was found
Put file f if inode_storage_ptr() returns NULL.

Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210121020856.25507-1-bianpan2016@163.com
2021-01-22 23:19:24 +01:00
Loris Reiff
f4a2da755a bpf, cgroup: Fix problematic bounds check
Since ctx.optlen is signed, a larger value than max_value could be
passed, as it is later on used as unsigned, which causes a WARN_ON_ONCE
in the copy_to_user.

Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: Loris Reiff <loris.reiff@liblor.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20210122164232.61770-2-loris.reiff@liblor.ch
2021-01-22 23:11:47 +01:00
Loris Reiff
bb8b81e396 bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
A toctou issue in `__cgroup_bpf_run_filter_getsockopt` can trigger a
WARN_ON_ONCE in a check of `copy_from_user`.

`*optlen` is checked to be non-negative in the individual getsockopt
functions beforehand. Changing `*optlen` in a race to a negative value
will result in a `copy_from_user(ctx.optval, optval, ctx.optlen)` with
`ctx.optlen` being a negative integer.

Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: Loris Reiff <loris.reiff@liblor.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20210122164232.61770-1-loris.reiff@liblor.ch
2021-01-22 23:11:34 +01:00
Peter Zijlstra
741ba80f6f sched: Relax the set_cpus_allowed_ptr() semantics
Now that we have KTHREAD_IS_PER_CPU to denote the critical per-cpu
tasks to retain during CPU offline, we can relax the warning in
set_cpus_allowed_ptr(). Any spurious kthread that wants to get on at
the last minute will get pushed off before it can run.

While during CPU online there is no harm, and actual benefit, to
allowing kthreads back on early, it simplifies hotplug code and fixes
a number of outstanding races.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.240724591@infradead.org
2021-01-22 15:09:44 +01:00
Peter Zijlstra
5ba2ffba13 sched: Fix CPU hotplug / tighten is_per_cpu_kthread()
Prior to commit 1cf12e08bc ("sched/hotplug: Consolidate task
migration on CPU unplug") we'd leave any task on the dying CPU and
break affinity and force them off at the very end.

This scheme had to change in order to enable migrate_disable(). One
cannot wait for migrate_disable() to complete while stuck in
stop_machine(). Furthermore, since we need at the very least: idle,
hotplug and stop threads at any point before stop_machine, we can't
break affinity and/or push those away.

Under the assumption that all per-cpu kthreads are sanely handled by
CPU hotplug, the new code no long breaks affinity or migrates any of
them (which then includes the critical ones above).

However, there's an important difference between per-cpu kthreads and
kthreads that happen to have a single CPU affinity which is lost. The
latter class very much relies on the forced affinity breaking and
migration semantics previously provided.

Use the new kthread_is_per_cpu() infrastructure to tighten
is_per_cpu_kthread() and fix the hot-unplug problems stemming from the
change.

Fixes: 1cf12e08bc ("sched/hotplug: Consolidate task migration on CPU unplug")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103507.102416009@infradead.org
2021-01-22 15:09:44 +01:00
Peter Zijlstra
975707f227 sched: Prepare to use balance_push in ttwu()
In preparation of using the balance_push state in ttwu() we need it to
provide a reliable and consistent state.

The immediate problem is that rq->balance_callback gets cleared every
schedule() and then re-set in the balance_push_callback() itself. This
is not a reliable signal, so add a variable that stays set during the
entire time.

Also move setting it before the synchronize_rcu() in
sched_cpu_deactivate(), such that we get guaranteed visibility to
ttwu(), which is a preempt-disable region.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.966069627@infradead.org
2021-01-22 15:09:43 +01:00
Peter Zijlstra
640f17c824 workqueue: Restrict affinity change to rescuer
create_worker() will already set the right affinity using
kthread_bind_mask(), this means only the rescuer will need to change
it's affinity.

Howveer, while in cpu-hot-unplug a regular task is not allowed to run
on online&&!active as it would be pushed away quite agressively. We
need KTHREAD_IS_PER_CPU to survive in that environment.

Therefore set the affinity after getting that magic flag.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.826629830@infradead.org
2021-01-22 15:09:43 +01:00
Peter Zijlstra
5c25b5ff89 workqueue: Tag bound workers with KTHREAD_IS_PER_CPU
Mark the per-cpu workqueue workers as KTHREAD_IS_PER_CPU.

Workqueues have unfortunate semantics in that per-cpu workers are not
default flushed and parked during hotplug, however a subset does
manual flush on hotplug and hard relies on them for correctness.

Therefore play silly games..

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.693465814@infradead.org
2021-01-22 15:09:42 +01:00
Peter Zijlstra
ac687e6e8c kthread: Extract KTHREAD_IS_PER_CPU
There is a need to distinguish geniune per-cpu kthreads from kthreads
that happen to have a single CPU affinity.

Geniune per-cpu kthreads are kthreads that are CPU affine for
correctness, these will obviously have PF_KTHREAD set, but must also
have PF_NO_SETAFFINITY set, lest userspace modify their affinity and
ruins things.

However, these two things are not sufficient, PF_NO_SETAFFINITY is
also set on other tasks that have their affinities controlled through
other means, like for instance workqueues.

Therefore another bit is needed; it turns out kthread_create_per_cpu()
already has such a bit: KTHREAD_IS_PER_CPU, which is used to make
kthread_park()/kthread_unpark() work correctly.

Expose this flag and remove the implicit setting of it from
kthread_create_on_cpu(); the io_uring usage of it seems dubious at
best.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.557620262@infradead.org
2021-01-22 15:09:42 +01:00
Peter Zijlstra
22f667c97a sched: Don't run cpu-online with balance_push() enabled
We don't need to push away tasks when we come online, mark the push
complete right before the CPU dies.

XXX hotplug state machine has trouble with rollback here.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210121103506.415606087@infradead.org
2021-01-22 15:09:42 +01:00
Lai Jiangshan
547a77d02f workqueue: Use cpu_possible_mask instead of cpu_active_mask to break affinity
The scheduler won't break affinity for us any more, and we should
"emulate" the same behavior when the scheduler breaks affinity for
us.  The behavior is "changing the cpumask to cpu_possible_mask".

And there might be some other CPUs online later while the worker is
still running with the pending work items.  The worker should be allowed
to use the later online CPUs as before and process the work items ASAP.
If we use cpu_active_mask here, we can't achieve this goal but
using cpu_possible_mask can.

Fixes: 06249738a4 ("workqueue: Manually break affinity on hotplug")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210111152638.2417-4-jiangshanlai@gmail.com
2021-01-22 15:09:41 +01:00
Valentin Schneider
36c6e17bf1 sched/core: Print out straggler tasks in sched_cpu_dying()
Since commit

  1cf12e08bc ("sched/hotplug: Consolidate task migration on CPU unplug")

tasks are expected to move themselves out of a out-going CPU. For most
tasks this will be done automagically via BALANCE_PUSH, but percpu kthreads
will have to cooperate and move themselves away one way or another.

Currently, some percpu kthreads (workqueues being a notable exemple) do not
cooperate nicely and can end up on an out-going CPU at the time
sched_cpu_dying() is invoked.

Print the dying rq's tasks to shed some light on the stragglers.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210113183141.11974-1-valentin.schneider@arm.com
2021-01-22 15:09:41 +01:00
Linus Torvalds
2561bbbe2e Merge tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:

 - Fix line counting and buffer size calculation. Both regressions
   caused that a reader buffer might not get filled as much as possible.

 - Restore non-documented behavior of printk() reader API and make it
   official.

   It did not fill the last byte of the provided buffer before 5.10. Two
   architectures, powerpc and um, used it to add the trailing '\0'.
   There might theoretically be more callers depending on this behavior
   in userspace.

* tag 'printk-for-5.11-printk-rework-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: fix buffer overflow potential for print_text()
  printk: fix kmsg_dump_get_buffer length calulations
  printk: ringbuffer: fix line counting
2021-01-21 11:37:22 -08:00
Petr Mladek
535b6a122c Merge branch 'printk-rework' into for-linus 2021-01-21 16:06:21 +01:00
Jakub Kicinski
0fe2f273ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/can/dev.c
  commit 03f16c5075 ("can: dev: can_restart: fix use after free bug")
  commit 3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")

  Code move.

drivers/net/dsa/b53/b53_common.c
 commit 8e4052c32d ("net: dsa: b53: fix an off by one in checking "vlan->vid"")
 commit b7a9e0da2d ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects")

 Field rename.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20 12:16:11 -08:00
Linus Torvalds
75439bc439 Merge tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and
  can trees.

  Current release - regressions:

   - nfc: nci: fix the wrong NCI_CORE_INIT parameters

  Current release - new code bugs:

   - bpf: allow empty module BTFs

  Previous releases - regressions:

   - bpf: fix signed_{sub,add32}_overflows type handling

   - tcp: do not mess with cloned skbs in tcp_add_backlog()

   - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach

   - bpf: don't leak memory in bpf getsockopt when optlen == 0

   - tcp: fix potential use-after-free due to double kfree()

   - mac80211: fix encryption issues with WEP

   - devlink: use right genl user_ptr when handling port param get/set

   - ipv6: set multicast flag on the multicast route

   - tcp: fix TCP_USER_TIMEOUT with zero window

  Previous releases - always broken:

   - bpf: local storage helpers should check nullness of owner ptr passed

   - mac80211: fix incorrect strlen of .write in debugfs

   - cls_flower: call nla_ok() before nla_next()

   - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too"

* tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
  net: systemport: free dev before on error path
  net: usb: cdc_ncm: don't spew notifications
  net: mscc: ocelot: Fix multicast to the CPU port
  tcp: Fix potential use-after-free due to double kfree()
  bpf: Fix signed_{sub,add32}_overflows type handling
  can: peak_usb: fix use after free bugs
  can: vxcan: vxcan_xmit: fix use after free bug
  can: dev: can_restart: fix use after free bug
  tcp: fix TCP socket rehash stats mis-accounting
  net: dsa: b53: fix an off by one in checking "vlan->vid"
  tcp: do not mess with cloned skbs in tcp_add_backlog()
  selftests: net: fib_tests: remove duplicate log test
  net: nfc: nci: fix the wrong NCI_CORE_INIT parameters
  sh_eth: Fix power down vs. is_opened flag ordering
  net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled
  netfilter: rpfilter: mask ecn bits before fib lookup
  udp: mask TOS bits in udp_v4_early_demux()
  xsk: Clear pool even for inactive queues
  bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
  sh_eth: Make PHY access aware of Runtime PM to fix reboot crash
  ...
2021-01-20 11:52:21 -08:00
Daniel Borkmann
bc895e8b2a bpf: Fix signed_{sub,add32}_overflows type handling
Fix incorrect signed_{sub,add32}_overflows() input types (and a related buggy
comment). It looks like this might have slipped in via copy/paste issue, also
given prior to 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
the signature of signed_sub_overflows() had s64 a and s64 b as its input args
whereas now they are truncated to s32. Thus restore proper types. Also, the case
of signed_add32_overflows() is not consistent to signed_sub32_overflows(). Both
have s32 as inputs, therefore align the former.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: De4dCr0w <sa516203@mail.ustc.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20 17:19:40 +01:00
Linus Torvalds
45dfb8a565 Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block
Pull task_work fix from Jens Axboe:
 "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
  task_work run we had in get_signal().

  This caused a regression for some setups, since we're relying on eg
  ____fput() being run to close and release, for example, a pipe and
  wake the other end.

  For 5.11, I prefer the simple solution of just reinstating the
  unconditional run, even if it conceptually doesn't make much sense -
  if you need that kind of guarantee, you should be using TWA_SIGNAL
  instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
  ensure that other potential gotchas/assumptions for task_work don't
  regress for 5.11.

  We're looking into further simplifying the task_work notifications for
  5.12 which would resolve that too"

* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
  task_work: unconditionally run task_work from get_signal()
2021-01-19 13:26:05 -08:00
Mircea Cirjaliu
301a33d518 bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback
I assume this was obtained by copy/paste. Point it to bpf_map_peek_elem()
instead of bpf_map_pop_elem(). In practice it may have been less likely
hit when under JIT given shielded via 84430d4232 ("bpf, verifier: avoid
retpoline for map push/pop/peek operation").

Fixes: f1a2e44a3a ("bpf: add queue and stack maps")
Signed-off-by: Mircea Cirjaliu <mcirjaliu@bitdefender.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mauricio Vasquez <mauriciovasquezbernal@gmail.com>
Link: https://lore.kernel.org/bpf/AM7PR02MB6082663DFDCCE8DA7A6DD6B1BBA30@AM7PR02MB6082.eurprd02.prod.outlook.com
2021-01-19 22:04:08 +01:00
John Ogness
f0e386ee0c printk: fix buffer overflow potential for print_text()
Before the commit 896fbe20b4 ("printk: use the lockless
ringbuffer"), msg_print_text() would only write up to size-1 bytes
into the provided buffer. Some callers expect this behavior and
append a terminator to returned string. In particular:

arch/powerpc/xmon/xmon.c:dump_log_buf()
arch/um/kernel/kmsg_dump.c:kmsg_dumper_stdout()

msg_print_text() has been replaced by record_print_text(), which
currently fills the full size of the buffer. This causes a
buffer overflow for the above callers.

Change record_print_text() so that it will only use size-1 bytes
for text data. Also, for paranoia sakes, add a terminator after
the text data.

And finally, document this behavior so that it is clear that only
size-1 bytes are used and a terminator is added.

Fixes: 896fbe20b4 ("printk: use the lockless ringbuffer")
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210114170412.4819-1-john.ogness@linutronix.de
2021-01-19 11:42:14 +01:00
Jakub Kicinski
2d9116be76 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-01-16

1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
   that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.

2) Add support for using kernel module global variables (__ksym externs in BPF
   programs) retrieved via module's BTF, from Andrii Nakryiko.

3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
   stored in mmap2 event for perf, from Jiri Olsa.

4) Various fixes for cross-building BPF sefltests out-of-tree which then will
   unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.

5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.

6) Clean up driver's XDP buffer init and split into two helpers to init per-
   descriptor and non-changing fields during processing, from Lorenzo Bianconi.

7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
  perf: Add build id data in mmap2 event
  bpf: Add size arg to build_id_parse function
  bpf: Move stack_map_get_build_id into lib
  bpf: Document new atomic instructions
  bpf: Add tests for new BPF atomic operations
  bpf: Add bitwise atomic instructions
  bpf: Pull out a macro for interpreting atomic ALU operations
  bpf: Add instructions for atomic_[cmp]xchg
  bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
  bpf: Move BPF_STX reserved field check into BPF_STX verifier code
  bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
  bpf: x86: Factor out a lookup table for some ALU opcodes
  bpf: x86: Factor out emission of REX byte
  bpf: x86: Factor out emission of ModR/M for *(reg + off)
  tools/bpftool: Add -Wall when building BPF programs
  bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
  selftests/bpf: Install btf_dump test cases
  selftests/bpf: Fix installation of urandom_read
  selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
  selftests/bpf: Fix out-of-tree build
  ...
====================

Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 17:57:26 -08:00
Jakub Kicinski
e23a8d0021 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-01-16

1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in
   link creation's error path causing a refcount underflow, from Jiri Olsa.

2) Fix BTF validation errors for the case where kernel modules don't declare
   any new types and end up with an empty BTF, from Andrii Nakryiko.

3) Fix BPF local storage helpers to first check their {task,inode} owners for
   being NULL before access, from KP Singh.

4) Fix a memory leak in BPF setsockopt handling for the case where optlen is
   zero and thus temporary optval buffer should be freed, from Stanislav Fomichev.

5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for
   raw_tracepoint caused by too big ctx_size_in, from Song Liu.

6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL}
   registers were spilled to stack but not recognized, from Gilad Reti.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  MAINTAINERS: Update my email address
  selftests/bpf: Add verifier test for PTR_TO_MEM spill
  bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
  bpf: Reject too big ctx_size_in for raw_tp test run
  libbpf: Allow loading empty BTFs
  bpf: Allow empty module BTFs
  bpf: Don't leak memory in bpf getsockopt when optlen == 0
  bpf: Update local storage test to check handling of null ptrs
  bpf: Fix typo in bpf_inode_storage.c
  bpf: Local storage helpers should check nullness of owner ptr passed
  bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
====================

Link: https://lore.kernel.org/r/20210116002025.15706-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15 16:34:59 -08:00
John Ogness
89ccf18f03 printk: fix kmsg_dump_get_buffer length calulations
kmsg_dump_get_buffer() uses @syslog to determine if the syslog
prefix should be written to the buffer. However, when calculating
the maximum number of records that can fit into the buffer, it
always counts the bytes from the syslog prefix.

Use @syslog when calculating the maximum number of records that can
fit into the buffer.

Fixes: e2ae715d66 ("kmsg - kmsg_dump() use iterator to receive log buffer content")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113164413.1599-1-john.ogness@linutronix.de
2021-01-15 11:32:52 +01:00
John Ogness
668af87f99 printk: ringbuffer: fix line counting
Counting text lines in a record simply involves counting the number
of newline characters (+1). However, it is searching the full data
block for newline characters, even though the text data can be (and
often is) a subset of that area. Since the extra area in the data
block was never initialized, the result is that extra newlines may
be seen and counted.

Restrict newline searching to the text data length.

Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210113144234.6545-1-john.ogness@linutronix.de
2021-01-15 11:30:03 +01:00
Jiri Olsa
88a16a1309 perf: Add build id data in mmap2 event
Adding support to carry build id data in mmap2 event.

The build id data replaces maj/min/ino/ino_generation
fields, which are also used to identify map's binary,
so it's ok to replace them with build id data:

  union {
          struct {
                  u32       maj;
                  u32       min;
                  u64       ino;
                  u64       ino_generation;
          };
          struct {
                  u8        build_id_size;
                  u8        __reserved_1;
                  u16       __reserved_2;
                  u8        build_id[20];
          };
  };

Replaced maj/min/ino/ino_generation fields give us size
of 24 bytes. We use 20 bytes for build id data, 1 byte
for size and rest is unused.

There's new misc bit for mmap2 to signal there's build
id data in it:

  #define PERF_RECORD_MISC_MMAP_BUILD_ID   (1 << 14)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-4-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Jiri Olsa
921f88fc89 bpf: Add size arg to build_id_parse function
It's possible to have other build id types (other than default SHA1).
Currently there's also ld support for MD5 build id.

Adding size argument to build_id_parse function, that returns (if defined)
size of the parsed build id, so we can recognize the build id type.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-3-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Jiri Olsa
bd7525dacd bpf: Move stack_map_get_build_id into lib
Moving stack_map_get_build_id into lib with
declaration in linux/buildid.h header:

  int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id);

This function returns build id for given struct vm_area_struct.
There is no functional change to stack_map_get_build_id function.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20210114134044.1418404-2-jolsa@kernel.org
2021-01-14 19:29:58 -08:00
Brendan Jackman
981f94c3e9 bpf: Add bitwise atomic instructions
This adds instructions for

atomic[64]_[fetch_]and
atomic[64]_[fetch_]or
atomic[64]_[fetch_]xor

All these operations are isomorphic enough to implement with the same
verifier, interpreter, and x86 JIT code, hence being a single commit.

The main interesting thing here is that x86 doesn't directly support
the fetch_ version these operations, so we need to generate a CMPXCHG
loop in the JIT. This requires the use of two temporary registers,
IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
462910670e bpf: Pull out a macro for interpreting atomic ALU operations
Since the atomic operations that are added in subsequent commits are
all isomorphic with BPF_ADD, pull out a macro to avoid the
interpreter becoming dominated by lines of atomic-related code.

Note that this sacrificies interpreter performance (combining
STX_ATOMIC_W and STX_ATOMIC_DW into single switch case means that we
need an extra conditional branch to differentiate them) in favour of
compact and (relatively!) simple C code.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-9-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
5ffa25502b bpf: Add instructions for atomic_[cmp]xchg
This adds two atomic opcodes, both of which include the BPF_FETCH
flag. XCHG without the BPF_FETCH flag would naturally encode
atomic_set. This is not supported because it would be of limited
value to userspace (it doesn't imply any barriers). CMPXCHG without
BPF_FETCH woulud be an atomic compare-and-write. We don't have such
an operation in the kernel so it isn't provided to BPF either.

There are two significant design decisions made for the CMPXCHG
instruction:

 - To solve the issue that this operation fundamentally has 3
   operands, but we only have two register fields. Therefore the
   operand we compare against (the kernel's API calls it 'old') is
   hard-coded to be R0. x86 has similar design (and A64 doesn't
   have this problem).

   A potential alternative might be to encode the other operand's
   register number in the immediate field.

 - The kernel's atomic_cmpxchg returns the old value, while the C11
   userspace APIs return a boolean indicating the comparison
   result. Which should BPF do? A64 returns the old value. x86 returns
   the old value in the hard-coded register (and also sets a
   flag). That means return-old-value is easier to JIT, so that's
   what we use.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
5ca419f286 bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC
instructions, in order to have the previous value of the
atomically-modified memory location loaded into the src register
after an atomic op is carried out.

Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
c5bcb5eb4d bpf: Move BPF_STX reserved field check into BPF_STX verifier code
I can't find a reason why this code is in resolve_pseudo_ldimm64;
since I'll be modifying it in a subsequent commit, tidy it up.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-6-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Brendan Jackman
91c960b005 bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
A subsequent patch will add additional atomic operations. These new
operations will use the same opcode field as the existing XADD, with
the immediate discriminating different operations.

In preparation, rename the instruction mode BPF_ATOMIC and start
calling the zero immediate BPF_ADD.

This is possible (doesn't break existing valid BPF progs) because the
immediate field is currently reserved MBZ and BPF_ADD is zero.

All uses are removed from the tree but the BPF_XADD definition is
kept around to avoid breaking builds for people including kernel
headers.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Björn Töpel <bjorn.topel@gmail.com>
Link: https://lore.kernel.org/bpf/20210114181751.768687-5-jackmanb@google.com
2021-01-14 18:34:29 -08:00
Gilad Reti
744ea4e388 bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
Add support for pointer to mem register spilling, to allow the verifier
to track pointers to valid memory addresses. Such pointers are returned
for example by a successful call of the bpf_ringbuf_reserve helper.

The patch was partially contributed by CyberArk Software, Inc.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-1-gilad.reti@gmail.com
2021-01-13 19:47:43 -08:00
Thomas Gleixner
ce09ccc502 genirq: Export irq_check_status_bit()
One of the users can be built modular:

  ERROR: modpost: "irq_check_status_bit" [drivers/perf/arm_spe_pmu.ko] undefined!

Fixes: fdd0296304 ("genirq: Move status flag checks to core")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201227192049.GA195845@roeck-us.net
2021-01-13 15:48:05 +01:00
Andrii Nakryiko
541c3bad8d bpf: Support BPF ksym variables in kernel modules
Add support for directly accessing kernel module variables from BPF programs
using special ldimm64 instructions. This functionality builds upon vmlinux
ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow
specifying kernel module BTF's FD in insn[1].imm field.

During BPF program load time, verifier will resolve FD to BTF object and will
take reference on BTF object itself and, for module BTFs, corresponding module
as well, to make sure it won't be unloaded from under running BPF program. The
mechanism used is similar to how bpf_prog keeps track of used bpf_maps.

One interesting change is also in how per-CPU variable is determined. The
logic is to find .data..percpu data section in provided BTF, but both vmlinux
and module each have their own .data..percpu entries in BTF. So for module's
case, the search for DATASEC record needs to look at only module's added BTF
types. This is implemented with custom search function.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20210112075520.4103414-6-andrii@kernel.org
2021-01-12 17:24:30 -08:00
Brendan Jackman
28a8add641 bpf: Fix a verifier message for alloc size helper arg
The error message here is misleading, the argument will be rejected unless
it is a known constant.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210112123913.2016804-1-jackmanb@google.com
2021-01-12 21:41:17 +01:00
Thomas Gleixner
4bae052dde Merge tag 'irqchip-fixes-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:

 - Fix the MIPS CPU interrupt controller hierarchy
 - Simplify the PRUSS Kconfig entry
 - Eliminate trivial build warnings on the MIPS Loongson liointc
 - Fix error path in devm_platform_get_irqs_affinity()
 - Turn the BCM2836 IPI irq_eoi callback into irq_ack
 - Fix initialisation of on-stack msi_alloc_info
 - Cleanup spurious comma in irq-sl28cpld

Link: https://lore.kernel.org/r/20210110110001.2328708-1-maz@kernel.org
2021-01-12 21:23:55 +01:00
Geert Uytterhoeven
e3fab2f3de ntp: Fix RTC synchronization on 32-bit platforms
Due to an integer overflow, RTC synchronization now happens every 2s
instead of the intended 11 minutes.  Fix this by forcing 64-bit
arithmetic for the sync period calculation.

Annotate the other place which multiplies seconds for consistency as well.

Fixes: c9e6189fb0 ("ntp: Make the RTC synchronization more reliable")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210111103956.290378-1-geert+renesas@glider.be
2021-01-12 21:13:01 +01:00
Chunguang Xu
aba428a0c6 timekeeping: Remove unused get_seconds()
The get_seconds() cleanup seems to have been completed, now it is
time to delete the legacy interface to avoid misuse later.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1606816351-26900-1-git-send-email-brookxu@tencent.com
2021-01-12 21:13:01 +01:00
Andrii Nakryiko
bcc5e6162d bpf: Allow empty module BTFs
Some modules don't declare any new types and end up with an empty BTF,
containing only valid BTF header and no types or strings sections. This
currently causes BTF validation error. There is nothing wrong with such BTF,
so fix the issue by allowing module BTFs with no types or strings.

Fixes: 36e68442d1 ("bpf: Load and verify kernel module BTFs")
Reported-by: Christopher William Snowhill <chris@kode54.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210110070341.1380086-1-andrii@kernel.org
2021-01-12 21:11:30 +01:00
Peter Zijlstra
77ca93a6b1 locking/lockdep: Avoid noinstr warning for DEBUG_LOCKDEP
vmlinux.o: warning: objtool: lock_is_held_type()+0x60: call to check_flags.part.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210106144017.652218215@infradead.org
2021-01-12 21:10:59 +01:00
Peter Zijlstra
0afda3a888 locking/lockdep: Cure noinstr fail
When the compiler doesn't feel like inlining, it causes a noinstr
fail:

  vmlinux.o: warning: objtool: lock_is_held_type()+0xb: call to lockdep_enabled() leaves .noinstr.text section

Fixes: 4d004099a6 ("lockdep: Fix lockdep recursion")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210106144017.592595176@infradead.org
2021-01-12 21:10:59 +01:00
Stanislav Fomichev
4be34f3d07 bpf: Don't leak memory in bpf getsockopt when optlen == 0
optlen == 0 indicates that the kernel should ignore BPF buffer
and use the original one from the user. We, however, forget
to free the temporary buffer that we've allocated for BPF.

Fixes: d8fe449a9c ("bpf: Don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE")
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210112162829.775079-1-sdf@google.com
2021-01-12 21:05:07 +01:00
KP Singh
84d571d46c bpf: Fix typo in bpf_inode_storage.c
Fix "gurranteed" -> "guaranteed" in bpf_inode_storage.c

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-4-kpsingh@kernel.org
2021-01-12 16:07:57 +01:00
KP Singh
1a9c72ad4c bpf: Local storage helpers should check nullness of owner ptr passed
The verifier allows ARG_PTR_TO_BTF_ID helper arguments to be NULL, so
helper implementations need to check this before dereferencing them.
This was already fixed for the socket storage helpers but not for task
and inode.

The issue can be reproduced by attaching an LSM program to
inode_rename hook (called when moving files) which tries to get the
inode of the new file without checking for its nullness and then trying
to move an existing file to a new path:

  mv existing_file new_file_does_not_exist

The report including the sample program and the steps for reproducing
the bug:

  https://lore.kernel.org/bpf/CANaYP3HWkH91SN=wTNO9FL_2ztHfqcXKX38SSE-JJ2voh+vssw@mail.gmail.com

Fixes: 4cf1bc1f10 ("bpf: Implement task local storage")
Fixes: 8ea636848a ("bpf: Implement bpf_local_storage for inodes")
Reported-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210112075525.256820-3-kpsingh@kernel.org
2021-01-12 16:07:56 +01:00
Jiri Olsa
5541075a34 bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
The bpf_tracing_prog_attach error path calls bpf_prog_put
on prog, which causes refcount underflow when it's called
from link_create function.

  link_create
    prog = bpf_prog_get              <-- get
    ...
    tracing_bpf_link_attach(prog..
      bpf_tracing_prog_attach(prog..
        out_put_prog:
          bpf_prog_put(prog);        <-- put

    if (ret < 0)
      bpf_prog_put(prog);            <-- put

Removing bpf_prog_put call from bpf_tracing_prog_attach
and making sure its callers call it instead.

Fixes: 4a1e7c0c63 ("bpf: Support attaching freplace programs to multiple attach points")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210111191650.1241578-1-jolsa@kernel.org
2021-01-12 00:17:34 +01:00
Linus Torvalds
a0d54b4f5b Merge tag 'trace-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
 "Blacklist properly on all archs.

  The code to blacklist notrace functions for kprobes was not using the
  right kconfig option, which caused some archs (powerpc) to possibly
  not blacklist them"

* tag 'trace-v5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing/kprobes: Do the notrace functions check without kprobes on ftrace
2021-01-11 14:37:13 -08:00