Commit Graph

16112 Commits

Author SHA1 Message Date
Martin KaFai Lau
29003875bd bpf: Change bpf_setsockopt(SOL_SOCKET) to reuse sk_setsockopt()
After the prep work in the previous patches,
this patch removes most of the dup code from bpf_setsockopt(SOL_SOCKET)
and reuses them from sk_setsockopt().

The sock ptr test is added to the SO_RCVLOWAT because
the sk->sk_socket could be NULL in some of the bpf hooks.

The existing optname white-list is refactored into a new
function sol_socket_setsockopt().

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061804.4178920-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:13 -07:00
Martin KaFai Lau
e42c7beee7 bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()
When bpf program calling bpf_setsockopt(SOL_SOCKET),
it could be run in softirq and doesn't make sense to do the capable
check.  There was a similar situation in bpf_setsockopt(TCP_CONGESTION).
In commit 8d650cdeda ("tcp: fix tcp_set_congestion_control() use from bpf hook"),
tcp_set_congestion_control(..., cap_net_admin) was added to skip
the cap check for bpf prog.

This patch adds sockopt_ns_capable() and sockopt_capable() for
the sk_setsockopt() to use.  They will consider the
has_current_bpf_ctx() before doing the ns_capable() and capable() test.
They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch.

Suggested-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061723.4175820-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:12 -07:00
Martin KaFai Lau
24426654ed bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf
Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from
the sk_setsockopt().  The number of supported optnames are
increasing ever and so as the duplicated code.

One issue in reusing sk_setsockopt() is that the bpf prog
has already acquired the sk lock.  This patch adds a
has_current_bpf_ctx() to tell if the sk_setsockopt() is called from
a bpf prog.  The bpf prog calling bpf_setsockopt() is either running
in_task() or in_serving_softirq().  Both cases have the current->bpf_ctx
initialized.  Thus, the has_current_bpf_ctx() only needs to
test !!current->bpf_ctx.

This patch also adds sockopt_{lock,release}_sock() helpers
for sk_setsockopt() to use.  These helpers will test
has_current_bpf_ctx() before acquiring/releasing the lock.  They are
in EXPORT_SYMBOL for the ipv6 module to use in a latter patch.

Note on the change in sock_setbindtodevice().  sockopt_lock_sock()
is done in sock_setbindtodevice() instead of doing the lock_sock
in sock_bindtoindex(..., lock_sk = true).

Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220817061717.4175589-1-kafai@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-18 17:06:12 -07:00
Eyal Birger
7ec9fce4b3 ip_tunnel: Respect tunnel key's "flow_flags" in IP tunnels
Commit 451ef36bd2 ("ip_tunnels: Add new flow flags field to ip_tunnel_key")
added a "flow_flags" member to struct ip_tunnel_key which was later used by
the commit in the fixes tag to avoid dropping packets with sources that
aren't locally configured when set in bpf_set_tunnel_key().

VXLAN and GENEVE were made to respect this flag, ip tunnels like IPIP and GRE
were not.

This commit fixes this omission by making ip_tunnel_init_flow() receive
the flow flags from the tunnel key in the relevant collect_md paths.

Fixes: b8fff74852 ("bpf: Set flow flag to allow any source IP in bpf_tunnel_key")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Paul Chaignon <paul@isovalent.com>
Link: https://lore.kernel.org/bpf/20220818074118.726639-1-eyal.birger@gmail.com
2022-08-18 21:18:28 +02:00
Jakub Kicinski
3f5f728a72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2022-08-17

We've added 45 non-merge commits during the last 14 day(s) which contain
a total of 61 files changed, 986 insertions(+), 372 deletions(-).

The main changes are:

1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt
   Kanzenbach and Jesper Dangaard Brouer.

2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko.

3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov.

4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires.

5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle
   unsupported names on old kernels, from Hangbin Liu.

6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton,
   from Hao Luo.

7) Relax libbpf's requirement for shared libs to be marked executable, from
   Henqgi Chen.

8) Improve bpf_iter internals handling of error returns, from Hao Luo.

9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard.

10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong.

11) bpftool improvements to handle full BPF program names better, from Manu
    Bretelle.

12) bpftool fixes around libcap use, from Quentin Monnet.

13) BPF map internals clean ups and improvements around memory allocations,
    from Yafang Shao.

14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup
    iterator to work on cgroupv1, from Yosry Ahmed.

15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong.

16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel
    Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
  selftests/bpf: Few fixes for selftests/bpf built in release mode
  libbpf: Clean up deprecated and legacy aliases
  libbpf: Streamline bpf_attr and perf_event_attr initialization
  libbpf: Fix potential NULL dereference when parsing ELF
  selftests/bpf: Tests libbpf autoattach APIs
  libbpf: Allows disabling auto attach
  selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
  libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
  selftests/bpf: Update CI kconfig
  selftests/bpf: Add connmark read test
  selftests/bpf: Add existing connection bpf_*_ct_lookup() test
  bpftool: Clear errno after libcap's checks
  bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
  bpftool: Fix a typo in a comment
  libbpf: Add names for auxiliary maps
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_free instread of kvfree
  bpf: Remove unneeded memset in queue_stack_map creation
  libbpf: preserve errno across pr_warn/pr_info/pr_debug
  ...
====================

Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 20:29:36 -07:00
Jakub Kicinski
bec13ba9ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:

====================
netfilter: conntrack and nf_tables bug fixes

The following patchset contains netfilter fixes for net.

Broken since 5.19:
  A few ancient connection tracking helpers assume TCP packets cannot
  exceed 64kb in size, but this isn't the case anymore with 5.19 when
  BIG TCP got merged, from myself.

Regressions since 5.19:
  1. 'conntrack -E expect' won't display anything because nfnetlink failed
     to enable events for expectations, only for normal conntrack events.

  2. partially revert change that added resched calls to a function that can
     be in atomic context.  Both broken and fixed up by myself.

Broken for several releases (up to original merge of nf_tables):
  Several fixes for nf_tables control plane, from Pablo.
  This fixes up resource leaks in error paths and adds more sanity
  checks for mutually exclusive attributes/flags.

Kconfig:
  NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided
  via ctnetlink, so it should not default to y. From Geert Uytterhoeven.

Selftests:
  rework nft_flowtable.sh: it frequently indicated failure; the way it
  tried to detect an offload failure did not work reliably.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  testing: selftests: nft_flowtable.sh: rework test to detect offload failure
  testing: selftests: nft_flowtable.sh: use random netns names
  netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
  netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified
  netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END
  netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags
  netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag
  netfilter: nf_tables: really skip inactive sets when allocating name
  netfilter: nfnetlink: re-enable conntrack expectation events
  netfilter: nf_tables: fix scheduling-while-atomic splat
  netfilter: nf_ct_irc: cap packet search space to 4k
  netfilter: nf_ct_ftp: prefer skb_linearize
  netfilter: nf_ct_h323: cap packet size at 64k
  netfilter: nf_ct_sane: remove pseudo skb linearization
  netfilter: nf_tables: possible module reference underflow in error path
  netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag
  netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access
====================

Link: https://lore.kernel.org/r/20220817140015.25843-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 20:17:45 -07:00
David Howells
fc4aaf9fb3 net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags()
to obtain the value of sk->sk_user_data, but that function is only usable
if the RCU read lock is held, and neither that function nor any of its
callers hold it.

Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
that checks to see if sk->sk_callback_lock() is held and use that here
instead.

Alternatively, making __rcu_dereference_sk_user_data_with_flags() use
rcu_dereference_checked() might suffice.

Without this, the following warning can be occasionally observed:

=============================
WARNING: suspicious RCU usage
6.0.0-rc1-build2+ #563 Not tainted
-----------------------------
include/net/sock.h:592 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by locktest/29873:
 #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121
 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70
 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0
 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd
 #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4

stack backtrace:
CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 bpf_sk_reuseport_detach+0x6d/0xa4
 reuseport_detach_sock+0x75/0xdd
 inet_unhash+0xa5/0x1c0
 tcp_set_state+0x169/0x20f
 ? lockdep_sock_is_held+0x3a/0x3a
 ? __lock_release.isra.0+0x13e/0x220
 ? reacquire_held_locks+0x1bb/0x1bb
 ? hlock_class+0x31/0x96
 ? mark_lock+0x9e/0x1af
 __tcp_close+0x50/0x4b6
 tcp_close+0x28/0x70
 inet_release+0x8e/0xa7
 __sock_release+0x95/0x121
 sock_close+0x14/0x17
 __fput+0x20f/0x36a
 task_work_run+0xa3/0xcc
 exit_to_user_mode_prepare+0x9c/0x14d
 syscall_exit_to_user_mode+0x18/0x44
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: cf8c1e9672 ("net: refactor bpf_sk_reuseport_detach()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hawkins Jiawei <yin31149@gmail.com>
Link: https://lore.kernel.org/r/166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 16:42:59 -07:00
Zhengchao Shao
52327d2e39 net: sched: remove the unused return value of unregister_qdisc
Return value of unregister_qdisc is unused, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220815030417.271894-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-16 19:37:06 -07:00
Alexander Mikhalitsyn
0ff4eb3d5e neighbour: make proxy_queue.qlen limit per-device
Right now we have a neigh_param PROXY_QLEN which specifies maximum length
of neigh_table->proxy_queue. But in fact, this limitation doesn't work well
because check condition looks like:
tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN)

The problem is that p (struct neigh_parms) is a per-device thing,
but tbl (struct neigh_table) is a system-wide global thing.

It seems reasonable to make proxy_queue limit per-device based.

v2:
	- nothing changed in this patch
v3:
	- rebase to net tree

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@kernel.org>
Cc: Yajun Deng <yajun.deng@linux.dev>
Cc: Roopa Prabhu <roopa@nvidia.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: kernel@openvz.org
Cc: devel@openvz.org
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-15 11:25:09 +01:00
Linus Torvalds
7ebfc85e2c Including fixes from bluetooth, bpf, can and netfilter.
A little longer PR than usual but it's all fixes, no late features.
 It's long partially because of timing, and partially because of
 follow ups to stuff that got merged a week or so before the merge
 window and wasn't as widely tested. Maybe the Bluetooth fixes are
 a little alarming so we'll address that, but the rest seems okay
 and not scary.
 
 Notably we're including a fix for the netfilter Kconfig [1], your
 WiFi warning [2] and a bluetooth fix which should unblock syzbot [3].
 
 Current release - regressions:
 
  - Bluetooth:
    - don't try to cancel uninitialized works [3]
    - L2CAP: fix use-after-free caused by l2cap_chan_put
 
  - tls: rx: fix device offload after recent rework
 
  - devlink: fix UAF on failed reload and leftover locks in mlxsw
 
 Current release - new code bugs:
 
  - netfilter:
    - flowtable: fix incorrect Kconfig dependencies [1]
    - nf_tables: fix crash when nf_trace is enabled
 
  - bpf:
    - use proper target btf when exporting attach_btf_obj_id
    - arm64: fixes for bpf trampoline support
 
  - Bluetooth:
    - ISO: unlock on error path in iso_sock_setsockopt()
    - ISO: fix info leak in iso_sock_getsockopt()
    - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
    - ISO: fix memory corruption on iso_pinfo.base
    - ISO: fix not using the correct QoS
    - hci_conn: fix updating ISO QoS PHY
 
  - phy: dp83867: fix get nvmem cell fail
 
 Previous releases - regressions:
 
  - wifi: cfg80211: fix validating BSS pointers in
    __cfg80211_connect_result [2]
 
  - atm: bring back zatm uAPI after ATM had been removed
 
  - properly fix old bug making bonding ARP monitor mode not being
    able to work with software devices with lockless Tx
 
  - tap: fix null-deref on skb->dev in dev_parse_header_protocol
 
  - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps
    some devices and breaks others
 
  - netfilter:
    - nf_tables: many fixes rejecting cross-object linking
      which may lead to UAFs
    - nf_tables: fix null deref due to zeroed list head
    - nf_tables: validate variable length element extension
 
  - bgmac: fix a BUG triggered by wrong bytes_compl
 
  - bcmgenet: indicate MAC is in charge of PHY PM
 
 Previous releases - always broken:
 
  - bpf:
    - fix bad pointer deref in bpf_sys_bpf() injected via test infra
    - disallow non-builtin bpf programs calling the prog_run command
    - don't reinit map value in prealloc_lru_pop
    - fix UAFs during the read of map iterator fd
    - fix invalidity check for values in sk local storage map
    - reject sleepable program for non-resched map iterator
 
  - mptcp:
    - move subflow cleanup in mptcp_destroy_common()
    - do not queue data on closed subflows
 
  - virtio_net: fix memory leak inside XDP_TX with mergeable
 
  - vsock: fix memory leak when multiple threads try to connect()
 
  - rework sk_user_data sharing to prevent psock leaks
 
  - geneve: fix TOS inheriting for ipv4
 
  - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel
 
  - phy: c45 baset1: do not skip aneg configuration if clock role
    is not specified
 
  - rose: avoid overflow when /proc displays timer information
 
  - x25: fix call timeouts in blocking connects
 
  - can: mcp251x: fix race condition on receive interrupt
 
  - can: j1939:
    - replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
    - fix memory leak of skbs in j1939_session_destroy()
 
 Misc:
 
  - docs: bpf: clarify that many things are not uAPI
 
  - seg6: initialize induction variable to first valid array index
    (to silence clang vs objtool warning)
 
  - can: ems_usb: fix clang 14's -Wunaligned-access warning
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmL1TtkACgkQMUZtbf5S
 Iruz8Q/+O5xFFsjxuyZD0Mw9d3Jeo3ZI9PeeDvcYl5dZXVegpxqorujTFntxv1Ad
 JC8o5qqms3kO51d+W/yai6iDacEHX2YcJrupZve+vGvpOEVmBRY5O0E1AckJ18+u
 ItmjSVESkybUP5P08/An7Y0dMmj9Xb2z84dGkLe+n8lg6/fimo6Ki6yZjcOBOALu
 AYquMXUcnwztRMbTFjscbJjBd4xFMKZEtthljYtPdIReIN976wmMNYYx+jcPK7ha
 g39Kv6maklp4euerkGIJ/AMnOWHaOGCFjIaz7rr4444NDfrKdt/jeirUXJaz77Jo
 TJM2UOwgOeg6WZkSa3cmdq6UdjdkJ6LTe2CJFf1wJ1qfhAi+s8yWoszsM2Enp+66
 c/mo9jTCMAjmgEJF11idZuz2S697/5j0hvbfM3ZPgNyNBgn8qxz/Z56fNOisx95u
 TkoKKFnGH+mcm/et+omBcyLBtBVK2+/6B6mpl6btf4DOkPn5KFYWHV67uV3ksHzQ
 ye+pnzidoIG0yKbRM2EQKXk7ELKROpl52xUHko93ZinMJt0Q7jBm7tZhJozNFEzi
 hWgUvpmNXgawzLYQcJ9jJmKw3PmYZnRhvYZB/1r91YamM28Hd58k9WfpWtUtjYJN
 N0X58L6JSnKPqzR70pcFppz6iBlh0tHdcEQGWhhKU5ScS3FDxGc=
 =C5Ck
 -----END PGP SIGNATURE-----

Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, bpf, can and netfilter.

  A little larger than usual but it's all fixes, no late features. It's
  large partially because of timing, and partially because of follow ups
  to stuff that got merged a week or so before the merge window and
  wasn't as widely tested. Maybe the Bluetooth fixes are a little
  alarming so we'll address that, but the rest seems okay and not scary.

  Notably we're including a fix for the netfilter Kconfig [1], your WiFi
  warning [2] and a bluetooth fix which should unblock syzbot [3].

  Current release - regressions:

   - Bluetooth:
      - don't try to cancel uninitialized works [3]
      - L2CAP: fix use-after-free caused by l2cap_chan_put

   - tls: rx: fix device offload after recent rework

   - devlink: fix UAF on failed reload and leftover locks in mlxsw

  Current release - new code bugs:

   - netfilter:
      - flowtable: fix incorrect Kconfig dependencies [1]
      - nf_tables: fix crash when nf_trace is enabled

   - bpf:
      - use proper target btf when exporting attach_btf_obj_id
      - arm64: fixes for bpf trampoline support

   - Bluetooth:
      - ISO: unlock on error path in iso_sock_setsockopt()
      - ISO: fix info leak in iso_sock_getsockopt()
      - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
      - ISO: fix memory corruption on iso_pinfo.base
      - ISO: fix not using the correct QoS
      - hci_conn: fix updating ISO QoS PHY

   - phy: dp83867: fix get nvmem cell fail

  Previous releases - regressions:

   - wifi: cfg80211: fix validating BSS pointers in
     __cfg80211_connect_result [2]

   - atm: bring back zatm uAPI after ATM had been removed

   - properly fix old bug making bonding ARP monitor mode not being able
     to work with software devices with lockless Tx

   - tap: fix null-deref on skb->dev in dev_parse_header_protocol

   - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some
     devices and breaks others

   - netfilter:
      - nf_tables: many fixes rejecting cross-object linking which may
        lead to UAFs
      - nf_tables: fix null deref due to zeroed list head
      - nf_tables: validate variable length element extension

   - bgmac: fix a BUG triggered by wrong bytes_compl

   - bcmgenet: indicate MAC is in charge of PHY PM

  Previous releases - always broken:

   - bpf:
      - fix bad pointer deref in bpf_sys_bpf() injected via test infra
      - disallow non-builtin bpf programs calling the prog_run command
      - don't reinit map value in prealloc_lru_pop
      - fix UAFs during the read of map iterator fd
      - fix invalidity check for values in sk local storage map
      - reject sleepable program for non-resched map iterator

   - mptcp:
      - move subflow cleanup in mptcp_destroy_common()
      - do not queue data on closed subflows

   - virtio_net: fix memory leak inside XDP_TX with mergeable

   - vsock: fix memory leak when multiple threads try to connect()

   - rework sk_user_data sharing to prevent psock leaks

   - geneve: fix TOS inheriting for ipv4

   - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel

   - phy: c45 baset1: do not skip aneg configuration if clock role is
     not specified

   - rose: avoid overflow when /proc displays timer information

   - x25: fix call timeouts in blocking connects

   - can: mcp251x: fix race condition on receive interrupt

   - can: j1939:
      - replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
      - fix memory leak of skbs in j1939_session_destroy()

  Misc:

   - docs: bpf: clarify that many things are not uAPI

   - seg6: initialize induction variable to first valid array index (to
     silence clang vs objtool warning)

   - can: ems_usb: fix clang 14's -Wunaligned-access warning"

* tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits)
  net: atm: bring back zatm uAPI
  dpaa2-eth: trace the allocated address instead of page struct
  net: add missing kdoc for struct genl_multicast_group::flags
  nfp: fix use-after-free in area_cache_get()
  MAINTAINERS: use my korg address for mt7601u
  mlxsw: minimal: Fix deadlock in ports creation
  bonding: fix reference count leak in balance-alb mode
  net: usb: qmi_wwan: Add support for Cinterion MV32
  bpf: Shut up kern_sys_bpf warning.
  net/tls: Use RCU API to access tls_ctx->netdev
  tls: rx: device: don't try to copy too much on detach
  tls: rx: device: bound the frag walk
  net_sched: cls_route: remove from list when handle is 0
  selftests: forwarding: Fix failing tests with old libnet
  net: refactor bpf_sk_reuseport_detach()
  net: fix refcount bug in sk_psock_get (2)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  ...
2022-08-11 13:45:37 -07:00
Jakub Kicinski
5c221f0af6 net: add missing kdoc for struct genl_multicast_group::flags
Multicast group flags were added in commit 4d54cc3211 ("mptcp: avoid
lock_fast usage in accept path"), but it missed adding the kdoc.

Mention which flags go into that field, and do the same for
op structs.

Link: https://lore.kernel.org/r/20220809232012.403730-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11 09:26:04 -07:00
Florian Westphal
0b2f3212b5 netfilter: nfnetlink: re-enable conntrack expectation events
To avoid allocation of the conntrack extension area when possible,
the default behaviour was changed to only allocate the event extension
if a userspace program is subscribed to a notification group.

Problem is that while 'conntrack -E' does enable the event allocation
behind the scenes, 'conntrack -E expect' does not: no expectation events
are delivered unless user sets
"net.netfilter.nf_conntrack_events" back to 1 (always on).

Fix the autodetection to also consider EXP type group.

We need to track the 6 event groups (3+3, new/update/destroy for events and
for expectations each) independently, else we'd disable events again
if an expectation group becomes empty while there is still an active
event group.

Fixes: 2794cdb0b9 ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-11 18:09:54 +02:00
Maxim Mikityanskiy
94ce3b64c6 net/tls: Use RCU API to access tls_ctx->netdev
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.

Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.

Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.

The transition to RCU exposes existing issues, fixed by this commit:

1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.

2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.

Fixes: 89df6a8104 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd435 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 22:58:43 -07:00
Jakub Kicinski
fbe8870f72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
bpf 2022-08-10

We've added 23 non-merge commits during the last 7 day(s) which contain
a total of 19 files changed, 424 insertions(+), 35 deletions(-).

The main changes are:

1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao.

2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia.

3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu.

4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev.

5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney.

6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi.

7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa.

8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai.

9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address
   offset buffer, from Aijun Sun.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  selftests/bpf: Ensure sleepable program is rejected by hash map iter
  selftests/bpf: Add write tests for sk local storage map iterator
  selftests/bpf: Add tests for reading a dangling map iter fd
  bpf: Only allow sleepable program for resched-able iterator
  bpf: Check the validity of max_rdwr_access for sock local storage map iterator
  bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator
  bpf: Acquire map uref in .init_seq_private for sock local storage map iterator
  bpf: Acquire map uref in .init_seq_private for hash map iterator
  bpf: Acquire map uref in .init_seq_private for array map iterator
  bpf: Disallow bpf programs call prog_run command.
  bpf, arm64: Fix bpf trampoline instruction endianness
  selftests/bpf: Add test for prealloc_lru_pop bug
  bpf: Don't reinit map value in prealloc_lru_pop
  bpf: Allow calling bpf_prog_test kfuncs in tracing programs
  bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc
  selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf
  bpf: Use proper target btf when exporting attach_btf_obj_id
  mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled
  bpf: Cleanup ftrace hash in bpf_trampoline_put
  BPF: Fix potential bad pointer dereference in bpf_sys_bpf()
  ...
====================

Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 21:48:15 -07:00
Hawkins Jiawei
2a0133723f net: fix refcount bug in sk_psock_get (2)
Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
 <TASK>
 __refcount_add_not_zero include/linux/refcount.h:163 [inline]
 __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
 refcount_inc_not_zero include/linux/refcount.h:245 [inline]
 sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
 sk_backlog_rcv include/net/sock.h:1061 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2849
 release_sock+0x54/0x1b0 net/core/sock.c:3404
 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
 __sys_shutdown_sock net/socket.c:2331 [inline]
 __sys_shutdown_sock net/socket.c:2325 [inline]
 __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
 __do_sys_shutdown net/socket.c:2351 [inline]
 __se_sys_shutdown net/socket.c:2349 [inline]
 __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
 </TASK>

During SMC fallback process in connect syscall, kernel will
replaces TCP with SMC. In order to forward wakeup
smc socket waitqueue after fallback, kernel will sets
clcsk->sk_user_data to origin smc socket in
smc_fback_replace_callbacks().

Later, in shutdown syscall, kernel will calls
sk_psock_get(), which treats the clcsk->sk_user_data
as psock type, triggering the refcnt warning.

So, the root cause is that smc and psock, both will use
sk_user_data field. So they will mismatch this field
easily.

This patch solves it by using another bit(defined as
SK_USER_DATA_PSOCK) in PTRMASK, to mark whether
sk_user_data points to a psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2cd
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").

For there will possibly be more flags in the sk_user_data field,
this patch also refactor sk_user_data flags code to be more generic
to improve its maintainability.

Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10 21:47:58 -07:00
Christophe JAILLET
84b709d310 ax88796: Fix some typo in a comment
s/by caused/be caused/
s/ax88786/ax88796/

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/7db4b622d2c3e5af58c1d1f32b81836f4af71f18.1659801746.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-09 22:14:02 -07:00
Pablo Neira Ayuso
f323ef3a0d netfilter: nf_tables: disallow jump to implicit chain from set element
Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.

Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso
341b694160 netfilter: nf_tables: upfront validation of data via nft_data_init()
Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6ca ("netfilter: nf_tables:
stricter validation of element data").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso
34aae2c2fb netfilter: nf_tables: validate variable length element extension
Update template to validate variable length extensions. This patch adds
a new .ext_len[id] field to the template to store the expected extension
length. This is used to sanity check the initialization of the variable
length extension.

Use PTR_ERR() in nft_set_elem_init() to report errors since, after this
update, there are two reason why this might fail, either because of
ENOMEM or insufficient room in the extension field (EINVAL).

Kernels up until 7e6bc1f6ca ("netfilter: nf_tables: stricter
validation of element data") allowed to copy more data to the extension
than was allocated. This ext_len field allows to validate if the
destination has the correct size as additional check.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:16 +02:00
Kumar Kartikeya Dwivedi
6e116280b4 net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink
The current ifdefry for code shared by the BPF and ctnetlink side looks
ugly. As per Pablo's request, simplify this by unconditionally compiling
in the code. This can be revisited when the shared code between the two
grows further.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09 09:41:54 -07:00
Jiri Olsa
f1d41f7720 mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled
The btf_sock_ids array needs struct mptcp_sock BTF ID for the
bpf_skc_to_mptcp_sock helper.

When CONFIG_MPTCP is disabled, the 'struct mptcp_sock' is not
defined and resolve_btfids will complain with:

  [...]
  BTFIDS  vmlinux
  WARN: resolve_btfids: unresolved symbol mptcp_sock
  [...]

Add an empty definition for struct mptcp_sock when CONFIG_MPTCP
is disabled.

Fixes: 3bc253c2e6 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220802163324.1873044-1-jolsa@kernel.org
2022-08-08 15:30:45 +02:00
Linus Torvalds
ea0c39260d 9p-for-5.20
- a couple of fixes
 - add a tracepoint for fid refcounting
 - some cleanup/followup on fid lookup
 - some cleanup around req refcounting
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmLuKqkACgkQq06b7GqY
 5nA+RBAAvuA6AjKQSvsNxHsqSFMahwoE3cCPlF/QlmAnnZa1DzmRI3kKTAuuDKkE
 Q4c7DjxzDJCWRTlkpNUCGZycHqpu1QQbfoTo43sIqob7W8xlajeemc5Fxqtw5sPM
 m+0SzN7vvNJpCr+6pxMXwwGHSZmZOvpFjwj7cUjhpF/V1WO8bNxfCGzQdF0hX1Vn
 2HoBbFUOmacL9Z/pF3O/ZG9LwCFuRQsH0EmbFBlJy1WdDtlVHTXlzDaGQ7EGaY4D
 17UR6iITsj9ozacLhvk094PLIc3/RHDGLrm3C4Ka3zmUI7BsiYWPDmai3Pu/DNqn
 JJ5sZkdrVowxyBbGxw8GpZ4YDJtGsU5XglFPdkw+ZazxhZLNEIstPXg0HTZybrOe
 GE+WskWB0qS+RpX0tnYEcX6qOHWm3/63Yq5NG6A9tLQUSFku02jS/bCQSLBrmGWW
 Js24IWvFSTvl6XytHeldYhJP618pNUxXRSqgYv96vT/LI3mUrIMN+IVBNPujO6p2
 jIYXNoaqLoY3efXKW/WQmp7C/52ZP4ly4fOiz7qtHTQsCcIk8Xo6zwHtm/FkNEqc
 sMZdqgLrxKPNBAlT8iEtt//wU2fB7mFt988p0pc+5lAK5t0h67KZJV6vDwTAMObX
 wV6Ht+QhOHtJwO779fk8FhZPNRPYZYMutIyFNlRx+4gCpBuqJPc=
 =941k
 -----END PGP SIGNATURE-----

Merge tag '9p-for-5.20' of https://github.com/martinetd/linux

Pull 9p updates from Dominique Martinet:

 - a couple of fixes

 - add a tracepoint for fid refcounting

 - some cleanup/followup on fid lookup

 - some cleanup around req refcounting

* tag '9p-for-5.20' of https://github.com/martinetd/linux:
  net/9p: Initialize the iounit field during fid creation
  net: 9p: fix refcount leak in p9_read_work() error handling
  9p: roll p9_tag_remove into p9_req_put
  9p: Add client parameter to p9_req_put()
  9p: Drop kref usage
  9p: Fix some kernel-doc comments
  9p fid refcount: cleanup p9_fid_put calls
  9p fid refcount: add a 9p_fid_ref tracepoint
  9p fid refcount: add p9_fid_get/put wrappers
  9p: Fix minor typo in code comment
  9p: Remove unnecessary variable for old fids while walking from d_parent
  9p: Make the path walk logic more clear about when cloning is required
  9p: Track the root fid with its own variable during lookups
2022-08-06 14:48:54 -07:00
Vladimir Oltean
06799a9085 net: bonding: replace dev_trans_start() with the jiffies of the last ARP/NS
The bonding driver piggybacks on time stamps kept by the network stack
for the purpose of the netdev TX watchdog, and this is problematic
because it does not work with NETIF_F_LLTX devices.

It is hard to say why the driver looks at dev_trans_start() of the
slave->dev, considering that this is updated even by non-ARP/NS probes
sent by us, and even by traffic not sent by us at all (for example PTP
on physical slave devices). ARP monitoring in active-backup mode appears
to still work even if we track only the last TX time of actual ARP
probes.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-03 19:20:12 -07:00
Linus Torvalds
f86d1fbbe7 Networking changes for 6.0.
Core
 ----
 
  - Refactor the forward memory allocation to better cope with memory
    pressure with many open sockets, moving from a per socket cache to
    a per-CPU one
 
  - Replace rwlocks with RCU for better fairness in ping, raw sockets
    and IP multicast router.
 
  - Network-side support for IO uring zero-copy send.
 
  - A few skb drop reason improvements, including codegen the source file
    with string mapping instead of using macro magic.
 
  - Rename reference tracking helpers to a more consistent
    netdev_* schema.
 
  - Adapt u64_stats_t type to address load/store tearing issues.
 
  - Refine debug helper usage to reduce the log noise caused by bots.
 
 BPF
 ---
  - Improve socket map performance, avoiding skb cloning on read
    operation.
 
  - Add support for 64 bits enum, to match types exposed by kernel.
 
  - Introduce support for sleepable uprobes program.
 
  - Introduce support for enum textual representation in libbpf.
 
  - New helpers to implement synproxy with eBPF/XDP.
 
  - Improve loop performances, inlining indirect calls when
    possible.
 
  - Removed all the deprecated libbpf APIs.
 
  - Implement new eBPF-based LSM flavor.
 
  - Add type match support, which allow accurate queries to the
    eBPF used types.
 
  - A few TCP congetsion control framework usability improvements.
 
  - Add new infrastructure to manipulate CT entries via eBPF programs.
 
  - Allow for livepatch (KLP) and BPF trampolines to attach to the same
    kernel function.
 
 Protocols
 ---------
 
  - Introduce per network namespace lookup tables for unix sockets,
    increasing scalability and reducing contention.
 
  - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.
 
  - Add support to forciby close TIME_WAIT TCP sockets via user-space
    tools.
 
  - Significant performance improvement for the TLS 1.3 receive path,
    both for zero-copy and not-zero-copy.
 
  - Support for changing the initial MTPCP subflow priority/backup
    status
 
  - Introduce virtually contingus buffers for sockets over RDMA,
    to cope better with memory pressure.
 
  - Extend CAN ethtool support with timestamping capabilities
 
  - Refactor CAN build infrastructure to allow building only the needed
    features.
 
 Driver API
 ----------
 
  - Remove devlink mutex to allow parallel commands on multiple links.
 
  - Add support for pause stats in distributed switch.
 
  - Implement devlink helpers to query and flash line cards.
 
  - New helper for phy mode to register conversion.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.
 
  - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.
 
  - Ethernet DSA driver for the Microchip LAN937x switch.
 
  - Ethernet PHY driver for the Aquantia AQR113C EPHY.
 
  - CAN driver for the OBD-II ELM327 interface.
 
  - CAN driver for RZ/N1 SJA1000 CAN controller.
 
  - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.
 
 Drivers
 -------
 
  - Intel Ethernet NICs:
    - i40e: add support for vlan pruning
    - i40e: add support for XDP framented packets
    - ice: improved vlan offload support
    - ice: add support for PPPoE offload
 
  - Mellanox Ethernet (mlx5)
    - refactor packet steering offload for performance and scalability
    - extend support for TC offload
    - refactor devlink code to clean-up the locking schema
    - support stacked vlans for bridge offloads
    - use TLS objects pool to improve connection rate
 
  - Netronome Ethernet NICs (nfp):
    - extend support for IPv6 fields mangling offload
    - add support for vepa mode in HW bridge
    - better support for virtio data path acceleration (VDPA)
    - enable TSO by default
 
  - Microsoft vNIC driver (mana)
    - add support for XDP redirect
 
  - Others Ethernet drivers:
    - bonding: add per-port priority support
    - microchip lan743x: extend phy support
    - Fungible funeth: support UDP segmentation offload and XDP xmit
    - Solarflare EF100: add support for virtual function representors
    - MediaTek SoC: add XDP support
 
  - Mellanox Ethernet/IB switch (mlxsw):
    - dropped support for unreleased H/W (XM router).
    - improved stats accuracy
    - unified bridge model coversion improving scalability
      (parts 1-6)
    - support for PTP in Spectrum-2 asics
 
  - Broadcom PHYs
    - add PTP support for BCM54210E
    - add support for the BCM53128 internal PHY
 
  - Marvell Ethernet switches (prestera):
    - implement support for multicast forwarding offload
 
  - Embedded Ethernet switches:
    - refactor OcteonTx MAC filter for better scalability
    - improve TC H/W offload for the Felix driver
    - refactor the Microchip ksz8 and ksz9477 drivers to share
      the probe code (parts 1, 2), add support for phylink
      mac configuration
 
  - Other WiFi:
    - Microchip wilc1000: diable WEP support and enable WPA3
    - Atheros ath10k: encapsulation offload support
 
 Old code removal:
 
  - Neterion vxge ethernet driver: this is untouched since more than
    10 years.
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmLqN+oSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkB9kQAI9VqW0c3SfiTJnkVBEIovZ6Tnh5stD2
 UYFkh1BdchLsYxi7W4XMpVPSzRztiTP87mIx5c/KvIzj+QNeWL1XWRJSPdI9HhTD
 pTAA/tM2OG7bqrbyQiKDNfpQdNl7+kk1RwnYd+f9RFl1QVuIJaYhmjVwrsN5xF/+
 jUsotpROarM2dGFWiFwJbKhP2zMDT+6qEEahM8pEPggKhv8wRLYjany2cZVEe4e0
 WGUpbINAS8gEKm0Ob922WaDfDrcK/N1Z0jNz/kMaENkK18Vvc7F6bCO0DzAawKX9
 QZMMwm6mHp3EThflJAMAzCGIYiIcwLhykgdyj8rrjPhFrWbMD2Sdsbo21HOXU/8j
 u4aAhVl+d+h7emmbgBoJ8sycVJ7BQlXz7lX20sTgADv9xI4/dPhQ17CMRuwX6fXX
 JSrn6P6e1LTV5CEg6vrlSPnKPY6uhFn/cPw47FxCjRwJ9phVnp+8uZWQmf9Pz3yf
 Ok/tcj+juFbsmuOshHy2cbRkuNZNS0oRWlSTBo5795ZwOLSakMonR3L+ev2aOvzz
 DVrFp2Y/iIVwMSFdCbouYdYnhArPRhOAtCmZc2afY8aBN7aaMgrdTy3+mzUoHy3I
 FG3K+VuKpfi0vY4zn6ZoLZDIpyXIoJJ93RcSGltD32t3Dp1RaQMVEI4s45k05PVm
 1nYpXKHA8qML
 =hxEG
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking changes from Paolo Abeni:
 "Core:

   - Refactor the forward memory allocation to better cope with memory
     pressure with many open sockets, moving from a per socket cache to
     a per-CPU one

   - Replace rwlocks with RCU for better fairness in ping, raw sockets
     and IP multicast router.

   - Network-side support for IO uring zero-copy send.

   - A few skb drop reason improvements, including codegen the source
     file with string mapping instead of using macro magic.

   - Rename reference tracking helpers to a more consistent netdev_*
     schema.

   - Adapt u64_stats_t type to address load/store tearing issues.

   - Refine debug helper usage to reduce the log noise caused by bots.

  BPF:

   - Improve socket map performance, avoiding skb cloning on read
     operation.

   - Add support for 64 bits enum, to match types exposed by kernel.

   - Introduce support for sleepable uprobes program.

   - Introduce support for enum textual representation in libbpf.

   - New helpers to implement synproxy with eBPF/XDP.

   - Improve loop performances, inlining indirect calls when possible.

   - Removed all the deprecated libbpf APIs.

   - Implement new eBPF-based LSM flavor.

   - Add type match support, which allow accurate queries to the eBPF
     used types.

   - A few TCP congetsion control framework usability improvements.

   - Add new infrastructure to manipulate CT entries via eBPF programs.

   - Allow for livepatch (KLP) and BPF trampolines to attach to the same
     kernel function.

  Protocols:

   - Introduce per network namespace lookup tables for unix sockets,
     increasing scalability and reducing contention.

   - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.

   - Add support to forciby close TIME_WAIT TCP sockets via user-space
     tools.

   - Significant performance improvement for the TLS 1.3 receive path,
     both for zero-copy and not-zero-copy.

   - Support for changing the initial MTPCP subflow priority/backup
     status

   - Introduce virtually contingus buffers for sockets over RDMA, to
     cope better with memory pressure.

   - Extend CAN ethtool support with timestamping capabilities

   - Refactor CAN build infrastructure to allow building only the needed
     features.

  Driver API:

   - Remove devlink mutex to allow parallel commands on multiple links.

   - Add support for pause stats in distributed switch.

   - Implement devlink helpers to query and flash line cards.

   - New helper for phy mode to register conversion.

  New hardware / drivers:

   - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.

   - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.

   - Ethernet DSA driver for the Microchip LAN937x switch.

   - Ethernet PHY driver for the Aquantia AQR113C EPHY.

   - CAN driver for the OBD-II ELM327 interface.

   - CAN driver for RZ/N1 SJA1000 CAN controller.

   - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.

  Drivers:

   - Intel Ethernet NICs:
      - i40e: add support for vlan pruning
      - i40e: add support for XDP framented packets
      - ice: improved vlan offload support
      - ice: add support for PPPoE offload

   - Mellanox Ethernet (mlx5)
      - refactor packet steering offload for performance and scalability
      - extend support for TC offload
      - refactor devlink code to clean-up the locking schema
      - support stacked vlans for bridge offloads
      - use TLS objects pool to improve connection rate

   - Netronome Ethernet NICs (nfp):
      - extend support for IPv6 fields mangling offload
      - add support for vepa mode in HW bridge
      - better support for virtio data path acceleration (VDPA)
      - enable TSO by default

   - Microsoft vNIC driver (mana)
      - add support for XDP redirect

   - Others Ethernet drivers:
      - bonding: add per-port priority support
      - microchip lan743x: extend phy support
      - Fungible funeth: support UDP segmentation offload and XDP xmit
      - Solarflare EF100: add support for virtual function representors
      - MediaTek SoC: add XDP support

   - Mellanox Ethernet/IB switch (mlxsw):
      - dropped support for unreleased H/W (XM router).
      - improved stats accuracy
      - unified bridge model coversion improving scalability (parts 1-6)
      - support for PTP in Spectrum-2 asics

   - Broadcom PHYs
      - add PTP support for BCM54210E
      - add support for the BCM53128 internal PHY

   - Marvell Ethernet switches (prestera):
      - implement support for multicast forwarding offload

   - Embedded Ethernet switches:
      - refactor OcteonTx MAC filter for better scalability
      - improve TC H/W offload for the Felix driver
      - refactor the Microchip ksz8 and ksz9477 drivers to share the
        probe code (parts 1, 2), add support for phylink mac
        configuration

   - Other WiFi:
      - Microchip wilc1000: diable WEP support and enable WPA3
      - Atheros ath10k: encapsulation offload support

  Old code removal:

   - Neterion vxge ethernet driver: this is untouched since more than 10 years"

* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
  doc: sfp-phylink: Fix a broken reference
  wireguard: selftests: support UML
  wireguard: allowedips: don't corrupt stack when detecting overflow
  wireguard: selftests: update config fragments
  wireguard: ratelimiter: use hrtimer in selftest
  net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
  net: usb: ax88179_178a: Bind only to vendor-specific interface
  selftests: net: fix IOAM test skip return code
  net: usb: make USB_RTL8153_ECM non user configurable
  net: marvell: prestera: remove reduntant code
  octeontx2-pf: Reduce minimum mtu size to 60
  net: devlink: Fix missing mutex_unlock() call
  net/tls: Remove redundant workqueue flush before destroy
  net: txgbe: Fix an error handling path in txgbe_probe()
  net: dsa: Fix spelling mistakes and cleanup code
  Documentation: devlink: add add devlink-selftests to the table of contents
  dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
  net: ionic: fix error check for vlan flags in ionic_set_nic_features()
  net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
  nfp: flower: add support for tunnel offload without key ID
  ...
2022-08-03 16:29:08 -07:00
Paolo Abeni
7c6327c77d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

net/ax25/af_ax25.c
  d7c4c9e075 ("ax25: fix incorrect dev_tracker usage")
  d62607c3fe ("net: rename reference+tracking helpers")

drivers/net/netdevsim/fib.c
  180a6a3ee6 ("netdevsim: fib: Fix reference count leak on route deletion failure")
  012ec02ae4 ("netdevsim: convert driver to use unlocked devlink API during init/fini")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-03 09:04:55 +02:00
Linus Torvalds
b349b1181d for-5.20/io_uring-2022-07-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm5gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmKMD/4l3QIrLbjYIxlfrzQcHbmYuUkbQtj3SbZg
 6ejbnGVhCs1P9DdXH8MgE2BxgpiXQE0CqOK7vbSoo5ep2n2UTLI2DIxAl74SMIo7
 0wmJXtUJySuViKr3NYVHqlN180MkQYddBz0nGElhkQBPBCMhW8CrtPCeURr/YyHp
 2RxSYBXiUx2gRyig+klnp6oPEqelcBZJUyNHdA9yVrgl/RhB/t2rKj7D++8ukQM3
 Zuyh8WIkTeTfUz9hdGG7fuCEdZN4DlO2CCEc7uy0cKi6VRCKH4hYUCqClJ+/cfd2
 43dUI2O7B6D1t/ObFh8AGIDXBDqVA6ePQohQU6gooRkfQiBPKkc9d0ts4yIhRqca
 AjkzNM+0Eve3A01loJ8J84w8oZnvNpYEv5n8/sZVLWcyU3UIs0I88nC2OBiFtoRq
 d77CtFLwOTo+r3STtAhnZOqez90rhS6BqKtqlUP346PCuFItl6/MbGtwdTbLYEFj
 CVNIb2pERWSr2NxGv4lFyXaX/cRwruxojWH7yc3rRYjr4Ykevd1pe/fMGNiMAnKw
 5em/3QU3qq0ZVcXLMihksKeHHFIQwGDRMuyuv/fktV10+yYXQ0t16WzkJT3aR8Xo
 cqs0r8+6Jnj3uYcOMzj/FoLcpEPr21hnwAtzLto1mG1Wh4JRn/D7Nx5zqxPLxcW+
 NiU6VihPOw==
 =gxeV
 -----END PGP SIGNATURE-----

Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - As per (valid) complaint in the last merge window, fs/io_uring.c has
   grown quite large these days. io_uring isn't really tied to fs
   either, as it supports a wide variety of functionality outside of
   that.

   Move the code to io_uring/ and split it into files that either
   implement a specific request type, and split some code into helpers
   as well. The code is organized a lot better like this, and io_uring.c
   is now < 4K LOC (me).

 - Deprecate the epoll_ctl opcode. It'll still work, just trigger a
   warning once if used. If we don't get any complaints on this, and I
   don't expect any, then we can fully remove it in a future release
   (me).

 - Improve the cancel hash locking (Hao)

 - kbuf cleanups (Hao)

 - Efficiency improvements to the task_work handling (Dylan, Pavel)

 - Provided buffer improvements (Dylan)

 - Add support for recv/recvmsg multishot support. This is similar to
   the accept (or poll) support for have for multishot, where a single
   SQE can trigger everytime data is received. For applications that
   expect to do more than a few receives on an instantiated socket, this
   greatly improves efficiency (Dylan).

 - Efficiency improvements for poll handling (Pavel)

 - Poll cancelation improvements (Pavel)

 - Allow specifiying a range for direct descriptor allocations (Pavel)

 - Cleanup the cqe32 handling (Pavel)

 - Move io_uring types to greatly cleanup the tracing (Pavel)

 - Tons of great code cleanups and improvements (Pavel)

 - Add a way to do sync cancelations rather than through the sqe -> cqe
   interface, as that's a lot easier to use for some use cases (me).

 - Add support to IORING_OP_MSG_RING for sending direct descriptors to a
   different ring. This avoids the usually problematic SCM case, as we
   disallow those. (me)

 - Make the per-command alloc cache we use for apoll generic, place
   limits on it, and use it for netmsg as well (me).

 - Various cleanups (me, Michal, Gustavo, Uros)

* tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits)
  io_uring: ensure REQ_F_ISREG is set async offload
  net: fix compat pointer in get_compat_msghdr()
  io_uring: Don't require reinitable percpu_ref
  io_uring: fix types in io_recvmsg_multishot_overflow
  io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
  io_uring: support multishot in recvmsg
  net: copy from user before calling __get_compat_msghdr
  net: copy from user before calling __copy_msghdr
  io_uring: support 0 length iov in buffer select in compat
  io_uring: fix multishot ending when not polled
  io_uring: add netmsg cache
  io_uring: impose max limit on apoll cache
  io_uring: add abstraction around apoll cache
  io_uring: move apoll cache to poll.c
  io_uring: consolidate hash_locked io-wq handling
  io_uring: clear REQ_F_HASH_LOCKED on hash removal
  io_uring: don't race double poll setting REQ_F_ASYNC_DATA
  io_uring: don't miss setting REQ_F_DOUBLE_POLL
  io_uring: disable multishot recvmsg
  io_uring: only trace one of complete or overflow
  ...
2022-08-02 13:20:44 -07:00
Maxim Mikityanskiy
8eaa1d1108 net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
Striding RQ uses MTT page mapping, where each page corresponds to an XSK
frame. MTT pages have alignment requirements, and XSK frames don't have
any alignment guarantees in the unaligned mode. Frames with improper
alignment must be discarded, otherwise the packet data will be written
at a wrong address.

Fixes: 282c0c798f ("net/mlx5e: Allow XSK frames smaller than a page")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20220729121356.3990867-1-maximmi@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-02 15:07:09 +02:00
Eric Dumazet
2df91e397d net: rose: add netdev ref tracker to 'struct rose_sock'
This will help debugging netdevice refcount problems with
CONFIG_NET_DEV_REFCNT_TRACKER=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 11:59:23 -07:00
Jakub Kicinski
5fc7c5887c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
 bpf-next 2022-07-29

We've added 22 non-merge commits during the last 4 day(s) which contain
a total of 27 files changed, 763 insertions(+), 120 deletions(-).

The main changes are:

1) Fixes to allow setting any source IP with bpf_skb_set_tunnel_key() helper,
   from Paul Chaignon.

2) Fix for bpf_xdp_pointer() helper when doing sanity checking, from Joanne Koong.

3) Fix for XDP frame length calculation, from Lorenzo Bianconi.

4) Libbpf BPF_KSYSCALL docs improvements and fixes to selftests to accommodate
   s390x quirks with socketcall(), from Ilya Leoshkevich.

5) Allow/denylist and CI configs additions to selftests/bpf to improve BPF CI,
   from Daniel Müller.

6) BPF trampoline + ftrace follow up fixes, from Song Liu and Xu Kuohai.

7) Fix allocation warnings in netdevsim, from Jakub Kicinski.

8) bpf_obj_get_opts() libbpf API allowing to provide file flags, from Joe Burton.

9) vsnprintf usage fix in bpf_snprintf_btf(), from Fedor Tokarev.

10) Various small fixes and clean ups, from Daniel Müller, Rongguang Wei,
    Jörn-Thorben Hinz, Yang Li.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (22 commits)
  bpf: Remove unneeded semicolon
  libbpf: Add bpf_obj_get_opts()
  netdevsim: Avoid allocation warnings triggered from user space
  bpf: Fix NULL pointer dereference when registering bpf trampoline
  bpf: Fix test_progs -j error with fentry/fexit tests
  selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout
  bpftool: Don't try to return value from void function in skeleton
  bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro
  bpf: btf: Fix vsnprintf return value check
  libbpf: Support PPC in arch_specific_syscall_pfx
  selftests/bpf: Adjust vmtest.sh to use local kernel configuration
  selftests/bpf: Copy over libbpf configs
  selftests/bpf: Sort configuration
  selftests/bpf: Attach to socketcall() in test_probe_user
  libbpf: Extend BPF_KSYSCALL documentation
  bpf, devmap: Compute proper xdp_frame len redirecting frames
  bpf: Fix bpf_xdp_pointer return pointer
  selftests/bpf: Don't assign outer source IP to host
  bpf: Set flow flag to allow any source IP in bpf_tunnel_key
  geneve: Use ip_tunnel_key flow flags in route lookups
  ...
====================

Link: https://lore.kernel.org/r/20220729230948.1313527-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29 19:04:29 -07:00
Mike Manning
944fd1aeac net: allow unbound socket for packets in VRF when tcp_l3mdev_accept set
The commit 3c82a21f43 ("net: allow binding socket in a VRF when
there's an unbound socket") changed the inet socket lookup to avoid
packets in a VRF from matching an unbound socket. This is to ensure the
necessary isolation between the default and other VRFs for routing and
forwarding. VRF-unaware processes running in the default VRF cannot
access another VRF and have to be run with 'ip vrf exec <vrf>'. This is
to be expected with tcp_l3mdev_accept disabled, but could be reallowed
when this sysctl option is enabled. So instead of directly checking dif
and sdif in inet[6]_match, here call inet_sk_bound_dev_eq(). This
allows a match on unbound socket for non-zero sdif i.e. for packets in
a VRF, if tcp_l3mdev_accept is enabled.

Fixes: 3c82a21f43 ("net: allow binding socket in a VRF when there's an unbound socket")
Signed-off-by: Mike Manning <mvrmanning@gmail.com>
Link: https://lore.kernel.org/netdev/a54c149aed38fded2d3b5fdb1a6c89e36a083b74.camel@lasnet.de/
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-29 11:58:54 +01:00
Andy Shevchenko
29192a170e firewire: net: Make use of get_unaligned_be48(), put_unaligned_be48()
Since we have a proper endianness converters for BE 48-bit data use
them.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220726144906.5217-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 22:21:54 -07:00
Eric Dumazet
d7c4c9e075 ax25: fix incorrect dev_tracker usage
While investigating a separate rose issue [1], and enabling
CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25 issue [2]

An ax25_dev can be used by one (or many) struct ax25_cb.
We thus need different dev_tracker, one per struct ax25_cb.

After this patch is applied, we are able to focus on rose.

[1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/

[2]
[  205.798723] reference already released.
[  205.798732] allocated in:
[  205.798734]  ax25_bind+0x1a2/0x230 [ax25]
[  205.798747]  __sys_bind+0xea/0x110
[  205.798753]  __x64_sys_bind+0x18/0x20
[  205.798758]  do_syscall_64+0x5c/0x80
[  205.798763]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798768] freed in:
[  205.798770]  ax25_release+0x115/0x370 [ax25]
[  205.798778]  __sock_release+0x42/0xb0
[  205.798782]  sock_close+0x15/0x20
[  205.798785]  __fput+0x9f/0x260
[  205.798789]  ____fput+0xe/0x10
[  205.798792]  task_work_run+0x64/0xa0
[  205.798798]  exit_to_user_mode_prepare+0x18b/0x190
[  205.798804]  syscall_exit_to_user_mode+0x26/0x40
[  205.798808]  do_syscall_64+0x69/0x80
[  205.798812]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798827] ------------[ cut here ]------------
[  205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
[  205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
[  205.798948]  mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
[  205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
[  205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[  205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
[  205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
[  205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
[  205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
[  205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
[  205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
[  205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
[  205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
[  205.799024] FS:  0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
[  205.799028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
[  205.799033] Call Trace:
[  205.799035]  <TASK>
[  205.799038]  ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799047]  ? ax25_device_event+0x9f/0x270 [ax25]
[  205.799055]  ? raw_notifier_call_chain+0x49/0x60
[  205.799060]  ? call_netdevice_notifiers_info+0x52/0xa0
[  205.799065]  ? dev_close_many+0xc8/0x120
[  205.799070]  ? unregister_netdevice_many+0x13d/0x890
[  205.799073]  ? unregister_netdevice_queue+0x90/0xe0
[  205.799076]  ? unregister_netdev+0x1d/0x30
[  205.799080]  ? mkiss_close+0x7c/0xc0 [mkiss]
[  205.799084]  ? tty_ldisc_close+0x2e/0x40
[  205.799089]  ? tty_ldisc_hangup+0x137/0x210
[  205.799092]  ? __tty_hangup.part.0+0x208/0x350
[  205.799098]  ? tty_vhangup+0x15/0x20
[  205.799103]  ? pty_close+0x127/0x160
[  205.799108]  ? tty_release+0x139/0x5e0
[  205.799112]  ? __fput+0x9f/0x260
[  205.799118]  ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799126]  ax25_device_event+0x9f/0x270 [ax25]
[  205.799135]  raw_notifier_call_chain+0x49/0x60
[  205.799140]  call_netdevice_notifiers_info+0x52/0xa0
[  205.799146]  dev_close_many+0xc8/0x120
[  205.799152]  unregister_netdevice_many+0x13d/0x890
[  205.799157]  unregister_netdevice_queue+0x90/0xe0
[  205.799161]  unregister_netdev+0x1d/0x30
[  205.799165]  mkiss_close+0x7c/0xc0 [mkiss]
[  205.799170]  tty_ldisc_close+0x2e/0x40
[  205.799173]  tty_ldisc_hangup+0x137/0x210
[  205.799178]  __tty_hangup.part.0+0x208/0x350
[  205.799184]  tty_vhangup+0x15/0x20
[  205.799188]  pty_close+0x127/0x160
[  205.799193]  tty_release+0x139/0x5e0
[  205.799199]  __fput+0x9f/0x260
[  205.799203]  ____fput+0xe/0x10
[  205.799208]  task_work_run+0x64/0xa0
[  205.799213]  do_exit+0x33b/0xab0
[  205.799217]  ? __handle_mm_fault+0xc4f/0x15f0
[  205.799224]  do_group_exit+0x35/0xa0
[  205.799228]  __x64_sys_exit_group+0x18/0x20
[  205.799232]  do_syscall_64+0x5c/0x80
[  205.799238]  ? handle_mm_fault+0xba/0x290
[  205.799242]  ? debug_smp_processor_id+0x17/0x20
[  205.799246]  ? fpregs_assert_state_consistent+0x26/0x50
[  205.799251]  ? exit_to_user_mode_prepare+0x49/0x190
[  205.799256]  ? irqentry_exit_to_user_mode+0x9/0x20
[  205.799260]  ? irqentry_exit+0x33/0x40
[  205.799263]  ? exc_page_fault+0x87/0x170
[  205.799268]  ? asm_exc_page_fault+0x8/0x30
[  205.799273]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.799277] RIP: 0033:0x7ff6b80eaca1
[  205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
[  205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
[  205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[  205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
[  205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
[  205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
[  205.799304]  </TASK>

Fixes: feef318c85 ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Bernard F6BVP <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 22:06:15 -07:00
Vikas Gupta
08f588fa30 devlink: introduce framework for selftests
Add a framework for running selftests.
Framework exposes devlink commands and test suite(s) to the user
to execute and query the supported tests by the driver.

Below are new entries in devlink_nl_ops
devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported
selftests by the drivers.
devlink_nl_cmd_selftests_run: To execute selftests. Users can
provide a test mask for executing group tests or standalone tests.

Documentation/networking/devlink/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:56:53 -07:00
Tariq Toukan
7adc91e0c9 net/tls: Multi-threaded calls to TX tls_dev_del
Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.

This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate).  When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.

Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:50:54 -07:00
Jakub Kicinski
272ac32f56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 18:21:16 -07:00
Ziyang Xuan
85f0173df3 ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.

=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263

CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
 dump_backtrace+0x1a8/0x230
 show_stack+0x20/0x70
 dump_stack_lvl+0x68/0x84
 print_report+0xc4/0x120
 kasan_report+0x84/0x120
 __asan_load4+0x94/0xd0
 find_match.part.0+0x70/0x134
 __find_rr_leaf+0x408/0x470
 fib6_table_lookup+0x264/0x540
 ip6_pol_route+0xf4/0x260
 ip6_pol_route_output+0x58/0x70
 fib6_rule_lookup+0x1a8/0x330
 ip6_route_output_flags_noref+0xd8/0x1a0
 ip6_route_output_flags+0x58/0x160
 ip6_dst_lookup_tail+0x5b4/0x85c
 ip6_dst_lookup_flow+0x98/0x120
 rawv6_sendmsg+0x49c/0xc70
 inet_sendmsg+0x68/0x94

Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1

Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done

$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done

It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.

	cpu0			cpu1
fib6_table_lookup
__find_rr_leaf
			addrconf_notify [ NETDEV_CHANGEMTU ]
			addrconf_ifdown
			RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown

So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.

Fixes: dcd1f57295 ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 10:42:44 -07:00
Paolo Abeni
7d85e9cb40 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
ice: PPPoE offload support

Marcin Szycik says:

Add support for dissecting PPPoE and PPP-specific fields in flow dissector:
PPPoE session id and PPP protocol type. Add support for those fields in
tc-flower and support offloading PPPoE. Finally, add support for hardware
offload of PPPoE packets in switchdev mode in ice driver.

Example filter:
tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \
    1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR

Changes in iproute2 are required to use the new fields (will be submitted
soon).

ICE COMMS DDP package is required to create a filter in ice.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add support for PPPoE hardware offload
  flow_offload: Introduce flow_match_pppoe
  net/sched: flower: Add PPPoE filter
  flow_dissector: Add PPPoE dissectors
====================

Link: https://lore.kernel.org/r/20220726203133.2171332-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28 11:54:56 +02:00
Jakub Kicinski
5f10376b6b add missing includes and forward declarations to networking includes under linux/
Similarly to a recent include/net/ cleanup, this patch adds
missing includes to networking headers under include/linux.
All these problems are currently masked by the existing users
including the missing dependency before the broken header.

Link: https://lore.kernel.org/all/20220723045755.2676857-1-kuba@kernel.org/ v1
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220726215652.158167-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28 11:29:36 +02:00
Stefan Raspl
8b2fed8e27 net/smc: Pass on DMBE bit mask in IRQ handler
Make the DMBE bits, which are passed on individually in ism_move() as
parameter idx, available to the receiver.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27 13:24:42 +01:00
Stefan Raspl
0a2f4f9893 s390/ism: Cleanups
Reworked signature of the function to retrieve the system EID: No plausible
reason to use a double pointer. And neither to pass in the device as an
argument, as this identifier is by definition per system, not per device.
Plus some minor consistency edits.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27 13:24:42 +01:00
Jakub Kicinski
84c61fe1a7 tls: rx: do not use the standard strparser
TLS is a relatively poor fit for strparser. We pause the input
every time a message is received, wait for a read which will
decrypt the message, start the parser, repeat. strparser is
built to delineate the messages, wrap them in individual skbs
and let them float off into the stack or a different socket.
TLS wants the data pages and nothing else. There's no need
for TLS to keep cloning (and occasionally skb_unclone()'ing)
the TCP rx queue.

This patch uses a pre-allocated skb and attaches the skbs
from the TCP rx queue to it as frags. TLS is careful never
to modify the input skb without CoW'ing / detaching it first.

Since we call TCP rx queue cleanup directly we also get back
the benefit of skb deferred free.

Overall this results in a 6% gain in my benchmarks.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jakub Kicinski
3f92a64e44 tcp: allow tls to decrypt directly from the tcp rcv queue
Expose TCP rx queue accessor and cleanup, so that TLS can
decrypt directly from the TCP queue. The expectation
is that the caller can access the skb returned from
tcp_recv_skb() and up to inq bytes worth of data (some
of which may be in ->next skbs) and then call
tcp_read_done() when data has been consumed.
The socket lock must be held continuously across
those two operations.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jiri Pirko
7b2d9a1a50 net: devlink: introduce nested devlink entity for line card
For the purpose of exposing device info and allow flash update which is
going to be implemented in follow-up patches, introduce a possibility
for a line card to expose relation to nested devlink entity. The nested
devlink entity represents the line card.

Example:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
       16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 13:50:51 -07:00
Luiz Augusto von Dentz
d0be8347c6 Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.

  refcount_t: increment on 0; use-after-free.
  BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
  Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

  CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S      W
  4.14.234-00003-g1fb6d0bd49a4-dirty #28
  Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
  Google Inc. MSM sm8150 Flame DVT (DT)
  Workqueue: hci0 hci_rx_work
  Call trace:
   dump_backtrace+0x0/0x378
   show_stack+0x20/0x2c
   dump_stack+0x124/0x148
   print_address_description+0x80/0x2e8
   __kasan_report+0x168/0x188
   kasan_report+0x10/0x18
   __asan_load4+0x84/0x8c
   refcount_dec_and_test+0x20/0xd0
   l2cap_chan_put+0x48/0x12c
   l2cap_recv_frame+0x4770/0x6550
   l2cap_recv_acldata+0x44c/0x7a4
   hci_acldata_packet+0x100/0x188
   hci_rx_work+0x178/0x23c
   process_one_work+0x35c/0x95c
   worker_thread+0x4cc/0x960
   kthread+0x1a8/0x1c4
   ret_from_fork+0x10/0x18

Cc: stable@kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26 13:35:24 -07:00
Wojciech Drewek
6a21b0856d flow_offload: Introduce flow_match_pppoe
Allow to offload PPPoE filters by adding flow_rule_match_pppoe.
Drivers can extract PPPoE specific fields from now on.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 10:49:27 -07:00
Wojciech Drewek
46126db9c8 flow_dissector: Add PPPoE dissectors
Allow to dissect PPPoE specific fields which are:
- session ID (16 bits)
- ppp protocol (16 bits)
- type (16 bits) - this is PPPoE ethertype, for now only
  ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC
  in the future

The goal is to make the following TC command possible:

  # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
      flower \
        pppoe_sid 12 \
        ppp_proto ip \
      action drop

Note that only PPPoE Session is supported.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 09:49:12 -07:00
Paul Chaignon
451ef36bd2 ip_tunnels: Add new flow flags field to ip_tunnel_key
This commit extends the ip_tunnel_key struct with a new field for the
flow flags, to pass them to the route lookups. This new field will be
populated and used in subsequent commits.

Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/f8bfd4983bd06685a59b1e3ba76ca27496f51ef3.1658759380.git.paul@isovalent.com
2022-07-26 12:43:16 +02:00
Jakub Kicinski
2baf8ba532 wireless-next patches for v5.20
Third set of patches for v5.20. MLO work continues and we have a lot
 of stack changes due to that, including driver API changes. Not much
 driver patches except on mt76.
 
 Major changes:
 
 cfg80211/mac80211
 
 * more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
   works with one link now
 
 * align with IEEE Draft P802.11be_D2.0
 
 * hardware timestamps for receive and transmit
 
 mt76
 
 * preparation for new chipset support
 
 * ACPI SAR support
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLe1k4RHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZvxlQf8DrZIllhF0q/7Wry3JuG0gbNA+V2eI/lc
 OYrephsDBm/dvvyjcFWcdUzxoNk0k1+aOrx/09JijHFgCGKVwuK1+hxYVfjW2q43
 9mHxJBo4NcMk1RDDM3paVuZ8QMHuYugbv2mQOZeAEq2XloAaqEM7wVE+bb4Mgtgx
 VAKS5du2igrSt83wl8BRMFb9MPAM1EQ3Cw7Ro5T4y+1Qm/hrBm6qWizSpqh9CXYx
 pDLR3pvQxiD4Axa0Uq3rUbyF4hLwciqSFOJvr2sI3q7b9YElA7wIi6NQzMkYJH6Z
 7HW5K6UIQbblAaQkv2BLqpU1N6puTHUOAf5Md31vOAaOcGbSI5hbUA==
 =Cnxg
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v5.20

Third set of patches for v5.20. MLO work continues and we have a lot
of stack changes due to that, including driver API changes. Not much
driver patches except on mt76.

Major changes:

cfg80211/mac80211
 - more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
   works with one link now
 - align with IEEE Draft P802.11be_D2.0
 - hardware timestamps for receive and transmit

mt76
 - preparation for new chipset support
 - ACPI SAR support

* tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits)
  wifi: mac80211: fix link data leak
  wifi: mac80211: mlme: fix disassoc with MLO
  wifi: mac80211: add macros to loop over active links
  wifi: mac80211: remove erroneous sband/link validation
  wifi: mac80211: mlme: transmit assoc frame with address translation
  wifi: mac80211: verify link addresses are different
  wifi: mac80211: rx: track link in RX data
  wifi: mac80211: optionally implement MLO multicast TX
  wifi: mac80211: expand ieee80211_mgmt_tx() for MLO
  wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API
  wifi: mac80211: report link ID to cfg80211 on mgmt RX
  wifi: cfg80211: report link ID in NL80211_CMD_FRAME
  wifi: mac80211: add hardware timestamps for RX and TX
  wifi: cfg80211: add hardware timestamps to frame RX info
  wifi: cfg80211/nl80211: move rx management data into a struct
  wifi: cfg80211: add a function for reporting TX status with hardware timestamps
  wifi: nl80211: add RX and TX timestamp attributes
  wifi: ieee80211: add helper functions for detecting TM/FTM frames
  wifi: mac80211_hwsim: handle links for wmediumd/virtio
  wifi: mac80211: sta_info: fix link_sta insertion
  ...
====================

Link: https://lore.kernel.org/r/20220725174547.EA465C341C6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25 18:20:52 -07:00
David S. Miller
e222dc8d84 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2022-07-20

1) Don't set DST_NOPOLICY in IPv4, a recent patch made this
   superfluous. From Eyal Birger.

2) Convert alg_key to flexible array member to avoid an iproute2
   compile warning when built with gcc-12.
   From Stephen Hemminger.

3) xfrm_register_km and xfrm_unregister_km do always return 0
   so change the type to void. From Zhengchao Shao.

4) Fix spelling mistake in esp6.c
   From Zhang Jiaming.

5) Improve the wording of comment above XFRM_OFFLOAD flags.
   From Petr Vaněk.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 13:25:39 +01:00
Kuniyuki Iwashima
0273954595 net: Fix data-races around sysctl_[rw]mem(_offset)?.
While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - .sysctl_rmem
  - .sysctl_rwmem
  - .sysctl_rmem_offset
  - .sysctl_wmem_offset
  - sysctl_tcp_rmem[1, 2]
  - sysctl_tcp_wmem[1, 2]
  - sysctl_decnet_rmem[1]
  - sysctl_decnet_wmem[1]
  - sysctl_tipc_rmem[1]

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:09 +01:00
Dylan Yudaken
72c531f8ef net: copy from user before calling __get_compat_msghdr
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Matthias May
7074732c8f ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
The current code allows for VXLAN and GENEVE to inherit the TOS
respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6.
However when the payload is VLAN encapsulated, then this inheriting
does not work, because the visible skb-protocol is of type
ETH_P_8021Q or ETH_P_8021AD.

Instead of skb->protocol use skb_protocol().

Signed-off-by: Matthias May <matthias.may@westermo.com>
Link: https://lore.kernel.org/r/20220721202718.10092-1-matthias.may@westermo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 21:47:28 -07:00
Jakub Kicinski
4a934eca7b bluetooth-next pull request for net-next:
- Add support for IM Networks PID 0x3568
  - Add support for BCM4349B1
  - Add support for CYW55572
  - Add support for MT7922 VID/PID 0489/e0e2
  - Add support for Realtek RTL8852C
  - Initial support for Isochronous Channels/ISO sockets
  - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLbPmsZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKSo7EACc8njgHO2pN8ncWvgu/gH8
 0v1lRBoi+Tyzk5gZtdM0rIbE3t7tFqml3Kr0WsCkzV6CGgnqCw5i/MKZXCV8G4tG
 0ZsY8y6NMiCFR6wQq3rdNS8NqPmlCHWm6yY2EISEM6qbtF8HwxQvXdkzznZPHgVG
 DR7i5fAVuGA6rIL+9NSG/TxHjJvq6Bmu3v9Uu/V062I7NrBMw9Jr0Ic1EaUqgYck
 QL8+653ZZMxYPxt978UekbQIYEp3YwZ5MTACtX86j2s5tlZKuivKTIZch1vSaOi1
 1zC6up208+p2/4+Yq7FJ2kA6d2be3FD26oT1xymRhqiMakCRrHfdmTFpC7/J4ZgX
 /4mcIREkUoO2duim+91Hgt1Ww1vaD4joPwXD6AILbK1bdp0pi0gw47bQF8XO8uIh
 yPQqnoGWSJGD1VknPh5x7lGcAYQ3bgSg0L3TlQ4gN9Qc/emuC6UOQ9QPvwmyOilG
 ZrDn2p1Rpsoj8vVRTv6+CgKqLokXNUTPixCAaS4AIygRzhIzwReYYNMYUZgYMrTk
 Qf6bKqczEut1vq8NZiN3TTxLLOWHB5+3cdl/QRv4ZeoNXMXPmbtJ0y9Vud66GhZu
 vS6F+BNQapIkEbM6xrC/OepCgzrVoVn2BQuDO6SQA3pg4JxFeAl+V434Va3tqVeK
 9h6GHrl8KePjh528NXbH3Q==
 =Ne87
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for IM Networks PID 0x3568
 - Add support for BCM4349B1
 - Add support for CYW55572
 - Add support for MT7922 VID/PID 0489/e0e2
 - Add support for Realtek RTL8852C
 - Initial support for Isochronous Channels/ISO sockets
 - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk

* tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (58 commits)
  Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet
  Bluetooth: btusb: Add support for ISO packets
  Bluetooth: ISO: Add broadcast support
  Bluetooth: Add initial implementation of BIS connections
  Bluetooth: Add BTPROTO_ISO socket type
  Bluetooth: Add initial implementation of CIS connections
  Bluetooth: hci_core: Introduce hci_recv_event_data
  Bluetooth: Convert delayed discov_off to hci_sync
  Bluetooth: Remove update_scan hci_request dependancy
  Bluetooth: Remove dead code from hci_request.c
  Bluetooth: btrtl: Fix typo in comment
  Bluetooth: MGMT: Fix holding hci_conn reference while command is queued
  Bluetooth: mgmt: Fix using hci_conn_abort
  Bluetooth: Use bt_status to convert from errno
  Bluetooth: Add bt_status
  Bluetooth: hci_sync: Split hci_dev_open_sync
  Bluetooth: hci_sync: Refactor remove Adv Monitor
  Bluetooth: hci_sync: Refactor add Adv Monitor
  Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING
  Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR
  ...
====================

Link: https://lore.kernel.org/r/20220723002232.964796-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 19:00:17 -07:00
Luiz Augusto von Dentz
f764a6c2c1 Bluetooth: ISO: Add broadcast support
This adds broadcast support for BTPROTO_ISO by extending the
sockaddr_iso with a new struct sockaddr_iso_bc where the socket user
can set the broadcast address when receiving, the SID and the BIS
indexes it wants to synchronize.

When using BTPROTO_ISO for broadcast the roles are:

Broadcaster -> uses connect with address set to BDADDR_ANY:
> tools/isotest -s 00:00:00:00:00:00

Broadcast Receiver -> uses listen with address set to broadcaster:
> tools/isotest -d 00:AA:01:00:00:00

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:14:13 -07:00
Luiz Augusto von Dentz
eca0ae4aea Bluetooth: Add initial implementation of BIS connections
This adds initial support for BIS/BIG which includes:

== Broadcaster role: Setup a periodic advertising and create a BIG ==

> tools/isotest -s 00:00:00:00:00:00
isotest[63]: Connected [00:00:00:00:00:00]
isotest[63]: QoS BIG 0x00 BIS 0x00 Packing 0x00 Framing 0x00]
isotest[63]: Output QoS [Interval 10000 us Latency 10 ms SDU 40 PHY 0x02
RTN 2]
isotest[63]: Sending ...
isotest[63]: Number of packets: 1
isotest[63]: Socket jitter buffer: 80 buffer
< HCI Command: LE Set Perio.. (0x08|0x003e) plen 7
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Periodic Advertising Parameters (0x08|0x003e) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Perio.. (0x08|0x003f) plen 7
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Periodic Advertising Data (0x08|0x003f) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Perio.. (0x08|0x0040) plen 2
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Periodic Advertising Enable (0x08|0x0040) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create B.. (0x08|0x0068) plen 31
...
> HCI Event: Command Status (0x0f) plen 4
      LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 21
      LE Broadcast Isochronous Group Complete (0x1b)
      ...

== Broadcast Receiver role: Create a PA Sync and BIG Sync ==

> tools/isotest -i hci1 -d 00:AA:01:00:00:00
isotest[66]: Waiting for connection 00:AA:01:00:00:00...
< HCI Command: LE Periodic Advert.. (0x08|0x0044) plen 14
...
> HCI Event: Command Status (0x0f) plen 4
      LE Periodic Advertising Create Sync (0x08|0x0044) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Extended Sca.. (0x08|0x0041) plen 8
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Extended Sca.. (0x08|0x0042) plen 6
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 29
      LE Extended Advertising Report (0x0d)
      ...
> HCI Event: LE Meta Event (0x3e) plen 16
      LE Periodic Advertising Sync Established (0x0e)
      ...
< HCI Command: LE Broadcast Isoch.. (0x08|0x006b) plen 25
...
> HCI Event: Command Status (0x0f) plen 4
      LE Broadcast Isochronous Group Create Sync (0x08|0x006b) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 17
      LE Broadcast Isochronous Group Sync Estabilished (0x1d)
      ...

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:13:56 -07:00
Luiz Augusto von Dentz
ccf74f2390 Bluetooth: Add BTPROTO_ISO socket type
This introduces a new socket type BTPROTO_ISO which can be enabled with
use of ISO Socket experiemental UUID, it can used to initiate/accept
connections and transfer packets between userspace and kernel similarly
to how BTPROTO_SCO works:

Central -> uses connect with address set to destination bdaddr:
> tools/isotest -s 00:AA:01:00:00:00

Peripheral -> uses listen:
> tools/isotest -d

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:13:39 -07:00
Luiz Augusto von Dentz
26afbd826e Bluetooth: Add initial implementation of CIS connections
This adds the initial implementation of CIS connections and introduces
the ISO packets/links.

== Central: Set CIG Parameters, create a CIS and Setup Data Path ==

> tools/isotest -s <address>

< HCI Command: LE Extended Create... (0x08|0x0043) plen 26
...
> HCI Event: Command Status (0x0f) plen 4
      LE Extended Create Connection (0x08|0x0043) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 31
      LE Enhanced Connection Complete (0x0a)
      ...
< HCI Command: LE Create Connected... (0x08|0x0064) plen 5
...
> HCI Event: Command Status (0x0f) plen 4
      LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 29
      LE Connected Isochronous Stream Established (0x19)
      ...
< HCI Command: LE Setup Isochronou.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257
< HCI Command: LE Setup Isochronou.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257

== Peripheral: Accept CIS and Setup Data Path ==

> tools/isotest -d

 HCI Event: LE Meta Event (0x3e) plen 7
      LE Connected Isochronous Stream Request (0x1a)
...
< HCI Command: LE Accept Co.. (0x08|0x0066) plen 2
...
> HCI Event: LE Meta Event (0x3e) plen 29
      LE Connected Isochronous Stream Established (0x19)
...
< HCI Command: LE Setup Is.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257
< HCI Command: LE Setup Is.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:13:22 -07:00
Jakub Kicinski
b3fce974d4 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 16:55:44 -07:00
Wei Wang
4d8f24eeed Revert "tcp: change pingpong threshold to 3"
This reverts commit 4a41f453be.

This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.

Fixes: 4a41f453be ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <hlm3280@163.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 15:09:10 -07:00
Luiz Augusto von Dentz
dfe6d5c3ec Bluetooth: hci_core: Introduce hci_recv_event_data
This introduces hci_recv_event_data to make it simpler to access the
contents of last received event rather than having to pass its contents
to the likes of *_ind/*_cfm callbacks.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 13:20:52 -07:00
Brian Gix
bb87672562 Bluetooth: Remove update_scan hci_request dependancy
This removes the remaining calls to HCI_OP_WRITE_SCAN_ENABLE from
hci_request call chains, and converts them to hci_sync calls.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 12:55:40 -07:00
Brian Gix
ec2904c259 Bluetooth: Remove dead code from hci_request.c
The discov_update work queue is no longer used as a result
of the hci_sync rework.

The __hci_req_hci_power_on() function is no longer referenced in the
code as a result of the hci_sync rework.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 12:55:21 -07:00
Gregory Greenman
9f781533bb wifi: mac80211: add macros to loop over active links
Add a preliminary version which will be updated later
to loop over vif's and sta's active links.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:47 +02:00
Johannes Berg
963d0e8d08 wifi: mac80211: optionally implement MLO multicast TX
For drivers using software encryption for multicast TX, such
as mac80211_hwsim, mac80211 needs to duplicate the multicast
frames on each link, if MLO is enabled. Do this, but don't
just make it dependent on the key but provide a separate flag
for drivers to opt out of this.

This is not very efficient, I expect that drivers will do it
in firmware/hardware or at least with DMA engine assistence,
so this is mostly for hwsim.

To make this work, also implement the SNS11 sequence number
space that an AP MLD shall have, and modify the API to the
__ieee80211_subif_start_xmit() function to always require the
link ID bits to be set.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:36 +02:00
Johannes Berg
e1e68b14c5 wifi: mac80211: expand ieee80211_mgmt_tx() for MLO
There are a couple of new things that should be possible
with MLO:
 * selecting the link to transmit to a station by link ID,
   which a previous patch added to the nl80211 API
 * selecting the link by frequency, similarly
 * allowing transmittion to an MLD without specifying any
   channel or link ID, with MLD addresses

Enable these use cases. Also fix the address comparison
in client mode to use the AP (MLD) address.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:35 +02:00
Johannes Berg
95f498bb49 wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API
Allow optionally specifying the link ID to transmit on,
which can be done instead of the link frequency, on an
MLD addressed frame. Both can also be omitted in which
case the frame must be MLD addressed and link selection
(and address translation) will be done on lower layers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:33 +02:00
Johannes Berg
6074c9e574 wifi: cfg80211: report link ID in NL80211_CMD_FRAME
If given by the underlying driver, report the link ID for
MLO in NL80211_CMD_FRAME.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:30 +02:00
Avraham Stern
f9202638df wifi: mac80211: add hardware timestamps for RX and TX
When the low level driver reports hardware timestamps for frame
TX status or frame RX, pass the timestamps to cfg80211.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:29 +02:00
Avraham Stern
1ff715ffa0 wifi: cfg80211: add hardware timestamps to frame RX info
Add hardware timestamps to management frame RX info.
This shall be used by drivers that support hardware timestamping for
Timing measurement and Fine timing measurement action frames RX.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:27 +02:00
Avraham Stern
00b3d84010 wifi: cfg80211/nl80211: move rx management data into a struct
The functions for reporting rx management take many arguments.
Collect all the arguments into a struct, which also make it easier
to add more arguments if needed.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:26 +02:00
Avraham Stern
ea7d50c925 wifi: cfg80211: add a function for reporting TX status with hardware timestamps
Add a function for reporting TX status with hardware timestamps. This
function shall be used for reporting the TX status of Timing
measurement and Fine timing measurement action frames by devices that
support reporting hardware timestamps.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:24 +02:00
Jakub Kicinski
949d6b405e net: add missing includes and forward declarations under net/
This patch adds missing includes to headers under include/net.
All these problems are currently masked by the existing users
including the missing dependency before the broken header.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:53:22 +01:00
Kuniyuki Iwashima
36eeee75ef tcp: Fix a data-race around sysctl_tcp_adv_win_scale.
While reading sysctl_tcp_adv_win_scale, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Lorenzo Bianconi
ef69aa3a98 net: netfilter: Add kfuncs to set and change CT status
Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in
order to set nf_conn field of allocated entry or update nf_conn status
field of existing inserted entry. Use nf_ct_change_status_common to
share the permitted status field changes between netlink and BPF side
by refactoring ctnetlink_change_status.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi
0b38923644 net: netfilter: Add kfuncs to set and change CT timeout
Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in
order to change nf_conn timeout. This is same as ctnetlink_change_timeout,
hence code is shared between both by extracting it out to
__nf_ct_change_timeout. It is also updated to return an error when it
sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Apart from this, bpf_ct_set_timeout is only called for newly allocated
CT so it doesn't need to inspect the status field just yet. Sharing the
helpers even if it was possible would make timeout setting helper
sensitive to order of setting status and timeout after allocation.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Lorenzo Bianconi
d7e79c97c0 net: netfilter: Add kfuncs to allocate and insert CT
Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry
kfuncs in order to insert a new entry from XDP and TC programs.
Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common
code.

We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and
nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that
nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK.
Later this helper will be reused as a helper to set timeout of allocated
but not yet inserted CT entry.

The allocation functions return struct nf_conn___init instead of
nf_conn, to distinguish allocated CT from an already inserted or looked
up CT. This is later used to enforce restrictions on what kfuncs
allocated CT can be used with.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Luiz Augusto von Dentz
1f7435c8f6 Bluetooth: mgmt: Fix using hci_conn_abort
This fixes using hci_conn_abort instead of using hci_conn_abort_sync.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:16:10 -07:00
Luiz Augusto von Dentz
ca2045e059 Bluetooth: Add bt_status
This adds bt_status which can be used to convert Unix errno to
Bluetooth status.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:15:31 -07:00
Manish Mandlik
7cf5c2978f Bluetooth: hci_sync: Refactor remove Adv Monitor
Make use of hci_cmd_sync_queue for removing an advertisement monitor.

Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Miao-chen Chou <mcchou@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:14:55 -07:00
Manish Mandlik
b747a83690 Bluetooth: hci_sync: Refactor add Adv Monitor
Make use of hci_cmd_sync_queue for adding an advertisement monitor.

Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Miao-chen Chou <mcchou@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:14:32 -07:00
Zijun Hu
63b1a7dd38 Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING
Core driver addtionally checks LMP feature bit "Erroneous Data Reporting"
instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING to decide if HCI
commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken, so
remove this unnecessary quirk.

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Tested-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:14:10 -07:00
Zijun Hu
766ae2422b Bluetooth: hci_sync: Check LMP feature bit instead of quirk
BT core driver should addtionally check LMP feature bit
"Erroneous Data Reporting" instead of quirk
HCI_QUIRK_BROKEN_ERR_DATA_REPORTING set by BT device driver to decide if
HCI commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken.

BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 2, Part C | page 587
This feature indicates whether the device is able to support the
Packet_Status_Flag and the HCI commands HCI_Write_Default_-
Erroneous_Data_Reporting and HCI_Read_Default_Erroneous_-
Data_Reporting.

the quirk was introduced by 'commit cde1a8a992 ("Bluetooth: btusb: Fix
and detect most of the Chinese Bluetooth controllers")' to mark HCI
commands HCI_Read|Write_Default_Erroneous_Data_Reporting broken by BT
device driver, but the reason why these two HCI commands are broken is
that feature "Erroneous Data Reporting" is not enabled by firmware, this
scenario is illustrated by below log of QCA controllers with USB I/F:

@ RAW Open: hcitool (privileged) version 2.22
< HCI Command: Read Local Supported Commands (0x04|0x0002) plen 0
> HCI Event: Command Complete (0x0e) plen 68
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Success (0x00)
        Commands: 288 entries
......
          Read Default Erroneous Data Reporting (Octet 18 - Bit 2)
          Write Default Erroneous Data Reporting (Octet 18 - Bit 3)
......

< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
> HCI Event: Command Complete (0x0e) plen 4
      Read Default Erroneous Data Reporting (0x03|0x005a) ncmd 1
        Status: Unknown HCI Command (0x01)

< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0
> HCI Event: Command Complete (0x0e) plen 12
      Read Local Supported Features (0x04|0x0003) ncmd 1
        Status: Success (0x00)
        Features: 0xff 0xfe 0x0f 0xfe 0xd8 0x3f 0x5b 0x87
          3 slot packets
......

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Tested-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:13:18 -07:00
Dan Carpenter
6f43f6169a Bluetooth: clean up error pointer checking
The bt_skb_sendmsg() function can't return NULL so there is no need to
check for that.  Several of these checks were removed previously but
this one was missed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:11:10 -07:00
Luiz Augusto von Dentz
34a718bc86 Bluetooth: HCI: Fix not always setting Scan Response/Advertising Data
The scan response and advertising data needs to be tracked on a per
instance (adv_info) since when these instaces are removed so are their
data, to fix that new flags are introduced which is used to mark when
the data changes and then checked to confirm when the data needs to be
synced with the controller.

Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:07:30 -07:00
Abhishek Pandit-Subedi
359ee4f834 Bluetooth: Unregister suspend with userchannel
When HCI_USERCHANNEL is used, unregister the suspend notifier when
binding and register when releasing. The userchannel socket should be
left alone after open is completed.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:05:58 -07:00
Schspa Shi
877afadad2 Bluetooth: When HCI work queue is drained, only queue chained work
The HCI command, event, and data packet processing workqueue is drained
to avoid deadlock in commit
76727c02c1 ("Bluetooth: Call drain_workqueue() before resetting state").

There is another delayed work, which will queue command to this drained
workqueue. Which results in the following error report:

Bluetooth: hci2: command 0x040f tx timeout
WARNING: CPU: 1 PID: 18374 at kernel/workqueue.c:1438 __queue_work+0xdad/0x1140
Workqueue: events hci_cmd_timeout
RIP: 0010:__queue_work+0xdad/0x1140
RSP: 0000:ffffc90002cffc60 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff8880b9d3ec00 RCX: 0000000000000000
RDX: ffff888024ba0000 RSI: ffffffff814e048d RDI: ffff8880b9d3ec08
RBP: 0000000000000008 R08: 0000000000000000 R09: 00000000b9d39700
R10: ffffffff814f73c6 R11: 0000000000000000 R12: ffff88807cce4c60
R13: 0000000000000000 R14: ffff8880796d8800 R15: ffff8880796d8800
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0174b4000 CR3: 000000007cae9000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? queue_work_on+0xcb/0x110
 ? lockdep_hardirqs_off+0x90/0xd0
 queue_work_on+0xee/0x110
 process_one_work+0x996/0x1610
 ? pwq_dec_nr_in_flight+0x2a0/0x2a0
 ? rwlock_bug.part.0+0x90/0x90
 ? _raw_spin_lock_irq+0x41/0x50
 worker_thread+0x665/0x1080
 ? process_one_work+0x1610/0x1610
 kthread+0x2e9/0x3a0
 ? kthread_complete_and_exit+0x40/0x40
 ret_from_fork+0x1f/0x30
 </TASK>

To fix this, we can add a new HCI_DRAIN_WQ flag, and don't queue the
timeout workqueue while command workqueue is draining.

Fixes: 76727c02c1 ("Bluetooth: Call drain_workqueue() before resetting state")
Reported-by: syzbot+63bed493aebbf6872647@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:05:22 -07:00
Jakub Kicinski
6e0e846ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21 13:03:39 -07:00
Jakub Kicinski
602ae008ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Simplify nf_ct_get_tuple(), from Jackie Liu.

2) Add format to request_module() call, from Bill Wendling.

3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending
   hardware offload objects to be processed, from Vlad Buslov.

4) Missing rcu annotation and accessors in the netfilter tree,
   from Florian Westphal.

5) Merge h323 conntrack helper nat hooks into single object,
   also from Florian.

6) A batch of update to fix sparse warnings treewide,
   from Florian Westphal.

7) Move nft_cmp_fast_mask() where it used, from Florian.

8) Missing const in nf_nat_initialized(), from James Yonan.

9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet.

10) Use refcount_inc instead of _inc_not_zero in flowtable,
    from Florian Westphal.

11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: xt_TPROXY: remove pr_debug invocations
  netfilter: flowtable: prefer refcount_inc
  netfilter: ipvs: Use the bitmap API to allocate bitmaps
  netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
  netfilter: nf_tables: move nft_cmp_fast_mask to where its used
  netfilter: nf_tables: use correct integer types
  netfilter: nf_tables: add and use BE register load-store helpers
  netfilter: nf_tables: use the correct get/put helpers
  netfilter: x_tables: use correct integer types
  netfilter: nfnetlink: add missing __be16 cast
  netfilter: nft_set_bitmap: Fix spelling mistake
  netfilter: h323: merge nat hook pointers into one
  netfilter: nf_conntrack: use rcu accessors where needed
  netfilter: nf_conntrack: add missing __rcu annotations
  netfilter: nf_flow_table: count pending offload workqueue tasks
  net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
  netfilter: conntrack: use correct format characters
  netfilter: conntrack: use fallthrough to cleanup
====================

Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-20 18:05:51 -07:00
Kuniyuki Iwashima
4845b5713a tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
While reading sysctl_tcp_slow_start_after_idle, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 35089bb203 ("[TCP]: Add tcp_slow_start_after_idle sysctl.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
3d72bb4188 udp: Fix a data-race around sysctl_udp_l3mdev_accept.
While reading sysctl_udp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 63a6fff353 ("net: Avoid receiving packets with an l3mdev on unbound UDP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima
9b55c20f83 ip: Fix data-races around sysctl_ip_prot_sock.
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

Fixes: 4548b683b7 ("Introduce a sysctl that modifies the value of PROT_SOCK.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Davide Caratti
ca0cab1192 net/sched: remove qdisc_root_lock() helper
the last caller has been removed with commit 96f5e66e8a ("mac80211: fix
aggregation for hardware with ampdu queues"), so it's safe to remove this
function.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/703d549e3088367651d92a059743f1be848d74b7.1658133689.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 17:14:55 -07:00
Taehee Yoo
30e22a6ebc amt: use workqueue for gateway side message handling
There are some synchronization issues(amt->status, amt->req_cnt, etc)
if the interface is in gateway mode because gateway message handlers
are processed concurrently.
This applies a work queue for processing these messages instead of
expanding the locking context.

So, the purposes of this patch are to fix exist race conditions and to make
gateway to be able to validate a gateway status more correctly.

When the AMT gateway interface is created, it tries to establish to relay.
The establishment step looks stateless, but it should be managed well.
In order to handle messages in the gateway, it saves the current
status(i.e. AMT_STATUS_XXX).
This patch makes gateway code to be worked with a single thread.

Now, all messages except the multicast are triggered(received or
delay expired), and these messages will be stored in the event
queue(amt->events).
Then, the single worker processes stored messages asynchronously one
by one.
The multicast data message type will be still processed immediately.

Now, amt->lock is only needed to access the event queue(amt->events)
if an interface is the gateway mode.

Fixes: cbc21dc1cf ("amt: add data plane of amt interface")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-19 12:37:02 +02:00
Jiri Pirko
f655dacb59 net: devlink: remove unused locked functions
Remove locked versions of functions that are no longer used by anyone.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:48 -07:00
Jiri Pirko
012ec02ae4 netdevsim: convert driver to use unlocked devlink API during init/fini
Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:48 -07:00
Jiri Pirko
eb0e9fa2c6 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
Add unlocked variants of devlink_region_create/destroy() functions
to be used in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:48 -07:00
Jiri Pirko
70a2ff8936 net: devlink: add unlocked variants of devlink_dpipe*() functions
Add unlocked variants of devlink_dpipe*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:47 -07:00
Jiri Pirko
755cfa69c4 net: devlink: add unlocked variants of devlink_sb*() functions
Add unlocked variants of devlink_sb*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:47 -07:00
Jiri Pirko
c223d6a4bf net: devlink: add unlocked variants of devlink_resource*() functions
Add unlocked variants of devlink_resource*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:46 -07:00
Jiri Pirko
852e85a704 net: devlink: add unlocked variants of devling_trap*() functions
Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:46 -07:00
Kuniyuki Iwashima
55be873695 tcp: Fix a data-race around sysctl_tcp_notsent_lowat.
While reading sysctl_tcp_notsent_lowat, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: c9bee3b7fd ("tcp: TCP_NOTSENT_LOWAT socket option")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
39e24435a7 tcp: Fix data-races around some timeout sysctl knobs.
While reading these sysctl knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - tcp_retries1
  - tcp_retries2
  - tcp_orphan_retries
  - tcp_fin_timeout

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
f2f316e287 tcp: Fix data-races around keepalive sysctl knobs.
While reading sysctl_tcp_keepalive_(time|probes|intvl), they can be changed
concurrently.  Thus, we need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Jakub Kicinski
c618db2afe tls: rx: async: hold onto the input skb
Async crypto currently benefits from the fact that we decrypt
in place. When we allow input and output to be different skbs
we will have to hang onto the input while we move to the next
record. Clone the inputs and keep them on a list.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 11:24:11 +01:00
Jakub Kicinski
53d57999fe tls: rx: remove the message decrypted tracking
We no longer allow a decrypted skb to remain linked to ctx->recv_pkt.
Anything on the list is decrypted, anything on ctx->recv_pkt needs
to be decrypted.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 11:24:10 +01:00
Jakub Kicinski
4cbc325ed6 tls: rx: allow only one reader at a time
recvmsg() in TLS gets data from the skb list (rx_list) or fresh
skbs we read from TCP via strparser. The former holds skbs which were
already decrypted for peek or decrypted and partially consumed.

tls_wait_data() only notices appearance of fresh skbs coming out
of TCP (or psock). It is possible, if there is a concurrent call
to peek() and recv() that the peek() will move the data from input
to rx_list without recv() noticing. recv() will then read data out
of order or never wake up.

This is not a practical use case/concern, but it makes the self
tests less reliable. This patch solves the problem by allowing
only one reader in.

Because having multiple processes calling read()/peek() is not
normal avoid adding a lock and try to fast-path the single reader
case.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 11:24:10 +01:00
Wen Gu
4bc5008e43 net/smc: Introduce a sysctl for setting SMC-R buffer type
This patch introduces the sysctl smcr_buf_type for setting
the type of SMC-R sndbufs and RMBs.

Valid values includes:

- SMCR_PHYS_CONT_BUFS, which means use physically contiguous
  buffers for better performance and is the default value.

- SMCR_VIRT_CONT_BUFS, which means use virtually contiguous
  buffers in case of physically contiguous memory is scarce.

- SMCR_MIXED_BUFS, which means first try to use physically
  contiguous buffers. If not available, then use virtually
  contiguous buffers.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 11:19:17 +01:00
Kuniyuki Iwashima
11052589cf tcp/udp: Make early_demux back namespacified.
Commit e21145a987 ("ipv4: namespacify ip_early_demux sysctl knob") made
it possible to enable/disable early_demux on a per-netns basis.  Then, we
introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
TCP/UDP in commit dddb64bcb3 ("net: Add sysctl to toggle early demux for
tcp and udp").  However, the .proc_handler() was wrong and actually
disabled us from changing the behaviour in each netns.

We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
.early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
the change itself is saved in each netns variable, but the .early_demux()
handler is a global variable, so the handler is switched based on the
init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
nothing to do with the logic.  Whether we CAN execute proto .early_demux()
is always decided by init_net's sysctl knob, and whether we DO it or not is
by each netns ip_early_demux knob.

This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
of the .early_demux() handler are TCP and UDP only, and they are called
directly to avoid retpoline.  So, we can remove the .early_demux() handler
from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
If another proto needs .early_demux(), we can restore it at that time.

Fixes: dddb64bcb3 ("net: Add sysctl to toggle early demux for tcp and udp")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-15 18:50:35 -07:00
Kuniyuki Iwashima
08a75f1067 tcp: Fix data-races around sysctl_tcp_l3mdev_accept.
While reading sysctl_tcp_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 6dd9a14e92 ("net: Allow accepted sockets to be bound to l3mdev domain")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
1a0008f9df tcp/dccp: Fix a data-race around sysctl_tcp_fwmark_accept.
While reading sysctl_tcp_fwmark_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 84f39b08d7 ("net: support marking accepting TCP sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
85d0b4dbd7 ip: Fix a data-race around sysctl_fwmark_reflect.
While reading sysctl_fwmark_reflect, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
289d3b21fb ip: Fix data-races around sysctl_ip_nonlocal_bind.
While reading sysctl_ip_nonlocal_bind, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
60c158dc7b ip: Fix data-races around sysctl_ip_fwd_use_pmtu.
While reading sysctl_ip_fwd_use_pmtu, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: f87c10a8aa ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Kuniyuki Iwashima
8281b7ec5c ip: Fix data-races around sysctl_ip_default_ttl.
While reading sysctl_ip_default_ttl, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15 11:49:55 +01:00
Andrei Otcheretianski
3e0278b717 wifi: mac80211: select link when transmitting to non-MLO stations
When an MLO AP is transmitting to a non-MLO station, addr2 should be set
to a link address. This should be done before the frame is encrypted as
otherwise aad verification would fail. In case of software encryption
this can't be left for the device to handle, and should be done by
mac80211 when building the frame hdr.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:23 +02:00
Johannes Berg
7464f66515 wifi: cfg80211: add cfg80211_get_iftype_ext_capa()
Add a helper function cfg80211_get_iftype_ext_capa() to
look up interface type-specific (extended) capabilities.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:23 +02:00
Gregory Greenman
7840bd468a wifi: mac80211: remove link_id parameter from link_info_changed()
Since struct ieee80211_bss_conf already contains link_id,
passing link_id is not necessary.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:20 +02:00
Gregory Greenman
727eff4dd1 wifi: mac80211: replace link_id with link_conf in switch/(un)assign_vif_chanctx()
Since mac80211 already has a protected pointer to link_conf,
pass it to the driver to avoid additional RCU locking.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:20 +02:00
Andrei Otcheretianski
67207bab93 wifi: cfg80211/mac80211: Support control port TX from specific link
In case of authentication with a legacy station, link addressed EAPOL
frames should be sent. Support it.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:19 +02:00
Johannes Berg
4e9c3af398 wifi: nl80211: add EML/MLD capabilities to per-iftype capabilities
We have the per-interface type capabilities, currently for
extended capabilities, add the EML/MLD capabilities there
to have this advertised by the driver.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:19 +02:00
Johannes Berg
19654a61bf wifi: cfg80211: add ieee80211_chanwidth_rate_flags()
To simplify things when we don't have a full chandef,
add ieee80211_chanwidth_rate_flags().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:18 +02:00
Gregory Greenman
b327c84c32 wifi: mac80211: replace link_id with link_conf in start/stop_ap()
When calling start/stop_ap(), mac80211 already has a protected
link_conf pointer. Pass it to the driver, so it shouldn't
handle RCU protection.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:17 +02:00
Johannes Berg
5cd212cb64 wifi: cfg80211: extend cfg80211_rx_assoc_resp() for MLO
Extend the cfg80211_rx_assoc_resp() to cover multiple
BSSes, the AP MLD address and local link addresses
for MLO.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:17 +02:00
Johannes Berg
cd47c0f57a wifi: cfg80211: put cfg80211_rx_assoc_resp() arguments into a struct
For MLO we'll need a lot more arguments, including all the
BSS pointers and link addresses, so move the data to a struct
to be able to extend it more easily later.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:17 +02:00
Johannes Berg
e69dac88a1 wifi: cfg80211: adjust assoc comeback for MLO
We only report the BSSID to userspace, so change the
argument from BSS struct pointer to AP address, which
we'll use to carry either the BSSID or AP MLD address.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:17 +02:00
Johannes Berg
f662d2f4e2 wifi: cfg80211: prepare association failure APIs for MLO
For MLO, we need the ability to report back multiple BSS
structures to release, as well as the AP MLD address (if
attempting to make an MLO connection).

Unify cfg80211_assoc_timeout() and cfg80211_abandon_assoc()
into a new cfg80211_assoc_failure() that gets a structure
parameter with the necessary data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:16 +02:00
Johannes Berg
8f6e0dfc22 wifi: cfg80211: remove BSS pointer from cfg80211_disassoc_request
The race described by the comment in mac80211 hasn't existed
since the locking rework to use the same lock and for MLO we
need to pass the AP MLD address, so just pass the BSSID or
AP MLD address instead of the BSS struct pointer, and adjust
all the code accordingly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:16 +02:00
Johannes Berg
b65567b03c wifi: mac80211: mlme: track AP (MLD) address separately
To prepare a bit more for MLO in the client code,
track the AP's address (for now only the BSSID, but
will track the AP MLD's address later) separately
from the per-link BSSID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:16 +02:00
Johannes Berg
b3e2130bf5 wifi: mac80211: change QoS settings API to take link into account
Take the link into account in the QoS settings (EDCA parameters)
APIs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:15 +02:00
Johannes Berg
a3b8008dc1 wifi: mac80211: move ps setting to vif config
This really shouldn't be in a per-link config, we don't want
to let anyone control it that way (if anything, link powersave
could be forced through APIs to activate/deactivate a link),
and we don't support powersave in software with devices that
can do MLO.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:15 +02:00
Johannes Berg
3fbddae46e wifi: mac80211: provide link ID in link_conf
It might be useful to drivers to be able to pass only the
link_conf pointer, rather than both the pointer and the
link_id; add the link_id to the link_conf to facility that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:15 +02:00
Johannes Berg
23cc6d8c37 wifi: cfg80211: make cfg80211_auth_request::key_idx signed
We might assign -1 to it in some cases when key is NULL,
which means the key_idx isn't used but can lead to a
warning from static checkers such as smatch.

Make the struct member signed simply to avoid that, we
only need a range of -1..3 anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:14 +02:00
Johannes Berg
d8675a6351 wifi: mac80211: RCU-ify link/link_conf pointers
Since links can be added and removed dynamically, we need to
somehow protect the sdata->link[] and vif->link_conf[] array
pointers from disappearing when accessing them without locks.
RCU-ify the pointers to achieve this, which requires quite a
bit of rework.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:14 +02:00
Shaul Triebitz
b95eb7f0ee wifi: cfg80211/mac80211: separate link params from station params
Put the link_station_parameters structure in the station_parameters
structure (and remove the station_parameters fields already existing
in link_station_parameters).
Now, for an MLD station, the default link is added together with
the station.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:13 +02:00
Shaul Triebitz
577e5b8c39 wifi: cfg80211: add API to add/modify/remove a link station
Add an API for adding/modifying/removing a link of a station.

Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-15 11:43:13 +02:00
Jiri Pirko
9a7923668b net: devlink: make devlink_dpipe_headers_register() return void
The return value is not used, so change the return value type to void.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14 21:58:46 -07:00
Jakub Kicinski
816cd16883 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  310731e2f1 ("net: Fix data-races around sysctl_mem.")
  e70f3c7012 ("Revert "net: set SK_MEM_QUANTUM to 4096"")
https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/

net/ipv4/fib_semantics.c
  747c143072 ("ip: fix dflt addr selection for connected nexthop")
  d62607c3fe ("net: rename reference+tracking helpers")

net/tls/tls.h
include/net/tls.h
  3d8c51b25a ("net/tls: Check for errors in tls_device_init")
  5879031423 ("tls: create an internal header")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14 15:27:35 -07:00
Maciej Fijalkowski
ca2e1a6270 xsk: Mark napi_id on sendmsg()
When application runs in busy poll mode and does not receive a single
packet but only sends them, it is currently impossible to get into
napi_busy_loop() as napi_id is only marked on Rx side in xsk_rcv_check().
In there, napi_id is being taken from xdp_rxq_info carried by xdp_buff.
From Tx perspective, we do not have access to it. What we have handy is
the xsk pool.

Xsk pool works on a pool of internal xdp_buff wrappers called xdp_buff_xsk.
AF_XDP ZC enabled drivers call xp_set_rxq_info() so each of xdp_buff_xsk
has a valid pointer to xdp_rxq_info of underlying queue. Therefore, on Tx
side, napi_id can be pulled from xs->pool->heads[0].xdp.rxq->napi_id. Hide
this pointer chase under helper function, xsk_pool_get_napi_id().

Do this only for sockets working in ZC mode as otherwise rxq pointers would
not be initialized.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220707130842.49408-1-maciej.fijalkowski@intel.com
2022-07-14 22:45:34 +02:00
Tariq Toukan
3d8c51b25a net/tls: Check for errors in tls_device_init
Add missing error checks in tls_device_init.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14 10:12:39 -07:00
James Yonan
9d2f00fb0a netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
nf_nat_initialized() doesn't modify passed struct nf_conn,
so declare as const.

This is helpful for code readability and makes it possible
to call nf_nat_initialized() with a const struct nf_conn *.

Signed-off-by: James Yonan <james@openvpn.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-14 00:24:06 +02:00
Zhengchao Shao
bc5c8260f4 net/sched: remove return value of unregister_tcf_proto_ops
Return value of unregister_tcf_proto_ops is unused, remove it.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 14:46:59 +01:00
David S. Miller
736002fb6a A fairly large set of updates for next, highlights:
ath10k
  * ethernet frame format support
 
 rtw89
  * TDLS support
 
 cfg80211/mac80211
  * airtime fairness fixes
  * EHT support continued, especially in AP mode
  * initial (and still major) rework for multi-link
    operation (MLO) from 802.11be/wifi 7
 
 As usual, also many small updates/cleanups/fixes/etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmLOca8ACgkQB8qZga/f
 l8S2sQ//VyUyfPxKTnos4xLm9cZFYbP4/JAl+e1QwbYpa8TtQFMjyiDq+/mTiowA
 gS5qdiAllS75MyxH5LuVJ1fSWe7DmSQ1A733gO4cQUxPUtaUrtXWZpsinYT+Vk4J
 a20kOic/9KCD6j1JFLEFToaDBHxO6Rbqo1knnTuOpMXIV6H/ou0PNlj6Ys66oFLV
 V5SvsoeIfCXsN3j/8JyGgjIC52LiNLam3VfdalParurY8yAxda0ub9IKvYqL/s3M
 PZyuHUc0kJsL/2094sjmn6SKZobjTzrOQcLgq4nPXgspp+8YQ+CUf97QS8nH5rBV
 AOlv7+WOiC9Ext/rBzxwZvjCmJUZSVn44mDMjafzIfTYDn0sB9m4CpqfQpgK5zvC
 mf+jhvI99VuK3S4Zx/xRhNFZMAZZG65zkJKEACclBL2Bcs9A+z12CPIWvalEb3/k
 Hk38VlUIMWPQlbcJW7oVTNH8HNpKIuOCecxKWZC+8MDDb/ZhIYhFqFNMb5TnbOBI
 GMXIDBlfYZgvBKHgwcj9G24QGgm1P+yKGyDcnVH0KPismZwt0gm9R+VX2B4HyBnD
 neT/7wx8yxsm7ujJIF28CM+BnF9vxZKVPGUS6XhS2aarOKanAalybsm9DKLwlArZ
 Qlr2rwaTM+ZkHS82Yapv6At97IYvfiq+ju3b940aL3YrOmgHoqs=
 =smwk
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
A fairly large set of updates for next, highlights:

ath10k
 * ethernet frame format support

rtw89
 * TDLS support

cfg80211/mac80211
 * airtime fairness fixes
 * EHT support continued, especially in AP mode
 * initial (and still major) rework for multi-link
   operation (MLO) from 802.11be/wifi 7

As usual, also many small updates/cleanups/fixes/etc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 14:28:52 +01:00
David S. Miller
67de8acdd3 A small set of fixes for
* queue selection in mesh/ocb
  * queue handling on interface stop
  * hwsim virtio device vs. some other virtio changes
  * dt-bindings email addresses
  * color collision memory allocation
  * a const variable in rtw88
  * shared SKB transmit in the ethernet format path
  * P2P client port authorization
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmLOcFIACgkQB8qZga/f
 l8TK6g//dM2kjGZhyDJUnUicUplN6m4sHLeVqqWCJiUaZepg0Zb3zwEhfEjXnYgn
 nWfFCqyRYN2JgESKFG2LNliAUW954ccu5mAHNoR41SXjwPxPLZblYqdirdtMsbv3
 VM6Ar7WKVWqIer103lUOmiH+tSMObuUhfESbFVByutJfRAcWOolEIJdoAQEmqoKt
 BgU0frkZLGpX9PTzJaT5KmgOnXstrWqdTY1JzLPR93k+fN0kwsOcBtwipqYTombI
 gcnIMb5eY16EHQES9Rf02PIGDe9Oka2+xr9gfOAwFE5JWgh6j6TwHnXBi6UM5mby
 /i6owhSS9km1rwTzsqJnpC89zZ1E26e5W7i6tDdQ+70OorSgPjMOGiyPNP+1KX0x
 P9CfFGV6c2CICCfylva7lQXoBkAUn9uQsimGBOzYY3eWt5gYZKrwNistLKlrZQca
 qRMRCXApfPvcyPvkX4DEuiJDgi+74nUqm0okIHLVHN4QfAuoq22DzTlTlFiF6OCJ
 Fj5URCCfwyuwNtaF0W6IH8PnhkD8VQjYHH0RqclQAUaS5yJxj4x///GTGPwYDCxe
 JcbASQfDOK1QmN4C3vOweym9J5jUdJR4fbvuj2iJhL0qQLrQZrKHoPfu8J5G4EyC
 rtHAVmz8eI+IQtYsppRpQbRpNtmcj773FXhQ2wNqkZ6Y7i/GtFE=
 =GrDi
 -----END PGP SIGNATURE-----

Merge tag 'wireless-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Johannes Berg says:

====================
A small set of fixes for
 * queue selection in mesh/ocb
 * queue handling on interface stop
 * hwsim virtio device vs. some other virtio changes
 * dt-bindings email addresses
 * color collision memory allocation
 * a const variable in rtw88
 * shared SKB transmit in the ethernet format path
 * P2P client port authorization
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 14:27:38 +01:00
Jiri Pirko
277cbb6bc4 net: devlink: move unlocked function prototypes alongside the locked ones
Maintain the same order as it is in devlink.c for function prototypes.
The most of the locked variants would very likely soon be removed
and the unlocked version would be the only one.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 13:49:44 +01:00
Kuniyuki Iwashima
1dace01492 raw: Fix a data-race around sysctl_raw_l3mdev_accept.
While reading sysctl_raw_l3mdev_accept, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 6897445fb1 ("net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 12:56:49 +01:00
Maksym Glubokiy
83d85bb069 net: extract port range fields from fl_flow_key
So it can be used for port range filter offloading.

Co-developed-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Volodymyr Mytnyk <volodymyr.mytnyk@plvision.eu>
Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-13 12:16:56 +01:00
Zhengchao Shao
5022e221c9 net: change the type of ip_route_input_rcu to static
The type of ip_route_input_rcu should be static.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220711073549.8947-1-shaozhengchao@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12 15:08:45 +02:00
Moshe Shemesh
df539fc62b devlink: Remove unused functions devlink_rate_leaf_create/destroy
The previous patch removed the last usage of the functions
devlink_rate_leaf_create() and devlink_rate_nodes_destroy(). Thus,
remove these function from devlink API.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12 10:26:22 +02:00
Moshe Shemesh
868232f5cd devlink: Remove unused function devlink_rate_nodes_destroy
The previous patch removed the last usage of the function
devlink_rate_nodes_destroy(). Thus, remove this function from devlink
API.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-12 10:26:22 +02:00
Christophe JAILLET
2b8bf3d6c9 net/fq_impl: Use the bitmap API to allocate bitmaps
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/c7bf099af07eb497b02d195906ee8c11fea3b3bd.1657377335.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-11 19:49:38 -07:00
Florian Westphal
6b77205374 netfilter: nf_tables: move nft_cmp_fast_mask to where its used
... and cast result to u32 so sparse won't complain anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Florian Westphal
7278b3c1e4 netfilter: nf_tables: add and use BE register load-store helpers
Same as the existing ones, no conversions. This is just for sparse sake
only so that we no longer mix be16/u16 and be32/u32 types.

Alternative is to add __force __beX in various places, but this
seems nicer.

objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:40:46 +02:00
Florian Westphal
6976890e89 netfilter: nf_conntrack: add missing __rcu annotations
Access to the hook pointers use correct helpers but the pointers lack
the needed __rcu annotation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:15 +02:00
Vlad Buslov
b038177636 netfilter: nf_flow_table: count pending offload workqueue tasks
To improve hardware offload debuggability count pending 'add', 'del' and
'stats' flow_table offload workqueue tasks. Counters are incremented before
scheduling new task and decremented when workqueue handler finishes
executing. These counters allow user to diagnose congestion on hardware
offload workqueues that can happen when either CPU is starved and workqueue
jobs are executed at lower rate than new ones are added or when
hardware/driver can't keep up with the rate.

Implement the described counters as percpu counters inside new struct
netns_ft which is stored inside struct net. Expose them via new procfs file
'/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack'
file.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-11 16:25:14 +02:00
sewookseo
e22aa14866 net: Find dst with sk's xfrm policy not ctl_sk
If we set XFRM security policy by calling setsockopt with option
IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock'
struct. However tcp_v6_send_response doesn't look up dst_entry with the
actual socket but looks up with tcp control socket. This may cause a
problem that a RST packet is sent without ESP encryption & peer's TCP
socket can't receive it.
This patch will make the function look up dest_entry with actual socket,
if the socket has XFRM policy(sock_policy), so that the TCP response
packet via this function can be encrypted, & aligned on the encrypted
TCP socket.

Tested: We encountered this problem when a TCP socket which is encrypted
in ESP transport mode encryption, receives challenge ACK at SYN_SENT
state. After receiving challenge ACK, TCP needs to send RST to
establish the socket at next SYN try. But the RST was not encrypted &
peer TCP socket still remains on ESTABLISHED state.
So we verified this with test step as below.
[Test step]
1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED).
2. Client tries a new connection on the same TCP ports(src & dst).
3. Server will return challenge ACK instead of SYN,ACK.
4. Client will send RST to server to clear the SOCKET.
5. Client will retransmit SYN to server on the same TCP ports.
[Expected result]
The TCP connection should be established.

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Sehee Lee <seheele@google.com>
Signed-off-by: Sewook Seo <sewookseo@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 13:39:56 +01:00
David S. Miller
e45955766b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) refcount_inc_not_zero() is not semantically equivalent to
   atomic_int_not_zero(), from Florian Westphal. My understanding was
   that refcount_*() API provides a wrapper to easier debugging of
   reference count leaks, however, there are semantic differences
   between these two APIs, where refcount_inc_not_zero() needs a barrier.
   Reason for this subtle difference to me is unknown.

2) packet logging is not correct for ARP and IP packets, from the
   ARP family and netdev/egress respectively. Use skb_network_offset()
   to reach the headers accordingly.

3) set element extension length have been growing over time, replace
   a BUG_ON by EINVAL which might be triggerable from userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-11 11:58:38 +01:00
Jakub Kicinski
0076cad301 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-07-09

We've added 94 non-merge commits during the last 19 day(s) which contain
a total of 125 files changed, 5141 insertions(+), 6701 deletions(-).

The main changes are:

1) Add new way for performing BTF type queries to BPF, from Daniel Müller.

2) Add inlining of calls to bpf_loop() helper when its function callback is
   statically known, from Eduard Zingerman.

3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz.

4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM
   hooks, from Stanislav Fomichev.

5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko.

6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky.

7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP
   selftests, from Magnus Karlsson & Maciej Fijalkowski.

8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet.

9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki.

10) Sockmap optimizations around throughput of UDP transmissions which have been
    improved by 61%, from Cong Wang.

11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa.

12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend.

13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang.

14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma
    macro, from James Hilliard.

15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan.

16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits)
  selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n
  bpf: Correctly propagate errors up from bpf_core_composites_match
  libbpf: Disable SEC pragma macro on GCC
  bpf: Check attach_func_proto more carefully in check_return_code
  selftests/bpf: Add test involving restrict type qualifier
  bpftool: Add support for KIND_RESTRICT to gen min_core_btf command
  MAINTAINERS: Add entry for AF_XDP selftests files
  selftests, xsk: Rename AF_XDP testing app
  bpf, docs: Remove deprecated xsk libbpf APIs description
  selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage
  libbpf, riscv: Use a0 for RC register
  libbpf: Remove unnecessary usdt_rel_ip assignments
  selftests/bpf: Fix few more compiler warnings
  selftests/bpf: Fix bogus uninitialized variable warning
  bpftool: Remove zlib feature test from Makefile
  libbpf: Cleanup the legacy uprobe_event on failed add/attach_event()
  libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy()
  libbpf: Cleanup the legacy kprobe_event on failed add/attach_event()
  selftests/bpf: Add type match test against kernel's task_struct
  selftests/bpf: Add nested type to type based tests
  ...
====================

Link: https://lore.kernel.org/r/20220708233145.32365-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-09 12:24:16 -07:00
Pablo Neira Ayuso
c39ba4de6b netfilter: nf_tables: replace BUG_ON by element length check
BUG_ON can be triggered from userspace with an element with a large
userdata area. Replace it by length check and return EINVAL instead.
Over time extensions have been growing in size.

Pick a sufficiently old Fixes: tag to propagate this fix.

Fixes: 7d7402642e ("netfilter: nf_tables: variable sized set element keys / data")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-09 16:25:09 +02:00
Geliang Tang
f7657ff4a7 mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.h
Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to
include/net/mptcp.h.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-09 12:19:23 +01:00
Kent Overstreet
8b11ff098a 9p: Add client parameter to p9_req_put()
This is to aid in adding mempools, in the next patch.

Link: https://lkml.kernel.org/r/20220704014243.153050-2-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-07-09 14:38:35 +09:00
Kent Overstreet
6cda12864c 9p: Drop kref usage
An upcoming patch is going to require passing the client through
p9_req_put() -> p9_req_free(), but that's awkward with the kref
indirection - so this patch switches to using refcount_t directly.

Link: https://lkml.kernel.org/r/20220704014243.153050-1-kent.overstreet@gmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-07-09 14:38:12 +09:00
Jakub Kicinski
5879031423 tls: create an internal header
include/net/tls.h is getting a little long, and is probably hard
for driver authors to navigate. Split out the internals into a
header which will live under net/tls/. While at it move some
static inlines with a single user into the source files, add
a few tls_ prefixes and fix spelling of 'proccess'.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
50a07aa531 tls: rx: always allocate max possible aad size for decrypt
AAD size is either 5 or 13. Really no point complicating
the code for the 8B of difference. This will also let us
turn the chunked up buffer into a sane struct.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:45 -07:00
Jakub Kicinski
2d91ecace6 strparser: pad sk_skb_cb to avoid straddling cachelines
sk_skb_cb lives within skb->cb[]. skb->cb[] straddles
2 cache lines, each containing 24B of data.
The first cache line does not contain much interesting
information for users of strparser, so pad things a little.
Previously strp_msg->full_len would live in the first cache
line and strp_msg->offset in the second.

We need to reorder the 8 byte temp_reg with struct tls_msg
to prevent a 4B hole which would push the struct over 48B.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08 18:38:44 -07:00
Kuniyuki Iwashima
310731e2f1 net: Fix data-races around sysctl_mem.
While reading .sysctl_mem, it can be changed concurrently.
So, we need to add READ_ONCE() to avoid data-races.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08 12:10:33 +01:00
Jakub Kicinski
83ec88d81a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07 12:07:37 -07:00
Jakub Kicinski
88527790c0 tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3
Since optimisitic decrypt may add extra load in case of retries
require socket owner to explicitly opt-in.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06 12:56:35 +01:00
Vlad Buslov
052f744f44 net/sched: act_police: allow 'continue' action offload
Offloading police with action TC_ACT_UNSPEC was erroneously disabled even
though it was supported by mlx5 matchall offload implementation, which
didn't verify the action type but instead assumed that any single police
action attached to matchall classifier is a 'continue' action. Lack of
action type check made it non-obvious what mlx5 matchall implementation
actually supports and caused implementers and reviewers of referenced
commits to disallow it as a part of improved validation code.

Fixes: b8cd5831c6 ("net: flow_offload: add tc police action parameters")
Fixes: b50e462bc2 ("net/sched: act_police: Add extack messages for offload failure")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-06 12:44:39 +01:00
Vladimir Oltean
d7be266adb net: sched: provide shim definitions for taprio_offload_{get,free}
All callers of taprio_offload_get() and taprio_offload_free() prior to
the blamed commit are conditionally compiled based on CONFIG_NET_SCH_TAPRIO.

felix_vsc9959.c is different; it provides vsc9959_qos_port_tas_set()
even when taprio is compiled out.

Provide shim definitions for the functions exported by taprio so that
felix_vsc9959.c is able to compile. vsc9959_qos_port_tas_set() in that
case is dead code anyway, and ocelot_port->taprio remains NULL, which is
fine for the rest of the logic.

Fixes: 1c9017e44a ("net: dsa: felix: keep reference on entire tc-taprio config")
Reported-by: Colin Foster <colin.foster@in-advantage.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20220704190241.1288847-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-05 17:50:38 -07:00
Prasanna Vengateshan
092f875131 net: dsa: tag_ksz: add tag handling for Microchip LAN937x
The Microchip LAN937X switches have a tagging protocol which is
very similar to KSZ tagging. So that the implementation is added to
tag_ksz.c and reused common APIs

Signed-off-by: Prasanna Vengateshan <prasanna.vengateshan@microchip.com>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-02 16:34:05 +01:00
Dominique Martinet
286c171b86 9p fid refcount: add a 9p_fid_ref tracepoint
This adds a tracepoint event for 9p fid lifecycle tracing: when a fid
is created, its reference count increased/decreased, and freed.
The new 9p_fid_ref tracepoint should help anyone wishing to debug any
fid problem such as missing clunk (destroy) or use-after-free.

Link: https://lkml.kernel.org/r/20220612085330.1451496-6-asmadeus@codewreck.org
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-07-02 18:52:21 +09:00
Dominique Martinet
b48dbb998d 9p fid refcount: add p9_fid_get/put wrappers
I was recently reminded that it is not clear that p9_client_clunk()
was actually just decrementing refcount and clunking only when that
reaches zero: make it clear through a set of helpers.

This will also allow instrumenting refcounting better for debugging
next patch

Link: https://lkml.kernel.org/r/20220612085330.1451496-5-asmadeus@codewreck.org
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-07-02 18:52:21 +09:00
Paolo Abeni
e918c137db net: remove SK_RECLAIM_THRESHOLD and SK_RECLAIM_CHUNK
There are no more users for the mentioned macros, just
drop them.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-01 13:25:00 +01:00
Aloka Dixit
8bc65d38ee wifi: nl80211: retrieve EHT related elements in AP mode
Add support to retrieve EHT capabilities and EHT operation elements
passed by the userspace in the beacon template and store the pointers
in struct cfg80211_ap_settings to be used by the drivers.

Co-developed-by: Vikram Kandukuri <quic_vikram@quicinc.com>
Signed-off-by: Vikram Kandukuri <quic_vikram@quicinc.com>
Co-developed-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Link: https://lore.kernel.org/r/20220523064904.28523-1-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 12:37:54 +02:00
Veerendranath Jakkam
ecad3b0b99 wifi: cfg80211: Increase akm_suites array size in cfg80211_crypto_settings
Increase akm_suites array size in struct cfg80211_crypto_settings to 10
and advertise the capability to userspace. This allows userspace to send
more than two AKMs to driver in netlink commands such as
NL80211_CMD_CONNECT.

This capability is needed for implementing WPA3-Personal transition mode
correctly with any driver that handles roaming internally. Currently,
the possible AKMs for multi-AKM connect can include PSK, PSK-SHA-256,
SAE, FT-PSK and FT-SAE. Since the count is already 5, increasing
the akm_suites array size to 10 should be reasonable for future
usecases.

Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/1653312358-12321-1-git-send-email-quic_vjakkam@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 12:07:08 +02:00
Felix Fietkau
942741dabc wifi: mac80211: switch airtime fairness back to deficit round-robin scheduling
This reverts commits 6a789ba679 and
2433647bc8.

The virtual time scheduler code has a number of issues:
- queues slowed down by hardware/firmware powersave handling were not properly
  handled.
- on ath10k in push-pull mode, tx queues that the driver tries to pull from
  were starved, causing excessive latency
- delay between tx enqueue and reported airtime use were causing excessively
  bursty tx behavior

The bursty behavior may also be present on the round-robin scheduler, but there
it is much easier to fix without introducing additional regressions

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://lore.kernel.org/r/20220625212411.36675-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 10:51:41 +02:00
Johannes Berg
7f884baae6 wifi: mac80211: fix a kernel-doc complaint
Somehow kernel-doc complains here about strong markup, but
we really don't need the [] so just remove that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 10:42:40 +02:00
Johannes Berg
c8a9415e6d wifi: cfg80211: remove redundant documentation
These struct members no longer exist, remove them
from documentation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 10:36:36 +02:00
Mauro Carvalho Chehab
82757b792b wifi: mac80211: add a missing comma at kernel-doc markup
The lack of the colon makes it not parse the function parameter:
	include/net/mac80211.h:6250: warning: Function parameter or member 'vif' not described in 'ieee80211_channel_switch_disconnect'

Fix it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/11c1bdb861d89c93058fcfe312749b482851cbdb.1656409369.git.mchehab@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 10:30:05 +02:00
Mauro Carvalho Chehab
2d8b08fef0 wifi: cfg80211: fix kernel-doc warnings all over the file
There are currently 17 kernel-doc warnings on this file:
	include/net/cfg80211.h:391: warning: Function parameter or member 'bw' not described in 'ieee80211_eht_mcs_nss_supp'
	include/net/cfg80211.h:437: warning: Function parameter or member 'eht_cap' not described in 'ieee80211_sband_iftype_data'
	include/net/cfg80211.h:507: warning: Function parameter or member 's1g' not described in 'ieee80211_sta_s1g_cap'
	include/net/cfg80211.h:1390: warning: Function parameter or member 'counter_offset_beacon' not described in 'cfg80211_color_change_settings'
	include/net/cfg80211.h:1390: warning: Function parameter or member 'counter_offset_presp' not described in 'cfg80211_color_change_settings'
	include/net/cfg80211.h:1430: warning: Enum value 'STATION_PARAM_APPLY_STA_TXPOWER' not described in enum 'station_parameters_apply_mask'
	include/net/cfg80211.h:2195: warning: Function parameter or member 'dot11MeshConnectedToAuthServer' not described in 'mesh_config'
	include/net/cfg80211.h:2341: warning: Function parameter or member 'short_ssid' not described in 'cfg80211_scan_6ghz_params'
	include/net/cfg80211.h:3328: warning: Function parameter or member 'kck_len' not described in 'cfg80211_gtk_rekey_data'
	include/net/cfg80211.h:3698: warning: Function parameter or member 'ftm' not described in 'cfg80211_pmsr_result'
	include/net/cfg80211.h:3828: warning: Function parameter or member 'global_mcast_stypes' not described in 'mgmt_frame_regs'
	include/net/cfg80211.h:4977: warning: Function parameter or member 'ftm' not described in 'cfg80211_pmsr_capabilities'
	include/net/cfg80211.h:5742: warning: Function parameter or member 'u' not described in 'wireless_dev'
	include/net/cfg80211.h:5742: warning: Function parameter or member 'links' not described in 'wireless_dev'
	include/net/cfg80211.h:5742: warning: Function parameter or member 'valid_links' not described in 'wireless_dev'
	include/net/cfg80211.h:6076: warning: Function parameter or member 'is_amsdu' not described in 'ieee80211_data_to_8023_exthdr'
	include/net/cfg80211.h:6949: warning: Function parameter or member 'sig_dbm' not described in 'cfg80211_notify_new_peer_candidate'

Address them, in order to build a better documentation from this
header.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/f6f522cdc716a01744bb0eae2186f4592976222b.1656409369.git.mchehab@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-01 10:28:55 +02:00
Jakub Kicinski
0d8730f07c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c
  9c5de246c1 ("net: sparx5: mdb add/del handle non-sparx5 devices")
  fbb89d02e3 ("net: sparx5: Allow mdb entries to both CPU and ports")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-30 16:31:00 -07:00
Yuwei Wang
211da42eaa net, neigh: introduce interval_probe_time_ms for periodic probe
commit ed6cd6a178 ("net, neigh: Set lower cap for neigh_managed_work rearming")
fixed a case when DELAY_PROBE_TIME is configured to 0, the processing of the
system work queue hog CPU to 100%, and further more we should introduce
a new option used by periodic probe

Signed-off-by: Yuwei Wang <wangyuweihx@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-30 13:14:35 +02:00
Vladimir Oltean
3eb4a4c344 net: switchdev: add reminder near struct switchdev_notifier_fdb_info
br_switchdev_fdb_notify() creates an on-stack FDB info variable, and
initializes it member by member. As such, newly added fields which are
not initialized by br_switchdev_fdb_notify() will contain junk bytes
from the stack.

Other uses of struct switchdev_notifier_fdb_info have a struct
initializer which should put zeroes in the uninitialized fields.

Add a reminder above the structure for future developers. Recently
discussed during review.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-2-schultz.hans+netdev@gmail.com/#24877698
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220524152144.40527-3-schultz.hans+netdev@gmail.com/#24912269
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220628100831.2899434-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-29 20:37:36 -07:00
Oleksij Rempel
3d410403a5 net: dsa: add get_pause_stats support
Add support for pause stats

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-29 20:17:11 -07:00
Lorenzo Bianconi
03895c8414 wifi: mac80211: add gfp_t parameter to ieeee80211_obss_color_collision_notify
Introduce the capability to specify gfp_t parameter to
ieeee80211_obss_color_collision_notify routine since it runs in
interrupt context in ieee80211_rx_check_bss_color_collision().

Fixes: 6d945a33f2 ("mac80211: introduce BSS color collision detection")
Co-developed-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/02c990fb3fbd929c8548a656477d20d6c0427a13.1655419135.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-29 11:43:15 +02:00
Florian Westphal
e34b9ed96c netfilter: nf_tables: avoid skb access on nf_stolen
When verdict is NF_STOLEN, the skb might have been freed.

When tracing is enabled, this can result in a use-after-free:
1. access to skb->nf_trace
2. access to skb->mark
3. computation of trace id
4. dump of packet payload

To avoid 1, keep a cached copy of skb->nf_trace in the
trace state struct.
Refresh this copy whenever verdict is != STOLEN.

Avoid 2 by skipping skb->mark access if verdict is STOLEN.

3 is avoided by precomputing the trace id.

Only dump the packet when verdict is not "STOLEN".

Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-06-27 19:22:54 +02:00
Clément Léger
a08d6a6dc8 net: dsa: add Renesas RZ/N1 switch tag driver
The switch that is present on the Renesas RZ/N1 SoC uses a specific
VLAN value followed by 6 bytes which contains forwarding configuration.

Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-27 11:37:55 +01:00
Clément Léger
67f38b1c73 net: dsa: add support for ethtool get_rmon_stats()
Add support to allow dsa drivers to specify the .get_rmon_stats()
operation.

Signed-off-by: Clément Léger <clement.leger@bootlin.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-27 11:37:55 +01:00
Richard Gobert
ede57d58e6 net: helper function skb_len_add
Move the len fields manipulation in the skbs to a helper function.
There is a comment specifically requesting this and there are several
other areas in the code displaying the same pattern which can be
refactored.
This improves code readability.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Link: https://lore.kernel.org/r/20220622160853.GA6478@debian
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24 16:24:38 -07:00
Hangbin Liu
0a2ff7cc8a Bonding: add per-port priority for failover re-selection
Add per port priority support for bonding active slave re-selection during
failover. A higher number means higher priority in selection. The primary
slave still has the highest priority. This option also follows the
primary_reselect rules.

This option could only be configured via netlink.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-24 11:27:59 +01:00
Hangbin Liu
f2b3b28ce5 bonding: add slave_dev field for bond_opt_value
Currently, bond_opt_value are mostly used for bonding option settings. If
we want to set a value for slave, we need to re-alloc a string to store
both slave name and vlaue, like bond_option_queue_id_set() does, which
is complex and dumb.

As Jon suggested, let's add a union field slave_dev for bond_opt_value,
which will be benefit for future slave option setting. In function
__bond_opt_init(), we will always check the extra field and set it
if it's not NULL.

Suggested-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-24 11:27:59 +01:00
Zhengchao Shao
f41b284a2c xfrm: change the type of xfrm_register_km and xfrm_unregister_km
Functions xfrm_register_km and xfrm_unregister_km do always return 0,
change the type of functions to void.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-06-24 10:19:11 +02:00
Jakub Kicinski
93817be8b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-23 12:33:24 -07:00
Jakub Kicinski
e34a07c0ae sock: redo the psock vs ULP protection check
Commit 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to
the new tcp_bpf_update_proto() function. I'm guessing that this
was done to allow creating psocks for non-inet sockets.

Unfortunately the destruction path for psock includes the ULP
unwind, so we need to fail the sk_psock_init() itself.
Otherwise if ULP is already present we'll notice that later,
and call tcp_update_ulp() with the sk_proto of the ULP
itself, which will most likely result in the ULP looping
its callbacks.

Fixes: 8a59f9d1e3 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-23 10:08:30 +02:00
Kuniyuki Iwashima
2f7ca90a01 af_unix: Remove unix_table_locks.
unix_table_locks are to protect the global hash table, unix_socket_table.
The previous commit removed it, so let's clean up the unnecessary locks.

Here is a test result on EC2 c5.9xlarge where 10 processes run concurrently
in different netns and bind 100,000 sockets for each.

  without this series : 1m 38s
  with this series    :    11s

It is ~10x faster because the global hash table is split into 10 netns in
this case.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-22 12:59:43 +01:00
Kuniyuki Iwashima
cf2f225e26 af_unix: Put a socket into a per-netns hash table.
This commit replaces the global hash table with a per-netns one and removes
the global one.

We now link a socket in each netns's hash table so we can save some netns
comparisons when iterating through a hash bucket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-22 12:59:43 +01:00
Kuniyuki Iwashima
b6e8113830 af_unix: Define a per-netns hash table.
This commit adds a per netns hash table for AF_UNIX, which size is fixed
as UNIX_HASH_SIZE for now.

The first implementation defines a per-netns hash table as a single array
of lock and list:

	struct unix_hashbucket {
		spinlock_t		lock;
		struct hlist_head	head;
	};

	struct netns_unix {
		struct unix_hashbucket	*hash;
		...
	};

But, Eric pointed out memory cost that the structure has holes because of
sizeof(spinlock_t), which is 4 (or more if LOCKDEP is enabled). [0]  It
could be expensive on a host with thousands of netns and few AF_UNIX
sockets.  For this reason, a per-netns hash table uses two dense arrays.

	struct unix_table {
		spinlock_t		*locks;
		struct hlist_head	*buckets;
	};

	struct netns_unix {
		struct unix_table	table;
		...
	};

Note the length of the list has a significant impact rather than lock
contention, so having shared locks can be an option.  But, per-netns
locks and lists still perform better than the global locks and per-netns
lists. [1]

Also, this patch adds a change so that struct netns_unix disappears from
struct net if CONFIG_UNIX is disabled.

[0]: https://lore.kernel.org/netdev/CANn89iLVxO5aqx16azNU7p7Z-nz5NrnM5QTqOzueVxEnkVTxyg@mail.gmail.com/
[1]: https://lore.kernel.org/netdev/20220617175215.1769-1-kuniyu@amazon.com/

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-22 12:59:43 +01:00
Kuniyuki Iwashima
f302d180c6 af_unix: Include the whole hash table size in UNIX_HASH_SIZE.
Currently, the size of AF_UNIX hash table is UNIX_HASH_SIZE * 2,
the first half for bind()ed sockets and the second half for unbound
ones.  UNIX_HASH_SIZE * 2 is used to define the table and iterate
over it.

In some places, we use ARRAY_SIZE(unix_socket_table) instead of
UNIX_HASH_SIZE * 2.  However, we cannot use it anymore because we
will allocate the hash table dynamically.  Then, we would have to
add UNIX_HASH_SIZE * 2 in many places, which would be troublesome.

This patch adapts the UNIX_HASH_SIZE definition to include bound
and unbound sockets and defines a new UNIX_HASH_MOD macro to ease
calculations.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-22 12:59:43 +01:00
Eric Dumazet
af185d8c76 raw: complete rcu conversion
raw_diag_dump() can use rcu_read_lock() instead of read_lock()

Now the hashinfo lock is only used from process context,
in write mode only, we can convert it to a spinlock,
and we do not need to block BH anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220620100509.3493504-1-eric.dumazet@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-06-21 11:38:29 +02:00
Cong Wang
965b57b469 net: Introduce a new proto_ops ->read_skb()
Currently both splice() and sockmap use ->read_sock() to
read skb from receive queue, but for sockmap we only read
one entire skb at a time, so ->read_sock() is too conservative
to use. Introduce a new proto_ops ->read_skb() which supports
this sematic, with this we can finally pass the ownership of
skb to recv actors.

For non-TCP protocols, all ->read_sock() can be simply
converted to ->read_skb().

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com
2022-06-20 14:05:52 +02:00