Commit Graph

70385 Commits

Author SHA1 Message Date
Li Qiong
fc54cb8d87 libceph: check pointer before assigned to "c->rules[]"
It should be better to check pointer firstly, then assign it
to c->rules[]. Refine code a little bit.

Signed-off-by: Li Qiong <liqiong@nfschina.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-08-03 00:54:12 +02:00
Linus Torvalds
42df1cbf6a for-5.20/io_uring-zerocopy-send-2022-07-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm/MQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoaXD/9Nevo4KQmlG83ZcZfu2d51VlGtt6/Dl7LL
 pr07RfnRFJcjeCPCwXCXmu6rrlY+inpfEWv9iCR/ImoeESOJCzm0dN/nlffO/zT1
 E0h5AlEoDv2bYrCnVkbfvxL722TZqGeLiDE4YY1jVbuUfs3TDmLQzfGbORK+Zw4y
 wPEMDZP1yWHoyeHUGWFasu6dpWiAwsZ4sTX0J631YwIBDNWKZqtienIiY15rK4dz
 GioBea6voe8Fos0VEhCBOKXMmV9mG4yVOPeaDbTWTRfuzGNF8b7t2vg7mz+PrbBY
 M8h1oEt+/+FnsCIZqfaEUzqHX6quv46OVtq/F5L3yNz/5QEsnqfv08ZFwD3sXdgZ
 /RFxXamfcn/LoxzZ9eLu3MeyzpXp6frxBcgTNGc3q2TlIwXr1WsIx2N4PxZh00GM
 ssW/ulaOZvZmOmDlbdeSC7sp3R1JmHO4qVlHowr58ce8pkishNTwlZZGr0sHyeNq
 /Wkd9NQEQEFD6AIzZ/Mz9CsmzHeHYpy6GhicFrcLuU4YF/fnQ6T4hTjlIlucGv/S
 IeqoAHrurCB0/p1ml6VfJ58xUWXNCCCkKC5+xu8Vm6/RgMlIw5KkzvVEBfflnomB
 wVJLYsLw41gnlqqpwISR39I7cDV+s6xC5P8YAA/NLz692HDIUrRX14dlbZuXIgbc
 ROeHB2N5+g==
 =vSwm
 -----END PGP SIGNATURE-----

Merge tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block

Pull io_uring zerocopy support from Jens Axboe:
 "This adds support for efficient support for zerocopy sends through
  io_uring. Both ipv4 and ipv6 is supported, as well as both TCP and
  UDP.

  The core network changes to support this is in a stable branch from
  Jakub that both io_uring and net-next has pulled in, and the io_uring
  changes are layered on top of that.

  All of the work has been done by Pavel"

* tag 'for-5.20/io_uring-zerocopy-send-2022-07-29' of git://git.kernel.dk/linux-block: (34 commits)
  io_uring: notification completion optimisation
  io_uring: export req alloc from core
  io_uring/net: use unsigned for flags
  io_uring/net: make page accounting more consistent
  io_uring/net: checks errors of zc mem accounting
  io_uring/net: improve io_get_notif_slot types
  selftests/io_uring: test zerocopy send
  io_uring: enable managed frags with register buffers
  io_uring: add zc notification flush requests
  io_uring: rename IORING_OP_FILES_UPDATE
  io_uring: flush notifiers after sendzc
  io_uring: sendzc with fixed buffers
  io_uring: allow to pass addr into sendzc
  io_uring: account locked pages for non-fixed zc
  io_uring: wire send zc request type
  io_uring: add notification slot registration
  io_uring: add rsrc referencing for notifiers
  io_uring: complete notifiers in tw
  io_uring: cache struct io_notif
  io_uring: add zc notification infrastructure
  ...
2022-08-02 13:37:55 -07:00
Linus Torvalds
b349b1181d for-5.20/io_uring-2022-07-29
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmLkm5gQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmKMD/4l3QIrLbjYIxlfrzQcHbmYuUkbQtj3SbZg
 6ejbnGVhCs1P9DdXH8MgE2BxgpiXQE0CqOK7vbSoo5ep2n2UTLI2DIxAl74SMIo7
 0wmJXtUJySuViKr3NYVHqlN180MkQYddBz0nGElhkQBPBCMhW8CrtPCeURr/YyHp
 2RxSYBXiUx2gRyig+klnp6oPEqelcBZJUyNHdA9yVrgl/RhB/t2rKj7D++8ukQM3
 Zuyh8WIkTeTfUz9hdGG7fuCEdZN4DlO2CCEc7uy0cKi6VRCKH4hYUCqClJ+/cfd2
 43dUI2O7B6D1t/ObFh8AGIDXBDqVA6ePQohQU6gooRkfQiBPKkc9d0ts4yIhRqca
 AjkzNM+0Eve3A01loJ8J84w8oZnvNpYEv5n8/sZVLWcyU3UIs0I88nC2OBiFtoRq
 d77CtFLwOTo+r3STtAhnZOqez90rhS6BqKtqlUP346PCuFItl6/MbGtwdTbLYEFj
 CVNIb2pERWSr2NxGv4lFyXaX/cRwruxojWH7yc3rRYjr4Ykevd1pe/fMGNiMAnKw
 5em/3QU3qq0ZVcXLMihksKeHHFIQwGDRMuyuv/fktV10+yYXQ0t16WzkJT3aR8Xo
 cqs0r8+6Jnj3uYcOMzj/FoLcpEPr21hnwAtzLto1mG1Wh4JRn/D7Nx5zqxPLxcW+
 NiU6VihPOw==
 =gxeV
 -----END PGP SIGNATURE-----

Merge tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - As per (valid) complaint in the last merge window, fs/io_uring.c has
   grown quite large these days. io_uring isn't really tied to fs
   either, as it supports a wide variety of functionality outside of
   that.

   Move the code to io_uring/ and split it into files that either
   implement a specific request type, and split some code into helpers
   as well. The code is organized a lot better like this, and io_uring.c
   is now < 4K LOC (me).

 - Deprecate the epoll_ctl opcode. It'll still work, just trigger a
   warning once if used. If we don't get any complaints on this, and I
   don't expect any, then we can fully remove it in a future release
   (me).

 - Improve the cancel hash locking (Hao)

 - kbuf cleanups (Hao)

 - Efficiency improvements to the task_work handling (Dylan, Pavel)

 - Provided buffer improvements (Dylan)

 - Add support for recv/recvmsg multishot support. This is similar to
   the accept (or poll) support for have for multishot, where a single
   SQE can trigger everytime data is received. For applications that
   expect to do more than a few receives on an instantiated socket, this
   greatly improves efficiency (Dylan).

 - Efficiency improvements for poll handling (Pavel)

 - Poll cancelation improvements (Pavel)

 - Allow specifiying a range for direct descriptor allocations (Pavel)

 - Cleanup the cqe32 handling (Pavel)

 - Move io_uring types to greatly cleanup the tracing (Pavel)

 - Tons of great code cleanups and improvements (Pavel)

 - Add a way to do sync cancelations rather than through the sqe -> cqe
   interface, as that's a lot easier to use for some use cases (me).

 - Add support to IORING_OP_MSG_RING for sending direct descriptors to a
   different ring. This avoids the usually problematic SCM case, as we
   disallow those. (me)

 - Make the per-command alloc cache we use for apoll generic, place
   limits on it, and use it for netmsg as well (me).

 - Various cleanups (me, Michal, Gustavo, Uros)

* tag 'for-5.20/io_uring-2022-07-29' of git://git.kernel.dk/linux-block: (172 commits)
  io_uring: ensure REQ_F_ISREG is set async offload
  net: fix compat pointer in get_compat_msghdr()
  io_uring: Don't require reinitable percpu_ref
  io_uring: fix types in io_recvmsg_multishot_overflow
  io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
  io_uring: support multishot in recvmsg
  net: copy from user before calling __get_compat_msghdr
  net: copy from user before calling __copy_msghdr
  io_uring: support 0 length iov in buffer select in compat
  io_uring: fix multishot ending when not polled
  io_uring: add netmsg cache
  io_uring: impose max limit on apoll cache
  io_uring: add abstraction around apoll cache
  io_uring: move apoll cache to poll.c
  io_uring: consolidate hash_locked io-wq handling
  io_uring: clear REQ_F_HASH_LOCKED on hash removal
  io_uring: don't race double poll setting REQ_F_ASYNC_DATA
  io_uring: don't miss setting REQ_F_DOUBLE_POLL
  io_uring: disable multishot recvmsg
  io_uring: only trace one of complete or overflow
  ...
2022-08-02 13:20:44 -07:00
Ammar Faizi
80ef928643 net: devlink: Fix missing mutex_unlock() call
Commit 2dec18ad82 forgets to call mutex_unlock() before the function
returns in the error path:

   New smatch warnings:
   net/core/devlink.c:6392 devlink_nl_cmd_region_new() warn: inconsistent \
   returns '&region->snapshot_lock'.

Make sure we call mutex_unlock() in this error path.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2dec18ad82 ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220801115742.1309329-1-ammar.faizi@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 12:47:10 -07:00
Tariq Toukan
d81c7cdd7a net/tls: Remove redundant workqueue flush before destroy
destroy_workqueue() safely destroys the workqueue after draining it.
No need for the explicit call to flush_workqueue(). Remove it.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220801112444.26175-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 12:44:38 -07:00
Xie Shaowen
062cf5ebc2 net: dsa: Fix spelling mistakes and cleanup code
fix follow spelling misktakes:
	desconstructed ==> deconstructed
	enforcment ==> enforcement

Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: Xie Shaowen <studentxswpy@163.com>
Link: https://lore.kernel.org/r/20220730092254.3102875-1-studentxswpy@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 12:23:06 -07:00
Hangyu Hua
a41b17ff9d dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
In the case of sk->dccps_qpolicy == DCCPQ_POLICY_PRIO, dccp_qpolicy_full
will drop a skb when qpolicy is full. And the lock in dccp_sendmsg is
released before sock_alloc_send_skb and then relocked after
sock_alloc_send_skb. The following conditions may lead dccp_qpolicy_push
to add skb to an already full sk_write_queue:

thread1--->lock
thread1--->dccp_qpolicy_full: queue is full. drop a skb
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_full: queue is not full. no need to drop.
thread2--->unlock
thread1--->lock
thread1--->dccp_qpolicy_push: add a skb. queue is full.
thread1--->unlock
thread2--->lock
thread2--->dccp_qpolicy_push: add a skb!
thread2--->unlock

Fix this by moving dccp_qpolicy_full.

Fixes: b1308dc015 ("[DCCP]: Set TX Queue Length Bounds via Sysctl")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220729110027.40569-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 12:11:56 -07:00
Eric Dumazet
2df91e397d net: rose: add netdev ref tracker to 'struct rose_sock'
This will help debugging netdevice refcount problems with
CONFIG_NET_DEV_REFCNT_TRACKER=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 11:59:23 -07:00
Eric Dumazet
931027820e net: rose: fix netdev reference changes
Bernard reported that trying to unload rose module would lead
to infamous messages:

unregistered_netdevice: waiting for rose0 to become free. Usage count = xx

This patch solves the issue, by making sure each socket referring to
a netdevice holds a reference count on it, and properly releases it
in rose_release().

rose_dev_first() is also fixed to take a device reference
before leaving the rcu_read_locked section.

Following patch will add ref_tracker annotations to ease
future bug hunting.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-01 11:59:23 -07:00
Jiri Pirko
09b278462f net: devlink: enable parallel ops on netlink interface
As the devlink_mutex was removed and all devlink instances are protected
individually by devlink->lock mutex, allow the netlink ops to run
in parallel and therefore allow user to execute commands on multiple
devlink instances simultaneously.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01 12:14:00 +01:00
Jiri Pirko
d3efc2a6a6 net: devlink: remove devlink_mutex
All accesses to devlink structure from userspace and drivers are locked
with devlink->lock instance mutex. Also, devlinks xa_array iteration is
taken care of by iteration helpers taking devlink reference.

Therefore, remove devlink_mutex as it is no longer needed.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01 12:14:00 +01:00
Jiri Pirko
644a66c60f net: devlink: convert reload command to take implicit devlink->lock
Convert reload command to behave the same way as the rest of the
commands and let if be called with devlink->lock held. Remove the
temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK
flag is no longer used, remove it alongside.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01 12:14:00 +01:00
Jiri Pirko
c2368b1980 net: devlink: introduce "unregistering" mark and use it during devlinks iteration
Add new mark called "unregistering" to be set at the beginning of
devlink_unregister() function. Check this mark during devlinks
iteration in order to prevent getting a reference of devlink which is
being currently unregistered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01 12:14:00 +01:00
Kuniyuki Iwashima
02a7cb2866 udp: Remove redundant __udp_sysctl_init() call from udp_init().
__udp_sysctl_init() is called for init_net via udp_sysctl_ops.

While at it, we can rename __udp_sysctl_init() to udp_sysctl_init().

Fixes: 1e80295158 ("udp: Move the udp sysctl to namespace.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01 12:07:53 +01:00
Li Qiong
5121db6afb net/rds: Use PTR_ERR instead of IS_ERR for rdsdebug()
If 'local_odp_mr->r_trans_private' is a error code,
it is better to print the error code than to print
the value of IS_ERR().

Signed-off-by: Li Qiong <liqiong@nfschina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-01 11:45:15 +01:00
Steven Rostedt (Google)
9abc291812 batman-adv: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.

Link: https://lkml.kernel.org/r/20220724191650.236b1355@rorschach.local.home

Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: netdev@vger.kernel.org
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-30 13:52:47 -04:00
Yu Zhe
0f14a8351a dn_route: replace "jiffies-now>0" with "jiffies!=now"
Use "jiffies != now" to replace "jiffies - now > 0" to make
code more readable. We want to put a limit on how long the
loop can run for before rescheduling.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Link: https://lore.kernel.org/r/20220729061712.22666-1-yuzhe@nfschina.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29 20:12:49 -07:00
Jakub Kicinski
5fc7c5887c Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
 bpf-next 2022-07-29

We've added 22 non-merge commits during the last 4 day(s) which contain
a total of 27 files changed, 763 insertions(+), 120 deletions(-).

The main changes are:

1) Fixes to allow setting any source IP with bpf_skb_set_tunnel_key() helper,
   from Paul Chaignon.

2) Fix for bpf_xdp_pointer() helper when doing sanity checking, from Joanne Koong.

3) Fix for XDP frame length calculation, from Lorenzo Bianconi.

4) Libbpf BPF_KSYSCALL docs improvements and fixes to selftests to accommodate
   s390x quirks with socketcall(), from Ilya Leoshkevich.

5) Allow/denylist and CI configs additions to selftests/bpf to improve BPF CI,
   from Daniel Müller.

6) BPF trampoline + ftrace follow up fixes, from Song Liu and Xu Kuohai.

7) Fix allocation warnings in netdevsim, from Jakub Kicinski.

8) bpf_obj_get_opts() libbpf API allowing to provide file flags, from Joe Burton.

9) vsnprintf usage fix in bpf_snprintf_btf(), from Fedor Tokarev.

10) Various small fixes and clean ups, from Daniel Müller, Rongguang Wei,
    Jörn-Thorben Hinz, Yang Li.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (22 commits)
  bpf: Remove unneeded semicolon
  libbpf: Add bpf_obj_get_opts()
  netdevsim: Avoid allocation warnings triggered from user space
  bpf: Fix NULL pointer dereference when registering bpf trampoline
  bpf: Fix test_progs -j error with fentry/fexit tests
  selftests/bpf: Bump internal send_signal/send_signal_tracepoint timeout
  bpftool: Don't try to return value from void function in skeleton
  bpftool: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE macro
  bpf: btf: Fix vsnprintf return value check
  libbpf: Support PPC in arch_specific_syscall_pfx
  selftests/bpf: Adjust vmtest.sh to use local kernel configuration
  selftests/bpf: Copy over libbpf configs
  selftests/bpf: Sort configuration
  selftests/bpf: Attach to socketcall() in test_probe_user
  libbpf: Extend BPF_KSYSCALL documentation
  bpf, devmap: Compute proper xdp_frame len redirecting frames
  bpf: Fix bpf_xdp_pointer return pointer
  selftests/bpf: Don't assign outer source IP to host
  bpf: Set flow flag to allow any source IP in bpf_tunnel_key
  geneve: Use ip_tunnel_key flow flags in route lookups
  ...
====================

Link: https://lore.kernel.org/r/20220729230948.1313527-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-29 19:04:29 -07:00
Chuck Lever
28fffa6c57 SUNRPC: Expand the svc_alloc_arg_err tracepoint
Record not only the number of pages requested, but the number of
pages that were actually allocated, to get a measure of progress
(or lack thereof).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-07-29 20:08:56 -04:00
Andrea Mayer
13f0296be8 seg6: add support for SRv6 H.L2Encaps.Red behavior
The SRv6 H.L2Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.L2Encaps behavior [2].

H.L2Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.

[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.4
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.3

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-29 12:14:03 +01:00
Andrea Mayer
b07c8cdbe9 seg6: add support for SRv6 H.Encaps.Red behavior
The SRv6 H.Encaps.Red behavior described in [1] is an optimization of
the SRv6 H.Encaps behavior [2].

H.Encaps.Red reduces the length of the SRH by excluding the first
segment (SID) in the SRH of the pushed IPv6 header. The first SID is
only placed in the IPv6 Destination Address field of the pushed IPv6
header.
When the SRv6 Policy only contains one SID the SRH is omitted, unless
there is an HMAC TLV to be carried.

[1] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.2
[2] - https://datatracker.ietf.org/doc/html/rfc8986#section-5.1

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Anton Makarov <anton.makarov11235@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-29 12:14:02 +01:00
Zhengchao Shao
dc633700f0 net/af_packet: check len when min_header_len equals to 0
User can use AF_PACKET socket to send packets with the length of 0.
When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
to send packets, and sock->type can be any type.

Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com
Fixes: fd18942244 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-29 12:09:27 +01:00
Eric Dumazet
d7c4c9e075 ax25: fix incorrect dev_tracker usage
While investigating a separate rose issue [1], and enabling
CONFIG_NET_DEV_REFCNT_TRACKER=y, Bernard reported an orthogonal ax25 issue [2]

An ax25_dev can be used by one (or many) struct ax25_cb.
We thus need different dev_tracker, one per struct ax25_cb.

After this patch is applied, we are able to focus on rose.

[1] https://lore.kernel.org/netdev/fb7544a1-f42e-9254-18cc-c9b071f4ca70@free.fr/

[2]
[  205.798723] reference already released.
[  205.798732] allocated in:
[  205.798734]  ax25_bind+0x1a2/0x230 [ax25]
[  205.798747]  __sys_bind+0xea/0x110
[  205.798753]  __x64_sys_bind+0x18/0x20
[  205.798758]  do_syscall_64+0x5c/0x80
[  205.798763]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798768] freed in:
[  205.798770]  ax25_release+0x115/0x370 [ax25]
[  205.798778]  __sock_release+0x42/0xb0
[  205.798782]  sock_close+0x15/0x20
[  205.798785]  __fput+0x9f/0x260
[  205.798789]  ____fput+0xe/0x10
[  205.798792]  task_work_run+0x64/0xa0
[  205.798798]  exit_to_user_mode_prepare+0x18b/0x190
[  205.798804]  syscall_exit_to_user_mode+0x26/0x40
[  205.798808]  do_syscall_64+0x69/0x80
[  205.798812]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.798827] ------------[ cut here ]------------
[  205.798829] WARNING: CPU: 2 PID: 2605 at lib/ref_tracker.c:136 ref_tracker_free.cold+0x60/0x81
[  205.798837] Modules linked in: rose netrom mkiss ax25 rfcomm cmac algif_hash algif_skcipher af_alg bnep snd_hda_codec_hdmi nls_iso8859_1 i915 rtw88_8821ce rtw88_8821c x86_pkg_temp_thermal rtw88_pci intel_powerclamp rtw88_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio coretemp snd_hda_intel kvm_intel snd_intel_dspcfg mac80211 snd_hda_codec kvm i2c_algo_bit drm_buddy drm_dp_helper btusb drm_kms_helper snd_hwdep btrtl snd_hda_core btbcm joydev crct10dif_pclmul btintel crc32_pclmul ghash_clmulni_intel mei_hdcp btmtk intel_rapl_msr aesni_intel bluetooth input_leds snd_pcm crypto_simd syscopyarea processor_thermal_device_pci_legacy sysfillrect cryptd intel_soc_dts_iosf snd_seq sysimgblt ecdh_generic fb_sys_fops rapl libarc4 processor_thermal_device intel_cstate processor_thermal_rfim cec snd_timer ecc snd_seq_device cfg80211 processor_thermal_mbox mei_me processor_thermal_rapl mei rc_core at24 snd intel_pch_thermal intel_rapl_common ttm soundcore int340x_thermal_zone video
[  205.798948]  mac_hid acpi_pad sch_fq_codel ipmi_devintf ipmi_msghandler drm msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone efi_pstore ip_tables x_tables autofs4 hid_generic usbhid hid i2c_i801 i2c_smbus r8169 xhci_pci ahci libahci realtek lpc_ich xhci_pci_renesas [last unloaded: ax25]
[  205.798992] CPU: 2 PID: 2605 Comm: ax25ipd Not tainted 5.18.11-F6BVP #3
[  205.798996] Hardware name: To be filled by O.E.M. To be filled by O.E.M./CK3, BIOS 5.011 09/16/2020
[  205.798999] RIP: 0010:ref_tracker_free.cold+0x60/0x81
[  205.799005] Code: e8 d2 01 9b ff 83 7b 18 00 74 14 48 c7 c7 2f d7 ff 98 e8 10 6e fc ff 8b 7b 18 e8 b8 01 9b ff 4c 89 ee 4c 89 e7 e8 5d fd 07 00 <0f> 0b b8 ea ff ff ff e9 30 05 9b ff 41 0f b6 f7 48 c7 c7 a0 fa 4e
[  205.799008] RSP: 0018:ffffaf5281073958 EFLAGS: 00010286
[  205.799011] RAX: 0000000080000000 RBX: ffff9a0bd687ebe0 RCX: 0000000000000000
[  205.799014] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 00000000ffffffff
[  205.799016] RBP: ffffaf5281073a10 R08: 0000000000000003 R09: fffffffffffd5618
[  205.799019] R10: 0000000000ffff10 R11: 000000000000000f R12: ffff9a0bc53384d0
[  205.799022] R13: 0000000000000282 R14: 00000000ae000001 R15: 0000000000000001
[  205.799024] FS:  0000000000000000(0000) GS:ffff9a0d0f300000(0000) knlGS:0000000000000000
[  205.799028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  205.799031] CR2: 00007ff6b8311554 CR3: 000000001ac10004 CR4: 00000000001706e0
[  205.799033] Call Trace:
[  205.799035]  <TASK>
[  205.799038]  ? ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799047]  ? ax25_device_event+0x9f/0x270 [ax25]
[  205.799055]  ? raw_notifier_call_chain+0x49/0x60
[  205.799060]  ? call_netdevice_notifiers_info+0x52/0xa0
[  205.799065]  ? dev_close_many+0xc8/0x120
[  205.799070]  ? unregister_netdevice_many+0x13d/0x890
[  205.799073]  ? unregister_netdevice_queue+0x90/0xe0
[  205.799076]  ? unregister_netdev+0x1d/0x30
[  205.799080]  ? mkiss_close+0x7c/0xc0 [mkiss]
[  205.799084]  ? tty_ldisc_close+0x2e/0x40
[  205.799089]  ? tty_ldisc_hangup+0x137/0x210
[  205.799092]  ? __tty_hangup.part.0+0x208/0x350
[  205.799098]  ? tty_vhangup+0x15/0x20
[  205.799103]  ? pty_close+0x127/0x160
[  205.799108]  ? tty_release+0x139/0x5e0
[  205.799112]  ? __fput+0x9f/0x260
[  205.799118]  ax25_dev_device_down+0xd9/0x1b0 [ax25]
[  205.799126]  ax25_device_event+0x9f/0x270 [ax25]
[  205.799135]  raw_notifier_call_chain+0x49/0x60
[  205.799140]  call_netdevice_notifiers_info+0x52/0xa0
[  205.799146]  dev_close_many+0xc8/0x120
[  205.799152]  unregister_netdevice_many+0x13d/0x890
[  205.799157]  unregister_netdevice_queue+0x90/0xe0
[  205.799161]  unregister_netdev+0x1d/0x30
[  205.799165]  mkiss_close+0x7c/0xc0 [mkiss]
[  205.799170]  tty_ldisc_close+0x2e/0x40
[  205.799173]  tty_ldisc_hangup+0x137/0x210
[  205.799178]  __tty_hangup.part.0+0x208/0x350
[  205.799184]  tty_vhangup+0x15/0x20
[  205.799188]  pty_close+0x127/0x160
[  205.799193]  tty_release+0x139/0x5e0
[  205.799199]  __fput+0x9f/0x260
[  205.799203]  ____fput+0xe/0x10
[  205.799208]  task_work_run+0x64/0xa0
[  205.799213]  do_exit+0x33b/0xab0
[  205.799217]  ? __handle_mm_fault+0xc4f/0x15f0
[  205.799224]  do_group_exit+0x35/0xa0
[  205.799228]  __x64_sys_exit_group+0x18/0x20
[  205.799232]  do_syscall_64+0x5c/0x80
[  205.799238]  ? handle_mm_fault+0xba/0x290
[  205.799242]  ? debug_smp_processor_id+0x17/0x20
[  205.799246]  ? fpregs_assert_state_consistent+0x26/0x50
[  205.799251]  ? exit_to_user_mode_prepare+0x49/0x190
[  205.799256]  ? irqentry_exit_to_user_mode+0x9/0x20
[  205.799260]  ? irqentry_exit+0x33/0x40
[  205.799263]  ? exc_page_fault+0x87/0x170
[  205.799268]  ? asm_exc_page_fault+0x8/0x30
[  205.799273]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  205.799277] RIP: 0033:0x7ff6b80eaca1
[  205.799281] Code: Unable to access opcode bytes at RIP 0x7ff6b80eac77.
[  205.799283] RSP: 002b:00007fff6dfd4738 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  205.799287] RAX: ffffffffffffffda RBX: 00007ff6b8215a00 RCX: 00007ff6b80eaca1
[  205.799290] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
[  205.799293] RBP: 0000000000000001 R08: ffffffffffffff80 R09: 0000000000000028
[  205.799295] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff6b8215a00
[  205.799298] R13: 0000000000000000 R14: 00007ff6b821aee8 R15: 00007ff6b821af00
[  205.799304]  </TASK>

Fixes: feef318c85 ("ax25: fix UAF bugs of net_device caused by rebinding operation")
Reported-by: Bernard F6BVP <f6bvp@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Duoming Zhou <duoming@zju.edu.cn>
Link: https://lore.kernel.org/r/20220728051821.3160118-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 22:06:15 -07:00
Moshe Shemesh
c90005b5f7 devlink: Hold the instance lock in health callbacks
Let the core take the devlink instance lock around health callbacks and
remove the now redundant locking in the drivers.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:58:47 -07:00
Jiri Pirko
2dec18ad82 net: devlink: remove region snapshots list dependency on devlink->lock
After mlx4 driver is converted to do locked reload,
devlink_region_snapshot_create() may be called from both locked and
unlocked context.

Note that in mlx4 region snapshots could be created on any command
failure. That can happen in any flow that involves commands to FW,
which means most of the driver flows.

So resolve this by removing dependency on devlink->lock for region
snapshots list consistency and introduce new mutex to ensure it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:58:46 -07:00
Jiri Pirko
5502e8712c net: devlink: remove region snapshot ID tracking dependency on devlink->lock
After mlx4 driver is converted to do locked reload, functions to get/put
regions snapshot ID may be called from both locked and unlocked context.

So resolve this by removing dependency on devlink->lock for region
snapshot ID tracking by using internal xa_lock() to maintain
shapshot_ids xa_array consistency.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:58:46 -07:00
Vikas Gupta
08f588fa30 devlink: introduce framework for selftests
Add a framework for running selftests.
Framework exposes devlink commands and test suite(s) to the user
to execute and query the supported tests by the driver.

Below are new entries in devlink_nl_ops
devlink_nl_cmd_selftests_show_doit/dumpit: To query the supported
selftests by the drivers.
devlink_nl_cmd_selftests_run: To execute selftests. Users can
provide a test mask for executing group tests or standalone tests.

Documentation/networking/devlink/ path is already part of MAINTAINERS &
the new files come under this path. Hence no update needed to the
MAINTAINERS

Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:56:53 -07:00
Tariq Toukan
7adc91e0c9 net/tls: Multi-threaded calls to TX tls_dev_del
Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.

This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate).  When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.

Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:50:54 -07:00
Tariq Toukan
113671b255 net/tls: Perform immediate device ctx cleanup when possible
TLS context destructor can be run in atomic context. Cleanup operations
for device-offloaded contexts could require access and interaction with
the device callbacks, which might sleep. Hence, the cleanup of such
contexts must be deferred and completed inside an async work.

For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:50:54 -07:00
Yang Li
8fd1e15177 tls: rx: Fix unsigned comparison with less than zero
The return from the call to tls_rx_msg_size() is int, it can be
a negative error code, however this is being assigned to an
unsigned long variable 'sz', so making 'sz' an int.

Eliminate the following coccicheck warning:
./net/tls/tls_strp.c:211:6-8: WARNING: Unsigned expression compared with zero: sz < 0

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220728031019.32838-1-yang.lee@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:50:39 -07:00
Jakub Kicinski
e20691fa36 tls: rx: fix the false positive warning
I went too far in the accessor conversion, we can't use tls_strp_msg()
after decryption because the message may not be ready. What we care
about on this path is that the output skb is detached, i.e. we didn't
somehow just turn around and used the input skb with its TCP data
still attached. So look at the anchor directly.

Fixes: 84c61fe1a7 ("tls: rx: do not use the standard strparser")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:50:00 -07:00
Jakub Kicinski
d11ef9cc5a tls: strp: rename and multithread the workqueue
Paolo points out that there seems to be no strong reason strparser
users a single threaded workqueue. Perhaps there were some performance
or pinning considerations? Since we don't know (and it's the slow path)
let's default to the most natural, multi-threaded choice.

Also rename the workqueue to "tls-".

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:49:59 -07:00
Jakub Kicinski
70f03fc2fc tls: rx: don't consider sock_rcvtimeo() cumulative
Eric indicates that restarting rcvtimeo on every wait may be fine.
I thought that we should consider it cumulative, and made
tls_rx_reader_lock() return the remaining timeo after acquiring
the reader lock.

tls_rx_rec_wait() gets its timeout passed in by value so it
does not keep track of time previously spent.

Make the lock waiting consistent with tls_rx_rec_wait() - don't
keep track of time spent.

Read the timeo fresh in tls_rx_rec_wait().
It's unclear to me why callers are supposed to cache the value.

Link: https://lore.kernel.org/all/CANn89iKcmSfWgvZjzNGbsrndmCch2HC_EPZ7qmGboDNaWoviNQ@mail.gmail.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 21:49:59 -07:00
Jakub Kicinski
272ac32f56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 18:21:16 -07:00
Kuniyuki Iwashima
e27326009a net: ping6: Fix memleak in ipv6_renew_options().
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy().  As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.

    struct ipv6_sr_hdr *hdr;
    char data[24] = {0};
    int fd;

    hdr = (struct ipv6_sr_hdr *)data;
    hdr->hdrlen = 2;
    hdr->type = IPV6_SRCRT_TYPE_4;

    fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
    setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
    close(fd);

To fix memory leaks, let's add a destroy function.

Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range.  The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0).  Thus, the local DoS does
not succeed until we change the default value.  However, at least
Ubuntu/Fedora/RHEL loosen it.

    $ cat /usr/lib/sysctl.d/50-default.conf
    ...
    -net.ipv4.ping_group_range = 0 2147483647

Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.

  setsockopt
      IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
      IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
      IPV6_HOPOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
      IPV6_RTHDR (inet6_sk(sk)->opt)
      IPV6_DSTOPTS (inet6_sk(sk)->opt)
      IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)

  getsockopt
      IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)

For the record, I left a different splat with syzbot's one.

  unreferenced object 0xffff888006270c60 (size 96):
    comm "repro2", pid 231, jiffies 4294696626 (age 13.118s)
    hex dump (first 32 bytes):
      01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00  ....D...........
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
      [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
      [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
      [<000000007096a025>] __sys_setsockopt (net/socket.c:2254)
      [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
      [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

[0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176

Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 10:42:08 -07:00
Paolo Abeni
7d85e9cb40 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
ice: PPPoE offload support

Marcin Szycik says:

Add support for dissecting PPPoE and PPP-specific fields in flow dissector:
PPPoE session id and PPP protocol type. Add support for those fields in
tc-flower and support offloading PPPoE. Finally, add support for hardware
offload of PPPoE packets in switchdev mode in ice driver.

Example filter:
tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \
    1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR

Changes in iproute2 are required to use the new fields (will be submitted
soon).

ICE COMMS DDP package is required to create a filter in ice.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
  ice: Add support for PPPoE hardware offload
  flow_offload: Introduce flow_match_pppoe
  net/sched: flower: Add PPPoE filter
  flow_dissector: Add PPPoE dissectors
====================

Link: https://lore.kernel.org/r/20220726203133.2171332-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-28 11:54:56 +02:00
Jiri Pirko
2bb88b2c4f net: devlink: remove redundant net_eq() check from sb_pool_get_dumpit()
The net_eq() check is already performed inside
devlinks_xa_for_each_registered_get() helper, so remove the redundant
appearance.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20220727055912.568391-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27 18:54:25 -07:00
Zhengchao Shao
a482d47d33 net/sched: sch_cbq: change the type of cbq_set_lss to void
Change the type of cbq_set_lss to void.

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Link: https://lore.kernel.org/r/20220726030748.243505-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27 18:30:18 -07:00
Xin Long
181d8d2066 sctp: leave the err path free in sctp_stream_init to sctp_stream_free
A NULL pointer dereference was reported by Wei Chen:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  RIP: 0010:__list_del_entry_valid+0x26/0x80
  Call Trace:
   <TASK>
   sctp_sched_dequeue_common+0x1c/0x90
   sctp_sched_prio_dequeue+0x67/0x80
   __sctp_outq_teardown+0x299/0x380
   sctp_outq_free+0x15/0x20
   sctp_association_free+0xc3/0x440
   sctp_do_sm+0x1ca7/0x2210
   sctp_assoc_bh_rcv+0x1f6/0x340

This happens when calling sctp_sendmsg without connecting to server first.
In this case, a data chunk already queues up in send queue of client side
when processing the INIT_ACK from server in sctp_process_init() where it
calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in
all stream_out will be freed in sctp_stream_init's err path. Then in the
asoc freeing it will crash when dequeuing this data chunk as stream_out
is missing.

As we can't free stream out before dequeuing all data from send queue, and
this patch is to fix it by moving the err path stream_out/in freeing in
sctp_stream_init() to sctp_stream_free() which is eventually called when
freeing the asoc in sctp_association_free(). This fix also makes the code
in sctp_process_init() more clear.

Note that in sctp_association_init() when it fails in sctp_stream_init(),
sctp_association_free() will not be called, and in that case it should
go to 'stream_free' err path to free stream instead of 'fail_init'.

Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27 18:23:22 -07:00
Trond Myklebust
72691a269f SUNRPC: Don't reuse bvec on retransmission of the request
If a request is re-encoded and then retransmitted, we need to make sure
that we also re-encode the bvec, in case the page lists have changed.

Fixes: ff053dbbaf ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-27 16:26:17 -04:00
Eric Dumazet
e62d2e1103 tcp: md5: fix IPv4-mapped support
After the blamed commit, IPv4 SYN packets handled
by a dual stack IPv6 socket are dropped, even if
perfectly valid.

$ nstat | grep MD5
TcpExtTCPMD5Failure             5                  0.0

For a dual stack listener, an incoming IPv4 SYN packet
would call tcp_inbound_md5_hash() with @family == AF_INET,
while tp->af_specific is pointing to tcp_sock_ipv6_specific.

Only later when an IPv4-mapped child is created, tp->af_specific
is changed to tcp_sock_ipv6_mapped_specific.

Fixes: 7bbb765b73 ("net/tcp: Merge TCP-MD5 inbound callbacks")
Reported-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Tested-by: Leonard Crestez <cdleonard@gmail.com>
Link: https://lore.kernel.org/r/20220726115743.2759832-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-27 10:18:21 -07:00
Trond Myklebust
6622e3a731 SUNRPC: Reinitialise the backchannel request buffers before reuse
When we're reusing the backchannel requests instead of freeing them,
then we should reinitialise any values of the send/receive xdr_bufs so
that they reflect the available space.

Fixes: 0d2a970d0a ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-27 12:45:17 -04:00
Stefan Raspl
28ec53f3a8 net/smc: Enable module load on netlink usage
Previously, the smc and smc_diag modules were automatically loaded as
dependencies of the ism module whenever an ISM device was present.
With the pending rework of the ISM API, the smc module will no longer
automatically be loaded in presence of an ISM device. Usage of an AF_SMC
socket will still trigger loading of the smc modules, but usage of a
netlink socket will not.
This is addressed by setting the correct module aliases.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27 13:24:42 +01:00
Stefan Raspl
8b2fed8e27 net/smc: Pass on DMBE bit mask in IRQ handler
Make the DMBE bits, which are passed on individually in ism_move() as
parameter idx, available to the receiver.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27 13:24:42 +01:00
Stefan Raspl
0a2f4f9893 s390/ism: Cleanups
Reworked signature of the function to retrieve the system EID: No plausible
reason to use a double pointer. And neither to pass in the device as an
argument, as this identifier is by definition per system, not per device.
Plus some minor consistency edits.

Signed-off-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27 13:24:42 +01:00
Heiko Carstens
eb481b02bd net/smc: Eliminate struct smc_ism_position
This struct is used in a single place only, and its usage generates
inefficient code. Time to clean up!

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-and-tested-by: Stefan Raspl <raspl@linux.ibm.com>
Signed-off-by: Wenjia Zhang < wenjia@linux.ibm.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-27 13:24:42 +01:00
Eric Dumazet
a7e555d4a1 ip6mr: remove stray rcu_read_unlock() from ip6_mr_forward()
One rcu_read_unlock() should have been removed in blamed commit.

Fixes: 9b1c21d898 ("ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()")
Reported-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20220725200554.2563581-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 19:59:18 -07:00
Mat Martineau
b5177ed92b mptcp: Do not return EINPROGRESS when subflow creation succeeds
New subflows are created within the kernel using O_NONBLOCK, so
EINPROGRESS is the expected return value from kernel_connect().
__mptcp_subflow_connect() has the correct logic to consider EINPROGRESS
to be a successful case, but it has also used that error code as its
return value.

Before v5.19 this was benign: all the callers ignored the return
value. Starting in v5.19 there is a MPTCP_PM_CMD_SUBFLOW_CREATE generic
netlink command that does use the return value, so the EINPROGRESS gets
propagated to userspace.

Make __mptcp_subflow_connect() always return 0 on success instead.

Fixes: ec3edaa7ca ("mptcp: Add handling of outgoing MP_JOIN requests")
Fixes: 702c2f646d ("mptcp: netlink: allow userspace-driven subflow establishment")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20220725205231.87529-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 19:57:55 -07:00
Jakub Kicinski
e77ea97d2b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:

====================
netfilter updates for net

Three late fixes for netfilter:

1) If nf_queue user requests packet truncation below size of l3 header,
   we corrupt the skb, then crash.  Reject such requests.

2) add cond_resched() calls when doing cycle detection in the
   nf_tables graph.  This avoids softlockup warning with certain
   rulesets.

3) Reject rulesets that use nftables 'queue' expression in family/chain
   combinations other than those that are supported.  Currently the ruleset
   will load, but when userspace attempts to reinject you get WARN splat +
   packet drops.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_queue: only allow supported familes and hooks
  netfilter: nf_tables: add rescheduling points during loop detection walks
  netfilter: nf_queue: do not allow packet truncation below transport header offset
====================

Link: https://lore.kernel.org/r/20220726192056.13497-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 19:53:09 -07:00
Jakub Kicinski
e53f529397 bluetooth pull request for net:
- Fix early wakeup after suspend
  - Fix double free on error
  - Fix use-after-free on l2cap_chan_put
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLgXYkZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKd4AD/9RgobBI3tUfpKe0BXhju2p
 pjx/oL1SSqfF5kOvHNn/37dZ6tXQlqTl4yM0pD36Gos+UX1n1zyaBxm8teEhHx7h
 /yLiWW8cGmae1+OwrfL63uqjRnwJUNjp4EPHZNWczsCrSIr+RqYkWomyFzcXWcsi
 bjh2USvURKQZqeSDLMvEWI1H3Xjcu9q3atE5sXvJpUmthqraI5ewBYTLUMmvIM3J
 sm/wcA7cIl1HEOE21h6L+irMN33wtfjKADoYYV6BlWdAzj6xMmr/yxCbBS3jNNTW
 XaZLwJTp11YxeDu+ZSvEEIe5Q05DDY4qBMxWZxIFaBhxNWO+sDexHGjYonGvlL6M
 OjuLMdyqfDOCFuWYeuBAcQA9TfA/ibA7WC9OeEf7aZONI3ZPeX+71VXlAlz9Z8ly
 /DfV8keY7FIhoKrxJRgJa94hS/o4/zrIeP3Gylw4itbooLX0bKbrg9YBvgmldMT6
 054vdN8bXdfBi4ooWkzzCyZ86I/PRS3mtenJj+fK9iYaKZGRB4t3bOYo/7Gabb62
 VqrivooaVSL9A3bBRB3WO+5igf2o5gz03nAQjm8G1X/2gBtXz8wU3nZOrAsGa2W8
 3+dmx4BJ8EgUE4IDmbZIvFZwJaT7njjHmnDYpTlbl17Evquqjv8NooF5x0wTaebV
 cGhaI9RTPdLVtbDR0jXj4Q==
 =4Cuo
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix early wakeup after suspend
 - Fix double free on error
 - Fix use-after-free on l2cap_chan_put

* tag 'for-net-2022-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
  Bluetooth: Always set event mask on suspend
  Bluetooth: mgmt: Fix double free on error path
====================

Link: https://lore.kernel.org/r/20220726221328.423714-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 19:48:24 -07:00
Jakub Kicinski
84c61fe1a7 tls: rx: do not use the standard strparser
TLS is a relatively poor fit for strparser. We pause the input
every time a message is received, wait for a read which will
decrypt the message, start the parser, repeat. strparser is
built to delineate the messages, wrap them in individual skbs
and let them float off into the stack or a different socket.
TLS wants the data pages and nothing else. There's no need
for TLS to keep cloning (and occasionally skb_unclone()'ing)
the TCP rx queue.

This patch uses a pre-allocated skb and attaches the skbs
from the TCP rx queue to it as frags. TLS is careful never
to modify the input skb without CoW'ing / detaching it first.

Since we call TCP rx queue cleanup directly we also get back
the benefit of skb deferred free.

Overall this results in a 6% gain in my benchmarks.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jakub Kicinski
8b3c59a7a0 tls: rx: device: add input CoW helper
Wrap the remaining skb_cow_data() into a helper, so it's easier
to replace down the lane. The new version will change the skb
so make sure relevant pointers get reloaded after the call.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jakub Kicinski
3f92a64e44 tcp: allow tls to decrypt directly from the tcp rcv queue
Expose TCP rx queue accessor and cleanup, so that TLS can
decrypt directly from the TCP queue. The expectation
is that the caller can access the skb returned from
tcp_recv_skb() and up to inq bytes worth of data (some
of which may be in ->next skbs) and then call
tcp_read_done() when data has been consumed.
The socket lock must be held continuously across
those two operations.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jakub Kicinski
d4e5db6452 tls: rx: device: keep the zero copy status with offload
The non-zero-copy path assumes a full skb with decrypted contents.
This means the device offload would have to CoW the data. Try
to keep the zero-copy status instead, copy the data to user space.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jakub Kicinski
b93f570016 tls: rx: don't free the output in case of zero-copy
In the future we'll want to reuse the input skb in case of
zero-copy so we shouldn't always free darg.skb. Move the
freeing of darg.skb into the non-zc cases. All cases will
now free ctx->recv_pkt (inside let tls_rx_rec_done()).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:51 -07:00
Jakub Kicinski
dd47ed3620 tls: rx: factor SW handling out of tls_rx_one_record()
After recent changes the SW side of tls_rx_one_record() can
be nicely encapsulated in its own function. Move the pad handling
as well. This will be useful for ->zc handling in tls_decrypt_device().

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:50 -07:00
Jakub Kicinski
b92a13d488 tls: rx: wrap recv_pkt accesses in helpers
To allow for the logic to change later wrap accesses
which interrogate the input skb in helper functions.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 14:38:50 -07:00
Xin Xiong
9c9cb23e00 xfrm: fix refcount leak in __xfrm_policy_check()
The issue happens on an error path in __xfrm_policy_check(). When the
fetching process of the object `pols[1]` fails, the function simply
returns 0, forgetting to decrement the reference count of `pols[0]`,
which is incremented earlier by either xfrm_sk_policy_lookup() or
xfrm_policy_lookup(). This may result in memory leaks.

Fix it by decreasing the reference count of `pols[0]` in that path.

Fixes: 134b0fc544 ("IPsec: propagate security module errors up from flow_cache_lookup")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-07-26 23:08:24 +02:00
Jiri Pirko
7b2d9a1a50 net: devlink: introduce nested devlink entity for line card
For the purpose of exposing device info and allow flash update which is
going to be implemented in follow-up patches, introduce a possibility
for a line card to expose relation to nested devlink entity. The nested
devlink entity represents the line card.

Example:

$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
  lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
    supported_types:
       16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 13:50:51 -07:00
Jiri Pirko
294c4f57cf net: devlink: move net check into devlinks_xa_for_each_registered_get()
Benefit from having devlinks iterator helper
devlinks_xa_for_each_registered_get() and move the net pointer
check inside.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 13:50:50 -07:00
Jiri Pirko
30bab7cdb5 net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
Remove dependency on devlink_mutex during devlinks xarray iteration.

The reason is that devlink_register/unregister() functions taking
devlink_mutex would deadlock during devlink reload operation of devlink
instance which registers/unregisters nested devlink instances.

The devlinks xarray consistency is ensured internally by xarray.
There is a reference taken when working with devlink using
devlink_try_get(). But there is no guarantee that devlink pointer
picked during xarray iteration is not freed before devlink_try_get()
is called.

Make sure that devlink_try_get() works with valid pointer.
Achieve it by:
1) Splitting devlink_put() so the completion is sent only
   after grace period. Completion unblocks the devlink_unregister()
   routine, which is followed-up by devlink_free()
2) During devlinks xa_array iteration, get devlink pointer from xa_array
   holding RCU read lock and taking reference using devlink_try_get()
   before unlock.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 13:50:50 -07:00
Luiz Augusto von Dentz
d0be8347c6 Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
This fixes the following trace which is caused by hci_rx_work starting up
*after* the final channel reference has been put() during sock_close() but
*before* the references to the channel have been destroyed, so instead
the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.

  refcount_t: increment on 0; use-after-free.
  BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
  Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705

  CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S      W
  4.14.234-00003-g1fb6d0bd49a4-dirty #28
  Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
  Google Inc. MSM sm8150 Flame DVT (DT)
  Workqueue: hci0 hci_rx_work
  Call trace:
   dump_backtrace+0x0/0x378
   show_stack+0x20/0x2c
   dump_stack+0x124/0x148
   print_address_description+0x80/0x2e8
   __kasan_report+0x168/0x188
   kasan_report+0x10/0x18
   __asan_load4+0x84/0x8c
   refcount_dec_and_test+0x20/0xd0
   l2cap_chan_put+0x48/0x12c
   l2cap_recv_frame+0x4770/0x6550
   l2cap_recv_acldata+0x44c/0x7a4
   hci_acldata_packet+0x100/0x188
   hci_rx_work+0x178/0x23c
   process_one_work+0x35c/0x95c
   worker_thread+0x4cc/0x960
   kthread+0x1a8/0x1c4
   ret_from_fork+0x10/0x18

Cc: stable@kernel.org
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26 13:35:24 -07:00
Abhishek Pandit-Subedi
ef61b6ea15 Bluetooth: Always set event mask on suspend
When suspending, always set the event mask once disconnects are
successful. Otherwise, if wakeup is disallowed, the event mask is not
set before suspend continues and can result in an early wakeup.

Fixes: 182ee45da0 ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
Cc: stable@vger.kernel.org
Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26 13:35:13 -07:00
Dan Carpenter
4b2f4e072f Bluetooth: mgmt: Fix double free on error path
Don't call mgmt_pending_remove() twice (double free).

Fixes: 6b88eff43704 ("Bluetooth: hci_sync: Refactor remove Adv Monitor")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-26 13:32:40 -07:00
Tetsuo Handa
aa40d5a435 wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
for commit f856373e2f ("wifi: mac80211: do not wake queues on a vif
that is being stopped") guards clear_bit() using fq.lock even before
fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.

According to discussion [2], Toke was not happy with expanding usage of
fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().

Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: f856373e2f ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp
[ pick up commit 3598cb6e18 ("wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()") from -next]
Link: https://lore.kernel.org/all/87o7xcq6qt.fsf@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-26 13:23:05 -07:00
Florian Westphal
47f4f510ad netfilter: nft_queue: only allow supported familes and hooks
Trying to use 'queue' statement in ingress (for example)
triggers a splat on reinject:

WARNING: CPU: 3 PID: 1345 at net/netfilter/nf_queue.c:291

... because nf_reinject cannot find the ruleset head.

The netdev family doesn't support async resume at the moment anyway,
so disallow loading such rulesets with a more appropriate
error message.

v2: add 'validate' callback and also check hook points, v1 did
allow ingress use in 'table inet', but that doesn't work either. (Pablo)

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26 21:12:42 +02:00
Florian Westphal
81ea010667 netfilter: nf_tables: add rescheduling points during loop detection walks
Add explicit rescheduling points during ruleset walk.

Switching to a faster algorithm is possible but this is a much
smaller change, suitable for nf tree.

Link: https://bugzilla.netfilter.org/show_bug.cgi?id=1460
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26 21:12:42 +02:00
Florian Westphal
99a63d36cb netfilter: nf_queue: do not allow packet truncation below transport header offset
Domingo Dirutigliano and Nicola Guerrera report kernel panic when
sending nf_queue verdict with 1-byte nfta_payload attribute.

The IP/IPv6 stack pulls the IP(v6) header from the packet after the
input hook.

If user truncates the packet below the header size, this skb_pull() will
result in a malformed skb (skb->len < 0).

Fixes: 7af4cc3fa1 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink")
Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-26 21:12:42 +02:00
Wojciech Drewek
6a21b0856d flow_offload: Introduce flow_match_pppoe
Allow to offload PPPoE filters by adding flow_rule_match_pppoe.
Drivers can extract PPPoE specific fields from now on.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 10:49:27 -07:00
Wojciech Drewek
5008750eff net/sched: flower: Add PPPoE filter
Add support for PPPoE specific fields for tc-flower.
Those fields can be provided only when protocol was set
to ETH_P_PPP_SES. Defines, dump, load and set are being done here.

Overwrite basic.n_proto only in case of PPP_IP and PPP_IPV6,
otherwise leave it as ETH_P_PPP_SES.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 10:20:29 -07:00
Wojciech Drewek
46126db9c8 flow_dissector: Add PPPoE dissectors
Allow to dissect PPPoE specific fields which are:
- session ID (16 bits)
- ppp protocol (16 bits)
- type (16 bits) - this is PPPoE ethertype, for now only
  ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC
  in the future

The goal is to make the following TC command possible:

  # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \
      flower \
        pppoe_sid 12 \
        ppp_proto ip \
      action drop

Note that only PPPoE Session is supported.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 09:49:12 -07:00
Benjamin Poirier
9b134b1694 bridge: Do not send empty IFLA_AF_SPEC attribute
After commit b6c02ef549 ("bridge: Netlink interface fix."),
br_fill_ifinfo() started to send an empty IFLA_AF_SPEC attribute when a
bridge vlan dump is requested but an interface does not have any vlans
configured.

iproute2 ignores such an empty attribute since commit b262a9becbcb
("bridge: Fix output with empty vlan lists") but older iproute2 versions as
well as other utilities have their output changed by the cited kernel
commit, resulting in failed test cases. Regardless, emitting an empty
attribute is pointless and inefficient.

Avoid this change by canceling the attribute if no AF_SPEC data was added.

Fixes: b6c02ef549 ("bridge: Netlink interface fix.")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/20220725001236.95062-1-bpoirier@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-26 15:35:53 +02:00
Joanne Koong
bbd52178e2 bpf: Fix bpf_xdp_pointer return pointer
For the case where offset + len == size, bpf_xdp_pointer should return a
valid pointer to the addr because that access is permitted. We should
only return NULL in the case where offset + len exceeds size.

Fixes: 3f364222d0 ("net: xdp: introduce bpf_xdp_pointer utility routine")
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/bpf/20220722220105.2065466-1-joannelkoong@gmail.com
2022-07-26 13:15:33 +02:00
Paul Chaignon
b8fff74852 bpf: Set flow flag to allow any source IP in bpf_tunnel_key
Commit 26101f5ab6 ("bpf: Add source ip in "struct bpf_tunnel_key"")
added support for getting and setting the outer source IP of encapsulated
packets via the bpf_skb_{get,set}_tunnel_key BPF helper. This change
allows BPF programs to set any IP address as the source, including for
example the IP address of a container running on the same host.

In that last case, however, the encapsulated packets are dropped when
looking up the route because the source IP address isn't assigned to any
interface on the host. To avoid this, we need to set the
FLOWI_FLAG_ANYSRC flag.

Fixes: 26101f5ab6 ("bpf: Add source ip in "struct bpf_tunnel_key"")
Signed-off-by: Paul Chaignon <paul@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/76873d384e21288abe5767551a0799ac93ec07fb.1658759380.git.paul@isovalent.com
2022-07-26 12:43:16 +02:00
Duoming Zhou
b89fc26f74 sctp: fix sleep in atomic context bug in timer handlers
There are sleep in atomic context bugs in timer handlers of sctp
such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(),
sctp_generate_t1_init_event(), sctp_generate_timeout_event(),
sctp_generate_t3_rtx_event() and so on.

The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter
that may sleep could be called by different timer handlers which is in
interrupt context.

One of the call paths that could trigger bug is shown below:

      (interrupt context)
sctp_generate_probe_event
  sctp_do_sm
    sctp_side_effects
      sctp_cmd_interpreter
        sctp_outq_teardown
          sctp_outq_init
            sctp_sched_set_sched
              n->init_sid(..,GFP_KERNEL)
                sctp_sched_prio_init_sid //may sleep

This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched()
from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic
context bugs.

Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25 19:39:05 -07:00
William Dean
aa246499bb net: delete extra space and tab in blank line
delete extra space and tab in blank line, there is no functional change.

Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Link: https://lore.kernel.org/r/20220723073222.2961602-1-williamsukatube@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25 19:38:31 -07:00
Vladimir Oltean
c7560d1203 net: dsa: fix reference counting for LAG FDBs
Due to an invalid conflict resolution on my side while working on 2
different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add()
does not store the database associated with a dsa_mac_addr structure.

So after adding an FDB entry associated with a LAG, dsa_mac_addr_find()
fails to find it while deleting it, because &a->db is zeroized memory
for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del()
returns -ENOENT rather than deleting the entry.

Fixes: c26933639b ("net: dsa: request drivers to perform FDB isolation")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25 19:37:06 -07:00
Jakub Kicinski
2baf8ba532 wireless-next patches for v5.20
Third set of patches for v5.20. MLO work continues and we have a lot
 of stack changes due to that, including driver API changes. Not much
 driver patches except on mt76.
 
 Major changes:
 
 cfg80211/mac80211
 
 * more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
   works with one link now
 
 * align with IEEE Draft P802.11be_D2.0
 
 * hardware timestamps for receive and transmit
 
 mt76
 
 * preparation for new chipset support
 
 * ACPI SAR support
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmLe1k4RHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZvxlQf8DrZIllhF0q/7Wry3JuG0gbNA+V2eI/lc
 OYrephsDBm/dvvyjcFWcdUzxoNk0k1+aOrx/09JijHFgCGKVwuK1+hxYVfjW2q43
 9mHxJBo4NcMk1RDDM3paVuZ8QMHuYugbv2mQOZeAEq2XloAaqEM7wVE+bb4Mgtgx
 VAKS5du2igrSt83wl8BRMFb9MPAM1EQ3Cw7Ro5T4y+1Qm/hrBm6qWizSpqh9CXYx
 pDLR3pvQxiD4Axa0Uq3rUbyF4hLwciqSFOJvr2sI3q7b9YElA7wIi6NQzMkYJH6Z
 7HW5K6UIQbblAaQkv2BLqpU1N6puTHUOAf5Md31vOAaOcGbSI5hbUA==
 =Cnxg
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v5.20

Third set of patches for v5.20. MLO work continues and we have a lot
of stack changes due to that, including driver API changes. Not much
driver patches except on mt76.

Major changes:

cfg80211/mac80211
 - more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
   works with one link now
 - align with IEEE Draft P802.11be_D2.0
 - hardware timestamps for receive and transmit

mt76
 - preparation for new chipset support
 - ACPI SAR support

* tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits)
  wifi: mac80211: fix link data leak
  wifi: mac80211: mlme: fix disassoc with MLO
  wifi: mac80211: add macros to loop over active links
  wifi: mac80211: remove erroneous sband/link validation
  wifi: mac80211: mlme: transmit assoc frame with address translation
  wifi: mac80211: verify link addresses are different
  wifi: mac80211: rx: track link in RX data
  wifi: mac80211: optionally implement MLO multicast TX
  wifi: mac80211: expand ieee80211_mgmt_tx() for MLO
  wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API
  wifi: mac80211: report link ID to cfg80211 on mgmt RX
  wifi: cfg80211: report link ID in NL80211_CMD_FRAME
  wifi: mac80211: add hardware timestamps for RX and TX
  wifi: cfg80211: add hardware timestamps to frame RX info
  wifi: cfg80211/nl80211: move rx management data into a struct
  wifi: cfg80211: add a function for reporting TX status with hardware timestamps
  wifi: nl80211: add RX and TX timestamp attributes
  wifi: ieee80211: add helper functions for detecting TM/FTM frames
  wifi: mac80211_hwsim: handle links for wmediumd/virtio
  wifi: mac80211: sta_info: fix link_sta insertion
  ...
====================

Link: https://lore.kernel.org/r/20220725174547.EA465C341C6@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25 18:20:52 -07:00
Olga Kornievskaia
92cc04f60a SUNRPC create a function that probes only offline transports
For only offline transports, attempt to check connectivity via
a NULL call and, if that succeeds, call a provided session trunking
detection function.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
273d6aed9e SUNRPC export xprt_iter_rewind function
Make xprt_iter_rewind callable outside of xprtmultipath.c

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
7960aa9e4d SUNRPC restructure rpc_clnt_setup_test_and_add_xprt
In preparation for code re-use, pull out the part of the
rpc_clnt_setup_test_and_add_xprt() portion that sends a NULL rpc
and then calls a session trunking function into a helper function.

Re-organize the end of the function for code re-use.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
497e6464d6 SUNRPC create an rpc function that allows xprt removal from rpc_clnt
Expose a function that allows a removal of xprt from the rpc_clnt.

When called from NFS that's running a trunked transport then don't
decrement the active transport counter.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
9368fd6c75 SUNRPC enable back offline transports in trunking discovery
When we are adding a transport to a xprt_switch that's already on
the list but has been marked OFFLINE, then make the state ONLINE
since it's been tested now.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
95d0d30c66 SUNRPC create an iterator to list only OFFLINE xprts
Create a new iterator helper that will go thru the all the transports
in the switch and return transports that are marked OFFLINE.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
895245ccea SUNRPC add function to offline remove trunkable transports
Iterate thru available transports in the xprt_switch for all
trunkable transports offline and possibly remote them as well.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
Olga Kornievskaia
7ffcdaa670 SUNRPC expose functions for offline remote xprt functionality
Re-arrange the code that make offline transport and delete transport
callable functions.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25 10:06:04 -04:00
David S. Miller
e222dc8d84 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2022-07-20

1) Don't set DST_NOPOLICY in IPv4, a recent patch made this
   superfluous. From Eyal Birger.

2) Convert alg_key to flexible array member to avoid an iproute2
   compile warning when built with gcc-12.
   From Stephen Hemminger.

3) xfrm_register_km and xfrm_unregister_km do always return 0
   so change the type to void. From Zhengchao Shao.

4) Fix spelling mistake in esp6.c
   From Zhang Jiaming.

5) Improve the wording of comment above XFRM_OFFLOAD flags.
   From Petr Vaněk.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 13:25:39 +01:00
Kuniyuki Iwashima
96b9bd8c6d ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.
While reading sysctl_fib_notify_on_flag_change, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 680aea08e7 ("net: ipv4: Emit notification when fib hardware flags are changed")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:10 +01:00
Kuniyuki Iwashima
870e3a634b tcp: Fix data-races around sysctl_tcp_reflect_tos.
While reading sysctl_tcp_reflect_tos, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: ac8f1710c1 ("tcp: reflect tos value received in SYN to the socket")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:10 +01:00
Kuniyuki Iwashima
79f55473bf tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.
While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 9c21d2fc41 ("tcp: add tcp_comp_sack_nr sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:10 +01:00
Kuniyuki Iwashima
22396941a7 tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.
While reading sysctl_tcp_comp_sack_slack_ns, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: a70437cc09 ("tcp: add hrtimer slack to sack compression")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:10 +01:00
Kuniyuki Iwashima
4866b2b0f7 tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.
While reading sysctl_tcp_comp_sack_delay_ns, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 6d82aa2420 ("tcp: add tcp_comp_sack_delay_ns sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:09 +01:00
Kuniyuki Iwashima
0273954595 net: Fix data-races around sysctl_[rw]mem(_offset)?.
While reading these sysctl variables, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - .sysctl_rmem
  - .sysctl_rwmem
  - .sysctl_rmem_offset
  - .sysctl_wmem_offset
  - sysctl_tcp_rmem[1, 2]
  - sysctl_tcp_wmem[1, 2]
  - sysctl_decnet_rmem[1]
  - sysctl_decnet_wmem[1]
  - sysctl_tipc_rmem[1]

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:09 +01:00
Kuniyuki Iwashima
59bf6c65a0 tcp: Fix data-races around sk_pacing_rate.
While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed
concurrently.  Thus, we need to add READ_ONCE() to their readers.

Fixes: 43e122b014 ("tcp: refine pacing rate determination")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:42:09 +01:00
Taehee Yoo
3e7d18b9dc net: mld: fix reference count leak in mld_{query | report}_work()
mld_{query | report}_work() processes queued events.
If there are too many events in the queue, it re-queue a work.
And then, it returns without in6_dev_put().
But if queuing is failed, it should call in6_dev_put(), but it doesn't.
So, a reference count leak would occur.

THREAD0				THREAD1
mld_report_work()
				spin_lock_bh()
				if (!mod_delayed_work())
					in6_dev_hold();
				spin_unlock_bh()
	spin_lock_bh()
	schedule_delayed_work()
	spin_unlock_bh()

Script to reproduce(by Hangbin Liu):
   ip netns add ns1
   ip netns add ns2
   ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1
   ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1

   ip -n ns1 link add veth0 type veth peer name veth0 netns ns2
   ip -n ns1 link set veth0 up
   ip -n ns2 link set veth0 up

   for i in `seq 50`; do
           for j in `seq 100`; do
                   ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0
                   ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0
           done
   done
   modprobe -r veth
   ip -a netns del

splat looks like:
 unregister_netdevice: waiting for veth0 to become free. Usage count = 2
 leaked reference.
  ipv6_add_dev+0x324/0xec0
  addrconf_notify+0x481/0xd10
  raw_notifier_call_chain+0xe3/0x120
  call_netdevice_notifiers+0x106/0x160
  register_netdevice+0x114c/0x16b0
  veth_newlink+0x48b/0xa50 [veth]
  rtnl_newlink+0x11a2/0x1a40
  rtnetlink_rcv_msg+0x63f/0xc00
  netlink_rcv_skb+0x1df/0x3e0
  netlink_unicast+0x5de/0x850
  netlink_sendmsg+0x6c9/0xa90
  ____sys_sendmsg+0x76a/0x780
  __sys_sendmsg+0x27c/0x340
  do_syscall_64+0x43/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Fixes: f185de28d9 ("mld: add new workqueues for process mld events")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-25 12:33:59 +01:00
Jens Axboe
4effe18fc0 Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send
* for-5.20/io_uring: (716 commits)
  io_uring: ensure REQ_F_ISREG is set async offload
  net: fix compat pointer in get_compat_msghdr()
  io_uring: Don't require reinitable percpu_ref
  io_uring: fix types in io_recvmsg_multishot_overflow
  io_uring: Use atomic_long_try_cmpxchg in __io_account_mem
  io_uring: support multishot in recvmsg
  net: copy from user before calling __get_compat_msghdr
  net: copy from user before calling __copy_msghdr
  io_uring: support 0 length iov in buffer select in compat
  io_uring: fix multishot ending when not polled
  io_uring: add netmsg cache
  io_uring: impose max limit on apoll cache
  io_uring: add abstraction around apoll cache
  io_uring: move apoll cache to poll.c
  io_uring: consolidate hash_locked io-wq handling
  io_uring: clear REQ_F_HASH_LOCKED on hash removal
  io_uring: don't race double poll setting REQ_F_ASYNC_DATA
  io_uring: don't miss setting REQ_F_DOUBLE_POLL
  io_uring: disable multishot recvmsg
  io_uring: only trace one of complete or overflow
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:41:03 -06:00
Jens Axboe
4f6a94d337 net: fix compat pointer in get_compat_msghdr()
A previous change enabled external users to copy the data before
calling __get_compat_msghdr(), but didn't modify get_compat_msghdr() or
__io_compat_recvmsg_copy_hdr() to take that into account. They are both
stil passing in the __user pointer rather than the copied version.

Ensure we pass in the kernel struct, not the pointer to the user data.

Link: https://lore.kernel.org/all/46439555-644d-08a1-7d66-16f8f9a320f0@samsung.com/
Fixes: 1a3e4e94a1b9 ("net: copy from user before calling __get_compat_msghdr")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:18 -06:00
Dylan Yudaken
72c531f8ef net: copy from user before calling __get_compat_msghdr
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Dylan Yudaken
7fa875b8e5 net: copy from user before calling __copy_msghdr
this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24 18:39:17 -06:00
Steven Rostedt (Google)
ded4a2f1ae mac80211: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.

Link: https://lkml.kernel.org/r/20220705224751.271015450@goodmis.org

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-24 19:11:17 -04:00
Maxim Mikityanskiy
f6336724a4 net/tls: Remove the context from the list in tls_device_down
tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.

Fixes: 3740651bf7 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-24 21:40:56 +01:00
Anna Schumaker
29946fbcb2 SUNRPC: Remove xdr_align_data() and xdr_expand_hole()
These functions are no longer needed now that the NFS client places data
and hole segments directly.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00
Anna Schumaker
e1bd87608d SUNRPC: Add a function for zeroing out a portion of an xdr_stream
This will be used during READ_PLUS decoding for handling HOLE segments.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00
Anna Schumaker
7c4cd5f4d2 SUNRPC: Add a function for directly setting the xdr page len
We need to do this step during READ_PLUS decoding so that we know pages
are the right length and any extra data has been preserved in the tail.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00
Anna Schumaker
4f5f3b6028 SUNRPC: Introduce xdr_stream_move_subsegment()
I do this by creating an xdr subsegment for the range we will be
operating over. This lets me shift data to the correct place without
potentially overwriting anything already there.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:38:29 -04:00
Chuck Lever
f67939e4b0 SUNRPC: Replace dprintk() call site in xs_data_ready
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:34:38 -04:00
Chuck Lever
0701214cd6 SUNRPC: Fail faster on bad verifier
A bad verifier is not a garbage argument, it's an authentication
failure. Retrying it doesn't make the problem go away, and delays
upper layer recovery steps.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23 15:34:34 -04:00
Jakub Kicinski
4a934eca7b bluetooth-next pull request for net-next:
- Add support for IM Networks PID 0x3568
  - Add support for BCM4349B1
  - Add support for CYW55572
  - Add support for MT7922 VID/PID 0489/e0e2
  - Add support for Realtek RTL8852C
  - Initial support for Isochronous Channels/ISO sockets
  - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmLbPmsZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKSo7EACc8njgHO2pN8ncWvgu/gH8
 0v1lRBoi+Tyzk5gZtdM0rIbE3t7tFqml3Kr0WsCkzV6CGgnqCw5i/MKZXCV8G4tG
 0ZsY8y6NMiCFR6wQq3rdNS8NqPmlCHWm6yY2EISEM6qbtF8HwxQvXdkzznZPHgVG
 DR7i5fAVuGA6rIL+9NSG/TxHjJvq6Bmu3v9Uu/V062I7NrBMw9Jr0Ic1EaUqgYck
 QL8+653ZZMxYPxt978UekbQIYEp3YwZ5MTACtX86j2s5tlZKuivKTIZch1vSaOi1
 1zC6up208+p2/4+Yq7FJ2kA6d2be3FD26oT1xymRhqiMakCRrHfdmTFpC7/J4ZgX
 /4mcIREkUoO2duim+91Hgt1Ww1vaD4joPwXD6AILbK1bdp0pi0gw47bQF8XO8uIh
 yPQqnoGWSJGD1VknPh5x7lGcAYQ3bgSg0L3TlQ4gN9Qc/emuC6UOQ9QPvwmyOilG
 ZrDn2p1Rpsoj8vVRTv6+CgKqLokXNUTPixCAaS4AIygRzhIzwReYYNMYUZgYMrTk
 Qf6bKqczEut1vq8NZiN3TTxLLOWHB5+3cdl/QRv4ZeoNXMXPmbtJ0y9Vud66GhZu
 vS6F+BNQapIkEbM6xrC/OepCgzrVoVn2BQuDO6SQA3pg4JxFeAl+V434Va3tqVeK
 9h6GHrl8KePjh528NXbH3Q==
 =Ne87
 -----END PGP SIGNATURE-----

Merge tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

 - Add support for IM Networks PID 0x3568
 - Add support for BCM4349B1
 - Add support for CYW55572
 - Add support for MT7922 VID/PID 0489/e0e2
 - Add support for Realtek RTL8852C
 - Initial support for Isochronous Channels/ISO sockets
 - Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk

* tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (58 commits)
  Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet
  Bluetooth: btusb: Add support for ISO packets
  Bluetooth: ISO: Add broadcast support
  Bluetooth: Add initial implementation of BIS connections
  Bluetooth: Add BTPROTO_ISO socket type
  Bluetooth: Add initial implementation of CIS connections
  Bluetooth: hci_core: Introduce hci_recv_event_data
  Bluetooth: Convert delayed discov_off to hci_sync
  Bluetooth: Remove update_scan hci_request dependancy
  Bluetooth: Remove dead code from hci_request.c
  Bluetooth: btrtl: Fix typo in comment
  Bluetooth: MGMT: Fix holding hci_conn reference while command is queued
  Bluetooth: mgmt: Fix using hci_conn_abort
  Bluetooth: Use bt_status to convert from errno
  Bluetooth: Add bt_status
  Bluetooth: hci_sync: Split hci_dev_open_sync
  Bluetooth: hci_sync: Refactor remove Adv Monitor
  Bluetooth: hci_sync: Refactor add Adv Monitor
  Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING
  Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR
  ...
====================

Link: https://lore.kernel.org/r/20220723002232.964796-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 19:00:17 -07:00
Luiz Augusto von Dentz
f764a6c2c1 Bluetooth: ISO: Add broadcast support
This adds broadcast support for BTPROTO_ISO by extending the
sockaddr_iso with a new struct sockaddr_iso_bc where the socket user
can set the broadcast address when receiving, the SID and the BIS
indexes it wants to synchronize.

When using BTPROTO_ISO for broadcast the roles are:

Broadcaster -> uses connect with address set to BDADDR_ANY:
> tools/isotest -s 00:00:00:00:00:00

Broadcast Receiver -> uses listen with address set to broadcaster:
> tools/isotest -d 00:AA:01:00:00:00

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:14:13 -07:00
Luiz Augusto von Dentz
eca0ae4aea Bluetooth: Add initial implementation of BIS connections
This adds initial support for BIS/BIG which includes:

== Broadcaster role: Setup a periodic advertising and create a BIG ==

> tools/isotest -s 00:00:00:00:00:00
isotest[63]: Connected [00:00:00:00:00:00]
isotest[63]: QoS BIG 0x00 BIS 0x00 Packing 0x00 Framing 0x00]
isotest[63]: Output QoS [Interval 10000 us Latency 10 ms SDU 40 PHY 0x02
RTN 2]
isotest[63]: Sending ...
isotest[63]: Number of packets: 1
isotest[63]: Socket jitter buffer: 80 buffer
< HCI Command: LE Set Perio.. (0x08|0x003e) plen 7
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Periodic Advertising Parameters (0x08|0x003e) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Perio.. (0x08|0x003f) plen 7
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Periodic Advertising Data (0x08|0x003f) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Perio.. (0x08|0x0040) plen 2
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Periodic Advertising Enable (0x08|0x0040) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Create B.. (0x08|0x0068) plen 31
...
> HCI Event: Command Status (0x0f) plen 4
      LE Create Broadcast Isochronous Group (0x08|0x0068) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 21
      LE Broadcast Isochronous Group Complete (0x1b)
      ...

== Broadcast Receiver role: Create a PA Sync and BIG Sync ==

> tools/isotest -i hci1 -d 00:AA:01:00:00:00
isotest[66]: Waiting for connection 00:AA:01:00:00:00...
< HCI Command: LE Periodic Advert.. (0x08|0x0044) plen 14
...
> HCI Event: Command Status (0x0f) plen 4
      LE Periodic Advertising Create Sync (0x08|0x0044) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Extended Sca.. (0x08|0x0041) plen 8
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Extended Sca.. (0x08|0x0042) plen 6
...
> HCI Event: Command Complete (0x0e) plen 4
      LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 29
      LE Extended Advertising Report (0x0d)
      ...
> HCI Event: LE Meta Event (0x3e) plen 16
      LE Periodic Advertising Sync Established (0x0e)
      ...
< HCI Command: LE Broadcast Isoch.. (0x08|0x006b) plen 25
...
> HCI Event: Command Status (0x0f) plen 4
      LE Broadcast Isochronous Group Create Sync (0x08|0x006b) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 17
      LE Broadcast Isochronous Group Sync Estabilished (0x1d)
      ...

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:13:56 -07:00
Luiz Augusto von Dentz
ccf74f2390 Bluetooth: Add BTPROTO_ISO socket type
This introduces a new socket type BTPROTO_ISO which can be enabled with
use of ISO Socket experiemental UUID, it can used to initiate/accept
connections and transfer packets between userspace and kernel similarly
to how BTPROTO_SCO works:

Central -> uses connect with address set to destination bdaddr:
> tools/isotest -s 00:AA:01:00:00:00

Peripheral -> uses listen:
> tools/isotest -d

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:13:39 -07:00
Luiz Augusto von Dentz
26afbd826e Bluetooth: Add initial implementation of CIS connections
This adds the initial implementation of CIS connections and introduces
the ISO packets/links.

== Central: Set CIG Parameters, create a CIS and Setup Data Path ==

> tools/isotest -s <address>

< HCI Command: LE Extended Create... (0x08|0x0043) plen 26
...
> HCI Event: Command Status (0x0f) plen 4
      LE Extended Create Connection (0x08|0x0043) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 31
      LE Enhanced Connection Complete (0x0a)
      ...
< HCI Command: LE Create Connected... (0x08|0x0064) plen 5
...
> HCI Event: Command Status (0x0f) plen 4
      LE Create Connected Isochronous Stream (0x08|0x0064) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 29
      LE Connected Isochronous Stream Established (0x19)
      ...
< HCI Command: LE Setup Isochronou.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257
< HCI Command: LE Setup Isochronou.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257

== Peripheral: Accept CIS and Setup Data Path ==

> tools/isotest -d

 HCI Event: LE Meta Event (0x3e) plen 7
      LE Connected Isochronous Stream Request (0x1a)
...
< HCI Command: LE Accept Co.. (0x08|0x0066) plen 2
...
> HCI Event: LE Meta Event (0x3e) plen 29
      LE Connected Isochronous Stream Established (0x19)
...
< HCI Command: LE Setup Is.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257
< HCI Command: LE Setup Is.. (0x08|0x006e) plen 13
...
> HCI Event: Command Complete (0x0e) plen 6
      LE Setup Isochronous Data Path (0x08|0x006e) ncmd 1
        Status: Success (0x00)
        Handle: 257

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 17:13:22 -07:00
Jakub Kicinski
b3fce974d4 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
bpf-next 2022-07-22

We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).

The main changes are:

1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.

2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
   syscalls through kprobe mechanism, from Andrii Nakryiko.

3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
   function, from Song Liu & Jiri Olsa.

4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
   entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.

5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
   with kernel symbols, from Alan Maguire.

6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.

7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.

8) libbpf support for writing custom perf event readers, from Jon Doron.

9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.

10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.

11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.

12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.

13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.

14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.

15) Make non-prealloced BPF map allocations low priority to play better with
    memcg limits, from Yafang Shao.

16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.

17) Various smaller cleanups and improvements all over the place.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
  bpf: Simplify bpf_prog_pack_[size|mask]
  bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
  bpf, x64: Allow to use caller address from stack
  ftrace: Allow IPMODIFY and DIRECT ops on the same function
  ftrace: Add modify_ftrace_direct_multi_nolock
  bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
  bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
  selftests/bpf: Fix test_verifier failed test in unprivileged mode
  selftests/bpf: Add negative tests for new nf_conntrack kfuncs
  selftests/bpf: Add tests for new nf_conntrack kfuncs
  selftests/bpf: Add verifier tests for trusted kfunc args
  net: netfilter: Add kfuncs to set and change CT status
  net: netfilter: Add kfuncs to set and change CT timeout
  net: netfilter: Add kfuncs to allocate and insert CT
  net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
  bpf: Add documentation for kfuncs
  bpf: Add support for forcing kfunc args to be trusted
  bpf: Switch to new kfunc flags infrastructure
  tools/resolve_btfids: Add support for 8-byte BTF sets
  bpf: Introduce 8-byte BTF set
  ...
====================

Link: https://lore.kernel.org/r/20220722221218.29943-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 16:55:44 -07:00
Wei Wang
4d8f24eeed Revert "tcp: change pingpong threshold to 3"
This reverts commit 4a41f453be.

This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.

Fixes: 4a41f453be ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <hlm3280@163.com>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 15:09:10 -07:00
Jakub Kicinski
3c47fb2f4c Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux
Pull in Pavel's patch from a shared branch.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 14:53:33 -07:00
Pavel Begunkov
2829a267fc net: fix uninitialised msghdr->sg_from_iter
Because of how struct msghdr is usually initialised some fields and
sg_from_iter in particular might be left out not initialised, so we
can't safely use it in __zerocopy_sg_from_iter().

For now use the callback only when there is ->msg_ubuf set relying on
the fact that they're used together and we properly zero ->msg_ubuf.

Fixes: ebe73a284f ("net: Allow custom iter handler in msghdr")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-22 14:52:32 -07:00
Luiz Augusto von Dentz
dfe6d5c3ec Bluetooth: hci_core: Introduce hci_recv_event_data
This introduces hci_recv_event_data to make it simpler to access the
contents of last received event rather than having to pass its contents
to the likes of *_ind/*_cfm callbacks.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 13:20:52 -07:00
Brian Gix
0ef08313ce Bluetooth: Convert delayed discov_off to hci_sync
The timed ending of Discoverability was handled in hci_requst.c, with
calls using the deprecated hci_req_add() mechanism. Converted to live
inside mgmt.c using the same delayed work queue, but with hci_sync
version of hci_update_discoverable().

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 12:55:59 -07:00
Brian Gix
bb87672562 Bluetooth: Remove update_scan hci_request dependancy
This removes the remaining calls to HCI_OP_WRITE_SCAN_ENABLE from
hci_request call chains, and converts them to hci_sync calls.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 12:55:40 -07:00
Brian Gix
ec2904c259 Bluetooth: Remove dead code from hci_request.c
The discov_update work queue is no longer used as a result
of the hci_sync rework.

The __hci_req_hci_power_on() function is no longer referenced in the
code as a result of the hci_sync rework.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 12:55:21 -07:00
Luiz Augusto von Dentz
7b445e220d Bluetooth: MGMT: Fix holding hci_conn reference while command is queued
This removes the use of hci_conn_hold from Get Conn Info and Get Clock
Info since the callback can just do a lookup by address using the cmd
data and only then set cmd->user_data to pass to the complete callback.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-22 11:02:58 -07:00
Johannes Berg
fa28981b35 wifi: mac80211: fix link data leak
During the code reshuffling, I accidentally set this to
NULL before using it, fix that to fix the link data leak.

Fixes: d3e2439b0f ("wifi: mac80211: fix link manipulation")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:52 +02:00
Johannes Berg
a94c90d321 wifi: mac80211: mlme: fix disassoc with MLO
In MLO we shouldn't call ieee80211_bss_info_change_notify(),
call that only (for backward compatibility) without MLO, and
otherwise ieee80211_vif_cfg_change_notify().

Similarly, ieee80211_reset_erp_info() only applies to the
current link, and in MLO we assume the driver doesn't really
need that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:50 +02:00
Johannes Berg
45b12570a4 wifi: mac80211: remove erroneous sband/link validation
In sta_apply_parameters(), we really no longer need to
check that the link or sband exists, in fact, that's
harmful if link 0 doesn't exist, since then this will
fail.

Just remove this check, it was added for validation of
the sband where used, but it's not used here, it's now
only used in sta_link_apply_parameters() which has an
own lookup and check.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:45 +02:00
Johannes Berg
4ca04ed364 wifi: mac80211: mlme: transmit assoc frame with address translation
To transmit the association frame to the right station and
with address translation, use the correct addresses there
and set up the AP address in the configuration earlier so
it's applied during the transmit of auth/assoc frames.

Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:41 +02:00
Johannes Berg
1773af9d6a wifi: mac80211: verify link addresses are different
When adding multiple links, verify that they all have
different addresses.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:39 +02:00
Johannes Berg
56057da456 wifi: mac80211: rx: track link in RX data
We'll need the link e.g. for decrypt, and shouldn't be
looking it up all the time later, so track it in the RX
data.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:38 +02:00
Johannes Berg
963d0e8d08 wifi: mac80211: optionally implement MLO multicast TX
For drivers using software encryption for multicast TX, such
as mac80211_hwsim, mac80211 needs to duplicate the multicast
frames on each link, if MLO is enabled. Do this, but don't
just make it dependent on the key but provide a separate flag
for drivers to opt out of this.

This is not very efficient, I expect that drivers will do it
in firmware/hardware or at least with DMA engine assistence,
so this is mostly for hwsim.

To make this work, also implement the SNS11 sequence number
space that an AP MLD shall have, and modify the API to the
__ieee80211_subif_start_xmit() function to always require the
link ID bits to be set.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:36 +02:00
Johannes Berg
e1e68b14c5 wifi: mac80211: expand ieee80211_mgmt_tx() for MLO
There are a couple of new things that should be possible
with MLO:
 * selecting the link to transmit to a station by link ID,
   which a previous patch added to the nl80211 API
 * selecting the link by frequency, similarly
 * allowing transmittion to an MLD without specifying any
   channel or link ID, with MLD addresses

Enable these use cases. Also fix the address comparison
in client mode to use the AP (MLD) address.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:35 +02:00
Johannes Berg
95f498bb49 wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API
Allow optionally specifying the link ID to transmit on,
which can be done instead of the link frequency, on an
MLD addressed frame. Both can also be omitted in which
case the frame must be MLD addressed and link selection
(and address translation) will be done on lower layers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:33 +02:00
Johannes Berg
2ec833a5aa wifi: mac80211: report link ID to cfg80211 on mgmt RX
For frames received on an MLD, report the link ID to
userspace.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:32 +02:00
Johannes Berg
6074c9e574 wifi: cfg80211: report link ID in NL80211_CMD_FRAME
If given by the underlying driver, report the link ID for
MLO in NL80211_CMD_FRAME.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:30 +02:00
Avraham Stern
f9202638df wifi: mac80211: add hardware timestamps for RX and TX
When the low level driver reports hardware timestamps for frame
TX status or frame RX, pass the timestamps to cfg80211.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:29 +02:00
Avraham Stern
1ff715ffa0 wifi: cfg80211: add hardware timestamps to frame RX info
Add hardware timestamps to management frame RX info.
This shall be used by drivers that support hardware timestamping for
Timing measurement and Fine timing measurement action frames RX.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:27 +02:00
Avraham Stern
00b3d84010 wifi: cfg80211/nl80211: move rx management data into a struct
The functions for reporting rx management take many arguments.
Collect all the arguments into a struct, which also make it easier
to add more arguments if needed.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:26 +02:00
Avraham Stern
ea7d50c925 wifi: cfg80211: add a function for reporting TX status with hardware timestamps
Add a function for reporting TX status with hardware timestamps. This
function shall be used for reporting the TX status of Timing
measurement and Fine timing measurement action frames by devices that
support reporting hardware timestamps.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:24 +02:00
Johannes Berg
177577dbd2 wifi: mac80211: sta_info: fix link_sta insertion
When inserting a link STA, make sure it doesn't exist first
and add lockdep assertions that we cannot modify the hash
table without holding the sta_mtx, so this check is really
correct.

Also return without hashing if the driver failed, and warn
if the hashing fails, which shouldn't happen due to the
check described above.

Fixes: cb71f1d136 ("wifi: mac80211: add sta link addition/removal")
Fixes: ba6ddab94f ("wifi: mac80211: maintain link-sta hash table")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:19 +02:00
Johannes Berg
b18d87f5d1 wifi: mac80211: mlme: fix link_sta setup
We need to copy the address to both the private and public
portion of the link_sta (the private one is needed for the
hash table). Fix this.

Fixes: bbe90107e1 ("wifi: mac80211: mlme: refactor link station setup")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:17 +02:00
Johannes Berg
9dd1953846 wifi: nl80211/mac80211: clarify link ID in control port TX
Clarify the link ID behaviour in control port TX, we need it
to select the link to transmit on for both MLD and non-MLD
receivers, but select the link address as the SA only if the
receiver is not an MLD.

Fixes: 67207bab93 ("wifi: cfg80211/mac80211: Support control port TX from specific link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:16 +02:00
Johannes Berg
dd820ed633 wifi: mac80211: return error from control port TX for drops
If the frame is going to be dropped anyway because
ieee80211_lookup_ra_sta() returned an error (and
even though it's a bit racy, it will likely continue
to do so), return the error out instead of just
silently dropping the frame.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:14 +02:00
Johannes Berg
8876c67e62 wifi: nl80211: require MLD address on link STA add/modify
We always need the MLD address and link ID to add or
modify the link STA, so require it in the API.

Fixes: 577e5b8c39 ("wifi: cfg80211: add API to add/modify/remove a link station")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:13 +02:00
Johannes Berg
956b961337 wifi: mac80211: more station handling sanity checks
Add more sanity checks to the API handling, we shouldn't
be able to create a station without links, nor should we
be able to add a link to a station that wasn't created as
an MLD with links in the first place.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:11 +02:00
Johannes Berg
0ad49045f2 wifi: mac80211: fix link sta hash table handling
There are two issues here: we unhash the link stations only
directly before freeing the station they belong to, and we
also don't unhash all the links correctly in all cases. Fix
these issues.

Fixes: ba6ddab94f ("wifi: mac80211: maintain link-sta hash table")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:10 +02:00
Johannes Berg
9aebce6c97 wifi: mac80211: validate link address doesn't change
When modifying a link station, validate that the link address
doesn't change, except the first time the link is created.

Fixes: b95eb7f0ee ("wifi: cfg80211/mac80211: separate link params from station params")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:28:08 +02:00
Johannes Berg
6d8e0f84f8 wifi: mac80211: mlme: set sta.mlo to mlo state
At this point, we've already changed link_id to be zero for
a non-MLO connection, so use the 'mlo' variable rather than
link ID to determine the MLO status of the station.

Fixes: bd363ee533 ("wifi: mac80211: mlme: set sta.mlo correctly")
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:27:59 +02:00
Johannes Berg
0f13f3c322 wifi: mac80211: fast-xmit: handle non-MLO clients
If there's a non-MLO client, the A2 must be set to the BSSID
of the link since no translation will happen in lower layers
and it's needed that way for encryption.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:27:58 +02:00
Johannes Berg
1f6389440c wifi: mac80211: fix RX MLD address translation
We should only translate addr3 here if it's the BSSID.

Fixes: 42fb9148c0 ("wifi: mac80211: do link->MLD address translation on RX")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:27:48 +02:00
Johannes Berg
206c8c0680 wifi: mac80211: fix NULL pointer deref with non-MLD STA
If we have a non-MLD STA on an AP MLD, we crash while
adding the station. Fix that, in this case we need to
use the STA's address also on the link data structure.

Fixes: f36fe0a2df ("wifi: mac80211: fix up link station creation/insertion")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:27:47 +02:00
Johannes Berg
553a282cb2 wifi: mac80211: mlme: fix override calculation
In my previous changes here, I neglected to take the old
conn_flags into account that might still be present from
the authentication, and thus ieee80211_setup_assoc_link()
can misbehave, as well as the override calculation being
wrong. Fix that by ORing in the old flags.

Fixes: 1845c1d4a4 ("wifi: mac80211: mlme: refactor assoc link setup")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:27:45 +02:00
Johannes Berg
8a9be422f5 wifi: mac80211: tx: use AP address in some places for MLO
In a few places we need to use the AP (MLD) address, not the
deflink BSSID, the link address translation will happen later.

To make that work properly for fast-xmit, set up the ap_addr
in the vif.cfg earlier.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-07-22 14:27:41 +02:00
Christophe JAILLET
8ee18e2a9e caif: Fix bitmap data type in "struct caifsock"
Bitmap are "unsigned long", so use it instead of a "u32" to make things
more explicit.

While at it, remove some useless cast (and leading spaces) when using the
bitmap API.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:51:45 +01:00
Alan Brady
16576a034c ping: support ipv6 ping socket flow labels
Ping sockets don't appear to make any attempt to preserve flow labels
created and set by userspace using IPV6_FLOWINFO_SEND. Instead they are
clobbered by autolabels (if enabled) or zero.

Grab the flowlabel out of the msghdr similar to how rawv6_sendmsg does
it and move the memset up so it doesn't get zeroed after.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:40:27 +01:00
Kuniyuki Iwashima
2afdbe7b8d tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.
While reading sysctl_tcp_invalid_ratelimit, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 032ee42369 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:18 +01:00
Kuniyuki Iwashima
85225e6f0a tcp: Fix a data-race around sysctl_tcp_autocorking.
While reading sysctl_tcp_autocorking, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: f54b311142 ("tcp: auto corking")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:18 +01:00
Kuniyuki Iwashima
1330ffacd0 tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.
While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: f672258391 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:18 +01:00
Kuniyuki Iwashima
2455e61b85 tcp: Fix a data-race around sysctl_tcp_tso_rtt_log.
While reading sysctl_tcp_tso_rtt_log, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 65466904b0 ("tcp: adjust TSO packet sizes based on min_rtt")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
e0bb4ab9df tcp: Fix a data-race around sysctl_tcp_min_tso_segs.
While reading sysctl_tcp_min_tso_segs, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 95bd09eb27 ("tcp: TSO packets automatic sizing")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
db3815a2fa tcp: Fix a data-race around sysctl_tcp_challenge_ack_limit.
While reading sysctl_tcp_challenge_ack_limit, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
9fb90193fb tcp: Fix a data-race around sysctl_tcp_limit_output_bytes.
While reading sysctl_tcp_limit_output_bytes, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 46d3ceabd8 ("tcp: TCP Small Queues")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
0f1e4d0659 tcp: Fix data-races around sysctl_tcp_workaround_signed_windows.
While reading sysctl_tcp_workaround_signed_windows, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 15d99e02ba ("[TCP]: sysctl to allow TCP window > 32767 sans wscale")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
7804764888 tcp: Fix data-races around sysctl_tcp_moderate_rcvbuf.
While reading sysctl_tcp_moderate_rcvbuf, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
ab1ba21b52 tcp: Fix data-races around sysctl_tcp_no_ssthresh_metrics_save.
While reading sysctl_tcp_no_ssthresh_metrics_save, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 65e6d90168 ("net-tcp: Disable TCP ssthresh metrics cache by default")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
8499a2454d tcp: Fix a data-race around sysctl_tcp_nometrics_save.
While reading sysctl_tcp_nometrics_save, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
706c6202a3 tcp: Fix a data-race around sysctl_tcp_frto.
While reading sysctl_tcp_frto, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
02ca527ac5 tcp: Fix a data-race around sysctl_tcp_app_win.
While reading sysctl_tcp_app_win, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Kuniyuki Iwashima
58ebb1c8b3 tcp: Fix data-races around sysctl_tcp_dsack.
While reading sysctl_tcp_dsack, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-22 12:06:17 +01:00
Lorenzo Bianconi
ef69aa3a98 net: netfilter: Add kfuncs to set and change CT status
Introduce bpf_ct_set_status and bpf_ct_change_status kfunc helpers in
order to set nf_conn field of allocated entry or update nf_conn status
field of existing inserted entry. Use nf_ct_change_status_common to
share the permitted status field changes between netlink and BPF side
by refactoring ctnetlink_change_status.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-10-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi
0b38923644 net: netfilter: Add kfuncs to set and change CT timeout
Introduce bpf_ct_set_timeout and bpf_ct_change_timeout kfunc helpers in
order to change nf_conn timeout. This is same as ctnetlink_change_timeout,
hence code is shared between both by extracting it out to
__nf_ct_change_timeout. It is also updated to return an error when it
sees IPS_FIXED_TIMEOUT_BIT bit in ct->status, as that check was missing.

It is required to introduce two kfuncs taking nf_conn___init and nf_conn
instead of sharing one because KF_TRUSTED_ARGS flag causes strict type
checking. This would disallow passing nf_conn___init to kfunc taking
nf_conn, and vice versa. We cannot remove the KF_TRUSTED_ARGS flag as we
only want to accept refcounted pointers and not e.g. ct->master.

Apart from this, bpf_ct_set_timeout is only called for newly allocated
CT so it doesn't need to inspect the status field just yet. Sharing the
helpers even if it was possible would make timeout setting helper
sensitive to order of setting status and timeout after allocation.

Hence, bpf_ct_set_* kfuncs are meant to be used on allocated CT, and
bpf_ct_change_* kfuncs are meant to be used on inserted or looked up
CT entry.

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-9-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Lorenzo Bianconi
d7e79c97c0 net: netfilter: Add kfuncs to allocate and insert CT
Introduce bpf_xdp_ct_alloc, bpf_skb_ct_alloc and bpf_ct_insert_entry
kfuncs in order to insert a new entry from XDP and TC programs.
Introduce bpf_nf_ct_tuple_parse utility routine to consolidate common
code.

We extract out a helper __nf_ct_set_timeout, used by the ctnetlink and
nf_conntrack_bpf code, extract it out to nf_conntrack_core, so that
nf_conntrack_bpf doesn't need a dependency on CONFIG_NF_CT_NETLINK.
Later this helper will be reused as a helper to set timeout of allocated
but not yet inserted CT entry.

The allocation functions return struct nf_conn___init instead of
nf_conn, to distinguish allocated CT from an already inserted or looked
up CT. This is later used to enforce restrictions on what kfuncs
allocated CT can be used with.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi
aed8ee7feb net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
Move common checks inside the common function, and maintain the only
difference the two being how to obtain the struct net * from ctx.
No functional change intended.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:16 -07:00
Kumar Kartikeya Dwivedi
56e948ffc0 bpf: Add support for forcing kfunc args to be trusted
Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which
means each pointer argument must be trusted, which we define as a
pointer that is referenced (has non-zero ref_obj_id) and also needs to
have its offset unchanged, similar to how release functions expect their
argument. This allows a kfunc to receive pointer arguments unchanged
from the result of the acquire kfunc.

This is required to ensure that kfunc that operate on some object only
work on acquired pointers and not normal PTR_TO_BTF_ID with same type
which can be obtained by pointer walking. The restrictions applied to
release arguments also apply to trusted arguments. This implies that
strict type matching (not deducing type by recursively following members
at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are
used for trusted pointer arguments.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 21:03:09 -07:00
Kumar Kartikeya Dwivedi
a4703e3184 bpf: Switch to new kfunc flags infrastructure
Instead of populating multiple sets to indicate some attribute and then
researching the same BTF ID in them, prepare a single unified BTF set
which indicates whether a kfunc is allowed to be called, and also its
attributes if any at the same time. Now, only one call is needed to
perform the lookup for both kfunc availability and its attributes.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-21 20:59:42 -07:00
Jaehee Park
b66eb3a6e4 net: ipv6: avoid accepting values greater than 2 for accept_untracked_na
The accept_untracked_na sysctl changed from a boolean to an integer
when a new knob '2' was added. This patch provides a safeguard to avoid
accepting values that are not defined in the sysctl. When setting a
value greater than 2, the user will get an 'invalid argument' warning.

Fixes: aaa5f515b1 ("net: ipv6: new accept_untracked_na option to accept na only if in-network")
Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Suggested-by: Roopa Prabhu <roopa@nvidia.com>
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Link: https://lore.kernel.org/r/20220720183632.376138-1-jhpark1013@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21 19:11:10 -07:00
Jakub Kicinski
dde06aaa89 tls: rx: release the sock lock on locking timeout
Eric reports we should release the socket lock if the entire
"grab reader lock" operation has failed. The callers assume
they don't have to release it or otherwise unwind.

Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+16e72110feb2b653ef27@syzkaller.appspotmail.com
Fixes: 4cbc325ed6 ("tls: rx: allow only one reader at a time")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220720203701.2179034-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21 18:58:11 -07:00
Luiz Augusto von Dentz
1f7435c8f6 Bluetooth: mgmt: Fix using hci_conn_abort
This fixes using hci_conn_abort instead of using hci_conn_abort_sync.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:16:10 -07:00
Luiz Augusto von Dentz
a86ddbffa6 Bluetooth: Use bt_status to convert from errno
If a command cannot be sent or there is a internal error an errno maybe
set instead of a command status.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:15:55 -07:00
Luiz Augusto von Dentz
ca2045e059 Bluetooth: Add bt_status
This adds bt_status which can be used to convert Unix errno to
Bluetooth status.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:15:31 -07:00
Luiz Augusto von Dentz
1bbf4023cf Bluetooth: hci_sync: Split hci_dev_open_sync
This splits hci_dev_open_sync so each stage is handle by its own
function so it is easier to identify each stage.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:15:17 -07:00
Manish Mandlik
7cf5c2978f Bluetooth: hci_sync: Refactor remove Adv Monitor
Make use of hci_cmd_sync_queue for removing an advertisement monitor.

Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Miao-chen Chou <mcchou@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:14:55 -07:00
Manish Mandlik
b747a83690 Bluetooth: hci_sync: Refactor add Adv Monitor
Make use of hci_cmd_sync_queue for adding an advertisement monitor.

Signed-off-by: Manish Mandlik <mmandlik@google.com>
Reviewed-by: Miao-chen Chou <mcchou@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:14:32 -07:00
Zijun Hu
63b1a7dd38 Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING
Core driver addtionally checks LMP feature bit "Erroneous Data Reporting"
instead of quirk HCI_QUIRK_BROKEN_ERR_DATA_REPORTING to decide if HCI
commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken, so
remove this unnecessary quirk.

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Tested-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:14:10 -07:00
Zijun Hu
766ae2422b Bluetooth: hci_sync: Check LMP feature bit instead of quirk
BT core driver should addtionally check LMP feature bit
"Erroneous Data Reporting" instead of quirk
HCI_QUIRK_BROKEN_ERR_DATA_REPORTING set by BT device driver to decide if
HCI commands HCI_Read|Write_Default_Erroneous_Data_Reporting are broken.

BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 2, Part C | page 587
This feature indicates whether the device is able to support the
Packet_Status_Flag and the HCI commands HCI_Write_Default_-
Erroneous_Data_Reporting and HCI_Read_Default_Erroneous_-
Data_Reporting.

the quirk was introduced by 'commit cde1a8a992 ("Bluetooth: btusb: Fix
and detect most of the Chinese Bluetooth controllers")' to mark HCI
commands HCI_Read|Write_Default_Erroneous_Data_Reporting broken by BT
device driver, but the reason why these two HCI commands are broken is
that feature "Erroneous Data Reporting" is not enabled by firmware, this
scenario is illustrated by below log of QCA controllers with USB I/F:

@ RAW Open: hcitool (privileged) version 2.22
< HCI Command: Read Local Supported Commands (0x04|0x0002) plen 0
> HCI Event: Command Complete (0x0e) plen 68
      Read Local Supported Commands (0x04|0x0002) ncmd 1
        Status: Success (0x00)
        Commands: 288 entries
......
          Read Default Erroneous Data Reporting (Octet 18 - Bit 2)
          Write Default Erroneous Data Reporting (Octet 18 - Bit 3)
......

< HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0
> HCI Event: Command Complete (0x0e) plen 4
      Read Default Erroneous Data Reporting (0x03|0x005a) ncmd 1
        Status: Unknown HCI Command (0x01)

< HCI Command: Read Local Supported Features (0x04|0x0003) plen 0
> HCI Event: Command Complete (0x0e) plen 12
      Read Local Supported Features (0x04|0x0003) ncmd 1
        Status: Success (0x00)
        Features: 0xff 0xfe 0x0f 0xfe 0xd8 0x3f 0x5b 0x87
          3 slot packets
......

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Tested-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:13:18 -07:00
Zijun Hu
0feb8af027 Bluetooth: hci_sync: Correct hci_set_event_mask_page_2_sync() event mask
Event HCI_Truncated_Page_Complete should belong to central
and HCI_Peripheral_Page_Response_Timeout should belong to
peripheral, but hci_set_event_mask_page_2_sync() take these
two events for wrong roles, so correct it by this change.

Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:13:03 -07:00
Luiz Augusto von Dentz
6828b58307 Bluetooth: hci_sync: Don't remove connected devices from accept list
These devices are likely going to be reprogrammed when disconnected so
this avoid a whole bunch of commands attempt to remove and the add back
to the list.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Zhengping Jiang <jiangzp@google.com>
2022-07-21 17:09:34 -07:00
Luiz Augusto von Dentz
0900b1c62f Bluetooth: hci_sync: Fix not updating privacy_mode
When programming a new entry into the resolving list it shall default
to network mode since the params may contain the mode programmed when
the device was last added to the resolving list.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=209745
Fixes: 853b70b506 ("Bluetooth: hci_sync: Set Privacy Mode when updating the resolving list")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Zhengping Jiang <jiangzp@google.com>
2022-07-21 17:09:19 -07:00
Tamas Koczka
9f30de9e03 Bluetooth: Collect kcov coverage from hci_rx_work
Annotate hci_rx_work() with kcov_remote_start() and kcov_remote_stop()
calls, so remote KCOV coverage is collected while processing the rx_q
queue which is the main incoming Bluetooth packet queue.

Coverage is associated with the thread which created the packet skb.

The collected extra coverage helps kernel fuzzing efforts in finding
vulnerabilities.

This change only has effect if the kernel is compiled with CONFIG_KCOV,
otherwise kcov_ functions don't do anything.

Signed-off-by: Tamas Koczka <poprdi@google.com>
Tested-by: Aleksandr Nogikh <nogikh@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:09:06 -07:00
Zhengping Jiang
68253f3cd7 Bluetooth: hci_sync: Fix resuming scan after suspend resume
After resuming, remove setting scanning_paused to false, because it is
checked and set to false in hci_resume_scan_sync. Also move setting
the value to false before updating passive scan, because the value is
used when resuming passive scan.

Fixes: 3b42055388 (Bluetooth: hci_sync: Fix attempting to suspend with
unfiltered passive scan)

Signed-off-by: Zhengping Jiang <jiangzp@google.com>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:08:52 -07:00
Zhengping Jiang
d7b2fdfb53 Bluetooth: mgmt: Fix refresh cached connection info
Set the connection data before calling get_conn_info_sync, so it can be
verified the connection is still connected, before refreshing cached
values.

Fixes: 47db6b4299 ("Bluetooth: hci_sync: Convert MGMT_OP_GET_CONN_INFO")
Signed-off-by: Zhengping Jiang <jiangzp@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:07:50 -07:00
Luiz Augusto von Dentz
34a718bc86 Bluetooth: HCI: Fix not always setting Scan Response/Advertising Data
The scan response and advertising data needs to be tracked on a per
instance (adv_info) since when these instaces are removed so are their
data, to fix that new flags are introduced which is used to mark when
the data changes and then checked to confirm when the data needs to be
synced with the controller.

Tested-by: Tedd Ho-Jeong An <tedd.an@intel.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:07:30 -07:00
Luiz Augusto von Dentz
dd7b8cdde0 Bluetooth: eir: Fix using strlen with hdev->{dev_name,short_name}
Both dev_name and short_name are not guaranteed to be NULL terminated so
this instead use strnlen and then attempt to determine if the resulting
string needs to be truncated or not.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216018
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-07-21 17:07:16 -07:00
Xiaohui Zhang
a5133fe87e Bluetooth: use memset avoid memory leaks
Similar to the handling of l2cap_ecred_connect in commit d3715b2333
("Bluetooth: use memset avoid memory leaks"), we thought a patch
might be needed here as well.

Use memset to initialize structs to prevent memory leaks
in l2cap_le_connect

Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:07:02 -07:00
Dan Carpenter
9111786492 Bluetooth: fix an error code in hci_register_dev()
Preserve the error code from hci_register_suspend_notifier().  Don't
return success.

Fixes: d6bb2a91f95b ("Bluetooth: Unregister suspend with userchannel")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:06:49 -07:00
Abhishek Pandit-Subedi
359ee4f834 Bluetooth: Unregister suspend with userchannel
When HCI_USERCHANNEL is used, unregister the suspend notifier when
binding and register when releasing. The userchannel socket should be
left alone after open is completed.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:05:58 -07:00
Abhishek Pandit-Subedi
0acef50ba3 Bluetooth: Fix index added after unregister
When a userchannel socket is released, we should check whether the hdev
is already unregistered before sending out an IndexAdded.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:05:42 -07:00
Schspa Shi
877afadad2 Bluetooth: When HCI work queue is drained, only queue chained work
The HCI command, event, and data packet processing workqueue is drained
to avoid deadlock in commit
76727c02c1 ("Bluetooth: Call drain_workqueue() before resetting state").

There is another delayed work, which will queue command to this drained
workqueue. Which results in the following error report:

Bluetooth: hci2: command 0x040f tx timeout
WARNING: CPU: 1 PID: 18374 at kernel/workqueue.c:1438 __queue_work+0xdad/0x1140
Workqueue: events hci_cmd_timeout
RIP: 0010:__queue_work+0xdad/0x1140
RSP: 0000:ffffc90002cffc60 EFLAGS: 00010093
RAX: 0000000000000000 RBX: ffff8880b9d3ec00 RCX: 0000000000000000
RDX: ffff888024ba0000 RSI: ffffffff814e048d RDI: ffff8880b9d3ec08
RBP: 0000000000000008 R08: 0000000000000000 R09: 00000000b9d39700
R10: ffffffff814f73c6 R11: 0000000000000000 R12: ffff88807cce4c60
R13: 0000000000000000 R14: ffff8880796d8800 R15: ffff8880796d8800
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0174b4000 CR3: 000000007cae9000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? queue_work_on+0xcb/0x110
 ? lockdep_hardirqs_off+0x90/0xd0
 queue_work_on+0xee/0x110
 process_one_work+0x996/0x1610
 ? pwq_dec_nr_in_flight+0x2a0/0x2a0
 ? rwlock_bug.part.0+0x90/0x90
 ? _raw_spin_lock_irq+0x41/0x50
 worker_thread+0x665/0x1080
 ? process_one_work+0x1610/0x1610
 kthread+0x2e9/0x3a0
 ? kthread_complete_and_exit+0x40/0x40
 ret_from_fork+0x1f/0x30
 </TASK>

To fix this, we can add a new HCI_DRAIN_WQ flag, and don't queue the
timeout workqueue while command workqueue is draining.

Fixes: 76727c02c1 ("Bluetooth: Call drain_workqueue() before resetting state")
Reported-by: syzbot+63bed493aebbf6872647@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:05:22 -07:00
Alain Michaud
629f66aaca Bluetooth: clear the temporary linkkey in hci_conn_cleanup
If a hardware error occurs and the connections are flushed without a
disconnection_complete event being signaled, the temporary linkkeys are
not flushed.

This change ensures that any outstanding flushable linkkeys are flushed
when the connection are flushed from the hash table.

Additionally, this also makes use of test_and_clear_bit to avoid
multiple attempts to delete the link key that's already been flushed.

Signed-off-by: Alain Michaud <alainm@chromium.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-07-21 17:04:53 -07:00
Jakub Kicinski
6e0e846ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-21 13:03:39 -07:00
Jakub Kicinski
602ae008ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Simplify nf_ct_get_tuple(), from Jackie Liu.

2) Add format to request_module() call, from Bill Wendling.

3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending
   hardware offload objects to be processed, from Vlad Buslov.

4) Missing rcu annotation and accessors in the netfilter tree,
   from Florian Westphal.

5) Merge h323 conntrack helper nat hooks into single object,
   also from Florian.

6) A batch of update to fix sparse warnings treewide,
   from Florian Westphal.

7) Move nft_cmp_fast_mask() where it used, from Florian.

8) Missing const in nf_nat_initialized(), from James Yonan.

9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet.

10) Use refcount_inc instead of _inc_not_zero in flowtable,
    from Florian Westphal.

11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: xt_TPROXY: remove pr_debug invocations
  netfilter: flowtable: prefer refcount_inc
  netfilter: ipvs: Use the bitmap API to allocate bitmaps
  netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn *
  netfilter: nf_tables: move nft_cmp_fast_mask to where its used
  netfilter: nf_tables: use correct integer types
  netfilter: nf_tables: add and use BE register load-store helpers
  netfilter: nf_tables: use the correct get/put helpers
  netfilter: x_tables: use correct integer types
  netfilter: nfnetlink: add missing __be16 cast
  netfilter: nft_set_bitmap: Fix spelling mistake
  netfilter: h323: merge nat hook pointers into one
  netfilter: nf_conntrack: use rcu accessors where needed
  netfilter: nf_conntrack: add missing __rcu annotations
  netfilter: nf_flow_table: count pending offload workqueue tasks
  net/sched: act_ct: set 'net' pointer when creating new nf_flow_table
  netfilter: conntrack: use correct format characters
  netfilter: conntrack: use fallthrough to cleanup
====================

Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-20 18:05:51 -07:00
Justin Stitt
aa8c7cdbae netfilter: xt_TPROXY: remove pr_debug invocations
pr_debug calls are no longer needed in this file.

Pablo suggested "a patch to remove these pr_debug calls". This patch has
some other beneficial collateral as it also silences multiple Clang
-Wformat warnings that were present in the pr_debug calls.

diff from v1 -> v2:
* converted if statement one-liner style
* x == NULL is now !x

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21 00:56:00 +02:00
Florian Westphal
f02e7dc4cf netfilter: flowtable: prefer refcount_inc
With refcount_inc_not_zero, we'd also need a smp_rmb or similar,
followed by a test of the CONFIRMED bit.

However, the ct pointer is taken from skb->_nfct, its refcount must
not be 0 (else, we'd already have a use-after-free bug).

Use refcount_inc() instead to clarify the ct refcount is expected to
be at least 1.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21 00:55:39 +02:00
Christophe JAILLET
5787db7c90 netfilter: ipvs: Use the bitmap API to allocate bitmaps
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-07-21 00:55:39 +02:00
Oz Shlomo
c0f47c2822 net/sched: cls_api: Fix flow action initialization
The cited commit refactored the flow action initialization sequence to
use an interface method when translating tc action instances to flow
offload objects. The refactored version skips the initialization of the
generic flow action attributes for tc actions, such as pedit, that allocate
more than one offload entry. This can cause potential issues for drivers
mapping flow action ids.

Populate the generic flow action fields for all the flow action entries.

Fixes: c54e1d920f ("flow_offload: add ops to tc_action_ops for flow action setup")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>

----
v1 -> v2:
 - coalese the generic flow action fields initialization to a single loop
Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:54:27 +01:00
Kuniyuki Iwashima
a11e5b3e7a tcp: Fix data-races around sysctl_tcp_max_reordering.
While reading sysctl_tcp_max_reordering, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: dca145ffaa ("tcp: allow for bigger reordering level")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
2d17d9c738 tcp: Fix a data-race around sysctl_tcp_abort_on_overflow.
While reading sysctl_tcp_abort_on_overflow, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
0b484c9191 tcp: Fix a data-race around sysctl_tcp_rfc1337.
While reading sysctl_tcp_rfc1337, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
4e08ed41cb tcp: Fix a data-race around sysctl_tcp_stdurg.
While reading sysctl_tcp_stdurg, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
1a63cb91f0 tcp: Fix a data-race around sysctl_tcp_retrans_collapse.
While reading sysctl_tcp_retrans_collapse, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
4845b5713a tcp: Fix data-races around sysctl_tcp_slow_start_after_idle.
While reading sysctl_tcp_slow_start_after_idle, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 35089bb203 ("[TCP]: Add tcp_slow_start_after_idle sysctl.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
7c6f2a86ca tcp: Fix a data-race around sysctl_tcp_thin_linear_timeouts.
While reading sysctl_tcp_thin_linear_timeouts, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: 36e31b0af5 ("net: TCP thin linear timeouts")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
e7d2ef837e tcp: Fix data-races around sysctl_tcp_recovery.
While reading sysctl_tcp_recovery, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 4f41b1c58a ("tcp: use RACK to detect losses")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:50 +01:00
Kuniyuki Iwashima
52e65865de tcp: Fix a data-race around sysctl_tcp_early_retrans.
While reading sysctl_tcp_early_retrans, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: eed530b6c6 ("tcp: early retransmit")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima
3666f666e9 tcp: Fix data-races around sysctl knobs related to SYN option.
While reading these knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - tcp_sack
  - tcp_window_scaling
  - tcp_timestamps

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima
9b55c20f83 ip: Fix data-races around sysctl_ip_prot_sock.
sysctl_ip_prot_sock is accessed concurrently, and there is always a chance
of data-race.  So, all readers and writers need some basic protection to
avoid load/store-tearing.

Fixes: 4548b683b7 ("Introduce a sysctl that modifies the value of PROT_SOCK.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima
8895a9c2ac ipv4: Fix data-races around sysctl_fib_multipath_hash_fields.
While reading sysctl_fib_multipath_hash_fields, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: ce5c9c20d3 ("ipv4: Add a sysctl to control multipath hash fields")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima
7998c12a08 ipv4: Fix data-races around sysctl_fib_multipath_hash_policy.
While reading sysctl_fib_multipath_hash_policy, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: bf4e0a3db9 ("net: ipv4: add support for ECMP hash policy choice")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
Kuniyuki Iwashima
87507bcb4f ipv4: Fix a data-race around sysctl_fib_multipath_use_neigh.
While reading sysctl_fib_multipath_use_neigh, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its reader.

Fixes: a6db4494d2 ("net: ipv4: Consider failed nexthops in multipath routes")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:14:49 +01:00
David S. Miller
ef5621758a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2022-07-20

1) Fix a policy refcount imbalance in xfrm_bundle_lookup.
   From Hangyu Hua.

2) Fix some clang -Wformat warnings.
   Justin Stitt
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-20 10:11:58 +01:00
Jakub Kicinski
7f9eee196e Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux
Pavel Begunkov says:

====================
io_uring zerocopy send

The patchset implements io_uring zerocopy send. It works with both registered
and normal buffers, mixing is allowed but not recommended. Apart from usual
request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
the userspace when buffers are freed and can be reused (see API design below),
which is delivered into io_uring's Completion Queue. Those "buffer-free"
notifications are not necessarily per request, but the userspace has control
over it and should explicitly attaching a number of requests to a single
notification. The series also adds some internal optimisations when used with
registered buffers like removing page referencing.

From the kernel networking perspective there are two main changes. The first
one is passing ubuf_info into the network layer from io_uring (inside of an
in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
caching on the io_uring side, but also helps to avoid cross-referencing
and synchronisation problems. The second part is an optional optimisation
removing page referencing for requests with registered buffers.

Benchmarking UDP with an optimised version of the selftest (see [1]), which
sends a bunch of requests, waits for completions and repeats. "+ flush" column
posts one additional "buffer-free" notification per request, and just "zc"
doesn't post buffer notifications at all.

NIC (requests / second):
IO size | non-zc    | zc             | zc + flush
4000    | 495134    | 606420 (+22%)  | 558971 (+12%)
1500    | 551808    | 577116 (+4.5%) | 565803 (+2.5%)
1000    | 584677    | 592088 (+1.2%) | 560885 (-4%)
600     | 596292    | 598550 (+0.4%) | 555366 (-6.7%)

dummy (requests / second):
IO size | non-zc    | zc             | zc + flush
8000    | 1299916   | 2396600 (+84%) | 2224219 (+71%)
4000    | 1869230   | 2344146 (+25%) | 2170069 (+16%)
1200    | 2071617   | 2361960 (+14%) | 2203052 (+6%)
600     | 2106794   | 2381527 (+13%) | 2195295 (+4%)

Previously it also brought a massive performance speedup compared to the
msg_zerocopy tool (see [3]), which is probably not super interesting. There
is also an additional bunch of refcounting optimisations that was omitted from
the series for simplicity and as they don't change the picture drastically,
they will be sent as follow up, as well as flushing optimisations closing the
performance gap b/w two last columns.

For TCP on localhost (with hacks enabling localhost zerocopy) and including
additional overhead for receive:

IO size | non-zc    | zc
1200    | 4174      | 4148
4096    | 7597      | 11228

Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the
omitted optimisations will somewhat help, should look better for 4000,
but couldn't test properly because of setup problems.

Links:

  liburing (benchmark + tests):
  [1] https://github.com/isilence/liburing/tree/zc_v4

  kernel repo:
  [2] https://github.com/isilence/linux/tree/zc_v4

  RFC v1:
  [3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/

  RFC v2:
  https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/

  Net patches based:
  git@github.com:isilence/linux.git zc_v4-net-base
  or
  https://github.com/isilence/linux/tree/zc_v4-net-base

API design overview:

  The series introduces an io_uring concept of notifactors. From the userspace
  perspective it's an entity to which it can bind one or more requests and then
  requesting to flush it. Flushing a notifier makes it impossible to attach new
  requests to it, and instructs the notifier to post a completion once all
  requests attached to it are completed and the kernel doesn't need the buffers
  anymore.

  Notifications are stored in notification slots, which should be registered as
  an array in io_uring. Each slot stores only one notifier at any particular
  moment. Flushing removes it from the slot and the slot automatically replaces
  it with a new notifier. All operations with notifiers are done by specifying
  an index of a slot it's currently in.

  When registering a notification the userspace specifies a u64 tag for each
  slot, which will be copied in notification completion entries as
  cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32
  sequence number counting notifiers of a slot.

====================

Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:22:41 -07:00
Pavel Begunkov
eb315a7d13 tcp: support externally provided ubufs
Teach tcp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:20:54 -07:00
Pavel Begunkov
1fd3ae8c90 ipv6/udp: support externally provided ubufs
Teach ipv6/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:20:54 -07:00
Pavel Begunkov
c445f31b3c ipv4/udp: support externally provided ubufs
Teach ipv4/udp how to use external ubuf_info provided in msghdr and
also prepare it for managed frags by sprinkling
skb_zcopy_downgrade_managed() when it could mix managed and not managed
frags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:20:54 -07:00
Pavel Begunkov
753f1ca4e1 net: introduce managed frags infrastructure
Some users like io_uring can do page pinning more efficiently, so we
want a way to delegate referencing to other subsystems. For that add
a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold
page references and upper layers are responsivle to managing page
lifetime.

It's allowed to convert skbs from managed to normal by calling
skb_zcopy_downgrade_managed(). The function will take all needed
page references and clear the flag. It's needed, for instance,
to avoid mixing managed modes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:20:54 -07:00
David Ahern
ebe73a284f net: Allow custom iter handler in msghdr
Add support for custom iov_iter handling to msghdr. The idea is that
in-kernel subsystems want control over how an SG is split.

Signed-off-by: David Ahern <dsahern@kernel.org>
[pavel: move callback into msghdr]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:20:54 -07:00
Pavel Begunkov
7c701d92b2 skbuff: carry external ubuf_info in msghdr
Make possible for network in-kernel callers like io_uring to pass in a
custom ubuf_info by setting it in a new field of struct msghdr.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19 14:20:42 -07:00
Zhengchao Shao
fd18942244 bpf: Don't redirect packets with invalid pkt_len
Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
skbs, that is, the flow->head is null.
The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
run a bpf prog which redirects empty skbs.
So we should determine whether the length of the packet modified by bpf
prog or others like bpf_prog_test is valid before forwarding it directly.

LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html

Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-07-19 09:50:54 -07:00
Vladimir Oltean
1699b4d502 net: dsa: fix NULL pointer dereference in dsa_port_reset_vlan_filtering
The "ds" iterator variable used in dsa_port_reset_vlan_filtering() ->
dsa_switch_for_each_port() overwrites the "dp" received as argument,
which is later used to call dsa_port_vlan_filtering() proper.

As a result, switches which do enter that code path (the ones with
vlan_filtering_is_global=true) will dereference an invalid dp in
dsa_port_reset_vlan_filtering() after leaving a VLAN-aware bridge.

Use a dedicated "other_dp" iterator variable to avoid this from
happening.

Fixes: d0004a020b ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:14:23 -07:00
Vladimir Oltean
4db2a5ef4c net: dsa: fix dsa_port_vlan_filtering when global
The blamed refactoring commit changed a "port" iterator with "other_dp",
but still looked at the slave_dev of the dp outside the loop, instead of
other_dp->slave from the loop.

As a result, dsa_port_vlan_filtering() would not call
dsa_slave_manage_vlan_filtering() except for the port in cause, and not
for all switch ports as expected.

Fixes: d0004a020b ("net: dsa: remove the "dsa_to_port in a loop" antipattern from the core")
Reported-by: Lucian Banu <Lucian.Banu@westermo.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:14:23 -07:00
Jiri Pirko
f655dacb59 net: devlink: remove unused locked functions
Remove locked versions of functions that are no longer used by anyone.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:48 -07:00
Jiri Pirko
012ec02ae4 netdevsim: convert driver to use unlocked devlink API during init/fini
Prepare for devlink reload being called with devlink->lock held and
convert the netdevsim driver to use unlocked devlink API during init and
fini flows. Take devl_lock() in reload_down() and reload_up() ops in the
meantime before reload cmd is converted to take the lock itself.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:48 -07:00
Jiri Pirko
eb0e9fa2c6 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
Add unlocked variants of devlink_region_create/destroy() functions
to be used in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:48 -07:00
Jiri Pirko
70a2ff8936 net: devlink: add unlocked variants of devlink_dpipe*() functions
Add unlocked variants of devlink_dpipe*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:47 -07:00
Jiri Pirko
755cfa69c4 net: devlink: add unlocked variants of devlink_sb*() functions
Add unlocked variants of devlink_sb*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:47 -07:00
Jiri Pirko
c223d6a4bf net: devlink: add unlocked variants of devlink_resource*() functions
Add unlocked variants of devlink_resource*() functions to be used
in drivers called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:46 -07:00
Jiri Pirko
852e85a704 net: devlink: add unlocked variants of devling_trap*() functions
Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:46 -07:00
Moshe Shemesh
e26fde2f5b net: devlink: avoid false DEADLOCK warning reported by lockdep
Add a lock_class_key per devlink instance to avoid DEADLOCK warning by
lockdep, while locking more than one devlink instance in driver code,
for example in opening VFs flow.

Kernel log:
[  101.433802] ============================================
[  101.433803] WARNING: possible recursive locking detected
[  101.433810] 5.19.0-rc1+ #35 Not tainted
[  101.433812] --------------------------------------------
[  101.433813] bash/892 is trying to acquire lock:
[  101.433815] ffff888127bfc2f8 (&devlink->lock){+.+.}-{3:3}, at: probe_one+0x3c/0x690 [mlx5_core]
[  101.433909]
               but task is already holding lock:
[  101.433910] ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[  101.433989]
               other info that might help us debug this:
[  101.433990]  Possible unsafe locking scenario:

[  101.433991]        CPU0
[  101.433991]        ----
[  101.433992]   lock(&devlink->lock);
[  101.433993]   lock(&devlink->lock);
[  101.433995]
                *** DEADLOCK ***

[  101.433996]  May be due to missing lock nesting notation

[  101.433996] 6 locks held by bash/892:
[  101.433998]  #0: ffff88810eb50448 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xf3/0x1d0
[  101.434009]  #1: ffff888114777c88 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x20d/0x520
[  101.434017]  #2: ffff888102b58660 (kn->active#231){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x230/0x520
[  101.434023]  #3: ffff888102d70198 (&dev->mutex){....}-{3:3}, at: sriov_numvfs_store+0x132/0x310
[  101.434031]  #4: ffff888118f4c2f8 (&devlink->lock){+.+.}-{3:3}, at: mlx5_core_sriov_configure+0x62/0x280 [mlx5_core]
[  101.434108]  #5: ffff88812adce198 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x430
[  101.434116]
               stack backtrace:
[  101.434118] CPU: 5 PID: 892 Comm: bash Not tainted 5.19.0-rc1+ #35
[  101.434120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  101.434130] Call Trace:
[  101.434133]  <TASK>
[  101.434135]  dump_stack_lvl+0x57/0x7d
[  101.434145]  __lock_acquire.cold+0x1df/0x3e7
[  101.434151]  ? register_lock_class+0x1880/0x1880
[  101.434157]  lock_acquire+0x1c1/0x550
[  101.434160]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434229]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  101.434232]  ? __xa_alloc+0x1ed/0x2d0
[  101.434236]  ? ksys_write+0xf3/0x1d0
[  101.434239]  __mutex_lock+0x12c/0x14b0
[  101.434243]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434312]  ? probe_one+0x3c/0x690 [mlx5_core]
[  101.434380]  ? devlink_alloc_ns+0x11b/0x910
[  101.434385]  ? mutex_lock_io_nested+0x1320/0x1320
[  101.434388]  ? lockdep_init_map_type+0x21a/0x7d0
[  101.434391]  ? lockdep_init_map_type+0x21a/0x7d0
[  101.434393]  ? __init_swait_queue_head+0x70/0xd0
[  101.434397]  probe_one+0x3c/0x690 [mlx5_core]
[  101.434467]  pci_device_probe+0x1b4/0x480
[  101.434471]  really_probe+0x1e0/0xaa0
[  101.434474]  __driver_probe_device+0x219/0x480
[  101.434478]  driver_probe_device+0x49/0x130
[  101.434481]  __device_attach_driver+0x1b8/0x280
[  101.434484]  ? driver_allows_async_probing+0x140/0x140
[  101.434487]  bus_for_each_drv+0x123/0x1a0
[  101.434489]  ? bus_for_each_dev+0x1a0/0x1a0
[  101.434491]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[  101.434494]  ? trace_hardirqs_on+0x2d/0x100
[  101.434498]  __device_attach+0x1a3/0x430
[  101.434501]  ? device_driver_attach+0x1e0/0x1e0
[  101.434503]  ? pci_bridge_d3_possible+0x1e0/0x1e0
[  101.434506]  ? pci_create_resource_files+0xeb/0x190
[  101.434511]  pci_bus_add_device+0x6c/0xa0
[  101.434514]  pci_iov_add_virtfn+0x9e4/0xe00
[  101.434517]  ? trace_hardirqs_on+0x2d/0x100
[  101.434521]  sriov_enable+0x64a/0xca0
[  101.434524]  ? pcibios_sriov_disable+0x10/0x10
[  101.434528]  mlx5_core_sriov_configure+0xab/0x280 [mlx5_core]
[  101.434602]  sriov_numvfs_store+0x20a/0x310
[  101.434605]  ? sriov_totalvfs_show+0xc0/0xc0
[  101.434608]  ? sysfs_file_ops+0x170/0x170
[  101.434611]  ? sysfs_file_ops+0x117/0x170
[  101.434614]  ? sysfs_file_ops+0x170/0x170
[  101.434616]  kernfs_fop_write_iter+0x348/0x520
[  101.434619]  new_sync_write+0x2e5/0x520
[  101.434621]  ? new_sync_read+0x520/0x520
[  101.434624]  ? lock_acquire+0x1c1/0x550
[  101.434626]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[  101.434630]  vfs_write+0x5cb/0x8d0
[  101.434633]  ksys_write+0xf3/0x1d0
[  101.434635]  ? __x64_sys_read+0xb0/0xb0
[  101.434638]  ? lockdep_hardirqs_on_prepare+0x286/0x400
[  101.434640]  ? syscall_enter_from_user_mode+0x1d/0x50
[  101.434643]  do_syscall_64+0x3d/0x90
[  101.434647]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  101.434650] RIP: 0033:0x7f5ff536b2f7
[  101.434658] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f
1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  101.434661] RSP: 002b:00007ffd9ea85d58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  101.434664] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f5ff536b2f7
[  101.434666] RDX: 0000000000000002 RSI: 000055c4c279e230 RDI: 0000000000000001
[  101.434668] RBP: 000055c4c279e230 R08: 000000000000000a R09: 0000000000000001
[  101.434669] R10: 000055c4c283cbf0 R11: 0000000000000246 R12: 0000000000000002
[  101.434670] R13: 00007f5ff543d500 R14: 0000000000000002 R15: 00007f5ff543d700
[  101.434673]  </TASK>

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 20:10:46 -07:00
Pavel Begunkov
2e07a521e1 skbuff: add SKBFL_DONT_ORPHAN flag
We don't want to list every single ubuf_info callback in
skb_orphan_frags(), add a flag controlling the behaviour.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 19:58:45 -07:00
Pavel Begunkov
1b4b2b09d4 skbuff: don't mix ubuf_info from different sources
We should not append MSG_ZEROCOPY requests to skbuff with non
MSG_ZEROCOPY ubuf_info, they might be not compatible.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 19:58:45 -07:00
Pavel Begunkov
773ba4fe91 ipv6: avoid partial copy for zc
Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 128 bytes). It wastes CPU cycles
on copy and iter manipulations and also misalignes potentially aligned
data. Avoid such copies. And as a bonus we can allocate smaller skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 19:58:29 -07:00
Pavel Begunkov
8eb77cc739 ipv4: avoid partial copy for zc
Even when zerocopy transmission is requested and possible,
__ip_append_data() will still copy a small chunk of data just because it
allocated some extra linear space (e.g. 148 bytes). It wastes CPU cycles
on copy and iter manipulations and also misalignes potentially aligned
data. Avoid such copies. And as a bonus we can allocate smaller skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18 19:58:03 -07:00
Tetsuo Handa
3598cb6e18 wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
lockdep complains use of uninitialized spinlock at ieee80211_do_stop() [1],
for commit f856373e2f ("wifi: mac80211: do not wake queues on a vif
that is being stopped") guards clear_bit() using fq.lock even before
fq_init() from ieee80211_txq_setup_flows() initializes this spinlock.

According to discussion [2], Toke was not happy with expanding usage of
fq.lock. Since __ieee80211_wake_txqs() is called under RCU read lock, we
can instead use synchronize_rcu() for flushing ieee80211_wake_txqs().

Link: https://syzkaller.appspot.com/bug?extid=eceab52db7c4b961e9d6 [1]
Link: https://lkml.kernel.org/r/874k0zowh2.fsf@toke.dk [2]
Reported-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: f856373e2f ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Tested-by: syzbot <syzbot+eceab52db7c4b961e9d6@syzkaller.appspotmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@kernel.org>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/9cc9b81d-75a3-3925-b612-9d0ad3cab82b@I-love.SAKURA.ne.jp
2022-07-18 15:01:14 +03:00
Kuniyuki Iwashima
021266ec64 tcp: Fix data-races around sysctl_tcp_fastopen_blackhole_timeout.
While reading sysctl_tcp_fastopen_blackhole_timeout, it can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: cf1ef3f071 ("net/tcp_fastopen: Disable active side TFO in certain scenarios")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
5a54213318 tcp: Fix data-races around sysctl_tcp_fastopen.
While reading sysctl_tcp_fastopen, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 2100c8d2d9 ("net-tcp: Fast Open base")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
79539f3474 tcp: Fix data-races around sysctl_max_syn_backlog.
While reading sysctl_max_syn_backlog, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
cbfc649558 tcp: Fix a data-race around sysctl_tcp_tw_reuse.
While reading sysctl_tcp_tw_reuse, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its reader.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
39e24435a7 tcp: Fix data-races around some timeout sysctl knobs.
While reading these sysctl knobs, they can be changed concurrently.
Thus, we need to add READ_ONCE() to their readers.

  - tcp_retries1
  - tcp_retries2
  - tcp_orphan_retries
  - tcp_fin_timeout

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
46778cd16e tcp: Fix data-races around sysctl_tcp_reordering.
While reading sysctl_tcp_reordering, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
4177f54589 tcp: Fix data-races around sysctl_tcp_migrate_req.
While reading sysctl_tcp_migrate_req, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: f9ac779f88 ("net: Introduce net.ipv4.tcp_migrate_req.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
f2e383b5bb tcp: Fix data-races around sysctl_tcp_syncookies.
While reading sysctl_tcp_syncookies, it can be changed concurrently.
Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
20a3b1c0f6 tcp: Fix data-races around sysctl_tcp_syn(ack)?_retries.
While reading sysctl_tcp_syn(ack)?_retries, they can be changed
concurrently.  Thus, we need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00
Kuniyuki Iwashima
f2f316e287 tcp: Fix data-races around keepalive sysctl knobs.
While reading sysctl_tcp_keepalive_(time|probes|intvl), they can be changed
concurrently.  Thus, we need to add READ_ONCE() to their readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18 12:21:54 +01:00