Commit Graph

77463 Commits

Author SHA1 Message Date
Johannes Berg
9fd171a71b wifi: cfg80211: refactor regulatory beaconing checking
There are two functions exported now, with different settings,
refactor to just export a single function that take a struct
with different settings. This will make it easier to add more
parameters.

Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240523120945.d44c34dadfc2.I59b4403108e0dbf7fc6ae8f7522e1af520cffb1c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:25 +02:00
Johannes Berg
459662e83d wifi: cfg80211: refactor 6 GHz AP power type parsing
Add cfg80211_get_6ghz_power_type() to parse the 6 GHz
power type from a given set of elements, which is now
only inside cfg80211_6ghz_power_type_valid().

Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240523120945.84cdffd94085.I76f434ee12552e8be91273f3b2d776179eaa62f1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:25 +02:00
Johannes Berg
ca41bfe367 wifi: mac80211: refactor channel checks
To later introduce an override for VLP APs being allowed despite
NO-IR flags, which is somewhat similar in construction to being
allowed to monitor on disabled channels, refactor the code that
checks channel flags to have not a 'monitor' argument but a set
of 'permitting' flags that permit the operation without checking
for 'prohibited' flags.

Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240523120945.3da28ded4a50.I90cffc633d0510293d511f60097dc75e719b55f0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:25 +02:00
Johannes Berg
f3269b7912 wifi: nl80211: expose can-monitor channel property
It may be possible to monitor on disabled channels per the
can-monitor flag, but evidently I forgot to expose that out
to userspace. Fix that.

Fixes: a110a3b791 ("wifi: cfg80211: optionally support monitor on disabled channels")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240523120945.9a2c19a51e53.I50fa1b1a18b70f63a5095131ac23dc2e71f3d426@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:24 +02:00
Baochen Qiang
bcbd771cd5 wifi: cfg80211: handle 2x996 RU allocation in cfg80211_calculate_bitrate_he()
Currently NL80211_RATE_INFO_HE_RU_ALLOC_2x996 is not handled in
cfg80211_calculate_bitrate_he(), leading to below warning:

kernel: invalid HE MCS: bw:6, ru:6
kernel: WARNING: CPU: 0 PID: 2312 at net/wireless/util.c:1501 cfg80211_calculate_bitrate_he+0x22b/0x270 [cfg80211]

Fix it by handling 2x996 RU allocation in the same way as 160 MHz bandwidth.

Fixes: c4cbaf7973 ("cfg80211: Add support for HE")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Link: https://msgid.link/20240606020653.33205-3-quic_bqiang@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:24 +02:00
Baochen Qiang
9ee0d44f05 wifi: cfg80211: fix typo in cfg80211_calculate_bitrate_he()
rates_996 is mistakenly written as rates_969, fix it.

Fixes: c4cbaf7973 ("cfg80211: Add support for HE")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Link: https://msgid.link/20240606020653.33205-2-quic_bqiang@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:24 +02:00
Breno Leitao
c018411d35 wifi: mac80211: Remove generic .ndo_get_stats64
Commit 3e2f544dd8 ("net: get stats64 if device if driver is
configured") moved the callback to dev_get_tstats64() to net core, so,
unless the driver is doing some custom stats collection, it does not
need to set .ndo_get_stats64.

Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it
doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64
function pointer.

In this driver specifically, .ndo_get_stats64 basically points to
dev_fetch_sw_netstats(). Now it will point to dev_get_tstats64(), which
calls netdev_stats_to_stats64() and dev_fetch_sw_netstats().
netdev_stats_to_stats64() seems irrelevant for this driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://msgid.link/20240607102045.235071-2-leitao@debian.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:24 +02:00
Breno Leitao
7b7890f386 wifi: mac80211: Move stats allocation to core
With commit 34d21de99c ("net: Move {l,t,d}stats allocation to core and
convert veth & vrf"), stats allocation could be done on net core instead
of this driver.

With this new approach, the driver doesn't have to bother with error
handling (allocation failure checking, making sure free happens in the
right spot, etc). This is core responsibility now.

Move mac80211 driver to leverage the core allocation.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://msgid.link/20240607102045.235071-1-leitao@debian.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:24 +02:00
Johannes Berg
7f12e26a19 wifi: cfg80211: make hash table duplicates more survivable
Jiazi Li reported that they occasionally see hash table duplicates
as evidenced by the WARN_ON() in rb_insert_bss() in this code.  It
isn't clear how that happens, nor have I been able to reproduce it,
but if it does happen, the kernel crashes later, when it tries to
unhash the entry that's now not hashed.

Try to make this situation more survivable by removing the BSS from
the list(s) as well, that way it's fully leaked here (as had been
the intent in the hash insert error path), and no longer reachable
through the list(s) so it shouldn't be unhashed again later.

Link: https://lore.kernel.org/r/20231026013528.GA24122@Jiazi.Li
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://msgid.link/20240607181726.36835-2-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 13:04:24 +02:00
Johannes Berg
5f12dd57a0 wifi: mac80211: correct EHT EIRP TPE parsing
For the EHT EIRP transmit power envelope, the 320 MHz is in
the last octet, but if we've copied 4 octets (count == 3),
the next one is at index 4 not 5 (count + 2). Fix this, and
just hardcode the offset since count is always 3 here.

Fixes: 39dc8b8ea3 ("wifi: mac80211: pass parsed TPE data to drivers")
Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240612100533.f96c1e0fb758.I2f301c4341abb44dafd29128e7e32c66dc0e296d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-12 12:25:54 +02:00
Jakub Kicinski
b1156532bc bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmIsRAAKCRDbK58LschI
 g4SSAP0bkl6rPMn7zp1h+/l7hlvpp2aVOmasBTe8hIhAGUbluwD/TGq4sNsGgXFI
 i4tUtFRhw8pOjy2guy6526qyJvBs8wY=
 =WMhY
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2024-06-06

We've added 54 non-merge commits during the last 10 day(s) which contain
a total of 50 files changed, 1887 insertions(+), 527 deletions(-).

The main changes are:

1) Add a user space notification mechanism via epoll when a struct_ops
   object is getting detached/unregistered, from Kui-Feng Lee.

2) Big batch of BPF selftest refactoring for sockmap and BPF congctl
   tests, from Geliang Tang.

3) Add BTF field (type and string fields, right now) iterator support
   to libbpf instead of using existing callback-based approaches,
   from Andrii Nakryiko.

4) Extend BPF selftests for the latter with a new btf_field_iter
   selftest, from Alan Maguire.

5) Add new kfuncs for a generic, open-coded bits iterator,
   from Yafang Shao.

6) Fix BPF selftests' kallsyms_find() helper under kernels configured
   with CONFIG_LTO_CLANG_THIN, from Yonghong Song.

7) Remove a bunch of unused structs in BPF selftests,
   from David Alan Gilbert.

8) Convert test_sockmap section names into names understood by libbpf
   so it can deduce program type and attach type, from Jakub Sitnicki.

9) Extend libbpf with the ability to configure log verbosity
   via LIBBPF_LOG_LEVEL environment variable, from Mykyta Yatsenko.

10) Fix BPF selftests with regards to bpf_cookie and find_vma flakiness
    in nested VMs, from Song Liu.

11) Extend riscv32/64 JITs to introduce shift/add helpers to generate Zba
    optimization, from Xiao Wang.

12) Enable BPF programs to declare arrays and struct fields with kptr,
    bpf_rb_root, and bpf_list_head, from Kui-Feng Lee.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (54 commits)
  selftests/bpf: Drop useless arguments of do_test in bpf_tcp_ca
  selftests/bpf: Use start_test in test_dctcp in bpf_tcp_ca
  selftests/bpf: Use start_test in test_dctcp_fallback in bpf_tcp_ca
  selftests/bpf: Add start_test helper in bpf_tcp_ca
  selftests/bpf: Use connect_to_fd_opts in do_test in bpf_tcp_ca
  libbpf: Auto-attach struct_ops BPF maps in BPF skeleton
  selftests/bpf: Add btf_field_iter selftests
  selftests/bpf: Fix send_signal test with nested CONFIG_PARAVIRT
  libbpf: Remove callback-based type/string BTF field visitor helpers
  bpftool: Use BTF field iterator in btfgen
  libbpf: Make use of BTF field iterator in BTF handling code
  libbpf: Make use of BTF field iterator in BPF linker code
  libbpf: Add BTF field iterator
  selftests/bpf: Ignore .llvm.<hash> suffix in kallsyms_find()
  selftests/bpf: Fix bpf_cookie and find_vma in nested VM
  selftests/bpf: Test global bpf_list_head arrays.
  selftests/bpf: Test global bpf_rb_root arrays and fields in nested struct types.
  selftests/bpf: Test kptr arrays and kptrs in nested struct fields.
  bpf: limit the number of levels of a nested struct type.
  bpf: look into the types of the fields of a struct type recursively.
  ...
====================

Link: https://lore.kernel.org/r/20240606223146.23020-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 18:02:14 -07:00
Jakub Kicinski
93d4e8bb3f wireless-next patches for v6.11
The first "new features" pull request for v6.11 with changes both in
 stack and in drivers. Nothing out of ordinary, except that we have two
 conflicts this time:
 
 CONFLICT (content): Merge conflict in net/mac80211/cfg.c
 CONFLICT (content): Merge conflict in drivers/net/wireless/microchip/wilc1000/netdev.c
 
 Here are Stephen's resolutions for them:
 
 https://lore.kernel.org/all/20240531124415.05b25e7a@canb.auug.org.au/
 https://lore.kernel.org/all/20240603110023.23572803@canb.auug.org.au/
 
 Major changes:
 
 cfg80211/mac80211
 
 * parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers
 
 wilc1000
 
 * read MAC address during probe to make it visible to user space
 
 iwlwifi
 
 * bump FW API to 91 for BZ/SC devices
 
 * report 64-bit radiotap timestamp
 
 * Enable P2P low latency by default
 
 * handle Transmit Power Envelope (TPE) advertised by AP
 
 * start using guard()
 
 rtlwifi
 
 * RTL8192DU support
 
 ath12k
 
 * remove unsupported tx monitor handling
 
 * channel 2 in 6 GHz band support
 
 * Spatial Multiplexing Power Save (SMPS) in 6 GHz band support
 
 * multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) support
 
 * dynamic VLAN support
 
 * add panic handler for resetting the firmware state
 
 ath10k
 
 * add qcom,no-msa-ready-indicator Device Tree property
 
 * LED support for various chipsets
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmZi07URHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZu3/QgAo7jyKgvpwMaNIVRLyfYCo0r3Q9wb7QPd
 QeRNsftYxlWpPTJ4+Y95aZupy91Ay+RaEQXbbtl7PMIiyQrs7wb4V4Iqzedkws3t
 DZsR5BitH+1BIGY0Omo0fiSB5HlWEwZGUj6inqlgKHpBtdIVTANSMjuwkdoMAV5y
 ZU57axIGToySvDbRlhJQW833Nnh4KnaseA+TtyfXSaBVerzbshkjBr0d9pMBMiH9
 irMQW5CW+7fbxp3OCNsKxX4eG6MFGmm/uP1hFmeYQi2qzUE4SddHMeV4I6oNKOrH
 vFB+ZVmYvOjJUYsNhlCUe6Vy+EKwvmfiDWwE1egelEkgozCixJXAAQ==
 =QT4C
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.11

The first "new features" pull request for v6.11 with changes both in
stack and in drivers. Nothing out of ordinary, except that we have
two conflicts this time:

net/mac80211/cfg.c
  https://lore.kernel.org/all/20240531124415.05b25e7a@canb.auug.org.au

drivers/net/wireless/microchip/wilc1000/netdev.c
  https://lore.kernel.org/all/20240603110023.23572803@canb.auug.org.au

Major changes:

cfg80211/mac80211
 * parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers

wilc1000
 * read MAC address during probe to make it visible to user space

iwlwifi
 * bump FW API to 91 for BZ/SC devices
 * report 64-bit radiotap timestamp
 * enable P2P low latency by default
 * handle Transmit Power Envelope (TPE) advertised by AP
 * start using guard()

rtlwifi
 * RTL8192DU support

ath12k
 * remove unsupported tx monitor handling
 * channel 2 in 6 GHz band support
 * Spatial Multiplexing Power Save (SMPS) in 6 GHz band support
 * multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA)
   support
 * dynamic VLAN support
 * add panic handler for resetting the firmware state

ath10k
 * add qcom,no-msa-ready-indicator Device Tree property
 * LED support for various chipsets

* tag 'wireless-next-2024-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (194 commits)
  wifi: ath12k: add hw_link_id in ath12k_pdev
  wifi: ath12k: add panic handler
  wifi: rtw89: chan: Use swap() in rtw89_swap_sub_entity()
  wifi: brcm80211: remove unused structs
  wifi: brcm80211: use sizeof(*pointer) instead of sizeof(type)
  wifi: ath12k: do not process consecutive RDDM event
  dt-bindings: net: wireless: ath11k: Drop "qcom,ipq8074-wcss-pil" from example
  wifi: ath12k: fix memory leak in ath12k_dp_rx_peer_frag_setup()
  wifi: rtlwifi: handle return value of usb init TX/RX
  wifi: rtlwifi: Enable the new rtl8192du driver
  wifi: rtlwifi: Add rtl8192du/sw.c
  wifi: rtlwifi: Constify rtl_hal_cfg.{ops,usb_interface_cfg} and rtl_priv.cfg
  wifi: rtlwifi: Add rtl8192du/dm.{c,h}
  wifi: rtlwifi: Add rtl8192du/fw.{c,h} and rtl8192du/led.{c,h}
  wifi: rtlwifi: Add rtl8192du/rf.{c,h}
  wifi: rtlwifi: Add rtl8192du/trx.{c,h}
  wifi: rtlwifi: Add rtl8192du/phy.{c,h}
  wifi: rtlwifi: Add rtl8192du/hw.{c,h}
  wifi: rtlwifi: Add new members to struct rtl_priv for RTL8192DU
  wifi: rtlwifi: Add rtl8192du/table.{c,h}
  ...

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

====================

Link: https://lore.kernel.org/r/20240607093517.41394C2BBFC@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-10 17:40:26 -07:00
Marek Behún
eef8e906ae net: dsa: update the unicast MAC address when changing conduit
When changing DSA user interface conduit while the user interface is up,
DSA exhibits different behavior in comparison to when the interface is
down. This different behavior concerns the primary unicast MAC address
stored in the port standalone FDB and in the conduit device UC database.

If we put a switch port down while changing the conduit with
  ip link set sw0p0 down
  ip link set sw0p0 type dsa conduit conduit1
  ip link set sw0p0 up
we delete the address in dsa_user_close() and install the (possibly
different) address in dsa_user_open().

But when changing the conduit on the fly, the old address is not
deleted and the new one is not installed.

Since we explicitly want to support live-changing the conduit, uninstall
the old address before calling dsa_port_assign_conduit() and install the
(possibly different) new address after the call.

Because conduit change might also trigger address change (the user
interface is supposed to inherit the conduit interface MAC address if no
address is defined in hardware (dp->mac is a zero address)), move the
eth_hw_addr_inherit() call from dsa_user_change_conduit() to
dsa_port_change_conduit(), just before installing the new address.

Although this is in theory a flaw in DSA core, it needs not be
backported, since there is currently no DSA driver that can be affected
by this. The only DSA driver that supports changing conduit is felix,
and, as explained by Vladimir Oltean [1]:

  There are 2 reasons why with felix the bug does not manifest itself.

  First is because both the 'ocelot' and the alternate 'ocelot-8021q'
  tagging protocols have the 'promisc_on_conduit = true' flag. So the
  unicast address doesn't have to be in the conduit's RX filter -
  neither the old or the new conduit.

  Second, dsa_user_host_uc_install() theoretically leaves behind host
  FDB entries installed towards the wrong (old) CPU port. But in
  felix_fdb_add(), we treat any FDB entry requested towards any CPU port
  as if it was a multicast FDB entry programmed towards _all_ CPU ports.
  For that reason, it is installed towards the port mask of the PGID_CPU
  port group ID:

	if (dsa_port_is_cpu(dp))
		port = PGID_CPU;

Therefore no Fixes tag for this change.

[1] https://lore.kernel.org/netdev/20240507201827.47suw4fwcjrbungy@skbuf/
Signed-off-by: Marek Behún <kabel@kernel.org>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:48:06 +01:00
Marek Behún
77f7541248 net: dsa: deduplicate code adding / deleting the port address to fdb
The sequence
  if (dsa_switch_supports_uc_filtering(ds))
    dsa_port_standalone_host_fdb_add(dp, addr, 0);
  if (!ether_addr_equal(addr, conduit->dev_addr))
    dev_uc_add(conduit, addr);
is executed both in dsa_user_open() and dsa_user_set_mac_addr().

Its reverse is executed both in dsa_user_close() and
dsa_user_set_mac_addr().

Refactor these sequences into new functions dsa_user_host_uc_install()
and dsa_user_host_uc_uninstall().

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:48:06 +01:00
Jakub Kicinski
5fbf57a937 net: netlink: remove the cb_mutex "injection" from netlink core
Back in 2007, in commit af65bdfce9 ("[NETLINK]: Switch cb_lock spinlock
to mutex and allow to override it") netlink core was extended to allow
subsystems to replace the dump mutex lock with its own lock.

The mechanism was used by rtnetlink to take rtnl_lock but it isn't
sufficiently flexible for other users. Over the 17 years since
it was added no other user appeared. Since rtnetlink needs conditional
locking now, and doesn't use it either, axe this feature complete.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:15:40 +01:00
Jakub Kicinski
5380d64f8d rtnetlink: move rtnl_lock handling out of af_netlink
Now that we have an intermediate layer of code for handling
rtnl-level netlink dump quirks, we can move the rtnl_lock
taking there.

For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
avoid taking rtnl_lock just to generate NLM_DONE, once again.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 13:15:40 +01:00
Florian Westphal
f81d0dd2fd tcp: move inet_twsk_schedule helper out of header
Its no longer used outside inet_timewait_sock.c, so move it there.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 11:54:18 +01:00
Florian Westphal
c75ad7c759 net: tcp: un-pin the tw_timer
After previous patch, even if timer fires immediately on another CPU,
context that schedules the timer now holds the ehash spinlock, so timer
cannot reap tw socket until ehash lock is released.

BH disable is moved into hashdance_schedule.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 11:54:18 +01:00
Valentin Schneider
b334b924c9 net: tcp/dccp: prepare for tw_timer un-pinning
The TCP timewait timer is proving to be problematic for setups where
scheduler CPU isolation is achieved at runtime via cpusets (as opposed to
statically via isolcpus=domains).

What happens there is a CPU goes through tcp_time_wait(), arming the
time_wait timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer
fires, causing interference for the now-isolated CPU. This is conceptually
similar to the issue described in commit e02b931248 ("workqueue: Unbind
kworkers before sending them to exit()")

Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash
lock held. Expand the lock's critical section from inet_twsk_kill() to
inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of
the timer. IOW, this prevents the following race:

			     tcp_time_wait()
			       inet_twsk_hashdance()
  inet_twsk_deschedule_put()
    del_timer_sync()
			       inet_twsk_schedule()

Thanks to Paolo Abeni for suggesting to leverage the ehash lock.

This also restores a comment from commit ec94c2696f ("tcp/dccp: avoid
one atomic operation for timewait hashdance") as inet_twsk_hashdance() had
a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing.

inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize
with inet_twsk_hashdance_schedule().

To ease possible regression search, actual un-pin is done in next patch.

Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-10 11:54:18 +01:00
David Wei
3e61103b2f page_pool: remove WARN_ON() with OR
Having an OR in WARN_ON() makes me sad because it's impossible to tell
which condition is true when triggered.

Split a WARN_ON() with an OR in page_pool_disable_direct_recycling().

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-09 15:50:43 +01:00
Jakub Kicinski
62b5bf58b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts.

Adjacent changes:

drivers/net/ethernet/pensando/ionic/ionic_txrx.c
  d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
  491aee894a ("ionic: fix kernel panic in XDP_TX action")

net/ipv6/ip6_fib.c
  b4cb4a1391 ("net: use unrcu_pointer() helper")
  b01e1c0307 ("ipv6: fix possible race in __fib6_drop_pcpu_from()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06 12:06:56 -07:00
Eric Dumazet
6971d21672 tcp: move reqsk_alloc() to inet_connection_sock.c
reqsk_alloc() has a single caller, no need to expose it
in include/net/request_sock.h.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 15:18:04 +02:00
Eric Dumazet
adbe695a97 tcp: move inet_reqsk_alloc() close to inet_reqsk_clone()
inet_reqsk_alloc() does not belong to tcp_input.c,
move it to inet_connection_sock.c instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 15:18:04 +02:00
Davide Caratti
92f74c1e05 mptcp: refer to 'MPTCP' socket in comments
We used to call it 'master' socket at the early stages of MPTCP
development, but the correct wording is 'MPTCP' socket opposed to 'TCP
subflows': convert the last 3 comments to use a more appropriate term.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 15:13:47 +02:00
Geliang Tang
5cdedad62e mptcp: add mptcp_space_from_win helper
As a wrapper of __tcp_space_from_win(), this patch adds a MPTCP dedicated
space_from_win helper mptcp_space_from_win() in protocol.h to paired with
mptcp_win_from_space().

Use it instead of __tcp_space_from_win() in both mptcp_rcv_space_adjust()
and mptcp_set_rcvlowat().

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 15:13:47 +02:00
Geliang Tang
5f0d0649c8 mptcp: use mptcp_win_from_space helper
The MPTCP dedicated win_from_space helper mptcp_win_from_space() is defined
in protocol.h, use it in mptcp_rcv_space_adjust() instead of using the TCP
one. Here scaling_ratio is the same as msk->scaling_ratio.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 15:13:47 +02:00
Su Hui
0dcc53abf5 net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
Clang static checker (scan-build) warning:
net/ethtool/ioctl.c:line 2233, column 2
Called function pointer is null (null dereference).

Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix
this typo error.

Fixes: 201ed315f9 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:34:33 +02:00
Jason Xing
9b6a30febd net: allow rps/rfs related configs to be switched
After John Sperbeck reported a compile error if the CONFIG_RFS_ACCEL
is off, I found that I cannot easily enable/disable the config
because of lack of the prompt when using 'make menuconfig'. Therefore,
I decided to change rps/rfc related configs altogether.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
Link: https://lore.kernel.org/r/20240605022932.33703-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:18:48 +02:00
Eric Dumazet
b01e1c0307 ipv6: fix possible race in __fib6_drop_pcpu_from()
syzbot found a race in __fib6_drop_pcpu_from() [1]

If compiler reads more than once (*ppcpu_rt),
second read could read NULL, if another cpu clears
the value in rt6_get_pcpu_route().

Add a READ_ONCE() to prevent this race.

Also add rcu_read_lock()/rcu_read_unlock() because
we rely on RCU protection while dereferencing pcpu_rt.

[1]

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Workqueue: netns cleanup_net
 RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984
Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48
RSP: 0018:ffffc900040df070 EFLAGS: 00010206
RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16
RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091
RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007
R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8
R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline]
  fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline]
  fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038
  fib6_del_route net/ipv6/ip6_fib.c:1998 [inline]
  fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043
  fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205
  fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127
  fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175
  fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255
  __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271
  rt6_sync_down_dev net/ipv6/route.c:4906 [inline]
  rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911
  addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855
  addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778
  notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
  call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992
  call_netdevice_notifiers_extack net/core/dev.c:2030 [inline]
  call_netdevice_notifiers net/core/dev.c:2044 [inline]
  dev_close_many+0x333/0x6a0 net/core/dev.c:1585
  unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193
  unregister_netdevice_many net/core/dev.c:11276 [inline]
  default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759
  ops_exit_list+0x128/0x180 net/core/net_namespace.c:178
  cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640
  process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231
  process_scheduled_works kernel/workqueue.c:3312 [inline]
  worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393
  kthread+0x2c1/0x3a0 kernel/kthread.c:389
  ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Fixes: d52d3997f8 ("ipv6: Create percpu rt6_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 13:05:54 +02:00
Kuniyuki Iwashima
efaf24e30e af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().

Let's use READ_ONCE() to read sk->sk_shutdown.

Fixes: e4e541a848 ("sock-diag: Report shutdown for inet and unix sockets (v2)")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
5d915e584d af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
We can dump the socket queue length via UNIX_DIAG by specifying
UDIAG_SHOW_RQLEN.

If sk->sk_state is TCP_LISTEN, we return the recv queue length,
but here we do not hold recvq lock.

Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().

Fixes: c9da99e647 ("unix_diag: Fixup RQLEN extension report")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
83690b82d2 af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock()
checks the length of the peer socket's recvq under unix_state_lock().

However, unix_stream_read_generic() calls skb_unlink() after releasing
the lock.  Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks
skb without unix_state_lock().

Thues, unix_state_lock() does not protect qlen.

Let's use skb_queue_empty_lockless() in unix_release_sock().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
45d872f0e6 af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes advantage of this characteristics; it does not
hold the listener's unix_state_lock() and only acquires recvq lock
to pop one skb.

It means unix_state_lock() does not prevent the queue length from
changing in unix_stream_connect().

Thus, we need to use unix_recvq_full_lockless() to avoid data-race.

Now we remove unix_recvq_full() as no one uses it.

Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in
unix_recvq_full_lockless() because of the following reasons:

  (1) For SOCK_DGRAM, it is a written-once field in unix_create1()

  (2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the
      listener's unix_state_lock() in unix_listen(), and we hold
      the lock in unix_stream_connect()

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
bd9f2d0573 af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be
changed concurrently.

Let's use READ_ONCE() in unix_create1().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
b0632e53e0 af_unix: Annotate data-races around sk->sk_sndbuf.
sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's
not used in af_unix.c.

Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(),
unix_dgram_sendmsg(), and unix_stream_sendmsg().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
0aa3be7b3e af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read
locklessly.

Let's use READ_ONCE() there.

Note that the result could be inconsistent if the socket is dumped
during the state change.  This is common for other SOCK_DIAG and
similar interfaces.

Fixes: c9da99e647 ("unix_diag: Fixup RQLEN extension report")
Fixes: 2aac7a2cb0 ("unix_diag: Pending connections IDs NLA")
Fixes: 45a96b9be6 ("unix_diag: Dumping all sockets core")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:15 +02:00
Kuniyuki Iwashima
af4c733b6b af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
unix_stream_read_skb() is called from sk->sk_data_ready() context
where unix_state_lock() is not held.

Let's use READ_ONCE() there.

Fixes: 77462de14a ("af_unix: Add read_sock for stream socket types")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
8a34d4e8d9 af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
The following functions read sk->sk_state locklessly and proceed only if
the state is TCP_ESTABLISHED.

  * unix_stream_sendmsg
  * unix_stream_read_generic
  * unix_seqpacket_sendmsg
  * unix_seqpacket_recvmsg

Let's use READ_ONCE() there.

Fixes: a05d2ad1c1 ("af_unix: Only allow recv on connected seqpacket sockets.")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
1b536948e8 af_unix: Annotate data-race of sk->sk_state in unix_accept().
Once sk->sk_state is changed to TCP_LISTEN, it never changes.

unix_accept() takes the advantage and reads sk->sk_state without
holding unix_state_lock().

Let's use READ_ONCE() there.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
a9bf9c7dc6 af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
As small optimisation, unix_stream_connect() prefetches the client's
sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE.

Later, sk->sk_state is checked again under unix_state_lock().

Let's use READ_ONCE() for the first check and TCP_CLOSE directly for
the second check.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
eb0718fb3e af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and
calls unix_writable() which also reads sk->sk_state without holding
unix_state_lock().

Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass
it to unix_writable().

While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as
that state does not exist for AF_UNIX socket since the code was added.

Fixes: 1586a5877d ("af_unix: do not report POLLOUT on listeners")
Fixes: 3c73419c09 ("af_unix: fix 'poll for write'/ connected DGRAM sockets")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
3a0f38eb28 af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first
and returns -EINVAL if it's TCP_LISTEN.

Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of
bytes in recvq.

However, unix_inq_len() does not hold unix_state_lock(), and the
concurrent listen() might change the state after checking sk->sk_state.

If the race occurs, 0 is returned for the listener, instead of -EINVAL,
because the length of skb with embryo is 0.

We could hold unix_state_lock() in unix_inq_len(), but it's overkill
given the result is true for pre-listen() TCP_CLOSE state.

So, let's use READ_ONCE() for sk->sk_state in unix_inq_len().

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
942238f973 af_unix: Annodate data-races around sk->sk_state for writers.
sk->sk_state is changed under unix_state_lock(), but it's read locklessly
in many places.

This patch adds WRITE_ONCE() on the writer side.

We will add READ_ONCE() to the lockless readers in the following patches.

Fixes: 83301b5367 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Kuniyuki Iwashima
26bfb8b570 af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
When a SOCK_DGRAM socket connect()s to another socket, the both sockets'
sk->sk_state are changed to TCP_ESTABLISHED so that we can register them
to BPF SOCKMAP.

When the socket disconnects from the peer by connect(AF_UNSPEC), the state
is set back to TCP_CLOSE.

Then, the peer's state is also set to TCP_CLOSE, but the update is done
locklessly and unconditionally.

Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects
from B.

After the first two connect()s, all three sockets' sk->sk_state are
TCP_ESTABLISHED:

  $ ss -xa
  Netid State  Recv-Q Send-Q  Local Address:Port  Peer Address:PortProcess
  u_dgr ESTAB  0      0       @A 641              * 642
  u_dgr ESTAB  0      0       @B 642              * 643
  u_dgr ESTAB  0      0       @C 643              * 0

And after the disconnect, B's state is TCP_CLOSE even though it's still
connected to C and C's state is TCP_ESTABLISHED.

  $ ss -xa
  Netid State  Recv-Q Send-Q  Local Address:Port  Peer Address:PortProcess
  u_dgr UNCONN 0      0       @A 641              * 0
  u_dgr UNCONN 0      0       @B 642              * 643
  u_dgr ESTAB  0      0       @C 643              * 0

In this case, we cannot register B to SOCKMAP.

So, when a socket disconnects from the peer, we should not set TCP_CLOSE to
the peer if the peer is connected to yet another socket, and this must be
done under unix_state_lock().

Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless
readers.  These data-races will be fixed in the following patches.

Fixes: 83301b5367 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:57:14 +02:00
Eric Dumazet
98aa546af5 inet: remove (struct uncached_list)->quarantine
This list is used to tranfert dst that are handled by
rt_flush_dev() and rt6_uncached_list_flush_dev() out
of the per-cpu lists.

But quarantine list is not used later.

If we simply use list_del_init(&rt->dst.rt_uncached),
this also removes the dst from per-cpu list.

This patch also makes the future calls to rt_del_uncached_list()
and rt6_uncached_list_del() faster, because no spinlock
acquisition is needed anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240604165150.726382-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 12:33:25 +02:00
Eric Dumazet
b4cb4a1391 net: use unrcu_pointer() helper
Toke mentioned unrcu_pointer() existence, allowing
to remove some of the ugly casts we have when using
xchg() for rcu protected pointers.

Also make inet_rcv_compat const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240604111603.45871-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-06 11:52:52 +02:00
Jakub Kicinski
886bf9172d bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZmAYPgAKCRDbK58LschI
 g2XdAP9M8zYLRw4IG8DUFug7F+oqRPqgbs+Gvsf9YNl5/PSiTQEA6WKa/ObaG/W9
 vre9VxhMWKgcMfzqZyztNHAiDm8R+QI=
 =l7gV
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2024-06-05

We've added 8 non-merge commits during the last 6 day(s) which contain
a total of 9 files changed, 34 insertions(+), 35 deletions(-).

The main changes are:

1) Fix a potential use-after-free in bpf_link_free when the link uses
   dealloc_deferred to free the link object but later still tests for
   presence of link->ops->dealloc, from Cong Wang.

2) Fix BPF test infra to set the run context for rawtp test_run callback
   where syzbot reported a crash, from Jiri Olsa.

3) Fix bpf_session_cookie BTF_ID in the special_kfunc_set list to exclude
   it for the case of !CONFIG_FPROBE, also from Jiri Olsa.

4) Fix a Coverity static analysis report to not close() a link_fd of -1
   in the multi-uprobe feature detector, from Andrii Nakryiko.

5) Revert support for redirect to any xsk socket bound to the same umem
   as it can result in corrupted ring state which can lead to a crash when
   flushing rings. A different approach will be pursued for bpf-next to
   address it safely, from Magnus Karlsson.

6) Fix inet_csk_accept prototype in test_sk_storage_tracing.c which caused
   BPF CI failure after the last tree fast forwarding, from Andrii Nakryiko.

7) Fix a coccicheck warning in BPF devmap that iterator variable cannot
   be NULL, from Thorsten Blum.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  Revert "xsk: Document ability to redirect to any socket bound to the same umem"
  Revert "xsk: Support redirect to any socket bound to the same umem"
  bpf: Set run context for rawtp test_run callback
  bpf: Fix a potential use-after-free in bpf_link_free()
  bpf, devmap: Remove unnecessary if check in for loop
  libbpf: don't close(-1) in multi-uprobe feature detector
  bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list
  selftests/bpf: fix inet_csk_accept prototype in test_sk_storage_tracing.c
====================

Link: https://lore.kernel.org/r/20240605091525.22628-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 19:03:08 -07:00
Eric Dumazet
f921a58ae2 net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP
If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided,
taprio_parse_mqprio_opt() must validate it, or userspace
can inject arbitrary data to the kernel, the second time
taprio_change() is called.

First call (with valid attributes) sets dev->num_tc
to a non zero value.

Second call (with arbitrary mqprio attributes)
returns early from taprio_parse_mqprio_opt()
and bad things can happen.

Fixes: a3d43c0d56 ("taprio: Add support adding an admin schedule")
Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05 15:54:51 -07:00
Kevin Yang
f086edef71 tcp: add sysctl_tcp_rto_min_us
Adding a sysctl knob to allow user to specify a default
rto_min at socket init time, other than using the hard
coded 200ms default rto_min.

Note that the rto_min route option has the highest precedence
for configuring this setting, followed by the TCP_BPF_RTO_MIN
socket option, followed by the tcp_rto_min_us sysctl.

Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 13:42:54 +01:00
Kevin Yang
512bd0f9f9 tcp: derive delack_max with tcp_rto_min helper
Rto_min now has multiple sources, ordered by preprecedence high to
low: ip route option rto_min, icsk->icsk_rto_min.

When derive delack_max from rto_min, we should not only use ip
route option, but should use tcp_rto_min helper to get the correct
rto_min.

Signed-off-by: Kevin Yang <yyd@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-05 13:42:54 +01:00