ipv6_destopt_rcv() and ipv6_parse_hopopts() pulls these data
- Hop-by-Hop/Destination Options Header : 8
- Hdr Ext Len : skb_transport_header(skb)[1] << 3
and calls ip6_parse_tlv(), so it need not check if skb_headlen() is less
than skb_transport_offset(skb) + (skb_transport_header(skb)[1] << 3).
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need not reload hdr in ipv6_srh_rcv() unless we call
pskb_expand_head().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv6_rthdr_rcv() pulls these data
- Segment Routing Header : 8
- Hdr Ext Len : skb_transport_header(skb)[1] << 3
needed by ipv6_srh_rcv(), so pskb_pull() in ipv6_srh_rcv() never
fails and can be replaced with skb_pull().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ipv6_rpl_srh_rcv() checks if ipv6_hdr(skb)->daddr or ohdr->rpl_segaddr[i]
is the multicast address with ipv6_addr_type().
We have the same check for ipv6_hdr(skb)->daddr in ipv6_rthdr_rcv(), so we
need not recheck it in ipv6_rpl_srh_rcv().
Also, we should use ipv6_addr_is_multicast() for ohdr->rpl_segaddr[i]
instead of ipv6_addr_type().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As Eric Dumazet pointed out [0], ipv6_rthdr_rcv() pulls these data
- Segment Routing Header : 8
- Hdr Ext Len : skb_transport_header(skb)[1] << 3
needed by ipv6_rpl_srh_rcv(). We can remove pskb_may_pull() and
replace pskb_pull() with skb_pull() in ipv6_rpl_srh_rcv().
Link: https://lore.kernel.org/netdev/CANn89iLboLwLrHXeHJucAqBkEL_S0rJFog68t7wwwXO-aNf5Mg@mail.gmail.com/ [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use the new TLS handshake API to enable the SunRPC client code
to request a TLS handshake. This implements support for RFC 9289,
only on TCP sockets.
Upper layers such as NFS use RPC-with-TLS to protect in-transit
traffic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
kTLS sockets use CMSG to report decryption errors and the need
for session re-keying.
For RPC-with-TLS, an "application data" message contains a ULP
payload, and that is passed along to the RPC client. An "alert"
message triggers connection reset. Everything else is discarded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The RPC header parser doesn't recognize TLS handshake traffic, so it
will close the connection prematurely with an error. To avoid that,
shunt the transport's data_ready callback when there is a TLS
handshake in progress.
The XPRT_SOCK_IGNORE_RECV flag will be toggled by code added in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The new authentication flavor is used only to discover peer support
for RPC-over-TLS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Pass the upper layer's rpc_create_args to the rpc_clnt_new()
tracepoint so additional parts of the upper layer's request can be
recorded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Add an initial set of policies along with fields for upper layers to
pass the requested policy down to the transport layer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
NFS is primarily name-spaced using network namespaces. However it
contacts rpcbind (and gss_proxy) using AF_UNIX sockets which are
name-spaced using the mount namespaces. This requires a container using
NFSv3 (the form that requires rpcbind) to manage both network and mount
namespaces, which can seem an unnecessary burden.
As NFS is primarily a network service it makes sense to use network
namespaces as much as possible, and to prefer to communicate with an
rpcbind running in the same network namespace. This can be done, while
preserving the benefits of AF_UNIX sockets, by using an abstract socket
address.
An abstract address has a nul at the start of sun_path, and a length
that is exactly the complete size of the sockaddr_un up to the end of
the name, NOT including any trailing nul (which is not part of the
address).
Abstract addresses are local to a network namespace - regular AF_UNIX
path names a resolved in the mount namespace ignoring the network
namespace.
This patch causes rpcb to first try an abstract address before
continuing with regular AF_UNIX and then IP addresses. This ensures
backwards compatibility.
Choosing the name needs some care as the same address will be configured
for rpcbind, and needs to be built in to libtirpc for this enhancement
to be fully successful. There is no formal standard for choosing
abstract addresses. The defacto standard appears to be to use a path
name similar to what would be used for a filesystem AF_UNIX address -
but with a leading nul.
In that case
"\0/var/run/rpcbind.sock"
seems like the best choice. However at this time /var/run is deprecated
in favour of /run, so
"\0/run/rpcbind.sock"
might be better.
Though as we are deliberately moving away from using the filesystem it
might seem more sensible to explicitly break the connection and just
have
"\0rpcbind.socket"
using the same name as the systemd unit file..
This patch chooses the second option, which seems least likely to raise
objections.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
An "abtract" address for an AF_UNIX socket start with a nul and can
contain any bytes for the given length, but traditionally doesn't
contain other nuls. When reported, the leading nul is replaced by '@'.
sunrpc currently rejects connections to these addresses and reports them
as an empty string. To provide support for future use of these
addresses, allow them for outgoing connections and report them more
usefully.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
segmented, flow can be passed to __skb_udp_tunnel_segment() which
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.
This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():
perf probe --add '__dev_queue_xmit skb->inner_mac_header \
skb->inner_network_header skb->mac_header skb->network_header'
perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
skb->inner_network_header skb->mac_header skb->network_header'
These probes the headers and tunnel header length for packets which
traverse the IPVS encapsulation path. A TCP packet can be forced into
the segmentation path by being smaller than a calculated clamped MSS,
but larger than the advertised MSS.
probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.
In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.
This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.
Fixes: 84c0d5e96f ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock <terin@cloudflare.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This allows to do more centralized decisions later on, and generally
makes it very explicit which maps are privileged and which are not
(e.g., LRU_HASH and LRU_PERCPU_HASH, which are privileged HASH variants,
as opposed to unprivileged HASH and HASH_PERCPU; now this is explicit
and easy to verify).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230613223533.3689589-4-andrii@kernel.org
An AP reporting colocated APs may send more than one reduced neighbor
report element. As such, iterate all elements instead of only parsing
the first one when looking for colocated APs.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.ffe2c014f478.I372a4f96c88f7ea28ac39e94e0abfc465b5330d4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The error handling code would break out of the loop incorrectly,
causing the rest of the message to be misinterpreted. Fix this by
also jumping out of the surrounding while loop, which will trigger
the error detection code.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.0ffac98475cf.I6f5c08a09f5c9fced01497b95a9841ffd1b039f8@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There were crashes reported in this code, and the timer_shutdown()
warning in one of the previous patches indicates that the timeout
timer for the AP response (addba_resp_timer) is still armed while
we're stopping the aggregation session.
After a very long deliberation of the code, so far the only way I
could find that might cause this would be the following sequence:
- session start requested
- session start indicated to driver, but driver returns
IEEE80211_AMPDU_TX_START_DELAY_ADDBA
- session stop requested, sets HT_AGG_STATE_WANT_STOP
- session stop worker runs ___ieee80211_stop_tx_ba_session(),
sets HT_AGG_STATE_STOPPING
From here on, the order doesn't matter exactly, but:
1. driver calls ieee80211_start_tx_ba_cb_irqsafe(),
setting HT_AGG_STATE_START_CB
2. driver calls ieee80211_stop_tx_ba_cb_irqsafe(),
setting HT_AGG_STATE_STOP_CB
3. the worker will run ieee80211_start_tx_ba_cb() for
HT_AGG_STATE_START_CB
4. the worker will run ieee80211_stop_tx_ba_cb() for
HT_AGG_STATE_STOP_CB
(the order could also be 1./3./2./4.)
This will cause ieee80211_start_tx_ba_cb() to send out the AddBA
request frame to the AP and arm the timer, but we're already in
the middle of stopping and so the ieee80211_stop_tx_ba_cb() will
no longer assume it needs to stop anything.
Prevent this by checking for WANT_STOP/STOPPING in the start CB,
and warn if we're sending a frame on a stopping session.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.e5b52777462a.I0b2ed6658e81804279f5d7c9c1918cb1f6626bf2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is all true today, but difficult to understand since
the callers are in other files etc. Add two new lockdep
assertions to make things easier to read.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.7f03dec6a90b.I762c11e95da005b80fa0184cb1173b99ec362acf@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are cases where keeping sdata locked for an operation. Add a
variant that does not take sdata lock to permit these usecases.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are cases where keeping sdata locked for an operation. Add a
variant that does not take sdata lock to permit these usecases.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
STA MLD setup links may get removed if AP MLD remove the corresponding
affiliated APs with Multi-Link reconfiguration as described in
P802.11be_D3.0, section 35.3.6.2.2 Removing affiliated APs. Currently,
there is no support to notify such operation to cfg80211 and userspace.
Add support for the drivers to indicate STA MLD setup links removal to
cfg80211 and notify the same to userspace. Upon receiving such
indication from the driver, clear the MLO links information of the
removed links in the WDEV.
Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com>
Link: https://lore.kernel.org/r/20230317142153.237900-1-quic_vjakkam@quicinc.com
[rename function and attribute, fix kernel-doc]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If a link is disabled on 6GHz, we should not send a probe request on the
channel to resolve it. Simply skip such RNR entries so that the link is
ignored.
Userspace can still see the link in the RNR and may generate an ML probe
request in order to associate to the (currently) disabled link.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.4f7384006471.Iff8f1081e76a298bd25f9468abb3a586372cddaa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The basic multi-link element within an multi-link probe response will
contain full information about BSSes that are part of an MLD AP. This
BSS information may be used to associate with a link of an MLD AP
without having received a beacon from the BSS itself.
This patch adds parsing of the data and adding/updating the BSS using
the received elements. Doing this means that userspace can discover the
BSSes using an ML probe request and request association on these links.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.29593bd0ae1f.Ic9a67b8f022360aa202b870a932897a389171b14@changeid
[swap loop conditions smatch complained about]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make the data access a bit nicer overall by using structs. There is a
small change here to also accept a TBTT information length of eight
bytes as we do not require the 20 MHz PSD information.
This also fixes a bug reading the short SSID on big endian machines.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214436.4c3f8901c1bc.Ic3e94fd6e1bccff7948a252ad3bb87e322690a17@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The helper functions to retrieve the EML capabilities and medium
synchronization delay both assume that the type is correct. Instead of
assuming the length is correct and still checking the type, add a new
helper to check both and don't do any verification.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214435.1b50e7a3b3cf.I9385514d8eb6d6d3c82479a6fa732ef65313e554@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Include the Multi-Link elements found in beacon frames
in the CRC calculation, as these elements are intended
to reflect changes in the AP MLD state.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230618214435.ae8246b93d85.Ia64b45198de90ff7f70abcc997841157f148ea40@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since regulatory disconnect was added, OCB and NAN interface
types were added, which made it completely unusable for any
driver that allowed OCB/NAN. Add OCB/NAN (though NAN doesn't
do anything, we don't have any info) and also remove all the
logic that opts out, so it won't be broken again if/when new
interface types are added.
Fixes: 6e0bd6c35b ("cfg80211: 802.11p OCB mode handling")
Fixes: cb3b7d8765 ("cfg80211: add start / stop NAN commands")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230616222844.2794d1625a26.I8e78a3789a29e6149447b3139df724a6f1b46fc3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The multi-link loop here broke disconnect when multi-link
operation (MLO) isn't active for a given interface, since
in that case valid_links is 0 (indicating no links, i.e.
no MLO.)
Fix this by taking that into account properly and skipping
the link only if there are valid_links in the first place.
Cc: stable@vger.kernel.org
Fixes: 7b0a0e3c3a ("wifi: cfg80211: do some rework towards MLO link APIs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/r/20230616222844.eb073d650c75.I72739923ef80919889ea9b50de9e4ba4baa836ae@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As a preparation to support Reconfiguration Multi Link
element, rename 'multi_link' and 'multi_link_len' fields
in 'struct ieee802_11_elems' to 'ml_basic' and 'ml_basic_len'.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.b11370d3066a.I34280ae3728597056a6a2f313063962206c0d581@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The removed code ran for any BSS that was not included in the MBSSID
element in order to update it. However, instead of using the correct
inheritance rules, it would simply copy the elements from the
transmitting AP. The result is that we would report incorrect elements
in this case.
After some discussions, it seems that there are likely not even APs
actually using this feature. Either way, removing the code decreases
complexity and makes the cfg80211 behaviour more correct.
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.cfd6d8db1f26.Ia1044902b86cd7d366400a4bfb93691b8f05d68c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The cfg80211_gen_new_ie function merges the IEs using inheritance rules.
Rewrite this function to fix issues around inheritance rules. In
particular, vendor elements do not require any special handling, as they
are either all inherited or overridden by the subprofile.
Also, add fragmentation handling as this may be needed in some cases.
This also changes the function to not require making a copy. The new
version could be optimized a bit by explicitly tracking which IEs have
been handled already rather than looking that up again every time.
Note that a small behavioural change is the removal of the SSID special
handling. This should be fine for the MBSSID element, as the SSID must
be included in the subelement.
Fixes: 0b8fb8235b ("cfg80211: Parsing of Multiple BSSID information in scanning")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.bc6152e146db.I2b5f3bc45085e1901e5b5192a674436adaf94748@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The TBTT information field type must be zero. This is only changed in
the 802.11be draft specification where the value 1 is used to indicate
that only the MLD parameters are included.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.7865606ffe94.I7ff28afb875d1b4c39acd497df8490a7d3628e3f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Doing this simplifies the code somewhat, as iteration over the
nontransmitted BSSs is not required anymore. Also, mac80211 should
not be iterating over the nontrans_list as it should only be accessed
while the bss_lock is held.
It also simplifies parsing of the IEs somewhat, as cfg80211 already
extracts the IEs and passes them to the callback.
Note that the only user left requiring parsing a specific BSS is the
association code if a beacon is required by the hardware.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.39ebfe2f9e59.Ia012b08e0feed8ec431b666888b459f6366f7bd1@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This new function is called from within the inform_bss(_frame)_data
functions in order for the driver to update data that it is tracking.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094949.8d7781b0f965.I80041183072b75c081996a1a5a230b34aff5c668@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It is reasonable to hold bss_lock for a little bit longer after
cfg80211_bss_update is done. Right now, this does not make any big
difference, but doing so in preparation for the next patch which adds
a call to the driver.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.61701884ff0d.I3358228209eb6766202aff04d1bae0b8fdff611f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
These calls do not require any locking, so move them in preparation for
the next patches.
A minor change/bugfix is to not hint a beacon for nontransmitted BSSes
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.a5bf3558eae9.I33c7465d983c8bef19deb7a533ee475a16f91774@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add NULL check for compat variable to avoid crash in
cfg80211_chandef_compatible() if it got called with
some mixed up channel context where not all the users
compatible with each other, which shouldn't happen.
Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.ae0f10dfd36b.Iea98c74aeb87bf6ef49f6d0c8687bba0dbea2abd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In both of these cases (config_link, prep_channel) it is not needed
to parse the MBSSID data for a nontransmitted BSS. In the config_link
case the frame does not contain any MBSSID element and inheritance
rules are only needed for the ML STA profile. While in the
prep_channel case the IEs have already been processed by cfg80211 and
are already exploded.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.66d2605ff0ad.I7cdd1d390e7b0735c46204231a9e636d45b7f1e4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If the device is associated with an AP MLD, then TDLS data frames
should have
- A1 = peer address,
- A2 = own MLD address (since the peer may now know about MLO), and
- A3 = BSSID.
Change the code to do that.
Signed-off-by: Abhishek Naik <abhishek.naik@intel.com>
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.4bf648b63dfd.I98ef1dabd14b74a92120750f7746a7a512011701@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Userspace can now select the link to use for TDLS management
frames (indicating e.g. which BSSID should be used), use the
link_id received from cfg80211 to build the frames.
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.ce1fc230b505.Ie773c5679805001f5a52680d68d9ce0232c57648@changeid
[Benjamin fixed some locking]
Co-developed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
[fix sta mutex locking too]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For multi-link operation(MLO) TDLS management
frames need to be transmitted on a specific link.
The TDLS setup request will add BSSID along with
peer address and userspace will pass the link-id
based on BSSID value to the driver(or mac80211).
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230616094948.cb3d87c22812.Ia3d15ac4a9a182145bf2d418bcb3ddf4539cd0a7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-Wstringop-overflow is legitimately warning us about extra_size
pontentially being zero at some point, hence potenially ending
up _allocating_ zero bytes of memory for extra pointer and then
trying to access such object in a call to copy_from_user().
Fix this by adding a sanity check to ensure we never end up
trying to allocate zero bytes of data for extra pointer, before
continue executing the rest of the code in the function.
Address the following -Wstringop-overflow warning seen when built
m68k architecture with allyesconfig configuration:
from net/wireless/wext-core.c:11:
In function '_copy_from_user',
inlined from 'copy_from_user' at include/linux/uaccess.h:183:7,
inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:825:7:
arch/m68k/include/asm/string.h:48:25: warning: '__builtin_memset' writing 1 or more bytes into a region of size 0 overflows the destination [-Wstringop-overflow=]
48 | #define memset(d, c, n) __builtin_memset(d, c, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/uaccess.h:153:17: note: in expansion of macro 'memset'
153 | memset(to + (n - res), 0, res);
| ^~~~~~
In function 'kmalloc',
inlined from 'kzalloc' at include/linux/slab.h:694:9,
inlined from 'ioctl_standard_iw_point' at net/wireless/wext-core.c:819:10:
include/linux/slab.h:577:16: note: at offset 1 into destination object of size 0 allocated by '__kmalloc'
577 | return __kmalloc(size, flags);
| ^~~~~~~~~~~~~~~~~~~~~~
This help with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/315
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZItSlzvIpjdjNfd8@work
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In mesh mode, ieee80211_chandef_he_6ghz_oper() is called by
mesh_matches_local() for every received mesh beacon.
On a 6 GHz mesh of a HE-only phy, this spams that the hardware does not
have EHT capabilities, even if the received mesh beacon does not have an
EHT element.
Unlike HE, not supporting EHT in the 6 GHz band is not an error so do
not print anything in this case.
Fixes: 5dca295dd7 ("mac80211: Add initial support for EHT and 320 MHz channels")
Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230614132648.28995-1-nicolas.cavallari@green-communications.fr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are some locking changes that will later otherwise
cause conflicts, so merge wireless into wireless-next to
avoid those.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The double ifdefs (one for the variable declaration and
one around the code) are quite aesthetically displeasing.
Factor this code out into a helper for easier wrapping.
This will become even more ugly when another skb ext
comparison is added in the future.
The resulting machine code looks the same, the compiler
seems to try to use %rax more and some blocks more around
but I haven't spotted minor differences.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7d81ee8722 ("svcrdma: Single-stage RDMA Read") changed the
behavior of svc_rdma_recvfrom() but neglected to update the
documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Per-VMA locking allows us to lock a struct vm_area_struct without
taking the process-wide mmap lock in read mode.
Consider a process workload where the mmap lock is taken constantly in
write mode. In this scenario, all zerocopy receives are periodically
blocked during that period of time - though in principle, the memory
ranges being used by TCP are not touched by the operations that need
the mmap write lock. This results in performance degradation.
Now consider another workload where the mmap lock is never taken in
write mode, but there are many TCP connections using receive zerocopy
that are concurrently receiving. These connections all take the mmap
lock in read mode, but this does induce a lot of contention and atomic
ops for this process-wide lock. This results in additional CPU
overhead caused by contending on the cache line for this lock.
However, with per-vma locking, both of these problems can be avoided.
As a test, I ran an RPC-style request/response workload with 4KB
payloads and receive zerocopy enabled, with 100 simultaneous TCP
connections. I measured perf cycles within the
find_tcp_vma/mmap_read_lock/mmap_read_unlock codepath, with and
without per-vma locking enabled.
When using process-wide mmap semaphore read locking, about 1% of
measured perf cycles were within this path. With per-VMA locking, this
value dropped to about 0.45%.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove a couple of dprintk call sites that are of little value.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The preceding block comment before svc_register_xprt_class() is
not related to that function.
While we're here, add proper documenting comments for these two
publicly-visible functions.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This event brackets the svcrdma_post_* trace points. If this trace
event is enabled but does not appear as expected, that indicates a
chunk_ctxt leak.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Try to catch incorrect calling contexts mechanically rather than by
code review.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Micro-optimization: Call ktime_get() only when ->xpo_recvfrom() has
given us a full RPC message to process. rq_stime isn't used
otherwise, so this avoids pointless work.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that we have bulk page allocation and release APIs, it's more
efficient to use those than it is for nfsd threads to wait for send
completions. Previous patches have eliminated the calls to
wait_for_completion() and complete(), in order to avoid scheduler
overhead.
Now release pages-under-I/O in the send completion handler using
the efficient bulk release API.
I've measured a 7% reduction in cumulative CPU utilization in
svc_rdma_sendto(), svc_rdma_wc_send(), and svc_xprt_release(). In
particular, using release_pages() instead of complete() cuts the
time per svc_rdma_wc_send() call by two-thirds. This helps improve
scalability because svc_rdma_wc_send() is single-threaded per
connection.
Reviewed-by: Tom Talpey <tom@talpey.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
I noticed that svc_rqst_release_pages() was still unnecessarily
releasing a page when svc_rdma_recvfrom() returns zero.
Fixes: a53d5cb064 ("svcrdma: Avoid releasing a page in svc_xprt_release()")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Under certain circumstances, the tcp receive buffer memory limit
set by autotuning (sk_rcvbuf) is increased due to incoming data
packets as a result of the window not closing when it should be.
This can result in the receive buffer growing all the way up to
tcp_rmem[2], even for tcp sessions with a low BDP.
To reproduce: Connect a TCP session with the receiver doing
nothing and the sender sending small packets (an infinite loop
of socket send() with 4 bytes of payload with a sleep of 1 ms
in between each send()). This will cause the tcp receive buffer
to grow all the way up to tcp_rmem[2].
As a result, a host can have individual tcp sessions with receive
buffers of size tcp_rmem[2], and the host itself can reach tcp_mem
limits, causing the host to go into tcp memory pressure mode.
The fundamental issue is the relationship between the granularity
of the window scaling factor and the number of byte ACKed back
to the sender. This problem has previously been identified in
RFC 7323, appendix F [1].
The Linux kernel currently adheres to never shrinking the window.
In addition to the overallocation of memory mentioned above, the
current behavior is functionally incorrect, because once tcp_rmem[2]
is reached when no remediations remain (i.e. tcp collapse fails to
free up any more memory and there are no packets to prune from the
out-of-order queue), the receiver will drop in-window packets
resulting in retransmissions and an eventual timeout of the tcp
session. A receive buffer full condition should instead result
in a zero window and an indefinite wait.
In practice, this problem is largely hidden for most flows. It
is not applicable to mice flows. Elephant flows can send data
fast enough to "overrun" the sk_rcvbuf limit (in a single ACK),
triggering a zero window.
But this problem does show up for other types of flows. Examples
are websockets and other type of flows that send small amounts of
data spaced apart slightly in time. In these cases, we directly
encounter the problem described in [1].
RFC 7323, section 2.4 [2], says there are instances when a retracted
window can be offered, and that TCP implementations MUST ensure
that they handle a shrinking window, as specified in RFC 1122,
section 4.2.2.16 [3]. All prior RFCs on the topic of tcp window
management have made clear that sender must accept a shrunk window
from the receiver, including RFC 793 [4] and RFC 1323 [5].
This patch implements the functionality to shrink the tcp window
when necessary to keep the right edge within the memory limit by
autotuning (sk_rcvbuf). This new functionality is enabled with
the new sysctl: net.ipv4.tcp_shrink_window
Additional information can be found at:
https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
[1] https://www.rfc-editor.org/rfc/rfc7323#appendix-F
[2] https://www.rfc-editor.org/rfc/rfc7323#section-2.4
[3] https://www.rfc-editor.org/rfc/rfc1122#page-91
[4] https://www.rfc-editor.org/rfc/rfc793
[5] https://www.rfc-editor.org/rfc/rfc1323
Signed-off-by: Mike Freemon <mfreemon@cloudflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current mctp_newroute() contains two exactly same check against
rtm->rtm_type
static int mctp_newroute(...)
{
...
if (rtm->rtm_type != RTN_UNICAST) { // (1)
NL_SET_ERR_MSG(extack, "rtm_type must be RTN_UNICAST");
return -EINVAL;
}
...
if (rtm->rtm_type != RTN_UNICAST) // (2)
return -EINVAL;
...
}
This commits removes the (2) check as it is redundant.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20230615152240.1749428-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
kcm_write_msgs() calls unreserve_psock() to release its hold on the
underlying TCP socket if it has run out of things to transmit, but if we
have nothing in the write queue on entry (e.g. because someone did a
zero-length sendmsg), we don't actually go into the transmission loop and
as a consequence don't call reserve_psock().
Fix this by skipping the call to unreserve_psock() if we didn't reserve a
psock.
Fixes: c31a25e1db ("kcm: Send multiple frags in one sendmsg()")
Reported-by: syzbot+dd1339599f1840e4cc65@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000a61ffe05fe0c3d08@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+dd1339599f1840e4cc65@syzkaller.appspotmail.com
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/20787.1686828722@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since the return values
from the helper macros are ignored by the callers.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230613003326.3538391-1-azeemshaikh38@gmail.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Splicing to SOCK_RAW sockets may set MSG_SPLICE_PAGES, but in such a case,
__ip_append_data() will call skb_splice_from_iter() to access the 'from'
data, assuming it to point to a msghdr struct with an iter, instead of
using the provided getfrag function to access it.
In the case of raw_sendmsg(), however, this is not the case and 'from' will
point to a raw_frag_vec struct and raw_getfrag() will be the frag-getting
function. A similar issue may occur with rawv6_sendmsg().
Fix this by ignoring MSG_SPLICE_PAGES if getfrag != ip_generic_getfrag as
ip_generic_getfrag() expects "from" to be a msghdr*, but the other getfrags
don't. Note that this will prevent MSG_SPLICE_PAGES from being effective
for udplite.
This likely affects ping sockets too. udplite looks like it should be okay
as it expects "from" to be a msghdr.
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000ae4cbf05fdeb8349@google.com/
Fixes: 2dc334f1a6 ("splice, net: Use sendmsg(MSG_SPLICE_PAGES) rather than ->sendpage()")
Tested-by: syzbot+d8486855ef44506fd675@syzkaller.appspotmail.com
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1410156.1686729856@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With offloading enabled, esp_xmit() gets invoked very late, from within
validate_xmit_xfrm() which is after validate_xmit_skb() validates and
linearizes the skb if the underlying device does not support fragments.
esp_output_tail() may add a fragment to the skb while adding the auth
tag/ IV. Devices without the proper support will then send skb->data
points to with the correct length so the packet will have garbage at the
end. A pcap sniffer will claim that the proper data has been sent since
it parses the skb properly.
It is not affected with INET_ESP_OFFLOAD disabled.
Linearize the skb after offloading if the sending hardware requires it.
It was tested on v4, v6 has been adopted.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In some cases it is possible for kernel to come with request
to change primary MAC address to the address that is already
set on the given interface.
Add proper check to return fast from the function in these cases.
An example of such case is adding an interface to bonding
channel in balance-alb mode:
modprobe bonding mode=balance-alb miimon=100 max_bonds=1
ip link set bond0 up
ifenslave bond0 <eth>
Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Most of the ioctls to net protocols operates directly on userspace
argument (arg). Usually doing get_user()/put_user() directly in the
ioctl callback. This is not flexible, because it is hard to reuse these
functions without passing userspace buffers.
Change the "struct proto" ioctls to avoid touching userspace memory and
operate on kernel buffers, i.e., all protocol's ioctl callbacks is
adapted to operate on a kernel memory other than on userspace (so, no
more {put,get}_user() and friends being called in the ioctl callback).
This changes the "struct proto" ioctl format in the following way:
int (*ioctl)(struct sock *sk, int cmd,
- unsigned long arg);
+ int *karg);
(Important to say that this patch does not touch the "struct proto_ops"
protocols)
So, the "karg" argument, which is passed to the ioctl callback, is a
pointer allocated to kernel space memory (inside a function wrapper).
This buffer (karg) may contain input argument (copied from userspace in
a prep function) and it might return a value/buffer, which is copied
back to userspace if necessary. There is not one-size-fits-all format
(that is I am using 'may' above), but basically, there are three type of
ioctls:
1) Do not read from userspace, returns a result to userspace
2) Read an input parameter from userspace, and does not return anything
to userspace
3) Read an input from userspace, and return a buffer to userspace.
The default case (1) (where no input parameter is given, and an "int" is
returned to userspace) encompasses more than 90% of the cases, but there
are two other exceptions. Here is a list of exceptions:
* Protocol RAW:
* cmd = SIOCGETVIFCNT:
* input and output = struct sioc_vif_req
* cmd = SIOCGETSGCNT
* input and output = struct sioc_sg_req
* Explanation: for the SIOCGETVIFCNT case, userspace passes the input
argument, which is struct sioc_vif_req. Then the callback populates
the struct, which is copied back to userspace.
* Protocol RAW6:
* cmd = SIOCGETMIFCNT_IN6
* input and output = struct sioc_mif_req6
* cmd = SIOCGETSGCNT_IN6
* input and output = struct sioc_sg_req6
* Protocol PHONET:
* cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
* input int (4 bytes)
* Nothing is copied back to userspace.
For the exception cases, functions sock_sk_ioctl_inout() will
copy the userspace input, and copy it back to kernel space.
The wrapper that prepare the buffer and put the buffer back to user is
sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now
calls sk_ioctl(), which will handle all cases.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
DCCP was marked as Orphan in the MAINTAINERS entry 2 years ago in commit
054c4610bd ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"). It says
we haven't heard from the maintainer for five years, so DCCP is not well
maintained for 7 years now.
Recently DCCP only receives updates for bugs, and major distros disable it
by default.
Removing DCCP would allow for better organisation of TCP fields to reduce
the number of cache lines hit in the fast path.
Let's add a deprecation notice when DCCP socket is created and schedule its
removal to 2025.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Recently syzkaller reported a 7-year-old null-ptr-deref [0] that occurs
when a UDP-Lite socket tries to allocate a buffer under memory pressure.
Someone should have stumbled on the bug much earlier if UDP-Lite had been
used in a real app. Also, we do not always need a large UDP-Lite workload
to hit the bug since UDP and UDP-Lite share the same memory accounting
limit.
Removing UDP-Lite would simplify UDP code removing a bunch of conditionals
in fast path.
Let's add a deprecation notice when UDP-Lite socket is created and schedule
its removal to 2025.
Link: https://lore.kernel.org/netdev/20230523163305.66466-1-kuniyu@amazon.com/ [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to nla_parse_nested_deprecated(), the tb[] is supposed to the
destination array with maxtype+1 elements. In current
tipc_nl_media_get() and __tipc_nl_media_set(), a larger array is used
which is unnecessary. This patch resize them to a proper size.
Fixes: 1e55417d8f ("tipc: add media set to new netlink api")
Fixes: 46f15c6794 ("tipc: add media get/dump to new netlink api")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230614120604.1196377-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
All callers of tls_is_sk_tx_device_offloaded() currently do
an equivalent of:
if (skb->sk && tls_is_skb_tx_device_offloaded(skb->sk))
Have the helper accept skb and do the skb->sk check locally.
Two drivers have local static inlines with similar wrappers
already.
While at it change the ifdef condition to TLS_DEVICE.
Only TLS_DEVICE selects SOCK_VALIDATE_XMIT, so the two are
equivalent. This makes removing the duplicated IS_ENABLED()
check in funeth more obviously correct.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5fa5ae6058 ("netpoll: add net device refcount tracker to struct netpoll")
was part of one of the initial netdev tracker introduction patches.
It added an explicit netdev_tracker_alloc() for netpoll, presumably
because the flow of the function is somewhat odd.
After most of the core networking stack was converted to use
the tracking hold() variants, netpoll's call to old dev_hold()
stands out a bit.
np is allocated by the caller and ready to use, we can use
netdev_hold() here, even tho np->ndev will only be set to
ndev inside __netpoll_setup().
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
New users of dev_get_by_index() and dev_get_by_name() keep
getting added and it would be nice to steer them towards
the APIs with reference tracking.
Add variants of those calls which allocate the reference
tracker and use them in a couple of places.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mingshuai Ren reports:
When a new chain is added by using tc, one soft lockup alarm will be
generated after delete the prio 0 filter of the chain. To reproduce
the problem, perform the following steps:
(1) tc qdisc add dev eth0 root handle 1: htb default 1
(2) tc chain add dev eth0
(3) tc filter del dev eth0 chain 0 parent 1: prio 0
(4) tc filter add dev eth0 chain 0 parent 1:
Fix the issue by accounting for additional reference to chains that are
explicitly created by RTM_NEWCHAIN message as opposed to implicitly by
RTM_NEWTFILTER message.
Fixes: 726d061286 ("net: sched: prevent insertion of new classifiers during chain flush")
Reported-by: Mingshuai Ren <renmingshuai@huawei.com>
Closes: https://lore.kernel.org/lkml/87legswvi3.fsf@nvidia.com/T/
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20230612093426.2867183-1-vladbu@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch moves validate_linkmsg() out of do_setlink() to its callers
and deletes the early validate_linkmsg() call in __rtnl_newlink(), so
that it will not call validate_linkmsg() twice in either of the paths:
- __rtnl_newlink() -> do_setlink()
- __rtnl_newlink() -> rtnl_newlink_create() -> rtnl_create_link()
Additionally, as validate_linkmsg() is now only called with a real
dev, we can remove the NULL check for dev in validate_linkmsg().
Note that we moved validate_linkmsg() check to the places where it has
not done any changes to the dev, as Jakub suggested.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/cf2ef061e08251faf9e8be25ff0d61150c030475.1686585334.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A reference underflow is found in TLS handshake subsystem that causes a
direct use-after-free. Part of the crash log is like below:
[ 2.022114] ------------[ cut here ]------------
[ 2.022193] refcount_t: underflow; use-after-free.
[ 2.022288] WARNING: CPU: 0 PID: 60 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
[ 2.022432] Modules linked in:
[ 2.022848] RIP: 0010:refcount_warn_saturate+0xbe/0x110
[ 2.023231] RSP: 0018:ffffc900001bfe18 EFLAGS: 00000286
[ 2.023325] RAX: 0000000000000000 RBX: 0000000000000007 RCX: 00000000ffffdfff
[ 2.023438] RDX: 0000000000000000 RSI: 00000000ffffffea RDI: 0000000000000001
[ 2.023555] RBP: ffff888004c20098 R08: ffffffff82b392c8 R09: 00000000ffffdfff
[ 2.023693] R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888004c200d8
[ 2.023813] R13: 0000000000000000 R14: ffff888004c20000 R15: ffffc90000013ca8
[ 2.023930] FS: 0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 2.024062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.024161] CR2: ffff888003601000 CR3: 0000000002a2e000 CR4: 00000000000006f0
[ 2.024275] Call Trace:
[ 2.024322] <TASK>
[ 2.024367] ? __warn+0x7f/0x130
[ 2.024430] ? refcount_warn_saturate+0xbe/0x110
[ 2.024513] ? report_bug+0x199/0x1b0
[ 2.024585] ? handle_bug+0x3c/0x70
[ 2.024676] ? exc_invalid_op+0x18/0x70
[ 2.024750] ? asm_exc_invalid_op+0x1a/0x20
[ 2.024830] ? refcount_warn_saturate+0xbe/0x110
[ 2.024916] ? refcount_warn_saturate+0xbe/0x110
[ 2.024998] __tcp_close+0x2f4/0x3d0
[ 2.025065] ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[ 2.025168] tcp_close+0x1f/0x70
[ 2.025231] inet_release+0x33/0x60
[ 2.025297] sock_release+0x1f/0x80
[ 2.025361] handshake_req_cancel_test2+0x100/0x2d0
[ 2.025457] kunit_try_run_case+0x4c/0xa0
[ 2.025532] kunit_generic_run_threadfn_adapter+0x15/0x20
[ 2.025644] kthread+0xe1/0x110
[ 2.025708] ? __pfx_kthread+0x10/0x10
[ 2.025780] ret_from_fork+0x2c/0x50
One can enable CONFIG_NET_HANDSHAKE_KUNIT_TEST config to reproduce above
crash.
The root cause of this bug is that the commit 1ce77c998f
("net/handshake: Unpin sock->file if a handshake is cancelled") adds one
additional fput() function. That patch claims that the fput() is used to
enable sock->file to be freed even when user space never calls DONE.
However, it seems that the intended DONE routine will never give an
additional fput() of ths sock->file. The existing two of them are just
used to balance the reference added in sockfd_lookup().
This patch revert the mentioned commit to avoid the use-after-free. The
patched kernel could successfully pass the KUNIT test and boot to shell.
[ 0.733613] # Subtest: Handshake API tests
[ 0.734029] 1..11
[ 0.734255] KTAP version 1
[ 0.734542] # Subtest: req_alloc API fuzzing
[ 0.736104] ok 1 handshake_req_alloc NULL proto
[ 0.736114] ok 2 handshake_req_alloc CLASS_NONE
[ 0.736559] ok 3 handshake_req_alloc CLASS_MAX
[ 0.737020] ok 4 handshake_req_alloc no callbacks
[ 0.737488] ok 5 handshake_req_alloc no done callback
[ 0.737988] ok 6 handshake_req_alloc excessive privsize
[ 0.738529] ok 7 handshake_req_alloc all good
[ 0.739036] # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[ 0.739444] ok 1 req_alloc API fuzzing
[ 0.740065] ok 2 req_submit NULL req arg
[ 0.740436] ok 3 req_submit NULL sock arg
[ 0.740834] ok 4 req_submit NULL sock->file
[ 0.741236] ok 5 req_lookup works
[ 0.741621] ok 6 req_submit max pending
[ 0.741974] ok 7 req_submit multiple
[ 0.742382] ok 8 req_cancel before accept
[ 0.742764] ok 9 req_cancel after accept
[ 0.743151] ok 10 req_cancel after done
[ 0.743510] ok 11 req_destroy works
[ 0.743882] # Handshake API tests: pass:11 fail:0 skip:0 total:11
[ 0.744205] # Totals: pass:17 fail:0 skip:0 total:17
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 1ce77c998f ("net/handshake: Unpin sock->file if a handshake is cancelled")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Link: https://lore.kernel.org/r/20230613083204.633896-1-linma@zju.edu.cn
Link: https://lore.kernel.org/r/20230614015249.987448-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* fix fragmentation for multi-link related elements
* fix callback copy/paste error
* fix multi-link locking
* remove double-locking of wiphy mutex
* transmit only on active links, not all
* activate links in the correct order
* don't remove links that weren't added
* disable soft-IRQs for LQ lock in iwlwifi
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmSJcf0ACgkQ10qiO8sP
aAC77Q//TOZSUoAFnMqsA23SXwN8CeNQC5yBmYwxVMcsqBTO6+7k0NphpFGJUcLA
OG8Leo6gwJdhLFYt7bl7RfHGSQrAZcNeYB9pv9J96vGQGnCComAAANEWSo8OBT2a
03yYrfcKfkXAUNm2dxKvwmi3D3/VPyOgj+O6LNEs1DogHw0V6GdthW3J6s/vl6RU
MPCQqlIFY9j20mXEKPFMaIZ8fyQKh38xa5YttGmeFrSUKYSljWUqqUooSMIkeyS4
D5mYdzbsqCiihnN1FenEjkBUe2eS6BzxL+KVLaY2vth4tQytGeasvCaGcLcB83nc
BxGR0rbEkrwp7nBqE4ZpMmhzHG3hpWus2+hJtMWsQku7qzE/vMh4qv2s2+QUVk/3
jCXGv233bIgvQ2d1SUqp7CenGjJ0eBfKKRVzM+Hyiz+V6kWsugMxNaBmi59JVB7w
5JilT85LfV2cRJgHtkDY7kMpDWnVYfwenvSywoXaRdVuKiowMUhZ9P19wLE0gn7K
qtKIaLnkrLE2QHdqlxcuyMPBLhfga2+qXuo94SIYMFNURW7jjJcSVlN8ZVqqBvvp
ib51XCyx/95zAr1Vyly/Pc7puuCMiiQk0ZhQBgqFPrnjs37JIzHDNo4Cq6H9+FlY
0EncP/akjy8t7PsBdTmQv1UG3wq4EG5Wmh+wLDpa5QKKs2IqcCQ=
=IPqO
-----END PGP SIGNATURE-----
Merge tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
A couple of straggler fixes, mostly in the stack:
- fix fragmentation for multi-link related elements
- fix callback copy/paste error
- fix multi-link locking
- remove double-locking of wiphy mutex
- transmit only on active links, not all
- activate links in the correct order
- don't remove links that weren't added
- disable soft-IRQs for LQ lock in iwlwifi
* tag 'wireless-2023-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: mvm: spin_lock_bh() to fix lockdep regression
wifi: mac80211: fragment per STA profile correctly
wifi: mac80211: Use active_links instead of valid_links in Tx
wifi: cfg80211: remove links only on AP
wifi: mac80211: take lock before setting vif links
wifi: cfg80211: fix link del callback to call correct handler
wifi: mac80211: fix link activation settings order
wifi: cfg80211: fix double lock bug in reg_wdev_chan_valid()
====================
Link: https://lore.kernel.org/r/20230614075502.11765-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This filter already exists for excluding IPv6 SNMP stats. Extend its
definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
This patch constitutes a partial fix for a netlink attribute nesting
overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
requester doesn't need them, the truncation of the VF list is avoided.
While it was technically only the stats added in commit c5a9f6f0ab
("net/core: Add drop counters to VF statistics") breaking the camel's
back, the appreciable size of the stats data should never have been
included without due consideration for the maximum number of VFs
supported by PCI.
Fixes: 3b766cd832 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: c5a9f6f0ab ("net/core: Add drop counters to VF statistics")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Cc: Edwin Peer <espeer@gmail.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20230611105108.122586-1-gal@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since LOCAL_ASSIGN is only used by
TRACE macros and the return values are ignored.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230613003404.3538524-1-azeemshaikh38@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since WIPHY_ASSIGN is only used by
TRACE macros and the return values are ignored.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230612232301.2572316-1-azeemshaikh38@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
An AP part of an AP MLD might be temporarily disabled, and might be
enabled later. Such a link should be included in the association
exchange, but should not be used until enabled.
Extend the NL80211_CMD_ASSOCIATE to also indicate disabled links.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.c4c61ee4c4a5.I784ef4a0d619fc9120514b5615458fbef3b3684a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As a preparation to support disabled/dormant links, add the
following function:
- ieee80211_vif_usable_links(): returns the bitmap of the links
that can be activated. Use this function in all the places that
the bitmap of the usable links is needed.
- ieee80211_vif_is_mld(): returns true iff the vif is an MLD.
Use this function in all the places where an indication that the
connection is a MLD is needed.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.86e3351da1fc.If6fe3a339fda2019f13f57ff768ecffb711b710a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are cases in which we don't want the user to override the
smps mode, e.g. when SMPS should be disabled due to EMLSR. Add
a driver flag to disable SMPS overriding and don't override if
it is set.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.ef129e80556c.I74a298fdc86b87074c95228d3916739de1400597@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If we get an NDP (null data packet), there's reason to
believe the peer is just sending it to probe, and that
would happen at a low rate. Don't track this packet for
purposes of last RX rate reporting.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.8af46c4ac094.I13d9d5019addeaa4aff3c8a05f56c9f5a86b1ebd@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The channel switch parsing code would simply return if a scan is
in-progress. Supposedly, this was because channel switch announcements
from other APs should be ignored.
For the beacon case, the function is already only called if we are
associated with the sender. For the action frame cases, add the
appropriate check whether the frame is coming from the AP we are
associated with. Finally, drop the scanning check from
ieee80211_sta_process_chanswitch.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.3366e9302468.I6c7e0b58c33b7fb4c675374cfe8c3a5cddcec416@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In suspend flow "sdata" is NULL, destroy all roc's which are started.
pass "roc->sdata" to drv_cancel_remain_on_channel() to avoid NULL
dereference and destroy that roc
Signed-off-by: Anjaneyulu <pagadala.yesu.anjaneyulu@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.c678187a308c.Ic11578778655e273931efc5355d570a16465d1be@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's quite a bit of code accessing sband iftype data
(HE, HE 6 GHz, EHT) and we always need to remember to use
the ieee80211_vif_type_p2p() helper. Add new helpers to
directly get it from the sband/vif rather than having to
call ieee80211_vif_type_p2p().
Convert most code with the following spatch:
@@
expression vif, sband;
@@
-ieee80211_get_he_iftype_cap(sband, ieee80211_vif_type_p2p(vif))
+ieee80211_get_he_iftype_cap_vif(sband, vif)
@@
expression vif, sband;
@@
-ieee80211_get_eht_iftype_cap(sband, ieee80211_vif_type_p2p(vif))
+ieee80211_get_eht_iftype_cap_vif(sband, vif)
@@
expression vif, sband;
@@
-ieee80211_get_he_6ghz_capa(sband, ieee80211_vif_type_p2p(vif))
+ieee80211_get_he_6ghz_capa_vif(sband, vif)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.db099f49e764.Ie892966c49e22c7b7ee1073bc684f142debfdc84@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Increase the size of S1G rate_info flags to support S1G and add
flags for new S1G MCS and the supported bandwidths. Also, include
S1G rate information to netlink STA rate message. Lastly, add
rate calculation function for S1G MCS.
Signed-off-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com>
Link: https://lore.kernel.org/r/20230518000723.991912-1-gilad.itzkovitch@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mini_Qdisc_pair::p_miniq is a double pointer to mini_Qdisc, initialized
in ingress_init() to point to net_device::miniq_ingress. ingress Qdiscs
access this per-net_device pointer in mini_qdisc_pair_swap(). Similar
for clsact Qdiscs and miniq_egress.
Unfortunately, after introducing RTNL-unlocked RTM_{NEW,DEL,GET}TFILTER
requests (thanks Hillf Danton for the hint), when replacing ingress or
clsact Qdiscs, for example, the old Qdisc ("@old") could access the same
miniq_{in,e}gress pointer(s) concurrently with the new Qdisc ("@new"),
causing race conditions [1] including a use-after-free bug in
mini_qdisc_pair_swap() reported by syzbot:
BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
print_report mm/kasan/report.c:430 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:536
mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
...
@old and @new should not affect each other. In other words, @old should
never modify miniq_{in,e}gress after @new, and @new should not update
@old's RCU state.
Fixing without changing sch_api.c turned out to be difficult (please
refer to Closes: for discussions). Instead, make sure @new's first call
always happen after @old's last call (in {ingress,clsact}_destroy()) has
finished:
In qdisc_graft(), return -EBUSY if @old has any ongoing filter requests,
and call qdisc_destroy() for @old before grafting @new.
Introduce qdisc_refcount_dec_if_one() as the counterpart of
qdisc_refcount_inc_nz() used for filter requests. Introduce a
non-static version of qdisc_destroy() that does a TCQ_F_BUILTIN check,
just like qdisc_put() etc.
Depends on patch "net/sched: Refactor qdisc_graft() for ingress and
clsact Qdiscs".
[1] To illustrate, the syzkaller reproducer adds ingress Qdiscs under
TC_H_ROOT (no longer possible after commit c7cfbd1150 ("net/sched:
sch_ingress: Only create under TC_H_INGRESS")) on eth0 that has 8
transmission queues:
Thread 1 creates ingress Qdisc A (containing mini Qdisc a1 and a2),
then adds a flower filter X to A.
Thread 2 creates another ingress Qdisc B (containing mini Qdisc b1 and
b2) to replace A, then adds a flower filter Y to B.
Thread 1 A's refcnt Thread 2
RTM_NEWQDISC (A, RTNL-locked)
qdisc_create(A) 1
qdisc_graft(A) 9
RTM_NEWTFILTER (X, RTNL-unlocked)
__tcf_qdisc_find(A) 10
tcf_chain0_head_change(A)
mini_qdisc_pair_swap(A) (1st)
|
| RTM_NEWQDISC (B, RTNL-locked)
RCU sync 2 qdisc_graft(B)
| 1 notify_and_destroy(A)
|
tcf_block_release(A) 0 RTM_NEWTFILTER (Y, RTNL-unlocked)
qdisc_destroy(A) tcf_chain0_head_change(B)
tcf_chain0_head_change_cb_del(A) mini_qdisc_pair_swap(B) (2nd)
mini_qdisc_pair_swap(A) (3rd) |
... ...
Here, B calls mini_qdisc_pair_swap(), pointing eth0->miniq_ingress to
its mini Qdisc, b1. Then, A calls mini_qdisc_pair_swap() again during
ingress_destroy(), setting eth0->miniq_ingress to NULL, so ingress
packets on eth0 will not find filter Y in sch_handle_ingress().
This is just one of the possible consequences of concurrently accessing
miniq_{in,e}gress pointers.
Fixes: 7a096d579e ("net: sched: ingress: set 'unlocked' flag for Qdisc ops")
Fixes: 87f373921c ("net: sched: ingress: set 'unlocked' flag for clsact Qdisc ops")
Reported-by: syzbot+b53a9c0d1ea4ad62da8b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/0000000000006cf87705f79acf1a@google.com/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Grafting ingress and clsact Qdiscs does not need a for-loop in
qdisc_graft(). Refactor it. No functional changes intended.
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently UNREPLIED and UNASSURED connections are added to the nf flow
table. This causes the following connection packets to be processed
by the flow table which then skips conntrack_in(), and thus such the
connections will remain UNREPLIED and UNASSURED even if reply traffic
is then seen. Even still, the unoffloaded reply packets are the ones
triggering hardware update from new to established state, and if
there aren't any to triger an update and/or previous update was
missed, hardware can get out of sync with sw and still mark
packets as new.
Fix the above by:
1) Not skipping conntrack_in() for UNASSURED packets, but still
refresh for hardware, as before the cited patch.
2) Try and force a refresh by reply-direction packets that update
the hardware rules from new to established state.
3) Remove any bidirectional flows that didn't failed to update in
hardware for re-insertion as bidrectional once any new packet
arrives.
Fixes: 6a9bad0069 ("net/sched: act_ct: offload UDP NEW connections")
Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/1686313379-117663-1-git-send-email-paulb@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
sopass won't be set if wolopt doesn't change. This means the following
will fail to set the correct sopass.
ethtool -s eth0 wol s sopass 11:22:33:44:55:66
ethtool -s eth0 wol s sopass 22:44:55:66:77:88
Make sure we call into the driver layer set_wol if sopass is different.
Fixes: 55b24334c0 ("ethtool: ioctl: improve error checking for set_wol")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Link: https://lore.kernel.org/r/1686605822-34544-1-git-send-email-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rewrite the AF_KCM transmission loop to send all the fragments in a single
skb or frag_list-skb in one sendmsg() with MSG_SPLICE_PAGES set. The list
of fragments in each skb is conveniently a bio_vec[] that can just be
attached to a BVEC iter.
Note: I'm working out the size of each fragment-skb by adding up bv_len for
all the bio_vecs in skb->frags[] - but surely this information is recorded
somewhere? For the skbs in head->frag_list, this is equal to
skb->data_len, but not for the head. head->data_len includes all the tail
frags too.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When transmitting data, call down into the transport socket using sendmsg
with MSG_SPLICE_PAGES to indicate that content should be spliced rather
than using sendpage.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make tcp_bpf_sendpage() a wrapper around tcp_bpf_sendmsg(MSG_SPLICE_PAGES)
rather than a loop calling tcp_sendpage(). sendpage() will be removed in
the future.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When transmitting data, call down into TCP using sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced rather than
performing sendpage calls to transmit header, data pages and trailer.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support to the tc flower classifier to match based on fields in CFM
information elements like level and opcode.
tc filter add dev ens6 ingress protocol 802.1q \
flower vlan_id 698 vlan_ethtype 0x8902 cfm mdl 5 op 46 \
action drop
Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add support for dissecting cfm packets. The cfm packet header
fields maintenance domain level and opcode can be dissected.
Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Get rid of the completion wait in svc_rdma_sendto(), and release
pages in the send completion handler again. A subsequent patch will
handle releasing those pages more efficiently.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Pre-requisite for releasing pages in the send completion handler.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Pre-requisite for releasing pages in the send completion handler.
Reverted by hand: patch -R would not apply cleanly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The physical device's favored NUMA node ID is available when
allocating a rw_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The physical device's favored NUMA node ID is available when
allocating a send_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The physical device's favored NUMA node ID is available when
allocating a recv_ctxt. Use that value instead of relying on the
assumption that the memory allocation happens to be running on a
node close to the device.
This clean up eliminates the hack of destroying recv_ctxts that
were not created by the receive CQ thread -- recv_ctxts are now
always allocated on a "good" node.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The physical device's NUMA node ID is available when allocating an
svc_xprt for an incoming connection. Use that value to ensure the
svc_xprt structure is allocated on the NUMA node closest to the
device.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now all tcp_stream_alloc_skb() callers pass @size == 0, we can
remove this parameter.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all skbs in write queue do not contain any payload in skb->head,
we can remove some dead code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_send_syn_data() is the last component in TCP transmit
path to put payload in skb->head.
Switch it to use page frags, so that we can remove dead
code later.
This allows to put more payload than previous implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethtool currently requires a header nest (which is used to carry
the common family options) in all requests including dumps.
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'request header missing'}
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get \
--json '{"header":{}}'; )
[{'combined-count': 1,
'combined-max': 1,
'header': {'dev-index': 2, 'dev-name': 'enp1s0'}}]
Requiring the header nest to always be there may seem nice
from the consistency perspective, but it's not serving any
practical purpose. We shouldn't burden the user like this.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 4a19edb60d ("netlink: Pass extack to dump handlers")
added extack support to netlink dumps. It was focused on rtnl
and since rtnl does not use ->start(), ->done() callbacks
it ignored those. Genetlink on the other hand uses ->start()
extensively, for parsing and input validation.
Pass the extact in via struct netlink_dump_control and link
it to cb for the time of ->start(). Both struct netlink_dump_control
and extack itself live on the stack so we can't keep the same
extack for the duration of the dump. This means that the extack
visible in ->start() and each ->dump() callbacks will be different.
Corner cases like reporting a warning message in DONE across dump
calls are still not supported.
We could put the extack (for dumps) in the socket struct,
but layering makes it slightly awkward (extack pointer is decided
before the DO / DUMP split).
The genetlink dump error extacks are now surfaced:
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 64 (48) nl_flags = 0x300 nl_type = 2
error: -22 extack: {'msg': 'request header missing'}
Previously extack was missing:
$ cli.py --spec netlink/specs/ethtool.yaml --dump channels-get
lib.ynl.NlError: Netlink error: Invalid argument
nl_len = 36 (20) nl_flags = 0x100 nl_type = 2
error: -22
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's make CONFIG_UNIX a bool instead of a tristate.
We've decided to do that during discussion about SCM_PIDFD patchset [1].
[1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SO_PEERPIDFD which allows to get pidfd of peer socket holder pidfd.
This thing is direct analog of SO_PEERCRED which allows to get plain PID.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement SCM_PIDFD, a new type of CMSG type analogical to SCM_CREDENTIALS,
but it contains pidfd instead of plain pid, which allows programmers not
to care about PID reuse problem.
We mask SO_PASSPIDFD feature if CONFIG_UNIX is not builtin because
it depends on a pidfd_prepare() API which is not exported to the kernel
modules.
Idea comes from UAPI kernel group:
https://uapi-group.org/kernel-features/
Big thanks to Christian Brauner and Lennart Poettering for productive
discussions about this.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: David Ahern <dsahern@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Tested-by: Luca Boccassi <bluca@debian.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since its introduction, the ovs module execute_hash action allowed
hash algorithms other than the skb->l4_hash to be used. However,
additional hash algorithms were not implemented. This means flows
requiring different hash distributions weren't able to use the
kernel datapath.
Now, introduce support for symmetric hashing algorithm as an
alternative hash supported by the ovs module using the flow
dissector.
Output of flow using l4_sym hash:
recirc_id(0),in_port(3),eth(),eth_type(0x0800),
ipv4(dst=64.0.0.0/192.0.0.0,proto=6,frag=no), packets:30473425,
bytes:45902883702, used:0.000s, flags:SP.,
actions:hash(sym_l4(0)),recirc(0xd)
Some performance testing with no GRO/GSO, two veths, single flow:
hash(l4(0)): 4.35 GBits/s
hash(l4_sym(0)): 4.24 GBits/s
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The taprio Qdisc creates child classes per netdev TX queue, but
taprio_dump_class_stats() currently reports offload statistics per
traffic class. Traffic classes are groups of TXQs sharing the same
dequeue priority, so this is incorrect and we shouldn't be bundling up
the TXQ stats when reporting them, as we currently do in enetc.
Modify the API from taprio to drivers such that they report TXQ offload
stats and not TC offload stats.
There is no change in the UAPI or in the global Qdisc stats.
Fixes: 6c1adb650c ("net/sched: taprio: add netlink reporting for offload statistics counters")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For BEET the inner address and therefore family is stored in the
xfrm_state selector. Use that when decapsulating an input packet
instead of incorrectly relying on a non-existent tunnel protocol.
Fixes: 5f24f41e8e ("xfrm: Remove inner/outer modes from input path")
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
The sctp_sf_eat_auth() function is supposed to enum sctp_disposition
values and returning a kernel error code will cause issues in the
caller. Change -ENOMEM to SCTP_DISPOSITION_NOMEM.
Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sctp_sf_eat_auth() function is supposed to return enum sctp_disposition
values but if the call to sctp_ulpevent_make_authkey() fails, it returns
-ENOMEM.
This results in calling BUG() inside the sctp_side_effects() function.
Calling BUG() is an over reaction and not helpful. Call WARN_ON_ONCE()
instead.
This code predates git.
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
./net/sched/act_pedit.c:245:21-28: WARNING opportunity for kmemdup.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5478
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When fragmenting the ML per STA profile, the element ID should be
IEEE80211_MLE_SUBELEM_PER_STA_PROFILE rather than WLAN_EID_FRAGMENT.
Change the helper function to take the to be used element ID and pass
the appropriate value for each of the fragmentation levels.
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230611121219.9b5c793d904b.I7dad952bea8e555e2f3139fbd415d0cd2b3a08c3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Now that the preparation of an rq_vec has been removed from the
generic read path, nfsd_splice_read() no longer needs to reset
rq_next_page.
nfsd4_encode_read() calls nfsd_splice_read() directly. As far as I
can ascertain, resetting rq_next_page for NFSv4 splice reads is
unnecessary because rq_next_page is already set correctly.
Moreover, resetting it might even be incorrect if previous
operations in the COMPOUND have already consumed at least a page of
the send buffer. I would expect that the result would be encoding
the READ payload over previously-encoded results.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmSCMbUACgkQ1V2XiooU
IOR4Vw//UPJALibmSrS4+yN7nHM3Bwy40ZVCYZkskFuotDU4pkVO6/GKZKZAF9ZM
jgYXb5in61lnatyLlolZeg2lbZM4XmKxi4hZa7hXy2oHvp7p/CkIHri2Sn/dSTT1
fv+/LYeQFc6w3o+z7uUMWG+WruDLyFREAEw5A2vbVLOAvLCsjPYxSWOHOBOuMgg/
dpRMlcgkBDRycy2a9wVhCSDxB/pwv1ksk5Ev8TIC+ACOI4WCauzFvPlO4IrbjiGK
GG4avni4q2R1ia9EbVE4OZiaFYOQyABS8XbJX32vcMN2BVoRCnmacJNNo9Q0jpM8
rRjThsDsKPQARIEB0/HAccO++afrtQWZPQehUNskmd/d3g79tk+3k/hgrP64anz9
C7GuWTT40Xaq7nccsrrOPfHHODYM5IZilw6kVgkmfFUQ0m0mD1A79Mo4XCRxpOAG
adMI9Zy+1tGPfZVTvb1WxMEyk52agYfUc9obxomiTIdK8/5HEK3c5Z2oMOGiTjH/
5Msj6VNTGRjhfFXKiLLk/cb0nhWbuAORwTHahvFLqYa2m1/Gpf+Kvt4jkqCjtmy+
TfOlYPfEo1+qVRSWy8pcGKhMX5H6oYXwff1s4Bj0ycoxVIIJF3qeIdKSP64AGxh+
yahIw0GfvBDA3ZuSYcHIYeEuAP7yFRwL6lX4P/Rz5PNxsj+2YxY=
=H0/i
-----END PGP SIGNATURE-----
Merge tag 'nf-23-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
netfilter pull request 23-06-08
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes for net:
1) Add commit and abort set operation to pipapo set abort path.
2) Bail out immediately in case of ENOMEM in nfnetlink batch.
3) Incorrect error path handling when creating a new rule leads to
dangling pointer in set transaction list.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a shift wrapping bug in this code on 32-bit architectures.
NETLBL_CATMAP_MAPTYPE is u64, bitmap is unsigned long.
Every second 32-bit word of catmap becomes corrupted.
Signed-off-by: Dmitry Mastykin <dmastykin@astralinux.ru>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move declarations into include/net/gso.h and code into net/core/gso.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch unifies the three PM set_flags() interfaces:
mptcp_pm_nl_set_flags() in mptcp/pm_netlink.c for the in-kernel PM and
mptcp_userspace_pm_set_flags() in mptcp/pm_userspace.c for the
userspace PM.
They'll be switched in the common PM infterface mptcp_pm_set_flags() in
mptcp/pm.c based on whether token is NULL or not.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch unifies the three PM get_flags_and_ifindex_by_id() interfaces:
mptcp_pm_nl_get_flags_and_ifindex_by_id() in mptcp/pm_netlink.c for the
in-kernel PM and mptcp_userspace_pm_get_flags_and_ifindex_by_id() in
mptcp/pm_userspace.c for the userspace PM.
They'll be switched in the common PM infterface
mptcp_pm_get_flags_and_ifindex_by_id() in mptcp/pm.c based on whether
mptcp_pm_is_userspace() or not.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch unifies the three PM get_local_id() interfaces:
mptcp_pm_nl_get_local_id() in mptcp/pm_netlink.c for the in-kernel PM and
mptcp_userspace_pm_get_local_id() in mptcp/pm_userspace.c for the
userspace PM.
They'll be switched in the common PM infterface mptcp_pm_get_local_id()
in mptcp/pm.c based on whether mptcp_pm_is_userspace() or not.
Also put together the declarations of these three functions in protocol.h.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rename local_address() with "mptcp_" prefix and export it in protocol.h.
This function will be re-used in the common PM code (pm.c) in the
following commit.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The second pull request for v6.5. We have support for three new
Realtek chipsets, all from different generations. Shows how active
Realtek development is right now, even older generations are being
worked on.
Note: We merged wireless into wireless-next to avoid complex conflicts
between the trees.
Major changes:
rtl8xxxu
* RTL8192FU support
rtw89
* RTL8851BE support
rtw88
* RTL8723DS support
ath11k
* Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID
Advertisement (EMA) support in AP mode
iwlwifi
* support for segmented PNVM images and power tables
* new vendor entries for PPAG (platform antenna gain) feature
cfg80211/mac80211
* more Multi-Link Operation (MLO) support such as hardware restart
* fixes for a potential work/mutex deadlock and with it beginnings of
the previously discussed locking simplifications
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmSDHDERHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZsMDgf/YWmT/b6W4pYRltHRPYksxKfxASGf6mvZ
UXo5vEehCF4eY2vo4MXTknVJOjrvfbqK5uvWnUfqIYBMgCThBAlG1dz+4Ziiwtw5
QrSKG2rTQxRlI8ecbfe1Kd1TXr9bSvBAtoNAmElgxcRqG1GbiSqUBvcUox63AYuB
yozONCvdAOsrPJTcCX+ThP+ABayAonEZf7YeISIRU6q3QMkbVksHAGAe3TnIAPJq
8l5fQQmHwC0CxActOT2fH8WcJ3dyl5OTqjziRfjrh9T8QUJRp9Yl7zJhpYiBT116
aCVg/NRxAFT+RkKWdQ6jRHMVOAtzL2fe6fuVW0sk7e1/s5uzyyBxJw==
=hq6p
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2023-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.5
The second pull request for v6.5. We have support for three new
Realtek chipsets, all from different generations. Shows how active
Realtek development is right now, even older generations are being
worked on.
Note: We merged wireless into wireless-next to avoid complex conflicts
between the trees.
Major changes:
rtl8xxxu
- RTL8192FU support
rtw89
- RTL8851BE support
rtw88
- RTL8723DS support
ath11k
- Multiple Basic Service Set Identifier (MBSSID) and Enhanced MBSSID
Advertisement (EMA) support in AP mode
iwlwifi
- support for segmented PNVM images and power tables
- new vendor entries for PPAG (platform antenna gain) feature
cfg80211/mac80211
- more Multi-Link Operation (MLO) support such as hardware restart
- fixes for a potential work/mutex deadlock and with it beginnings of
the previously discussed locking simplifications
* tag 'wireless-next-2023-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (162 commits)
wifi: rtlwifi: remove misused flag from HAL data
wifi: rtlwifi: remove unused dualmac control leftovers
wifi: rtlwifi: remove unused timer and related code
wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown
wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled
wifi: brcmfmac: Detect corner error case earlier with log
wifi: rtw89: 8852c: update RF radio A/B parameters to R63
wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (3 of 3)
wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (2 of 3)
wifi: rtw89: 8852c: update TX power tables to R63 with 6 GHz power type (1 of 3)
wifi: rtw89: process regulatory for 6 GHz power type
wifi: rtw89: regd: update regulatory map to R64-R40
wifi: rtw89: regd: judge 6 GHz according to chip and BIOS
wifi: rtw89: refine clearing supported bands to check 2/5 GHz first
wifi: rtw89: 8851b: configure CRASH_TRIGGER feature for 8851B
wifi: rtw89: set TX power without precondition during setting channel
wifi: rtw89: debug: txpwr table access only valid page according to chip
wifi: rtw89: 8851b: enable hw_scan support
wifi: cfg80211: move scan done work to wiphy work
wifi: cfg80211: move sched scan stop to wiphy work
...
====================
Link: https://lore.kernel.org/r/87bkhohkbg.fsf@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We are now in a position where no caller of pin_user_pages() requires the
vmas parameter at all, so eliminate this parameter from the function and
all callers.
This clears the way to removing the vmas parameter from GUP altogether.
Link: https://lkml.kernel.org/r/195a99ae949c9f5cb589d2222b736ced96ec199a.1684350871.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> [qib]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> [drivers/media]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ieee80211_vif_set_links requires the sdata->local->mtx lock to be held.
Add the appropriate locking around the calls in both the link add and
remove handlers.
This causes a warning when e.g. ieee80211_link_release_channel is called
via ieee80211_link_stop from ieee80211_vif_update_links.
Fixes: 0d8c4a3c86 ("wifi: mac80211: implement add/del interface link callbacks")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.fa0c6597fdad.I83dd70359f6cda30f86df8418d929c2064cf4995@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The wrapper function was incorrectly calling the add handler instead of
the del handler. This had no negative side effect as the default
handlers are essentially identical.
Fixes: f2a0290b2d ("wifi: cfg80211: add optional link add/remove callbacks")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.ebd00e000459.Iaff7dc8d1cdecf77f53ea47a0e5080caa36ea02a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the normal MLME code we always call
ieee80211_mgd_set_link_qos_params() before
ieee80211_link_info_change_notify() and some drivers,
notably iwlwifi, rely on that as they don't do anything
(but store the data) in their conf_tx.
Fix the order here to be the same as in the normal code
paths, so this isn't broken.
Fixes: 3d90110292 ("wifi: mac80211: implement link switching")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230608163202.a2a86bba2f80.Iac97e04827966d22161e63bb6e201b4061e9651b@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The locking was changed recently so now the caller holds the wiphy_lock()
lock. Taking the lock inside the reg_wdev_chan_valid() function will
lead to a deadlock.
Fixes: f7e60032c6 ("wifi: cfg80211: fix locking in regulatory disconnect")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/40c4114a-6cb4-4abf-b013-300b598aba65@moroto.mountain
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In the event of a failure in tcf_change_indev(), u32_set_parms() will
immediately return without decrementing the recently incremented
reference counter. If this happens enough times, the counter will
rollover and the reference freed, leading to a double free which can be
used to do 'bad things'.
In order to prevent this, move the point of possible failure above the
point where the reference counter is incremented. Also save any
meaningful return values to be applied to the return data at the
appropriate point in time.
This issue was caught with KASAN.
Fixes: 705c709126 ("net: sched: cls_u32: no need to call tcf_exts_change for newly allocated struct")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As shown in [1], out-of-bounds access occurs in two cases:
1)when the qdisc of the taprio type is used to replace the previously
configured taprio, count and offset in tc_to_txq can be set to 0. In this
case, the value of *txq in taprio_next_tc_txq() will increases
continuously. When the number of accessed queues exceeds the number of
queues on the device, out-of-bounds access occurs.
2)When packets are dequeued, taprio can be deleted. In this case, the tc
rule of dev is cleared. The count and offset values are also set to 0. In
this case, out-of-bounds access is also caused.
Now the restriction on the queue number is added.
[1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
Fixes: 2f530df76c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Tested-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on skb->transport_header being set correctly, opt
instead to parse the L3 header length out of the L3 headers for both
IPv4/IPv6 when the Extended Layer Op for tcp/udp is used. This fixes a
bug if GRO is disabled, when GRO is disabled skb->transport_header is
set by __netif_receive_skb_core() to point to the L3 header, it's later
fixed by the upper protocol layers, but act_pedit will receive the SKB
before the fixups are completed. The existing behavior causes the
following to edit the L3 header if GRO is disabled instead of the UDP
header:
tc filter add dev eth0 ingress protocol ip flower ip_proto udp \
dst_ip 192.168.1.3 action pedit ex munge udp set dport 18053
Also re-introduce a rate-limited warning if we were unable to extract
the header offset when using the 'ex' interface.
Fixes: 71d0ed7079 ("net/act_pedit: Support using offset relative to
the conventional network headers")
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305261541.N165u9TZ-lkp@intel.com/
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ndo_set_mac_address to dev_set_mac_address because
dev_set_mac_address provides a way to notify network layer about MAC
change. In other case, services may not aware about MAC change and keep
using old one which set from network adapter driver.
As example, DHCP client from systemd do not update MAC address without
notification from net subsystem which leads to the problem with acquiring
the right address from DHCP server.
Fixes: cb10c7c0df ("net/ncsi: Add NCSI Broadcom OEM command")
Cc: stable@vger.kernel.org # v6.0+ 2f38e84 net/ncsi: make one oem_gma function for all mfr id
Signed-off-by: Paul Fertser <fercerpav@gmail.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the one Get Mac Address function for all manufacturers and change
this call in handlers accordingly.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Ivan Mikhaylov <fr0st61te@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before Linux v5.8 an AF_INET6 SOCK_DGRAM (udp/udplite) socket
with SOL_UDP, UDP_ENCAP, UDP_ENCAP_ESPINUDP{,_NON_IKE} enabled
would just unconditionally use xfrm4_udp_encap_rcv(), afterwards
such a socket would use the newly added xfrm6_udp_encap_rcv()
which only handles IPv6 packets.
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Benedict Wong <benedictwong@google.com>
Cc: Yan Yan <evitayan@google.com>
Fixes: 0146dca70b ("xfrm: add support for UDPv6 encapsulation of ESP")
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Convert tls_device_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather
than directly splicing in the pages itself. With that, the tls_iter_offset
union is no longer necessary and can be replaced with an iov_iter pointer
and the zc_page argument to tls_push_data() can also be removed.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make TLS's device sendmsg() support MSG_SPLICE_PAGES. This causes pages to
be spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Convert tls_sw_sendpage() and tls_sw_sendpage_locked() to use sendmsg()
with MSG_SPLICE_PAGES rather than directly splicing in the pages itself.
[!] Note that tls_sw_sendpage_locked() appears to have the wrong locking
upstream. I think the caller will only hold the socket lock, but it
should hold tls_ctx->tx_lock too.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make TLS's sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow splice to undo the effects of MSG_MORE after prematurely ending a
splice/sendfile due to getting an EOF condition (->splice_read() returned
0) after splice had called sendmsg() with MSG_MORE set when the user didn't
set MSG_MORE.
For UDP, a pending packet will not be emitted if the socket is closed
before it is flushed; with this change, it be flushed by ->splice_eof().
For TCP, it's not clear that MSG_MORE is actually effective.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Kuniyuki Iwashima <kuniyu@amazon.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow splice to end a TLS record after prematurely ending a splice/sendfile
due to getting an EOF condition (->splice_read() returned 0) after splice
had called TLS with a sendmsg() with MSG_MORE set when the user didn't set
MSG_MORE.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add an optional method, ->splice_eof(), to allow splice to indicate the
premature termination of a splice to struct file_operations and struct
proto_ops.
This is called if sendfile() or splice() encounters all of the following
conditions inside splice_direct_to_actor():
(1) the user did not set SPLICE_F_MORE (splice only), and
(2) an EOF condition occurred (->splice_read() returned 0), and
(3) we haven't read enough to fulfill the request (ie. len > 0 still), and
(4) we have already spliced at least one byte.
A further patch will modify the behaviour of SPLICE_F_MORE to always be
passed to the actor if either the user set it or we haven't yet read
sufficient data to fulfill the request.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christoph Hellwig <hch@lst.de>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: David Hildenbrand <david@redhat.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: linux-mm@kvack.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Replace generic_splice_sendpage() + splice_from_pipe + pipe_to_sendpage()
with a net-specific handler, splice_to_socket(), that calls sendmsg() with
MSG_SPLICE_PAGES set instead of calling ->sendpage().
MSG_MORE is used to indicate if the sendmsg() is expected to be followed
with more data.
This allows multiple pipe-buffer pages to be passed in a single call in a
BVEC iterator, allowing the processing to be pushed down to a loop in the
protocol driver. This helps pave the way for passing multipage folios down
too.
Protocols that haven't been converted to handle MSG_SPLICE_PAGES yet should
just ignore it and do a normal sendmsg() for now - although that may be a
bit slower as it may copy everything.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Allow MSG_SPLICE_PAGES to be specified to sendmsg() but treat it as normal
sendmsg for now. This means the data will just be copied until
MSG_SPLICE_PAGES is handled.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
tcp_mtu_probe() is still copying payload from skbs in the write queue,
using skb_copy_bits(), ignoring potential errors.
Modern TCP stack wants to only deal with payload found in page frags,
as this is a prereq for TCPDirect (host stack might not have access
to the payload)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230607214113.1992947-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The netlink version of set_wol checks for not supported wolopts and avoids
setting wol when the correct wolopt is already set. If we do the same with
the ioctl version then we can remove these checks from the driver layer.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/1686179653-29750-1-git-send-email-justin.chen@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ping sockets can't send packets when they're bound to a VRF master
device and the output interface is set to a slave device.
For example, when net.ipv4.ping_group_range is properly set, so that
ping6 can use ping sockets, the following kind of commands fails:
$ ip vrf exec red ping6 fe80::854:e7ff:fe88:4bf1%eth1
What happens is that sk->sk_bound_dev_if is set to the VRF master
device, but 'oif' is set to the real output device. Since both are set
but different, ping_v6_sendmsg() sees their value as inconsistent and
fails.
Fix this by allowing 'oif' to be a slave device of ->sk_bound_dev_if.
This fixes the following kselftest failure:
$ ./fcnal-test.sh -t ipv6_ping
[...]
TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Closes: https://lore.kernel.org/netdev/b6191f90-ffca-dbca-7d06-88a9788def9c@alu.unizg.hr/
Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Fixes: 5e45789698 ("net: ipv6: Fix ping to link-local addresses.")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/6c8b53108816a8d0d5705ae37bdc5a8322b5e3d9.1686153846.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case of error when adding a new rule that refers to an anonymous set,
deactivate expressions via NFT_TRANS_PREPARE state, not NFT_TRANS_RELEASE.
Thus, the lookup expression marks anonymous sets as inactive in the next
generation to ensure it is not reachable in this transaction anymore and
decrement the set refcount as introduced by c1592a8994 ("netfilter:
nf_tables: deactivate anonymous set from preparation phase"). The abort
step takes care of undoing the anonymous set.
This is also consistent with rule deletion, where NFT_TRANS_PREPARE is
used. Note that this error path is exercised in the preparation step of
the commit protocol. This patch replaces nf_tables_rule_release() by the
deactivate and destroy calls, this time with NFT_TRANS_PREPARE.
Due to this incorrect error handling, it is possible to access a
dangling pointer to the anonymous set that remains in the transaction
list.
[1009.379054] BUG: KASAN: use-after-free in nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379106] Read of size 8 at addr ffff88816c4c8020 by task nft-rule-add/137110
[1009.379116] CPU: 7 PID: 137110 Comm: nft-rule-add Not tainted 6.4.0-rc4+ #256
[1009.379128] Call Trace:
[1009.379132] <TASK>
[1009.379135] dump_stack_lvl+0x33/0x50
[1009.379146] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379191] print_address_description.constprop.0+0x27/0x300
[1009.379201] kasan_report+0x107/0x120
[1009.379210] ? nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379255] nft_set_lookup_global+0x147/0x1a0 [nf_tables]
[1009.379302] nft_lookup_init+0xa5/0x270 [nf_tables]
[1009.379350] nf_tables_newrule+0x698/0xe50 [nf_tables]
[1009.379397] ? nf_tables_rule_release+0xe0/0xe0 [nf_tables]
[1009.379441] ? kasan_unpoison+0x23/0x50
[1009.379450] nfnetlink_rcv_batch+0x97c/0xd90 [nfnetlink]
[1009.379470] ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
[1009.379485] ? __alloc_skb+0xb8/0x1e0
[1009.379493] ? __alloc_skb+0xb8/0x1e0
[1009.379502] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
[1009.379509] ? unwind_get_return_address+0x2a/0x40
[1009.379517] ? write_profile+0xc0/0xc0
[1009.379524] ? avc_lookup+0x8f/0xc0
[1009.379532] ? __rcu_read_unlock+0x43/0x60
Fixes: 958bee14d0 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/sch_taprio.c
d636fc5dd6 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
dced11ef84 ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()")
net/ipv4/sysctl_net_ipv4.c
e209fee411 ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294")
ccce324dab ("tcp: make the first N SYN RTO backoffs linear")
https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- fix a broken sync while rescheduling delayed work, by
Vladislav Efanov
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAmSAp90WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoZPwEADXDBvWlT7DH3sP1peNmFF/y+pd
AgNJE5wJiJGBrCA1K0gZONulyhpOLjsBtwuyWDuS143IlBSPPQqFcJgZrmU6zHfQ
MjZBOJMGEgxUbh51vQH4bVTotZB1STVSbst9+HJi+KLpgEN/BnMU2/gDfbV5JwzK
xVDk2/D0+1OX4w61V+UDqmXuljBLa5cbOUPnRP4/gz/suVui5Q0CESeB+H/One9z
RlKM6YkcSOr4y9MAIgSpJwY8O4hZ0oeqZyewMTYYWYDQ3nGpZ9NUCGR8kYozusqg
u8c71nrJwdHV8VS7IU3eEzeKXFo2uz8UxyTgK+qcsoem4oTZgCZy8nXh+Pwp2y72
R+RHFngBcKIYlvil5cyVUisnJ7GZOjHK/N2pESeG7A/iI0jU6YZgVe5oJaCHbMJl
//F6m4iFHvPAbf61f5tRePTZTPd98LC3KAlI1Fu4/g+07H0ivgsiFk+qi45OvvcE
MWvK12FlgTbCUeqjhg6bKuVva2NY5SDn1uRcVrTAD3HDpcpfw7UYv/jLBIeAwpoY
S7SLRl6xho1+aEPaJWx39DrXWQCFQZP95ygZIN5TPZcihvVp8knY0F7mLPMRovf9
WBFp89+aS/bv1x14fLrPYOzyJD0XisA3A04iERvf1eLroNa9poh3KHf5jhJLH+RP
CTumDtHSDt+GIH7E5A==
=euFF
-----END PGP SIGNATURE-----
Merge tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
Here is a batman-adv bugfix:
- fix a broken sync while rescheduling delayed work,
by Vladislav Efanov
* tag 'batadv-net-pullrequest-20230607' of git://git.open-mesh.org/linux-merge:
batman-adv: Broken sync while rescheduling delayed work
====================
Link: https://lore.kernel.org/r/20230607155515.548120-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZIDxUwAKCRDbK58LschI
g5hDAQD7ukrniCvMRNIm2yUZIGSxE4RvGiXptO4a0NfLck5R/wEAsfN2KUsPcPhW
HS37lVfx7VVXfj42+REf7lWLu4TXpwk=
=6mS/
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-06-07
We've added 7 non-merge commits during the last 7 day(s) which contain
a total of 12 files changed, 112 insertions(+), 7 deletions(-).
The main changes are:
1) Fix a use-after-free in BPF's task local storage, from KP Singh.
2) Make struct path handling more robust in bpf_d_path, from Jiri Olsa.
3) Fix a syzbot NULL-pointer dereference in sockmap, from Eric Dumazet.
4) UAPI fix for BPF_NETFILTER before final kernel ships,
from Florian Westphal.
5) Fix map-in-map array_map_gen_lookup code generation where elem_size was
not being set for inner maps, from Rhys Rustad-Elliott.
6) Fix sockopt_sk selftest's NETLINK_LIST_MEMBERSHIPS assertion,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add extra path pointer check to d_path helper
selftests/bpf: Fix sockopt_sk selftest
bpf: netfilter: Add BPF_NETFILTER bpf_attach_type
selftests/bpf: Add access_inner_map selftest
bpf: Fix elem_size not being set for inner maps
bpf: Fix UAF in task local storage
bpf, sockmap: Avoid potential NULL dereference in sk_psock_verdict_data_ready()
====================
Link: https://lore.kernel.org/r/20230607220514.29698-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
If caller reports ENOMEM, then stop iterating over the batch and send a
single netlink message to userspace to report OOM.
Fixes: cbb8125eb4 ("netfilter: nfnetlink: deliver netlink errors on batch completion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The pipapo set backend follows copy-on-update approach, maintaining one
clone of the existing datastructure that is being updated. The clone
and current datastructures are swapped via rcu from the commit step.
The existing integration with the commit protocol is flawed because
there is no operation to clean up the clone if the transaction is
aborted. Moreover, the datastructure swap happens on set element
activation.
This patch adds two new operations for sets: commit and abort, these new
operations are invoked from the commit and abort steps, after the
transactions have been digested, and it updates the pipapo set backend
to use it.
This patch adds a new ->pending_update field to sets to maintain a list
of sets that require this new commit and abort operations.
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Move the beacon loss work that might cause a disconnect
and the CSA disconnect work to be wiphy work, so we hold
the wiphy lock for them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Channel switch obviously must be handled per link, and we
have a (potential) deadlock when canceling that work. Use
the new delayed wiphy work to handle this instead and get
rid of the explicit timer that way too.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
SMPS requests are per link, and currently there's a potential
deadlock with canceling. Use the new wiphy work to handle SMPS
instead, so that the cancel cannot deadlock.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Since we want to have wiphy_lock() for the unregistration
in the future, unregister also netdevs via cfg80211 now
to be able to hold the wiphy_lock() for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We'll need this later to convert other works that might
be cancelled from here, so convert this one first.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add a work abstraction at the cfg80211 level that will always
hold the wiphy_lock() for any work executed and therefore also
can be canceled safely (without waiting) while holding that.
This improves on what we do now as with the new wiphy works we
don't have to worry about locking while cancelling them safely.
Also, don't let such works run while the device is suspended,
since they'll likely need to interact with the device. Flush
them before suspend though.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Sending the wiphy out might cause calls to the driver,
notably get_txq_stats() and get_antenna(). These aren't
very important, since the normally have their own locks
and/or just send out static data, but if the contract
should be that the wiphy lock is always held, these are
also affected. Fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Missed this ioctl since it's in wext-sme.c where we
usually get via a front-level ioctl handler in the
other files, but it should also hold the wiphy lock
to align the locking contract towards the driver.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This is a driver callback, and the driver should be able
to assume that it's called with the wiphy lock held. Move
the call up so that's true, it has no other effect since
the device is already unregistering and we cannot reach
this function through other paths.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Most code paths in cfg80211 already hold the wiphy lock,
mostly by virtue of being called from nl80211, so make
the pmsr cleanup worker also hold it, aligning the
locking promises between different parts of cfg80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Most code paths in cfg80211 already hold the wiphy lock,
mostly by virtue of being called from nl80211, so make
the auto-disconnect worker also hold it, aligning the
locking promises between different parts of cfg80211.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are a number of upcoming things in both the stack and
drivers that would otherwise conflict, so merge wireless to
wireless-next to be able to avoid those conflicts.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
try_module_get will be called in tcf_proto_lookup_ops. So module_put needs
to be called to drop the refcount if ops don't implement the required
function.
Fixes: 9f407f1768 ("net: sched: introduce chain templates")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes following sparse errors:
net/sched/act_police.c:360:28: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:362:45: warning: dereference of noderef expression
net/sched/act_police.c:368:28: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:370:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
net/sched/act_police.c:376:45: warning: dereference of noderef expression
Fixes: d1967e495a ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.
Here is an example:
PID: 59693 TASK: ffff0005f4f51500 CPU: 0 COMMAND: "ovs-vswitchd"
#0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
#1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
#2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
#3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
#4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
#5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
#6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
#7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
...
PID: 58682 TASK: ffff0005b2f0bf00 CPU: 0 COMMAND: "kworker/0:3"
#0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
#1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
#2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
#3 [ffff80000a5d3120] die at ffffb70f0628234c
#4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
#5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
#6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
#7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
#8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
#9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
#10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
#11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
#12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
#13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
#14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
#15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
#16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
#17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90
We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.
Fixes: 95637d91fe ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea365a ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c,
thus should be declared in an include file.
This fixes the following sparse warning:
net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static?
Fixes: e331473fee ("net/sched: cls_api: add missing validation of netlink attributes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrectly formatted tcp_syn_linear_timeouts sysctl in the
ipv4_net_table.
Fixes: ccce324dab ("tcp: make the first N SYN RTO backoffs linear")
Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table.
This also prevents a (smart ?) compiler to remove the condition in:
if (table->ents[index] != newval)
table->ents[index] = newval;
We need the condition to avoid dirtying a shared cache line.
Fixes: fec5e652e5 ("rfs: Receive Flow Steering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Offloaded policies are deleted through two flows: netdev is going
down and policy flush.
In both cases, the code lacks relevant call to delete offloaded policy.
Fixes: 919e43fad5 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Replace these open-code bearer rcu_dereference access with bearer_get(),
like other places in bearer.c. While at it, also use tipc_net() instead
of net_generic(net, tipc_net_id) to get "tn" in bearer.c.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/1072588a8691f970bda950c7e2834d1f2983f58e.1685976044.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Fixes to debugfs registration
- Fix use-after-free in hci_remove_ltk/hci_remove_irk
- Fixes to ISO channel support
- Fix missing checks for invalid L2CAP DCID
- Fix l2cap_disconnect_req deadlock
- Add lock to protect HCI_UNREGISTER
-----BEGIN PGP SIGNATURE-----
iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmR+ftMZHGx1aXoudm9u
LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKQMXD/9NcuqbGmEzJspVA8bZ8gXD
L7a68QnacdIoqH56QstLhGPQsYH6dv9fwhpNX6AN8/j8UG8DnDXQtHyfm4gZzfYA
h8GP7+ZQIEiHivIxiamrJnQ1Ii+KYEV3NGyS43YBuuPi9LcTFR0Km42xA0GqOnDU
Hz3/n5v342479TjJPNJkFPmcUGViRaLXtKhzcBzmSykUW+SVuIuD03yxuAJcojf5
rlPYA7yho7k8BAWkcYxWAP3v9fzQVa3nz8rQO2rG+poi4La2mmqRHykuSCXmzvBX
SbZwvzqgquqgQiFLpRIo/nwnVwPu3NYK6dQzlXPqiaxfM6qAtRttwQWNnOT+UxEu
VVGk6fD9iKjo9dttq+lTSY3LI/SXWAHYByIBzjx883hJYf1YvDAMSlMlzo029xL6
BHu3hMTDhosP8sG5wFdR2KzBmUd1W/ZcwOG0UP8PjshZgrOZ3uej9p3MrocKAys7
uGOBFmGzwOaQLXJQLbd4djE5l6zLOxSCV/0OLIWQw7VFQiHb66NzN6wenYEkDnxM
j2pFAlzp4RKHHCjU3dfaE90c0ede116e9nhjAlzmUOxggg6aCxCrCkMNOI8NlZ4v
oukYWq66RWYA/J4S80OLepITtBRPVn3JFxOXss5xESFfEnzL2nRZ5gm8jJJGULU4
x6tKTHaomO99FcH0ZFlZMw==
=jMWO
-----END PGP SIGNATURE-----
Merge tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fixes to debugfs registration
- Fix use-after-free in hci_remove_ltk/hci_remove_irk
- Fixes to ISO channel support
- Fix missing checks for invalid L2CAP DCID
- Fix l2cap_disconnect_req deadlock
- Add lock to protect HCI_UNREGISTER
* tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: L2CAP: Add missing checks for invalid DCID
Bluetooth: ISO: use correct CIS order in Set CIG Parameters event
Bluetooth: ISO: don't try to remove CIG if there are bound CIS left
Bluetooth: Fix l2cap_disconnect_req deadlock
Bluetooth: hci_qca: fix debugfs registration
Bluetooth: fix debugfs registration
Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER
Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG
Bluetooth: ISO: consider right CIS when removing CIG at cleanup
====================
Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmR/uDEACgkQ1V2XiooU
IOTC0BAAoKLyoPncbYOO9bTX9nbmn+gttwVd/wDJEbeAXzHSIiWJmjfCklJ9P7Bu
j3cRAOPe7qyXbUCpTTWPOMzcrjUwnnSuNjF5dgGhfgkg+jiykEuxaRJvyXJ1WKI4
v94hkmVeWB/iVpbNtFlUVzAzjemtLWU8TDEqaKRpZubaf+tNokJ3gggTlTRYslnn
YGXlaypkLh7xGUmW7q3MfmySbfj6E7dHnYJ4Df5MKMwGM3Rrbelh9/VTpn33nob2
74lWg/Gj3My9E+NjnZMoTA/YGnuUVPhYm4naIvp6Hc6IKQ3dI7NqleywxeHbuPgr
McwHtLRR8a5HJpMhPXPtA0d/Ot2LGzKo4L62Ahp4KHrTr/UKDtqSDu+9ZButue/E
0W/dKn+UA5hQKiNXOlTt25npx8VgQJFwcdCAYPJZNONCegCzl2MDVUBZufFLg6OM
JC2XMHFN1GRAHtgHMfdbM1pHYjkx9QBeYFz4zLgWmsGLIvsfgYpVE+nF6ExJsNjZ
pOILZtbAFWCUFVXWVUxJF4OkwOmpV2DhUk0hRKLOhmPD/HSoa4dvkGaB/yQB1uyz
SVfZgIrTqftLYgLvHDb9u0nRSwxibmPSCkr0C86yWRzOLJytil/qWqX6lAyMYUei
Yy8d+Kq/iX6qGJf5py9xtyXbT2Vsb5EYX7+qMu6HySngCZz+Zwo=
=tb7S
-----END PGP SIGNATURE-----
Merge tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Missing nul-check in basechain hook netlink dump path, from Gavrilov Ilia.
2) Fix bitwise register tracking, from Jeremy Sowden.
3) Null pointer dereference when accessing conntrack helper,
from Tijs Van Buggenhout.
4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.
5) Incorrect boundary check when building chain blob.
* tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: out-of-bound check in chain blob
netfilter: ipset: Add schedule point in call_ad().
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
netfilter: nft_bitwise: fix register tracking
netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()
====================
Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Both rtw88 and rtw89 have a 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmR/S2URHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZuHXwgAhS9w8UIZ2qLYmLQOlby4Hx9+TV2lSdZ1
V878SCWC+/nRX1mRrWZdU5zwwXXVpLv61dCUOuYyJp8ko4izzTwUhZzvNGowaGgo
HA+KrND/rZ2ApRZDZQMpe8SXaTUZJhcRDdV4njjdeSqNEcfksgz1W8exzDpKt8YD
pAdz8+gfpBSoATRThY5p3vyeC4e1weKqbsk96SLoip/wKzz92jyUx9fyexTskfoN
WMfDU474bz4XIEXzmuFBqpwylwxTvy+FKvEVZfe9PqtXEOChqMUZGGMAemD81FY0
kKIEY21kAOBKRBW5OLNHcR0WrFcq+C17+L9eazE1F7iQiKIVQaCsag==
=a4jg
-----END PGP SIGNATURE-----
Merge tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.4
Both rtw88 and rtw89 have a 802.11 powersave fix for a regression
introduced in v6.0. mt76 fixes a race and a null pointer dereference.
iwlwifi fixes an issue where not enough memory was allocated for a
firmware event. And finally the stack has several smaller fixes all
over.
* tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: cfg80211: fix locking in regulatory disconnect
wifi: cfg80211: fix locking in sched scan stop work
wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif()
wifi: mac80211: fix switch count in EMA beacons
wifi: mac80211: don't translate beacon/presp addrs
wifi: mac80211: mlme: fix non-inheritence element
wifi: cfg80211: reject bad AP MLD address
wifi: mac80211: use correct iftype HE cap
wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi()
wifi: rtw89: remove redundant check of entering LPS
wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS
wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS
wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll
====================
Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RT_CONN_FLAGS(sk) overloads flowi4_tos with the RTO_ONLINK bit when
sk has the SOCK_LOCALROUTE flag set. This allows
ip_route_output_key_hash() to eventually adjust flowi4_scope.
Instead of relying on special handling of the RTO_ONLINK bit, we can
just set the route scope correctly. This will eventually allow to avoid
special interpretation of tos variables and to convert ->flowi4_tos to
dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
RT_CONN_FLAGS(sk) overloads the tos parameter with the RTO_ONLINK bit
when sk has the SOCK_LOCALROUTE flag set. This is only useful for
ip_route_output_key_hash() to eventually adjust the route scope.
Let's drop RTO_ONLINK and set the correct scope directly to avoid this
special case in the future and to allow converting ->flowi4_tos to
dscp_t.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535 :
oldlen = (u16)~skb->len;
This part came with commit 0718bcc09b ("[NET]: Fix CHECKSUM_HW GSO problems.")
This leads to wrong TCP checksum.
Adapt the code to accept arbitrary packet length.
v2:
- use two csum_add() instead of csum_fold() (Alexander Duyck)
- Change delta type to __wsum to reduce casts (Alexander Duyck)
Fixes: 09f3d1a3a5 ("ipv6/gso: remove temporary HBH/jumbo header")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A remote DoS vulnerability of RPL Source Routing is assigned CVE-2023-2156.
The Source Routing Header (SRH) has the following format:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header | Hdr Ext Len | Routing Type | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE | Pad | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
. .
. Addresses[1..n] .
. .
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The originator of an SRH places the first hop's IPv6 address in the IPv6
header's IPv6 Destination Address and the second hop's IPv6 address as
the first address in Addresses[1..n].
The CmprI and CmprE fields indicate the number of prefix octets that are
shared with the IPv6 Destination Address. When CmprI or CmprE is not 0,
Addresses[1..n] are compressed as follows:
1..n-1 : (16 - CmprI) bytes
n : (16 - CmprE) bytes
Segments Left indicates the number of route segments remaining. When the
value is not zero, the SRH is forwarded to the next hop. Its address
is extracted from Addresses[n - Segment Left + 1] and swapped with IPv6
Destination Address.
When Segment Left is greater than or equal to 2, the size of SRH is not
changed because Addresses[1..n-1] are decompressed and recompressed with
CmprI.
OTOH, when Segment Left changes from 1 to 0, the new SRH could have a
different size because Addresses[1..n-1] are decompressed with CmprI and
recompressed with CmprE.
Let's say CmprI is 15 and CmprE is 0. When we receive SRH with Segment
Left >= 2, Addresses[1..n-1] have 1 byte for each, and Addresses[n] has
16 bytes. When Segment Left is 1, Addresses[1..n-1] is decompressed to
16 bytes and not recompressed. Finally, the new SRH will need more room
in the header, and the size is (16 - 1) * (n - 1) bytes.
Here the max value of n is 255 as Segment Left is u8, so in the worst case,
we have to allocate 3825 bytes in the skb headroom. However, now we only
allocate a small fixed buffer that is IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7
bytes). If the decompressed size overflows the room, skb_push() hits BUG()
below [0].
Instead of allocating the fixed buffer for every packet, let's allocate
enough headroom only when we receive SRH with Segment Left 1.
[0]:
skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo
kernel BUG at net/core/skbuff.c:200!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_panic (net/core/skbuff.c:200)
Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90000003da0 EFLAGS: 00000246
RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540
RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff
R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800
R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190
FS: 00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0
PKRU: 55555554
Call Trace:
<IRQ>
skb_push (net/core/skbuff.c:210)
ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718)
ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5))
ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483)
__netif_receive_skb_one_core (net/core/dev.c:5494)
process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934)
__napi_poll (net/core/dev.c:6496)
net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696)
__do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
do_softirq (kernel/softirq.c:472 kernel/softirq.c:459)
</IRQ>
<TASK>
__local_bh_enable_ip (kernel/softirq.c:396)
__dev_queue_xmit (net/core/dev.c:4272)
ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134)
rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914)
sock_sendmsg (net/socket.c:724 net/socket.c:747)
__sys_sendto (net/socket.c:2144)
__x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f453a138aea
Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea
RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003
RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b
</TASK>
Modules linked in:
Fixes: 8610c7c6e3 ("net: ipv6: add support for rpl sr exthdr")
Reported-by: Max VA
Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add current size of rule expressions to the boundary check.
Fixes: 2c865a8a28 ("netfilter: nf_tables: add rule blob layout")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
syzkaller found a repro that causes Hung Task [0] with ipset. The repro
first creates an ipset and then tries to delete a large number of IPs
from the ipset concurrently:
IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187
IPSET_ATTR_CIDR : 2
The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
held, and other threads wait for it to be released.
Previously, the same issue existed in set->variant->uadt() that could run
so long under ip_set_lock(set). Commit 5e29dc36bd ("netfilter: ipset:
Rework long task execution when adding/deleting entries") tried to fix it,
but the issue still exists in the caller with another mutex.
While adding/deleting many IPs, we should release the CPU periodically to
prevent someone from abusing ipset to hang the system.
Note we need to increment the ipset's refcnt to prevent the ipset from
being destroyed while rescheduling.
[0]:
INFO: task syz-executor174:268 blocked for more than 143 seconds.
Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor174 state:D stack:0 pid:268 ppid:260 flags:0x0000000d
Call trace:
__switch_to+0x308/0x714 arch/arm64/kernel/process.c:556
context_switch kernel/sched/core.c:5343 [inline]
__schedule+0xd84/0x1648 kernel/sched/core.c:6669
schedule+0xf0/0x214 kernel/sched/core.c:6745
schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747
__mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035
mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286
nfnl_lock net/netfilter/nfnetlink.c:98 [inline]
nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295
netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546
nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365
netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
____sys_sendmsg+0x4b8/0x810 net/socket.c:2503
___sys_sendmsg net/socket.c:2557 [inline]
__sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586
__do_sys_sendmsg net/socket.c:2595 [inline]
__se_sys_sendmsg net/socket.c:2593 [inline]
__arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Reported-by: syzkaller <syzkaller@googlegroups.com>
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
An nf_conntrack_helper from nf_conn_help may become NULL after DNAT.
Observed when TCP port 1720 (Q931_PORT), associated with h323 conntrack
helper, is DNAT'ed to another destination port (e.g. 1730), while
nfqueue is being used for final acceptance (e.g. snort).
This happenned after transition from kernel 4.14 to 5.10.161.
Workarounds:
* keep the same port (1720) in DNAT
* disable nfqueue
* disable/unload h323 NAT helper
$ linux-5.10/scripts/decode_stacktrace.sh vmlinux < /tmp/kernel.log
BUG: kernel NULL pointer dereference, address: 0000000000000084
[..]
RIP: 0010:nf_conntrack_update (net/netfilter/nf_conntrack_core.c:2080 net/netfilter/nf_conntrack_core.c:2134) nf_conntrack
[..]
nfqnl_reinject (net/netfilter/nfnetlink_queue.c:237) nfnetlink_queue
nfqnl_recv_verdict (net/netfilter/nfnetlink_queue.c:1230) nfnetlink_queue
nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:241) nfnetlink
[..]
Fixes: ee04805ff5 ("netfilter: conntrack: make conntrack userspace helpers work again")
Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
At the end of `nft_bitwise_reduce`, there is a loop which is intended to
update the bitwise expression associated with each tracked destination
register. However, currently, it just updates the first register
repeatedly. Fix it.
Fixes: 34cc9e5288 ("netfilter: nf_tables: cancel tracking for clobbered destination registers")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The nla_nest_start_noflag() function may fail and return NULL;
the return value needs to be checked.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: d54725cd11 ("netfilter: nf_tables: support for multiple devices per netdev hook")
Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This should use wiphy_lock() now instead of requiring the
RTNL, since __cfg80211_leave() via cfg80211_leave() is now
requiring that lock to be held.
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This should use wiphy_lock() now instead of acquiring the
RTNL, since cfg80211_stop_sched_scan_req() now needs that.
Fixes: a05829a722 ("cfg80211: avoid holding the RTNL when calling the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If we have a reconfig failure in the driver, then we need
to shut down the network interface(s) at the network stack
level through cfg80211, which can result in a lot of those
"Failed check-sdata-in-driver check, ..." warnings, since
interfaces are considered to not be in the driver when the
reconfiguration fails, but we still need to go through all
the shutdown flow.
Avoid many of these warnings by storing the fact that the
stack experienced a reconfiguration failure and not doing
the warning in that case.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.3750c4ae6e76.I9e80d6026f59263c008a1a68f6cd6891ca0b93b0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we flush stations, we first take them off the list
and then destroy them one by one. If we do the different
mode recalculations while destroying them, we cause the
following scenario:
- STA 1 has 80 MHz - min chanctx width is now 80 MHz
- STA 2 has 80 MHz
- empty STA list
- destroy STA 2
- recalc min chanctx width -> results in 20 MHz as
the STA list is already empty
This is broken, since as far as the driver is concerned
STA 1 still exists at this point, and this causes issues
at least with iwlwifi.
Fix - and also optimize - this by doing the recalc of
min chanctx width (and also P2P PS) only after all the
stations were removed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.48d262b6b42d.Ia15532657c17535c28ec0c5df263b65f0f80663c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When adding a new link to a station, this needs to cause a
recalculation of the minimum chandef since otherwise we can
have a higher bandwidth station connected on that link than
the link is operating at. Do the appropriate recalc.
Fixes: cb71f1d136 ("wifi: mac80211: add sta link addition/removal")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.377adf3c789a.I91bf28f399e16e6ac1f83bacd1029a698b4e6685@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The size of enum ieee80211_bss_change is bigger that 32,
so we need u64 to be used in a flag. Also pass u64
instead of u32 to ieee80211_reconfig_ap_links() for the same
reason.
Signed-off-by: Mukesh Sisodiya <mukesh.sisodiya@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.d53b7018a4eb.I1adaa041de51d50d84a11226573e81ceac0fe90d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Previously, I didn't implement restarting here at all if the
interface is an MLD, so it only worked for non-MLO. Add the
needed code to restart an AP MLD correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230504134511.828474-12-gregory.greenman@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We need to teach the low level driver about the EML capability which
includes information for EMLSR / EMLMR operation.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230504134511.828474-11-gregory.greenman@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Skip the EHT BSS membership selector for getting rates.
While at it, add the definitions for GLK and EPS, and
sort the list.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230504134511.828474-9-gregory.greenman@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Implement proper reconfiguration for interfaces that are
doing MLO, in order to be able to recover from HW restart
correctly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230504134511.828474-6-gregory.greenman@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The GRO control block (NAPI_GRO_CB) is currently at its maximum size.
This commit reduces its size by putting two groups of fields that are
used only at different times into a union.
Specifically, the fields frag0 and frag0_len are the fields that make up
the frag0 optimisation mechanism, which is used during the initial
parsing of the SKB.
The fields last and age are used after the initial parsing, while the
SKB is stored in the GRO list, waiting for other packets to arrive.
There was one location in dev_gro_receive that modified the frag0 fields
after setting last and age. I changed this accordingly without altering
the code behaviour.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230601161407.GA9253@debian
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, whenever an EMA beacon is formed, due to is_template
argument being false from the caller, the switch count is always
decremented once which is wrong.
Also if switch count is equal to profile periodicity, this makes
the switch count to reach till zero which triggers a WARN_ON_ONCE.
[ 261.593915] CPU: 1 PID: 800 Comm: kworker/u8:3 Not tainted 5.4.213 #0
[ 261.616143] Hardware name: Qualcomm Technologies, Inc. IPQ9574
[ 261.622666] Workqueue: phy0 ath12k_get_link_bss_conf [ath12k]
[ 261.629771] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 261.635595] pc : ieee80211_next_txq+0x1ac/0x1b8 [mac80211]
[ 261.640282] lr : ieee80211_beacon_update_cntdwn+0x64/0xb4 [mac80211]
[...]
[ 261.729683] Call trace:
[ 261.734986] ieee80211_next_txq+0x1ac/0x1b8 [mac80211]
[ 261.737156] ieee80211_beacon_cntdwn_is_complete+0xa28/0x1194 [mac80211]
[ 261.742365] ieee80211_beacon_cntdwn_is_complete+0xef4/0x1194 [mac80211]
[ 261.749224] ieee80211_beacon_get_template_ema_list+0x38/0x5c [mac80211]
[ 261.755908] ath12k_get_link_bss_conf+0xf8/0x33b4 [ath12k]
[ 261.762590] ath12k_get_link_bss_conf+0x390/0x33b4 [ath12k]
[ 261.767881] process_one_work+0x194/0x270
[ 261.773346] worker_thread+0x200/0x314
[ 261.777514] kthread+0x140/0x150
[ 261.781158] ret_from_fork+0x10/0x18
Fix this issue by making the is_template argument as true when fetching
the EMA beacons.
Fixes: bd54f3c290 ("wifi: mac80211: generate EMA beacons in AP mode")
Signed-off-by: Aditya Kumar Singh <quic_adisi@quicinc.com>
Link: https://lore.kernel.org/r/20230531062012.4537-1-quic_adisi@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Don't do link address translation for beacons and probe responses,
this leads to reporting multiple scan list entries for the same AP
(one with the MLD address) which just breaks things.
We might need to extend this in the future for some other (action)
frames that aren't MLD addressed.
Fixes: 42fb9148c0 ("wifi: mac80211: do link->MLD address translation on RX")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.62adead1b43a.Ifc25eed26ebf3b269f60b1ec10060156d0e7ec0d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There were two bugs when creating the non-inheritence
element:
1) 'at_extension' needs to be declared outside the loop,
otherwise the value resets every iteration and we
can never really switch properly
2) 'added' never got set to true, so we always cut off
the extension element again at the end of the function
This shows another issue that we might add a list but no
extension list, but we need to make the extension list a
zero-length one in that case.
Fix all these issues. While at it, add a comment explaining
the trim.
Fixes: 81151ce462 ("wifi: mac80211: support MLO authentication/association with one link")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.3addaa5c4782.If3a78f9305997ad7ef4ba7ffc17a8234c956f613@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When trying to authenticate, if the AP MLD address isn't
a valid address, mac80211 can throw a warning. Avoid that
by rejecting such addresses.
Fixes: d648c23024 ("wifi: nl80211: support MLO in auth/assoc")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.89188912bd1d.I8dbc6c8ee0cb766138803eec59508ef4ce477709@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
We already check that the right iftype capa exists,
but then don't use it. Assign it to a variable so we
can actually use it, and then do that.
Fixes: bac2fd3d75 ("mac80211: remove use of ieee80211_get_he_sta_cap()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Link: https://lore.kernel.org/r/20230604120651.0e908e5c5fdd.Iac142549a6144ac949ebd116b921a59ae5282735@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Convert kcm_sendpage() to use sendmsg() with MSG_SPLICE_PAGES rather than
directly splicing in the pages itself.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Make AF_KCM sendmsg() support MSG_SPLICE_PAGES. This causes pages to be
spliced from the source iterator if possible.
This allows ->sendpage() to be replaced by something that can handle
multiple multipage folios in a single transaction.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Tom Herbert <tom@herbertland.com>
cc: Tom Herbert <tom@quantonium.net>
cc: Cong Wang <cong.wang@bytedance.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When receiving a connect response we should make sure that the DCID is
within the valid range and that we don't already have another channel
allocated for the same DCID.
Missing checks may violate the specification (BLUETOOTH CORE SPECIFICATION
Version 5.4 | Vol 3, Part A, Page 1046).
Fixes: 40624183c2 ("Bluetooth: L2CAP: Add missing checks for invalid LE DCID")
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
The order of CIS handle array in Set CIG Parameters response shall match
the order of the CIS_ID array in the command (Core v5.3 Vol 4 Part E Sec
7.8.97). We send CIS_IDs mainly in the order of increasing CIS_ID (but
with "last" CIS first if it has fixed CIG_ID). In handling of the
reply, we currently assume this is also the same as the order of
hci_conn in hdev->conn_hash, but that is not true.
Match the correct hci_conn to the correct handle by matching them based
on the CIG+CIS combination. The CIG+CIS combination shall be unique for
ISO_LINK hci_conn at state >= BT_BOUND, which we maintain in
hci_le_set_cig_params.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Consider existing BOUND & CONNECT state CIS to block CIG removal.
Otherwise, under suitable timing conditions we may attempt to remove CIG
while Create CIS is pending, which fails.
Fixes: 26afbd826e ("Bluetooth: Add initial implementation of CIS connections")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
L2CAP assumes that the locks conn->chan_lock and chan->lock are
acquired in the order conn->chan_lock, chan->lock to avoid
potential deadlock.
For example, l2sock_shutdown acquires these locks in the order:
mutex_lock(&conn->chan_lock)
l2cap_chan_lock(chan)
However, l2cap_disconnect_req acquires chan->lock in
l2cap_get_chan_by_scid first and then acquires conn->chan_lock
before calling l2cap_chan_del. This means that these locks are
acquired in unexpected order, which leads to potential deadlock:
l2cap_chan_lock(c)
mutex_lock(&conn->chan_lock)
This patch releases chan->lock before acquiring the conn_chan_lock
to avoid the potential deadlock.
Fixes: a2a9339e1c ("Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Since commit ec6cef9cd9 ("Bluetooth: Fix SMP channel registration for
unconfigured controllers") the debugfs interface for unconfigured
controllers will be created when the controller is configured.
There is however currently nothing preventing a controller from being
configured multiple time (e.g. setting the device address using btmgmt)
which results in failed attempts to register the already registered
debugfs entries:
debugfs: File 'features' in directory 'hci0' already present!
debugfs: File 'manufacturer' in directory 'hci0' already present!
debugfs: File 'hci_version' in directory 'hci0' already present!
...
debugfs: File 'quirk_simultaneous_discovery' in directory 'hci0' already present!
Add a controller flag to avoid trying to register the debugfs interface
more than once.
Fixes: ec6cef9cd9 ("Bluetooth: Fix SMP channel registration for unconfigured controllers")
Cc: stable@vger.kernel.org # 4.0
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
When the HCI_UNREGISTER flag is set, no jobs should be scheduled. Fix
potential race when HCI_UNREGISTER is set after the flag is tested in
hci_cmd_sync_queue.
Fixes: 0b94f2651f ("Bluetooth: hci_sync: Fix queuing commands when HCI_UNREGISTER is set")
Signed-off-by: Zhengping Jiang <jiangzp@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Similar to commit 0f7d9b31ce ("netfilter: nf_tables: fix use-after-free
in nft_set_catchall_destroy()"). We can not access k after kfree_rcu()
call.
Cc: stable@vger.kernel.org
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>