Commit 82dfb540ae ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed in queuing it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.
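The intended ordering can be illustrated with a small userspace model.
This is only a sketch: toy_queue, toy_pkt, toy_queue_add(),
toy_deliver_tap() and toy_send_pkt() are hypothetical stand-ins and not
the virtio or vsock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a packet. */
struct toy_pkt {
	bool tap_delivered;	/* already shown to the monitoring tap */
};

/* Hypothetical stand-in for a bounded virtqueue plus a tap counter. */
struct toy_queue {
	size_t used;
	size_t capacity;
	unsigned int tap_count;	/* packets seen on the monitoring tap */
};

/* Models queuing a packet: fails when the queue is full. */
int toy_queue_add(struct toy_queue *q)
{
	if (q->used >= q->capacity)
		return -1;
	q->used++;
	return 0;
}

/* Delivers the packet to the tap at most once. */
void toy_deliver_tap(struct toy_queue *q, struct toy_pkt *pkt)
{
	if (pkt->tap_delivered)
		return;
	pkt->tap_delivered = true;
	q->tap_count++;
}

/* Queue first, report to the tap only once queuing succeeded. */
int toy_send_pkt(struct toy_queue *q, struct toy_pkt *pkt)
{
	int ret;

	assert(q != NULL && pkt != NULL);

	ret = toy_queue_add(q);
	if (ret < 0)
		return ret;	/* caller retries later; tap untouched */

	toy_deliver_tap(q, pkt);
	return 0;
}
```

With this ordering a failed attempt leaves the tap untouched, so the
monitor only sees the packet at the moment it is really queued.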
Fixes: 82dfb540ae ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: stable@vger.kernel.org
Signed-off-by: Marco Pinna <marco.pinn95@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the ax25 device is detaching, ax25_dev_device_down() calls
ax25_ds_del_timer() to clean up the slave_timer. If the timer
handler is running at that point, ax25_ds_del_timer(), which
calls del_timer(), returns immediately without waiting for the
handler to finish. As a result, use-after-free bugs can happen.
One scenario is shown below:
      (Thread 1)          |      (Thread 2)
                          | ax25_ds_timeout()
ax25_dev_device_down()    |
  ax25_ds_del_timer()     |
    del_timer()           |
  ax25_dev_put() //FREE   |
                          |  ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update the mptcp_connect.sh kselftest to ensure that the
above MIB does not increment when a pure TCP client connects to an
MPTCP server.
Fixes: fc518953bc ("mptcp: add and use MIB counter infrastructure")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-324a8981da48@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'for-net-2024-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Bluetooth: Fix TOCTOU in HCI debugfs implementation
- Bluetooth: hci_event: set the conn encrypted before conn establishes
- Bluetooth: qca: fix device-address endianness
- Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync
* tag 'for-net-2024-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: Fix TOCTOU in HCI debugfs implementation
Bluetooth: hci_event: set the conn encrypted before conn establishes
Bluetooth: hci_sync: Fix not checking error on hci_cmd_sync_cancel_sync
Bluetooth: qca: fix device-address endianness
Bluetooth: add quirk for broken address properties
arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
dt-bindings: bluetooth: add 'qcom,local-bd-address-broken'
Revert "Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT"
====================
Link: https://lore.kernel.org/r/20240329140453.2016486-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianguo Wu reported another bind() regression introduced by bhash2.
When calling bind() for the following 3 addresses on the same port, the
3rd one should fail but now succeeds.
1. 0.0.0.0 or ::ffff:0.0.0.0
2. [::] w/ IPV6_V6ONLY
3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
The first two bind() create tb2 like this:
bhash2 -> tb2(:: w/ IPV6_V6ONLY) -> tb2(0.0.0.0)
The 3rd bind() will match with the IPv6 only wildcard address bucket
in inet_bind2_bucket_match_addr_any(), however, no conflicting socket
exists in the bucket. So, inet_bhash2_conflict() will return false,
and thus, inet_bhash2_addr_any_conflict() returns false consequently.
As a result, the 3rd bind() bypasses conflict check, which should be
done against the IPv4 wildcard address bucket.
So, in inet_bhash2_addr_any_conflict(), we must iterate over all buckets.
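The difference between stopping at the first wildcard bucket and
iterating over all of them can be sketched with a toy model. Everything
below (toy_bucket, toy_bucket_conflict(), toy_conflict_first_bucket(),
toy_conflict_all_buckets()) is hypothetical and heavily simplified; it
ignores SO_REUSE* and is not the kernel's bhash2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wildcard bucket kinds on one port. */
enum toy_bucket_addr {
	TOY_ADDR_V4_ANY,	/* 0.0.0.0 or ::ffff:0.0.0.0 */
	TOY_ADDR_V6_ANY,	/* [::] */
};

struct toy_bucket {
	enum toy_bucket_addr addr;
	size_t nsocks;		/* sockets owning this bucket */
	bool all_v6only;	/* every socket has IPV6_V6ONLY set */
};

/*
 * Would a new IPv4 (or v4-mapped-v6) non-wildcard bind conflict with
 * a socket in this wildcard bucket?
 */
bool toy_bucket_conflict(const struct toy_bucket *tb)
{
	if (tb->nsocks == 0)
		return false;
	if (tb->addr == TOY_ADDR_V6_ANY && tb->all_v6only)
		return false;	/* v6-only sockets never claim IPv4 */
	return true;
}

/* Models the regression: only the first wildcard bucket is checked. */
bool toy_conflict_first_bucket(const struct toy_bucket *tbs, size_t n)
{
	if (n == 0)
		return false;
	return toy_bucket_conflict(&tbs[0]);
}

/* Models the fix: every wildcard bucket on the port is checked. */
bool toy_conflict_all_buckets(const struct toy_bucket *tbs, size_t n)
{
	assert(tbs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (toy_bucket_conflict(&tbs[i]))
			return true;
	}
	return false;
}
```

With the bucket order tb2(:: w/ IPV6_V6ONLY) -> tb2(0.0.0.0), the
first-bucket variant misses the conflict that the full iteration finds.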
Note that we cannot add ipv6_only flag for inet_bind2_bucket as it
would confuse the following pattern.
1. [::] w/ SO_REUSE{ADDR,PORT} and IPV6_V6ONLY
2. [::] w/ SO_REUSE{ADDR,PORT}
3. IPv4 non-wildcard address or v4-mapped-v6 non-wildcard address
The first bind() would create a bucket with ipv6_only flag true,
the second bind() would add the [::] socket into the same bucket,
and the third bind() could succeed based on the wrong assumption
that ipv6_only bucket would not conflict with v4(-mapped-v6) address.
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Diagnosed-by: Jianguo Wu <wujianguo106@163.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 5e07e67241 ("tcp: Use bhash2 for v4-mapped-v6 non-wildcard
address.") introduced a bind() regression for v4-mapped-v6 addresses.
When we bind() the following two addresses on the same port, the 2nd
bind() should succeed but fails now.
1. [::] w/ IPV6_V6ONLY
2. ::ffff:127.0.0.1
After the change, v4-mapped-v6 uses bhash2 instead of bhash to
detect conflict faster, but I forgot to add a necessary change.
During the 2nd bind(), inet_bind2_bucket_match_addr_any() returns
the tb2 bucket of [::], and inet_bhash2_conflict() finally calls
inet_bind_conflict(), which returns true, meaning conflict.
  inet_bhash2_addr_any_conflict
  |- inet_bind2_bucket_match_addr_any  <-- return [::] bucket
  `- inet_bhash2_conflict
     `- __inet_bhash2_conflict <-- checks IPV6_V6ONLY for AF_INET
        |                          but not for v4-mapped-v6 address
        `- inet_bind_conflict  <-- does not check address
inet_bind_conflict() does not check socket addresses because
__inet_bhash2_conflict() is expected to do so.
However, it checks IPV6_V6ONLY attribute only against AF_INET
socket, and not for v4-mapped-v6 address.
As a result, v4-mapped-v6 address conflicts with v6-only wildcard
address.
To avoid that, let's add the missing test to use bhash2 for
v4-mapped-v6 address.
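The missing test can be pictured as treating a v4-mapped-v6 address the
same way as an AF_INET one when the existing socket is a v6-only
wildcard. The sketch below is a userspace illustration only:
toy_bind_addr, toy_parse_v6(), toy_is_ipv4_like() and
toy_conflicts_with_v6only_wildcard() are hypothetical names, while
IN6_IS_ADDR_V4MAPPED() and inet_pton() are the standard POSIX helpers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical description of the address a socket wants to bind. */
struct toy_bind_addr {
	int family;		/* AF_INET or AF_INET6 */
	struct in6_addr v6;	/* only meaningful for AF_INET6 */
};

/* Fill in an AF_INET6 address from its text form. */
bool toy_parse_v6(const char *text, struct toy_bind_addr *a)
{
	assert(text != NULL && a != NULL);

	a->family = AF_INET6;
	return inet_pton(AF_INET6, text, &a->v6) == 1;
}

/* True if the address carries IPv4 semantics. */
bool toy_is_ipv4_like(const struct toy_bind_addr *a)
{
	if (a->family == AF_INET)
		return true;
	return a->family == AF_INET6 && IN6_IS_ADDR_V4MAPPED(&a->v6);
}

/*
 * An existing [::] socket with IPV6_V6ONLY cannot conflict with an
 * IPv4-like address; SO_REUSE* is ignored in this sketch.
 */
bool toy_conflicts_with_v6only_wildcard(const struct toy_bind_addr *a)
{
	return !toy_is_ipv4_like(a);
}
```

In this model ::ffff:127.0.0.1 no longer conflicts with the v6-only
wildcard, while a native IPv6 address such as ::1 still does.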
Fixes: 5e07e67241 ("tcp: Use bhash2 for v4-mapped-v6 non-wildcard address.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240326204251.51301-2-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is no reason to consume a full cacheline to store system_page_pool.
We can eventually move it to softnet_data later for full locality control.
Fixes: 2b0cfa6e49 ("net: add generic percpu page_pool allocator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20240328173448.2262593-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
cp might be NULL, in which case accessing cp->cp_conn would cause a
NULL pointer dereference.
[Simon Horman adds:]
Analysis:
* cp is a parameter of __rds_rdma_map and is not reassigned.
* The following call-sites pass a NULL cp argument to __rds_rdma_map()
- rds_get_mr()
- rds_get_mr_for_dest()
* Prior to the code above, the following assumes that cp may be NULL
(which is indicative, but could itself be unnecessary)
trans_private = rs->rs_transport->get_mr(
sg, nents, rs, &mr->r_key, cp ? cp->cp_conn : NULL,
args->vec.addr, args->vec.bytes,
need_odp ? ODP_ZEROBASED : ODP_NOT_NEEDED);
* The code modified by this patch is guarded by IS_ERR(trans_private),
where trans_private is assigned as per the previous point in this analysis.
The only implementation of get_mr that I could locate is rds_ib_get_mr()
which can return an ERR_PTR if the conn (4th) argument is NULL.
* ret is set to PTR_ERR(trans_private).
rds_ib_get_mr can return ERR_PTR(-ENODEV) if the conn (4th) argument is NULL.
Thus ret may be -ENODEV in which case the code in question will execute.
Conclusion:
* cp may be NULL at the point where this patch adds a check;
this patch does seem to address a possible bug
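The shape of such a guard can be sketched in userspace as follows. The
types and toy_handle_get_mr_error() are hypothetical stand-ins chosen
for illustration, and the counter merely marks where cp->cp_conn would
be used; this is not the RDS code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connection and connection path. */
struct toy_conn {
	int connect_requests;	/* counts uses of cp->cp_conn */
};

struct toy_conn_path {
	struct toy_conn *cp_conn;
};

/*
 * Error path after a failed get_mr(): only touch cp->cp_conn when a
 * connection path was actually passed in.
 */
void toy_handle_get_mr_error(int ret, struct toy_conn_path *cp)
{
	assert(ret < 0);	/* only called on the error path */

	if (ret == -ENODEV && cp != NULL)
		cp->cp_conn->connect_requests++;
}
```

Callers that pass a NULL cp simply skip the connection handling instead
of dereferencing the pointer.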
Fixes: c055fc00c0 ("net/rds: fix WARNING in rds_conn_connect_if_down")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Mahmoud Adam <mngyadam@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240326153132.55580-1-mngyadam@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
struct hci_dev members conn_info_max_age, conn_info_min_age,
le_conn_max_interval, le_conn_min_interval, le_adv_max_interval,
and le_adv_min_interval can be modified from the HCI core code, as well
as through debugfs.
The debugfs implementation, which is only available to privileged users,
checks the boundaries, making sure that the minimum value being set
does not exceed the maximum value that already exists, and vice versa.
However, as both minimum and maximum values can be changed concurrently
to us modifying them, we need to make sure that the value we check is
the value we end up using.
For example, with ->conn_info_max_age set to 10, conn_info_min_age_set()
gets called from vfs handlers to set conn_info_min_age to 8.
In conn_info_min_age_set(), this goes through:
if (val == 0 || val > hdev->conn_info_max_age)
return -EINVAL;
Concurrently, conn_info_max_age_set() gets called to set
conn_info_max_age to 7:
if (val == 0 || val < hdev->conn_info_min_age)
return -EINVAL;
That check will also pass because it still sees the old value of
conn_info_min_age (assuming it was not above 7), not the pending 8.
After those checks that both passed, the struct hci_dev access
is mutex-locked, disabling concurrent access, but that does not matter
because the invalid value checks both passed, and we'll end up with
conn_info_min_age = 8 and conn_info_max_age = 7
To fix this problem, we need to lock the structure access before so the
check and assignment are not interrupted.
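A minimal userspace sketch of doing the check and the assignment under
one lock is shown below. It uses a pthread mutex in place of the hdev
lock; toy_hdev, toy_hdev_init(), toy_min_age_set() and toy_max_age_set()
are hypothetical names, and the max-side check is assumed to mirror the
min-side one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields and their lock. */
struct toy_hdev {
	pthread_mutex_t lock;
	uint64_t conn_info_min_age;
	uint64_t conn_info_max_age;
};

void toy_hdev_init(struct toy_hdev *hdev, uint64_t min, uint64_t max)
{
	assert(hdev != NULL);

	pthread_mutex_init(&hdev->lock, NULL);
	hdev->conn_info_min_age = min;
	hdev->conn_info_max_age = max;
}

/* Check and assignment happen inside the same critical section. */
int toy_min_age_set(struct toy_hdev *hdev, uint64_t val)
{
	int ret = 0;

	pthread_mutex_lock(&hdev->lock);
	if (val == 0 || val > hdev->conn_info_max_age)
		ret = -EINVAL;
	else
		hdev->conn_info_min_age = val;
	pthread_mutex_unlock(&hdev->lock);

	return ret;
}

/* Assumed mirror image of the min setter. */
int toy_max_age_set(struct toy_hdev *hdev, uint64_t val)
{
	int ret = 0;

	pthread_mutex_lock(&hdev->lock);
	if (val == 0 || val < hdev->conn_info_min_age)
		ret = -EINVAL;
	else
		hdev->conn_info_max_age = val;
	pthread_mutex_unlock(&hdev->lock);

	return ret;
}
```

Because each setter validates against the value that is current while
it holds the lock, setting the minimum to 8 and then the maximum to 7
can no longer both succeed, whichever order the two calls run in.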
This fix was originally devised by the BassCheck[1] team, who
considered the problem to be an atomicity one. This isn't the case: the
concern is not that the variable changes while we check it, but that it
changes after we check it, in parallel with another change.
This patch fixes CVE-2024-24858 and CVE-2024-24857.
[1] https://sites.google.com/view/basscheck/
Co-developed-by: Gui-Dong Han <2045gemini@gmail.com>
Signed-off-by: Gui-Dong Han <2045gemini@gmail.com>
Link: https://lore.kernel.org/linux-bluetooth/20231222161317.6255-1-2045gemini@gmail.com/
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24858
Link: https://lore.kernel.org/linux-bluetooth/20231222162931.6553-1-2045gemini@gmail.com/
Link: https://lore.kernel.org/linux-bluetooth/20231222162310.6461-1-2045gemini@gmail.com/
Link: https://nvd.nist.gov/vuln/detail/CVE-2024-24857
Fixes: 31ad169148 ("Bluetooth: Add conn info lifetime parameters to debugfs")
Fixes: 729a1051da ("Bluetooth: Expose default LE advertising interval via debugfs")
Fixes: 71c3b60ec6 ("Bluetooth: Move BR/EDR debugfs file creation into hci_debugfs.c")
Signed-off-by: Bastien Nocera <hadess@hadess.net>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
We have a BT headset (Lenovo Thinkplus XT99). Pairing and connecting
work without problems. Once this headset is paired, bluez remembers
the device and automatically reconnects to it whenever the device
is powered on. The automatic reconnection works well with Windows and
Android, but with Linux it always fails. Through debugging, we found
that at the rfcomm connection stage the bluetooth stack reports
"Connection refused - security block (0x0003)".
For this device, the reconnection negotiation differs from other BT
headsets: it sends the Link_KEY_REQUEST command before the
CONNECT_REQUEST completes, and it doesn't send the ENCRYPT_CHANGE
command during the negotiation. When the device sends the "connect
complete" to hci, the ev->encr_mode is 1.
So here in the conn_complete_evt(), if ev->encr_mode is 1, link type
is ACL and HCI_CONN_ENCRYPT is not set, we set HCI_CONN_ENCRYPT to
this conn, and update conn->enc_key_size accordingly.
After this change, this BT headset can reconnect with Linux
successfully. The btmon log below was captured after applying the
patch: after receiving "Connect Complete" with "Encryption: Enabled",
the host sends the command to read the encryption key size:
> HCI Event: Connect Request (0x04) plen 10
Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
Class: 0x240404
Major class: Audio/Video (headset, speaker, stereo, video, vcr)
Minor class: Wearable Headset Device
Rendering (Printing, Speaker)
Audio (Speaker, Microphone, Headset)
Link type: ACL (0x01)
...
> HCI Event: Link Key Request (0x17) plen 6
Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
< HCI Command: Link Key Request Reply (0x01|0x000b) plen 22
Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
Link key: ${32-hex-digits-key}
...
> HCI Event: Connect Complete (0x03) plen 11
Status: Success (0x00)
Handle: 256
Address: 8C:3C:AA:D8:11:67 (OUI 8C-3C-AA)
Link type: ACL (0x01)
Encryption: Enabled (0x01)
< HCI Command: Read Encryption Key... (0x05|0x0008) plen 2
Handle: 256
< ACL Data TX: Handle 256 flags 0x00 dlen 10
L2CAP: Information Request (0x0a) ident 1 len 2
Type: Extended features supported (0x0002)
> HCI Event: Command Complete (0x0e) plen 7
Read Encryption Key Size (0x05|0x0008) ncmd 1
Status: Success (0x00)
Handle: 256
Key size: 16
Cc: stable@vger.kernel.org
Link: https://github.com/bluez/bluez/issues/704
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
hci_cmd_sync_cancel_sync shall check the error passed to it. The error
is propagated using req_result, which is __u32, so it needs to be
stored as a positive value even if it was passed as a negative one.
Otherwise IS_ERR will not trigger, as -(errno) would be converted to a
positive value.
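The sign handling can be sketched in userspace as follows. toy_req,
toy_cancel() and toy_result_to_errno() are hypothetical names, and the
consumer is assumed to negate the stored value to get back a negative
errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of an unsigned 32-bit result slot. */
struct toy_req {
	uint32_t req_result;
};

/* Store the error as a positive errno whatever sign was passed in. */
void toy_cancel(struct toy_req *req, int err)
{
	assert(req != NULL);

	req->req_result = (uint32_t)(err < 0 ? -err : err);
}

/* Assumed consumer: negate the stored value to get -errno back. */
int toy_result_to_errno(const struct toy_req *req)
{
	return -(int)req->req_result;
}
```

With the normalisation in toy_cancel(), callers passing either
-ETIMEDOUT or ETIMEDOUT end up with the same negative error on the
consumer side.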
Fixes: 63298d6e75 ("Bluetooth: hci_core: Cancel request on command timeout")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Reported-and-tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Closes: https://lore.kernel.org/all/08275279-7462-4f4a-a0ee-8aa015f829bc@leemhuis.info/
Some Bluetooth controllers lack persistent storage for the device
address and instead one can be provided by the boot firmware using the
'local-bd-address' devicetree property.
The Bluetooth devicetree bindings clearly state that the address should
be specified in little-endian order, but due to a long-standing bug in
the Qualcomm driver which reversed the address, some boot firmware has
been providing the address in big-endian order instead.
Add a new quirk that can be set on platforms with broken firmware and
use it to reverse the address when parsing the property so that the
underlying driver bug can be fixed.
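The quirk handling amounts to a conditional byte reversal of the 6-byte
address while parsing the property. A minimal sketch, assuming
hypothetical helpers toy_bdaddr_swap() and toy_parse_local_bd_address()
and a plain boolean in place of the real quirk flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BDADDR_LEN 6

/* Copy src into dst with the byte order reversed. */
void toy_bdaddr_swap(uint8_t dst[TOY_BDADDR_LEN],
		     const uint8_t src[TOY_BDADDR_LEN])
{
	for (int i = 0; i < TOY_BDADDR_LEN; i++)
		dst[i] = src[TOY_BDADDR_LEN - 1 - i];
}

/*
 * Take the address from the property as-is, or reversed when the
 * platform is flagged as providing it in the wrong byte order.
 */
void toy_parse_local_bd_address(uint8_t out[TOY_BDADDR_LEN],
				const uint8_t prop[TOY_BDADDR_LEN],
				bool address_broken)
{
	assert(out != NULL && prop != NULL);

	if (address_broken)
		toy_bdaddr_swap(out, prop);
	else
		memcpy(out, prop, TOY_BDADDR_LEN);
}
```

Keeping the reversal at the parsing step means the driver itself can
treat the address uniformly afterwards.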
Fixes: 5c0a1001c8 ("Bluetooth: hci_qca: Add helper to set device address")
Cc: stable@vger.kernel.org # 5.1
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
GRO has a fundamental issue with UDP tunnel packets as it can't detect
those in a foolproof way and GRO could happen before they reach the
tunnel endpoint. Previous commits have fixed issues when UDP tunnel
packets come from a remote host, but if those packets are issued locally
they could run into checksum issues.
If the inner packet has a partial checksum the information will be lost
in the GRO logic, either in udp4/6_gro_complete or in
udp_gro_complete_segment and packets will have an invalid checksum when
leaving the host.
Prevent local UDP tunnel packets from ever being GROed at the outer UDP
level.
Due to skb->encapsulation being wrongly used in some drivers this is
actually only preventing UDP tunnel packets with a partial checksum from
being GROed (see iptunnel_handle_offloads) but those were also the
packets triggering issues so in practice this should be sufficient.
Fixes: 9fd1ff5d2a ("udp: Support UDP fraglist GRO/GSO.")
Fixes: 36707061d6 ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP GRO validates checksums and in udp4/6_gro_complete fraglist packets
are converted to CHECKSUM_UNNECESSARY to avoid later checks. However
this is an issue for CHECKSUM_PARTIAL packets as they can be looped in
an egress path and then their partial checksums are not fixed.
Different issues can be observed, from invalid checksum on packets to
traces like:
gen01: hw csum failure
skb len=3008 headroom=160 headlen=1376 tailroom=0
mac=(106,14) net=(120,40) trans=160
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xffff232e ip_summed=2 complete_sw=0 valid=0 level=0)
hash(0x77e3d716 sw=1 l4=1) proto=0x86dd pkttype=0 iif=12
...
Fix this by only converting CHECKSUM_NONE packets to
CHECKSUM_UNNECESSARY by reusing __skb_incr_checksum_unnecessary. All
other checksum types are kept as-is, including CHECKSUM_COMPLETE as
fraglist packets being segmented back would have their skb->csum valid.
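The resulting rule for the checksum type can be written down as a tiny
mapping. This sketch tracks only the checksum type with a hypothetical
toy_csum enum and toy_fraglist_csum_after_gro() helper; it does not
model csum_level or any other skb state.

```c
#include <assert.h>

/* Hypothetical mirror of the four checksum states. */
enum toy_csum {
	TOY_CHECKSUM_NONE,
	TOY_CHECKSUM_UNNECESSARY,
	TOY_CHECKSUM_COMPLETE,
	TOY_CHECKSUM_PARTIAL,
};

/* Only NONE is upgraded; every other type is kept as-is. */
enum toy_csum toy_fraglist_csum_after_gro(enum toy_csum ip_summed)
{
	assert(ip_summed <= TOY_CHECKSUM_PARTIAL);

	if (ip_summed == TOY_CHECKSUM_NONE)
		return TOY_CHECKSUM_UNNECESSARY;
	return ip_summed;
}
```

In particular a partial-checksum packet keeps its type, so its checksum
can still be completed on the egress path.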
Fixes: 9fd1ff5d2a ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If packets are GROed with fraglist they might be segmented later on and
continue their journey in the stack. In skb_segment_list those skbs can
be reused as-is. This is an issue as their destructor was removed in
skb_gro_receive_list but not the reference to their socket, and then
they can't be orphaned. Fix this by also removing the reference to the
socket.
For example, this could be observed:
kernel BUG at include/linux/skbuff.h:3131! (skb_orphan)
RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
Call Trace:
ipv6_list_rcv+0x250/0x3f0
__netif_receive_skb_list_core+0x49d/0x8f0
netif_receive_skb_list_internal+0x634/0xd40
napi_complete_done+0x1d2/0x7d0
gro_cell_poll+0x118/0x1f0
A similar construction is found in skb_gro_receive; apply the same
change there.
Fixes: 5e10da5385 ("skbuff: allow 'slow_gro' for skb carring sock reference")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rx-udp-gro-forwarding is enabled UDP packets might be GROed when
being forwarded. If such packets might land in a tunnel this can cause
various issues and udp_gro_receive makes sure this isn't the case by
looking for a matching socket. This is performed in
udp4/6_gro_lookup_skb but only in the current netns. This is an issue
with tunneled packets when the endpoint is in another netns. In such
cases the packets will be GROed at the UDP level, which leads to various
issues later on. The same thing can happen with rx-gro-list.
We saw this with geneve packets being GROed at the UDP level. In such a
case gso_size is set; later the packet goes through the geneve rx path,
the geneve header is pulled, the offsets are adjusted and frag_list skbs
are not adjusted with regard to geneve. When those skbs hit
skb_fragment, it will misbehave. Different outcomes are possible
depending on what the GROed skbs look like; from corrupted packets to
kernel crashes.
One example is a BUG_ON[1] triggered in skb_segment while processing the
frag_list. Because gso_size is wrong (geneve header was pulled)
skb_segment thinks there is "geneve header size" of data in frag_list,
although it's in fact the next packet. The BUG_ON itself has nothing to
do with the issue. This is only one of the potential issues.
Looking up a matching socket in udp_gro_receive is fragile: the
lookup could be extended to all netns (not speaking about performance)
but nothing prevents those packets from being modified in between and we
could still not find a matching socket. It's OK to keep the current
logic there as it should cover most cases but we also need to make sure
we handle tunnel packets being GROed too early.
This is done by extending the checks in udp_unexpected_gso: GSO packets
lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
be segmented.
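The extended condition can be sketched as a predicate. The flag values,
toy_skb and toy_must_segment() below are hypothetical and only model
the rule stated above; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the tunnel GSO types. */
#define TOY_GSO_UDP_TUNNEL	(1u << 0)
#define TOY_GSO_UDP_TUNNEL_CSUM	(1u << 1)

/* Hypothetical, minimal view of a received packet. */
struct toy_skb {
	bool is_gso;
	unsigned int gso_type;
};

/*
 * A GSO packet that reaches a tunnel socket without any tunnel GSO
 * bit was aggregated too early and has to be segmented again.
 */
bool toy_must_segment(const struct toy_skb *skb, bool sk_is_tunnel)
{
	const unsigned int tunnel_bits =
		TOY_GSO_UDP_TUNNEL | TOY_GSO_UDP_TUNNEL_CSUM;

	assert(skb != NULL);

	return skb->is_gso && sk_is_tunnel &&
	       !(skb->gso_type & tunnel_bits);
}
```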
[1] kernel BUG at net/core/skbuff.c:4408!
RIP: 0010:skb_segment+0xd2a/0xf70
__udp_gso_segment+0xaa/0x560
Fixes: 9fd1ff5d2a ("udp: Support UDP fraglist GRO/GSO.")
Fixes: 36707061d6 ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up till now only a single character ('A' or 'B') was used to provide
information about the HSR slave network device status.
As it is also possible and valid that an Interlink network device may
be supported as well, the description must be more verbose. As a result,
the full string description is now used.
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.
This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.
Eric Dumazet made an initial analysis of this bug. Quoting Eric:
Calling ip_defrag() in output path is also implying skb_orphan(),
which is buggy because output path relies on sk not disappearing.
A relevant old patch about the issue was :
8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")
[..]
net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
inet socket, not an arbitrary one.
If we orphan the packet in ipvlan, then downstream things like FQ
packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, ie whenever frag_list is going to be used.
Eric suggested stashing sk in the fragment queue and made an initial
patch. However there is a problem with this:
If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and set up the destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accounting to reflect the
fully reassembled skb, else wmem will underflow.
This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.
This allows delaying the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.
In the former case, things work as before, skb is orphaned. This is
safe because skb gets queued/stolen and won't continue past reasm engine.
In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accounting when inet_frag inflates truesize.
Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
Patch #1 reject destroy chain command to delete device hooks in netdev
family, hence, only delchain commands are allowed.
Patch #2 reject table flag update interference with netdev basechain
hook updates, this can leave hooks in inconsistent
registration/unregistration state.
Patch #3 do not unregister netdev basechain hooks if table is dormant.
Otherwise, splat with double unregistration is possible.
Patch #4 fixes Kconfig to allow restoring IP_NF_ARPTABLES,
from Kuniyuki Iwashima.
There are more fixes still in progress on my side that need more work.
* tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
netfilter: nf_tables: skip netdev hook unregistration if table is dormant
netfilter: nf_tables: reject table flag and netdev basechain updates
netfilter: nf_tables: reject destroy command to remove basechain hooks
====================
Link: https://lore.kernel.org/r/20240328031855.2063-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Skip hook unregistration when adding or deleting devices from an
existing netdev basechain. Otherwise, the commit/abort path tries to
unregister hooks which are not enabled.
Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netdev basechain updates are stored in the transaction object hook list.
When the table dormant flag is set, the code iterates over the existing
hooks in the basechain, thus skipping the hooks that are being
added/deleted in this transaction, which leaves hook registration in
an inconsistent state.
Reject table flag updates in combination with netdev basechain updates
in the same batch:
- Update table flags and add/delete basechain: Check from basechain update
path if there are pending flag updates for this table.
- add/delete basechain and update table flags: Iterate over the transaction
list to search for basechain updates from the table update path.
In both cases, the batch is rejected. Based on suggestion from Florian Westphal.
Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Report EOPNOTSUPP if NFT_MSG_DESTROYCHAIN is used to delete hooks in an
existing netdev basechain, thus, only NFT_MSG_DELCHAIN is allowed.
Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Merge tag 'wireless-2024-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.9-rc2
The first fixes for v6.9. Ping-Ke Shih now maintains a separate tree
for Realtek drivers, document that in MAINTAINERS. Plenty of fixes
for both the stack and iwlwifi. Our kunit tests were working only on um
architecture but that's fixed now.
* tag 'wireless-2024-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (21 commits)
MAINTAINERS: wifi: mwifiex: add Francesco as reviewer
kunit: fix wireless test dependencies
wifi: iwlwifi: mvm: include link ID when releasing frames
wifi: iwlwifi: mvm: handle debugfs names more carefully
wifi: iwlwifi: mvm: guard against invalid STA ID on removal
wifi: iwlwifi: read txq->read_ptr under lock
wifi: iwlwifi: fw: don't always use FW dump trig
wifi: iwlwifi: mvm: rfi: fix potential response leaks
wifi: mac80211: correctly set active links upon TTLM
wifi: iwlwifi: mvm: Configure the link mapping for non-MLD FW
wifi: iwlwifi: mvm: consider having one active link
wifi: iwlwifi: mvm: pick the version of SESSION_PROTECTION_NOTIF
wifi: mac80211: fix prep_connection error path
wifi: cfg80211: fix rdev_dump_mpp() arguments order
wifi: iwlwifi: mvm: disable MLO for the time being
wifi: cfg80211: add a flag to disable wireless extensions
wifi: mac80211: fix ieee80211_bss_*_flags kernel-doc
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
wifi: mac80211: fix mlme_link_id_dbg()
MAINTAINERS: wifi: add git tree for Realtek WiFi drivers
...
====================
Link: https://lore.kernel.org/r/20240327191346.1A1EAC433C7@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
At the start of tls_sw_recvmsg, we take a reference on the psock, and
then call tls_rx_reader_lock. If that fails, we return directly
without releasing the reference.
Instead of adding a new label, just take the reference after locking
has succeeded, since we don't need it before.
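The reordering described above can be sketched in plain user-space C. This is only an illustrative model, not the kernel code: demo_ctx, demo_psock, demo_reader_lock() and demo_recvmsg() are hypothetical stand-ins for the TLS context, the psock, tls_rx_reader_lock() and tls_sw_recvmsg().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the psock and the TLS receive context. */
struct demo_psock {
	int refcnt;
};

struct demo_ctx {
	bool reader_busy;
	struct demo_psock psock;
};

/* Fails when another reader already holds the lock. */
int demo_reader_lock(struct demo_ctx *ctx)
{
	if (ctx->reader_busy)
		return -EAGAIN;
	ctx->reader_busy = true;
	return 0;
}

void demo_reader_unlock(struct demo_ctx *ctx)
{
	ctx->reader_busy = false;
}

int demo_recvmsg(struct demo_ctx *ctx)
{
	int err;

	err = demo_reader_lock(ctx);
	if (err < 0)
		return err;	/* no reference taken yet, nothing to undo */

	ctx->psock.refcnt++;	/* take the reference only once locked */
	/* ... the receive path would run here ... */
	ctx->psock.refcnt--;	/* drop the reference on the way out */
	demo_reader_unlock(ctx);
	return 0;
}
```

Because the reference is taken after the only early return, the error path needs no extra cleanup label.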
Fixes: 4cbc325ed6 ("tls: rx: allow only one reader at a time")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.1711120964.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
process_rx_list may not copy as many bytes as we want to the userspace
buffer, for example in case we hit an EFAULT during the copy. If this
happens, we should only count the bytes that were actually copied,
which may be 0.
Subtracting async_copy_bytes is correct in both peek and !peek cases,
because decrypted == async_copy_bytes + peeked for the peek case: peek
is always !ZC, and we can go through either the sync or async path. In
the async case, we add chunk to both decrypted and
async_copy_bytes. In the sync case, we add chunk to both decrypted and
peeked. I missed that in commit 6caaf10442 ("tls: fix peeking with
sync+async decryption").
Fixes: 4d42cd6bc2 ("tls: rx: fix return value for async crypto")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.1711120964.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Only MSG_PEEK needs to copy from an offset during the final
process_rx_list call, because the bytes we copied at the beginning of
tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed
data from the rx_list as we were copying it, so there's no need to use
an offset, just like in the normal case.
Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.1711120964.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We had various syzbot reports about tcp timers firing after
the corresponding netns has been dismantled.
Fortunately Josef Bacik could trigger the issue more often,
and could test a patch I wrote two years ago.
When TCP sockets are closed, we call inet_csk_clear_xmit_timers()
to 'stop' the timers.
inet_csk_clear_xmit_timers() can be called from any context,
including when socket lock is held.
This is the reason it uses sk_stop_timer(), aka del_timer().
This means that ongoing timers might finish much later.
For user sockets, this is fine because each running timer
holds a reference on the socket, and the user socket holds
a reference on the netns.
For kernel sockets, we risk that the netns is freed before
the timer can complete, because kernel sockets do not hold
a reference on the netns.
This patch adds the inet_csk_clear_xmit_timers_sync() function,
which uses sk_stop_timer_sync() to make sure all timers
are terminated before the kernel socket is released.
Modules using kernel sockets close them in their netns exit()
handler.
Also add a sock_not_owned_by_me() helper to get LOCKDEP
support: inet_csk_clear_xmit_timers_sync() must not be called
while the socket lock is held.
It is quite possible that we can, in the future, revert commit
3a58f13a88 ("net: rds: acquire refcount on TCP sockets")
which attempted to solve the issue in rds only.
(net/smc/af_smc.c and net/mptcp/subflow.c have similar code)
We probably can remove the check_net() tests from
tcp_out_of_resources() and __tcp_close() in the future.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Closes: https://lore.kernel.org/netdev/20240314210740.GA2823176@perftesting/
Fixes: 26abe14379 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Link: https://lore.kernel.org/bpf/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Josef Bacik <josef@toxicpanda.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20240322135732.1535772-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit e748d0fd66 ("net: hsr: Disable promiscuous mode in
offload mode") disables promiscuous mode on slave devices
while creating an HSR interface. Deleting the HSR interface,
however, does not account for this: it still decrements the
promiscuous mode count, which eventually enables promiscuous
mode on the slave devices when an HSR interface is created again.
Fix this by not decrementing the promiscuous mode count while
deleting the HSR interface when offload is enabled.
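The symmetry the fix restores can be modelled with a simple counter. This is a minimal sketch under assumptions: demo_slave, demo_hsr_add_slave() and demo_hsr_del_slave() are hypothetical names, and the real driver code manipulates the count through the netdev API rather than directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a slave device's promiscuity reference count. */
struct demo_slave {
	int promiscuity;
};

/* In offload mode the count is left alone when the slave is added. */
void demo_hsr_add_slave(struct demo_slave *s, bool offload)
{
	if (!offload)
		s->promiscuity++;
}

/* The removal path mirrors the add path: no decrement in offload mode. */
void demo_hsr_del_slave(struct demo_slave *s, bool offload)
{
	if (!offload)
		s->promiscuity--;
}
```

With both paths guarded by the same condition, an add followed by a delete always leaves the count where it started, whichever mode is in use.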
Fixes: e748d0fd66 ("net: hsr: Disable promiscuous mode in offload mode")
Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20240322100447.27615-1-r-gunasekaran@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
sk->sk_rcvbuf in __sock_queue_rcv_skb() and __sk_receive_skb() can be
changed by other threads. Mark this as benign using READ_ONCE().
This patch is aimed at reducing the number of benign races reported by
KCSAN in order to focus future debugging effort on harmful races.
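As a rough user-space approximation of the annotation, the sketch below forces a single volatile load of the field. DEMO_READ_ONCE is a simplified stand-in for the kernel's READ_ONCE() (it relies on the GCC/Clang __typeof__ extension), and demo_sock and demo_rcvbuf_full() are hypothetical names, not the kernel's structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for READ_ONCE(): one volatile load of x. */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical model of the two socket fields involved. */
struct demo_sock {
	int rmem_alloc;	/* bytes currently queued */
	int rcvbuf;	/* limit, may be changed by another thread */
};

/* True when the receive queue has reached the buffer limit. */
bool demo_rcvbuf_full(struct demo_sock *sk)
{
	return sk->rmem_alloc >= DEMO_READ_ONCE(sk->rcvbuf);
}
```

The comparison itself is unchanged; the annotation only documents that the concurrent update is expected and keeps the compiler from re-reading or tearing the load.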
Signed-off-by: linke li <lilinke99@qq.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ieee80211_ttlm_set_links() to not set all active links,
but instead let the driver know that valid links status changed
and select the active links properly.
Fixes: 8f500fbc6c ("wifi: mac80211: process and save negotiated TID to Link mapping request")
Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240318184907.acddbbf39584.Ide858f95248fcb3e483c97fcaa14b0cd4e964b10@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If prep_channel fails in prep_connection, the code releases
the deflink's chanctx, which is wrong since we may be using
a different link. Releasing it unconditionally is wrong
anyway, since we might still have the station. Remove it
only if prep_channel succeeded and later updates fail.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20240318184907.2780c1f08c3d.I033c9b15483933088f32a2c0789612a33dd33d82@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fix the order of arguments in the TP_ARGS macro
for the rdev_dump_mpp tracepoint event.
Found by Linux Verification Center (linuxtesting.org).
Signed-off-by: Igor Artemiev <Igor.A.Artemiev@mcst.ru>
Link: https://msgid.link/20240311164519.118398-1-Igor.A.Artemiev@mcst.ru
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Wireless extensions are already disabled if MLO is enabled,
given that we cannot support MLO there with all the
hard-coded assumptions about BSSID etc.
However, the WiFi7 ecosystem is still stabilizing, and some
devices may need MLO disabled while that happens. In that
case, we might end up with a device that supports wext (but
not MLO) in one kernel, and then breaks wext in the future
(by enabling MLO), which is not desirable.
Add a flag to let such drivers/devices disable wext even if
MLO isn't yet enabled.
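The resulting policy amounts to a two-flag check, sketched below. The flag names and demo_wext_available() are hypothetical and only illustrate the rule stated above; they are not the cfg80211 definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the wiphy flags. */
#define DEMO_FLAG_SUPPORTS_MLO	(1u << 0)
#define DEMO_FLAG_DISABLE_WEXT	(1u << 1)

/* wext is offered only if neither MLO nor the opt-out flag is set. */
bool demo_wext_available(unsigned int flags)
{
	return !(flags & (DEMO_FLAG_SUPPORTS_MLO | DEMO_FLAG_DISABLE_WEXT));
}
```

A driver that sets the opt-out flag today gets the same wext behaviour it will have once it enables MLO, so user space never sees wext appear and later vanish.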
Cc: stable@vger.kernel.org
Link: https://msgid.link/20240314110951.b50f1dc4ec21.I656ddd8178eedb49dc5c6c0e70f8ce5807afb54f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Running kernel-doc on ieee80211_i.h flagged the following:
net/mac80211/ieee80211_i.h:145: warning: expecting prototype for enum ieee80211_corrupt_data_flags. Prototype was for enum ieee80211_bss_corrupt_data_flags instead
net/mac80211/ieee80211_i.h:162: warning: expecting prototype for enum ieee80211_valid_data_flags. Prototype was for enum ieee80211_bss_valid_data_flags instead
Fix these warnings.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://msgid.link/20240314-kdoc-ieee80211_i-v1-1-72b91b55b257@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When moving a station out of a VLAN and deleting the VLAN afterwards, the
fast_rx entry still holds a pointer to the VLAN's netdev, which can cause
use-after-free bugs. Fix this by immediately calling ieee80211_check_fast_rx
after the VLAN change.
Cc: stable@vger.kernel.org
Reported-by: ranygh@riseup.net
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://msgid.link/20240316074336.40442-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make sure that the new mlme_link_id_dbg() macro honours
CONFIG_MAC80211_MLME_DEBUG as intended to avoid spamming the log with
messages like:
wlan0: no EHT support, limiting to HE
wlan0: determined local STA to be HE, BW limited to 160 MHz
wlan0: determined AP xx:xx:xx:xx:xx:xx to be VHT
wlan0: connecting with VHT mode, max bandwidth 160 MHz
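The general shape of a debug macro that honours a build-time switch can be sketched as follows. DEMO_MLME_DEBUG, demo_link_dbg() and demo_log() are hypothetical names used for illustration; the real macro is built on mac80211's own debug helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how many debug messages were actually emitted. */
int demo_dbg_count;

void demo_log(const char *msg)
{
	demo_dbg_count++;
	fprintf(stderr, "wlan0: %s\n", msg);
}

/* The macro expands to a real call only when the switch is defined. */
#ifdef DEMO_MLME_DEBUG
#define demo_link_dbg(msg) demo_log(msg)
#else
#define demo_link_dbg(msg) do { } while (0)
#endif

void demo_connect(void)
{
	demo_link_dbg("no EHT support, limiting to HE");
	demo_link_dbg("connecting with VHT mode, max bandwidth 160 MHz");
}
```

Built without -DDEMO_MLME_DEBUG, demo_connect() logs nothing; built with it, both messages are printed.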
Fixes: 310c8387c6 ("wifi: mac80211: clean up connection process")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://msgid.link/20240325085948.26203-1-johan+linaro@kernel.org
Tested-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cited commit started returning an error when user space requests to dump
the interface's IPv6 addresses and IPv6 is disabled on the interface.
Restore the previous behavior and do not return an error.
Before cited commit:
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff
inet6 fe80::1852:2ff:fe5a:c26e/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 1000
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 1a:52:02:5a:c2:6e brd ff:ff:ff:ff:ff:ff
After cited commit:
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 1e:9b:94:00:ac:e8 brd ff:ff:ff:ff:ff:ff
inet6 fe80::1c9b:94ff:fe00:ace8/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 1000
# ip address show dev dummy1
RTNETLINK answers: No such device
Dump terminated
With this patch:
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff
inet6 fe80::4035:fcff:fe53:66cf/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
# ip link set dev dummy1 mtu 1000
# ip address show dev dummy1
2: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1000 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 42:35:fc:53:66:cf brd ff:ff:ff:ff:ff:ff
Fixes: 9cc4cc329d ("ipv6: use xa_array iterator to implement inet6_dump_addr()")
Reported-by: Gal Pressman <gal@nvidia.com>
Closes: https://lore.kernel.org/netdev/7e261328-42eb-411d-b1b4-ad884eeaae4d@linux.dev/
Tested-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240321173042.2151756-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "*hw_stats_used" value needs to be set on the success paths to prevent
an uninitialized variable bug in the caller, nla_put_nh_group_stats().
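The pattern behind this class of bug can be sketched generically: a callee with an out-parameter must write it on every path that returns success. The functions below are hypothetical and do not mirror the nexthop code; they only show a caller that trusts the out-parameter after a zero return.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical callee: returns 0 on success and must then have
 * written *hw_stats_used, including on the early-return path.
 */
int demo_hw_stats_update(bool have_hw_stats, bool hw_reported,
			 bool *hw_stats_used)
{
	if (!have_hw_stats) {
		*hw_stats_used = false;	/* success path with nothing to query */
		return 0;
	}

	*hw_stats_used = hw_reported;
	return 0;
}

/* Hypothetical caller: reads the flag whenever the callee succeeded. */
int demo_put_stats(bool have_hw_stats, bool hw_reported)
{
	bool used;	/* not initialized here; the callee must set it */

	if (demo_hw_stats_update(have_hw_stats, hw_reported, &used))
		return -1;

	return used ? 1 : 0;
}
```

Setting the value in the callee keeps the contract in one place instead of relying on each caller to pre-initialize its local variable.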
Fixes: 5072ae00ae ("net: nexthop: Expose nexthop group HW stats to user space")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/f08ac289-d57f-4a1a-830f-cf9a0563cb9c@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot reported the following uninit-value access issue [1][2]:
nci_rx_work() parses and processes received packets. When the payload
length is zero, each message type handler reads an uninitialized payload
and KMSAN detects this issue. The receipt of a packet with a zero-size
payload is considered unexpected, and therefore, such packets should be
silently discarded.
This patch resolves the issue by checking the payload size before calling
each message type handler.
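A minimal user-space sketch of that check follows. It assumes a 3-byte NCI header with the message type in the top three bits of the first octet and the payload length in the third octet; demo_rx() and the verdict names are hypothetical, and the real handlers operate on skbs rather than raw buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HDR_SIZE 3	/* assumed header size */

enum demo_verdict { DEMO_DROP, DEMO_DATA, DEMO_RSP, DEMO_NTF };

/* Decide what to do with a packet before any handler touches the payload. */
enum demo_verdict demo_rx(const uint8_t *pkt, size_t len)
{
	if (len < DEMO_HDR_SIZE)
		return DEMO_DROP;	/* too short to hold a header */

	if (pkt[2] == 0)
		return DEMO_DROP;	/* zero-length payload: discard silently */

	switch (pkt[0] >> 5) {		/* message type */
	case 0:
		return DEMO_DATA;
	case 2:
		return DEMO_RSP;
	case 3:
		return DEMO_NTF;
	default:
		return DEMO_DROP;	/* unexpected type for the receive path */
	}
}
```

Doing the length check once, ahead of the dispatch, means no individual handler has to repeat it.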
Fixes: 6a2968aaf5 ("NFC: basic NCI protocol implementation")
Reported-and-tested-by: syzbot+7ea9413ea6749baf5574@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+29b5ca705d2e0f4a44d2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=7ea9413ea6749baf5574 [1]
Closes: https://syzkaller.appspot.com/bug?extid=29b5ca705d2e0f4a44d2 [2]
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
Reviewed-by: Jeremy Cline <jeremy@jcline.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current release - regressions:
- rxrpc: fix use of page_frag_alloc_align(), it changed semantics
and we added a new caller in a different subtree
- xfrm: allow UDP encapsulation only in offload modes
Current release - new code bugs:
- tcp: fix refcnt handling in __inet_hash_connect()
- Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp
packets", conflicted with some expectations in BPF uAPI
Previous releases - regressions:
- ipv4: raw: fix sending packets from raw sockets via IPsec tunnels
- devlink: fix devlink's parallel command processing
- veth: do not manipulate GRO when using XDP
- esp: fix bad handling of pages from page_pool
Previous releases - always broken:
- report RCU QS for busy network kthreads (with Paul McK's blessing)
- tcp/rds: fix use-after-free on netns with kernel TCP reqsk
- virt: vmxnet3: fix missing reserved tailroom with XDP
Misc:
- couple of build fixes for Documentation
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from CAN, netfilter, wireguard and IPsec.
I'd like to highlight [ lowlight? - Linus ] Florian W stepping down as
a netfilter maintainer due to a constant stream of bug reports. Not sure
what we can do but IIUC this is not the first such case.
Current release - regressions:
- rxrpc: fix use of page_frag_alloc_align(), it changed semantics and
we added a new caller in a different subtree
- xfrm: allow UDP encapsulation only in offload modes
Current release - new code bugs:
- tcp: fix refcnt handling in __inet_hash_connect()
- Revert "net: Re-use and set mono_delivery_time bit for userspace
tstamp packets", conflicted with some expectations in BPF uAPI
Previous releases - regressions:
- ipv4: raw: fix sending packets from raw sockets via IPsec tunnels
- devlink: fix devlink's parallel command processing
- veth: do not manipulate GRO when using XDP
- esp: fix bad handling of pages from page_pool
Previous releases - always broken:
- report RCU QS for busy network kthreads (with Paul McK's blessing)
- tcp/rds: fix use-after-free on netns with kernel TCP reqsk
- virt: vmxnet3: fix missing reserved tailroom with XDP
Misc:
- couple of build fixes for Documentation"
* tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
selftests: forwarding: Fix ping failure due to short timeout
MAINTAINERS: step down as netfilter maintainer
netfilter: nf_tables: Fix a memory leak in nf_tables_updchain
net: dsa: mt7530: fix handling of all link-local frames
net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports
bpf: report RCU QS in cpumap kthread
net: report RCU QS on threaded NAPI repolling
rcu: add a helper to report consolidated flavor QS
ionic: update documentation for XDP support
lib/bitmap: Fix bitmap_scatter() and bitmap_gather() kernel doc
netfilter: nf_tables: do not compare internal table flags on updates
netfilter: nft_set_pipapo: release elements in clone only from destroy path
octeontx2-af: Use separate handlers for interrupts
octeontx2-pf: Send UP messages to VF only when VF is up.
octeontx2-pf: Use default max_active works instead of one
octeontx2-pf: Wait till detach_resources msg is complete
octeontx2: Detect the mbox up or down message via register
devlink: fix port new reply cmd type
tcp: Clear req->syncookie in reqsk_alloc().
net/bnx2x: Prevent access to a freed page in page_pool
...
- Generate a list of built DTB files (arch/*/boot/dts/dtbs-list)
- Use more threads when building Debian packages in parallel
- Fix warnings shown during the RPM kernel package uninstallation
- Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to
Makefile
- Support GCC's -fmin-function-alignment flag
- Fix a null pointer dereference bug in modpost
- Add the DTB support to the RPM package
- Various fixes and cleanups in Kconfig
Merge tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Generate a list of built DTB files (arch/*/boot/dts/dtbs-list)
- Use more threads when building Debian packages in parallel
- Fix warnings shown during the RPM kernel package uninstallation
- Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to
Makefile
- Support GCC's -fmin-function-alignment flag
- Fix a null pointer dereference bug in modpost
- Add the DTB support to the RPM package
- Various fixes and cleanups in Kconfig
* tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits)
kconfig: tests: test dependency after shuffling choices
kconfig: tests: add a test for randconfig with dependent choices
kconfig: tests: support KCONFIG_SEED for the randconfig runner
kbuild: rpm-pkg: add dtb files in kernel rpm
kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig()
kconfig: check prompt for choice while parsing
kconfig: lxdialog: remove unused dialog colors
kconfig: lxdialog: fix button color for blackbg theme
modpost: fix null pointer dereference
kbuild: remove GCC's default -Wpacked-bitfield-compat flag
kbuild: unexport abs_srctree and abs_objtree
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
kconfig: remove named choice support
kconfig: use linked list in get_symbol_str() to iterate over menus
kconfig: link menus to a symbol
kbuild: fix inconsistent indentation in top Makefile
kbuild: Use -fmin-function-alignment when available
alpha: merge two entries for CONFIG_ALPHA_GAMMA
alpha: merge two entries for CONFIG_ALPHA_EV4
kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj)
...
Merge tag 'nf-24-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net. There is a
larger batch of fixes still pending that will follow up asap, this is
what I deemed to be more urgent at this time:
1) Use clone view in pipapo set backend to release elements from destroy
path, otherwise it is possible to destroy elements twice.
2) Incorrect check for internal table flags leads to bogus transaction
objects.
3) Fix counters memleak in netdev basechain update error path,
from Quan Tian.
netfilter pull request 24-03-21
* tag 'nf-24-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Fix a memory leak in nf_tables_updchain
netfilter: nf_tables: do not compare internal table flags on updates
netfilter: nft_set_pipapo: release elements in clone only from destroy path
====================
Link: https://lore.kernel.org/r/20240321112117.36737-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If nft_netdev_register_hooks() fails, the memory associated with
nft_stats is not freed, causing a memory leak.
This patch fixes it by moving nft_stats_alloc() down after
nft_netdev_register_hooks() succeeds.
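The ordering can be illustrated with a small user-space model in which the allocation happens only after the fallible step. All names here (demo_register_hooks(), demo_update_chain(), demo_live_stats) are hypothetical and the sketch omits the rest of the chain update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Number of stats buffers currently allocated in this model. */
int demo_live_stats;

/* Stand-in for the hook registration step, which may fail. */
int demo_register_hooks(bool fail)
{
	return fail ? -1 : 0;
}

int demo_update_chain(bool hooks_fail)
{
	void *stats;

	if (demo_register_hooks(hooks_fail) < 0)
		return -1;	/* nothing allocated yet, so nothing leaks */

	stats = malloc(64);	/* allocate only after registration worked */
	if (!stats)
		return -1;	/* real code would also undo the registration */
	demo_live_stats++;

	/* A real update would hand 'stats' to the chain; the demo frees it. */
	free(stats);
	demo_live_stats--;
	return 0;
}
```

Moving the allocation below the failure point removes the need for a dedicated cleanup branch on that error path.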
Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Signed-off-by: Quan Tian <tianquan23@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
NAPI threads can keep polling packets under load. Currently the thread only
calls cond_resched() before repolling, but that is not sufficient to
clear out the holdout of RCU tasks, which prevents BPF tracing programs
from detaching for a long period. This can be reproduced easily with the
following setup:
ip netns add test1
ip netns add test2
ip -n test1 link add veth1 type veth peer name veth2 netns test2
ip -n test1 link set veth1 up
ip -n test1 link set lo up
ip -n test2 link set veth2 up
ip -n test2 link set lo up
ip -n test1 addr add 192.168.1.2/31 dev veth1
ip -n test1 addr add 1.1.1.1/32 dev lo
ip -n test2 addr add 192.168.1.3/31 dev veth2
ip -n test2 addr add 2.2.2.2/31 dev lo
ip -n test1 route add default via 192.168.1.3
ip -n test2 route add default via 192.168.1.2
for i in `seq 10 210`; do
for j in `seq 10 210`; do
ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
done
done
ip netns exec test2 ethtool -K veth2 gro on
ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
ip netns exec test1 ethtool -K veth1 tso off
Then running an iperf3 client/server and a bpftrace script can trigger it:
ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'
Reporting RCU quiescent states periodically resolves the issue.
Fixes: 29863d41bb ("net: implement threaded-able napi poll loop support")
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/4c3b0d3f32d3b18949d75b18e5e1d9f13a24f025.1710877680.git.yan@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Restore skipping transaction if table update does not modify flags.
Fixes: 179d9ba555 ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>