Commit Graph

1170785 Commits

Author SHA1 Message Date
Jesper Dangaard Brouer
0f26b74e7d selftests/bpf: Adjust bpf_xdp_metadata_rx_hash for new arg
Update BPF selftests to use the new RSS type argument for kfunc
bpf_xdp_metadata_rx_hash.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132894068.340624.8914711185697163690.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer
9123397aee mlx4: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via matching individual Completion Queue Entry (CQE) status bits.

Fixes: ab46182d0d ("net/mlx4_en: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893562.340624.12779118462402031248.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:11 -07:00
Jesper Dangaard Brouer
96b1a098f3 veth: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type.

The veth driver currently only support XDP-hints based on SKB code path.
The SKB have lost information about the RSS hash type, by compressing
the information down to a single bitfield skb->l4_hash, that only knows
if this was a L4 hash value.

In preparation for veth, the xdp_rss_hash_type have an L4 indication
bit that allow us to return a meaningful L4 indication when working
with SKB based packets.

Fixes: 306531f024 ("veth: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132893055.340624.16209448340644513469.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer
67f245c2ec mlx5: bpf_xdp_metadata_rx_hash add xdp rss hash type
Update API for bpf_xdp_metadata_rx_hash() with arg for xdp rss hash type
via mapping table.

The mlx5 hardware can also identify and RSS hash IPSEC.  This indicate
hash includes SPI (Security Parameters Index) as part of IPSEC hash.

Extend xdp core enum xdp_rss_hash_type with IPSEC hash type.

Fixes: bc8d405b1b ("net/mlx5e: Support RX XDP metadata")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892548.340624.11185734579430124869.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer
0cd917a4a8 xdp: rss hash types representation
The RSS hash type specifies what portion of packet data NIC hardware used
when calculating RSS hash value. The RSS types are focused on Internet
traffic protocols at OSI layers L3 and L4. L2 (e.g. ARP) often get hash
value zero and no RSS type. For L3 focused on IPv4 vs. IPv6, and L4
primarily TCP vs UDP, but some hardware supports SCTP.

Hardware RSS types are differently encoded for each hardware NIC. Most
hardware represent RSS hash type as a number. Determining L3 vs L4 often
requires a mapping table as there often isn't a pattern or sorting
according to ISO layer.

The patch introduce a XDP RSS hash type (enum xdp_rss_hash_type) that
contains both BITs for the L3/L4 types, and combinations to be used by
drivers for their mapping tables. The enum xdp_rss_type_bits get exposed
to BPF via BTF, and it is up to the BPF-programmer to match using these
defines.

This proposal change the kfunc API bpf_xdp_metadata_rx_hash() adding
a pointer value argument for provide the RSS hash type.
Change signature for all xmo_rx_hash calls in drivers to make it compile.

The RSS type implementations for each driver comes as separate patches.

Fixes: 3d76a4d3d4 ("bpf: XDP metadata RX kfuncs")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132892042.340624.582563003880565460.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Jesper Dangaard Brouer
e8163b98d9 selftests/bpf: xdp_hw_metadata remove bpf_printk and add counters
The tool xdp_hw_metadata can be used by driver developers
implementing XDP-hints metadata kfuncs.

Remove all bpf_printk calls, as the tool already transfers all the
XDP-hints related information via metadata area to AF_XDP
userspace process.

Add counters for providing remaining information about failure and
skipped packet events.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/168132891533.340624.7313781245316405141.stgit@firesoul
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-13 11:15:10 -07:00
Liang Chen
0646dc31ca skbuff: Fix a race between coalescing and releasing SKBs
Commit 1effe8ca4e ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and page
pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 toward the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors. So it should be prohibited.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
 <IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Fixes: 53e0961da1 ("page_pool: add frag page recycling support in page pool")
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230413090353.14448-1-liangchen.linux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 10:08:42 -07:00
Roman Gushchin
e8b7445355 net: macb: fix a memory corruption in extended buffer descriptor mode
For quite some time we were chasing a bug which looked like a sudden
permanent failure of networking and mmc on some of our devices.
The bug was very sensitive to any software changes and even more to
any kernel debug options.

Finally we got a setup where the problem was reproducible with
CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma:

[   16.992082] ------------[ cut here ]------------
[   16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes]
[   17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900
[   17.018977] Modules linked in: xxxxx
[   17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28
[   17.045345] Hardware name: xxxxx
[   17.049528] pstate: 60000005 (nZCv daif -PAN -UAO)
[   17.054322] pc : check_unmap+0x6a0/0x900
[   17.058243] lr : check_unmap+0x6a0/0x900
[   17.062163] sp : ffffffc010003c40
[   17.065470] x29: ffffffc010003c40 x28: 000000004000c03c
[   17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800
[   17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8
[   17.081407] x23: 0000000000000000 x22: ffffffc010a08750
[   17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000
[   17.092032] x19: 0000000875e3e244 x18: 0000000000000010
[   17.097343] x17: 0000000000000000 x16: 0000000000000000
[   17.102647] x15: ffffff8879e4a988 x14: 0720072007200720
[   17.107959] x13: 0720072007200720 x12: 0720072007200720
[   17.113261] x11: 0720072007200720 x10: 0720072007200720
[   17.118565] x9 : 0720072007200720 x8 : 000000000000022d
[   17.123869] x7 : 0000000000000015 x6 : 0000000000000098
[   17.129173] x5 : 0000000000000000 x4 : 0000000000000000
[   17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370
[   17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000
[   17.145082] Call trace:
[   17.147524]  check_unmap+0x6a0/0x900
[   17.151091]  debug_dma_unmap_page+0x88/0x90
[   17.155266]  gem_rx+0x114/0x2f0
[   17.158396]  macb_poll+0x58/0x100
[   17.161705]  net_rx_action+0x118/0x400
[   17.165445]  __do_softirq+0x138/0x36c
[   17.169100]  irq_exit+0x98/0xc0
[   17.172234]  __handle_domain_irq+0x64/0xc0
[   17.176320]  gic_handle_irq+0x5c/0xc0
[   17.179974]  el1_irq+0xb8/0x140
[   17.183109]  xiic_process+0x5c/0xe30
[   17.186677]  irq_thread_fn+0x28/0x90
[   17.190244]  irq_thread+0x208/0x2a0
[   17.193724]  kthread+0x130/0x140
[   17.196945]  ret_from_fork+0x10/0x20
[   17.200510] ---[ end trace 7240980785f81d6f ]---

[  237.021490] ------------[ cut here ]------------
[  237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b
[  237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240
[  237.041802] Modules linked in: xxxxx
[  237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W         5.4.0 #28
[  237.068941] Hardware name: xxxxx
[  237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO)
[  237.077900] pc : add_dma_entry+0x214/0x240
[  237.081986] lr : add_dma_entry+0x214/0x240
[  237.086072] sp : ffffffc010003c30
[  237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00
[  237.094683] x27: 0000000000000180 x26: ffffff8878e387c0
[  237.099987] x25: 0000000000000002 x24: 0000000000000000
[  237.105290] x23: 000000000000003b x22: ffffffc010a0fa00
[  237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600
[  237.115897] x19: 00000000ffffffef x18: 0000000000000010
[  237.121201] x17: 0000000000000000 x16: 0000000000000000
[  237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720
[  237.131807] x13: 0720072007200720 x12: 0720072007200720
[  237.137111] x11: 0720072007200720 x10: 0720072007200720
[  237.142415] x9 : 0720072007200720 x8 : 0000000000000259
[  237.147718] x7 : 0000000000000001 x6 : 0000000000000000
[  237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001
[  237.158325] x3 : 0000000000000006 x2 : 0000000000000007
[  237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000
[  237.168932] Call trace:
[  237.171373]  add_dma_entry+0x214/0x240
[  237.175115]  debug_dma_map_page+0xf8/0x120
[  237.179203]  gem_rx_refill+0x190/0x280
[  237.182942]  gem_rx+0x224/0x2f0
[  237.186075]  macb_poll+0x58/0x100
[  237.189384]  net_rx_action+0x118/0x400
[  237.193125]  __do_softirq+0x138/0x36c
[  237.196780]  irq_exit+0x98/0xc0
[  237.199914]  __handle_domain_irq+0x64/0xc0
[  237.204000]  gic_handle_irq+0x5c/0xc0
[  237.207654]  el1_irq+0xb8/0x140
[  237.210789]  arch_cpu_idle+0x40/0x200
[  237.214444]  default_idle_call+0x18/0x30
[  237.218359]  do_idle+0x200/0x280
[  237.221578]  cpu_startup_entry+0x20/0x30
[  237.225493]  rest_init+0xe4/0xf0
[  237.228713]  arch_call_rest_init+0xc/0x14
[  237.232714]  start_kernel+0x47c/0x4a8
[  237.236367] ---[ end trace 7240980785f81d70 ]---

Lars was fast to find an explanation: according to the datasheet
bit 2 of the rx buffer descriptor entry has a different meaning in the
extended mode:
  Address [2] of beginning of buffer, or
  in extended buffer descriptor mode (DMA configuration register [28] = 1),
  indicates a valid timestamp in the buffer descriptor entry.

The macb driver didn't mask this bit while getting an address and it
eventually caused a memory corruption and a dma failure.

The problem is resolved by explicitly clearing the problematic bit
if hw timestamping is used.

Fixes: 7b42961480 ("net: macb: Add support for PTP timestamps in DMA descriptors")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Co-developed-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230412232144.770336-1-roman.gushchin@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 10:05:07 -07:00
Xin Long
3a0385be13 selftests: add the missing CONFIG_IP_SCTP in net config
The selftest sctp_vrf needs CONFIG_IP_SCTP set in config
when building the kernel, so add it.

Fixes: a61bd7b9fe ("selftests: add a selftest for sctp vrf")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://lore.kernel.org/r/61dddebc4d2dd98fe7fb145e24d4b2430e42b572.1681312386.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 10:04:55 -07:00
Eric Dumazet
1c5950fc6f udp6: fix potential access to stale information
lena wang reported an issue caused by udpv6_sendmsg()
mangling msg->msg_name and msg->msg_namelen, which
are later read from ____sys_sendmsg() :

	/*
	 * If this is sendmmsg() and sending to current destination address was
	 * successful, remember it.
	 */
	if (used_address && err >= 0) {
		used_address->name_len = msg_sys->msg_namelen;
		if (msg_sys->msg_name)
			memcpy(&used_address->name, msg_sys->msg_name,
			       used_address->name_len);
	}

udpv6_sendmsg() wants to pretend the remote address family
is AF_INET in order to call udp_sendmsg().

A fix would be to modify the address in-place, instead
of using a local variable, but this could have other side effects.

Instead, restore initial values before we return from udpv6_sendmsg().

Fixes: c71d8ebe7a ("net: Fix security_socket_sendmsg() bypass problem.")
Reported-by: lena wang <lena.wang@mediatek.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230412130308.1202254-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 10:04:37 -07:00
Aaron Conole
306dc21361 selftests: openvswitch: adjust datapath NL message declaration
The netlink message for creating a new datapath takes an array
of ports for the PID creation.  This shouldn't cause much issue
but correct it for future cases where we need to do decode of
datapath information that could include the per-cpu PID map.

Fixes: 25f16c873f ("selftests: add openvswitch selftest suite")
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20230412115828.3991806-1-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 10:01:23 -07:00
Jakub Kicinski
ecfcc6fbeb Merge branch 'mptcp-more-fixes-for-6-3'
Matthieu Baerts says:

====================
mptcp: more fixes for 6.3

Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.

Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.

Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.

Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
====================

Link: https://lore.kernel.org/r/20230411-upstream-net-20230411-mptcp-fixes-v1-0-ca540f3ef986@tessares.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 09:59:01 -07:00
Matthieu Baerts
711ae788cb selftests: mptcp: userspace pm: uniform verify events
Simply adding a "sleep" before checking something is usually not a good
idea because the time that has been picked can not be enough or too
much. The best is to wait for events with a timeout.

In this selftest, 'sleep 0.5' is used more than 40 times. It is always
used before calling a 'verify_*' function except for this
verify_listener_events which has been added later.

At the end, using all these 'sleep 0.5' seems to work: the slow CIs
don't complain so far. Also because it doesn't take too much time, we
can just add two more 'sleep 0.5' to uniform what is done before calling
a 'verify_*' function. For the same reasons, we can also delay a bigger
refactoring to replace all these 'sleep 0.5' by functions waiting for
events instead of waiting for a fix time and hope for the best.

Fixes: 6c73008aa3 ("selftests: mptcp: listener test for userspace PM")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 09:58:55 -07:00
Paolo Abeni
c0ff6f6da6 mptcp: fix NULL pointer dereference on fastopen early fallback
In case of early fallback to TCP, subflow_syn_recv_sock() deletes
the subflow context before returning the newly allocated sock to
the caller.

The fastopen path does not cope with the above unconditionally
dereferencing the subflow context.

Fixes: 36b122baf6 ("mptcp: add subflow_v(4,6)_send_synack()")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 09:58:55 -07:00
Paolo Abeni
d6a0443733 mptcp: stricter state check in mptcp_worker
As reported by Christoph, the mptcp protocol can run the
worker when the relevant msk socket is in an unexpected state:

connect()
// incoming reset + fastclose
// the mptcp worker is scheduled
mptcp_disconnect()
// msk is now CLOSED
listen()
mptcp_worker()

Leading to the following splat:

divide error: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21 Comm: kworker/1:0 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:__tcp_select_window+0x22c/0x4b0 net/ipv4/tcp_output.c:3018
RSP: 0018:ffffc900000b3c98 EFLAGS: 00010293
RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8214ce97 RDI: 0000000000000004
RBP: 000000000000ffd7 R08: 0000000000000004 R09: 0000000000010000
R10: 000000000000ffd7 R11: ffff888005afa148 R12: 000000000000ffd7
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000405270 CR3: 000000003011e006 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tcp_select_window net/ipv4/tcp_output.c:262 [inline]
 __tcp_transmit_skb+0x356/0x1280 net/ipv4/tcp_output.c:1345
 tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
 tcp_send_active_reset+0x13e/0x320 net/ipv4/tcp_output.c:3459
 mptcp_check_fastclose net/mptcp/protocol.c:2530 [inline]
 mptcp_worker+0x6c7/0x800 net/mptcp/protocol.c:2705
 process_one_work+0x3bd/0x950 kernel/workqueue.c:2390
 worker_thread+0x5b/0x610 kernel/workqueue.c:2537
 kthread+0x138/0x170 kernel/kthread.c:376
 ret_from_fork+0x2c/0x50 arch/x86/entry/entry_64.S:308
 </TASK>

This change addresses the issue explicitly checking for bad states
before running the mptcp worker.

Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/374
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 09:58:55 -07:00
Paolo Abeni
a5cb752b12 mptcp: use mptcp_schedule_work instead of open-coding it
Beyond reducing code duplication this also avoids scheduling
the mptcp_worker on a closed socket on some edge scenarios.

The addressed issue is actually older than the blamed commit
below, but this fix needs it as a pre-requisite.

Fixes: ba8f48f7a4 ("mptcp: introduce mptcp_schedule_work")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-13 09:58:55 -07:00
Vladimir Oltean
5b7be2d4fd net: enetc: workaround for unresponsive pMAC after receiving express traffic
I have observed an issue where the RX direction of the LS1028A ENETC pMAC
seems unresponsive. The minimal procedure to reproduce the issue is:

1. Connect ENETC port 0 with a loopback RJ45 cable to one of the Felix
   switch ports (0).

2. Bring the ports up (MAC Merge layer is not enabled on either end).

3. Send a large quantity of unidirectional (express) traffic from Felix
   to ENETC. I tried altering frame size and frame count, and it doesn't
   appear to be specific to either of them, but rather, to the quantity
   of octets received. Lowering the frame count, the minimum quantity of
   packets to reproduce relatively consistently seems to be around 37000
   frames at 1514 octets (w/o FCS) each.

4. Using ethtool --set-mm, enable the pMAC in the Felix and in the ENETC
   ports, in both RX and TX directions, and with verification on both
   ends.

5. Wait for verification to complete on both sides.

6. Configure a traffic class as preemptible on both ends.

7. Send some packets again.

The issue is at step 5, where the verification process of ENETC ends
(meaning that Felix responds with an SMD-R and ENETC sees the response),
but the verification process of Felix never ends (it remains VERIFYING).

If step 3 is skipped or if ENETC receives less traffic than
approximately that threshold, the test runs all the way through
(verification succeeds on both ends, preemptible traffic passes fine).

If, between step 4 and 5, the step below is also introduced:

4.1. Disable and re-enable PM0_COMMAND_CONFIG bit RX_EN

then again, the sequence of steps runs all the way through, and
verification succeeds, even if there was the previous RX traffic
injected into ENETC.

Traffic sent *by* the ENETC port prior to enabling the MAC Merge layer
does not seem to influence the verification result, only received
traffic does.

The LS1028A manual does not mention any relationship between
PM0_COMMAND_CONFIG and MMCSR, and the hardware people don't seem to
know for now either.

The bit that is toggled to work around the issue is also toggled
by enetc_mac_enable(), called from phylink's mac_link_down() and
mac_link_up() methods - which is how the workaround was found:
verification would work after a link down/up.

Fixes: c7b9e80869 ("net: enetc: add support for MAC Merge layer")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20230411192645.1896048-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-13 12:42:54 +02:00
Xin Long
32832a2caf sctp: fix a potential overflow in sctp_ifwdtsn_skip
Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only
checks the pos against the end of the chunk. However, the data left for
the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference
it as struct sctp_ifwdtsn_skip may cause coverflow.

This patch fixes it by checking the pos against "the end of the chunk -
sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to
sctp_fwdtsn_skip.

Fixes: 0fc2ea922c ("sctp: implement validate_ftsn for sctp_stream_interleave")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.1681155810.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-13 10:01:59 +02:00
Ziyang Xuan
6417070918 net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
Syzbot reported a bug as following:

=====================================================
BUG: KMSAN: uninit-value in qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
 qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230
 qrtr_endpoint_post+0xf85/0x11b0 net/qrtr/af_qrtr.c:519
 qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
 call_write_iter include/linux/fs.h:2189 [inline]
 aio_write+0x63a/0x950 fs/aio.c:1600
 io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
 __do_sys_io_submit fs/aio.c:2078 [inline]
 __se_sys_io_submit+0x293/0x770 fs/aio.c:2048
 __x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:766 [inline]
 slab_alloc_node mm/slub.c:3452 [inline]
 __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
 __do_kmalloc_node mm/slab_common.c:967 [inline]
 __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
 kmalloc_reserve net/core/skbuff.c:492 [inline]
 __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
 __netdev_alloc_skb+0x120/0x7d0 net/core/skbuff.c:630
 qrtr_endpoint_post+0xbd/0x11b0 net/qrtr/af_qrtr.c:446
 qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108
 call_write_iter include/linux/fs.h:2189 [inline]
 aio_write+0x63a/0x950 fs/aio.c:1600
 io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019
 __do_sys_io_submit fs/aio.c:2078 [inline]
 __se_sys_io_submit+0x293/0x770 fs/aio.c:2048
 __x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

It is because that skb->len requires at least sizeof(struct qrtr_ctrl_pkt)
in qrtr_tx_resume(). And skb->len equals to size in qrtr_endpoint_post().
But size is less than sizeof(struct qrtr_ctrl_pkt) when qrtr_cb->type
equals to QRTR_TYPE_RESUME_TX in qrtr_endpoint_post() under the syzbot
scenario. This triggers the uninit variable access bug.

Add size check when qrtr_cb->type equals to QRTR_TYPE_RESUME_TX in
qrtr_endpoint_post() to fix the bug.

Fixes: 5fdeb0d372 ("net: qrtr: Implement outgoing flow control")
Reported-by: syzbot+4436c9630a45820fda76@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=c14607f0963d27d5a3d5f4c8639b500909e43540
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230410012352.3997823-1-william.xuanziyang@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-13 09:35:30 +02:00
Martin Willi
59d3efd27c rtnetlink: Restore RTM_NEW/DELLINK notification behavior
The commits referenced below allows userspace to use the NLM_F_ECHO flag
for RTM_NEW/DELLINK operations to receive unicast notifications for the
affected link. Prior to these changes, applications may have relied on
multicast notifications to learn the same information without specifying
the NLM_F_ECHO flag.

For such applications, the mentioned commits changed the behavior for
requests not using NLM_F_ECHO. Multicast notifications are still received,
but now use the portid of the requester and the sequence number of the
request instead of zero values used previously. For the application, this
message may be unexpected and likely handled as a response to the
NLM_F_ACKed request, especially if it uses the same socket to handle
requests and notifications.

To fix existing applications relying on the old notification behavior,
set the portid and sequence number in the notification only if the
request included the NLM_F_ECHO flag. This restores the old behavior
for applications not using it, but allows unicasted notifications for
others.

Fixes: f3a63cce1b ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_delete_link")
Fixes: d88e136cab ("rtnetlink: Honour NLM_F_ECHO flag in rtnl_newlink_create")
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230411074319.24133-1-martin@strongswan.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-12 20:47:58 -07:00
Rob Herring
136f36c74b net: ti/cpsw: Add explicit platform_device.h and of_platform.h includes
TI CPSW uses of_platform_* functions which are declared in of_platform.h.
of_platform.h gets implicitly included by of_device.h, but that is going
to be removed soon. Nothing else depends on of_device.h so it can be
dropped. of_platform.h also implicitly includes platform_device.h, so
add an explicit include for it, too.

Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 10:54:12 +01:00
Harshit Mogalapalli
a56ef25619 net: wwan: iosm: Fix error handling path in ipc_pcie_probe()
Smatch reports:
	drivers/net/wwan/iosm/iosm_ipc_pcie.c:298 ipc_pcie_probe()
	warn: missing unwind goto?

When dma_set_mask fails it directly returns without disabling pci
device and freeing ipc_pcie. Fix this my calling a correct goto label

As dma_set_mask returns either 0 or -EIO, we can use a goto label, as
it finally returns -EIO.

Add a set_mask_fail goto label which stands consistent with other goto
labels in this function..

Fixes: 035e3befc1 ("net: wwan: iosm: fix driver not working with INTEL_IOMMU disabled")
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 10:45:32 +01:00
Kuniyuki Iwashima
9744d2bf19 smc: Fix use-after-free in tcp_write_timer_handler().
With Eric's ref tracker, syzbot finally found a repro for
use-after-free in tcp_write_timer_handler() by kernel TCP
sockets. [0]

If SMC creates a kernel socket in __smc_create(), the kernel
socket is supposed to be freed in smc_clcsock_release() by
calling sock_release() when we close() the parent SMC socket.

However, at the end of smc_clcsock_release(), the kernel
socket's sk_state might not be TCP_CLOSE.  This means that
we have not called inet_csk_destroy_sock() in __tcp_close()
and have not stopped the TCP timers.

The kernel socket's TCP timers can be fired later, so we
need to hold a refcnt for net as we do for MPTCP subflows
in mptcp_subflow_create_socket().

[0]:
leaked reference.
 sk_alloc (./include/net/net_namespace.h:335 net/core/sock.c:2108)
 inet_create (net/ipv4/af_inet.c:319 net/ipv4/af_inet.c:244)
 __sock_create (net/socket.c:1546)
 smc_create (net/smc/af_smc.c:3269 net/smc/af_smc.c:3284)
 __sock_create (net/socket.c:1546)
 __sys_socket (net/socket.c:1634 net/socket.c:1618 net/socket.c:1661)
 __x64_sys_socket (net/socket.c:1672)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
==================================================================
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
Read of size 1 at addr ffff888052b65e0d by task syzrepro/18091

CPU: 0 PID: 18091 Comm: syzrepro Tainted: G        W          6.3.0-rc4-01174-gb5d54eb5899a #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl (lib/dump_stack.c:107)
 print_report (mm/kasan/report.c:320 mm/kasan/report.c:430)
 kasan_report (mm/kasan/report.c:538)
 tcp_write_timer_handler (net/ipv4/tcp_timer.c:378 net/ipv4/tcp_timer.c:624 net/ipv4/tcp_timer.c:594)
 tcp_write_timer (./include/linux/spinlock.h:390 net/ipv4/tcp_timer.c:643)
 call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701)
 __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2022)
 run_timer_softirq (kernel/time/timer.c:2037)
 __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572)
 __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650)
 irq_exit_rcu (kernel/softirq.c:664)
 sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14))
 </IRQ>

Fixes: ac7138746e ("smc: establish new socket family")
Reported-by: syzbot+7e1e1bdb852961150198@syzkaller.appspotmail.com
Link: https://lore.kernel.org/netdev/000000000000a3f51805f8bcc43a@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 10:37:00 +01:00
Denis Plotnikov
7573099e10 qlcnic: check pci_reset_function result
Static code analyzer complains to unchecked return value.
The result of pci_reset_function() is unchecked.
Despite, the issue is on the FLR supported code path and in that
case reset can be done with pcie_flr(), the patch uses less invasive
approach by adding the result check of pci_reset_function().

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: 7e2cf4feba ("qlcnic: change driver hardware interface mechanism")
Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-12 08:32:38 +01:00
Jakub Kicinski
adacf21f1c Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
iavf: fix racing in VLANs

Ahmed Zaki says:

This patchset mainly fixes a racing issue in the iavf where the number of
VLANs in the vlan_filter_list might be more than the PF limit. To fix that,
we get rid of the cvlans and svlans bitmaps and keep all the required info
in the list.

The second patch adds two new states that are needed so that we keep the
VLAN info while the interface goes DOWN:
    -- DISABLE    (notify PF, but keep the filter in the list)
    -- INACTIVE   (dev is DOWN, filter is removed from PF)

Finally, the current code keeps each state in a separate bit field, which
is error prone. The first patch refactors that by replacing all bits with
a single enum. The changes are minimal where each bit change is replaced
with the new state value.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: remove active_cvlans and active_svlans bitmaps
  iavf: refactor VLAN filter states
====================

Link: https://lore.kernel.org/r/20230407210730.3046149-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11 21:37:53 -07:00
Jakub Kicinski
160c13175e bluetooth pull request for net:
- Fix not setting Dath Path for broadcast sink
  - Fix not cleaning up on LE Connection failure
  - SCO: Fix possible circular locking dependency
  - L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
  - Fix race condition in hidp_session_thread
  - btbcm: Fix logic error in forming the board name
  - btbcm: Fix use after free in btsdio_remove
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmQ0RqEZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKQrlEACfcXOFd5OtB2uLT+TTXkWk
 QS8D4qfQfKkwVxNSBrvg/pSm1PXvmed6OdboQgcrosKppbOydYmXbPy+4k3jzQIG
 2fCkgdUd8TUCGp4CiIlOHIZ3iPANa78It1c986L0of3uJ9lZaoZDNioodpFO+xT9
 6yrjS7blIU8GpkAGPKLSj7Im59CpkPE2tbh6HGmUAkslngsQb/GicqL1do1XWMX+
 8gtc4J1kHDearzm8ecHwX6csdocFuBfSeFCPIshwPwqeNzGkt43gQeBuylTKa1Ep
 HbyUekazaLJq2dlVcjMVDQ/ISlYSXannBF2v+Wx7ETdF+DsmidApZxaLDEdwVjok
 NGxsChfVum5C1bmeiltnc93UW/l8GZYjlf3fEb0xirMmSbZGZ6oR6ul0Q0y6VGa8
 S1nw42K+p7Gys/s0fejkmXZAZCTvfA0TJErocKlVPDwigCzUGUIaMNGIdRQT47u5
 h3f0aW4qvPkdszmlWvuknXWSqLoOVB97L+fNUCA31sSH/dG83KwnBzQAMcn6ZcWC
 EPO4WDVToZWxaMdZ0MSBaGXu4j/dD3KT7wz39FTUBNyUiW0bM3DxOPSI4vncyrkl
 2uaThul45R+iD3n0Q2eFh5T1fcoSZ/GBMc4w2P+y7V7E2w/wxCyFH7+fnEsbrD0T
 7JUlngpbtwbaQu4i4XLRGQ==
 =Ag/P
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - Fix not setting Dath Path for broadcast sink
 - Fix not cleaning up on LE Connection failure
 - SCO: Fix possible circular locking dependency
 - L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
 - Fix race condition in hidp_session_thread
 - btbcm: Fix logic error in forming the board name
 - btbcm: Fix use after free in btsdio_remove

* tag 'for-net-2023-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
  Bluetooth: Set ISO Data Path on broadcast sink
  Bluetooth: hci_conn: Fix possible UAF
  Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
  Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
  bluetooth: btbcm: Fix logic error in forming the board name.
  Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
  Bluetooth: Fix race condition in hidp_session_thread
  Bluetooth: Fix printing errors if LE Connection times out
  Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
====================

Link: https://lore.kernel.org/r/20230410172718.4067798-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-11 21:18:23 -07:00
Radu Pirea (OSS)
a4506722dc net: phy: nxp-c45-tja11xx: add remove callback
Unregister PTP clock when the driver is removed.
Purge the RX and TX skb queues.

Fixes: 514def5dd3 ("phy: nxp-c45-tja11xx: add timestamping support")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Radu Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230406095904.75456-1-radu-nicolae.pirea@oss.nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-11 15:28:31 +02:00
Luiz Augusto von Dentz
a2a9339e1c Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Similar to commit d0be8347c6 ("Bluetooth: L2CAP: Fix use-after-free
caused by l2cap_chan_put"), just use l2cap_chan_hold_unless_zero to
prevent referencing a channel that is about to be destroyed.

Cc: stable@kernel.org
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Min Li <lm0963hack@gmail.com>
2023-04-10 10:24:32 -07:00
Claudia Draghicescu
d2e4f1b1cb Bluetooth: Set ISO Data Path on broadcast sink
This patch enables ISO data rx on broadcast sink.

Fixes: eca0ae4aea ("Bluetooth: Add initial implementation of BIS connections")
Signed-off-by: Claudia Draghicescu <claudia.rosu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:24:16 -07:00
Luiz Augusto von Dentz
5dc7d23e16 Bluetooth: hci_conn: Fix possible UAF
This fixes the following trace:

==================================================================
BUG: KASAN: slab-use-after-free in hci_conn_del+0xba/0x3a0
Write of size 8 at addr ffff88800208e9c8 by task iso-tester/31

CPU: 0 PID: 31 Comm: iso-tester Not tainted 6.3.0-rc2-g991aa4a69a47
 #4716
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc36
04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x1d/0x70
 print_report+0xce/0x610
 ? __virt_addr_valid+0xd4/0x150
 ? hci_conn_del+0xba/0x3a0
 kasan_report+0xdd/0x110
 ? hci_conn_del+0xba/0x3a0
 hci_conn_del+0xba/0x3a0
 hci_conn_hash_flush+0xf2/0x120
 hci_dev_close_sync+0x388/0x920
 hci_unregister_dev+0x122/0x260
 vhci_release+0x4f/0x90
 __fput+0x102/0x430
 task_work_run+0xf1/0x160
 ? __pfx_task_work_run+0x10/0x10
 ? mark_held_locks+0x24/0x90
 exit_to_user_mode_prepare+0x170/0x180
 syscall_exit_to_user_mode+0x19/0x50
 do_syscall_64+0x4e/0x90
 entry_SYSCALL_64_after_hwframe+0x70/0xda

Fixes: 0f00cd322d ("Bluetooth: Free potentially unfreed SCO connection")
Link: https://syzkaller.appspot.com/bug?extid=8bb72f86fc823817bc5d
Cc: <stable@vger.kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:24:00 -07:00
Luiz Augusto von Dentz
975abc0c90 Bluetooth: SCO: Fix possible circular locking dependency sco_sock_getsockopt
This attempts to fix the following trace:

======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g68fcb3a7bf97 #4706 Not tainted
------------------------------------------------------
sco-tester/31 is trying to acquire lock:
ffff8880025b8070 (&hdev->lock){+.+.}-{3:3}, at:
sco_sock_getsockopt+0x1fc/0xa90

but task is already holding lock:
ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_sock_getsockopt+0x104/0xa90

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
       lock_sock_nested+0x32/0x80
       sco_connect_cfm+0x118/0x4a0
       hci_sync_conn_complete_evt+0x1e6/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #1 (hci_cb_list_lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       hci_sync_conn_complete_evt+0x1ad/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #0 (&hdev->lock){+.+.}-{3:3}:
       __lock_acquire+0x18cc/0x3740
       lock_acquire+0x151/0x3a0
       __mutex_lock+0x13b/0xcc0
       sco_sock_getsockopt+0x1fc/0xa90
       __sys_getsockopt+0xe9/0x190
       __x64_sys_getsockopt+0x5b/0x70
       do_syscall_64+0x42/0x90
       entry_SYSCALL_64_after_hwframe+0x70/0xda

other info that might help us debug this:

Chain exists of:
  &hdev->lock --> hci_cb_list_lock --> sk_lock-AF_BLUETOOTH-BTPROTO_SCO

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
                               lock(hci_cb_list_lock);
                               lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);
  lock(&hdev->lock);

 *** DEADLOCK ***

1 lock held by sco-tester/31:
 #0: ffff888001eeb130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0},
 at: sco_sock_getsockopt+0x104/0xa90

Fixes: 248733e87d ("Bluetooth: Allow querying of supported offload codecs over SCO socket")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:23:45 -07:00
Luiz Augusto von Dentz
9a8ec9e8eb Bluetooth: SCO: Fix possible circular locking dependency on sco_connect_cfm
This attempts to fix the following trace:

======================================================
WARNING: possible circular locking dependency detected
6.3.0-rc2-g0b93eeba4454 #4703 Not tainted
------------------------------------------------------
kworker/u3:0/46 is trying to acquire lock:
ffff888001fd9130 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}, at:
sco_connect_cfm+0x118/0x4a0

but task is already holding lock:
ffffffff831e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
hci_sync_conn_complete_evt+0x1ad/0x3d0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (hci_cb_list_lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       hci_sync_conn_complete_evt+0x1ad/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

-> #1 (&hdev->lock){+.+.}-{3:3}:
       __mutex_lock+0x13b/0xcc0
       sco_sock_connect+0xfc/0x630
       __sys_connect+0x197/0x1b0
       __x64_sys_connect+0x37/0x50
       do_syscall_64+0x42/0x90
       entry_SYSCALL_64_after_hwframe+0x70/0xda

-> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_SCO){+.+.}-{0:0}:
       __lock_acquire+0x18cc/0x3740
       lock_acquire+0x151/0x3a0
       lock_sock_nested+0x32/0x80
       sco_connect_cfm+0x118/0x4a0
       hci_sync_conn_complete_evt+0x1e6/0x3d0
       hci_event_packet+0x55c/0x7c0
       hci_rx_work+0x34c/0xa00
       process_one_work+0x575/0x910
       worker_thread+0x89/0x6f0
       kthread+0x14e/0x180
       ret_from_fork+0x2b/0x50

other info that might help us debug this:

Chain exists of:
  sk_lock-AF_BLUETOOTH-BTPROTO_SCO --> &hdev->lock --> hci_cb_list_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hci_cb_list_lock);
                               lock(&hdev->lock);
                               lock(hci_cb_list_lock);
  lock(sk_lock-AF_BLUETOOTH-BTPROTO_SCO);

 *** DEADLOCK ***

4 locks held by kworker/u3:0/46:
 #0: ffff8880028d1130 ((wq_completion)hci0#2){+.+.}-{0:0}, at:
 process_one_work+0x4c0/0x910
 #1: ffff8880013dfde0 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0},
 at: process_one_work+0x4c0/0x910
 #2: ffff8880025d8070 (&hdev->lock){+.+.}-{3:3}, at:
 hci_sync_conn_complete_evt+0xa6/0x3d0
 #3: ffffffffb79e3340 (hci_cb_list_lock){+.+.}-{3:3}, at:
 hci_sync_conn_complete_evt+0x1ad/0x3d0

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:23:30 -07:00
Sasha Finkelstein
b76abe4648 bluetooth: btbcm: Fix logic error in forming the board name.
This patch fixes an incorrect loop exit condition in code that replaces
'/' symbols in the board name. There might also be a memory corruption
issue here, but it is unlikely to be a real problem.

Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Finkelstein <fnkl.kernel@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:23:15 -07:00
Zheng Wang
73f7b171b7 Bluetooth: btsdio: fix use after free bug in btsdio_remove due to race condition
In btsdio_probe, the data->work is bound with btsdio_work. It will be
started in btsdio_send_frame.

If the btsdio_remove runs with a unfinished work, there may be a race
condition that hdev is freed but used in btsdio_work. Fix it by
canceling the work before do cleanup in btsdio_remove.

Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:23:01 -07:00
Min Li
c95930abd6 Bluetooth: Fix race condition in hidp_session_thread
There is a potential race condition in hidp_session_thread that may
lead to use-after-free. For instance, the timer is active while
hidp_del_timer is called in hidp_session_thread(). After hidp_session_put,
then 'session' will be freed, causing kernel panic when hidp_idle_timeout
is running.

The solution is to use del_timer_sync instead of del_timer.

Here is the call trace:

? hidp_session_probe+0x780/0x780
call_timer_fn+0x2d/0x1e0
__run_timers.part.0+0x569/0x940
hidp_session_probe+0x780/0x780
call_timer_fn+0x1e0/0x1e0
ktime_get+0x5c/0xf0
lapic_next_deadline+0x2c/0x40
clockevents_program_event+0x205/0x320
run_timer_softirq+0xa9/0x1b0
__do_softirq+0x1b9/0x641
__irq_exit_rcu+0xdc/0x190
irq_exit_rcu+0xe/0x20
sysvec_apic_timer_interrupt+0xa1/0xc0

Cc: stable@vger.kernel.org
Signed-off-by: Min Li <lm0963hack@gmail.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:22:46 -07:00
Luiz Augusto von Dentz
b62e72200e Bluetooth: Fix printing errors if LE Connection times out
This fixes errors like bellow when LE Connection times out since that
is actually not a controller error:

 Bluetooth: hci0: Opcode 0x200d failed: -110
 Bluetooth: hci0: request failed to create LE connection: err -110

Instead the code shall properly detect if -ETIMEDOUT is returned and
send HCI_OP_LE_CREATE_CONN_CANCEL to give up on the connection.

Link: https://github.com/bluez/bluez/issues/340
Fixes: 8e8b92ee60 ("Bluetooth: hci_sync: Add hci_le_create_conn_sync")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:21:33 -07:00
Luiz Augusto von Dentz
19cf60bf63 Bluetooth: hci_conn: Fix not cleaning up on LE Connection failure
hci_connect_le_scan_cleanup shall always be invoked to cleanup the
states and re-enable passive scanning if necessary, otherwise it may
cause the pending action to stay active causing multiple attempts to
connect.

Fixes: 9b3628d79b ("Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2023-04-10 10:21:33 -07:00
David S. Miller
9a06fe08ae Merge branch 'sfp-eeprom'
Ivan Bornyakov says:

====================
net: fix EEPROM read of absent SFP module

The patchset is to improve EEPROM read requests when SFP module is
absent.

ChangeLog:
v1:
https://lore.kernel.org/netdev/20230405153900.747-1-i.bornyakov@metrotek.ru/
v2:
  * reword commit message of "net: sfp: initialize sfp->i2c_block_size
    at sfp allocation"
  * add second patch to eliminate excessive I2C transfers in
    sfp_module_eeprom() and sfp_module_eeprom_by_page()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-09 16:01:28 +01:00
Ivan Bornyakov
bef227c153 net: sfp: avoid EEPROM read of absent SFP module
If SFP module is not present, it is sensible to fail sfp_module_eeprom()
and sfp_module_eeprom_by_page() early to avoid excessive I2C transfers
which are garanteed to fail.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-09 16:01:28 +01:00
Ivan Bornyakov
813c2dd786 net: sfp: initialize sfp->i2c_block_size at sfp allocation
sfp->i2c_block_size is initialized at SFP module insertion in
sfp_sm_mod_probe(). Because of that, if SFP module was never inserted
since boot, sfp_read() call will lead to zero-length I2C read attempt,
and not all I2C controllers are happy with zero-length reads.

One way to issue sfp_read() on empty SFP cage is to execute ethtool -m.
If SFP module was never plugged since boot, there will be a zero-length
I2C read attempt.

  # ethtool -m xge0
  i2c i2c-3: adapter quirk: no zero length (addr 0x0050, size 0, read)
  Cannot get Module EEPROM data: Operation not supported

If SFP module was plugged then removed at least once,
sfp->i2c_block_size will be initialized and ethtool -m will fail with
different exit code and without I2C error

  # ethtool -m xge0
  Cannot get Module EEPROM data: Remote I/O error

Fix this by initializing sfp->i2_block_size at struct sfp allocation
stage so no wild sfp_read() could issue zero-length I2C read.

Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Fixes: 0d035bed2a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround")
Cc: stable@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-09 16:01:28 +01:00
Douglas Anderson
5cc33f139e r8152: Add __GFP_NOWARN to big allocations
When memory is a little tight on my system, it's pretty easy to see
warnings that look like this.

  ksoftirqd/0: page allocation failure: order:3, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
  ...
  Call trace:
   dump_backtrace+0x0/0x1e8
   show_stack+0x20/0x2c
   dump_stack_lvl+0x60/0x78
   dump_stack+0x18/0x38
   warn_alloc+0x104/0x174
   __alloc_pages+0x588/0x67c
   alloc_rx_agg+0xa0/0x190 [r8152 ...]
   r8152_poll+0x270/0x760 [r8152 ...]
   __napi_poll+0x44/0x1ec
   net_rx_action+0x100/0x300
   __do_softirq+0xec/0x38c
   run_ksoftirqd+0x38/0xec
   smpboot_thread_fn+0xb8/0x248
   kthread+0x134/0x154
   ret_from_fork+0x10/0x20

On a fragmented system it's normal that order 3 allocations will
sometimes fail, especially atomic ones. The driver handles these
failures fine and the WARN just creates spam in the logs for this
case. The __GFP_NOWARN flag is exactly for this situation, so add it
to the allocation.

NOTE: my testing is on a 5.15 system, but there should be no reason
that this would be fundamentally different on a mainline kernel.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Link: https://lore.kernel.org/r/20230406171411.1.I84dbef45786af440fd269b71e9436a96a8e7a152@changeid
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07 20:12:46 -07:00
Radu Pirea (OSS)
bdaaecc127 net: phy: nxp-c45-tja11xx: fix unsigned long multiplication overflow
Any multiplication between GENMASK(31, 0) and a number bigger than 1
will be truncated because of the overflow, if the size of unsigned long
is 32 bits.

Replaced GENMASK with GENMASK_ULL to make sure that multiplication will
be between 64 bits values.

Cc: <stable@vger.kernel.org> # 5.15+
Fixes: 514def5dd3 ("phy: nxp-c45-tja11xx: add timestamping support")
Signed-off-by: Radu Pirea (OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230406095953.75622-1-radu-nicolae.pirea@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07 19:48:34 -07:00
Felix Huettner
066b86787f net: openvswitch: fix race on port output
assume the following setup on a single machine:
1. An openvswitch instance with one bridge and default flows
2. two network namespaces "server" and "client"
3. two ovs interfaces "server" and "client" on the bridge
4. for each ovs interface a veth pair with a matching name and 32 rx and
   tx queues
5. move the ends of the veth pairs to the respective network namespaces
6. assign ip addresses to each of the veth ends in the namespaces (needs
   to be the same subnet)
7. start some http server on the server network namespace
8. test if a client in the client namespace can reach the http server

when following the actions below the host has a chance of getting a cpu
stuck in a infinite loop:
1. send a large amount of parallel requests to the http server (around
   3000 curls should work)
2. in parallel delete the network namespace (do not delete interfaces or
   stop the server, just kill the namespace)

there is a low chance that this will cause the below kernel cpu stuck
message. If this does not happen just retry.
Below there is also the output of bpftrace for the functions mentioned
in the output.

The series of events happening here is:
1. the network namespace is deleted calling
   `unregister_netdevice_many_notify` somewhere in the process
2. this sets first `NETREG_UNREGISTERING` on both ends of the veth and
   then runs `synchronize_net`
3. it then calls `call_netdevice_notifiers` with `NETDEV_UNREGISTER`
4. this is then handled by `dp_device_event` which calls
   `ovs_netdev_detach_dev` (if a vport is found, which is the case for
   the veth interface attached to ovs)
5. this removes the rx_handlers of the device but does not prevent
   packages to be sent to the device
6. `dp_device_event` then queues the vport deletion to work in
   background as a ovs_lock is needed that we do not hold in the
   unregistration path
7. `unregister_netdevice_many_notify` continues to call
   `netdev_unregister_kobject` which sets `real_num_tx_queues` to 0
8. port deletion continues (but details are not relevant for this issue)
9. at some future point the background task deletes the vport

If after 7. but before 9. a packet is send to the ovs vport (which is
not deleted at this point in time) which forwards it to the
`dev_queue_xmit` flow even though the device is unregistering.
In `skb_tx_hash` (which is called in the `dev_queue_xmit`) path there is
a while loop (if the packet has a rx_queue recorded) that is infinite if
`dev->real_num_tx_queues` is zero.

To prevent this from happening we update `do_output` to handle devices
without carrier the same as if the device is not found (which would
be the code path after 9. is done).

Additionally we now produce a warning in `skb_tx_hash` if we will hit
the infinite loop.

bpftrace (first word is function name):

__dev_queue_xmit server: real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 1, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 1
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 2, reg_state: 1
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 6, reg_state: 2
ovs_netdev_detach_dev server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, reg_state: 2
netdev_rx_handler_unregister server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
netdev_rx_handler_unregister ret server: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 27, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 22, reg_state: 2
dp_device_event server: real_num_tx_queues: 1 cpu 9, pid: 21024, tid: 21024, event 18, reg_state: 2
netdev_unregister_kobject: real_num_tx_queues: 1, cpu: 9, pid: 21024, tid: 21024
synchronize_rcu_expedited: cpu 9, pid: 21024, tid: 21024
ovs_vport_send server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
__dev_queue_xmit server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
netdev_core_pick_tx server: addr: 0xffff9f0a46d4a000 real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024, skb_addr: 0xffff9edb6f207000, reg_state: 2
broken device server: real_num_tx_queues: 0, cpu: 2, pid: 28024, tid: 28024
ovs_dp_detach_port server: real_num_tx_queues: 0 cpu 9, pid: 9124, tid: 9124, reg_state: 2
synchronize_rcu_expedited: cpu 9, pid: 33604, tid: 33604

stuck message:

watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [curl:1929279]
Modules linked in: veth pktgen bridge stp llc ip_set_hash_net nft_counter xt_set nft_compat nf_tables ip_set_hash_ip ip_set nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tls binfmt_misc nls_iso8859_1 input_leds joydev serio_raw dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel drm efi_pstore virtio_rng ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net ahci net_failover crypto_simd cryptd psmouse libahci virtio_blk failover
CPU: 5 PID: 1929279 Comm: curl Not tainted 5.15.0-67-generic #74-Ubuntu
Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:netdev_pick_tx+0xf1/0x320
Code: 00 00 8d 48 ff 0f b7 c1 66 39 ca 0f 86 e9 01 00 00 45 0f b7 ff 41 39 c7 0f 87 5b 01 00 00 44 29 f8 41 39 c7 0f 87 4f 01 00 00 <eb> f2 0f 1f 44 00 00 49 8b 94 24 28 04 00 00 48 85 d2 0f 84 53 01
RSP: 0018:ffffb78b40298820 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff9c8773adc2e0 RCX: 000000000000083f
RDX: 0000000000000000 RSI: ffff9c8773adc2e0 RDI: ffff9c870a25e000
RBP: ffffb78b40298858 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c870a25e000
R13: ffff9c870a25e000 R14: ffff9c87fe043480 R15: 0000000000000000
FS:  00007f7b80008f00(0000) GS:ffff9c8e5f740000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7b80f6a0b0 CR3: 0000000329d66000 CR4: 0000000000350ee0
Call Trace:
 <IRQ>
 netdev_core_pick_tx+0xa4/0xb0
 __dev_queue_xmit+0xf8/0x510
 ? __bpf_prog_exit+0x1e/0x30
 dev_queue_xmit+0x10/0x20
 ovs_vport_send+0xad/0x170 [openvswitch]
 do_output+0x59/0x180 [openvswitch]
 do_execute_actions+0xa80/0xaa0 [openvswitch]
 ? kfree+0x1/0x250
 ? kfree+0x1/0x250
 ? kprobe_perf_func+0x4f/0x2b0
 ? flow_lookup.constprop.0+0x5c/0x110 [openvswitch]
 ovs_execute_actions+0x4c/0x120 [openvswitch]
 ovs_dp_process_packet+0xa1/0x200 [openvswitch]
 ? ovs_ct_update_key.isra.0+0xa8/0x120 [openvswitch]
 ? ovs_ct_fill_key+0x1d/0x30 [openvswitch]
 ? ovs_flow_key_extract+0x2db/0x350 [openvswitch]
 ovs_vport_receive+0x77/0xd0 [openvswitch]
 ? __htab_map_lookup_elem+0x4e/0x60
 ? bpf_prog_680e8aff8547aec1_kfree+0x3b/0x714
 ? trace_call_bpf+0xc8/0x150
 ? kfree+0x1/0x250
 ? kfree+0x1/0x250
 ? kprobe_perf_func+0x4f/0x2b0
 ? kprobe_perf_func+0x4f/0x2b0
 ? __mod_memcg_lruvec_state+0x63/0xe0
 netdev_port_receive+0xc4/0x180 [openvswitch]
 ? netdev_port_receive+0x180/0x180 [openvswitch]
 netdev_frame_hook+0x1f/0x40 [openvswitch]
 __netif_receive_skb_core.constprop.0+0x23d/0xf00
 __netif_receive_skb_one_core+0x3f/0xa0
 __netif_receive_skb+0x15/0x60
 process_backlog+0x9e/0x170
 __napi_poll+0x33/0x180
 net_rx_action+0x126/0x280
 ? ttwu_do_activate+0x72/0xf0
 __do_softirq+0xd9/0x2e7
 ? rcu_report_exp_cpu_mult+0x1b0/0x1b0
 do_softirq+0x7d/0xb0
 </IRQ>
 <TASK>
 __local_bh_enable_ip+0x54/0x60
 ip_finish_output2+0x191/0x460
 __ip_finish_output+0xb7/0x180
 ip_finish_output+0x2e/0xc0
 ip_output+0x78/0x100
 ? __ip_finish_output+0x180/0x180
 ip_local_out+0x5e/0x70
 __ip_queue_xmit+0x184/0x440
 ? tcp_syn_options+0x1f9/0x300
 ip_queue_xmit+0x15/0x20
 __tcp_transmit_skb+0x910/0x9c0
 ? __mod_memcg_state+0x44/0xa0
 tcp_connect+0x437/0x4e0
 ? ktime_get_with_offset+0x60/0xf0
 tcp_v4_connect+0x436/0x530
 __inet_stream_connect+0xd4/0x3a0
 ? kprobe_perf_func+0x4f/0x2b0
 ? aa_sk_perm+0x43/0x1c0
 inet_stream_connect+0x3b/0x60
 __sys_connect_file+0x63/0x70
 __sys_connect+0xa6/0xd0
 ? setfl+0x108/0x170
 ? do_fcntl+0xe8/0x5a0
 __x64_sys_connect+0x18/0x20
 do_syscall_64+0x5c/0xc0
 ? __x64_sys_fcntl+0xa9/0xd0
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x27/0x50
 ? do_syscall_64+0x69/0xc0
 ? __sys_setsockopt+0xea/0x1e0
 ? exit_to_user_mode_prepare+0x37/0xb0
 ? syscall_exit_to_user_mode+0x27/0x50
 ? __x64_sys_setsockopt+0x1f/0x30
 ? do_syscall_64+0x69/0xc0
 ? irqentry_exit+0x1d/0x30
 ? exc_page_fault+0x89/0x170
 entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f7b8101c6a7
Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89
RSP: 002b:00007ffffd6b2198 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7b8101c6a7
RDX: 0000000000000010 RSI: 00007ffffd6b2360 RDI: 0000000000000005
RBP: 0000561f1370d560 R08: 00002795ad21d1ac R09: 0030312e302e302e
R10: 00007ffffd73f080 R11: 0000000000000246 R12: 0000561f1370c410
R13: 0000000000000000 R14: 0000000000000005 R15: 0000000000000000
 </TASK>

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZC0pBXBAgh7c76CA@kernel-bug-kernel-bug
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07 19:42:53 -07:00
Jakub Kicinski
029294d019 bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZDCaHgAKCRDbK58LschI
 g2CXAP9sgjqCaRhfSSYWbESKWxokJAa1j6v7phNjR9iqHrBjzwEAg6aDHOUqcYpD
 Zlp/5JV9HIkc1wTmuIUuI74YAVZBJAU=
 =V2jo
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-04-08

We've added 4 non-merge commits during the last 11 day(s) which contain
a total of 5 files changed, 39 insertions(+), 6 deletions(-).

The main changes are:

1) Fix BPF TCP socket iterator to use correct helper for dropping
   socket's refcount, that is, sock_gen_put instead of sock_put,
   from Martin KaFai Lau.

2) Fix a BTI exception splat in BPF trampoline-generated code on arm64,
   from Xu Kuohai.

3) Fix a LongArch JIT error from missing BPF_NOSPEC no-op, from George Guo.

4) Fix dynamic XDP feature detection of veth in xdp_redirect selftest,
   from Lorenzo Bianconi.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: fix xdp_redirect xdp-features selftest for veth driver
  bpf, arm64: Fixed a BTI error on returning to patched function
  LoongArch, bpf: Fix jit to skip speculation barrier opcode
  bpf: tcp: Use sock_gen_put instead of sock_put in bpf_iter_tcp
====================

Link: https://lore.kernel.org/r/20230407224642.30906-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-07 18:23:37 -07:00
Ahmed Zaki
9c85b7fa12 iavf: remove active_cvlans and active_svlans bitmaps
The VLAN filters info is currently being held in a list and 2 bitmaps
(active_cvlans and active_svlans). We are experiencing some racing where
data is not in sync in the list and bitmaps. For example, the VLAN is
initially added to the list but only when the PF replies, it is added to
the bitmap. If a user adds many V2 VLANS before the PF responds:

    while [ $((i++)) ]
        ip l add l eth0 name eth0.$i type vlan id $i

we might end up with more VLAN list entries than the designated limit.
Also, The "ip link show" will show more links added than the PF limit.

On the other and, the bitmaps are only used to check the number of VLAN
filters and to re-enable the filters when the interface goes from DOWN to
UP.

This patch gets rid of the bitmaps and uses the list only. To do that,
the states of the VLAN filter are modified:
1 - IAVF_VLAN_REMOVE: the entry needs to be totally removed after informing
  the PF. This is the "ip link del eth0.$i" path.
2 - IAVF_VLAN_DISABLE: (new) the netdev went down. The filter needs to be
  removed from the PF and then marked INACTIVE.
3 - IAVF_VLAN_INACTIVE: (new) no PF filter exists, but the user did not
  delete the VLAN.

Fixes: 48ccc43ecf ("iavf: Add support VIRTCHNL_VF_OFFLOAD_VLAN_V2 during netdev config")
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-07 08:49:50 -07:00
Ahmed Zaki
0c0da0e951 iavf: refactor VLAN filter states
The VLAN filter states are currently being saved as individual bits.
This is error prone as multiple bits might be mistakenly set.

Fix by replacing the bits with a single state enum. Also, add an
"ACTIVE" state for filters that are accepted by the PF.

Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-04-07 08:49:50 -07:00
David S. Miller
b9881d9a76 Merge branch 'bonding-ns-validation-fixes'
Hangbin Liu says:

====================
bonding: fix ns validation on backup slaves

The first patch fixed a ns validation issue on backup slaves. The second
patch re-format the bond option test and add a test lib file. The third
patch add the arp validate regression test for the kernel patch.

Here is the new bonding option test without the kernel fix:

]# ./bond_options.sh
TEST: prio (active-backup miimon primary_reselect 0)           [ OK ]
TEST: prio (active-backup miimon primary_reselect 1)           [ OK ]
TEST: prio (active-backup miimon primary_reselect 2)           [ OK ]
TEST: prio (active-backup arp_ip_target primary_reselect 0)    [ OK ]
TEST: prio (active-backup arp_ip_target primary_reselect 1)    [ OK ]
TEST: prio (active-backup arp_ip_target primary_reselect 2)    [ OK ]
TEST: prio (active-backup ns_ip6_target primary_reselect 0)    [ OK ]
TEST: prio (active-backup ns_ip6_target primary_reselect 1)    [ OK ]
TEST: prio (active-backup ns_ip6_target primary_reselect 2)    [ OK ]
TEST: prio (balance-tlb miimon primary_reselect 0)             [ OK ]
TEST: prio (balance-tlb miimon primary_reselect 1)             [ OK ]
TEST: prio (balance-tlb miimon primary_reselect 2)             [ OK ]
TEST: prio (balance-tlb arp_ip_target primary_reselect 0)      [ OK ]
TEST: prio (balance-tlb arp_ip_target primary_reselect 1)      [ OK ]
TEST: prio (balance-tlb arp_ip_target primary_reselect 2)      [ OK ]
TEST: prio (balance-tlb ns_ip6_target primary_reselect 0)      [ OK ]
TEST: prio (balance-tlb ns_ip6_target primary_reselect 1)      [ OK ]
TEST: prio (balance-tlb ns_ip6_target primary_reselect 2)      [ OK ]
TEST: prio (balance-alb miimon primary_reselect 0)             [ OK ]
TEST: prio (balance-alb miimon primary_reselect 1)             [ OK ]
TEST: prio (balance-alb miimon primary_reselect 2)             [ OK ]
TEST: prio (balance-alb arp_ip_target primary_reselect 0)      [ OK ]
TEST: prio (balance-alb arp_ip_target primary_reselect 1)      [ OK ]
TEST: prio (balance-alb arp_ip_target primary_reselect 2)      [ OK ]
TEST: prio (balance-alb ns_ip6_target primary_reselect 0)      [ OK ]
TEST: prio (balance-alb ns_ip6_target primary_reselect 1)      [ OK ]
TEST: prio (balance-alb ns_ip6_target primary_reselect 2)      [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 0)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 1)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 2)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 3)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 4)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 5)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 6)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)  [ OK ]
TEST: arp_validate (interface eth1 mii_status DOWN)                 [FAIL]
TEST: arp_validate (interface eth2 mii_status DOWN)                 [FAIL]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)  [FAIL]
TEST: arp_validate (interface eth1 mii_status DOWN)                 [FAIL]
TEST: arp_validate (interface eth2 mii_status DOWN)                 [FAIL]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)  [FAIL]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)  [ OK ]
TEST: arp_validate (interface eth1 mii_status DOWN)                 [FAIL]
TEST: arp_validate (interface eth2 mii_status DOWN)                 [FAIL]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)  [FAIL]

Here is the test result after the kernel fix:
TEST: arp_validate (active-backup arp_ip_target arp_validate 0)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 1)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 2)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 3)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 4)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 5)  [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 6)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)  [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)  [ OK ]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Hangbin Liu
2e825f8acc selftests: bonding: add arp validate test
This patch add bonding arp validate tests with mode active backup,
monitor arp_ip_target and ns_ip6_target. It also checks mii_status
to make sure all slaves are UP.

Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Hangbin Liu
481b56e039 selftests: bonding: re-format bond option tests
To improve the testing process for bond options, A new bond topology lib
is added to our testing setup. The current option_prio.sh file will be
renamed to bond_options.sh so that all bonding options can be tested here.
Specifically, for priority testing, we will run all tests using modes
1, 5, and 6. These changes will help us streamline the testing process
and ensure that our bond options are rigorously evaluated.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00
Hangbin Liu
4598380f9c bonding: fix ns validation on backup slaves
When arp_validate is set to 2, 3, or 6, validation is performed for
backup slaves as well. As stated in the bond documentation, validation
involves checking the broadcast ARP request sent out via the active
slave. This helps determine which slaves are more likely to function in
the event of an active slave failure.

However, when the target is an IPv6 address, the NS message sent from
the active interface is not checked on backup slaves. Additionally,
based on the bond_arp_rcv() rule b, we must reverse the saddr and daddr
when checking the NS message.

Note that when checking the NS message, the destination address is a
multicast address. Therefore, we must convert the target address to
solicited multicast in the bond_get_targets_ip6() function.

Prior to the fix, the backup slaves had a mii status of "down", but
after the fix, all of the slaves' mii status was updated to "UP".

Fixes: 4e24be018e ("bonding: add new parameter ns_targets")
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-07 08:47:20 +01:00