Commit Graph

1060376 Commits

Author SHA1 Message Date
Joakim Zhang
b5bd95d171 net: fec: only clear interrupt of handling queue in fec_enet_rx_queue()
Background:
We have a customer is running a Profinet stack on the 8MM which receives and
responds PNIO packets every 4ms and PNIO-CM packets every 40ms. However, from
time to time the received PNIO-CM package is "stock" and is only handled when
receiving a new PNIO-CM or DCERPC-Ping packet (tcpdump shows the PNIO-CM and
the DCERPC-Ping packet at the same time but the PNIO-CM HW timestamp is from
the expected 40 ms and not the 2s delay of the DCERPC-Ping).

After debugging, we noticed PNIO, PNIO-CM and DCERPC-Ping packets would
be handled by different RX queues.

The root cause should be driver ack all queues' interrupt when handle a
specific queue in fec_enet_rx_queue(). The blamed patch is introduced to
receive as much packets as possible once to avoid interrupt flooding.
But it's unreasonable to clear other queues'interrupt when handling one
queue, this patch tries to fix it.

Fixes: ed63f1dcd5 (net: fec: clear receive interrupts before processing a packet)
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Reported-by: Nicolas Diaz <nicolas.diaz@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
Link: https://lore.kernel.org/r/20211206135457.15946-1-qiangqing.zhang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:39:39 -08:00
Colin Ian King
3c5290a2dc net: hns3: Fix spelling mistake "faile" -> "failed"
There is a spelling mistake in a dev_err message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211206091207.113648-1-colin.i.king@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:35:12 -08:00
Jakub Kicinski
65af674a59 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2021-12-06

This series contains updates to iavf and i40e drivers.

Mitch adds restoration of MSI state during reset for iavf.

Michal fixes checking and reporting of descriptor count changes to
communicate changes and/or issues for iavf.

Karen resolves an issue with failed handling of VF requests while a VF
reset is occurring for i40e.

Mateusz removes clearing of VF requested queue count when configuring
VF ADQ for i40e.

Norbert fixes a NULL pointer dereference that can occur when getting VSI
descriptors for i40e.
====================

Link: https://lore.kernel.org/r/20211206183519.2733180-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:33:12 -08:00
Jakub Kicinski
9e8926888c Merge branch 'net-phy-fix-doc-build-warning'
Yanteng Si says:

====================
net: phy: Fix doc build warnings
====================

Link: https://lore.kernel.org/r/cover.1638776933.git.siyanteng@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:22 -08:00
Yanteng Si
c35e8de704 net: phy: Add the missing blank line in the phylink_suspend comment
Fix warning as:

Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1657: WARNING: Unexpected indentation.
Documentation/networking/kapi:147: ./drivers/net/phy/phylink.c:1658: WARNING: Block quote ends without a blank line; unexpected unindent.

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:22 -08:00
Yanteng Si
a97770cc40 net: phy: Remove unnecessary indentation in the comments of phy_device
Fix warning as:

linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:543: WARNING: Unexpected indentation.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:544: WARNING: Block quote ends without a blank line; unexpected unindent.
linux-next/Documentation/networking/kapi:122: ./include/linux/phy.h:546: WARNING: Unexpected indentation.

Suggested-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:26:06 -08:00
Jakub Kicinski
150791442e wireless-drivers-next patches for v5.17
First set of patches for v5.17. The biggest change is the iwlmei
 driver for Intel's AMT devices. Also now WCN6855 support in ath11k
 should be usable.
 
 Major changes:
 
 ath10k
 
 * fetch (pre-)calibration data via nvmem subsystem
 
 ath11k
 
 * enable 802.11 power save mode in station mode for qca6390 and wcn6855
 
 * trace log support
 
 * proper board file detection for WCN6855 based on PCI ids
 
 * BSS color change support
 
 rtw88
 
 * add debugfs file to force lowest basic rate
 
 * add quirk to disable PCI ASPM on HP 250 G7 Notebook PC
 
 mwifiex
 
 * add quirk to disable deep sleep with certain hardware revision in
   Surface Book 2 devices
 
 iwlwifi
 
 * add iwlmei driver for co-operating with Intel's Active Management
   Technology (AMT) devices
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmGvclcRHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZtK6wgAoOoT83JKMreXlXXhVegqlJbC3HyElF5r
 DJlpDDJkJUT9airol2nd0yFfP+5WFyQPrt5shsQmqz43U4jlgfFpFXZIjQufK+gn
 VAGvVGfsanRXEFlsVgFZeSZvAEyEyNSggxADC03Ky0xtiCGc89r2o71jD3HA/ZzO
 1X8gbKH7YLWj4G/GQkrKsvIAwzoZXT7nwQSdW73M8QVzk4OSNhLBLdiqKYq0yViM
 7Ea2Vj27hiyk/RXNUZHy+bKa8vKN5sA91VHJ836aPZBQ4OLotGzF3AgHfgIhIpdr
 hI4BVJbngpjQho1EkCnZZuISPes0YQWJB5hK5xpL98yuIL4wyJRfeQ==
 =I8Dj
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-next-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for v5.17

First set of patches for v5.17. The biggest change is the iwlmei
driver for Intel's AMT devices. Also now WCN6855 support in ath11k
should be usable.

Major changes:

ath10k
 * fetch (pre-)calibration data via nvmem subsystem

ath11k
 * enable 802.11 power save mode in station mode for qca6390 and wcn6855
 * trace log support
 * proper board file detection for WCN6855 based on PCI ids
 * BSS color change support

rtw88
 * add debugfs file to force lowest basic rate
 * add quirk to disable PCI ASPM on HP 250 G7 Notebook PC

mwifiex
 * add quirk to disable deep sleep with certain hardware revision in
  Surface Book 2 devices

iwlwifi
 * add iwlmei driver for co-operating with Intel's Active Management
   Technology (AMT) devices

* tag 'wireless-drivers-next-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (87 commits)
  iwlwifi: mei: fix linking when tracing is not enabled
  rtlwifi: rtl8192de: Style clean-ups
  mwl8k: Use named struct for memcpy() region
  intersil: Use struct_group() for memcpy() region
  libertas_tf: Use struct_group() for memcpy() region
  libertas: Use struct_group() for memcpy() region
  wlcore: no need to initialise statics to false
  rsi: Fix out-of-bounds read in rsi_read_pkt()
  rsi: Fix use-after-free in rsi_rx_done_handler()
  brcmfmac: Configure keep-alive packet on suspend
  wilc1000: remove '-Wunused-but-set-variable' warning in chip_wakeup()
  iwlwifi: mvm: read the rfkill state and feed it to iwlmei
  iwlwifi: mvm: add vendor commands needed for iwlmei
  iwlwifi: integrate with iwlmei
  iwlwifi: mei: add debugfs hooks
  iwlwifi: mei: add the driver to allow cooperation with CSME
  mei: bus: add client dma interface
  mwifiex: Ignore BTCOEX events from the 88W8897 firmware
  mwifiex: Ensure the version string from the firmware is 0-terminated
  mwifiex: Add quirk to disable deep sleep with certain hardware revision
  ...
====================

Link: https://lore.kernel.org/r/20211207144211.A9949C341C1@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 21:01:18 -08:00
Ameer Hamza
e6f60c51f0 gve: fix for null pointer dereference.
Avoid passing NULL skb to __skb_put() function call if
napi_alloc_skb() returns NULL.

Fixes: 37149e9374 ("gve: Implement packet continuation for RX.")
Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
Link: https://lore.kernel.org/r/20211205183810.8299-1-amhamza.mgc@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:57:17 -08:00
Jakub Kicinski
adc76fc97b Merge branch 'net-second-round-of-netdevice-refcount-tracking'
Eric Dumazet says:

====================
net: second round of netdevice refcount tracking

The most interesting part of this series is probably
("inet: add net device refcount tracker to struct fib_nh_common")
but only future reports will confirm this guess.
====================

Link: https://lore.kernel.org/r/20211207013039.1868645-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:45:11 -08:00
Eric Dumazet
ada066b2e0 net: sched: act_mirred: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:45:00 -08:00
Eric Dumazet
e7c8ab8419 openvswitch: add net device refcount tracker to struct vport
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:45:00 -08:00
Eric Dumazet
e4b8954074 netlink: add net device refcount tracker to struct ethnl_req_info
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Eric Dumazet
b60645248a net/smc: add net device tracker to struct smc_pnetentry
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Eric Dumazet
035f1f2b96 pktgen add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Eric Dumazet
615d069dcf llc: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Eric Dumazet
66ce07f780 ax25: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Eric Dumazet
e44b14ebae inet: add net device refcount tracker to struct fib_nh_common
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:59 -08:00
Eric Dumazet
4fc003fe03 net: switchdev: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Eric Dumazet
f12bf6f3f9 net: watchdog: add net device refcount tracker
Add a netdevice_tracker inside struct net_device, to track
the self reference when a device has an active watchdog timer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Eric Dumazet
b2dcdc7f73 net: bridge: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Eric Dumazet
19c9ebf6ed vlan: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Eric Dumazet
08f0b22d73 net: eql: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:44:58 -08:00
Petr Machata
6ebe4b3508 MAINTAINERS: net: mlxsw: Remove Jiri as a maintainer, add myself
Jiri has moved on and will not carry out the mlxsw maintainership duty any
longer. Add myself as a co-maintainer instead.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/45b54312cdebaf65c5d110b15a5dd2df795bf2be.1638807297.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:35:43 -08:00
Jakub Kicinski
56a271be06 Merge branch 'net-tls-cover-all-ciphers-with-tests'
Vadim Fedorenko says:

====================
net: tls: cover all ciphers with tests

Recent patches to Kernel TLS showed that some ciphers are not covered
with tests. Let's cover missed.
====================

Link: https://lore.kernel.org/r/20211206213932.7508-1-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:10 -08:00
Vadim Fedorenko
13bf99ab21 selftests: tls: add missing AES256-GCM cipher
Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:07 -08:00
Vadim Fedorenko
d76c51f976 selftests: tls: add missing AES-CCM cipher tests
Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher.

Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 20:18:07 -08:00
Eric Dumazet
802a7dc5cf netfilter: conntrack: annotate data-races around ct->timeout
(struct nf_conn)->timeout can be read/written locklessly,
add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing.

BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get

write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0:
 __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563
 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635
 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746
 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
 nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:262 [inline]
 NF_HOOK include/linux/netfilter.h:305 [inline]
 ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
 tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
 tcp_push_pending_frames include/net/tcp.h:1897 [inline]
 tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452
 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947
 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0xf2/0x270 net/core/sock.c:2768
 release_sock+0x40/0x110 net/core/sock.c:3300
 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145
 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402
 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440
 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg net/socket.c:724 [inline]
 __sys_sendto+0x21e/0x2c0 net/socket.c:2036
 __do_sys_sendto net/socket.c:2048 [inline]
 __se_sys_sendto net/socket.c:2044 [inline]
 __x64_sys_sendto+0x74/0x90 net/socket.c:2044
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1:
 nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline]
 ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline]
 __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807
 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734
 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901
 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414
 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
 nf_hook_slow+0x72/0x170 net/netfilter/core.c:619
 nf_hook include/linux/netfilter.h:262 [inline]
 NF_HOOK include/linux/netfilter.h:305 [inline]
 ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324
 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402
 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956
 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962
 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478
 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline]
 tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948
 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0xf2/0x270 net/core/sock.c:2768
 release_sock+0x40/0x110 net/core/sock.c:3300
 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114
 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118
 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367
 rds_send_worker+0x43/0x200 net/rds/threads.c:200
 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
 worker_thread+0x616/0xa70 kernel/workqueue.c:2445
 kthread+0x2c7/0x2e0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30

value changed: 0x00027cc2 -> 0x00000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G        W         5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: krdsd rds_send_worker

Note: I chose an arbitrary commit for the Fixes: tag,
because I do not think we need to backport this fix to very old kernels.

Fixes: e37542ba11 ("netfilter: conntrack: avoid possible false sharing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:29:15 +01:00
Florian Westphal
d46cea0e69 selftests: netfilter: switch zone stress to socat
centos9 has nmap-ncat which doesn't like the '-q' option, use socat.
While at it, mark test skipped if needed tools are missing.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:29:15 +01:00
Pablo Neira Ayuso
962e5a4035 netfilter: nft_exthdr: break evaluation if setting TCP option fails
Break rule evaluation on malformed TCP options.

Fixes: 99d1712bc4 ("netfilter: exthdr: tcp option set support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Stefano Brivio
0de53b0ffb selftests: netfilter: Add correctness test for mac,net set type
The existing net,mac test didn't cover the issue recently reported
by Nikita Yushchenko, where MAC addresses wouldn't match if given
as first field of a concatenated set with AVX2 and 8-bit groups,
because there's a different code path covering the lookup of six
8-bit groups (MAC addresses) if that's the first field.

Add a similar mac,net test, with MAC address and IPv4 address
swapped in the set specification.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Stefano Brivio
b7e945e228 nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
The sixth byte of packet data has to be looked up in the sixth group,
not in the seventh one, even if we load the bucket data into ymm6
(and not ymm5, for convenience of tracking stalls).

Without this fix, matching on a MAC address as first field of a set,
if 8-bit groups are selected (due to a small set size) would fail,
that is, the given MAC address would never match.

Reported-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Cc: <stable@vger.kernel.org> # 5.6.x
Fixes: 7400b06396 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-By: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Nicolas Dichtel
d43b75fbc2 vrf: don't run conntrack on vrf with !dflt qdisc
After the below patch, the conntrack attached to skb is set to "notrack" in
the context of vrf device, for locally generated packets.
But this is true only when the default qdisc is set to the vrf device. When
changing the qdisc, notrack is not set anymore.
In fact, there is a shortcut in the vrf driver, when the default qdisc is
set, see commit dcdd43c41e ("net: vrf: performance improvements for
IPv4") for more details.

This patch ensures that the behavior is always the same, whatever the qdisc
is.

To demonstrate the difference, a new test is added in conntrack_vrf.sh.

Fixes: 8c9c296adf ("vrf: run conntrack only in context of lower/physdev for locally generated packets")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08 01:05:55 +01:00
Linus Torvalds
2a987e6502 perf tools fixes for v5.16: 2nd batch
- Fix SMT detection fast read path on sysfs.
 
 - Fix memory leaks when processing feature headers in perf.data files.
 
 - Fix 'Simple expression parser' 'perf test' on arch without CPU die topology
   info, such as s/390.
 
 - Fix building perf with BUILD_BPF_SKEL=1.
 
 - Fix 'perf bench' by reverting "perf bench: Fix two memory leaks detected with ASan".
 
 - Fix itrace space allowed for new attributes in 'perf script'.
 
 - Fix the build feature detection fast path, that was always failing on systems
   with python3 development packages, speeding up the build.
 
 - Reset shadow counts before loading, fixing metrics using duration_time.
 
 - Sync more kernel headers changed by the new futex_waitv syscall: s390 and powerpc.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYa9kXwAKCRCyPKLppCJ+
 J1pSAQDAk1vQa82x5bqq8yHZTeym4Y6QtKivveTmqCagFT7FOwEAmzwOsdv/v3cQ
 GA0u97zPUsTiN1ma/nh+YZTA9IK1nw0=
 =YPeN
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix SMT detection fast read path on sysfs.

 - Fix memory leaks when processing feature headers in perf.data files.

 - Fix 'Simple expression parser' 'perf test' on arch without CPU die
   topology info, such as s/390.

 - Fix building perf with BUILD_BPF_SKEL=1.

 - Fix 'perf bench' by reverting "perf bench: Fix two memory leaks
   detected with ASan".

 - Fix itrace space allowed for new attributes in 'perf script'.

 - Fix the build feature detection fast path, that was always failing on
   systems with python3 development packages, speeding up the build.

 - Reset shadow counts before loading, fixing metrics using
   duration_time.

 - Sync more kernel headers changed by the new futex_waitv syscall: s390
   and powerpc.

* tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf bpf_skel: Do not use typedef to avoid error on old clang
  perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
  perf header: Fix memory leaks when processing feature headers
  perf test: Reset shadow counts before loading
  perf test: Fix 'Simple expression parser' test on arch without CPU die topology info
  tools build: Remove needless libpython-version feature check that breaks test-all fast path
  perf tools: Fix SMT detection fast read path
  tools headers UAPI: Sync powerpc syscall table file changed by new futex_waitv syscall
  perf inject: Fix itrace space allowed for new attributes
  tools headers UAPI: Sync s390 syscall table file changed by new futex_waitv syscall
  Revert "perf bench: Fix two memory leaks detected with ASan"
2021-12-07 15:36:45 -08:00
Michal Swiatkowski
de6acd1cdd ice: fix adding different tunnels
Adding filters with the same values inside for VXLAN and Geneve causes HW
error, because it looks exactly the same. To choose between different
type of tunnels new recipe is needed. Add storing tunnel types in
creating recipes function and start checking it in finding function.

Change getting open tunnels function to return port on correct tunnel
type. This is needed to copy correct port to dummy packet.

Block user from adding enc_dst_port via tc flower, because VXLAN and
Geneve filters can be created only with destination port which was
previously opened.

Fixes: 8b032a55c1 ("ice: low level support for tunnels")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Michal Swiatkowski
0e32ff0240 ice: fix choosing UDP header type
In tunnels packet there can be two UDP headers:
- outer which for hw should be mark as ICE_UDP_OF
- inner which for hw should be mark as ICE_UDP_ILOS or as ICE_TCP_IL if
  inner header is of TCP type

In none tunnels packet header can be:
- UDP, which for hw should be mark as ICE_UDP_ILOS
- TCP, which for hw should be mark as ICE_TCP_IL

Change incorrect ICE_UDP_OF for none tunnel packets to ICE_UDP_ILOS.
ICE_UDP_OF is incorrect for none tunnel packets and setting it leads to
error from hw while adding this kind of recipe.

In summary, for tunnel outer port type should always be set to
ICE_UDP_OF, for none tunnel outer and tunnel inner it should always be
set to ICE_UDP_ILOS.

Fixes: 9e300987d4 ("ice: VXLAN and Geneve TC support")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Jesse Brandeburg
28dc1b86f8 ice: ignore dropped packets during init
If the hardware is constantly receiving unicast or broadcast packets
during driver load, the device previously counted many GLV_RDPC (VSI
dropped packets) events during init. This causes confusing dropped
packet statistics during driver load. The dropped packets counter
incrementing does stop once the driver finishes loading.

Avoid this problem by baselining our statistics at the end of driver
open instead of the end of probe.

Fixes: cdedef59de ("ice: Configure VSIs for Tx/Rx")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Dave Ertman
6d39ea19b0 ice: Fix problems with DSCP QoS implementation
The patch that implemented DSCP QoS implementation removed a
bandwidth check that was used to check for a specific condition
caused by some corner cases.  This check should not of been
removed.

The same patch also added a check for when the DCBx state could
be changed in relation to DSCP, but the check was erroneously
added nested in a check for CEE mode, which made the check useless.

Fix these problems by re-adding the bandwidth check and relocating
the DSCP mode check earlier in the function that changes DCBx state
in the driver.

Fixes: 2a87bd73e5 ("ice: Add DSCP support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Paul Greenwalt
2657e16d8c ice: rearm other interrupt cause register after enabling VFs
The other interrupt cause register (OICR), global interrupt 0, is
disabled when enabling VFs to prevent handling VFLR. If the OICR is
not rearmed then the VF cannot communicate with the PF.

Rearm the OICR after enabling VFs.

Fixes: 916c7fdf5e ("ice: Separate VF VSI initialization/creation from reset flow")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:01 -08:00
Yahui Cao
f23ab04dd6 ice: fix FDIR init missing when reset VF
When VF is being reset, ice_reset_vf() will be called and FDIR
resource should be released and initialized again.

Fixes: 1f7ea1cd6a ("ice: Enable FDIR Configure for AVF")
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07 13:21:00 -08:00
Jakub Kicinski
59d58d93af Merge branch 'mptcp-new-features-for-mptcp-sockets-and-netlink-pm'
Mat Martineau says:

====================
mptcp: New features for MPTCP sockets and netlink PM

This collection of patches adds MPTCP socket support for a few socket
options, ioctls, and one ancillary data type (specifics for each are
listed below). There's also a patch modifying the netlink MPTCP path
manager API to allow setting the backup flag on a configured interface
using the endpoint ID instead of the full IP address.

Patches 1 & 2: TCP_INQ cmsg and selftests.

Patches 2 & 3: SIOCINQ, OUTQ, and OUTQNSD ioctls and selftests.

Patch 5: Change backup flag using endpoint ID.

Patches 6 & 7: IP_TOS socket option and selftests.

Patches 8-10: TCP_CORK and TCP_NODELAY socket options. Includes a tcp
change to expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for
use by MPTCP.
====================

Link: https://lore.kernel.org/r/20211203223541.69364-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:37 -08:00
Maxim Galaganov
4f6e14bd19 mptcp: support TCP_CORK and TCP_NODELAY
First, add cork and nodelay fields to the mptcp_sock structure
so they can be used in sync_socket_options(), and fill them on setsockopt
while holding the msk socket lock.

Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows
by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk
while holding the ssk socket lock.

tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared
or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling
mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending()
being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle.

Also add getsockopt support for TCP_CORK and TCP_NODELAY.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:31 -08:00
Maxim Galaganov
8b38217a2a mptcp: expose mptcp_check_and_set_pending
Expose the mptcp_check_and_set_pending() function for use inside MPTCP
sockopt code. The next patch will call it when TCP_CORK is cleared or
TCP_NODELAY is set on the MPTCP socket in order to push pending data
from mptcp_release_cb().

Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:31 -08:00
Maxim Galaganov
6fadaa5658 tcp: expose __tcp_sock_set_cork and __tcp_sock_set_nodelay
Expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use in
MPTCP setsockopt code -- namely for syncing MPTCP socket options with
subflows inside sync_socket_options() while already holding the subflow
socket lock.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:30 -08:00
Florian Westphal
edb596e80c selftests: mptcp: check IP_TOS in/out are the same
Check that getsockopt(IP_TOS) returns what setsockopt(IP_TOS) did set
right before.

Also check that socklen_t == 0 and -1 input values match those
of normal tcp sockets.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:30 -08:00
Florian Westphal
3b1e21eb60 mptcp: getsockopt: add support for IP_TOS
earlier patch added IP_TOS setsockopt support, this allows to get
the value set by earlier setsockopt.

Extends mptcp_put_int_option to handle u8 input/output by
adding required cast.

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:30 -08:00
Davide Caratti
602837e847 mptcp: allow changing the "backup" bit by endpoint id
a non-zero 'id' is sufficient to identify MPTCP endpoints: allow changing
the value of 'backup' bit by simply specifying the endpoint id.

Link: https://github.com/multipath-tcp/mptcp_net-next/issues/158
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:30 -08:00
Florian Westphal
b51880568f selftests: mptcp: add inq test case
client & server use a unix socket connection to communicate
outside of the mptcp connection.

This allows the consumer to know in advance how many bytes have been
(or will be) sent by the peer.
This allows stricter checks on the bytecounts reported by TCP_INQ cmsg.

Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:30 -08:00
Florian Westphal
644807e3e4 mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
Allows to query in-sequence data ready for read(), total bytes in
write queue and total bytes in write queue that have not yet been sent.

v2: remove unneeded READ_ONCE() (Paolo Abeni)
v3: check for new data unconditionally in SIOCINQ ioctl (Mat Martineau)

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:29 -08:00
Florian Westphal
5cbd886ce2 selftests: mptcp: add TCP_INQ support
Do checks on the returned inq counter.

Fail on:
1. Huge value (> 1 kbyte, test case files are 1 kb)
2. last hint larger than returned bytes when read was short
3. erronenous indication of EOF.

3) happens when a hint of X bytes reads X-1 on next call
   but next recvmsg returns more data (instead of EOF).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:29 -08:00
Florian Westphal
2c9e77659a mptcp: add TCP_INQ cmsg support
Support the TCP_INQ setsockopt.

This is a boolean that tells recvmsg path to include the remaining
in-sequence bytes in the cmsg data.

v2: do not use CB(skb)->offset, increment map_seq instead (Paolo Abeni)
v3: adjust CB(skb)->map_seq when taking skb from ofo queue (Paolo Abeni)

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/224
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07 11:36:29 -08:00