linux

Author	SHA1	Message	Date
Jakub Kicinski	602ae008ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Simplify nf_ct_get_tuple(), from Jackie Liu. 2) Add format to request_module() call, from Bill Wendling. 3) Add /proc/net/stats/nf_flowtable to monitor in-flight pending hardware offload objects to be processed, from Vlad Buslov. 4) Missing rcu annotation and accessors in the netfilter tree, from Florian Westphal. 5) Merge h323 conntrack helper nat hooks into single object, also from Florian. 6) A batch of update to fix sparse warnings treewide, from Florian Westphal. 7) Move nft_cmp_fast_mask() where it used, from Florian. 8) Missing const in nf_nat_initialized(), from James Yonan. 9) Use bitmap API for Maglev IPVS scheduler, from Christophe Jaillet. 10) Use refcount_inc instead of _inc_not_zero in flowtable, from Florian Westphal. 11) Remove pr_debug in xt_TPROXY, from Nathan Cancellor. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: xt_TPROXY: remove pr_debug invocations netfilter: flowtable: prefer refcount_inc netfilter: ipvs: Use the bitmap API to allocate bitmaps netfilter: nf_nat: in nf_nat_initialized(), use const struct nf_conn * netfilter: nf_tables: move nft_cmp_fast_mask to where its used netfilter: nf_tables: use correct integer types netfilter: nf_tables: add and use BE register load-store helpers netfilter: nf_tables: use the correct get/put helpers netfilter: x_tables: use correct integer types netfilter: nfnetlink: add missing __be16 cast netfilter: nft_set_bitmap: Fix spelling mistake netfilter: h323: merge nat hook pointers into one netfilter: nf_conntrack: use rcu accessors where needed netfilter: nf_conntrack: add missing __rcu annotations netfilter: nf_flow_table: count pending offload workqueue tasks net/sched: act_ct: set 'net' pointer when creating new nf_flow_table netfilter: conntrack: use correct format characters netfilter: conntrack: use fallthrough to cleanup ==================== Link: https://lore.kernel.org/r/20220720230754.209053-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-20 18:05:51 -07:00
Jakub Kicinski	47f058ce98	mlx5-updates-2022-07-17 1) Add resiliency for lost completions for PTP TX port timestamp 2) Report Header-data split state via ethtool 3) Decouple HTB code from main regular TX code -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmLXFPYACgkQSD+KveBX +j6LXgf9EafSAwlOVVe1znXsFETQo06RnN+av4/GrH8lv9pNfv+NbZDEvryS6l9v AV0gTIc68cSk77le7RkNna+iZB1QnvsxgMpGtz4v9TjCmke4ZPduNuXh4Jl6yESb t/Bxli2IjcZenzM8iiYex3WronGtWHrraQKwtmrHDpCxTmBGvId4g3AlFoAAUENh u5hDtjYN2SmRfTT4J1N1y9EyoIQqXytV/jtiQqmkpxxCkwtAI8AEZuu4UrYCNogy xrhag3ZKPx0pqNYGbFPxZZq84GJwi6QODbBqhaqO/NqOSYrE83Vtn+ohlWUGWMhv KeEWwYDM0bCF3R2/ONn2xulU2dyUAg== =Q+YN -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-07-17 1) Add resiliency for lost completions for PTP TX port timestamp 2) Report Header-data split state via ethtool 3) Decouple HTB code from main regular TX code * tag 'mlx5-updates-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: CT: Remove warning of ignore_flow_level support for non PF net/mlx5e: Add resiliency for PTP TX port timestamp net/mlx5: Expose ts_cqe_metadata_size2wqe_counter net/mlx5e: HTB, move htb functions to a new file net/mlx5e: HTB, change functions name to follow convention net/mlx5e: HTB, remove priv from htb function calls net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure net/mlx5e: HTB, move stats and max_sqs to priv net/mlx5e: HTB, move section comment to the right place net/mlx5e: HTB, move ids to selq_params struct net/mlx5e: HTB, reduce visibility of htb functions net/mlx5e: Fix mqprio_rl handling on devlink reload net/mlx5e: Report header-data split state through ethtool ==================== Link: https://lore.kernel.org/r/20220719203529.51151-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-20 17:54:46 -07:00
Justin Stitt	aa8c7cdbae	netfilter: xt_TPROXY: remove pr_debug invocations pr_debug calls are no longer needed in this file. Pablo suggested "a patch to remove these pr_debug calls". This patch has some other beneficial collateral as it also silences multiple Clang -Wformat warnings that were present in the pr_debug calls. diff from v1 -> v2: * converted if statement one-liner style * x == NULL is now !x Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-21 00:56:00 +02:00
Florian Westphal	f02e7dc4cf	netfilter: flowtable: prefer refcount_inc With refcount_inc_not_zero, we'd also need a smp_rmb or similar, followed by a test of the CONFIRMED bit. However, the ct pointer is taken from skb->_nfct, its refcount must not be 0 (else, we'd already have a use-after-free bug). Use refcount_inc() instead to clarify the ct refcount is expected to be at least 1. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-21 00:55:39 +02:00
Christophe JAILLET	5787db7c90	netfilter: ipvs: Use the bitmap API to allocate bitmaps Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-07-21 00:55:39 +02:00
Alex Elder	5fb859f79f	net: ipa: initialize ring indexes to 0 When a GSI channel is initially allocated, and after it has been reset, the hardware assumes its ring index is 0. And although we do initialize channels this way, the comments in the IPA code don't really explain this. For event rings, it doesn't matter what value we use initially, so using 0 is just fine. Add some information about the assumptions made by hardware above the definition of the gsi_ring structure in "gsi.h". Zero the index field for all rings (channel and event) when the ring is allocated. As a result, that function initializes all fields in the structure. Stop zeroing the index the top of gsi_channel_program(). Initially we'll use the index value set when the channel ring was allocated. And we'll explicitly zero the index value in gsi_channel_reset() before programming the hardware, adding a comment explaining why it's required. For event rings, use the index initialized by gsi_ring_alloc() rather than 0 when ringing the doorbell in gsi_evt_ring_program(). (It'll still be zero, but we won't assume that to be the case.) Use a local variable in gsi_evt_ring_program() that represents the address of the event ring's ring structure. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 11:12:20 +01:00
Oleksandr Mazur	52323ef754	net: marvell: prestera: add phylink support For SFP port prestera driver will use kernel phylink infrastucture to configure port mode based on the module that has beed inserted Co-developed-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu> Co-developed-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Taras Chornyi <taras.chornyi@plvision.eu> Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:24:40 +01:00
Andrey Turkin	ffcdd1197d	vmxnet3: Implement ethtool's get_channels command Some tools (e.g. libxdp) use that information. Signed-off-by: Andrey Turkin <andrey.turkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:21:15 +01:00
David S. Miller	50ad649dd7	linux-can-next-for-5.20-20220720 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEBsvAIBsPu6mG7thcrX5LkNig010FAmLXs+kTHG1rbEBwZW5n dXRyb25peC5kZQAKCRCtfkuQ2KDTXRhhB/4oU+hdSITnF4SkX+bHa2+7YP/qq2CS AT4/cc7WheYsCB/NFLLBQ9YuPJauw8Z7dTcOvWDX5vxfKzkYen9QaMOhkX67q4i4 yTk1j28ryh7SSMOmSoSlBgru/sqDGL0gc2d3K/DnABJcudrY4gOsFsHA24yPCoPS XwRen/O+/zbumrZrn2vhcqpRKQFq7B4D5GXhFBKyi/VnPpd1z9vb7nR1RqBzg1Qr xdW0oNewAwOv0w1VbZUWJWa2NlsIIdyhSX2hG4pa4GBUl9/KHtA9SMS4WSCcvUhP PpRM2o7MTFKB0ViJqseNxty6f4Xin7B5tNshj9vpmchw5QBFRSN1v1g5 =0265 -----END PGP SIGNATURE----- Merge tag 'linux-can-next-for-5.20-20220720' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== this is a pull request of 29 patches for net-next/master. The first 6 patches target the slcan driver. Dan Carpenter contributes a hardening patch, followed by 5 cleanup patches. Biju Das contributes 5 patches to prepare the sja1000 driver to support the Renesas RZ/N1 SJA1000 CAN controller. Dario Binacchi's patch for the slcan driver fixes a sleep with held spin lock. Another patch by Dario Binacchi fixes a wrong comment in the c_can driver. Pavel Pisa updates the CTU CAN FD IP core registers. Stephane Grosjean contributes 3 patches to the peak_usb driver for cleanups and support of a new MCU. The last 12 patches are by Vincent Mailhol, they fix and improve the txerr and rxerr reporting in all CAN drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-20 10:17:05 +01:00
Marc Kleine-Budde	1dbd8748a1	Merge branch 'can-error-set-of-fixes-and-improvement-on-txerr-and-rxerr-reporting' Vincent Mailhol says: ==================== can: error: set of fixes and improvement on txerr and rxerr reporting This series is a collection of patches targeting the CAN error counter. The series is split in three blocks (with small relation to each other). Several drivers uses the data[6] and data[7] fields (both of type u8) of the CAN error frame to report those values. However, the maximum size an u8 can hold is 255 and the error counter can exceed this value if bus-off status occurs. As such, the first nine patches of this series make sure that no drivers try to report txerr or rxerr through the CAN error frame when bus-off status is reached. can_frame::data[5..7] are defined as being "controller specific". Controller specific behaviors are not something desirable (portability issue...) The tenth patch of this series specifies how can_frame::data[5..7] should be use and remove any "controller specific" freedom. The eleventh patch adds a flag to notify though can_frame::can_id that data[6..7] were populated (in order to be consistent with other fields). Finally, the twelfth and last patch add three macro values to specify the different error counter threshold with so far was hard-coded as magic numbers in the drivers. N.B.: * patches 1 to 10 are for net (stable). * patches 11 and 12 are for net-next (but depends on patches 1 to 10). Changelog v1 -> v2: https://lore.kernel.org/all/20220712153157.83847-1-mailhol.vincent@wanadoo.fr * Fix typo in patch #10: data[7] of CAN error frames is for the RX error counter, not the TX one (this is litteraly a one byte change). ==================== As discussed take the whole series via can-next -> net-next. Link: https://lore.kernel.org/all/20220719143550.3681-1-mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:28:43 +02:00
Vincent Mailhol	3f9c26210c	can: error: add definitions for the different CAN error thresholds Currently, drivers are using magic numbers to derive the CAN error states from the error counter. Add three macro declarations to remediate this. For reference, the error-active, error-passive and bus-off are defined in ISO 11898, section 12.1.4.2 "Error counting". Although ISO 11898 does not define error-warning state, this extra value is also commonly used and is thus also added. Link: https://lore.kernel.org/all/20220719143550.3681-13-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:27:51 +02:00
Vincent Mailhol	3e5c291c79	can: add CAN_ERR_CNT flag to notify availability of error counter Add a dedicated flag in uapi/linux/can/error.h to notify the userland that fields data[6] and data[7] of the CAN error frame were respectively populated with the tx and rx error counters. For all driver tree-wide, set up this flags whenever needed. Link: https://lore.kernel.org/all/20220719143550.3681-12-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:27:37 +02:00
Vincent Mailhol	e70a3263a7	can: error: specify the values of data[5..7] of CAN error frames Currently, data[5..7] of struct can_frame, when used as a CAN error frame, are defined as being "controller specific". Device specific behaviours are problematic because it prevents someone from writing code which is portable between devices. As a matter of fact, data[5] is never used, data[6] is always used to report TX error counter and data[7] is always used to report RX error counter. can-utils also relies on this. This patch updates the comment in the uapi header to specify that data[5] is reserved (and thus should not be used) and that data[6..7] are used for error counters. Fixes: `0d66548a10` ("[CAN]: Add PF_CAN core module") Link: https://lore.kernel.org/all/20220719143550.3681-11-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:50 +02:00
Vincent Mailhol	aebe8a2433	can: usb_8dev: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `0024d8ad16` ("can: usb_8dev: Add support for USB2CAN interface from 8 devices") Link: https://lore.kernel.org/all/20220719143550.3681-10-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:50 +02:00
Vincent Mailhol	a57732084e	can: kvaser_usb_leaf: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `7259124eac` ("can: kvaser_usb: Split driver into kvaser_usb_core.c and kvaser_usb_leaf.c") Link: https://lore.kernel.org/all/20220719143550.3681-9-mailhol.vincent@wanadoo.fr CC: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	936e905953	can: kvaser_usb_hydra: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `aec5fb2268` ("can: kvaser_usb: Add support for Kvaser USB hydra family") Link: https://lore.kernel.org/all/20220719143550.3681-8-mailhol.vincent@wanadoo.fr CC: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	0ac15a8f66	can: sun4i_can: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `0738eff14d` ("can: Allwinner A10/A20 CAN Controller support - Kernel module") Link: https://lore.kernel.org/all/20220719143550.3681-7-mailhol.vincent@wanadoo.fr CC: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	a22bd630cf	can: hi311x: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `57e83fb9b7` ("can: hi311x: Add Holt HI-311x CAN driver") Link: https://lore.kernel.org/all/20220719143550.3681-6-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	ce0e7aeb67	can: slcan: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. alloc_can_err_skb() already sets cf to NULL if the allocation fails [1], so the redundant cf = NULL assignment gets removed. [1] https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/skb.c#L187 Fixes: `0a9cdcf098` ("can: slcan: extend the protocol with CAN state info") Link: https://lore.kernel.org/all/20220719143550.3681-5-mailhol.vincent@wanadoo.fr CC: Dario Binacchi <dario.binacchi@amarulasolutions.com> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	164d7cb2d5	can: sja1000: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `215db1856e` ("can: sja1000: Consolidate and unify state change handling") Link: https://lore.kernel.org/all/20220719143550.3681-4-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	a37b7245e8	can: rcar_can: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `fd1159318e` ("can: add Renesas R-Car CAN driver") Link: https://lore.kernel.org/all/20220719143550.3681-3-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:49 +02:00
Vincent Mailhol	3a5c7e4611	can: pch_can: do not report txerr and rxerr during bus-off During bus off, the error count is greater than 255 and can not fit in a u8. Fixes: `0c78ab76a0` ("pch_can: Add setting TEC/REC statistics processing") Link: https://lore.kernel.org/all/20220719143550.3681-2-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-20 09:26:48 +02:00
Jakub Kicinski	c2fe9ec397	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-07-18 This series contains updates to igc driver only. Kurt Kanzenbach adds support for Qbv schedules where one queue stays open in consecutive entries. Sasha removes an unused define and field. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Remove forced_speed_duplex value igc: Remove MSI-X PBA Clear register igc: Lift TAPRIO schedule restriction ==================== Link: https://lore.kernel.org/r/20220718180109.4114540-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 17:45:05 -07:00
Davide Caratti	ca0cab1192	net/sched: remove qdisc_root_lock() helper the last caller has been removed with commit `96f5e66e8a` ("mac80211: fix aggregation for hardware with ampdu queues"), so it's safe to remove this function. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/703d549e3088367651d92a059743f1be848d74b7.1658133689.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 17:14:55 -07:00
Jakub Kicinski	7f9eee196e	Merge branch 'io_uring-zerocopy-send' of git://git.kernel.org/pub/scm/linux/kernel/git/kuba/linux Pavel Begunkov says: ==================== io_uring zerocopy send The patchset implements io_uring zerocopy send. It works with both registered and normal buffers, mixing is allowed but not recommended. Apart from usual request completions, just as with MSG_ZEROCOPY, io_uring separately notifies the userspace when buffers are freed and can be reused (see API design below), which is delivered into io_uring's Completion Queue. Those "buffer-free" notifications are not necessarily per request, but the userspace has control over it and should explicitly attaching a number of requests to a single notification. The series also adds some internal optimisations when used with registered buffers like removing page referencing. From the kernel networking perspective there are two main changes. The first one is passing ubuf_info into the network layer from io_uring (inside of an in kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info caching on the io_uring side, but also helps to avoid cross-referencing and synchronisation problems. The second part is an optional optimisation removing page referencing for requests with registered buffers. Benchmarking UDP with an optimised version of the selftest (see [1]), which sends a bunch of requests, waits for completions and repeats. "+ flush" column posts one additional "buffer-free" notification per request, and just "zc" doesn't post buffer notifications at all. NIC (requests / second): IO size \| non-zc \| zc \| zc + flush 4000 \| 495134 \| 606420 (+22%) \| 558971 (+12%) 1500 \| 551808 \| 577116 (+4.5%) \| 565803 (+2.5%) 1000 \| 584677 \| 592088 (+1.2%) \| 560885 (-4%) 600 \| 596292 \| 598550 (+0.4%) \| 555366 (-6.7%) dummy (requests / second): IO size \| non-zc \| zc \| zc + flush 8000 \| 1299916 \| 2396600 (+84%) \| 2224219 (+71%) 4000 \| 1869230 \| 2344146 (+25%) \| 2170069 (+16%) 1200 \| 2071617 \| 2361960 (+14%) \| 2203052 (+6%) 600 \| 2106794 \| 2381527 (+13%) \| 2195295 (+4%) Previously it also brought a massive performance speedup compared to the msg_zerocopy tool (see [3]), which is probably not super interesting. There is also an additional bunch of refcounting optimisations that was omitted from the series for simplicity and as they don't change the picture drastically, they will be sent as follow up, as well as flushing optimisations closing the performance gap b/w two last columns. For TCP on localhost (with hacks enabling localhost zerocopy) and including additional overhead for receive: IO size \| non-zc \| zc 1200 \| 4174 \| 4148 4096 \| 7597 \| 11228 Using a real NIC 1200 bytes, zc is worse than non-zc ~5-10%, maybe the omitted optimisations will somewhat help, should look better for 4000, but couldn't test properly because of setup problems. Links: liburing (benchmark + tests): [1] https://github.com/isilence/liburing/tree/zc_v4 kernel repo: [2] https://github.com/isilence/linux/tree/zc_v4 RFC v1: [3] https://lore.kernel.org/io-uring/cover.1638282789.git.asml.silence@gmail.com/ RFC v2: https://lore.kernel.org/io-uring/cover.1640029579.git.asml.silence@gmail.com/ Net patches based: git@github.com:isilence/linux.git zc_v4-net-base or https://github.com/isilence/linux/tree/zc_v4-net-base API design overview: The series introduces an io_uring concept of notifactors. From the userspace perspective it's an entity to which it can bind one or more requests and then requesting to flush it. Flushing a notifier makes it impossible to attach new requests to it, and instructs the notifier to post a completion once all requests attached to it are completed and the kernel doesn't need the buffers anymore. Notifications are stored in notification slots, which should be registered as an array in io_uring. Each slot stores only one notifier at any particular moment. Flushing removes it from the slot and the slot automatically replaces it with a new notifier. All operations with notifiers are done by specifying an index of a slot it's currently in. When registering a notification the userspace specifies a u64 tag for each slot, which will be copied in notification completion entries as cqe::user_data. cqe::res is 0 and cqe::flags is equal to wrap around u32 sequence number counting notifiers of a slot. ==================== Link: https://lore.kernel.org/r/cover.1657643355.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:22:41 -07:00
Pavel Begunkov	eb315a7d13	tcp: support externally provided ubufs Teach tcp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	1fd3ae8c90	ipv6/udp: support externally provided ubufs Teach ipv6/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	c445f31b3c	ipv4/udp: support externally provided ubufs Teach ipv4/udp how to use external ubuf_info provided in msghdr and also prepare it for managed frags by sprinkling skb_zcopy_downgrade_managed() when it could mix managed and not managed frags. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	84ce071e38	net: introduce __skb_fill_page_desc_noacc Managed pages contain pinned userspace pages and controlled by upper layers, there is no need in tracking skb->pfmemalloc for them. Introduce a helper for filling frags but ignoring page tracking, it'll be needed later. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	753f1ca4e1	net: introduce managed frags infrastructure Some users like io_uring can do page pinning more efficiently, so we want a way to delegate referencing to other subsystems. For that add a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold page references and upper layers are responsivle to managing page lifetime. It's allowed to convert skbs from managed to normal by calling skb_zcopy_downgrade_managed(). The function will take all needed page references and clear the flag. It's needed, for instance, to avoid mixing managed modes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
David Ahern	ebe73a284f	net: Allow custom iter handler in msghdr Add support for custom iov_iter handling to msghdr. The idea is that in-kernel subsystems want control over how an SG is split. Signed-off-by: David Ahern <dsahern@kernel.org> [pavel: move callback into msghdr] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:54 -07:00
Pavel Begunkov	7c701d92b2	skbuff: carry external ubuf_info in msghdr Make possible for network in-kernel callers like io_uring to pass in a custom ubuf_info by setting it in a new field of struct msghdr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 14:20:42 -07:00
Edward Cree	1f17708b47	sfc: update MCDI protocol headers Link: https://lore.kernel.org/r/cover.1657878101.git.ecree.xilinx@gmail.com Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-19 13:37:04 -07:00
Roi Dayan	22df2e9362	net/mlx5: CT: Remove warning of ignore_flow_level support for non PF ignore_flow_level isn't supported for SFs, and so it causes post_act and ct to warn about it per SF. Apply the warning only for PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:54 -07:00
Aya Levin	58a518948f	net/mlx5e: Add resiliency for PTP TX port timestamp PTP TX port timestamp relies on receiving 2 CQEs for each outgoing packet (WQE). The regular CQE has a less accurate timestamp than the wire CQE. On link change, the wire CQE may get lost. Let the driver detect and restore the relation between the CQEs, and re-sync after timeout. Add resiliency for this as follows: add id (producer counter) into the WQE's metadata. This id will be received in the wire CQE (in wqe_counter field). On handling the wire CQE, if there is no match, replay the PTP application with the time-stamp from the regular CQE and restore the sync between the CQEs and their SKBs. This patch adds 2 ptp counters: 1) ptp_cq0_resync_event: number of times a mismatch was detected between the regular CQE and the wire CQE. 2) ptp_cq0_resync_cqe: total amount of missing wire CQEs. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:54 -07:00
Aya Levin	2e5e4185ff	net/mlx5: Expose ts_cqe_metadata_size2wqe_counter Add capability field which indicates the mask for wqe_counter which connects between loopback CQE and the original WQE. With this connection the driver can identify lost of the loopback CQE and reply PTP synchronization with timestamp given in the original CQE. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:53 -07:00
Moshe Tal	462b005999	net/mlx5e: HTB, move htb functions to a new file Move htb related functions and data to a separated file for better encapsulation. Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:53 -07:00
Moshe Tal	3685eed56f	net/mlx5e: HTB, change functions name to follow convention Following the change of the functions to be object like, change also the names. Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:52 -07:00
Moshe Tal	28df4a0117	net/mlx5e: HTB, remove priv from htb function calls As a step to make htb self-contained replace the passing of priv as a parameter to htb function calls with members in the htb struct. Full decoupling the htb from priv will require more work, so for now leave the priv as one of the members in the htb struct, to be replaced by channels in a future commit. Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:52 -07:00
Saeed Mahameed	aaffda6b36	net/mlx5e: HTB, hide and dynamically allocate mlx5e_htb structure Move structure mlx5e_htb from the main driver include file "en.h" to be hidden in qos.c where the qos functionality is implemented, forward declare it for the rest of the driver and allocate it dynamically upon user demand only. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>	2022-07-19 13:32:51 -07:00
Moshe Tal	db83f24d89	net/mlx5e: HTB, move stats and max_sqs to priv Preparation for dynamic allocation of the HTB struct. The statistics should be preserved even when the struct is de-allocated. Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:51 -07:00
Moshe Tal	66d9593648	net/mlx5e: HTB, move section comment to the right place mlx5e_get_qos_sq is a part of the SQ lifecycle, so need be under the title. Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:50 -07:00
Moshe Tal	4f8d1d3adc	net/mlx5e: HTB, move ids to selq_params struct HTB id fields are needed for selecting queue. Moving them to the selq_params struct will simplify synchronization between control flow and mlx5e_select_queues and will keep the IDs in the hot cacheline of mlx5e_selq_params. Replace mlx5e_selq_prepare() with separate functions that change subsets of parameters, while keeping the rest. This also will be useful to hide mlx5e_htb structure from the rest of the driver in a later patch in this series. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>	2022-07-19 13:32:50 -07:00
Saeed Mahameed	efe317997e	net/mlx5e: HTB, reduce visibility of htb functions No need to expose all htb tc functions to the main driver file, expose only the master htb tc function mlx5e_htb_setup_tc() which selects the internal "now static" function to call. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>	2022-07-19 13:32:49 -07:00
Moshe Tal	0bb7228f70	net/mlx5e: Fix mqprio_rl handling on devlink reload Keep mqprio_rl data to params and restore the configuration in case of devlink reload. Change the location of mqprio_rl resources cleanup so it will be done also in reload flow. Also, remove the rl pointer from the params, since this is dynamic object and saved to priv. Signed-off-by: Moshe Tal <moshet@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:49 -07:00
Gal Pressman	07071e47da	net/mlx5e: Report header-data split state through ethtool HW-GRO (SHAMPO) packet merger scheme implies header-data split in the driver, report it through the ethtool interface. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2022-07-19 13:32:48 -07:00
Marc Kleine-Budde	d79ee9a66a	Merge branch 'can-peak_usb-cleanups-and-updates' Stephane Grosjean contributes a peak-usb cleanup and update series. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-19 21:42:14 +02:00
Stephane Grosjean	4f23248246	can: peak_usb: include support for a new MCU The CANFD-USB PCAN-USB FD interface undergoes an internal component change that requires a slight modification of its drivers, which leads them to dynamically use endpoint numbers provided by the interface itself. In addition to a change in the calls to the USB functions exported by the kernel, the detection of the USB interface dedicated to CAN must also be modified, as some PEAK-System devices support other interfaces than CAN. Link: https://lore.kernel.org/all/20220719120632.26774-3-s.grosjean@peak-system.com Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> [mkl: add missing cpu_to_le16() conversion] [mkl: fix networking block comment style] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-19 21:41:38 +02:00
Stephane Grosjean	a0cf2fe6cf	can: peak_usb: correction of an initially misnamed field name The data structure returned from the USB device contains a number flashed by the user and not the serial number of the device. Link: https://lore.kernel.org/all/20220719120632.26774-2-s.grosjean@peak-system.com Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-19 21:22:05 +02:00
Stephane Grosjean	92505df464	can: peak_usb: pcan_dump_mem(): mark input prompt and data pointer as const Mark the input prompt and data pointer as const. Link: https://lore.kernel.org/all/20220719120632.26774-1-s.grosjean@peak-system.com Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> [mkl: mark data pointer as const, too; update commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2022-07-19 21:20:12 +02:00

1 2 3 4 5 ...

1107814 Commits