From a95a4d9b39b0324402569ed7395aae59b8fd2b11 Mon Sep 17 00:00:00 2001
From: Magnus Karlsson <magnus.karlsson@intel.com>
Date: Mon, 28 Mar 2022 16:21:20 +0200
Subject: [PATCH 1/4] xsk: Do not write NULL in SW ring at allocation failure

For the case when xp_alloc_batch() is used but the batched allocation
cannot be used, there is a slow path that uses the non-batched
xp_alloc(). When it fails to allocate an entry, it returns NULL. The
current code wrote this NULL into the entry of the provided results
array (usually a pointer into the driver SW ring) and returned. This
might not be what the driver expects, so to make things simpler, only
write successfully allocated xdp_buffs into the SW ring. The driver
might have information in there that is still important after an
allocation failure.

Note that at this point in time, there are no drivers using
xp_alloc_batch() that could trigger this slow path. But one might get
added.

Fixes: 47e4075df300 ("xsk: Batched buffer allocation for the pool")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220328142123.170157-2-maciej.fijalkowski@intel.com
---
 net/xdp/xsk_buff_pool.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index b34fca6ada86..af040ffa14ff 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -591,9 +591,13 @@ u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max)
 	u32 nb_entries1 = 0, nb_entries2;
 
 	if (unlikely(pool->dma_need_sync)) {
+		struct xdp_buff *buff;
+
 		/* Slow path */
-		*xdp = xp_alloc(pool);
-		return !!*xdp;
+		buff = xp_alloc(pool);
+		if (buff)
+			*xdp = buff;
+		return !!buff;
 	}
 
 	if (unlikely(pool->free_list_cnt)) {

From 30d19d57d513821c58de4556e7445982ed22b923 Mon Sep 17 00:00:00 2001
From: Magnus Karlsson <magnus.karlsson@intel.com>
Date: Mon, 28 Mar 2022 16:21:21 +0200
Subject: [PATCH 2/4] ice: xsk: Eliminate unnecessary loop iteration

The NIC Tx ring completion routine cleans entries from the ring in
batches. However, it processes one more batch than it is supposed to.
Note that this does not matter from a functionality point of view
since it will not find a set DD bit for the next batch and just exit
the loop. But from a performance perspective, it is faster to
terminate the loop earlier and not issue an expensive read over PCIe
just to get the DD bit.

Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220328142123.170157-3-maciej.fijalkowski@intel.com
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 88853a6ed931..51427cb4971a 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -754,7 +754,7 @@ skip:
 		next_dd = next_dd + tx_thresh;
 		if (next_dd >= desc_cnt)
 			next_dd = tx_thresh - 1;
-	} while (budget--);
+	} while (--budget);
 
 	xdp_ring->next_dd = next_dd;
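Aside: the allocation contract patch 1 above enforces -- return the
number of entries written and leave the results array untouched on
failure -- can be seen in isolation in the minimal user-space sketch
below. fake_alloc(), alloc_slow_path() and the one-slot ring are
hypothetical stand-ins for xp_alloc(), the xp_alloc_batch() slow path
and the driver SW ring; they are not kernel API.

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for xp_alloc(): may fail and return NULL. */
static void *fake_alloc(int fail)
{
	return fail ? NULL : malloc(16);
}

/* Returns 0 or 1 entries written, like the fixed slow path. On
 * failure, ring[0] keeps its old contents instead of being
 * overwritten with NULL.
 */
static unsigned int alloc_slow_path(void **ring, int fail)
{
	void *buff = fake_alloc(fail);

	if (buff)
		ring[0] = buff;
	return !!buff;
}

int main(void)
{
	void *old = &old;		/* pre-existing driver state */
	void *ring[1] = { old };

	printf("written: %u, ring[0] preserved: %d\n",
	       alloc_slow_path(ring, 1), ring[0] == old);
	return 0;
}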
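Likewise for patch 2: the extra batch is simply the difference between
post- and pre-decrement in a do/while condition. A self-contained
demonstration in plain C, nothing ice-specific:

#include <stdio.h>

int main(void)
{
	int budget, iters;

	budget = 3;
	iters = 0;
	do {
		iters++;	/* one batch of work */
	} while (budget--);	/* checks, then decrements: 4 passes */
	printf("post-decrement: %d iterations\n", iters);

	budget = 3;
	iters = 0;
	do {
		iters++;
	} while (--budget);	/* decrements, then checks: 3 passes */
	printf("pre-decrement:  %d iterations\n", iters);

	return 0;
}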
From 0ec1713009c5cc24244c918def1cd14080be27e3 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Mon, 28 Mar 2022 16:21:22 +0200
Subject: [PATCH 3/4] ice: xsk: Stop Rx processing when ntc catches ntu

This can happen with big budget values and some breakage of
re-filling descriptors, as we do not clear the entry that ntu is
pointing to at the end of ice_alloc_rx_bufs_zc(). So if ntc catches
up to ntu, it might be the case that status_error0 holds an old,
uncleared value and ntc would keep processing, which would yield
false results. Break the Rx loop when ntc == ntu to avoid this broken
behavior.

Fixes: 3876ff525de7 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220328142123.170157-4-maciej.fijalkowski@intel.com
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 51427cb4971a..dfbcaf08520e 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -608,6 +608,9 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 		 */
 		dma_rmb();
 
+		if (unlikely(rx_ring->next_to_clean == rx_ring->next_to_use))
+			break;
+
 		xdp = *ice_xdp_buf(rx_ring, rx_ring->next_to_clean);
 
 		size = le16_to_cpu(rx_desc->wb.pkt_len) &

From 1ac2524de7b366633fc336db6c94062768d0ab03 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Mon, 28 Mar 2022 16:21:23 +0200
Subject: [PATCH 4/4] ice: xsk: Fix indexing in ice_tx_xsk_pool()

The ice driver always tries to create the XDP rings array to be
num_possible_cpus() sized, regardless of the user's queue count
setting, which can be changed via ethtool -L for example. Currently,
ice_tx_xsk_pool() calculates the qid by decrementing ring->q_index by
the count of XDP queues, but ring->q_index is set to 'i +
vsi->alloc_txq'. When the user does ethtool -L $IFACE combined 1,
alloc_txq is 1, but vsi->num_xdp_txq is still num_possible_cpus().
ice_tx_xsk_pool() will then do an out-of-bounds access, and as a
final result the ring will not get its xsk_pool pointer assigned.
Each ice_xsk_wakeup() call will then fail with an error, and it will
not be possible to get into NAPI and do the processing from the
driver side.

Fix this by subtracting vsi->alloc_txq instead of vsi->num_xdp_txq
from ring->q_index in ice_tx_xsk_pool() so that the calculation
mirrors how ring->q_index is set.

Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com
---
 drivers/net/ethernet/intel/ice/ice.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index b0b27bfcd7a2..d4f1874df7d0 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -710,7 +710,7 @@ static inline struct xsk_buff_pool *ice_tx_xsk_pool(struct ice_tx_ring *ring)
 	struct ice_vsi *vsi = ring->vsi;
 	u16 qid;
 
-	qid = ring->q_index - vsi->num_xdp_txq;
+	qid = ring->q_index - vsi->alloc_txq;
 
 	if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps))
 		return NULL;
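As a toy model of the Rx situation patch 3 above guards against:
entries at and beyond ntu were never re-filled, so their status words
can still carry a stale "done" value from the previous lap around the
ring. The ring size, status values and variable names below are all
invented for illustration; this is user-space C, not the driver loop.

#include <stdio.h>

#define RING_SIZE 8

int main(void)
{
	/* Entries at index >= ntu were not re-filled; status[4] still
	 * holds a stale done bit from the previous lap.
	 */
	int status[RING_SIZE] = { 1, 1, 1, 1, 1, 0, 0, 0 };
	unsigned int ntc = 0, ntu = 4, budget = 16, processed = 0;

	while (budget--) {
		if (ntc == ntu)		/* the check patch 3 adds */
			break;
		if (!status[ntc])	/* a stale bit would not stop us here */
			break;
		status[ntc] = 0;	/* "process" the descriptor */
		ntc = (ntc + 1) % RING_SIZE;
		processed++;
	}

	printf("processed %u descriptors, stopped at ntc == ntu == %u\n",
	       processed, ntc);
	return 0;
}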
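And the indexing arithmetic patch 4 fixes, with made-up sizes (16
possible CPUs, one combined queue): only subtracting alloc_txq
recovers the index the XDP ring was created with, while subtracting
num_xdp_txq wraps the u16 and indexes far out of bounds.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint16_t alloc_txq = 1;		/* after ethtool -L $IFACE combined 1 */
	uint16_t num_xdp_txq = 16;	/* num_possible_cpus() on this box */
	uint16_t i = 0;			/* first XDP ring */
	uint16_t q_index = i + alloc_txq;	/* how the driver sets it */

	/* buggy: 1 - 16 wraps the u16 to 65521, far out of bounds */
	printf("buggy qid: %u\n", (uint16_t)(q_index - num_xdp_txq));
	/* fixed: recovers i == 0 */
	printf("fixed qid: %u\n", (uint16_t)(q_index - alloc_txq));
	return 0;
}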