Merge branch 'xdp-preferred-busy-polling'
Björn Töpel says:
====================
This series introduces three new features:
1. A new "heavy traffic" busy-polling variant that works in concert
with the existing napi_defer_hard_irqs and gro_flush_timeout knobs.
2. A new socket option that lets a user change the busy-polling NAPI
budget.
3. Allow busy-polling to be performed on XDP sockets.
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket
option or system-wide using the /proc/sys/net/core/busy_read knob, is
opportunistic. That means that if the NAPI context is not scheduled,
it will poll it. If, after busy-polling, the budget is exceeded, the
busy-polling logic will schedule the NAPI onto the regular softirq
handling.

One implication of the behavior above is that a busy/heavily loaded
NAPI context will never enter/allow busy-polling. Some applications
prefer that most NAPI processing be done by busy-polling.
This series adds a new socket option, SO_PREFER_BUSY_POLL, that works
in concert with the napi_defer_hard_irqs and gro_flush_timeout
knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were
introduced in commit 6f8b12d661 ("net: napi: add hard irqs deferral
feature"), and allow a user to defer the enabling of interrupts and
instead schedule the NAPI context from a watchdog timer. When a user
enables SO_PREFER_BUSY_POLL, again with the other knobs enabled, and
the NAPI context is being processed by a softirq, the softirq NAPI
processing will exit early to allow the busy-polling to be performed.
If the application stops performing busy-polling via a system call,
the watchdog timer defined by gro_flush_timeout will time out, and
regular softirq handling will resume.

In summary: heavy traffic applications that prefer busy-polling over
softirq processing should use this option.
Patch 6 touches a lot of drivers, so the Cc: list is grossly long.
Example usage:
$ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
$ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
Note that the timeout should be larger than the userspace processing
window, otherwise the watchdog will time out and fall back to regular
softirq processing.

Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket,
e.g. via setsockopt() as sketched below.
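For reference, a minimal setsockopt() sketch (not part of the series
itself; it mirrors the apply_setsockopt() helper added to the xdpsock
sample, the fd, the 20 usec busy-poll time and the budget of 64 are
placeholder values, and the fallback defines match the generic uapi
values added by this series):

#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

/* Enable preferred busy-polling on an already created socket fd.
 * Turning on SO_PREFER_BUSY_POLL, and raising SO_BUSY_POLL_BUDGET
 * above its current value, requires CAP_NET_ADMIN.
 */
static void enable_preferred_busy_poll(int fd)
{
        int opt;

        opt = 1;        /* prefer busy-polling over softirq processing */
        if (setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                       &opt, sizeof(opt)) < 0) {
                perror("SO_PREFER_BUSY_POLL");
                exit(EXIT_FAILURE);
        }

        opt = 20;       /* busy-poll for up to 20 usec per syscall */
        if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL,
                       &opt, sizeof(opt)) < 0) {
                perror("SO_BUSY_POLL");
                exit(EXIT_FAILURE);
        }

        opt = 64;       /* NAPI budget used when busy-polling */
        if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                       &opt, sizeof(opt)) < 0) {
                perror("SO_BUSY_POLL_BUDGET");
                exit(EXIT_FAILURE);
        }
}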
Performance, simple UDP ping-pong:

A packet generator blasts UDP packets to a certain {src,dst} IP/port,
so a dedicated ksoftirqd will be busy handling the packets on a
certain core.

A simple UDP test program that does recvfrom/sendto is running at the
host end (a sketch of that loop follows below). Throughput in pps and
RTT latency is measured at the packet generator.

/proc/sys/net/core/busy_read is set to 20.
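A minimal sketch of such a blocking ping-pong responder (an assumption
of what the test program roughly looks like, not the exact program
used; the port and buffer size are placeholders):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
        struct sockaddr_in addr = {
                .sin_family = AF_INET,
                .sin_port = htons(7777),
                .sin_addr.s_addr = htonl(INADDR_ANY),
        };
        struct sockaddr_in peer;
        socklen_t peer_len;
        char buf[2048];
        ssize_t len;
        int fd;

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("socket/bind");
                return 1;
        }

        /* Receive a packet and echo it back to the sender. With
         * busy_read set, recvfrom() will busy-poll the NAPI context.
         */
        for (;;) {
                peer_len = sizeof(peer);
                len = recvfrom(fd, buf, sizeof(buf), 0,
                               (struct sockaddr *)&peer, &peer_len);
                if (len < 0)
                        continue;
                sendto(fd, buf, len, 0, (struct sockaddr *)&peer, peer_len);
        }

        close(fd);
        return 0;
}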
                                            Throughput    Min        Max        Avg (usec)
1. Blocking, 2 cores:                        490 Kpps   1218.192   1335.427   1271.083
2. Blocking, 1 core:                         155 Kpps   1327.195  17294.855   4761.367
3. Non-blocking, 2 cores:                    475 Kpps   1221.197   1330.465   1270.740
4. Non-blocking, 1 core:                       3 Kpps  29006.482  37260.465  33128.367
5. Non-blocking, prefer busy-poll, 1 core:   420 Kpps   1202.535   5494.052   4885.443
Scenarios 2 and 5 show when the new option should be used. Throughput
goes from 155 to 420 Kpps, average latency is similar, but the tail
latencies are much better for the latter.
Performance, XDP sockets:

Again, a packet generator blasts UDP packets to a certain {src,dst}
IP/port.

Today, when running the XDP socket sample on the same core as the
softirq handling, performance tanks mainly because we do not yield to
user-space when the XDP socket Rx queue is full.
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r
Rx: 64Kpps
# # preferred busy-polling, budget 8
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 8
Rx: 9.9Mpps
# # preferred busy-polling, budget 64
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 64
Rx: 19.3Mpps
# # preferred busy-polling, budget 256
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 256
Rx: 21.4Mpps
# # preferred busy-polling, budget 512
# taskset -c 5 ./xdpsock -i ens785f1 -q 5 -n 1 -r -B -b 512
Rx: 21.7Mpps
Compared to the two-core case:
# taskset -c 4 ./xdpsock -i ens785f1 -q 20 -n 1 -r
Rx: 20.7Mpps
We're getting better performance on a single core than on two cores
for this naïve drop scenario.
Performance, netperf UDP_RR:

Note that netperf UDP_RR is not a heavy traffic test, and preferred
busy-polling is not typically something we want to use here.
$ echo 20 | sudo tee /proc/sys/net/core/busy_read
$ netperf -H 192.168.1.1 -l 30 -t UDP_RR -v 2 -- \
-o min_latency,mean_latency,max_latency,stddev_latency,transaction_rate
busy-polling blocking sockets: 12,13.33,224,0.63,74731.177
I hacked netperf to use non-blocking sockets and re-ran:
busy-polling non-blocking sockets: 12,13.46,218,0.72,73991.172
prefer busy-polling non-blocking sockets: 12,13.62,221,0.59,73138.448
Using the preferred busy-polling mode does not impact performance.
The above tests were done for the 'ice' driver.
Thanks to Jakub for suggesting this busy-polling addition [1], and
Eric for all input/review!
Changes:
rfc-v1 [2] -> rfc-v2:
* Changed name from bias to prefer.
* Base the work on Eric's/Luigi's defer irq/gro timeout work.
* Proper GRO flushing.
* Fixed build issues for some XDP drivers.
rfc-v2 [3] -> v1:
* Fixed broken qlogic build.
* Do not trigger an IPI (XDP socket wakeup) when busy-polling is
enabled.
v1 [4] -> v2:
* Added napi_id to the socionext driver, and added Ilias' Acked-by. (Ilias)
* Added a samples patch to improve busy-polling for xdpsock/l2fwd.
* Correctly mark atomic operations with {WRITE,READ}_ONCE, to make
KCSAN and the code readers happy. (Eric)
* Check NAPI budget not to exceed U16_MAX. (Eric)
* Added kdoc.
v2 [5] -> v3:
* Collected Acked-by.
* Check for NAPI disable prior to prefer busy-polling. (Jakub)
* Added napi_id registration for virtio-net. (Michael)
* Added napi_id registration for veth.
v3 [6] -> v4:
* Collected Acked-by/Reviewed-by.
[1] https://lore.kernel.org/netdev/20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
[2] https://lore.kernel.org/bpf/20201028133437.212503-1-bjorn.topel@gmail.com/
[3] https://lore.kernel.org/bpf/20201105102812.152836-1-bjorn.topel@gmail.com/
[4] https://lore.kernel.org/bpf/20201112114041.131998-1-bjorn.topel@gmail.com/
[5] https://lore.kernel.org/bpf/20201116110416.10719-1-bjorn.topel@gmail.com/
[6] https://lore.kernel.org/bpf/20201119083024.119566-1-bjorn.topel@gmail.com/
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[The web-rendered diff of merge commit df54228515 did not survive
extraction intact (file names, +/- markers and context columns are
missing). The recoverable changes, in brief: new socket options
SO_PREFER_BUSY_POLL and SO_BUSY_POLL_BUDGET added to the uapi socket
headers (69/70 in the generic header, arch-specific values elsewhere)
and handled in the socket set/getsockopt code; xdp_rxq_info_reg()
gains a napi_id parameter, updated across all XDP-capable drivers; a
new NAPI_STATE_PREFER_BUSY_POLL flag and prefer_busy_poll/budget
arguments to napi_busy_loop() and busy_poll_stop(), with
busy_poll_stop() arming the gro_flush_timeout watchdog instead of
rescheduling the softirq; xsk_sendmsg()/xsk_recvmsg() call
sk_busy_loop() and skip the wakeup when busy-polling is preferred; and
the xdpsock sample gains a --busy-poll (-B) option that sets
SO_PREFER_BUSY_POLL, SO_BUSY_POLL and SO_BUSY_POLL_BUDGET.]