xsk: new descriptor addressing scheme
Currently, AF_XDP only supports a fixed frame-size memory scheme where each frame is referenced via an index (idx). A user passes the frame index to the kernel, and the kernel acts upon the data. Some NICs, however, do not have a fixed frame-size model; instead they have a model where a memory window is passed to the hardware and multiple frames are filled into that window (referred to as the "type-writer" model). By changing the descriptor format from the current frame index addressing scheme, AF_XDP can in the future be extended to support these kinds of NICs.

In the index-based model, an idx refers to a frame of size frame_size. Addressing a frame in the UMEM is done by offsetting the UMEM starting address by a global offset, idx * frame_size + offset. Communication via the fill- and completion-rings is done by means of idx.

In this commit, the idx is removed in favor of an address (addr), which is a relative address ranging over the UMEM. Converting an idx-based address to the new addr is simply: addr = idx * frame_size + offset. We also stop referring to the UMEM "frame" as a frame; it is now simply called a chunk.

To transfer ownership of a chunk to the kernel, the addr of the chunk is passed in the fill-ring. Note that the kernel will mask addr to make it chunk aligned, so there is no need for userspace to do that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or 3000 to the fill-ring will refer to the same chunk. On the completion-ring, the addr will match that of the Tx descriptor passed to the kernel.

Changing the descriptor format to use chunks/addr will allow for future changes to move to a type-writer based model, where multiple frames can reside in one chunk. In this model, passing one single chunk into the fill-ring could potentially result in multiple Rx descriptors.

This commit changes the uapi of AF_XDP sockets and updates the documentation.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
parent a509a95536
commit bbff2f321a
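
For readers mapping old index-based code to the new scheme, the two relations described above (the idx-to-addr conversion and the kernel's chunk-alignment mask) can be sketched in plain, standalone C. This is illustrative user-space code, not part of the patch, and it assumes a power-of-two chunk size as the patch requires:

    #include <stdint.h>
    #include <stdio.h>

    /* Old scheme: a 32-bit frame index plus an offset within that frame.
     * New scheme: a single 64-bit addr relative to the start of the UMEM.
     */
    static uint64_t idx_to_addr(uint32_t idx, uint32_t frame_size, uint32_t offset)
    {
            return (uint64_t)idx * frame_size + offset;
    }

    /* The kernel chunk-aligns any addr passed on the fill ring (chunk_size
     * must be a power of two), so userspace does not have to.
     */
    static uint64_t chunk_align(uint64_t addr, uint32_t chunk_size)
    {
            return addr & ~((uint64_t)chunk_size - 1);
    }

    int main(void)
    {
            /* idx 3 with a 128 byte offset, 2k frames: addr 6272 */
            printf("%llu\n", (unsigned long long)idx_to_addr(3, 2048, 128));
            /* 2048, 2050 and 3000 all alias the chunk starting at 2048 */
            printf("%llu\n", (unsigned long long)chunk_align(3000, 2048));
            return 0;
    }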
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -12,7 +12,7 @@ packet processing.
 
 This document assumes that the reader is familiar with BPF and XDP. If
 not, the Cilium project has an excellent reference guide at
-http://cilium.readthedocs.io/en/doc-1.0/bpf/.
+http://cilium.readthedocs.io/en/latest/bpf/.
 
 Using the XDP_REDIRECT action from an XDP program, the program can
 redirect ingress frames to other XDP enabled netdevs, using the
@@ -33,22 +33,22 @@ for a while due to a possible retransmit, the descriptor that points
 to that packet can be changed to point to another and reused right
 away. This again avoids copying data.
 
-The UMEM consists of a number of equally size frames and each frame
-has a unique frame id. A descriptor in one of the rings references a
-frame by referencing its frame id. The user space allocates memory for
-this UMEM using whatever means it feels is most appropriate (malloc,
-mmap, huge pages, etc). This memory area is then registered with the
-kernel using the new setsockopt XDP_UMEM_REG. The UMEM also has two
-rings: the FILL ring and the COMPLETION ring. The fill ring is used by
-the application to send down frame ids for the kernel to fill in with
-RX packet data. References to these frames will then appear in the RX
-ring once each packet has been received. The completion ring, on the
-other hand, contains frame ids that the kernel has transmitted
-completely and can now be used again by user space, for either TX or
-RX. Thus, the frame ids appearing in the completion ring are ids that
-were previously transmitted using the TX ring. In summary, the RX and
-FILL rings are used for the RX path and the TX and COMPLETION rings
-are used for the TX path.
+The UMEM consists of a number of equally sized chunks. A descriptor in
+one of the rings references a frame by referencing its addr. The addr
+is simply an offset within the entire UMEM region. The user space
+allocates memory for this UMEM using whatever means it feels is most
+appropriate (malloc, mmap, huge pages, etc). This memory area is then
+registered with the kernel using the new setsockopt XDP_UMEM_REG. The
+UMEM also has two rings: the FILL ring and the COMPLETION ring. The
+fill ring is used by the application to send down addr for the kernel
+to fill in with RX packet data. References to these frames will then
+appear in the RX ring once each packet has been received. The
+completion ring, on the other hand, contains frame addr that the
+kernel has transmitted completely and can now be used again by user
+space, for either TX or RX. Thus, the frame addrs appearing in the
+completion ring are addrs that were previously transmitted using the
+TX ring. In summary, the RX and FILL rings are used for the RX path
+and the TX and COMPLETION rings are used for the TX path.
 
 The socket is then finally bound with a bind() call to a device and a
 specific queue id on that device, and it is not until bind is
@@ -59,13 +59,13 @@ wants to do this, it simply skips the registration of the UMEM and its
 corresponding two rings, sets the XDP_SHARED_UMEM flag in the bind
 call and submits the XSK of the process it would like to share UMEM
 with as well as its own newly created XSK socket. The new process will
-then receive frame id references in its own RX ring that point to this
-shared UMEM. Note that since the ring structures are single-consumer /
-single-producer (for performance reasons), the new process has to
-create its own socket with associated RX and TX rings, since it cannot
-share this with the other process. This is also the reason that there
-is only one set of FILL and COMPLETION rings per UMEM. It is the
-responsibility of a single process to handle the UMEM.
+then receive frame addr references in its own RX ring that point to
+this shared UMEM. Note that since the ring structures are
+single-consumer / single-producer (for performance reasons), the new
+process has to create its own socket with associated RX and TX rings,
+since it cannot share this with the other process. This is also the
+reason that there is only one set of FILL and COMPLETION rings per
+UMEM. It is the responsibility of a single process to handle the UMEM.
 
 How is then packets distributed from an XDP program to the XSKs? There
 is a BPF map called XSKMAP (or BPF_MAP_TYPE_XSKMAP in full). The
@@ -102,10 +102,10 @@ UMEM
 
 UMEM is a region of virtual contiguous memory, divided into
 equal-sized frames. An UMEM is associated to a netdev and a specific
-queue id of that netdev. It is created and configured (frame size,
-frame headroom, start address and size) by using the XDP_UMEM_REG
-setsockopt system call. A UMEM is bound to a netdev and queue id, via
-the bind() system call.
+queue id of that netdev. It is created and configured (chunk size,
+headroom, start address and size) by using the XDP_UMEM_REG setsockopt
+system call. A UMEM is bound to a netdev and queue id, via the bind()
+system call.
 
 An AF_XDP is socket linked to a single UMEM, but one UMEM can have
 multiple AF_XDP sockets. To share an UMEM created via one socket A,
@@ -147,13 +147,17 @@ UMEM Fill Ring
 ~~~~~~~~~~~~~~
 
 The Fill ring is used to transfer ownership of UMEM frames from
-user-space to kernel-space. The UMEM indicies are passed in the
-ring. As an example, if the UMEM is 64k and each frame is 4k, then the
-UMEM has 16 frames and can pass indicies between 0 and 15.
+user-space to kernel-space. The UMEM addrs are passed in the ring. As
+an example, if the UMEM is 64k and each chunk is 4k, then the UMEM has
+16 chunks and can pass addrs between 0 and 64k.
 
 Frames passed to the kernel are used for the ingress path (RX rings).
 
-The user application produces UMEM indicies to this ring.
+The user application produces UMEM addrs to this ring. Note that the
+kernel will mask the incoming addr. E.g. for a chunk size of 2k, the
+log2(2048) LSB of the addr will be masked off, meaning that 2048, 2050
+and 3000 refers to the same chunk.
+
 
 UMEM Completetion Ring
 ~~~~~~~~~~~~~~~~~~~~~~
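
To make the fill-ring example above concrete (a 64k UMEM with 4k chunks), the chunk-aligned addrs user space would typically hand to the kernel can be enumerated like this; a standalone sketch, independent of any AF_XDP helper library:

    #include <stdint.h>
    #include <stdio.h>

    #define UMEM_SIZE  (64 * 1024)  /* 64k UMEM, as in the example above */
    #define CHUNK_SIZE (4 * 1024)   /* 4k chunks -> 16 chunks in total */

    int main(void)
    {
            uint64_t addr;

            /* Any addr in [0, 64k) is accepted and masked down by the kernel,
             * but the chunk base addresses are 0, 4096, ..., 61440.
             */
            for (addr = 0; addr < UMEM_SIZE; addr += CHUNK_SIZE)
                    printf("chunk base addr: %llu\n", (unsigned long long)addr);
            return 0;
    }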
@@ -165,16 +169,15 @@ used.
 Frames passed from the kernel to user-space are frames that has been
 sent (TX ring) and can be used by user-space again.
 
-The user application consumes UMEM indicies from this ring.
+The user application consumes UMEM addrs from this ring.
 
 
 RX Ring
 ~~~~~~~
 
 The RX ring is the receiving side of a socket. Each entry in the ring
-is a struct xdp_desc descriptor. The descriptor contains UMEM index
-(idx), the length of the data (len), the offset into the frame
-(offset).
+is a struct xdp_desc descriptor. The descriptor contains UMEM offset
+(addr) and the length of the data (len).
 
 If no frames have been passed to kernel via the Fill ring, no
 descriptors will (or can) appear on the RX ring.
@@ -221,38 +224,50 @@ side is xdpsock_user.c and the XDP side xdpsock_kern.c.
 
 Naive ring dequeue and enqueue could look like this::
 
+    // struct xdp_rxtx_ring {
+    //     __u32 *producer;
+    //     __u32 *consumer;
+    //     struct xdp_desc *desc;
+    // };
+
+    // struct xdp_umem_ring {
+    //     __u32 *producer;
+    //     __u32 *consumer;
+    //     __u64 *desc;
+    // };
+
     // typedef struct xdp_rxtx_ring RING;
     // typedef struct xdp_umem_ring RING;
 
     // typedef struct xdp_desc RING_TYPE;
-    // typedef __u32 RING_TYPE;
+    // typedef __u64 RING_TYPE;
 
     int dequeue_one(RING *ring, RING_TYPE *item)
     {
-        __u32 entries = ring->ptrs.producer - ring->ptrs.consumer;
+        __u32 entries = *ring->producer - *ring->consumer;
 
         if (entries == 0)
             return -1;
 
         // read-barrier!
 
-        *item = ring->desc[ring->ptrs.consumer & (RING_SIZE - 1)];
-        ring->ptrs.consumer++;
+        *item = ring->desc[*ring->consumer & (RING_SIZE - 1)];
+        (*ring->consumer)++;
         return 0;
     }
 
     int enqueue_one(RING *ring, const RING_TYPE *item)
     {
-        u32 free_entries = RING_SIZE - (ring->ptrs.producer - ring->ptrs.consumer);
+        u32 free_entries = RING_SIZE - (*ring->producer - *ring->consumer);
 
         if (free_entries == 0)
             return -1;
 
-        ring->desc[ring->ptrs.producer & (RING_SIZE - 1)] = *item;
+        ring->desc[*ring->producer & (RING_SIZE - 1)] = *item;
 
         // write-barrier!
 
-        ring->ptrs.producer++;
+        (*ring->producer)++;
         return 0;
     }
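
The naive dequeue/enqueue pseudo code above can be exercised outside the kernel. The following is a single-threaded, self-contained adaptation for the UMEM (fill/completion) ring only, with the descriptor type widened to a 64-bit addr as this patch does; the struct layout is a stand-in for the real mmapped rings and the memory barriers are reduced to comments:

    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 64  /* must be a power of two */

    /* Simplified stand-in for an mmapped UMEM fill/completion ring. */
    struct umem_ring {
            uint32_t *producer;
            uint32_t *consumer;
            uint64_t *desc;   /* UMEM descriptors are now 64-bit addrs */
    };

    static int enqueue_one(struct umem_ring *ring, uint64_t addr)
    {
            uint32_t free_entries = RING_SIZE - (*ring->producer - *ring->consumer);

            if (free_entries == 0)
                    return -1;
            ring->desc[*ring->producer & (RING_SIZE - 1)] = addr;
            /* a real implementation issues a write barrier here */
            (*ring->producer)++;
            return 0;
    }

    static int dequeue_one(struct umem_ring *ring, uint64_t *addr)
    {
            uint32_t entries = *ring->producer - *ring->consumer;

            if (entries == 0)
                    return -1;
            /* a real implementation issues a read barrier here */
            *addr = ring->desc[*ring->consumer & (RING_SIZE - 1)];
            (*ring->consumer)++;
            return 0;
    }

    int main(void)
    {
            uint32_t prod = 0, cons = 0;
            uint64_t descs[RING_SIZE];
            struct umem_ring ring = { &prod, &cons, descs };
            uint64_t addr;

            enqueue_one(&ring, 2048);  /* hand the chunk at addr 2048 to the ring */
            if (!dequeue_one(&ring, &addr))
                    printf("dequeued addr %llu\n", (unsigned long long)addr);
            return 0;
    }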
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -48,8 +48,8 @@ struct xdp_mmap_offsets {
 struct xdp_umem_reg {
         __u64 addr; /* Start of packet data area */
         __u64 len; /* Length of packet data area */
-        __u32 frame_size; /* Frame size */
-        __u32 frame_headroom; /* Frame head room */
+        __u32 chunk_size;
+        __u32 headroom;
 };
 
 struct xdp_statistics {
@@ -66,13 +66,11 @@ struct xdp_statistics {
 
 /* Rx/Tx descriptor */
 struct xdp_desc {
-        __u32 idx;
+        __u64 addr;
         __u32 len;
-        __u16 offset;
-        __u8 flags;
-        __u8 padding[5];
+        __u32 options;
 };
 
-/* UMEM descriptor is __u32 */
+/* UMEM descriptor is __u64 */
 
 #endif /* _LINUX_IF_XDP_H */
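
With the renamed uapi fields, user-space UMEM registration changes only in what the fields are called. A minimal sketch of the registration step (error handling trimmed, constants guarded for older libc headers; the ring setup and bind() that follow are omitted):

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/if_xdp.h>

    #ifndef AF_XDP
    #define AF_XDP 44          /* in case the libc headers predate AF_XDP */
    #endif
    #ifndef SOL_XDP
    #define SOL_XDP 283
    #endif

    #define NUM_CHUNKS 4096
    #define CHUNK_SIZE 2048    /* power of two, >= XDP_UMEM_MIN_CHUNK_SIZE */

    int main(void)
    {
            struct xdp_umem_reg mr;
            void *bufs;
            int fd;

            fd = socket(AF_XDP, SOCK_RAW, 0);
            if (fd < 0)
                    return 1;

            /* Page-aligned UMEM area, carved into equally sized chunks. */
            if (posix_memalign(&bufs, getpagesize(), NUM_CHUNKS * CHUNK_SIZE))
                    return 1;

            mr.addr = (uintptr_t)bufs;          /* start of packet data area */
            mr.len = NUM_CHUNKS * CHUNK_SIZE;   /* length of packet data area */
            mr.chunk_size = CHUNK_SIZE;         /* was frame_size before this patch */
            mr.headroom = 0;                    /* was frame_headroom */

            if (setsockopt(fd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)))
                    return 1;

            /* ...create fill/completion rings, mmap them, then bind() to a
             * netdev and queue id, as described in the documentation above. */
            close(fd);
            return 0;
    }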
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -14,7 +14,7 @@
 
 #include "xdp_umem.h"
 
-#define XDP_UMEM_MIN_FRAME_SIZE 2048
+#define XDP_UMEM_MIN_CHUNK_SIZE 2048
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
@@ -151,12 +151,12 @@ static int xdp_umem_account_pages(struct xdp_umem *umem)
 
 static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
 {
-        u32 frame_size = mr->frame_size, frame_headroom = mr->frame_headroom;
+        u32 chunk_size = mr->chunk_size, headroom = mr->headroom;
+        unsigned int chunks, chunks_per_page;
         u64 addr = mr->addr, size = mr->len;
-        unsigned int nframes, nfpp;
         int size_chk, err;
 
-        if (frame_size < XDP_UMEM_MIN_FRAME_SIZE || frame_size > PAGE_SIZE) {
+        if (chunk_size < XDP_UMEM_MIN_CHUNK_SIZE || chunk_size > PAGE_SIZE) {
                 /* Strictly speaking we could support this, if:
                  * - huge pages, or*
                  * - using an IOMMU, or
@@ -166,7 +166,7 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
                 return -EINVAL;
         }
 
-        if (!is_power_of_2(frame_size))
+        if (!is_power_of_2(chunk_size))
                 return -EINVAL;
 
         if (!PAGE_ALIGNED(addr)) {
@@ -179,33 +179,30 @@ static int xdp_umem_reg(struct xdp_umem *umem, struct xdp_umem_reg *mr)
         if ((addr + size) < addr)
                 return -EINVAL;
 
-        nframes = (unsigned int)div_u64(size, frame_size);
-        if (nframes == 0 || nframes > UINT_MAX)
+        chunks = (unsigned int)div_u64(size, chunk_size);
+        if (chunks == 0)
                 return -EINVAL;
 
-        nfpp = PAGE_SIZE / frame_size;
-        if (nframes < nfpp || nframes % nfpp)
+        chunks_per_page = PAGE_SIZE / chunk_size;
+        if (chunks < chunks_per_page || chunks % chunks_per_page)
                 return -EINVAL;
 
-        frame_headroom = ALIGN(frame_headroom, 64);
+        headroom = ALIGN(headroom, 64);
 
-        size_chk = frame_size - frame_headroom - XDP_PACKET_HEADROOM;
+        size_chk = chunk_size - headroom - XDP_PACKET_HEADROOM;
         if (size_chk < 0)
                 return -EINVAL;
 
         umem->pid = get_task_pid(current, PIDTYPE_PID);
-        umem->size = (size_t)size;
         umem->address = (unsigned long)addr;
-        umem->props.frame_size = frame_size;
-        umem->props.nframes = nframes;
-        umem->frame_headroom = frame_headroom;
+        umem->props.chunk_mask = ~((u64)chunk_size - 1);
+        umem->props.size = size;
+        umem->headroom = headroom;
+        umem->chunk_size_nohr = chunk_size - headroom;
         umem->npgs = size / PAGE_SIZE;
         umem->pgs = NULL;
         umem->user = NULL;
 
-        umem->frame_size_log2 = ilog2(frame_size);
-        umem->nfpp_mask = nfpp - 1;
-        umem->nfpplog2 = ilog2(nfpp);
         refcount_set(&umem->users, 1);
 
         err = xdp_umem_account_pages(umem);
--- a/net/xdp/xdp_umem.h
+++ b/net/xdp/xdp_umem.h
@@ -18,35 +18,20 @@ struct xdp_umem {
         struct xsk_queue *cq;
         struct page **pgs;
         struct xdp_umem_props props;
-        u32 npgs;
-        u32 frame_headroom;
-        u32 nfpp_mask;
-        u32 nfpplog2;
-        u32 frame_size_log2;
+        u32 headroom;
+        u32 chunk_size_nohr;
         struct user_struct *user;
         struct pid *pid;
         unsigned long address;
-        size_t size;
         refcount_t users;
         struct work_struct work;
+        u32 npgs;
 };
 
-static inline char *xdp_umem_get_data(struct xdp_umem *umem, u32 idx)
+static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
-        u64 pg, off;
-        char *data;
-
-        pg = idx >> umem->nfpplog2;
-        off = (idx & umem->nfpp_mask) << umem->frame_size_log2;
-
-        data = page_address(umem->pgs[pg]);
-        return data + off;
-}
-
-static inline char *xdp_umem_get_data_with_headroom(struct xdp_umem *umem,
-                                                    u32 idx)
-{
-        return xdp_umem_get_data(umem, idx) + umem->frame_headroom;
+        return page_address(umem->pgs[addr >> PAGE_SHIFT]) +
+                (addr & (PAGE_SIZE - 1));
 }
 
 bool xdp_umem_validate_queues(struct xdp_umem *umem);
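
The new xdp_umem_get_data() above reduces the lookup to a page index (addr >> PAGE_SHIFT) plus an intra-page offset, replacing the old frame_size_log2/nfpp math. A tiny worked example of that arithmetic, assuming the common 4 KiB page size:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define PAGE_SIZE  (1UL << PAGE_SHIFT)   /* 4096; architecture dependent */

    int main(void)
    {
            uint64_t addr = 6272;  /* e.g. chunk at 6144 (2k chunks) + 128 bytes */
            uint64_t page = addr >> PAGE_SHIFT;      /* pinned page 1 */
            uint64_t off  = addr & (PAGE_SIZE - 1);  /* offset 2176 within it */

            printf("page %llu, offset %llu\n",
                   (unsigned long long)page, (unsigned long long)off);
            return 0;
    }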
--- a/net/xdp/xdp_umem_props.h
+++ b/net/xdp/xdp_umem_props.h
@@ -7,8 +7,8 @@
 #define XDP_UMEM_PROPS_H_
 
 struct xdp_umem_props {
-        u32 frame_size;
-        u32 nframes;
+        u64 chunk_mask;
+        u64 size;
 };
 
 #endif /* XDP_UMEM_PROPS_H_ */
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -41,24 +41,27 @@ bool xsk_is_setup_for_bpf_map(struct xdp_sock *xs)
 
 static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
-        u32 id, len = xdp->data_end - xdp->data;
+        u32 len = xdp->data_end - xdp->data;
         void *buffer;
+        u64 addr;
         int err;
 
         if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
                 return -EINVAL;
 
-        if (!xskq_peek_id(xs->umem->fq, &id)) {
+        if (!xskq_peek_addr(xs->umem->fq, &addr) ||
+            len > xs->umem->chunk_size_nohr) {
                 xs->rx_dropped++;
                 return -ENOSPC;
         }
 
-        buffer = xdp_umem_get_data_with_headroom(xs->umem, id);
+        addr += xs->umem->headroom;
+
+        buffer = xdp_umem_get_data(xs->umem, addr);
         memcpy(buffer, xdp->data, len);
-        err = xskq_produce_batch_desc(xs->rx, id, len,
-                                      xs->umem->frame_headroom);
+        err = xskq_produce_batch_desc(xs->rx, addr, len);
         if (!err)
-                xskq_discard_id(xs->umem->fq);
+                xskq_discard_addr(xs->umem->fq);
         else
                 xs->rx_dropped++;
 
@@ -95,10 +98,10 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 
 static void xsk_destruct_skb(struct sk_buff *skb)
 {
-        u32 id = (u32)(long)skb_shinfo(skb)->destructor_arg;
+        u64 addr = (u64)(long)skb_shinfo(skb)->destructor_arg;
         struct xdp_sock *xs = xdp_sk(skb->sk);
 
-        WARN_ON_ONCE(xskq_produce_id(xs->umem->cq, id));
+        WARN_ON_ONCE(xskq_produce_addr(xs->umem->cq, addr));
 
         sock_wfree(skb);
 }
@@ -123,14 +126,15 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
 
         while (xskq_peek_desc(xs->tx, &desc)) {
                 char *buffer;
-                u32 id, len;
+                u64 addr;
+                u32 len;
 
                 if (max_batch-- == 0) {
                         err = -EAGAIN;
                         goto out;
                 }
 
-                if (xskq_reserve_id(xs->umem->cq)) {
+                if (xskq_reserve_addr(xs->umem->cq)) {
                         err = -EAGAIN;
                         goto out;
                 }
@@ -153,8 +157,8 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
                 }
 
                 skb_put(skb, len);
-                id = desc.idx;
-                buffer = xdp_umem_get_data(xs->umem, id) + desc.offset;
+                addr = desc.addr;
+                buffer = xdp_umem_get_data(xs->umem, addr);
                 err = skb_store_bits(skb, 0, buffer, len);
                 if (unlikely(err)) {
                         kfree_skb(skb);
@@ -164,7 +168,7 @@ static int xsk_generic_xmit(struct sock *sk, struct msghdr *m,
                 skb->dev = xs->dev;
                 skb->priority = sk->sk_priority;
                 skb->mark = sk->sk_mark;
-                skb_shinfo(skb)->destructor_arg = (void *)(long)id;
+                skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
                 skb->destructor = xsk_destruct_skb;
 
                 err = dev_direct_xmit(skb, xs->queue_id);
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -17,7 +17,7 @@ void xskq_set_umem(struct xsk_queue *q, struct xdp_umem_props *umem_props)
 
 static u32 xskq_umem_get_ring_size(struct xsk_queue *q)
 {
-        return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u32);
+        return sizeof(struct xdp_umem_ring) + q->nentries * sizeof(u64);
 }
 
 static u32 xskq_rxtx_get_ring_size(struct xsk_queue *q)
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -27,7 +27,7 @@ struct xdp_rxtx_ring {
 /* Used for the fill and completion queues for buffers */
 struct xdp_umem_ring {
         struct xdp_ring ptrs;
-        u32 desc[0] ____cacheline_aligned_in_smp;
+        u64 desc[0] ____cacheline_aligned_in_smp;
 };
 
 struct xsk_queue {
@@ -76,24 +76,25 @@ static inline u32 xskq_nb_free(struct xsk_queue *q, u32 producer, u32 dcnt)
 
 /* UMEM queue */
 
-static inline bool xskq_is_valid_id(struct xsk_queue *q, u32 idx)
+static inline bool xskq_is_valid_addr(struct xsk_queue *q, u64 addr)
 {
-        if (unlikely(idx >= q->umem_props.nframes)) {
+        if (addr >= q->umem_props.size) {
                 q->invalid_descs++;
                 return false;
         }
 
         return true;
 }
 
-static inline u32 *xskq_validate_id(struct xsk_queue *q, u32 *id)
+static inline u64 *xskq_validate_addr(struct xsk_queue *q, u64 *addr)
 {
         while (q->cons_tail != q->cons_head) {
                 struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
                 unsigned int idx = q->cons_tail & q->ring_mask;
 
-                *id = READ_ONCE(ring->desc[idx]);
-                if (xskq_is_valid_id(q, *id))
-                        return id;
+                *addr = READ_ONCE(ring->desc[idx]) & q->umem_props.chunk_mask;
+                if (xskq_is_valid_addr(q, *addr))
+                        return addr;
 
                 q->cons_tail++;
         }
@@ -101,7 +102,7 @@ static inline u32 *xskq_validate_id(struct xsk_queue *q, u32 *id)
         return NULL;
 }
 
-static inline u32 *xskq_peek_id(struct xsk_queue *q, u32 *id)
+static inline u64 *xskq_peek_addr(struct xsk_queue *q, u64 *addr)
 {
         if (q->cons_tail == q->cons_head) {
                 WRITE_ONCE(q->ring->consumer, q->cons_tail);
@@ -111,19 +112,19 @@ static inline u32 *xskq_peek_id(struct xsk_queue *q, u32 *id)
                 smp_rmb();
         }
 
-        return xskq_validate_id(q, id);
+        return xskq_validate_addr(q, addr);
 }
 
-static inline void xskq_discard_id(struct xsk_queue *q)
+static inline void xskq_discard_addr(struct xsk_queue *q)
 {
         q->cons_tail++;
 }
 
-static inline int xskq_produce_id(struct xsk_queue *q, u32 id)
+static inline int xskq_produce_addr(struct xsk_queue *q, u64 addr)
 {
         struct xdp_umem_ring *ring = (struct xdp_umem_ring *)q->ring;
 
-        ring->desc[q->prod_tail++ & q->ring_mask] = id;
+        ring->desc[q->prod_tail++ & q->ring_mask] = addr;
 
         /* Order producer and data */
         smp_wmb();
@@ -132,7 +133,7 @@ static inline int xskq_produce_id(struct xsk_queue *q, u32 id)
         return 0;
 }
 
-static inline int xskq_reserve_id(struct xsk_queue *q)
+static inline int xskq_reserve_addr(struct xsk_queue *q)
 {
         if (xskq_nb_free(q, q->prod_head, 1) == 0)
                 return -ENOSPC;
@@ -145,16 +146,11 @@ static inline int xskq_reserve_id(struct xsk_queue *q)
 
 static inline bool xskq_is_valid_desc(struct xsk_queue *q, struct xdp_desc *d)
 {
-        u32 buff_len;
-
-        if (unlikely(d->idx >= q->umem_props.nframes)) {
-                q->invalid_descs++;
-                return false;
-        }
+        if (!xskq_is_valid_addr(q, d->addr))
+                return false;
 
-        buff_len = q->umem_props.frame_size;
-        if (unlikely(d->len > buff_len || d->len == 0 ||
-                     d->offset > buff_len || d->offset + d->len > buff_len)) {
+        if (((d->addr + d->len) & q->umem_props.chunk_mask) !=
+            (d->addr & q->umem_props.chunk_mask)) {
                 q->invalid_descs++;
                 return false;
         }
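
The rewritten xskq_is_valid_desc() above boils down to two conditions: addr must lie inside the UMEM, and the whole [addr, addr + len) range must stay within a single chunk. A standalone check with concrete numbers (2k chunks and a 64k UMEM assumed for illustration):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_SIZE 2048ULL
    #define UMEM_SIZE  (64 * 1024ULL)

    static bool desc_is_valid(uint64_t addr, uint32_t len)
    {
            uint64_t chunk_mask = ~(CHUNK_SIZE - 1);  /* same mask the kernel stores */

            if (addr >= UMEM_SIZE)
                    return false;
            /* addr and addr + len must fall within the same chunk */
            return ((addr + len) & chunk_mask) == (addr & chunk_mask);
    }

    int main(void)
    {
            printf("%d\n", desc_is_valid(2100, 100));  /* 1: stays in the chunk at 2048 */
            printf("%d\n", desc_is_valid(4000, 200));  /* 0: crosses into the next chunk */
            return 0;
    }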
@@ -199,7 +195,7 @@ static inline void xskq_discard_desc(struct xsk_queue *q)
 }
 
 static inline int xskq_produce_batch_desc(struct xsk_queue *q,
-                                          u32 id, u32 len, u16 offset)
+                                          u64 addr, u32 len)
 {
         struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
         unsigned int idx;
@@ -208,9 +204,8 @@ static inline int xskq_produce_batch_desc(struct xsk_queue *q,
                 return -ENOSPC;
 
         idx = (q->prod_head++) & q->ring_mask;
-        ring->desc[idx].idx = id;
+        ring->desc[idx].addr = addr;
         ring->desc[idx].len = len;
-        ring->desc[idx].offset = offset;
 
         return 0;
 }