mirror of
https://github.com/torvalds/linux.git
synced 2024-12-27 13:22:23 +00:00
Merge branch 'docs-document-some-aspects-of-struct-sk_buff'
Jakub Kicinski says: ==================== docs: document some aspects of struct sk_buff This small set creates a place to render sk_buff documentation, documents one random thing (data-only skbs) and converts the big checksum comment to kdoc. ==================== Link: https://lore.kernel.org/r/20220323233715.2104106-1-kuba@kernel.org/ Link: https://lore.kernel.org/r/20220324231312.2241166-1-kuba@kernel.org/ Link: https://lore.kernel.org/r/20220509160456.1058940-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This commit is contained in:
commit
4c0c6e4cf7
@ -97,6 +97,7 @@ Contents:
|
||||
sctp
|
||||
secid
|
||||
seg6-sysctl
|
||||
skbuff
|
||||
smc-sysctl
|
||||
statistics
|
||||
strparser
|
||||
|
37
Documentation/networking/skbuff.rst
Normal file
37
Documentation/networking/skbuff.rst
Normal file
@ -0,0 +1,37 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
struct sk_buff
|
||||
==============
|
||||
|
||||
:c:type:`sk_buff` is the main networking structure representing
|
||||
a packet.
|
||||
|
||||
Basic sk_buff geometry
|
||||
----------------------
|
||||
|
||||
.. kernel-doc:: include/linux/skbuff.h
|
||||
:doc: Basic sk_buff geometry
|
||||
|
||||
Shared skbs and skb clones
|
||||
--------------------------
|
||||
|
||||
:c:member:`sk_buff.users` is a simple refcount allowing multiple entities
|
||||
to keep a struct sk_buff alive. skbs with a ``sk_buff.users != 1`` are referred
|
||||
to as shared skbs (see skb_shared()).
|
||||
|
||||
skb_clone() allows for fast duplication of skbs. None of the data buffers
|
||||
get copied, but caller gets a new metadata struct (struct sk_buff).
|
||||
&skb_shared_info.refcount indicates the number of skbs pointing at the same
|
||||
packet data (i.e. clones).
|
||||
|
||||
dataref and headerless skbs
|
||||
---------------------------
|
||||
|
||||
.. kernel-doc:: include/linux/skbuff.h
|
||||
:doc: dataref and headerless skbs
|
||||
|
||||
Checksum information
|
||||
--------------------
|
||||
|
||||
.. kernel-doc:: include/linux/skbuff.h
|
||||
:doc: skb checksums
|
@ -43,98 +43,112 @@
|
||||
#include <linux/netfilter/nf_conntrack_common.h>
|
||||
#endif
|
||||
|
||||
/* The interface for checksum offload between the stack and networking drivers
|
||||
/**
|
||||
* DOC: skb checksums
|
||||
*
|
||||
* The interface for checksum offload between the stack and networking drivers
|
||||
* is as follows...
|
||||
*
|
||||
* A. IP checksum related features
|
||||
* IP checksum related features
|
||||
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* Drivers advertise checksum offload capabilities in the features of a device.
|
||||
* From the stack's point of view these are capabilities offered by the driver.
|
||||
* A driver typically only advertises features that it is capable of offloading
|
||||
* to its device.
|
||||
*
|
||||
* The checksum related features are:
|
||||
* .. flat-table:: Checksum related device features
|
||||
* :widths: 1 10
|
||||
*
|
||||
* NETIF_F_HW_CSUM - The driver (or its device) is able to compute one
|
||||
* IP (one's complement) checksum for any combination
|
||||
* of protocols or protocol layering. The checksum is
|
||||
* computed and set in a packet per the CHECKSUM_PARTIAL
|
||||
* interface (see below).
|
||||
* * - %NETIF_F_HW_CSUM
|
||||
* - The driver (or its device) is able to compute one
|
||||
* IP (one's complement) checksum for any combination
|
||||
* of protocols or protocol layering. The checksum is
|
||||
* computed and set in a packet per the CHECKSUM_PARTIAL
|
||||
* interface (see below).
|
||||
*
|
||||
* NETIF_F_IP_CSUM - Driver (device) is only able to checksum plain
|
||||
* TCP or UDP packets over IPv4. These are specifically
|
||||
* unencapsulated packets of the form IPv4|TCP or
|
||||
* IPv4|UDP where the Protocol field in the IPv4 header
|
||||
* is TCP or UDP. The IPv4 header may contain IP options.
|
||||
* This feature cannot be set in features for a device
|
||||
* with NETIF_F_HW_CSUM also set. This feature is being
|
||||
* DEPRECATED (see below).
|
||||
* * - %NETIF_F_IP_CSUM
|
||||
* - Driver (device) is only able to checksum plain
|
||||
* TCP or UDP packets over IPv4. These are specifically
|
||||
* unencapsulated packets of the form IPv4|TCP or
|
||||
* IPv4|UDP where the Protocol field in the IPv4 header
|
||||
* is TCP or UDP. The IPv4 header may contain IP options.
|
||||
* This feature cannot be set in features for a device
|
||||
* with NETIF_F_HW_CSUM also set. This feature is being
|
||||
* DEPRECATED (see below).
|
||||
*
|
||||
* NETIF_F_IPV6_CSUM - Driver (device) is only able to checksum plain
|
||||
* TCP or UDP packets over IPv6. These are specifically
|
||||
* unencapsulated packets of the form IPv6|TCP or
|
||||
* IPv6|UDP where the Next Header field in the IPv6
|
||||
* header is either TCP or UDP. IPv6 extension headers
|
||||
* are not supported with this feature. This feature
|
||||
* cannot be set in features for a device with
|
||||
* NETIF_F_HW_CSUM also set. This feature is being
|
||||
* DEPRECATED (see below).
|
||||
* * - %NETIF_F_IPV6_CSUM
|
||||
* - Driver (device) is only able to checksum plain
|
||||
* TCP or UDP packets over IPv6. These are specifically
|
||||
* unencapsulated packets of the form IPv6|TCP or
|
||||
* IPv6|UDP where the Next Header field in the IPv6
|
||||
* header is either TCP or UDP. IPv6 extension headers
|
||||
* are not supported with this feature. This feature
|
||||
* cannot be set in features for a device with
|
||||
* NETIF_F_HW_CSUM also set. This feature is being
|
||||
* DEPRECATED (see below).
|
||||
*
|
||||
* NETIF_F_RXCSUM - Driver (device) performs receive checksum offload.
|
||||
* This flag is only used to disable the RX checksum
|
||||
* feature for a device. The stack will accept receive
|
||||
* checksum indication in packets received on a device
|
||||
* regardless of whether NETIF_F_RXCSUM is set.
|
||||
* * - %NETIF_F_RXCSUM
|
||||
* - Driver (device) performs receive checksum offload.
|
||||
* This flag is only used to disable the RX checksum
|
||||
* feature for a device. The stack will accept receive
|
||||
* checksum indication in packets received on a device
|
||||
* regardless of whether NETIF_F_RXCSUM is set.
|
||||
*
|
||||
* B. Checksumming of received packets by device. Indication of checksum
|
||||
* verification is set in skb->ip_summed. Possible values are:
|
||||
* Checksumming of received packets by device
|
||||
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* CHECKSUM_NONE:
|
||||
* Indication of checksum verification is set in &sk_buff.ip_summed.
|
||||
* Possible values are:
|
||||
*
|
||||
* - %CHECKSUM_NONE
|
||||
*
|
||||
* Device did not checksum this packet e.g. due to lack of capabilities.
|
||||
* The packet contains full (though not verified) checksum in packet but
|
||||
* not in skb->csum. Thus, skb->csum is undefined in this case.
|
||||
*
|
||||
* CHECKSUM_UNNECESSARY:
|
||||
* - %CHECKSUM_UNNECESSARY
|
||||
*
|
||||
* The hardware you're dealing with doesn't calculate the full checksum
|
||||
* (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
|
||||
* for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
|
||||
* if their checksums are okay. skb->csum is still undefined in this case
|
||||
* (as in %CHECKSUM_COMPLETE), but it does parse headers and verify checksums
|
||||
* for specific protocols. For such packets it will set %CHECKSUM_UNNECESSARY
|
||||
* if their checksums are okay. &sk_buff.csum is still undefined in this case
|
||||
* though. A driver or device must never modify the checksum field in the
|
||||
* packet even if checksum is verified.
|
||||
*
|
||||
* CHECKSUM_UNNECESSARY is applicable to following protocols:
|
||||
* TCP: IPv6 and IPv4.
|
||||
* UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a
|
||||
* %CHECKSUM_UNNECESSARY is applicable to following protocols:
|
||||
*
|
||||
* - TCP: IPv6 and IPv4.
|
||||
* - UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a
|
||||
* zero UDP checksum for either IPv4 or IPv6, the networking stack
|
||||
* may perform further validation in this case.
|
||||
* GRE: only if the checksum is present in the header.
|
||||
* SCTP: indicates the CRC in SCTP header has been validated.
|
||||
* FCOE: indicates the CRC in FC frame has been validated.
|
||||
* - GRE: only if the checksum is present in the header.
|
||||
* - SCTP: indicates the CRC in SCTP header has been validated.
|
||||
* - FCOE: indicates the CRC in FC frame has been validated.
|
||||
*
|
||||
* skb->csum_level indicates the number of consecutive checksums found in
|
||||
* the packet minus one that have been verified as CHECKSUM_UNNECESSARY.
|
||||
* &sk_buff.csum_level indicates the number of consecutive checksums found in
|
||||
* the packet minus one that have been verified as %CHECKSUM_UNNECESSARY.
|
||||
* For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet
|
||||
* and a device is able to verify the checksums for UDP (possibly zero),
|
||||
* GRE (checksum flag is set) and TCP, skb->csum_level would be set to
|
||||
* GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to
|
||||
* two. If the device were only able to verify the UDP checksum and not
|
||||
* GRE, either because it doesn't support GRE checksum or because GRE
|
||||
* checksum is bad, skb->csum_level would be set to zero (TCP checksum is
|
||||
* not considered in this case).
|
||||
*
|
||||
* CHECKSUM_COMPLETE:
|
||||
* - %CHECKSUM_COMPLETE
|
||||
*
|
||||
* This is the most generic way. The device supplied checksum of the _whole_
|
||||
* packet as seen by netif_rx() and fills in skb->csum. This means the
|
||||
* packet as seen by netif_rx() and fills in &sk_buff.csum. This means the
|
||||
* hardware doesn't need to parse L3/L4 headers to implement this.
|
||||
*
|
||||
* Notes:
|
||||
*
|
||||
* - Even if device supports only some protocols, but is able to produce
|
||||
* skb->csum, it MUST use CHECKSUM_COMPLETE, not CHECKSUM_UNNECESSARY.
|
||||
* - CHECKSUM_COMPLETE is not applicable to SCTP and FCoE protocols.
|
||||
*
|
||||
* CHECKSUM_PARTIAL:
|
||||
* - %CHECKSUM_PARTIAL
|
||||
*
|
||||
* A checksum is set up to be offloaded to a device as described in the
|
||||
* output description for CHECKSUM_PARTIAL. This may occur on a packet
|
||||
@ -146,14 +160,18 @@
|
||||
* packet that are after the checksum being offloaded are not considered to
|
||||
* be verified.
|
||||
*
|
||||
* C. Checksumming on transmit for non-GSO. The stack requests checksum offload
|
||||
* in the skb->ip_summed for a packet. Values are:
|
||||
* Checksumming on transmit for non-GSO
|
||||
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* CHECKSUM_PARTIAL:
|
||||
* The stack requests checksum offload in the &sk_buff.ip_summed for a packet.
|
||||
* Values are:
|
||||
*
|
||||
* - %CHECKSUM_PARTIAL
|
||||
*
|
||||
* The driver is required to checksum the packet as seen by hard_start_xmit()
|
||||
* from skb->csum_start up to the end, and to record/write the checksum at
|
||||
* offset skb->csum_start + skb->csum_offset. A driver may verify that the
|
||||
* from &sk_buff.csum_start up to the end, and to record/write the checksum at
|
||||
* offset &sk_buff.csum_start + &sk_buff.csum_offset.
|
||||
* A driver may verify that the
|
||||
* csum_start and csum_offset values are valid values given the length and
|
||||
* offset of the packet, but it should not attempt to validate that the
|
||||
* checksum refers to a legitimate transport layer checksum -- it is the
|
||||
@ -165,55 +183,66 @@
|
||||
* checksum calculation to the device, or call skb_checksum_help (in the case
|
||||
* that the device does not support offload for a particular checksum).
|
||||
*
|
||||
* NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM are being deprecated in favor of
|
||||
* NETIF_F_HW_CSUM. New devices should use NETIF_F_HW_CSUM to indicate
|
||||
* %NETIF_F_IP_CSUM and %NETIF_F_IPV6_CSUM are being deprecated in favor of
|
||||
* %NETIF_F_HW_CSUM. New devices should use %NETIF_F_HW_CSUM to indicate
|
||||
* checksum offload capability.
|
||||
* skb_csum_hwoffload_help() can be called to resolve CHECKSUM_PARTIAL based
|
||||
* skb_csum_hwoffload_help() can be called to resolve %CHECKSUM_PARTIAL based
|
||||
* on network device checksumming capabilities: if a packet does not match
|
||||
* them, skb_checksum_help or skb_crc32c_help (depending on the value of
|
||||
* csum_not_inet, see item D.) is called to resolve the checksum.
|
||||
* them, skb_checksum_help() or skb_crc32c_help() (depending on the value of
|
||||
* &sk_buff.csum_not_inet, see :ref:`crc`)
|
||||
* is called to resolve the checksum.
|
||||
*
|
||||
* CHECKSUM_NONE:
|
||||
* - %CHECKSUM_NONE
|
||||
*
|
||||
* The skb was already checksummed by the protocol, or a checksum is not
|
||||
* required.
|
||||
*
|
||||
* CHECKSUM_UNNECESSARY:
|
||||
* - %CHECKSUM_UNNECESSARY
|
||||
*
|
||||
* This has the same meaning as CHECKSUM_NONE for checksum offload on
|
||||
* output.
|
||||
*
|
||||
* CHECKSUM_COMPLETE:
|
||||
* - %CHECKSUM_COMPLETE
|
||||
*
|
||||
* Not used in checksum output. If a driver observes a packet with this value
|
||||
* set in skbuff, it should treat the packet as if CHECKSUM_NONE were set.
|
||||
* set in skbuff, it should treat the packet as if %CHECKSUM_NONE were set.
|
||||
*
|
||||
* D. Non-IP checksum (CRC) offloads
|
||||
* .. _crc:
|
||||
*
|
||||
* NETIF_F_SCTP_CRC - This feature indicates that a device is capable of
|
||||
* offloading the SCTP CRC in a packet. To perform this offload the stack
|
||||
* will set csum_start and csum_offset accordingly, set ip_summed to
|
||||
* CHECKSUM_PARTIAL and set csum_not_inet to 1, to provide an indication in
|
||||
* the skbuff that the CHECKSUM_PARTIAL refers to CRC32c.
|
||||
* A driver that supports both IP checksum offload and SCTP CRC32c offload
|
||||
* must verify which offload is configured for a packet by testing the
|
||||
* value of skb->csum_not_inet; skb_crc32c_csum_help is provided to resolve
|
||||
* CHECKSUM_PARTIAL on skbs where csum_not_inet is set to 1.
|
||||
* Non-IP checksum (CRC) offloads
|
||||
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* NETIF_F_FCOE_CRC - This feature indicates that a device is capable of
|
||||
* offloading the FCOE CRC in a packet. To perform this offload the stack
|
||||
* will set ip_summed to CHECKSUM_PARTIAL and set csum_start and csum_offset
|
||||
* accordingly. Note that there is no indication in the skbuff that the
|
||||
* CHECKSUM_PARTIAL refers to an FCOE checksum, so a driver that supports
|
||||
* both IP checksum offload and FCOE CRC offload must verify which offload
|
||||
* is configured for a packet, presumably by inspecting packet headers.
|
||||
* .. flat-table::
|
||||
* :widths: 1 10
|
||||
*
|
||||
* E. Checksumming on output with GSO.
|
||||
* * - %NETIF_F_SCTP_CRC
|
||||
* - This feature indicates that a device is capable of
|
||||
* offloading the SCTP CRC in a packet. To perform this offload the stack
|
||||
* will set csum_start and csum_offset accordingly, set ip_summed to
|
||||
* %CHECKSUM_PARTIAL and set csum_not_inet to 1, to provide an indication
|
||||
* in the skbuff that the %CHECKSUM_PARTIAL refers to CRC32c.
|
||||
* A driver that supports both IP checksum offload and SCTP CRC32c offload
|
||||
* must verify which offload is configured for a packet by testing the
|
||||
* value of &sk_buff.csum_not_inet; skb_crc32c_csum_help() is provided to
|
||||
* resolve %CHECKSUM_PARTIAL on skbs where csum_not_inet is set to 1.
|
||||
*
|
||||
* In the case of a GSO packet (skb_is_gso(skb) is true), checksum offload
|
||||
* * - %NETIF_F_FCOE_CRC
|
||||
* - This feature indicates that a device is capable of offloading the FCOE
|
||||
* CRC in a packet. To perform this offload the stack will set ip_summed
|
||||
* to %CHECKSUM_PARTIAL and set csum_start and csum_offset
|
||||
* accordingly. Note that there is no indication in the skbuff that the
|
||||
* %CHECKSUM_PARTIAL refers to an FCOE checksum, so a driver that supports
|
||||
* both IP checksum offload and FCOE CRC offload must verify which offload
|
||||
* is configured for a packet, presumably by inspecting packet headers.
|
||||
*
|
||||
* Checksumming on output with GSO
|
||||
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* In the case of a GSO packet (skb_is_gso() is true), checksum offload
|
||||
* is implied by the SKB_GSO_* flags in gso_type. Most obviously, if the
|
||||
* gso_type is SKB_GSO_TCPV4 or SKB_GSO_TCPV6, TCP checksum offload as
|
||||
* gso_type is %SKB_GSO_TCPV4 or %SKB_GSO_TCPV6, TCP checksum offload as
|
||||
* part of the GSO operation is implied. If a checksum is being offloaded
|
||||
* with GSO then ip_summed is CHECKSUM_PARTIAL, and both csum_start and
|
||||
* with GSO then ip_summed is %CHECKSUM_PARTIAL, and both csum_start and
|
||||
* csum_offset are set to refer to the outermost checksum being offloaded
|
||||
* (two offloaded checksums are possible with UDP encapsulation).
|
||||
*/
|
||||
@ -727,16 +756,32 @@ struct skb_shared_info {
|
||||
skb_frag_t frags[MAX_SKB_FRAGS];
|
||||
};
|
||||
|
||||
/* We divide dataref into two halves. The higher 16 bits hold references
|
||||
* to the payload part of skb->data. The lower 16 bits hold references to
|
||||
* the entire skb->data. A clone of a headerless skb holds the length of
|
||||
* the header in skb->hdr_len.
|
||||
/**
|
||||
* DOC: dataref and headerless skbs
|
||||
*
|
||||
* All users must obey the rule that the skb->data reference count must be
|
||||
* greater than or equal to the payload reference count.
|
||||
* Transport layers send out clones of payload skbs they hold for
|
||||
* retransmissions. To allow lower layers of the stack to prepend their headers
|
||||
* we split &skb_shared_info.dataref into two halves.
|
||||
* The lower 16 bits count the overall number of references.
|
||||
* The higher 16 bits indicate how many of the references are payload-only.
|
||||
* skb_header_cloned() checks if skb is allowed to add / write the headers.
|
||||
*
|
||||
* Holding a reference to the payload part means that the user does not
|
||||
* care about modifications to the header part of skb->data.
|
||||
* The creator of the skb (e.g. TCP) marks its skb as &sk_buff.nohdr
|
||||
* (via __skb_header_release()). Any clone created from marked skb will get
|
||||
* &sk_buff.hdr_len populated with the available headroom.
|
||||
* If there's the only clone in existence it's able to modify the headroom
|
||||
* at will. The sequence of calls inside the transport layer is::
|
||||
*
|
||||
* <alloc skb>
|
||||
* skb_reserve()
|
||||
* __skb_header_release()
|
||||
* skb_clone()
|
||||
* // send the clone down the stack
|
||||
*
|
||||
* This is not a very generic construct and it depends on the transport layers
|
||||
* doing the right thing. In practice there's usually only one payload-only skb.
|
||||
* Having multiple payload-only skbs with different lengths of hdr_len is not
|
||||
* possible. The payload-only skbs should never leave their owner.
|
||||
*/
|
||||
#define SKB_DATAREF_SHIFT 16
|
||||
#define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1)
|
||||
@ -800,6 +845,46 @@ typedef unsigned int sk_buff_data_t;
|
||||
typedef unsigned char *sk_buff_data_t;
|
||||
#endif
|
||||
|
||||
/**
|
||||
* DOC: Basic sk_buff geometry
|
||||
*
|
||||
* struct sk_buff itself is a metadata structure and does not hold any packet
|
||||
* data. All the data is held in associated buffers.
|
||||
*
|
||||
* &sk_buff.head points to the main "head" buffer. The head buffer is divided
|
||||
* into two parts:
|
||||
*
|
||||
* - data buffer, containing headers and sometimes payload;
|
||||
* this is the part of the skb operated on by the common helpers
|
||||
* such as skb_put() or skb_pull();
|
||||
* - shared info (struct skb_shared_info) which holds an array of pointers
|
||||
* to read-only data in the (page, offset, length) format.
|
||||
*
|
||||
* Optionally &skb_shared_info.frag_list may point to another skb.
|
||||
*
|
||||
* Basic diagram may look like this::
|
||||
*
|
||||
* ---------------
|
||||
* | sk_buff |
|
||||
* ---------------
|
||||
* ,--------------------------- + head
|
||||
* / ,----------------- + data
|
||||
* / / ,----------- + tail
|
||||
* | | | , + end
|
||||
* | | | |
|
||||
* v v v v
|
||||
* -----------------------------------------------
|
||||
* | headroom | data | tailroom | skb_shared_info |
|
||||
* -----------------------------------------------
|
||||
* + [page frag]
|
||||
* + [page frag]
|
||||
* + [page frag]
|
||||
* + [page frag] ---------
|
||||
* + frag_list --> | sk_buff |
|
||||
* ---------
|
||||
*
|
||||
*/
|
||||
|
||||
/**
|
||||
* struct sk_buff - socket buffer
|
||||
* @next: Next buffer in list
|
||||
@ -1987,8 +2072,10 @@ static inline int skb_header_unclone(struct sk_buff *skb, gfp_t pri)
|
||||
}
|
||||
|
||||
/**
|
||||
* __skb_header_release - release reference to header
|
||||
* @skb: buffer to operate on
|
||||
* __skb_header_release() - allow clones to use the headroom
|
||||
* @skb: buffer to operate on
|
||||
*
|
||||
* See "DOC: dataref and headerless skbs".
|
||||
*/
|
||||
static inline void __skb_header_release(struct sk_buff *skb)
|
||||
{
|
||||
|
Loading…
Reference in New Issue
Block a user