Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>



 Part I: Feature sets
======================

Long gone are the days when a network card would just take and give packets
verbatim.  Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them.  Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device at the user's
    request.  This set should be initialized in the ndo_init callback and
    not changed later.

 2. netdev->features set contains features which are currently enabled
    for a device.  This should be changed only by the network core or in
    error paths of the ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (limits netdev->features set).  This is currently
    used for all VLAN devices whether tags are stripped or inserted in
    hardware or software.

 4. netdev->wanted_features set contains the feature set requested by the
    user.  This set is filtered by the ndo_fix_features callback whenever it
    or some device-specific conditions change.  This set is internal to the
    networking core and should not be referenced in drivers.
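
The relationship between these sets can be illustrated with a small userspace
model.  It is a sketch under stated assumptions: the F_* bit values and
struct model_netdev are hypothetical stand-ins (the real NETIF_F_* bits live
in include/linux/netdev_features.h), the initial wanted_features value is
modelled on what the core does at registration, and VLAN inheritance is
simplified to an intersection of the parent's features and vlan_features.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
typedef uint64_t feat_t;
#define F_SG      ((feat_t)1 << 0)
#define F_HW_CSUM ((feat_t)1 << 1)
#define F_TSO     ((feat_t)1 << 2)
#define F_RXCSUM  ((feat_t)1 << 3)

/* Simplified stand-in for the feature fields of struct net_device. */
struct model_netdev {
	feat_t hw_features;     /* user-changeable; set once at init */
	feat_t features;        /* currently enabled */
	feat_t vlan_features;   /* inheritable by child VLAN devices */
	feat_t wanted_features; /* user request; core-internal */
};

/* Driver side, as in ndo_init: wanted_features is not touched here. */
static void model_driver_init(struct model_netdev *dev)
{
	dev->hw_features = F_SG | F_HW_CSUM | F_TSO | F_RXCSUM;
	dev->features = dev->hw_features;      /* start with everything on */
	dev->vlan_features = F_SG | F_HW_CSUM; /* made-up: no TSO under VLANs */
}

/* Core side at registration: the initial request is whatever is
 * enabled and user-changeable. */
static void model_core_register(struct model_netdev *dev)
{
	dev->wanted_features = dev->features & dev->hw_features;
}

/* Simplified inheritance: a VLAN child gets only features that the
 * parent has enabled and also lists in vlan_features. */
static feat_t model_vlan_child_features(const struct model_netdev *parent)
{
	return parent->features & parent->vlan_features;
}
```

In this model a child never gets more than the parent's vlan_features, and it
loses a feature as soon as the parent disables it.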



 Part II: Controlling enabled features
=======================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features().  If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.
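
The recalculation just described can be sketched as a userspace model.  The
struct, the callback typedefs and the single core-imposed rule in
model_core_fix() ("TSO needs SG") are hypothetical simplifications; the
starting point (non-changeable bits keep their value, changeable bits follow
the user's request) is modelled on the core's handling of wanted_features.
The function returns whether a NETDEV_FEAT_CHANGE notification should follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t feat_t;
#define F_SG     ((feat_t)1 << 0)
#define F_TSO    ((feat_t)1 << 1)
#define F_RXCSUM ((feat_t)1 << 2)

struct model_netdev;
typedef feat_t (*fix_fn)(struct model_netdev *, feat_t);
typedef int (*set_fn)(struct model_netdev *, feat_t);

struct model_netdev {
	feat_t hw_features, features, wanted_features;
	fix_fn ndo_fix_features;  /* may be NULL: treated as success */
	set_fn ndo_set_features;  /* may be NULL: treated as success */
};

/* Hypothetical stand-in for netdev_fix_features(): TSO needs SG. */
static feat_t model_core_fix(feat_t f)
{
	if (!(f & F_SG))
		f &= ~F_TSO;
	return f;
}

/* Returns true when NETDEV_FEAT_CHANGE should be sent, i.e. whenever
 * the current set might have changed. */
static bool model_update_features(struct model_netdev *dev)
{
	/* Non-changeable bits keep their value; changeable ones follow the request. */
	feat_t f = (dev->features & ~dev->hw_features) | dev->wanted_features;
	int err = 0;

	if (dev->ndo_fix_features)
		f = dev->ndo_fix_features(dev, f);
	f = model_core_fix(f);

	if (f == dev->features)
		return false;           /* nothing to do, no notification */

	if (dev->ndo_set_features)
		err = dev->ndo_set_features(dev, f);
	if (err == 0)
		dev->features = f;      /* success: commit the new set */

	/* err < 0 is logged in the kernel, err > 0 is silent; in both cases
	 * the callback may have adjusted dev->features itself, so notify. */
	return true;
}
```

Returning true after a failed set callback reflects the "might have changed"
wording: the callback's error path may have rewritten dev->features.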

The following events trigger recalculation:
 1. the device's registration, after ndo_init returns success
 2. a user-requested change in feature state
 3. a call to netdev_update_features()

The ndo_*_features callbacks are called with rtnl_lock held. Missing
callbacks are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from ndo_*_features callbacks. netdev->features should not be modified by
the driver except by means of the ndo_fix_features callback.
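
A typical reason for a driver to call netdev_update_features() is a change in
device state that its ndo_fix_features callback depends on.  The model below
assumes a hypothetical device that cannot do TSO with an MTU above 1500; the
names, the threshold and the trimmed model_update_features() (no set
callback, no comparison) are illustrative only.  In the kernel the MTU-change
path runs with rtnl_lock held, which this userspace model does not reproduce.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t feat_t;
#define F_SG  ((feat_t)1 << 0)
#define F_TSO ((feat_t)1 << 1)

struct model_netdev {
	feat_t hw_features, features, wanted_features;
	unsigned int mtu;
};

/* Hypothetical ndo_fix_features: stateless, depends only on dev->mtu. */
static feat_t model_fix_features(const struct model_netdev *dev, feat_t f)
{
	if (dev->mtu > 1500)
		f &= ~F_TSO;    /* made-up hardware limitation */
	return f;
}

/* Trimmed recalculation: the user's request filtered by the fix callback. */
static void model_update_features(struct model_netdev *dev)
{
	feat_t f = (dev->features & ~dev->hw_features) | dev->wanted_features;

	dev->features = model_fix_features(dev, f);
}

/* Driver's MTU-change path: change the state, then trigger recalculation
 * (in the kernel: netdev_update_features() under rtnl_lock). */
static void model_change_mtu(struct model_netdev *dev, unsigned int new_mtu)
{
	dev->mtu = new_mtu;
	model_update_features(dev);
}
```

Because the user's request stays in wanted_features, TSO comes back on its own
once the MTU is lowered again.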



 Part III: Implementation hints
================================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core (as
coded in netdev_fix_features()). For this reason it is safer to disable a
feature when its dependencies are not met instead of forcing the dependency on.
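
To illustrate why disabling is safer than forcing, here is a hypothetical
ndo_fix_features-style function for a device whose TSO engine also needs
transmit checksumming, next to a variant that forces the dependency on.  The
bit names, the dependency and model_core_fix() (standing in for limitations
applied by netdev_fix_features()) are assumptions made for the example.  Both
functions are pure, as the callback should be.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t feat_t;
#define F_SG      ((feat_t)1 << 0)
#define F_HW_CSUM ((feat_t)1 << 1)
#define F_TSO     ((feat_t)1 << 2)

/* Preferred: drop the dependent feature when its dependency is missing. */
static feat_t model_fix_features(feat_t f)
{
	if (!(f & F_HW_CSUM))
		f &= ~F_TSO;
	return f;
}

/* Discouraged: force the dependency on. */
static feat_t model_fix_features_forcing(feat_t f)
{
	if (f & F_TSO)
		f |= F_HW_CSUM;
	return f;
}

/* Stand-in for core limitations applied after the driver's callback. */
static feat_t model_core_fix(feat_t f, feat_t allowed)
{
	return f & allowed;
}
```

If the core then strips F_HW_CSUM, the forcing variant leaves F_TSO enabled
without its dependency, while the disabling variant stays consistent.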

This callback should not modify hardware or driver state (it should be
stateless).  It can be called multiple times between successive
ndo_set_features calls.

The callback must not alter features contained in the NETIF_F_SOFT_FEATURES
or NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED, but
care must be taken as the change won't affect already configured VLANs.
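
A driver author might guard this rule with a debug check that compares the
set before and after the callback.  The sketch assumes hypothetical masks:
M_SOFT and M_NEVER_CHANGE stand in for NETIF_F_SOFT_FEATURES and
NETIF_F_NEVER_CHANGE, and the F_* bits are made up; the real definitions come
from include/linux/netdev_features.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t feat_t;
#define F_GSO             ((feat_t)1 << 0)  /* stand-in soft feature */
#define F_VLAN_CHALLENGED ((feat_t)1 << 1)
#define F_NETNS_LOCAL     ((feat_t)1 << 2)
#define F_TSO             ((feat_t)1 << 3)

#define M_SOFT         (F_GSO)
#define M_NEVER_CHANGE (F_VLAN_CHALLENGED | F_NETNS_LOCAL)

/* True if a fix callback turned `before` into `after` without touching
 * protected bits; VLAN_CHALLENGED is the allowed exception. */
static bool model_fix_respects_rules(feat_t before, feat_t after)
{
	feat_t protected_bits = (M_SOFT | M_NEVER_CHANGE) & ~F_VLAN_CHALLENGED;

	return ((before ^ after) & protected_bits) == 0;
}
```

Such a check could wrap every call to the driver's own callback in a debug
build.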

 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: a successful return is zero; >0 means a silent error.)
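
A hypothetical ndo_set_features-style function might apply changed bits one
at a time and, if the (simulated) hardware rejects one, write the state it
actually reached back into dev->features before returning an error.  The
hw_apply() hook, the hw_refuses field, the bit names and the choice of -EIO
are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t feat_t;
#define F_RXCSUM     ((feat_t)1 << 0)
#define F_HW_VLAN_RX ((feat_t)1 << 1)

struct model_netdev {
	feat_t features;
	feat_t hw_state;    /* what the simulated NIC is actually doing */
	feat_t hw_refuses;  /* bits the simulated NIC fails to toggle */
};

/* Simulated register write; fails for bits listed in hw_refuses. */
static bool hw_apply(struct model_netdev *dev, feat_t bit, bool on)
{
	if (dev->hw_refuses & bit)
		return false;
	if (on)
		dev->hw_state |= bit;
	else
		dev->hw_state &= ~bit;
	return true;
}

static int model_set_features(struct model_netdev *dev, feat_t wanted)
{
	feat_t changed = dev->features ^ wanted;

	for (int i = 0; i < 64; i++) {
		feat_t bit = (feat_t)1 << i;

		if (!(changed & bit))
			continue;
		if (!hw_apply(dev, bit, wanted & bit)) {
			/* Error path: make netdev->features match the hardware. */
			dev->features = dev->hw_state;
			return -EIO;
		}
	}
	return 0;  /* the core then copies `wanted` into dev->features */
}
```

After a zero return the core stores the requested set itself; after an error
return it leaves the value written by the callback in place.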



 Part IV: Features
===================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see comments near the top of include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.
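
What "anywhere in the packet" means can be seen in a software model of a
generic checksum engine: it sums 16-bit words from a start offset to the end
of the packet and stores the complemented result at a given offset from that
start, without parsing any headers.  The parameter names echo skb->csum_start
and skb->csum_offset from include/linux/skbuff.h, but the functions are
hypothetical, and the stack is assumed to have pre-seeded the checksum field
with the pseudo-header sum (zero in the test).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of big-endian 16-bit words in buf[from, to),
 * folded to 16 bits.  An odd trailing byte is padded with zero. */
static uint16_t ones_sum16(const uint8_t *buf, size_t from, size_t to)
{
	uint64_t sum = 0;
	size_t i;

	for (i = from; i + 1 < to; i += 2)
		sum += ((uint64_t)buf[i] << 8) | buf[i + 1];
	if (i < to)
		sum += (uint64_t)buf[i] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Generic engine: checksum everything from csum_start to the end and
 * write it at csum_start + csum_offset, whatever the headers are. */
static void model_hw_csum_fill(uint8_t *pkt, size_t len,
			       size_t csum_start, size_t csum_offset)
{
	uint16_t csum = (uint16_t)~ones_sum16(pkt, csum_start, len);
	size_t at = csum_start + csum_offset;

	pkt[at] = (uint8_t)(csum >> 8);
	pkt[at + 1] = (uint8_t)(csum & 0xff);
}
```

Special cases such as UDP's rule that a computed checksum of zero is sent as
0xFFFF are not modelled.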

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR
bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).
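
The sketch below assumes that "properly" means CWR, which signals one
congestion window reduction, stays on the first segment only instead of being
copied into every segment, while FIN and PSH stay on the last one, as in the
kernel's software segmentation.  The model_segment() helper is hypothetical;
the TH_* values are the standard TCP flag bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TH_FIN 0x01
#define TH_PSH 0x08
#define TH_ACK 0x10
#define TH_CWR 0x80

/* Fill out_flags[] with the TCP flags of each segment produced from a
 * payload of `len` bytes with the given MSS; returns the segment count
 * (0 on bad input or if out_flags is too small). */
static size_t model_segment(uint8_t flags, size_t len, size_t mss,
			    uint8_t *out_flags, size_t max_segs)
{
	size_t nsegs;

	if (mss == 0)
		return 0;
	nsegs = (len + mss - 1) / mss;
	if (nsegs == 0 || nsegs > max_segs)
		return 0;
	for (size_t i = 0; i < nsegs; i++) {
		uint8_t f = flags;

		if (i != 0)
			f &= (uint8_t)~TH_CWR;            /* CWR only on the first */
		if (i != nsegs - 1)
			f &= (uint8_t)~(TH_FIN | TH_PSH); /* FIN/PSH only on the last */
		out_flags[i] = f;
	}
	return nsegs;
}
```

Without NETIF_F_TSO_ECN, the core can be expected to segment such packets in
software rather than hand them to the device.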

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG for paged skbs (skb_shinfo()->frags) and NETIF_F_FRAGLIST for
chained skbs (skb->next/prev list).
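
The stack uses these bits to decide whether a non-linear skb must be flattened
before it reaches ndo_start_xmit.  Below is a userspace model of that decision,
loosely following skb_needs_linearize() in include/linux/skbuff.h;
struct model_skb and the bit values are hypothetical simplifications of
struct sk_buff and the NETIF_F_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t feat_t;
#define F_SG       ((feat_t)1 << 0)
#define F_FRAGLIST ((feat_t)1 << 1)

/* Simplified view of an skb's layout. */
struct model_skb {
	unsigned int nr_frags;  /* page fragments (skb_shinfo()->frags) */
	bool has_frag_list;     /* chained skbs */
};

/* True if the device cannot take this skb as-is, so it must be copied
 * into one linear buffer first. */
static bool model_needs_linearize(const struct model_skb *skb, feat_t features)
{
	bool nonlinear = skb->nr_frags > 0 || skb->has_frag_list;

	return nonlinear &&
	       ((skb->has_frag_list && !(features & F_FRAGLIST)) ||
		(skb->nr_frags > 0 && !(features & F_SG)));
}
```

Every true result costs a copy of the packet data, which is why advertising
NETIF_F_SG matters for drivers whose hardware supports it.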

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack. Drivers should not change their behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX should be set in drivers that implement their own locking in
the transmit path or don't need locking at all (e.g. software tunnels).
In ndo_start_xmit, it is recommended to use a try_lock and return
NETDEV_TX_LOCKED when acquiring the spin lock fails.  The locking should
also properly protect against other callbacks (the rules you need to find
out).
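
The try-lock pattern recommended above might look like this in a userspace
model that uses a C11 atomic_flag as the spin lock.  TX_OK and TX_LOCKED are
hypothetical stand-ins for NETDEV_TX_OK and NETDEV_TX_LOCKED, and
struct model_priv is made up; it is meant to help read existing LLTX drivers,
not to encourage new ones.

```c
#include <assert.h>
#include <stdatomic.h>

enum model_tx { TX_OK, TX_LOCKED };

struct model_priv {
	atomic_flag tx_lock;      /* driver-private TX lock */
	unsigned long tx_packets;
};

static enum model_tx model_start_xmit(struct model_priv *p)
{
	/* try_lock: test_and_set returns true if the lock was already held. */
	if (atomic_flag_test_and_set_explicit(&p->tx_lock, memory_order_acquire))
		return TX_LOCKED;     /* the caller is expected to retry later */

	p->tx_packets++;              /* ... hand the packet to hardware ... */

	atomic_flag_clear_explicit(&p->tx_lock, memory_order_release);
	return TX_OK;
}
```

In a kernel driver the equivalent would be spin_trylock() on a driver-private
spinlock_t.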

Don't use it for new drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

 * rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data.  This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.
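
With rx-fcs enabled, a sniffer can check the trailing four bytes itself.  The
sketch assumes the standard Ethernet CRC-32 (reflected polynomial 0xEDB88320,
initial value and final XOR 0xFFFFFFFF) with the FCS stored least significant
byte first; the function names are hypothetical, and a bitwise loop is used
for clarity rather than speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 as used for the Ethernet FCS. */
static uint32_t crc32_ether(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xffffffffu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* `frame` ends with the 4-byte FCS appended by a NIC with rx-fcs on. */
static bool model_fcs_ok(const uint8_t *frame, size_t len)
{
	if (len < 4)
		return false;
	uint32_t fcs = (uint32_t)frame[len - 4] |
		       (uint32_t)frame[len - 3] << 8 |
		       (uint32_t)frame[len - 2] << 16 |
		       (uint32_t)frame[len - 1] << 24;
	return crc32_ether(frame, len - 4) == fcs;
}
```

A single flipped bit anywhere in the frame makes the comparison fail.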

 * rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc.).  This can be helpful when sniffing a link with
bad packets on it.  Some NICs may receive more packets if also put into normal
PROMISC mode.