Joby Poriyath provided a xen-netback patch to reduce the size of xenvif structure as some netdev allocation could fail under memory pressure/fragmentation. This patch is handling the problem at the core level, allowing any netdev structures to use vmalloc() if kmalloc() failed. As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Joby Poriyath <joby.poriyath@citrix.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
		
			
				
	
	
		
			108 lines
		
	
	
		
			4.0 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			108 lines
		
	
	
		
			4.0 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 
 | |
| Network Devices, the Kernel, and You!
 | |
| 
 | |
| 
 | |
| Introduction
 | |
| ============
 | |
| The following is a random collection of documentation regarding
 | |
| network devices.
 | |
| 
 | |
| struct net_device allocation rules
 | |
| ==================================
 | |
| Network device structures need to persist even after module is unloaded and
 | |
| must be allocated with alloc_netdev_mqs() and friends.
 | |
| If device has registered successfully, it will be freed on last use
 | |
| by free_netdev(). This is required to handle the pathologic case cleanly
 | |
| (example: rmmod mydriver </sys/class/net/myeth/mtu )
 | |
| 
 | |
| alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver
 | |
| private data which gets freed when the network device is freed. If
 | |
| separately allocated data is attached to the network device
 | |
| (netdev_priv(dev)) then it is up to the module exit handler to free that.
 | |
| 
 | |
| MTU
 | |
| ===
 | |
| Each network device has a Maximum Transfer Unit. The MTU does not
 | |
| include any link layer protocol overhead. Upper layer protocols must
 | |
| not pass a socket buffer (skb) to a device to transmit with more data
 | |
| than the mtu. The MTU does not include link layer header overhead, so
 | |
| for example on Ethernet if the standard MTU is 1500 bytes used, the
 | |
| actual skb will contain up to 1514 bytes because of the Ethernet
 | |
| header. Devices should allow for the 4 byte VLAN header as well.
 | |
| 
 | |
| Segmentation Offload (GSO, TSO) is an exception to this rule.  The
 | |
| upper layer protocol may pass a large socket buffer to the device
 | |
| transmit routine, and the device will break that up into separate
 | |
| packets based on the current MTU.
 | |
| 
 | |
| MTU is symmetrical and applies both to receive and transmit. A device
 | |
| must be able to receive at least the maximum size packet allowed by
 | |
| the MTU. A network device may use the MTU as mechanism to size receive
 | |
| buffers, but the device should allow packets with VLAN header. With
 | |
| standard Ethernet mtu of 1500 bytes, the device should allow up to
 | |
| 1518 byte packets (1500 + 14 header + 4 tag).  The device may either:
 | |
| drop, truncate, or pass up oversize packets, but dropping oversize
 | |
| packets is preferred.
 | |
| 
 | |
| 
 | |
| struct net_device synchronization rules
 | |
| =======================================
 | |
| ndo_open:
 | |
| 	Synchronization: rtnl_lock() semaphore.
 | |
| 	Context: process
 | |
| 
 | |
| ndo_stop:
 | |
| 	Synchronization: rtnl_lock() semaphore.
 | |
| 	Context: process
 | |
| 	Note: netif_running() is guaranteed false
 | |
| 
 | |
| ndo_do_ioctl:
 | |
| 	Synchronization: rtnl_lock() semaphore.
 | |
| 	Context: process
 | |
| 
 | |
| ndo_get_stats:
 | |
| 	Synchronization: dev_base_lock rwlock.
 | |
| 	Context: nominally process, but don't sleep inside an rwlock
 | |
| 
 | |
| ndo_start_xmit:
 | |
| 	Synchronization: __netif_tx_lock spinlock.
 | |
| 
 | |
| 	When the driver sets NETIF_F_LLTX in dev->features this will be
 | |
| 	called without holding netif_tx_lock. In this case the driver
 | |
| 	has to lock by itself when needed. It is recommended to use a try lock
 | |
| 	for this and return NETDEV_TX_LOCKED when the spin lock fails.
 | |
| 	The locking there should also properly protect against 
 | |
| 	set_rx_mode. Note that the use of NETIF_F_LLTX is deprecated.
 | |
| 	Don't use it for new drivers.
 | |
| 
 | |
| 	Context: Process with BHs disabled or BH (timer),
 | |
| 	         will be called with interrupts disabled by netconsole.
 | |
| 
 | |
| 	Return codes: 
 | |
| 	o NETDEV_TX_OK everything ok. 
 | |
| 	o NETDEV_TX_BUSY Cannot transmit packet, try later 
 | |
| 	  Usually a bug, means queue start/stop flow control is broken in
 | |
| 	  the driver. Note: the driver must NOT put the skb in its DMA ring.
 | |
| 	o NETDEV_TX_LOCKED Locking failed, please retry quickly.
 | |
| 	  Only valid when NETIF_F_LLTX is set.
 | |
| 
 | |
| ndo_tx_timeout:
 | |
| 	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
 | |
| 	Context: BHs disabled
 | |
| 	Notes: netif_queue_stopped() is guaranteed true
 | |
| 
 | |
| ndo_set_rx_mode:
 | |
| 	Synchronization: netif_addr_lock spinlock.
 | |
| 	Context: BHs disabled
 | |
| 
 | |
| struct napi_struct synchronization rules
 | |
| ========================================
 | |
| napi->poll:
 | |
| 	Synchronization: NAPI_STATE_SCHED bit in napi->state.  Device
 | |
| 		driver's ndo_stop method will invoke napi_disable() on
 | |
| 		all NAPI instances which will do a sleeping poll on the
 | |
| 		NAPI_STATE_SCHED napi->state bit, waiting for all pending
 | |
| 		NAPI activity to cease.
 | |
| 	Context: softirq
 | |
| 	         will be called with interrupts disabled by netconsole.
 |