linux/net/ipv4
Florian Westphal df5042f4c5 sk_buff: add skb extension infrastructure
This adds an optional extension infrastructure, with ispec (xfrm) and
bridge netfilter as first users.
objdiff shows no changes if kernel is built without xfrm and br_netfilter
support.

The third (planned future) user is Multipath TCP which is still
out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.

This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
and MPTCP connection.

Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.

mptcp has same requirements as secpath/bridge netfilter:

1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.

The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
mapping for tx and rx processing.

Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
   are available for this skb.
   This has two purposes.
   a) avoids the need to initialize the pointer.
   b) allows to "delete" an extension by clearing its bit
   value in ->active_extensions.

   While it would be possible to store the active_extensions byte
   in the extension struct instead of sk_buff, there is one problem
   with this:
    When an extension has to be disabled, we can always clear the
    bit in skb->active_extensions.  But in case it would be stored in the
    extension buffer itself, we might have to COW it first, if
    we are dealing with a cloned skb.  On kmalloc failure we would
    be unable to turn an extension off.

2. extension pointer, located at the end of the sk_buff.
   If the active_extensions byte is 0, the pointer is undefined,
   it is not initialized on skb allocation.

This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.

It is possible to add support for extensions that are not preseved on
clones/copies.

To do this, it would be needed to define a bitmask of all extensions that
need copy/cow semantics, and change __skb_ext_copy() to check
->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
->active_extensions to 0 on the new clone.

This isn't done here because all extensions that get added here
need the copy/cow semantics.

v2:
Allocate entire extension space using kmem_cache.
Upside is that this allows better tracking of used memory,
downside is that we will allocate more space than strictly needed in
most cases (its unlikely that all extensions are active/needed at same
time for same skb).
The allocated memory (except the small extension header) is not cleared,
so no additonal overhead aside from memory usage.

Avoid atomic_dec_and_test operation on skb_ext_put()
by using similar trick as kfree_skbmem() does with fclone_ref:
If recount is 1, there is no concurrent user and we can free right away.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
..
bpfilter bpfilter: remove trailing newline 2018-07-24 14:10:42 -07:00
netfilter netfilter: avoid using skb->nf_bridge directly 2018-12-19 11:21:37 -08:00
af_inet.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
ah4.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
arp.c net: Evict neighbor entries on carrier down 2018-10-12 09:47:39 -07:00
cipso_ipv4.c net/ipv4: defensive cipso option parsing 2018-09-17 19:37:46 -07:00
datagram.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
devinet.c net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
esp4_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
esp4.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-10-01 22:31:17 -07:00
fib_frontend.c net: Don't return invalid table id error when dumping all families 2018-10-24 14:06:25 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
fib_semantics.c net: Add extack argument to ip_fib_metrics_init 2018-11-06 15:00:45 -08:00
fib_trie.c net/ipv4: Plumb support for filtering route dumps 2018-10-16 00:13:12 -07:00
fou.c fou: Prevent unbounded recursion in GUE error handler 2018-12-17 21:38:04 -08:00
gre_demux.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
gre_offload.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
icmp.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
igmp.c ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12 2018-10-29 20:26:06 -07:00
inet_connection_sock.c inet: minor optimization for backlog setting in listen(2) 2018-11-07 22:31:07 -08:00
inet_diag.c sock_diag: request _diag module only when the family or proto has been registered 2018-03-12 11:03:42 -04:00
inet_fragment.c inet: frags: better deal with smp races 2018-11-08 18:40:30 -08:00
inet_hashtables.c net: dccp: initialize (addr,port) listening hashtable 2018-12-17 23:11:48 -08:00
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-04-07 22:32:32 -04:00
inetpeer.c inetpeer: fix uninit-value in inet_getpeer 2018-04-09 10:57:35 -04:00
ip_forward.c net: Do not route unicast IP packets twice 2018-12-04 08:36:36 -08:00
ip_fragment.c ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes 2018-12-05 20:44:46 -08:00
ip_gre.c net: Add netif_is_gretap()/netif_is_ip6gretap() 2018-12-10 15:53:04 -08:00
ip_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 21:43:31 -08:00
ip_options.c
ip_output.c sk_buff: add skb extension infrastructure 2018-12-19 11:21:37 -08:00
ip_sockglue.c net: bpfilter: fix iptables failure if bpfilter_umh is disabled 2018-11-05 17:12:18 -08:00
ip_tunnel_core.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
ip_tunnel.c ip_tunnel: be careful when accessing the inner header 2018-09-24 12:27:04 -07:00
ip_vti.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
ipcomp.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
ipconfig.c ipconfig: convert to DEFINE_SHOW_ATTRIBUTE 2018-12-15 11:21:22 -08:00
ipip.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
ipmr_base.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
ipmr.c ipmr: Drop mfc_cache argument to ipmr_queue_xmit 2018-12-17 23:31:14 -08:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
metrics.c net: Add extack argument to ip_fib_metrics_init 2018-11-06 15:00:45 -08:00
netfilter.c netfilter: utils: move nf_ip_checksum* from ipv4 to utils 2018-07-16 17:51:48 +02:00
netlink.c ipv4: support sport, dport and ip_proto in RTM_GETROUTE 2018-05-23 15:14:12 -04:00
ping.c ipv4: Allow sending multicast packets on specific i/f using VRF socket 2018-10-02 22:28:17 -07:00
proc.c tcp: implement coalescing on backlog queue 2018-11-30 13:26:54 -08:00
protocol.c fou, fou6: ICMP error handlers for FoU and GUE 2018-11-08 17:13:08 -08:00
raw_diag.c
raw.c net/ipv4: Fix missing raw_init when CONFIG_PROC_FS is disabled 2018-11-27 20:58:02 -08:00
route.c ipv4: Don't try to print ASCII of link level header in martian dumps. 2018-11-20 10:15:36 -08:00
syncookies.c tcp: provide earliest departure time in skb->tstamp 2018-09-21 19:37:59 -07:00
sysctl_net_ipv4.c net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs 2018-11-07 16:12:38 -08:00
tcp_bbr.c tcp_bbr: update comments to reflect pacing_margin_percent 2018-11-08 20:46:17 -08:00
tcp_bic.c
tcp_bpf.c bpf: helper to pop data from messages 2018-11-28 22:07:57 +01:00
tcp_cdg.c tcp: cdg: use tcp high resolution clock cache 2018-10-15 22:56:42 -07:00
tcp_cong.c tcp: Namespace-ify sysctl_tcp_default_congestion_control 2017-11-15 14:09:52 +09:00
tcp_cubic.c
tcp_dctcp.c tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_dctcp.h tcp: refactor DCTCP ECN ACK handling 2018-10-10 22:26:00 -07:00
tcp_diag.c net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store 2017-12-20 14:00:25 -05:00
tcp_fastopen.c tcp: pause Fast Open globally after third consecutive timeout 2017-12-13 15:51:12 -05:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-02-28 12:03:47 -05:00
tcp_input.c tcp: take care of compressed acks in tcp_add_reno_sack() 2018-11-30 13:26:53 -08:00
tcp_ipv4.c tcp: md5: add tcp_md5_needed jump label 2018-11-30 13:28:03 -08:00
tcp_lp.c
tcp_metrics.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
tcp_minisocks.c tcp: do not restart timewait timer on rst reception 2018-08-31 23:10:35 -07:00
tcp_nv.c tcp_nv: fix potential integer overflow in tcpnv_acked 2018-01-31 10:26:30 -05:00
tcp_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
tcp_output.c tcp: handle EOR and FIN conditions the same in tcp_tso_should_defer() 2018-12-10 12:09:15 -08:00
tcp_rate.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_recovery.c tcp: introduce tcp_skb_timestamp_us() helper 2018-09-21 19:37:59 -07:00
tcp_scalable.c
tcp_timer.c tcp: fix SNMP TCP timeout under-estimation 2018-11-30 17:22:41 -08:00
tcp_ulp.c tcp, ulp: remove socket lock assertion on ULP cleanup 2018-10-16 12:38:41 -07:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tcp.c tcp: fix code style in tcp_recvmsg() 2018-12-06 12:19:47 -08:00
tunnel4.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_diag.c net: diag: document swapped src/dst in udp_dump_one. 2018-10-28 19:27:21 -07:00
udp_impl.h net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_offload.c udp: use indirect call wrappers for GRO socket lookup 2018-12-15 13:23:02 -08:00
udp_tunnel.c udp_tunnel: add config option to bind to a device 2018-12-03 14:15:26 -08:00
udp.c net: udp: prefer listeners bound to an address 2018-12-14 15:55:20 -08:00
udplite.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm4_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm4_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm4_output.c net: xfrm: use skb_gso_validate_network_len() to check gso sizes 2018-03-04 17:49:17 -05:00
xfrm4_policy.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
xfrm4_protocol.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm4_state.c
xfrm4_tunnel.c