linux/net/ipv6
Florian Westphal df5042f4c5 sk_buff: add skb extension infrastructure
This adds an optional extension infrastructure, with ispec (xfrm) and
bridge netfilter as first users.
objdiff shows no changes if kernel is built without xfrm and br_netfilter
support.

The third (planned future) user is Multipath TCP which is still
out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.

This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
and MPTCP connection.

Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.

mptcp has same requirements as secpath/bridge netfilter:

1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.

The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
mapping for tx and rx processing.

Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
   are available for this skb.
   This has two purposes.
   a) avoids the need to initialize the pointer.
   b) allows to "delete" an extension by clearing its bit
   value in ->active_extensions.

   While it would be possible to store the active_extensions byte
   in the extension struct instead of sk_buff, there is one problem
   with this:
    When an extension has to be disabled, we can always clear the
    bit in skb->active_extensions.  But in case it would be stored in the
    extension buffer itself, we might have to COW it first, if
    we are dealing with a cloned skb.  On kmalloc failure we would
    be unable to turn an extension off.

2. extension pointer, located at the end of the sk_buff.
   If the active_extensions byte is 0, the pointer is undefined,
   it is not initialized on skb allocation.

This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.

It is possible to add support for extensions that are not preseved on
clones/copies.

To do this, it would be needed to define a bitmask of all extensions that
need copy/cow semantics, and change __skb_ext_copy() to check
->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
->active_extensions to 0 on the new clone.

This isn't done here because all extensions that get added here
need the copy/cow semantics.

v2:
Allocate entire extension space using kmem_cache.
Upside is that this allows better tracking of used memory,
downside is that we will allocate more space than strictly needed in
most cases (its unlikely that all extensions are active/needed at same
time for same skb).
The allocated memory (except the small extension header) is not cleared,
so no additonal overhead aside from memory usage.

Avoid atomic_dec_and_test operation on skb_ext_put()
by using similar trick as kfree_skbmem() does with fclone_ref:
If recount is 1, there is no concurrent user and we can free right away.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
..
ila ila: remove blank lines at EOF 2018-07-24 14:10:42 -07:00
netfilter netfilter: avoid using skb->nf_bridge directly 2018-12-19 11:21:37 -08:00
addrconf_core.c net/ipv6: Add helper to return path MTU based on fib result 2018-05-22 10:51:09 +02:00
addrconf.c net: core: dev: Add extack argument to dev_open() 2018-12-06 13:26:06 -08:00
addrlabel.c net/ipv6: Update ip6addrlbl_dump for strict data checking 2018-10-08 10:39:05 -07:00
af_inet6.c net/ipv6: Add anycast addresses to a global hashtable 2018-11-02 23:54:56 -07:00
ah6.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-11-15 11:56:19 -08:00
anycast.c net/ipv6: compute anycast address hash only if dev is null 2018-11-08 17:04:43 -08:00
calipso.c ipv6: make ipv6_renew_options() interrupt/kernel safe 2018-07-05 20:15:26 +09:00
datagram.c net: ensure unbound datagram socket to be chosen when not in a VRF 2018-11-07 16:12:38 -08:00
esp6_offload.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2018-07-27 09:33:37 -07:00
esp6.c esp: remove redundant define esph 2018-08-29 08:02:43 +02:00
exthdrs_core.c net: ipv6: Fix typo in ipv6_find_hdr() documentation 2018-05-07 23:50:27 -04:00
exthdrs_offload.c
exthdrs.c ipv6: make ipv6_renew_options() interrupt/kernel safe 2018-07-05 20:15:26 +09:00
fib6_notifier.c
fib6_rules.c net/ipv6: Add fib6_lookup 2018-05-11 00:10:56 +02:00
fou6.c fou, fou6: ICMP error handlers for FoU and GUE 2018-11-08 17:13:08 -08:00
icmp.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
inet6_connection_sock.c
inet6_hashtables.c net: tcp6: prefer listeners bound to an address 2018-12-14 15:55:20 -08:00
ip6_checksum.c net: udp: fix handling of CHECKSUM_COMPLETE packets 2018-10-24 14:18:16 -07:00
ip6_fib.c ipv6: properly check return value in inet6_dump_all() 2018-11-05 17:04:54 -08:00
ip6_flowlabel.c ipv6: fold sockcm_cookie into ipcm6_cookie 2018-07-07 10:58:49 +09:00
ip6_gre.c net: Add netif_is_gretap()/netif_is_ip6gretap() 2018-12-10 15:53:04 -08:00
ip6_icmp.c
ip6_input.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 21:43:31 -08:00
ip6_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
ip6_offload.h
ip6_output.c sk_buff: add skb extension infrastructure 2018-12-19 11:21:37 -08:00
ip6_tunnel.c ip6_tunnel: Fix encapsulation layout 2018-10-18 16:54:40 -07:00
ip6_udp_tunnel.c udp_tunnel: add config option to bind to a device 2018-12-03 14:15:26 -08:00
ip6_vti.c vti6: remove !skb->ignore_df check from vti6_xmit() 2018-08-29 17:51:44 -07:00
ip6mr.c ip6mr: Drop mfc6_cache argument to ip6mr_forward2 2018-12-17 23:31:14 -08:00
ipcomp6.c
ipv6_sockglue.c ipv6: allow ping to link-local address in VRF 2018-11-07 16:12:39 -08:00
Kconfig net: remove blank lines at end of file 2018-07-24 14:10:43 -07:00
Makefile
mcast_snoop.c
mcast.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
mip6.c
ndisc.c ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called 2018-10-26 15:58:06 -07:00
netfilter.c netfilter: ipv6: Preserve link scope traffic original oif 2018-11-27 00:12:20 +01:00
output_core.c net: accept UFO datagrams from tuntap and packet 2017-11-24 01:37:35 +09:00
ping.c ipv6: fold sockcm_cookie into ipcm6_cookie 2018-07-07 10:58:49 +09:00
proc.c proc: introduce proc_create_net_single 2018-05-16 07:24:30 +02:00
protocol.c
raw.c net: fix raw socket lookup device bind matching with VRFs 2018-11-07 16:12:39 -08:00
reassembly.c ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes 2018-12-05 20:44:46 -08:00
route.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
seg6_hmac.c Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-03 10:29:26 +09:00
seg6_iptunnel.c ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output 2018-12-07 12:22:39 -08:00
seg6_local.c bpf: add End.DT6 action to bpf_lwt_seg6_action helper 2018-07-31 09:22:48 +02:00
seg6.c rhashtable: split rhashtable.h 2018-06-22 13:43:27 +09:00
sit.c net-ipv4: remove 2 always zero parameters from ipv4_redirect() 2018-09-26 20:30:55 -07:00
syncookies.c net/ipv4: disable SMC TCP option with SYN Cookies 2018-03-25 20:53:54 -04:00
sysctl_net_ipv6.c ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode 2018-04-25 13:02:15 -04:00
tcp_ipv6.c ipv6: Fix handling of LLA with VRF and sockets bound to VRF 2018-12-15 11:36:14 -08:00
tcpv6_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
tunnel6.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_impl.h net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
udp_offload.c net: use indirect call wrappers at GRO transport layer 2018-12-15 13:23:02 -08:00
udp.c net: udp6: prefer listeners bound to an address 2018-12-14 15:55:20 -08:00
udplite.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm6_input.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm6_mode_beet.c
xfrm6_mode_ro.c ipv6: xfrm: use 64-bit timestamps 2018-07-11 15:26:35 +02:00
xfrm6_mode_transport.c xfrm: reset transport header back to network header after all input transforms ahave been applied 2018-09-04 10:26:30 +02:00
xfrm6_mode_tunnel.c xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto 2018-03-07 10:54:29 +01:00
xfrm6_output.c xfrm6: call kfree_skb when skb is toobig 2018-09-03 07:37:57 +02:00
xfrm6_policy.c xfrm6: remove BUG_ON from xfrm6_dst_ifdown 2018-11-22 07:55:48 +01:00
xfrm6_protocol.c net: Convert protocol error handlers from void to int 2018-11-08 17:13:08 -08:00
xfrm6_state.c xfrm: remove VLA usage in __xfrm6_sort() 2018-04-26 07:51:48 +02:00
xfrm6_tunnel.c xfrm: Fix warning in xfrm6_tunnel_net_exit. 2018-04-16 07:50:09 +02:00