linux/net/ipv4
David S. Miller 7d384846b9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.

Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:

        nft add table netdev x
        nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
        nft add rule netdev x y drop

And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using -i
option. perf report shows nf_tables calls in its top 10:

    17.30%  kpktgend_0   [nf_tables]            [k] nft_do_chain
    15.75%  kpktgend_0   [kernel.vmlinux]       [k] __netif_receive_skb_core
    10.39%  kpktgend_0   [nf_tables_netdev]     [k] nft_do_chain_netdev

I'm measuring here an improvement of ~15% in performance with this
patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.

This rework contains more specifically, in strict order, these patches:

1) Remove compile-time debugging from core.

2) Remove obsolete comments that predate the rcu era. These days it is
   well known that a Netfilter hook always runs under rcu_read_lock().

3) Remove threshold handling, this is only used by br_netfilter too.
   We already have specific code to handle this from br_netfilter,
   so remove this code from the core path.

4) Deprecate NF_STOP, as this is only used by br_netfilter.

5) Place nf_state_hook pointer into xt_action_param structure, so
   this structure fits into one single cacheline according to pahole.
   This also implicit affects nftables since it also relies on the
   xt_action_param structure.

6) Move state->hook_entries into nf_queue entry. The hook_entries
   pointer is only required by nf_queue(), so we can store this in the
   queue entry instead.

7) use switch() statement to handle verdict cases.

8) Remove hook_entries field from nf_hook_state structure, this is only
   required by nf_queue, so store it in nf_queue_entry structure.

9) Merge nf_iterate() into nf_hook_slow() that results in a much more
   simple and readable function.

10) Handle NF_REPEAT away from the core, so far the only client is
    nf_conntrack_in() and we can restart the packet processing using a
    simple goto to jump back there when the TCP requires it.
    This update required a second pass to fix fallout, fix from
    Arnd Bergmann.

11) Set random seed from nft_hash when no seed is specified from
    userspace.

12) Simplify nf_tables expression registration, in a much smarter way
    to save lots of boiler plate code, by Liping Zhang.

13) Simplify layer 4 protocol conntrack tracker registration, from
    Davide Caratti.

14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
    to recent generalization of the socket infrastructure, from Arnd
    Bergmann.

15) Then, the ipset batch from Jozsef, he describes it as it follows:

* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
  in skbinfo get/init helper funcions.
* Use kmalloc() in comment extension helper instead of kzalloc()
  because it is unnecessary to zero out the area just before
  explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
  together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
  across all set types.
* Count non-static extension memory into memsize calculation for
  userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
  they can be get from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
  one level of intendation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
  types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
  and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
  was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
  by Muhammad Falak R Wani, individually for the set type families.

16) Remove useless connlabel field in struct netns_ct, patch from
    Florian Westphal.

17) xt_find_table_lock() doesn't return ERR_PTR() anymore, so simplify
    {ip,ip6,arp}tables code that uses this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-13 22:41:25 -05:00
..
netfilter netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test 2016-11-13 22:26:13 +01:00
af_inet.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
ah4.c ah4: Fix error return in ah_input(). 2015-08-25 13:38:50 -07:00
arp.c net: rename NET_{ADD|INC}_STATS_BH() 2016-04-27 22:48:24 -04:00
cipso_ipv4.c Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/selinux into next 2016-07-07 10:15:34 +10:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c netconf: add a notif when settings are created 2016-09-01 15:18:08 -07:00
esp4.c esp: Fix ESN generation under UDP encapsulation 2016-06-23 11:52:00 -04:00
fib_frontend.c net: core: add UID to flows, rules, and routes 2016-11-04 14:45:23 -04:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c switchdev: remove FIB offload infrastructure 2016-09-28 04:48:00 -04:00
fib_semantics.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
fib_trie.c switchdev: remove FIB offload infrastructure 2016-09-28 04:48:00 -04:00
fou.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
gre_demux.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-06-30 05:03:36 -04:00
gre_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
icmp.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
igmp.c net/multicast: should not send source list records when have filter mode change 2016-08-08 16:04:39 -07:00
inet_connection_sock.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
inet_diag.c tcp: no longer hold ehash lock while calling tcp_get_info() 2016-11-09 13:02:27 -05:00
inet_fragment.c net: disable fragment reassembly if high_thresh is zero 2016-06-05 22:56:42 -04:00
inet_hashtables.c net: Require exact match for TCP socket lookups if dif is l3mdev 2016-10-17 10:17:05 -04:00
inet_timewait_sock.c timers, net/ipv4/inet: Initialize connection request timers as pinned 2016-07-07 10:35:06 +02:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags 2016-07-19 16:40:22 -07:00
ip_fragment.c net: rename IP_INC_STATS_BH() 2016-04-27 22:48:23 -04:00
ip_gre.c net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() 2016-09-10 20:53:55 -07:00
ip_input.c net: VRF: Pass original iif to ip_route_input() 2016-09-16 04:24:07 -04:00
ip_options.c net: ipv4: Convert IP network timestamps to be y2038 safe 2016-03-01 17:18:44 -05:00
ip_output.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
ip_sockglue.c net/sock: add an explicit sk argument for ip_cmsg_recv_offset() 2016-11-07 13:24:41 -05:00
ip_tunnel_core.c ip_tunnel: do not clear l4 hashes 2016-09-09 19:33:11 -07:00
ip_tunnel.c ipv4/6: use core net MTU range checking 2016-10-20 14:51:10 -04:00
ip_vti.c vti: use right inner_mode for inbound inter address family policy checks 2016-09-09 09:02:08 +02:00
ipcomp.c
ipconfig.c net: ipconfig: Fix NULL pointer dereference on RARP/BOOTP/DHCP timeout 2016-08-22 21:04:41 -07:00
ipip.c ip_tunnel: add collect_md mode to IPIP tunnel 2016-09-17 10:13:07 -04:00
ipmr.c net: Enable support for VRF with ipv4 multicast 2016-11-01 11:54:26 -04:00
Kconfig net: ip, diag -- Add diag interface for raw sockets 2016-10-23 19:35:24 -04:00
Makefile net: ip, diag -- Add diag interface for raw sockets 2016-10-23 19:35:24 -04:00
netfilter.c ipv4: Pass struct net into ip_route_me_harder 2015-09-29 20:21:32 +02:00
ping.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
proc.c net: Suppress the "Comparison to NULL could be written" warnings 2016-09-30 01:50:45 -04:00
protocol.c
raw_diag.c net: ip, raw_diag -- Use jump for exiting from nested loop 2016-11-03 15:25:26 -04:00
raw.c net: Update raw socket bind to consider l3 domain 2016-11-07 13:14:34 -05:00
route.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
syncookies.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
sysctl_net_ipv4.c ipv4: use the right lock for ping_group_range 2016-10-22 16:23:12 -04:00
tcp_bbr.c tcp_bbr: add a state transition diagram and accompanying comment 2016-10-29 17:12:43 -04:00
tcp_bic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_cdg.c tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict 2016-09-21 00:22:59 -04:00
tcp_cong.c tcp: new CC hook to set sending rate with rate_sample in any CA state 2016-09-21 00:23:01 -04:00
tcp_cubic.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_dctcp.c tcp: return sizeof tcp_dctcp_info in dctcp_get_info() 2016-06-14 23:46:30 -07:00
tcp_diag.c net: diag: Fix refcnt leak in error path destroying socket 2016-08-23 23:11:36 -07:00
tcp_fastopen.c tcp: fastopen: avoid negative sk_forward_alloc 2016-09-08 16:08:10 -07:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_input.c tcp: no longer hold ehash lock while calling tcp_get_info() 2016-11-09 13:02:27 -05:00
tcp_ipv4.c net: inet: Support UID-based routing in IP protocols. 2016-11-04 14:45:23 -04:00
tcp_lp.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_metrics.c genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
tcp_minisocks.c tcp: track application-limited rate samples 2016-09-21 00:23:00 -04:00
tcp_nv.c tcp: add NV congestion control 2016-06-10 23:07:49 -07:00
tcp_offload.c gso: Support partial splitting at the frag_list pointer 2016-09-19 20:59:34 -04:00
tcp_output.c tcp: enhance tcp collapsing 2016-11-02 15:21:36 -04:00
tcp_probe.c net: ipv4: tcp_probe: Replace timespec with timespec64 2016-03-01 17:18:44 -05:00
tcp_rate.c tcp: export data delivery rate 2016-09-21 00:23:00 -04:00
tcp_recovery.c tcp: do not assume TCP code is non preemptible 2016-05-02 17:02:25 -04:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c tcp: Change txhash on every SYN and RTO retransmit 2016-09-28 07:52:34 -04:00
tcp_vegas.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_vegas.h tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_veno.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_westwood.c tcp: replace cnt & rtt with struct in pkts_acked() 2016-05-11 14:43:19 -04:00
tcp_yeah.c tcp: cwnd does not increase in TCP YeAH 2016-09-08 17:16:12 -07:00
tcp.c tcp: remove unaligned accesses from tcp_get_info() 2016-11-09 22:51:33 -05:00
tunnel4.c tunnels: correct conditional build of MPLS and IPv6 2016-07-11 13:27:06 -07:00
udp_diag.c net: inet: diag: expose the socket mark to privileged processes. 2016-09-08 16:13:09 -07:00
udp_impl.h
udp_offload.c net: add recursion limit to GRO 2016-10-20 14:32:22 -04:00
udp_tunnel.c net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
udp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2016-11-13 22:41:25 -05:00
udplite.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-30 00:54:02 -04:00
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-12 15:52:44 -07:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c