linux

Author	SHA1	Message	Date
Jason A. Donenfeld	b4fe85f9c9	ip_tunnel: disable preemption when updating per-cpu tstats Drivers like vxlan use the recently introduced udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the packet, updates the struct stats using the usual u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats). udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch tstats, so drivers like vxlan, immediately after, call iptunnel_xmit_stats, which does the same thing - calls u64_stats_update_begin/end on this_cpu_ptr(dev->tstats). While vxlan is probably fine (I don't know?), calling a similar function from, say, an unbound workqueue, on a fully preemptable kernel causes real issues: [ 188.434537] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:0/6 [ 188.435579] caller is debug_smp_processor_id+0x17/0x20 [ 188.435583] CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.2.6 #2 [ 188.435607] Call Trace: [ 188.435611] [<ffffffff8234e936>] dump_stack+0x4f/0x7b [ 188.435615] [<ffffffff81915f3d>] check_preemption_disabled+0x19d/0x1c0 [ 188.435619] [<ffffffff81915f77>] debug_smp_processor_id+0x17/0x20 The solution would be to protect the whole this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with disabling preemption and then reenabling it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-16 14:14:32 -05:00
Eric Dumazet	00fd38d938	tcp: ensure proper barriers in lockless contexts Some functions access TCP sockets without holding a lock and might output non consistent data, depending on compiler and or architecture. tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ... Introduce sk_state_load() and sk_state_store() to fix the issues, and more clearly document where this lack of locking is happening. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-15 18:36:38 -05:00
Martin KaFai Lau	02bcf4e082	ipv6: Check rt->dst.from for the DST_NOCACHE route All DST_NOCACHE rt6_info used to have rt->dst.from set to its parent. After commit `8e3d5be736` ("ipv6: Avoid double dst_free"), DST_NOCACHE is also set to rt6_info which does not have a parent (i.e. rt->dst.from is NULL). This patch catches the rt->dst.from == NULL case. Fixes: `8e3d5be736` ("ipv6: Avoid double dst_free") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-15 17:12:37 -05:00
David S. Miller	382a483e53	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree. This large batch that includes fixes for ipset, netfilter ingress, nf_tables dynamic set instantiation and a longstanding Kconfig dependency problem. More specifically, they are: 1) Add missing check for empty hook list at the ingress hook, from Florian Westphal. 2) Input and output interface are swapped at the ingress hook, reported by Patrick McHardy. 3) Resolve ipset extension alignment issues on ARM, patch from Jozsef Kadlecsik. 4) Fix bit check on bitmap in ipset hash type, also from Jozsef. 5) Release buckets when all entries have expired in ipset hash type, again from Jozsef. 6) Oneliner to initialize conntrack tuple object in the PPTP helper, otherwise the conntrack lookup may fail due to random bits in the structure holes, patch from Anthony Lineham. 7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann. 8) Fix Kconfig dependency problems with TPROXY, socket and dup, also from Arnd. 9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters from atomic context, this is required by the follow up fix for nf_tables. 10) Fix crash from the dynamic set expression, we have to add new clone operation that should be defined when a simple memcpy is not enough. This resolves a crash when using per-cpu counters with new Patrick McHardy's flow table nft support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-12 14:17:16 -05:00
Linus Torvalds	2df4ee78d0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet. 2) Several spots need to get to the original listner for SYN-ACK packets, most spots got this ok but some were not. Whilst covering the remaining cases, create a helper to do this. From Eric Dumazet. 3) Missiing check of return value from alloc_netdev() in CAIF SPI code, from Rasmus Villemoes. 4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich. 5) Use after free in mvneta driver, from Justin Maggard. 6) Fix race on dst->flags access in dst_release(), from Eric Dumazet. 7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd Bergmann. 8) Fix multicast getsockopt deadlock, from WANG Cong. 9) Fix deadlock in btusb, from Kuba Pawlak. 10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6 counter state. From Sabrina Dubroca. 11) Fix packet_bind() race, which can cause lost notifications, from Francesco Ruggeri. 12) Fix MAC restoration in qlcnic driver during bonding mode changes, from Jarod Wilson. 13) Revert bridging forward delay change which broke libvirt and other userspace things, from Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) Revert "bridge: Allow forward delay to be cfgd when STP enabled" bpf_trace: Make dependent on PERF_EVENTS qed: select ZLIB_INFLATE net: fix a race in dst_release() net: mvneta: Fix memory use after free. net: Documentation: Fix default value tcp_limit_output_bytes macvtap: Resolve possible __might_sleep warning in macvtap_do_read() mvneta: add FIXED_PHY dependency net: caif: check return value of alloc_netdev net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA drivers: net: xgene: fix RGMII 10/100Mb mode netfilter: nft_meta: use skb_to_full_sk() helper net_sched: em_meta: use skb_to_full_sk() helper sched: cls_flow: use skb_to_full_sk() helper netfilter: xt_owner: use skb_to_full_sk() helper smack: use skb_to_full_sk() helper net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid() bpf: doc: correct arch list for supported eBPF JIT dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put" bonding: fix panic on non-ARPHRD_ETHER enslave failure ...	2015-11-10 18:11:41 -08:00
Pablo Neira Ayuso	086f332167	netfilter: nf_tables: add clone interface to expression operations With the conversion of the counter expressions to make it percpu, we need to clone the percpu memory area, otherwise we crash when using counters from flow tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-10 23:47:32 +01:00
Eric Dumazet	54abc686c2	net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid() Generalize selinux_skb_sk() added in commit `212cd08953` ("selinux: fix random read in selinux_ip_postroute_compat()") so that we can use it other contexts. Use it right away in selinux_netlbl_skbuff_setsid() Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-08 20:56:38 -05:00
Mel Gorman	d0164adc89	mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
David S. Miller	096273304c	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2015-11-05 The following set of Bluetooth patches would be good to get into 4.4-rc1 if possible: - Fix for missing LE CoC parameter validity checks - Fix for potential deadlock in btusb - Fix for issuing unsupported commands during HCI init Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-05 11:38:06 -05:00
Johan Hedberg	8a7889cc6e	Bluetooth: L2CAP: Fix returning correct LE CoC response codes The core spec defines specific response codes for situations when the received CID is incorrect. Add the defines for these and return them as appropriate from the LE Connect Request handler function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-11-05 04:04:00 +01:00
Tobias Klauser	f63ce5b6fa	tun_dst: Fix potential NULL dereference In tun_dst_unclone() the return value of skb_metadata_dst() is checked for being NULL after it is dereferenced. Fix this by moving the dereference after the NULL check. Found by the Coverity scanner (CID 1338068). Fixes: `fc4099f172` ("openvswitch: Fix egress tunnel info.") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-04 21:59:22 -05:00
David S. Miller	73186df8d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were fixing the "BH-ness" of the counter bumps whilst in 'net-next' the functions were modified to take an explicit 'net' parameter. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 13:41:45 -05:00
David S. Miller	b3047a77cb	Another set of fixes: * remove a warning on a check that can trigger without any errors having happened (Andrei) * correctly handle deauth request while in the process of associating (Andrei) * fix TDLS HT operation (Arik) * allow changing AID/listen interval during client setup (Ayala) * be more forgiving with WMM parameters to get HT/VHT in case of broken APs with bad WMM settings (Emmanuel, myself) * a number of other fixes (some in documentation) -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJWOKsAAAoJEDBSmw7B7bqrNKMQAKFH81CscgJGOQb/zgGdmuF3 kNrWrnH+3XqoqM2rHpIekQLxVkeUhM+hHCyaPCK7rVnCuu53pJ0u7P0rq922XAW4 olFBGVdE1yG/69ndR9MYDjLWP+ikMmAiMLbM5qzPuDJ5XyBVACC1D82+qSRvByCK Z8PYJ+OsLk05eKa4ER7i8BVExRGM4vrce4Uh3K07yNKbfU81ztsltwflleRFGn3f OCydpuSId+C4TuSmkgBJF1718B9GazvAbZDw5t4jorIrbiZzQMZAtoi+YxwXVhev lvPCO8p1+lhWYUOK5LnO8mbdUFfe+kc3rrZjKWuXuDLp6mvPyP9FOaIFjFfnlJxT 8QadG0QDzTlLHUj29gvrnww8aob9c7iHueXP9OlcBMp9uTyklgBJ+fMyvPfXpWXB Diy9n0VJfWzg8d74wWLLQy/N1qY6gwhXXwgW8TM/49O5BpbyvVsI6jFAR+8ZT9b9 GLGEkN68RBuY03mejkf4PmhqgMVErA2JtabRI0Efm2Do85t9ZxgObF6INsrZ+o2M ffl7jhyHsFB+d38Ilwlb4cyWhxpIGrhTtt2h5zIsgNx3wmrXrarwMM3P4NGOOEbP Euqkk/LoMZdjjB/78JSi6hdQSYoQFaW85tHBzXhMXk0nYXHLWdVEJsLuAtATl8gM vzNkny8pcaLnRg/kXqgl =/d5+ -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Another set of fixes: * remove a warning on a check that can trigger without any errors having happened (Andrei) * correctly handle deauth request while in the process of associating (Andrei) * fix TDLS HT operation (Arik) * allow changing AID/listen interval during client setup (Ayala) * be more forgiving with WMM parameters to get HT/VHT in case of broken APs with bad WMM settings (Emmanuel, myself) * a number of other fixes (some in documentation) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-03 11:30:57 -05:00
Chaitanya T K	dcae9e0203	mac80211: document sleep requirements for channel context ops Channel context driver operations can sleep, so add might_sleep() and document this. Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-11-03 11:15:48 +01:00
Johannes Berg	e86abc689c	cfg80211/mac80211: clarify RSSI CQM reporting requirements The previous patch changed mac80211 to always report an event after a CQM RSSI reconfiguration. Document that as expected behaviour in both the cfg80211 and mac80211 API. Currently, iwlmvm already implements that behaviour; the other drivers implementing CQM RSSI events may have to be changed. This behaviour lets userspace know what the current state is without relying on querying the data which is racy. Reviewed-by: Sharon, Sara <sara.sharon@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-11-03 10:54:58 +01:00
Mahesh Bandewar	52bc671681	bonding: simplify / unify event handling code for 3ad mode. Old logic of updating state-machine is not required since ad_update_actor_keys() does it implicitly. The only loss is the notification differentiation between speed vs. duplex change. Now only one unified notification is printed. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:52:24 -05:00
Eric Dumazet	1d6119baf0	net: fix percpu memory leaks This patch fixes following problems : 1) percpu_counter_init() can return an error, therefore init_frag_mem_limit() must propagate this error so that inet_frags_init_net() can do the same up to its callers. 2) If ip[46]_frags_ns_ctl_register() fail, we must unwind properly and free the percpu_counter. Without this fix, we leave freed object in percpu_counters global list (if CONFIG_HOTPLUG_CPU) leading to crashes. This bug was detected by KASAN and syzkaller tool (http://github.com/google/syzkaller) Fixes: `6d7b857d54` ("net: use lib/percpu_counter API for fragmentation mem accounting") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:47:14 -05:00
Eric Dumazet	8fa677d270	net: avoid NULL deref in inet_ctl_sock_destroy() Under low memory conditions, tcp_sk_init() and icmp_sk_init() can both iterate on all possible cpus and call inet_ctl_sock_destroy(), with eventual NULL pointer. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:46:09 -05:00
Eric Dumazet	9e17f8a475	net: make skb_set_owner_w() more robust skb_set_owner_w() is called from various places that assume skb->sk always point to a full blown socket (as it changes sk->sk_wmem_alloc) We'd like to attach skb to request sockets, and in the future to timewait sockets as well. For these kind of pseudo sockets, we need to take a traditional refcount and use sock_edemux() as the destructor. It is now time to un-inline skb_set_owner_w(), being too big. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Bisected-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 16:28:49 -05:00
Julian Anastasov	4f823defdd	ipv4: fix to not remove local route on link down When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event we should not delete the local routes if the local address is still present. The confusion comes from the fact that both fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN constant. Fix it by returning back the variable 'force'. Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up ifconfig dummy0 down ip route list table local \| grep dummy \| grep host local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1 Fixes: `8a3d03166f` ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 16:57:39 -05:00
Vivien Didelot	76e398a627	net: dsa: use switchdev obj for VLAN add/del ops Simplify DSA by pushing the switchdev objects for VLAN add and delete operations down to its drivers. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 15:56:11 -05:00
Stefan Hajnoczi	ea3803c193	VSOCK: define VSOCK_SS_LISTEN once only The SS_LISTEN socket state is defined by both af_vsock.c and vmci_transport.c. This is risky since the value could be changed in one file and the other would be out of sync. Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part of enum socket_state (SS_CONNECTED, ...). This way it is clear that the constant is vsock-specific. The big text reflow in af_vsock.c was necessary to keep to the maximum line length. Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-01 12:14:47 -05:00
David S. Miller	740215ddb5	NFC 4.4 pull request This is the NFC pull request for 4.4. It's a bit bigger than usual, the 3 main culprits being: - A new driver for Intel's Fields Peak NCI chipset. In order to support this chipset we had to export a few NCI routines and extend the driver NCI ops to not only support proprietary commands but also core ones. - Support for vendor commands for both STM drivers, st-nci and st21nfca. Those vendor commands allow to run factory tests through the NFC netlink interface. - New i2c and SPI support for the Marvell driver, together with firmware download support for this driver's core. Besides that we also have: - A few file renames in the STM drivers, to keep the naming consistent between drivers. - Some improvements and fixes on the NCI HCI layer, mostly to properly reach a secure element over a legacy HCI link. - A few fixes for the s3fwrn5 and trf7970a drivers. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJWMGIaAAoJEIqAPN1PVmxKHjYP/3Q3Y4Vhvw3kTfDP3IlnAuH3 XjBMGKPLu72MmtSk9jFOr5VuC76YtJzwf+4nKGJybu619NPKfxXN7r83bpsZV1Bk +0cS1RpjIQh92a0ElvX1muCFhgH7ax7zeqQ+29OSpA33e67/DlUcwxiqzF15cwWC Bk0pUv1FxMoNi5ZkG1JrRqrhx/Yqo1dw2HrnMbKVgwLtLzODBuzoGVKfydTo0b1j hkl30DPF3AYMxnwIml3tM8zT96b1LtD0Xgs1yF8IdrIJ+6YLn/6tnw1rUxnE8Ovo JFtvtS0OKjGTNFr1NhueG0i5td8TR4MAJKHh0Lz9ISIFWtNtsVjFfGuAeWZcQ/n/ rQS7xOvzBCndNbe8PS9wBiNQAqLAH5/dzvKRwNboRttkpwIrNOgaBYj2LpuRzpfO p+ArwBryAorfxQVOIWl4knc59UsiPUKOK61uMTZ1sU7jCEvUNVChIm8EGRlMnpMQ ZFlBa2lqNdgz7ubKLofbnWLiCNY6r0E13MSLHZlJZX61IMjs13ojDeKMvitFBe+b 1hDwbSWxIRB8xKbcsIA9bPUnEc16Syywz/Q4iAsE8Gy6l5J41MhA/q2QaO9WSrPE Leah53l5EwQRd55WjJkCkIKZwvCjIerkESfAS0oprELIYXaxs/1PbVl6C7VYZA4K A5tYLw2vS+tTK4Mgi/ym =aw/h -----END PGP SIGNATURE----- Merge tag 'nfc-next-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.4 pull request This is the NFC pull request for 4.4. It's a bit bigger than usual, the 3 main culprits being: - A new driver for Intel's Fields Peak NCI chipset. In order to support this chipset we had to export a few NCI routines and extend the driver NCI ops to not only support proprietary commands but also core ones. - Support for vendor commands for both STM drivers, st-nci and st21nfca. Those vendor commands allow to run factory tests through the NFC netlink interface. - New i2c and SPI support for the Marvell driver, together with firmware download support for this driver's core. Besides that we also have: - A few file renames in the STM drivers, to keep the naming consistent between drivers. - Some improvements and fixes on the NCI HCI layer, mostly to properly reach a secure element over a legacy HCI link. - A few fixes for the s3fwrn5 and trf7970a drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-30 20:19:43 +09:00
David S. Miller	5bf8921116	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-10-28 Here are a some more Bluetooth patches for 4.4 which collected up during the past week. The most important ones are from Kuba Pawlak for fixing locking issues with SCO sockets. There's also a fix from Alexander Aring for 6lowpan, a memleak fix from Julia Lawall for the btmrvl driver and some cleanup patches from Marcel. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-30 19:41:10 +09:00
Robert Dolca	f11631748e	NFC: nci: non-static functions can not be inline This fixes a build error that seems to be toochain dependent (Not seen with gcc v5.1): In file included from net/nfc/nci/rsp.c:36:0: net/nfc/nci/rsp.c: In function ‘nci_rsp_packet’: include/net/nfc/nci_core.h:355:12: error: inlining failed in call to always_inline ‘nci_prop_rsp_packet’: function body not available inline int nci_prop_rsp_packet(struct nci_dev *ndev, __u16 opcode, Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-28 06:44:45 +01:00
emmanuel.grumbach@intel.com	8941faa161	net: tso: add support for IPv6 Adding IPv6 for the TSO helper API is trivial: * Don't play with the id (which doesn't exist in IPv6) * Correctly update the payload_len (don't include the length of the IP header itself) Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-26 22:24:22 -07:00
Vincent Cuissard	2bd832459a	NFC: NCI: allow spi driver to choose transfer clock In some cases low level drivers might want to update the SPI transfer clock (e.g. during firmware download). This patch adds this support. Without any modification the driver will use the default SPI clock (from pdata or device tree). Signed-off-by: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-27 04:23:34 +01:00
Vincent Cuissard	b5b3e23e4c	NFC: nfcmrvl: add i2c driver This driver adds the support of I2C-based Marvell NFC controller. Signed-off-by: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-27 04:21:14 +01:00
Vincent Cuissard	e5629d2947	NFC: NCI: export nci_send_frame and nci_send_cmd function Export nci_send_frame and nci_send_cmd symbols to allow drivers to use it. This is needed for example if NCI is used during firmware download phase. Signed-off-by: Vincent Cuissard <cuissard@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-27 04:16:14 +01:00
Christophe Ricard	96d4581f0b	NFC: netlink: Add mode parameter to deactivate_target functions In order to manage in a better way the nci poll mode state machine, add mode parameter to deactivate_target functions. This way we can manage different target state. mode parameter make sense only in nci core. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-27 03:55:12 +01:00
Christophe Ricard	b1fa4dc4ff	NFC: st-nci: Add support for proprietary commands Add support for proprietary commands useful mainly for factory testings. Here is a list: - FACTORY_MODE: Allow to set the driver into a mode where no secure element are activated. It does not consider any NFC_ATTR_VENDOR_DATA. - HCI_CLEAR_ALL_PIPES: Allow to execute a HCI clear all pipes command. It does not consider any NFC_ATTR_VENDOR_DATA. - HCI_DM_PUT_DATA: Allow to configure specific CLF registry like for example RF trimmings or low level drivers configurations (I2C, SPI, SWP). - HCI_DM_UPDATE_AID: Allow to configure an AID routing into the CLF routing table following RF technology, CLF mode or protocol. - HCI_DM_GET_INFO: Allow to retrieve CLF information. - HCI_DM_GET_DATA: Allow to retrieve CLF configurable data such as low level drivers configurations or RF trimmings. - HCI_DM_DIRECT_LOAD: Allow to load a firmware into the CLF. A complete packet can be more than 8KB. - HCI_DM_RESET: Allow to run a CLF reset in order to "commit" CLF configuration changes without CLF power off. - HCI_GET_PARAM: Allow to retrieve an HCI CLF parameter (for example the white list). - HCI_DM_FIELD_GENERATOR: Allow to generate different kind of RF technology. When using this command to anti-collision is done. - HCI_LOOPBACK: Allow to echo a command and test the Dh to CLF connectivity. - HCI_DM_VDC_MEASUREMENT_VALUE: Allow to measure the field applied on the CLF antenna. A value between 0 and 0x0f is returned. 0 is maximum. - HCI_DM_FWUPD_START: Allow to put CLF into firmware update mode. It is a specific CLF command as there is no GPIO for this. - HCI_DM_FWUPD_END: Allow to complete firmware update. - HCI_DM_VDC_VALUE_COMPARISON: Allow to compare the field applied on the CLF antenna to a reference value. - MANUFACTURER_SPECIFIC: Allow to retrieve manufacturer specific data received during a NCI_CORE_INIT_CMD. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-27 03:55:01 +01:00
Marcel Holtmann	242c0ebd37	Bluetooth: Rename bt_cb()->req into bt_cb()->hci The SKB context buffer for HCI request is really not just for requests, information in their are preserved for the whole HCI layer. So it makes more sense to actually rename it into bt_cb()->hci and also call it then struct hci_ctrl. In addition that allows moving the decoded opcode for outgoing packets into that struct. So far it was just consuming valuable space from the main shared items. And opcode are not valid for L2CAP packets. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-26 08:21:03 +02:00
Christophe Ricard	7e35740438	NFC: st-nci: Add support for NCI_HCI_IDENTITY_MGMT_GATE NCI_HCI_IDENTITY_MGMT_GATE might be useful to get information about hardware or firmware version. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-26 06:53:16 +01:00
Christophe Ricard	fa6fbadea5	NFC: nci: add nci_hci_clear_all_pipes functions nci_hci_clear_all_pipes might be use full in some cases for example after a firmware update. Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-26 06:53:12 +01:00
Robert Dolca	85b9ce9a21	NFC: nci: add nci_get_conn_info_by_id function This functin takes as a parameter a pointer to the nci_dev struct and the first byte from the values of the first domain specific parameter that was used for the connection creation. Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-25 20:29:11 +01:00
Robert Dolca	22e4bd09c4	NFC: nci: rename nci_prop_ops to nci_driver_ops Initially it was used to create hooks in the driver for proprietary operations. Currently it is being used for hooks for both proprietary and generic operations. Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-25 20:28:59 +01:00
Robert Dolca	0a97a3cba2	NFC: nci: Allow the driver to set handler for core nci ops The driver may be required to act when some responses or notifications arrive. For example the NCI core does not have a handler for NCI_OP_CORE_GET_CONFIG_RSP. The NFCC can send a config response that has to be read by the driver and the packet may contain vendor specific data. The Fields Peak driver needs to take certain actions when a reset notification arrives (packet also not handled by the nfc core). The driver handlers do not interfere with the core and they are called after the core processes the packet. Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-25 19:12:57 +01:00
Robert Dolca	7bc4824ed5	NFC: nci: Introduce nci_core_cmd This allows sending core commands from the driver. The driver should be able to send NCI core commands like CORE_GET_CONFIG_CMD. Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-25 19:12:13 +01:00
Robert Dolca	a9433c11b1	NFC: nci: Introduce new core opcodes Add NCI_OP_CORE_GET_CONFIG_CMD, NCI_OP_CORE_GET_CONFIG_RSP and NCI_OP_CORE_RESET_NTF. Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-25 19:11:49 +01:00
Robert Dolca	2663589ce6	NFC: nci: Add function to get max packet size for conn FDP driver needs to send the firmware as regular packets (not fragmented). The driver should have a way to get the max packet size for a given connection. Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2015-10-25 19:11:42 +01:00
David S. Miller	ba3e2084f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:54:12 -07:00
David S. Miller	a72c9512bf	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-10-22 Here's probably the last bluetooth-next pull request for 4.4. Among several other changes it contains the rest of the fixes & cleanups from the Bluetooth UnplugFest (that didn't need to be hurried to 4.3). - Refactoring & cleanups to 6lowpan code - New USB ids for two Atheros controllers and BCM43142A0 from Broadcom - Fix (quirk) for broken Broadcom BCM2045 controllers - Support for latest Apple controllers - Improvements to the vendor diagnostic message support Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 05:13:16 -07:00
Roopa Prabhu	f8efb73c97	mpls: multipath route support This patch adds support for MPLS multipath routes. Includes following changes to support multipath: - splits struct mpls_route into 'struct mpls_route + struct mpls_nh' - 'struct mpls_nh' represents a mpls nexthop label forwarding entry - moves mpls route and nexthop structures into internal.h - A mpls_route can point to multiple mpls_nh structs - the nexthops are maintained as a array (similar to ipv4 fib) - In the process of restructuring, this patch also consistently changes all labels to u8 - Adds support to parse/fill RTA_MULTIPATH netlink attribute for multipath routes similar to ipv4/v6 fib - In this patch, the multipath route nexthop selection algorithm simply returns the first nexthop. It is replaced by a hash based algorithm from Robert Shearman in the next patch - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update. mpls_route_update though implemented to update based on dev, it was never used that way. And the dev handling gets tricky with multiple nexthops. Cannot match against any single nexthops dev. So, this patch removes the unused 'dev' handling in mpls_route_update. - dead route/path handling will be implemented in a subsequent patch Example: $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \ nexthop as 700 via inet 10.1.1.6 dev swp2 \ nexthop as 800 via inet 40.1.1.2 dev swp3 $ip -f mpls route show 100 nexthop as to 200 via inet 10.1.1.2 dev swp1 nexthop as to 700 via inet 10.1.1.6 dev swp2 nexthop as to 800 via inet 40.1.1.2 dev swp3 Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-23 06:26:42 -07:00
Eric Dumazet	5e0724d027	tcp/dccp: fix hashdance race for passive sessions Multiple cpus can process duplicates of incoming ACK messages matching a SYN_RECV request socket. This is a rare event under normal operations, but definitely can happen. Only one must win the race, otherwise corruption would occur. To fix this without adding new atomic ops, we use logic in inet_ehash_nolisten() to detect the request was present in the same ehash bucket where we try to insert the new child. If request socket was not found, we have to undo the child creation. This actually removes a spin_lock()/spin_unlock() pair in reqsk_queue_unlink() for the fast path. Fixes: `e994b2f0fb` ("tcp: do not lock listener to process SYN packets") Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-23 05:42:21 -07:00
Pravin B Shelar	fc4099f172	openvswitch: Fix egress tunnel info. While transitioning to netdev based vport we broke OVS feature which allows user to retrieve tunnel packet egress information for lwtunnel devices. Following patch fixes it by introducing ndo operation to get the tunnel egress info. Same ndo operation can be used for lwtunnel devices and compat ovs-tnl-vport devices. So after adding such device operation we can remove similar operation from ovs-vport. Fixes: `614732eaa1` ("openvswitch: Use regular VXLAN net_device device"). Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 19:39:25 -07:00
Vivien Didelot	1a49a2fbf8	net: dsa: remove port_fdb_getnext No driver implements port_fdb_getnext anymore, and port_fdb_dump is preferred anyway, so remove this function from DSA. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:45 -07:00
Vivien Didelot	ea70ba9806	net: dsa: add port_fdb_dump function Not all switch chips support a Get Next operation to iterate on its FDB. So add a more simple port_fdb_dump function for them. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:38:35 -07:00
David S. Miller	e9829b9745	Here's another set of patches for the current cycle: * I merged net-next back to avoid a conflict with the * cfg80211 scheduled scan API extensions * preparations for better scan result timestamping * regulatory cleanups * mac80211 statistics cleanups * a few other small cleanups and fixes -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJWJ6lbAAoJEDBSmw7B7bqraasP/Ryaa7zL10E+dOQtqBQHQeMe olbrCUtTYltr4nnuESzh5WPeIVZBQ0DIduoLLF0IDSPVwE/NrbpFUVIMHvJvr+s7 rE9k8RB4P7BMTjf+mkDX1Od9kCKGkt4ezcyt/oNIsqM12SN9JQ99itwz6Mp94xCs XKsiXJRh9f/8Qwd/74qQq1Va3UfGAVuKO8WpUe/A7TYTla8ZY20pv1D8kQKQzrFg DwsMirjmHcUpobSjnPAAmZevRxdk6o0E+P7DYG172H2Tm8/EIMR/gYMnQeYW6HkA lfMMDfAGmNvyRm8v1iuBLodREP4kn4VbhMSZDtH7D6FYfmJh5fSeG09bSe51G5Xh zv/B8A1cCbWFqtQHp3wI6ml8VDyAhDc2Hvqb75KRn6FplIkEiszVP0y3cNHWiJVt Ix6Sysoa6kQDXEgR50APeLJ3VI+/mhXmvIila4jP9PKhO14SDHrCoRQO62Z0COJ7 2E5Ir2KE8T+O9mSeuB7m8xD/t60HDd3q3tLZmH0Ps6xfxKf9y2hdZacbX4Hi5Mqk 2XxXZYnhAXUqZmZhmG3ajnEiB4UGMt21R7dIqNTaQ9chOGBkHqIZxPm82XtNb13h yHILavGpUDT0z6OB2z8fxUcj4a4SrrK+aiIGh4iFpDR0Nu0IyZ5cPHXY2FfvJWmD ZO74RMEpBodYR8BsV4yP =uZ5N -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Here's another set of patches for the current cycle: * I merged net-next back to avoid a conflict with the * cfg80211 scheduled scan API extensions * preparations for better scan result timestamping * regulatory cleanups * mac80211 statistics cleanups * a few other small cleanups and fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 07:28:41 -07:00
Arad, Ronen	b1974ed05e	netlink: Rightsize IFLA_AF_SPEC size calculation if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding this a argument to get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and thus removed. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:15:20 -07:00
Johan Hedberg	17bc08f0d1	Bluetooth: Remove unnecessary hci_explicit_connect_lookup function There's only one user of this helper which can be replaces with a call to hci_pend_le_action_lookup() and a check for params->explicit_connect. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 18:58:23 +02:00
Johan Hedberg	1b51c7b6e8	Bluetooth: Add hci_conn_hash_lookup_le() helper function Many of the existing LE connection lookups are forced to use hci_conn_hash_lookup_ba() which doesn't take into account the address type. What's worse, most of the users don't bother checking that the returned address type matches what was wanted. This patch adds a new helper API to look up LE connections based on their address and address type, paving the way to have the hci_conn_hash_lookup_ba() users converted to do more precise lookups. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 18:36:39 +02:00
Yuchung Cheng	4f41b1c58a	tcp: use RACK to detect losses This patch implements the second half of RACK that uses the the most recent transmit time among all delivered packets to detect losses. tcp_rack_mark_lost() is called upon receiving a dubious ACK. It then checks if an not-yet-sacked packet was sent at least "reo_wnd" prior to the sent time of the most recently delivered. If so the packet is deemed lost. The "reo_wnd" reordering window starts with 1msec for fast loss detection and changes to min-RTT/4 when reordering is observed. We found 1msec accommodates well on tiny degree of reordering (<3 pkts) on faster links. We use min-RTT instead of SRTT because reordering is more of a path property but SRTT can be inflated by self-inflicated congestion. The factor of 4 is borrowed from the delayed early retransmit and seems to work reasonably well. Since RACK is still experimental, it is now used as a supplemental loss detection on top of existing algorithms. It is only effective after the fast recovery starts or after the timeout occurs. The fast recovery is still triggered by FACK and/or dupack threshold instead of RACK. We introduce a new sysctl net.ipv4.tcp_recovery for future experiments of loss recoveries. For now RACK can be disabled by setting it to 0. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:53 -07:00
Yuchung Cheng	659a8ad56f	tcp: track the packet timings in RACK This patch is the first half of the RACK loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh). It's inspired by the previous FACK heuristic in tcp_mark_lost_retrans(): when a limited transmit (new data packet) is sacked, then current retransmitted sequence below the newly sacked sequence must been lost, since at least one round trip time has elapsed. But it has several limitations: 1) can't detect tail drops since it depends on limited transmit 2) is disabled upon reordering (assumes no reordering) 3) only enabled in fast recovery ut not timeout recovery RACK (Recently ACK) addresses these limitations with the notion of time instead: a packet P1 is lost if a later packet P2 is s/acked, as at least one round trip has passed. Since RACK cares about the time sequence instead of the data sequence of packets, it can detect tail drops when later retransmission is s/acked while FACK or dupthresh can't. For reordering RACK uses a dynamically adjusted reordering window ("reo_wnd") to reduce false positives on ever (small) degree of reordering. This patch implements tcp_advanced_rack() which tracks the most recent transmission time among the packets that have been delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp is the key to determine which packet has been lost. Consider an example that the sender sends six packets: T1: P1 (lost) T2: P2 T3: P3 T4: P4 T100: sack of P2. rack.mstamp = T2 T101: retransmit P1 T102: sack of P2,P3,P4. rack.mstamp = T4 T205: ACK of P4 since the hole is repaired. rack.mstamp = T101 We need to be careful about spurious retransmission because it may falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK to falsely mark all packets lost, just like a spurious timeout. We identify spurious retransmission by the ACK's TS echo value. If TS option is not applicable but the retransmission is acknowledged less than min-RTT ago, it is likely to be spurious. We refrain from using the transmission time of these spurious retransmissions. The second half is implemented in the next patch that marks packet lost using RACK timestamp. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:48 -07:00
Yuchung Cheng	f672258391	tcp: track min RTT using windowed min-filter Kathleen Nichols' algorithm for tracking the minimum RTT of a data stream over some measurement window. It uses constant space and constant time per update. Yet it almost always delivers the same minimum as an implementation that has to keep all the data in the window. The measurement window is tunable via sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes. The algorithm keeps track of the best, 2nd best & 3rd best min values, maintaining an invariant that the measurement time of the n'th best >= n-1'th best. It also makes sure that the three values are widely separated in the time window since that bounds the worse case error when that data is monotonically increasing over the window. Upon getting a new min, we can forget everything earlier because it has no value - the new min is less than everything else in the window by definition and it's the most recent. So we restart fresh on every new min and overwrites the 2nd & 3rd choices. The same property holds for the 2nd & 3rd best. Therefore we have to maintain two invariants to maximize the information in the samples, one on values (1st.v <= 2nd.v <= 3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <= now). These invariants determine the structure of the code The RTT input to the windowed filter is the minimum RTT measured from ACK or SACK, or as the last resort from TCP timestamps. The accessor tcp_min_rtt() returns the minimum RTT seen in the window. ~0U indicates it is not available. The minimum is 1usec even if the true RTT is below that. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:00:43 -07:00
Johan Hedberg	8ce783dc5e	Bluetooth: Fix missing hdev locking for LE scan cleanup The hci_conn objects don't have a dedicated lock themselves but rely on the caller to hold the hci_dev lock for most types of access. The hci_conn_timeout() function has so far sent certain HCI commands based on the hci_conn state which has been possible without holding the hci_dev lock. The recent changes to do LE scanning before connect attempts added even more operations to hci_conn and hci_dev from hci_conn_timeout, thereby exposing potential race conditions with the hci_dev and hci_conn states. As an example of such a race, here there's a timeout but an l2cap_sock_connect() call manages to race with the cleanup routine: [Oct21 08:14] l2cap_chan_timeout: chan ee4b12c0 state BT_CONNECT [ +0.000004] l2cap_chan_close: chan ee4b12c0 state BT_CONNECT [ +0.000002] l2cap_chan_del: chan ee4b12c0, conn f3141580, err 111, state BT_CONNECT [ +0.000002] l2cap_sock_teardown_cb: chan ee4b12c0 state BT_CONNECT [ +0.000005] l2cap_chan_put: chan ee4b12c0 orig refcnt 4 [ +0.000010] hci_conn_drop: hcon f53d56e0 orig refcnt 1 [ +0.000013] l2cap_chan_put: chan ee4b12c0 orig refcnt 3 [ +0.000063] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT [ +0.000049] hci_conn_params_del: addr ee:0d:30:09:53:1f (type 1) [ +0.000002] hci_chan_list_flush: hcon f53d56e0 [ +0.000001] hci_chan_del: hci0 hcon f53d56e0 chan f4e7ccc0 [ +0.004528] l2cap_sock_create: sock e708fc00 [ +0.000023] l2cap_chan_create: chan ee4b1770 [ +0.000001] l2cap_chan_hold: chan ee4b1770 orig refcnt 1 [ +0.000002] l2cap_sock_init: sk ee4b3390 [ +0.000029] l2cap_sock_bind: sk ee4b3390 [ +0.000010] l2cap_sock_setsockopt: sk ee4b3390 [ +0.000037] l2cap_sock_connect: sk ee4b3390 [ +0.000002] l2cap_chan_connect: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f (type 2) psm 0x00 [ +0.000002] hci_get_route: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f [ +0.000001] hci_dev_hold: hci0 orig refcnt 8 [ +0.000003] hci_conn_hold: hcon f53d56e0 orig refcnt 0 Above the l2cap_chan_connect() shouldn't have been able to reach the hci_conn f53d56e0 anymore but since hci_conn_timeout didn't do proper locking that's not the case. The end result is a reference to hci_conn that's not in the conn_hash list, resulting in list corruption when trying to remove it later: [Oct21 08:15] l2cap_chan_timeout: chan ee4b1770 state BT_CONNECT [ +0.000004] l2cap_chan_close: chan ee4b1770 state BT_CONNECT [ +0.000003] l2cap_chan_del: chan ee4b1770, conn f3141580, err 111, state BT_CONNECT [ +0.000001] l2cap_sock_teardown_cb: chan ee4b1770 state BT_CONNECT [ +0.000005] l2cap_chan_put: chan ee4b1770 orig refcnt 4 [ +0.000002] hci_conn_drop: hcon f53d56e0 orig refcnt 1 [ +0.000015] l2cap_chan_put: chan ee4b1770 orig refcnt 3 [ +0.000038] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT [ +0.000003] hci_chan_list_flush: hcon f53d56e0 [ +0.000002] hci_conn_hash_del: hci0 hcon f53d56e0 [ +0.000001] ------------[ cut here ]------------ [ +0.000461] WARNING: CPU: 0 PID: 1782 at lib/list_debug.c:56 __list_del_entry+0x3f/0x71() [ +0.000839] list_del corruption, f53d56e0->prev is LIST_POISON2 (00000200) The necessary fix is unfortunately more complicated than just adding hci_dev_lock/unlock calls to the hci_conn_timeout() call path. Particularly, the hci_conn_del() API, which expects the hci_dev lock to be held, performs a cancel_delayed_work_sync(&hcon->disc_work) which would lead to a deadlock if the hci_conn_timeout() call path tries to acquire the same lock. This patch solves the problem by deferring the cleanup work to a separate work callback. To protect against the hci_dev or hci_conn going away meanwhile temporary references are taken with the help of hci_dev_hold() and hci_conn_get(). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 4.3	2015-10-21 14:25:34 +02:00
Marcel Holtmann	98a63aaf24	Bluetooth: Introduce driver specific post init callback Some drivers might have to restore certain settings after the init procedure has been completed. This driver callback allows them to hook into that stage. This callback is run just before the controller is declared as powered up. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 07:30:53 +03:00
Alexander Aring	028b2a8c16	6lowpan: remove lowpan_is_addr_broadcast This macro is used at 802.15.4 6LoWPAN only and can be replaced by memcmp with the interface broadcast address. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	6350047eb8	6lowpan: move IPHC functionality defines This patch removes the IPHC related defines for doing bit manipulation from global 6lowpan header to the iphc file which should the only one implementation which use these defines. Also move next header compression defines to their nhc implementation. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	478208e3b9	6lowpan: remove lowpan_fetch_skb_u8 This patch removes the lowpan_fetch_skb_u8 function for getting the iphc bytes. Instead we using the generic which has a len parameter to tell the amount of bytes to fetch. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:25 +02:00
Alexander Aring	8911d7748c	6lowpan: cleanup lowpan_header_decompress This patch changes the lowpan_header_decompress function by removing inklayer related information from parameters. This is currently for supporting short and extended address for iphc handling in 802154. We don't support short address handling anyway right now, but there exists already code for handling short addresses in lowpan_header_decompress. The address parameters are also changed to a void pointer, so 6LoWPAN linklayer specific code can put complex structures as these parameters and cast it again inside the generic code by evaluating linklayer type before. The order is also changed by destination address at first and then source address, which is the same like all others functions where destination is always the first, memcpy, dev_hard_header, lowpan_header_compress, etc. This patch also moves the fetching of iphc values from 6LoWPAN linklayer specific code into the generic branch. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Alexander Aring	a6f773891a	6lowpan: cleanup lowpan_header_compress This patch changes the lowpan_header_compress function by removing unused parameters like "len" and drop static value parameters of protocol type. Instead we really check the protocol type inside inside the skb structure. Also we drop the use of IEEE802154_ADDR_LEN which is link-layer specific. Instead we using EUI64_ADDR_LEN which should always the default case for now. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Alexander Aring	bf513fd6fc	6lowpan: introduce LOWPAN_IPHC_MAX_HC_BUF_LEN This patch introduces the LOWPAN_IPHC_MAX_HC_BUF_LEN define which represent the worst-case supported IPHC buffer length. It's used to allocate the stack buffer space for creating the IPHC header. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:24 +02:00
Marcel Holtmann	e131d74a3a	Bluetooth: Add support setup stage internal notification event Before the vendor specific setup stage is triggered call back into the core to trigger an internal notification event. That event is used to send an index update to the monitor interface. With that specific event it is possible to update userspace with manufacturer information before any HCI command has been executed. This is useful for early stage debugging of vendor specific initialization sequences. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:23 +02:00
Marcel Holtmann	7e995b9ead	Bluetooth: Add new quirk for non-persistent diagnostic settings If the diagnostic settings are not persistent over HCI Reset, then this quirk can be used to tell the Bluetoth core about it. This will ensure that the settings are programmed correctly when the controller is powered up. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-21 00:49:22 +02:00
Johan Hedberg	cad20c2780	Bluetooth: Don't use remote address type to decide IRK persistency There are LE devices on the market that start off by announcing their public address and then once paired switch to using private address. To be interoperable with such devices we should simply trust the fact that we're receiving an IRK from them to indicate that they may use private addresses in the future. Instead, simply tie the persistency to the bonding/no-bonding information the same way as for LTKs and CSRKs. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-21 00:49:21 +02:00
David S. Miller	26440c835f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-20 06:08:27 -07:00
David S. Miller	371f1c7e0d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. Most relevantly, updates for the nfnetlink_log to integrate with conntrack, fixes for cttimeout and improvements for nf_queue core, they are: 1) Remove useless ifdef around static inline function in IPVS, from Eric W. Biederman. 2) Simplify the conntrack support for nfnetlink_queue: Merge nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back to nfnetlink_queue.c 3) Use y2038 safe timestamp from nfnetlink_queue. 4) Get rid of dead function definition in nf_conntrack, from Flavio Leitner. 5) Attach conntrack support for nfnetlink_log.c, from Ken-ichirou MATSUZAWA. This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that controls enabling both nfqueue and nflog integration with conntrack. The userspace application can request this via NFULNL_CFG_F_CONNTRACK configuration flag. 6) Remove unused netns variables in IPVS, from Eric W. Biederman and Simon Horman. 7) Don't put back the refcount on the cttimeout object from xt_CT on success. 8) Fix crash on cttimeout policy object removal. We have to flush out the cttimeout extension area of the conntrack not to refer to an unexisting object that was just removed. 9) Make sure rcu_callback completion before removing nfnetlink_cttimeout module removal. 10) Fix compilation warning in br_netfilter when no nf_defrag_ipv4 and nf_defrag_ipv6 are enabled. Patch from Arnd Bergmann. 11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is requested. Again from Ken-ichirou MATSUZAWA. 12) Don't use pointer to previous hook when reinjecting traffic via nf_queue with NF_REPEAT verdict since it may be already gone. This also avoids a deadloop if the userspace application keeps returning NF_REPEAT. 13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris. 14) Consolidate logger instance existence check in nfulnl_recv_config(). 15) Fix broken atomicity when applying configuration updates to logger instances in nfnetlink_log. 16) Get rid of the .owner attribute in our hook object. We don't need this anymore since we're dropping pending packets that have escaped from the kernel when unremoving the hook. Patch from Florian Westphal. 17) Remove unnecessary rcu_read_lock() from nf_reinject code, we always assume RCU read side lock from .call_rcu in nfnetlink. Also from Florian. 18) Use static inline function instead of macros to define NF_HOOK() and NF_HOOK_COND() when no netfilter support in on, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:48:34 -07:00
Eric Dumazet	dc6ef6be52	tcp: do not set queue_mapping on SYNACK At the time of commit `fff3269907` ("tcp: reflect SYN queue_mapping into SYNACK packets") we had little ways to cope with SYN floods. We no longer need to reflect incoming skb queue mappings, and instead can pick a TX queue based on cpu cooking the SYNACK, with normal XPS affinities. Note that all SYNACK retransmits were picking TX queue 0, this no longer is a win given that SYNACK rtx are now distributed on all cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-18 22:26:02 -07:00
Pablo Neira Ayuso	f0a0a978b6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next This merge resolves conflicts with `75aec9df3a` ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/bridge/br_netfilter_hooks.c	2015-10-17 14:28:03 +02:00
Eric Dumazet	c7c49b8fde	net: add pfmemalloc check in sk_add_backlog() Greg reported crashes hitting the following check in __sk_backlog_rcv() BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); The pfmemalloc bit is currently checked in sk_filter(). This works correctly for TCP, because sk_filter() is ran in tcp_v[46]_rcv() before hitting the prequeue or backlog checks. For UDP or other protocols, this does not work, because the sk_filter() is ran from sock_queue_rcv_skb(), which might be called _after_ backlog queuing if socket is owned by user by the time packet is processed by softirq handler. Fixes: `b4b9e35585` ("netvm: set PF_MEMALLOC as appropriate during SKB processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-17 05:01:11 -07:00
Florian Westphal	ed78d09d59	netfilter: make nf_queue_entry_get_refs return void We don't care if module is being unloaded anymore since hook unregister handling will destroy queue entries using that hook. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:22:23 +02:00
Eric Dumazet	ebb516af60	tcp/dccp: fix race at listener dismantle phase Under stress, a close() on a listener can trigger the WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop() We need to test if listener is still active before queueing a child in inet_csk_reqsk_queue_add() Create a common inet_child_forget() helper, and use it from inet_csk_reqsk_queue_add() and inet_csk_listen_stop() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:52:19 -07:00
Eric Dumazet	f03f2e154f	tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helper Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: `4bdc3d6614` ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-16 00:52:18 -07:00
Jiri Pirko	4d429c5ddc	switchdev: introduce possibility to defer obj_add/del Similar to the attr usecase, the caller knows if he is holding RTNL and is in atomic section. So let the called to decide the correct call variant. This allows drivers to sleep inside their ops and wait for hw to get the operation status. Then the status is propagated into switchdev core. This avoids silent errors in drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:49 -07:00
Jiri Pirko	850d0cbc91	switchdev: remove pointers from switchdev objects When object is used in deferred work, we cannot use pointers in switchdev object structures because the memory they point at may be already used by someone else. So rather do local copy of the value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:49 -07:00
Jiri Pirko	0bc05d585d	switchdev: allow caller to explicitly request attr_set as deferred Caller should know if he can call attr_set directly (when holding RTNL) or if he has to defer the att_set processing for later. This also allows drivers to sleep inside attr_set and report operation status back to switchdev core. Switchdev core then warns if status is not ok, instead of silent errors happening in drivers. Benefit from newly introduced switchdev deferred ops infrastructure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:48 -07:00
Jiri Pirko	f7fadf3047	switchdev: make struct switchdev_attr parameter const for attr_set calls Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:46 -07:00
Jiri Pirko	793f40147e	switchdev: introduce switchdev deferred ops infrastructure Introduce infrastructure which will be used internally to defer ops. Note that the deferred ops are queued up and either are processed by scheduled work or explicitly by user calling deferred_process function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:46 -07:00
Eric Dumazet	f985c65c90	tcp: avoid spurious SYN flood detection at listen() time At listen() time, there is a small window where listener is visible with a zero backlog, triggering a spurious "Possible SYN flooding on port" message. Nothing prevents us from setting the correct backlog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 19:06:32 -07:00
Johannes Berg	4a733ef1be	mac80211: remove PM-QoS listener As this API has never really seen any use and most drivers don't ever use the value derived from it, remove it. Change the only driver using it (rt2x00) to simply use the DTIM period instead of the "max sleep" time. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-14 18:04:08 +02:00
Paolo Abeni	02a6d6136f	Revert "ipv4/icmp: redirect messages can use the ingress daddr as source" Revert the commit `e2ca690b65` ("ipv4/icmp: redirect messages can use the ingress daddr as source"), which tried to introduce a more suitable behaviour for ICMP redirect messages generated by VRRP routers. However RFC 5798 section 8.1.1 states: The IPv4 source address of an ICMP redirect should be the address that the end-host used when making its next-hop routing decision. while said commit used the generating packet destination address, which do not match the above and in most cases leads to no redirect packets to be generated. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-14 06:01:07 -07:00
David Ahern	ccf3c8c3fe	net: Add IPv6 support to l3mdev Add operations to retrieve cached IPv6 dst entry from l3mdev device and lookup IPv6 source address. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:55:04 -07:00
Avraham Stern	3b06d27795	cfg80211: Add multiple scan plans for scheduled scan Add the option to configure multiple 'scan plans' for scheduled scan. Each 'scan plan' defines the number of scan cycles and the interval between scans. The scan plans are executed in the order they were configured. The last scan plan will always run infinitely and thus defines only the interval between scans. The maximum number of scan plans supported by the device and the maximum number of iterations in a single scan plan are advertised to userspace so it can configure the scan plans appropriately. When scheduled scan results are received there is no way to know which scan plan is being currently executed, so there is no way to know when the next scan iteration will start. This is not a problem, however. The scan start timestamp is only used for flushing old scan results, and there is no difference between flushing all results received until the end of the previous iteration or the start of the current one, since no results will be received in between. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:35:26 +02:00
Dmitry Shmidt	6e19bc4b70	nl80211: allow BSS data to include CLOCK_BOOTTIME timestamp For location and connectivity services, userspace would often like to know the time when the BSS was last seen. The current "last seen" value is calculated in a way that makes it less useful, especially if the system suspended in the meantime. Add the ability for the driver to report a real CLOCK_BOOTTIME stamp that can then be reported to userspace (if present). Drivers wishing to use this must be converted to the new API to call cfg80211_inform_bss_data() or cfg80211_inform_bss_frame_data(). They need to ensure the reported value is accurate enough even when the frame might have been buffered in the device (e.g. firmware.) Signed-off-by: Dmitry Shmidt <dimitrysh@google.com> [modified to use struct, inlines] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:32:17 +02:00
Tamizh chelvam	93f0490e5d	Revert "mac80211: remove exposing 'mfp' to drivers" This reverts commit `5c48f12017`. Some device drivers (ath10k) offload part of aggregation including AddBA/DelBA negotiations to firmware. In such scenario, the PMF configuration of the station needs to be provided to driver to enable encryption of AddBA/DelBA action frames. Signed-off-by: Tamizh chelvam <c_traja@qti.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-10-13 10:29:11 +02:00
Eric W. Biederman	b72775977c	ipv6: Pass struct net into nf_ct_frag6_gather The function nf_ct_frag6_gather is called on both the input and the output paths of the networking stack. In particular ipv6_defrag which calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain on input and the LOCAL_OUT chain on output. The addition of a net parameter makes it explicit which network namespace the packets are being reassembled in, and removes the need for nf_ct_frag6_gather to guess. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:44:17 -07:00
Eric W. Biederman	19bcf9f203	ipv4: Pass struct net into ip_defrag and ip_check_defrag The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:44:16 -07:00
Paolo Abeni	e2ca690b65	ipv4/icmp: redirect messages can use the ingress daddr as source This patch allows configuring how the source address of ICMP redirect messages is selected; by default the old behaviour is retained, while setting icmp_redirects_use_orig_daddr force the usage of the destination address of the packet that caused the redirect. The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the following scenario: Two machines are set up with VRRP to act as routers out of a subnet, they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to x.x.x.254/24. If a host in said subnet needs to get an ICMP redirect from the VRRP router, i.e. to reach a destination behind a different gateway, the source IP in the ICMP redirect is chosen as the primary IP on the interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2. The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2, and will continue to use the wrong next-op. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:38:02 -07:00
Eric Dumazet	d475f090bf	tcp: shrink tcp_timewait_sock by 8 bytes Reducing tcp_timewait_sock from 280 bytes to 272 bytes allows SLAB to pack 15 objects per page instead of 14 (on x86) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:28:24 -07:00
Eric Dumazet	ed53d0ab76	net: shrink struct sock and request_sock by 8 bytes One 32bit hole is following skc_refcnt, use it. skc_incoming_cpu can also be an union for request_sock rcv_wnd. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:28:22 -07:00
Eric Dumazet	8e5eb54d30	net: align sk_refcnt on 128 bytes boundary sk->sk_refcnt is dirtied for every TCP/UDP incoming packet. This is a performance issue if multiple cpus hit a common socket, or multiple sockets are chained due to SO_REUSEPORT. By moving sk_refcnt 8 bytes further, first 128 bytes of sockets are mostly read. As they contain the lookup keys, this has a considerable performance impact, as cpus can cache them. These 8 bytes are not wasted, we use them as a place holder for various fields, depending on the socket type. Tested: SYN flood hitting a 16 RX queues NIC. TCP listener using 16 sockets and SO_REUSEPORT and SO_INCOMING_CPU for proper siloing. Could process 6.0 Mpps SYN instead of 4.2 Mpps Kernel profile looked like : 11.68% [kernel] [k] sha_transform 6.51% [kernel] [k] __inet_lookup_listener 5.07% [kernel] [k] __inet_lookup_established 4.15% [kernel] [k] memcpy_erms 3.46% [kernel] [k] ipt_do_table 2.74% [kernel] [k] fib_table_lookup 2.54% [kernel] [k] tcp_make_synack 2.34% [kernel] [k] tcp_conn_request 2.05% [kernel] [k] __netif_receive_skb_core 2.03% [kernel] [k] kmem_cache_alloc Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:28:22 -07:00
Eric Dumazet	70da268b56	net: SO_INCOMING_CPU setsockopt() support SO_INCOMING_CPU as added in commit `2c8c56e15d` was a getsockopt() command to fetch incoming cpu handling a particular TCP flow after accept() This commits adds setsockopt() support and extends SO_REUSEPORT selection logic : If a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if CPU handling the packet matches the specified one. This allows to build very efficient TCP servers, using one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same cpu. This provides optimal NUMA behavior and keep cpu caches hot. Note that __inet_lookup_listener() still has to iterate over the list of all listeners. Following patch puts sk_refcnt in a different cache line to let this iteration hit only shared and read mostly cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:28:20 -07:00
Edward Jee	f28ea365cd	sock: support per-packet fwmark It's useful to allow users to set fwmark for an individual packet, without changing the socket state. The function this patch adds in sock layer can be used by the protocols that need such a feature. Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:25:21 -07:00
Pablo Neira Ayuso	ae2d708ed8	netfilter: conntrack: fix crash on timeout object removal The object and module refcounts are updated for each conntrack template, however, if we delete the iptables rules and we flush the timeout database, we may end up with invalid references to timeout object that are just gone. Resolve this problem by setting the timeout reference to NULL when the custom timeout entry is removed from our base. This patch requires some RCU trickery to ensure safe pointer handling. This handling is similar to what we already do with conntrack helpers, the idea is to avoid bumping the timeout object reference counter from the packet path to avoid the cost of atomic ops. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:04:34 +02:00
Scott Feldman	464314ea6c	switchdev: skip over ports returning -EOPNOTSUPP when recursing ports This allows us to recurse over all the ports, skipping over unsupporting ports. Without the change, the recursion would stop at first unsupported port. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:20:20 -07:00
Scott Feldman	f55ac58ae6	switchdev: add bridge ageing_time attribute Setting the stage to push bridge-level attributes down to port driver so hardware can be programmed accordingly. Bridge-level attribute example is ageing_time. This is a per-bridge attribute, not a per-bridge-port attr. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:20:19 -07:00
Vivien Didelot	8057b3e7a1	net: dsa: use switchdev obj in port_fdb_del For consistency with the FDB add operation, propagate the switchdev_obj_port_fdb structure in the DSA drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:28:52 -07:00
Vivien Didelot	1f36faf269	net: dsa: push prepare phase in port_fdb_add Now that the prepare phase is pushed down to the DSA drivers, propagate it to the port_fdb_add function. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:28:50 -07:00
Vivien Didelot	146a32067b	net: dsa: add port_fdb_prepare Push the prepare phase for FDB operations down to the DSA drivers, with a new port_fdb_prepare function. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:28:49 -07:00
David S. Miller	7bcfeead48	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-10-08 Here's another set of Bluetooth & 802.15.4 patches for the 4.4 kernel. 802.15.4: - Many improvements & fixes to the mrf24j40 driver - Fixes and cleanups to nl802154, mac802154 & ieee802154 code Bluetooth: - New chipset support in btmrvl driver - Fixes & cleanups to btbcm, btmrvl, bpa10x & btintel drivers - Support for vendor specific diagnostic data through common API - Cleanups to the 6lowpan code - New events & message types for monitor channel Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:15:30 -07:00
Eric Dumazet	e446f9dfe1	net: synack packets can be attached to request sockets selinux needs few changes to accommodate fact that SYNACK messages can be attached to a request socket, lacking sk_security pointer (Only syncookies are still attached to a TCP_LISTEN socket) Adds a new sk_listener() helper, and use it in selinux and sch_fq Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported by: kernel test robot <ying.huang@linux.intel.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-11 05:05:06 -07:00
Alexander Aring	4d6a6aed22	6lowpan: move shared settings to lowpan_netdev_setup This patch moves values for all lowpan interface to the shared implementation of 6lowpan. This patch also quietly fixes the forgotten IFF_NO_QUEUE flag for the bluetooth 6LoWPAN interface. An identically commit is `4afbc0d` ("net: 6lowpan: convert to using IFF_NO_QUEUE") which wasn't changed for bluetooth 6lowpan. All 6lowpan interfaces should be virtual with IFF_NO_QUEUE, using EUI64 address length, the mtu size is 1280 (IPV6_MIN_MTU) and the netdev type is ARPHRD_6LOWPAN. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-10-08 14:25:34 +02:00
Eric W. Biederman	ede2059dba	dst: Pass net into dst->output The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:03 -07:00
Eric W. Biederman	33224b16ff	ipv4, ipv6: Pass net into ip_local_out and ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	cf91a99daa	ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	792883303c	ipv6: Merge ip6_local_out and ip6_local_out_sk Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	9f8955cc46	ipv6: Merge __ip6_local_out and __ip6_local_out_sk Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	e2cb77db08	ipv4: Merge ip_local_out and ip_local_out_sk It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	b92dacd456	ipv4: Merge __ip_local_out and __ip_local_out_sk Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	4ebdfba73c	dst: Pass a sk into .local_out For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:55 -07:00
Eric W. Biederman	13206b6bff	net: Pass net into dst_output and remove dst_output_okfn Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:54 -07:00
Paul Moore	c72eda0608	af_unix: constify the sock parameter in unix_sk() Make unix_sk() just like inet[6]_sk() by constify'ing the sock parameter. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:05:18 -07:00
Marcel Holtmann	4b4113d6db	Bluetooth: Add debugfs entry for setting vendor diagnostic mode This adds a new debugfs entry for enabling and disabling the vendor diagnostic mode. It is only exposed for drivers that provide the set_diag driver callback and actually have an option for vendor specific diagnostic information. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 09:57:07 +03:00
Marcel Holtmann	e875ff8407	Bluetooth: Add support for vendor specific diagnostic channel Introduce hci_recv_diag function for HCI drivers to allow sending vendor specific diagnostic messages into the Bluetooth core stack. The messages are not processed, but they are forwarded to the monitor channel and can be retrieved by user space diagnostic tools. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 09:51:13 +03:00
Marcel Holtmann	6c566dd5a1	Bluetooth: Send index information updates to monitor channel The Bluetooth public device address might change during controller setup and it makes it a lot simpler for monitoring tools if they just get told what the new address is. In addition include the manufacturer / company information of the controller. That allows for easy vendor specific HCI command and event handling. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-08 09:48:34 +03:00
David S. Miller	2579c98f0d	For the current cycle, we have the following right now: * many internal fixes, API improvements, cleanups, etc. * full AP client state tracking in cfg80211/mac80211 from Ayala * VHT support (in mac80211) for mesh * some A-MSDU in A-MPDU support from Emmanuel * show current TX power to userspace (from Rafał) * support for netlink dump in vendor commands (myself) -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJWEp5XAAoJEDBSmw7B7bqr8DsQAICgQL7gSkHUlc6rbMJ9MzX+ 9W0SNpZHSmfE0ZsL3cCoeHbk5dGhX82GumIz4GeqtvIKUNHkC8qlnXJIKTEva+sp PjcF1wS0qQFdt6sg/Zxq+4Q8lZrZf1xP9W0x0ORYi9d9qej07JAZku8zYt4agpNV R4nCl/gKVF375aV8y+qi+WSZXx4j80dJkokoVk4hzotWjd0bGVL1T9YwDRzxg4FI S0DnkxlsD3MRHJXq+9+DbF5cuTjCG2LZNcDIBy455eWN27j9CWgEPVXoySQjDgQc ayf2siw7BccqnV84et0vi+0WYXdZCHm3zCen44s4vaCflhdGxdx48V+Lib6mluR3 OEM1V1l9uV97UyORPljRKvDURq2IUdLQw00of26CTX8qEnmQIfxC7qaRg0rYEiGW SbTClbEiEkBLV+sCStnkv8GJHNpvtI/2VQXH1ydrHsrWC3Sl9bpPOWYlNBPwdzM9 U4zgpxf6gLqlsukQKmMDmoKW7T04Fs0qgE99ThU2x6uTGsux8bfbxgzPCfUdeY8M HmCB5oBCZKJ5pzv6z6lUGc0cO42IL50aBrrlatrEekjevUXW3MMOZCwGrUXxpMw1 gd+2PnLCCUeDyKNvkpXEgr4uS9Egc0sWH1RlpDPaAA5gRdRHiDn7MK7Z+s5OpNIC wnFCQKB+KrNNrQFuXz9k =BF9F -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2015-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the current cycle, we have the following right now: * many internal fixes, API improvements, cleanups, etc. * full AP client state tracking in cfg80211/mac80211 from Ayala * VHT support (in mac80211) for mesh * some A-MSDU in A-MPDU support from Emmanuel * show current TX power to userspace (from Rafał) * support for netlink dump in vendor commands (myself) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:29:18 -07:00
David Ahern	8cbb512c92	net: Add source address lookup op for VRF Add operation to l3mdev to lookup source address for a given flow. Add support for the operation to VRF driver and convert existing IPv4 hooks to use the new lookup. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:44 -07:00
David Ahern	3ce58d8435	net: Refactor path selection in __ip_route_output_key_hash VRF device needs the same path selection following lookup to set source address. Rather than duplicating code, move existing code into a function that is exported to modules. Code move only; no functional change. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:44 -07:00
David Ahern	6e2895a8e3	net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRC Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:27:42 -07:00
Flavio Leitner	0647e70834	netfilter: remove dead code Remove __nf_conntrack_find() from headers. Fixes: `dcd93ed4cd` ("netfilter: nf_conntrack: remove dead code") Signed-off-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:01 +02:00
Raanan Avargil	8695a144da	tcp/dccp: fix old style declarations I’m using the compilation flag -Werror=old-style-declaration, which requires that the “inline” word would come at the beginning of the code line. $ make drivers/net/ethernet/intel/e1000e/e1000e.ko ... include/net/inet_timewait_sock.h:116:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration] static void inline inet_twsk_schedule(struct inet_timewait_sock tw, int timeo) include/net/inet_timewait_sock.h:121:1: error: ‘inline’ is not at beginning of declaration [-Werror=old-style-declaration] static void inline inet_twsk_reschedule(struct inet_timewait_sock tw, int timeo) Fixes: `ed2e923945` ("tcp/dccp: fix timewait races in timer handling") Signed-off-by: Raanan Avargil <raanan.avargil@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 04:02:59 -07:00
David S. Miller	40e106801e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next Eric W. Biederman says: ==================== net: Pass net through ip fragmention This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. This round focuses on passing net through ip fragmentation which we seem to call from about everywhere. That is the main ip output paths, the bridge netfilter code, and openvswitch. This has to happend at once accross the tree as function pointers are involved. First some prep work is done, then ipv4 and ipv6 are converted and then temporary helper functions are removed. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:39:31 -07:00
Peter Nørlund	79a131592d	ipv4: ICMP packet inspection for multipath ICMP packets are inspected to let them route together with the flow they belong to, minimizing the chance that a problematic path will affect flows on other paths, and so that anycast environments can work with ECMP. Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:00:04 -07:00
Peter Nørlund	0e884c78ee	ipv4: L3 hash-based multipath Replaces the per-packet multipath with a hash-based multipath using source and destination address. Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:59:21 -07:00
Eric Dumazet	a1a5344ddb	tcp: avoid two atomic ops for syncookies inet_reqsk_alloc() is used to allocate a temporary request in order to generate a SYNACK with a cookie. Then later, syncookie validation also uses a temporary request. These paths already took a reference on listener refcount, we can avoid a couple of atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:27 -07:00
Eric Dumazet	004a5d0140	net: use sk_fullsock() in __netdev_pick_tx() SYN_RECV & TIMEWAIT sockets are not full blown, they do not have a sk_dst_cache pointer. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:25 -07:00
Eric Dumazet	caf3f2676a	inet: ip_skb_dst_mtu() should use sk_fullsock() SYN_RECV & TIMEWAIT sockets are not full blown, do not even try to call ip_sk_use_pmtu() on them. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 02:45:24 -07:00
Marcel Holtmann	22db3cbcf9	Bluetooth: Send transport open and close monitor events When the core starts or shuts down the actual HCI transport, send a new monitor event that indicates that this is happening. These new events correspond to HCI_DEV_OPEN and HCI_DEV_CLOSE events. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-05 10:30:49 +03:00
Marcel Holtmann	4a3f95b7b6	Bluetooth: Introduce HCI_DEV_OPEN and HCI_DEV_CLOSE events When opening the HCI transport via hdev->open send HCI_DEV_OPEN event and when closing the HCI transport via hdev->close send HCI_DEV_CLOSE. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-10-05 10:29:36 +03:00
Pablo Neira Ayuso	b7bd1809e0	netfilter: nfnetlink_queue: get rid of nfnetlink_queue_ct.c The original intention was to avoid dependencies between nfnetlink_queue and conntrack without ifdef pollution. However, we can achieve this by moving the conntrack dependent code into ctnetlink and keep some glue code to access the nfq_ct indirection from nfqueue. After this patch, the nfq_ct indirection is always compiled in the netfilter core to avoid polluting nfqueue with ifdefs. Thus, if nf_conntrack is not compiled this results in only 8-bytes of memory waste in x86_64. This patch also adds ctnetlink_nfqueue_seqadj() to avoid that the nf_conn structure layout if exposed to nf_queue, which creates another dependency with nf_conntrack at compilation time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-04 21:45:44 +02:00
Eric Dumazet	e96f78ab27	tcp/dccp: add SLAB_DESTROY_BY_RCU flag for request sockets Before letting request sockets being put in TCP/DCCP regular ehash table, we need to add either : - SLAB_DESTROY_BY_RCU flag to their kmem_cache - add RCU grace period before freeing them. Since we carefully respected the SLAB_DESTROY_BY_RCU protocol like ESTABLISH and TIMEWAIT sockets, use it here. req_prot_init() being only used by TCP and DCCP, I did not add a new slab_flags into their rsk_prot, but reuse prot->slab_flags Since all reqsk_alloc() users are correctly dealing with a failure, add the __GFP_NOWARN flag to avoid traces under pressure. Fixes: `079096f103` ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 13:25:20 -07:00
Jiri Pirko	9e8f4a548a	switchdev: push object ID back to object structure Suggested-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:40 -07:00
Jiri Pirko	648b4a995a	switchdev: bring back switchdev_obj and use it as a generic object param Replace "void *obj" with a generic structure. Introduce couple of helpers along that. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:39 -07:00
Jiri Pirko	52ba57cfdc	switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdb Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:39 -07:00
Jiri Pirko	8f24f3095d	switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlan Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:38 -07:00
Jiri Pirko	1f86839874	switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_* To be aligned with obj. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:37 -07:00
Jiri Pirko	57d80838da	switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_* Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:36 -07:00
Eric Dumazet	ef547f2ac1	tcp: remove max_qlen_log This control variable was set at first listen(fd, backlog) call, but not updated if application tried to increase or decrease backlog. It made sense at the time listener had a non resizeable hash table. Also rounding to powers of two was not very friendly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:44 -07:00
Eric Dumazet	10cbc8f179	tcp/dccp: remove struct listen_sock It is enough to check listener sk_state, no need for an extra condition. max_qlen_log can be moved into struct request_sock_queue We can remove syn_wait_lock and the alignment it enforced. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:43 -07:00
Eric Dumazet	ca6fb06518	tcp: attach SYNACK messages to request sockets instead of listener If a listen backlog is very big (to avoid syncookies), then the listener sk->sk_wmem_alloc is the main source of false sharing, as we need to touch it twice per SYNACK re-transmit and TX completion. (One SYN packet takes listener lock once, but up to 6 SYNACK are generated) By attaching the skb to the request socket, we remove this source of contention. Tested: listen(fd, 10485760); // single listener (no SO_REUSEPORT) 16 RX/TX queue NIC Sustain a SYNFLOOD attack of ~320,000 SYN per second, Sending ~1,400,000 SYNACK per second. Perf profiles now show listener spinlock being next bottleneck. 20.29% [kernel] [k] queued_spin_lock_slowpath 10.06% [kernel] [k] __inet_lookup_established 5.12% [kernel] [k] reqsk_timer_handler 3.22% [kernel] [k] get_next_timer_interrupt 3.00% [kernel] [k] tcp_make_synack 2.77% [kernel] [k] ipt_do_table 2.70% [kernel] [k] run_timer_softirq 2.50% [kernel] [k] ip_finish_output 2.04% [kernel] [k] cascade Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:43 -07:00
Eric Dumazet	1b33bc3e9e	ipv6: remove obsolete inet6 functions inet6_csk_search_req() and inet6_csk_reqsk_queue_hash_add() no longer exist. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:42 -07:00
Eric Dumazet	81b496b31a	tcp/dccp: shrink struct listen_sock We no longer use hash_rnd, nr_table_entries and syn_table[] For a listener with a backlog of 10 millions sockets, this saves 80 MBytes of vmalloced memory. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:42 -07:00
Eric Dumazet	079096f103	tcp/dccp: install syn_recv requests into ehash table In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table. ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener. In nominal conditions, this halves pressure on listener lock. Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic. We will shrink listen_sock in the following patch to ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:41 -07:00
Eric Dumazet	2feda34192	tcp/dccp: remove inet_csk_reqsk_queue_added() timeout argument This is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:40 -07:00
Eric Dumazet	aa3a0c8ce6	tcp: get_openreq[46]() changes When request sockets are no longer in a per listener hash table but on regular TCP ehash, we need to access listener uid through req->rsk_listener get_openreq6() also gets a const for its request socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:40 -07:00
Eric Dumazet	b267cdd107	tcp/dccp: init sk_prot and call sk_node_init() in reqsk_alloc() We plan to use generic functions to insert request sockets into ehash table. sk_prot needs to be set (to retrieve sk_prot->h.hashinfo) sk_node needs to be cleared. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:38 -07:00
Eric Dumazet	8d2675f1e4	tcp: move synflood_warned into struct request_sock_queue long term plan is to remove struct listen_sock when its hash table is no longer there. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:37 -07:00
Eric Dumazet	aac065c50a	tcp: move qlen/young out of struct listen_sock qlen_inc & young_inc were protected by listener lock, while qlen_dec & young_dec were atomic fields. Everything needs to be atomic for upcoming lockless listener. Also move qlen/young in request_sock_queue as we'll get rid of struct listen_sock eventually. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:36 -07:00
Eric Dumazet	fff1f3001c	tcp: add a spinlock to protect struct request_sock_queue struct request_sock_queue fields are currently protected by the listener 'lock' (not a real spinlock) We need to add a private spinlock instead, so that softirq handlers creating children do not have to worry with backlog notion that the listener 'lock' carries. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:32:36 -07:00
David S. Miller	f6d3125fa3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/dsa/slave.c net/dsa/slave.c simply had overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-02 07:21:25 -07:00
Alexander Aring	5f2ebb3b59	mac802154: check on len instead mac_len This patch change the length check to len instead of mac_len for checking if the frame control field is available to dereference. We need to change it because I saw issues with af_packet raw sockets and the mrf24j40 which calls this functionality. The raw socket functionality doesn't set the mac_len but resets the skb_mac_header to skb->data which is still correct. The issue occur at mrf24j40 only, because the driver need to evaluate the fc fields. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:22:11 +02:00
Alexander Aring	a26c5fd762	nl802154: add support for security layer This patch adds support for accessing mac802154 llsec implementation over nl802154. I added for a new Kconfig entry to provide this functionality CONFIG_IEEE802154_NL802154_EXPERIMENTAL. This interface is still in development. It provides to change security parameters and add/del/dump entries of security tables. Later we can add also a get to get an entry by unique identifier. Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:16:44 +02:00
Alexander Aring	c648a0138b	netlink: add nla_get for le32 and le64 This patch adds missing inline wrappers for nla_get_le32 and nla_get_le64. The 802.15.4 MAC byteorder is little endian and we keep the byteorder for fields like address configuration in the same byteorder as it comes from the MAC layer. To provide these fields for nl802154 userspace applications, we need these inline wrappers for netlink. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-30 13:16:44 +02:00
Eric W. Biederman	7d8c6e3915	ipv6: Pass struct net through ip6_fragment Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2015-09-30 01:45:03 -05:00
Eric W. Biederman	694869b3c5	ipv4: Pass struct net through ip_fragment Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-09-30 01:45:03 -05:00
David S. Miller	4bf1b54f9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following pull request contains Netfilter/IPVS updates for net-next containing 90 patches from Eric Biederman. The main goal of this batch is to avoid recurrent lookups for the netns pointer, that happens over and over again in our Netfilter/IPVS code. The idea consists of passing netns pointer from the hook state to the relevant functions and objects where this may be needed. You can find more information on the IPVS updates from Simon Horman's commit merge message: `c3456026ad` ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next"). Exceptionally, this time, I'm not posting the patches again on netdev, Eric already Cc'ed this mailing list in the original submission. If you need me to make, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:46:21 -07:00
Vivien Didelot	44bbcf5c4a	net: switchdev: extract struct switchdev_obj_* Now that switchdev and its drivers directly use specific switchdev_obj_* structures, move them out of the switchdev_obj union and get rif of this outer structure. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:32:00 -07:00
Vivien Didelot	ab06900230	net: switchdev: abstract object in add/del ops Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev add and del operations to: int switchdev_port_obj_add/del(struct net_device dev, enum switchdev_obj_id id, void obj); This allows the caller to pass a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of these operations and switchdev have been changed accordingly. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:31:59 -07:00
Vivien Didelot	25f07adc47	net: switchdev: pass callback to dump operation Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev dump operation to: int switchdev_port_obj_dump(struct net_device dev, enum switchdev_obj_id id, void obj, int (cb)(void obj)); This allows the caller to pass and expect back a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of dump operation can now expect this specific structure and call the callback with it. Drivers have been changed accordingly. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:31:59 -07:00
Vivien Didelot	03d5fb1862	net: switchdev: remove dev from switchdev_obj cb The net_device associated to a dump operation does not have to be passed to the callback. switchdev stores it in a superset struct, if needed. Also some drivers (such as DSA drivers) may not have easy access to it. This will simplify pushing the callback function down to the drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:31:59 -07:00
David Ahern	9478d12d33	net: Move netif_index_is_l3_master to l3mdev.h Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:34 -07:00
David Ahern	ec539514e5	net: Remove vrf header file Move remaining structs to VRF driver and delete the vrf header file. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:33 -07:00
David Ahern	93a7e7e837	net: Remove the now unused vrf_ptr Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:33 -07:00
David Ahern	8e1ed7058b	net: Replace calls to vrf_dev_get_rth Replace calls to vrf_dev_get_rth with l3mdev_get_rtable. The check on the flow flags is handled in the l3mdev operation. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:33 -07:00
David Ahern	3236b0042b	net: Replace vrf_dev_table and friends Replace calls to vrf_dev_table and friends with l3mdev_fib_table and kin. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:33 -07:00
David Ahern	385add906b	net: Replace vrf_master_ifindex{, _rcu} with l3mdev equivalents Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either l3mdev_master_ifindex_rcu or l3mdev_master_ifindex. The pattern: oif = vrf_master_ifindex(dev) ? : dev->ifindex; is replaced with oif = l3mdev_fib_oif(dev); And remove the now unused vrf macros. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:33 -07:00
David Ahern	1b69c6d0ae	net: Introduce L3 Master device abstraction L3 master devices allow users of the abstraction to influence FIB lookups for enslaved devices. Current API provides a means for the master device to return a specific FIB table for an enslaved device, to return an rtable/custom dst and influence the OIF used for fib lookups. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:32 -07:00
David Ahern	007979eaf9	net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the netif_is_vrf and netif_index_is_vrf macros. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 20:40:32 -07:00
Eric Dumazet	0536fcc039	tcp: prepare fastopen code for upcoming listener changes While auditing TCP stack for upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object before publishing it. Otherwise an other cpu could try to lock the spinlock before it gets properly initialized. Instead of adding appropriate barriers, just remove dynamic memory allocations : - Structure is 28 bytes on 64bit arches. Using additional 8 bytes for holding a pointer seems overkill. - Two listeners can share same cache line and performance would suffer. If we really want to save few bytes, we would instead dynamically allocate whole struct request_sock_queue in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:10 -07:00
Eric Dumazet	2985aaac01	tcp: constify tcp_syn_flood_action() socket argument tcp_syn_flood_action() will soon be called with unlocked socket. In order to avoid SYN flood warning being emitted multiple times, use xchg(). Extend max_qlen_log and synflood_warned fields in struct listen_sock to u32 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:10 -07:00
Eric Dumazet	f964629e33	tcp: constify tcp_v{4\|6}_route_req() sock argument These functions do not change the listener socket. Goal is to make sure tcp_conn_request() is not messing with listener in a racy way. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:09 -07:00
Eric Dumazet	3f684b4b1f	tcp: cookie_init_sequence() cleanups Some common IPv4/IPv6 code can be factorized. Also constify cookie_init_sequence() socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:09 -07:00
Eric Dumazet	0c27171e66	tcp/dccp: constify syn_recv_sock() method sock argument We'll soon no longer hold listener socket lock, these functions do not modify the socket in any way. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:09 -07:00
Eric Dumazet	c28c6f0459	tcp: constify tcp_create_openreq_child() socket argument This method does not touch the listener socket. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:09 -07:00
Eric Dumazet	87e002b21a	net: constify sk_gfp_atomic() sock argument Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:08 -07:00
Eric Dumazet	1ce31c9e08	inet: constify __inet_inherit_port() sock argument socket is not touched, make it const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:08 -07:00
Eric Dumazet	a2432c4fa5	inet: constify inet_csk_route_child_sock() socket argument The socket points to the (shared) listener. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:08 -07:00
Eric Dumazet	f76b33c32b	dccp: use inet6_csk_route_req() helper Before changing dccp_v6_request_recv_sock() sock argument to const, we need to get rid of security_sk_classify_flow(), and it seems doable by reusing inet6_csk_route_req() helper. We need to add a proto parameter to inet6_csk_route_req(), not assume it is TCP. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:08 -07:00
Eric Dumazet	72ab4a86f7	tcp: remove tcp_rcv_state_process() tcp_hdr argument Factorize code to get tcp header from skb. It makes no sense to duplicate code in callers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:07 -07:00
Eric Dumazet	bda07a64c0	tcp: remove unused len argument from tcp_rcv_state_process() Once we realize tcp_rcv_synsent_state_process() does not use its 'len' argument and we get rid of it, then it becomes clear this argument is no longer used in tcp_rcv_state_process() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:07 -07:00
Eric Dumazet	a00e74442b	tcp/dccp: constify send_synack and send_reset socket argument None of these functions need to change the socket, make it const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 16:53:07 -07:00
Aaron Conole	4613012db1	af_unix: Convert the unix_sk macro to an inline function for type safety As suggested by Eric Dumazet this change replaces the #define with a static inline function to enjoy complaints by the compiler when misusing the API. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 13:47:07 -07:00
Eric W. Biederman	c1444c6357	bridge: Pass net into br_validate_ipv4 and br_validate_ipv6 The network namespace is easiliy available in state->net so use it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:32 +02:00
Eric W. Biederman	372892ec11	ipv4: Push struct net down into nf_send_reset This is needed so struct net can be pushed down into ip_route_me_harder. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:31 +02:00
Ayala Beker	47edb11b52	cfg80211: allow changing station capabilities for unassociated stations Currently, cfg80211 rejects capability updates for existing entries and as a result it's impossible to update entries that were added unassociated, but that is necessary to go through the full station states from userspace, adding a station before authentication etc. Fix this by allowing updates to capabilities for stations that the driver (or mac80211) assigned unassociated state. Drivers setting the full station state support flag must use the new station type for proper operation. Signed-off-by: Ayala Beker <ayala.beker@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-29 15:56:50 +02:00
Helmut Schaa	35afa58862	mac80211: Copy tx'ed beacons to monitor mode When debugging wireless powersave issues on the AP side it's quite helpful to see our own beacons that are transmitted by the hardware/driver. However, this is not that easy since beacons don't pass through the regular TX queues. Preferably drivers would call ieee80211_tx_status also for tx'ed beacons but that's not always possible. Hence, just send a copy of each beacon generated by ieee80211_beacon_get_tim to monitor devices when they are getting fetched by the driver. Also add a HW flag IEEE80211_HW_BEACON_TX_STATUS that can be used by drivers to indicate that they report TX status for beacons. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> (with a fix from Christian Lamparted rolled in) Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-29 15:56:33 +02:00
Loic Poulain	fbef168fec	Bluetooth: Add hci_cmd_sync function Send a HCI command and wait for command complete event. This function serializes the requests by grabbing the req_lock. Signed-off-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-29 15:16:11 +02:00
Eric Dumazet	7c85af8810	tcp: avoid reorders for TFO passive connections We found that a TCP Fast Open passive connection was vulnerable to reorders, as the exchange might look like [1] C -> S S <FO ...> <request> [2] S -> C S. ack request <options> [3] S -> C . <answer> packets [2] and [3] can be generated at almost the same time. If C receives the 3rd packet before the 2nd, it will drop it as the socket is in SYN_SENT state and expects a SYNACK. S will have to retransmit the answer. Current OOO avoidance in linux is defeated because SYNACK packets are attached to the LISTEN socket, while DATA packets are attached to the children. They might be sent by different cpus, and different TX queues might be selected. It turns out that for TFO, we created a child, which is a full blown socket in TCP_SYN_RECV state, and we simply can attach the SYNACK packet to this socket. This means that at the time tcp_sendmsg() pushes DATA packet, skb->ooo_okay will be set iff the SYNACK packet had been sent and TX completed. This removes the reorder source at the host level. We also removed the export of tcp_try_fastopen(), as it is no longer called from IPv6. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-28 22:11:19 -07:00
Jiri Benc	b1be00a6c3	vxlan: support both IPv4 and IPv6 sockets in a single vxlan device For metadata based vxlan interface, open both IPv4 and IPv6 socket. This is much more user friendly: it's not necessary to create two vxlan interfaces and pay attention to using the right one in routing rules. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 22:40:55 -07:00
David S. Miller	4963ed48f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 16:08:27 -07:00
Eric Dumazet	1b70e977ce	inet: constify inet_rtx_syn_ack() sock argument SYNACK packets are sent on behalf on unlocked listeners or fastopen sockets. Mark socket as const to catch future changes that might break the assumption. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	ea3bea3a1d	tcp/dccp: constify rtx_synack() and friends This is done to make sure we do not change listener socket while sending SYNACK packets while socket lock is not held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	0f935dbedc	tcp: constify tcp_v{4\|6}_send_synack() socket argument This documents fact that listener lock might not be held at the time SYNACK are sent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:39 -07:00
Eric Dumazet	1c1e9d2b67	ipv6: constify ip6_xmit() sock argument This is to document that socket lock might not be held at this point. skb_set_owner_w() and ipv6_local_error() are using proper atomic ops or spinlocks, so we promote the socket to non const when calling them. netfilter hooks should never assume socket lock is held, we also promote the socket to non const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	5d062de7f8	tcp: constify tcp_make_synack() socket argument listener socket is not locked when tcp_make_synack() is called. We better make sure no field is written. There is one exception : Since SYNACK packets are attached to the listener at this moment (or SYN_RECV child in case of Fast Open), sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using atomic operations so this is safe. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	cfe673b0ae	ip: constify ip_build_and_send_pkt() socket argument This function is used to build and send SYNACK packets, possibly on behalf of unlocked listener socket. Make sure we did not miss a write by making this socket const. We no longer can use ip_select_ident() and have to either set iph->id to 0 or directly call __ip_select_ident() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	b83e3deb97	tcp: md5: constify tcp_md5_do_lookup() socket argument When TCP new listener is done, these functions will be called without socket lock being held. Make sure they don't change anything. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:38 -07:00
Eric Dumazet	4e3f5d727d	inet: constify ip_dont_fragment() arguments ip_dont_fragment() can accept const socket and dst Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:37 -07:00
Eric Dumazet	30d50c61df	ipv6: constify inet6_csk_route_req() socket argument socket is not modified, make it const so that callers can do the same if they need. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:37 -07:00
Eric Dumazet	3aef934f4d	ipv6: constify ip6_dst_lookup_{flow\|tail}() sock arguments ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch socket, lets add a const qualifier. This will permit the same change in inet6_csk_route_req() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:37 -07:00
Eric Dumazet	e5895bc600	inet: constify inet_csk_route_req() socket argument This is used by TCP listener core, and listener socket shall not be modified by inet_csk_route_req(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:37 -07:00
Eric Dumazet	6f9c961546	inet: constify ip_route_output_flow() socket argument Very soon, TCP stack might call inet_csk_route_req(), which calls inet_csk_route_req() with an unlocked listener socket, so we need to make sure ip_route_output_flow() is not trying to change any field from its socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:37 -07:00
Eric Dumazet	b1964b5fce	tcp: constify tcp_openreq_init_rwin() Soon, listener socket wont be locked when tcp_openreq_init_rwin() is called. We need to read socket fields once, as their value could change under us. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:36 -07:00
Eric Dumazet	b40cf18ef7	tcp: constify listener socket in tcp_v[46]_init_req() Soon, listener socket spinlock will no longer be held, add const arguments to tcp_v[46]_init_req() to make clear these functions can not mess socket fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-25 13:00:36 -07:00
Jiri Pirko	f623ab7f51	switchdev: reduce transaction phase enum down to a boolean Now, since we have only 2 values for transaction phase, just use bool. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:22 -07:00
Jiri Pirko	9f6467cf22	switchdev: remove "ABORT" transaction phase No longer used by drivers, as transaction queue with item destructors takes care of abort phase internally in switchdev code. So kill it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:22 -07:00
Jiri Pirko	2b8a61a6fd	switchdev: remove "NONE" transaction phase Shouldn't have been there in the first place. Now it is unused, kill it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:22 -07:00
Jiri Pirko	8bdb427206	switchdev: add switchdev_trans_ph_prepare/commit helpers Add helpers which should be used int attr_set/obj_add switchdev ops to check the phase of transaction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:21 -07:00
Jiri Pirko	f8db83486e	switchdev: move transaction phase enum under transaction structure Before it disappears completely, move transaction phase enum under transaction structure and make attr/obj structures a bit cleaner. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:21 -07:00
Jiri Pirko	7ea6eb3f56	switchdev: introduce transaction item queue for attr_set and obj_add Now, the memory allocation in prepare/commit state is done separatelly in each driver (rocker). Introduce the similar mechanism in generic switchdev code, in form of queue. That can be used not only for memory allocations, but also for different items. Abort item destruction is handled as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:21 -07:00
Jiri Pirko	69f5df491e	switchdev: rename "trans" to "trans_ph". This is temporary, name "trans" will be used for something else and "trans_ph" will eventually disappear. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 22:59:21 -07:00
Pablo Neira Ayuso	c3456026ad	Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Second Round of IPVS Updates for v4.4 please consider these bug fixes and extensive clean-ups of IPVS from Eric Biederman for v4.4. His excellent description of the changes, which is part of an even larger set of clean-up work, is as follows: I am gradually working my way through the netfilter stack passing struct down into the netfilter hooks and from the netfilter hooks and from there down into the functions that actually care. This removes the need for netfilter functions to guess how to figure out how to compute which network namespace they are in and instead provides a simple and reliable method to do so. The cleanups stand on their own but this is part of a larger effort to have routes with an output device that is not in the current network namespace. The IPVS code has been a bit more of a challenge than most. Just passing struct net through to where it is needed did not feel clean to me. The practical issue is that the ipvs code in most places actually wants struct netns_ipvs and not struct net. So as part of this process I have turned the relationship between struct net and the structs netns_ipvs, ip_vs_conn_param, ip_vs_conn, and ip_vs_service inside out. I have modified the ipvs functions to take a struct netns_ipvs not a struct net. The net is code with fewer conversions from one type of structure to another. I did wind up adding a struct netns_ipvs parameter to quite a few functions that did not have it before so I could pass the structure down from the netfilter hooks to where it is actually needed to avoid guessing. I have broken up the work in a bunch of small patches so there is at least a chance and reviewing that each step I took is correct. The series compiles at each step so bisecting it should not be a problem if something weird comes up. The first two changes in this series are actually bug fixes. The first is a compile fix for a bug in sctp that came in, in the last round of ipvs changes merged into nf-next. The second fixes an older bug where in pathological circumstances the wrong network namespace could be used when a proc file is written to. The rest of the patchset is a bunch of boring changes getting pushing struct netns_ipvs (and by extension ipvs->net) where it needs to be. Either by replacing struct net pointers or adding new struct netns_ipvs pointers. With a handful of other minor cleanups (like removing skb_net). I have decided include the bug fixes in this pull request. Patch one relates to a bug that was added to nf-next recently and is thus not applicable to nf . Patch two could arguably be promoted to a fix for v4.3 and stable though it does not appear to be severe enough to warrant that course of action; let me know if you would like me to reconsider. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-25 01:38:58 +02:00
Jiri Benc	63d008a4e9	ipv4: send arp replies to the correct tunnel When using ip lwtunnels, the additional data for xmit (basically, the actual tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in metadata dst. When replying to ARP requests, we need to send the reply to the same tunnel the request came from. This means we need to construct proper metadata dst for ARP replies. We could perform another route lookup to get a dst entry with the correct lwtstate. However, this won't always ensure that the outgoing tunnel is the same as the incoming one, and it won't work anyway for IPv4 duplicate address detection. The only thing to do is to "reverse" the ip_tunnel_info. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 14:31:36 -07:00
Jiri Benc	38cf595b19	ipv6: remove unused neigh parameter from ndisc functions Since commit `12fd84f438` ("ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na and ndisc_send_ns is unused. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 12:26:08 -07:00
Jiri Benc	92c14d9b5e	genetlink: simplify genl_notify The genl_notify function has too many arguments for no real reason - all callers use genl_info to get them anyway. Just pass the genl_info down to genl_notify. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-24 12:25:23 -07:00
Frederic Danis	594b31ea7d	Bluetooth: Add BT_WARN and bt_dev_warn logging macros Add warning logging macros to bluetooth subsystem logs. Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-24 16:25:44 +02:00
Eric W. Biederman	9cfdd75b7c	ipvs: Remove skb_sknet This function adds no real value and it obscures what the code is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	7c6c21ee94	ipvs: Remove skb_net This hack has no more users so remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	7d1f88eca0	ipvs: Pass ipvs not net to ip_vs_protocol_net_(init\|cleanup) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	69f390934b	ipvs: Remove net argument from ip_vs_tcp_conn_listen The argument is unnecessary and in practice confusing, and has caused the callers to do all manner of silly things. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	5703294874	ipvs: Wrap sysctl_cache_bypass and remove ifdefs in ip_vs_leave With sysctl_cache_bypass now a compile time constant the compiler can figue out that it can elimiate all of the code that depends on sysctl_cache_bypass being true. Also remove the duplicate computation of net previously necessitated by #ifdef CONFIG_SYSCTL Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	d8f44c335a	ipvs: Pass ipvs into .conn_schedule and ip_vs_try_to_schedule This moves the hack "net_ipvs(skb_net(skb))" up one level where it will be easier to remove. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	2f3edc6a5b	ipvs: Pass ipvs not net into ip_vs_conn_net_init and ip_vs_conn_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	0cf705c8c2	ipvs: Pass ipvs into conn_out_get Move the hack of relying on "net_ipvs(skb_net(skb))" to derive the ipvs up a layer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	ab16197642	ipvs: Pass ipvs into .conn_in_get and ip_vs_conn_in_get_proto Stop relying on "net_ipvs(skb_net(skb))" to derive the ipvs as skb_net is a hack. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	1281a9c2d1	ipvs: Pass ipvs not net into init_netns and exit_netns Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	b5dd212cc1	ipvs: Pass ipvs not net into ip_vs_app_net_init and ip_vs_app_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	9f8128a56e	ipvs: Pass ipvs not net to register_ip_vs_app and unregister_ip_vs_app Also move the tests for net_ipvs being NULL into __ip_vs_ftp_init and __ip_vs_ftp_exit. The only places where they possibly make sense. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	3250dc9c52	ipvs: Pass ipvs not net to register_ip_vs_app_inc Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	19648918fb	ipvs: Pass ipvs not net into register_app and unregister_app Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	a4dd0360c6	ipvs: Pass ipvs not net to ip_vs_estimator_net_init and ip_vs_estimator_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	3d99376689	ipvs: Pass ipvs not net into ip_vs_control_net_(init\|cleanup) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	423b55954d	ipvs: Pass ipvs not net to ip_vs_random_drop_entry Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	0f34d54bf4	ipvs: Pass ipvs not net to ip_vs_start_estimator aned ip_vs_stop_estimator Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	ebea1f7c0b	ipvs: Pass ipvs not net to ip_vs_sync_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	802cb43703	ipvs: Pass ipvs not net to ip_vs_sync_net_init Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	b61a8c1a40	ipvs: Pass ipvs not net to ip_vs_sync_conn Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	b3cf3cbfb5	ipvs: Pass ipvs not net to stop_sync_thread Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	6ac121d710	ipvs: Pass ipvs not net to start_sync_thread Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	18d6ade63c	ipvs: Pass ipvs not net to ip_vs_proto_data_get Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	56d2169b77	ipvs: Pass ipvs not net to ip_vs_service_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	dc2add6f2e	ipvs: Pass ipvs not net to ip_vs_find_dest Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:34 +09:00
Eric W. Biederman	48aed1b029	ipvs: Pass ipvs not net to ip_vs_has_real_service Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:34 +09:00
Eric W. Biederman	0a4fd6ce92	ipvs: Pass ipvs not net to ip_vs_service_find Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:34 +09:00
Eric W. Biederman	3109d2f2d1	ipvs: Store ipvs not net in struct ip_vs_service In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of param->net to access param->ipvs->net instead. In functions where we are searching for an svc and filtering by net filter by ipvs instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:33 +09:00
Eric W. Biederman	19913dec1b	ipvs: Pass ipvs not net to ip_vs_fill_conn ipvs is what is actually desired so change the parameter and the modify the callers to pass struct netns_ipvs. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:33 +09:00
Eric W. Biederman	e64e2b460c	ipvs: Store ipvs not net in struct ip_vs_conn_param In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of param->net to access param->ipvs->net instead. When lookup up struct ip_vs_conn in a hash table replace comparisons of cp->net with comparisons of cp->ipvs which is possible now that ipvs is present in ip_vs_conn_param. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:33 +09:00
Eric W. Biederman	58dbc6f260	ipvs: Store ipvs not net in struct ip_vs_conn In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of conn->net to access conn->ipvs->net instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:33 +09:00
Max Filippov	06e60e5912	net/ethoc: support big-endian register layout Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-23 15:33:15 -07:00
David S. Miller	99cb99aa05	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree in this 4.4 development cycle, they are: 1) Schedule ICMP traffic to IPVS instances, this introduces a new schedule_icmp proc knob to enable/disable it. By default is off to retain the old behaviour. Patchset from Alex Gartrell. I'm also including what Alex originally said for the record: "The configuration of ipvs at Facebook is relatively straightforward. All ipvs instances bgp advertise a set of VIPs and the network prefers the nearest one or uses ECMP in the event of a tie. For the uninitiated, ECMP deterministically and statelessly load balances by hashing the packet (usually a 5-tuple of protocol, saddr, daddr, sport, and dport) and using that number as an index (basic hash table type logic). The problem is that ICMP packets (which contain really important information like whether or not an MTU has been exceeded) will get a different hash value and may end up at a different ipvs instance. With no information about where to route these packets, they are dropped, creating ICMP black holes and breaking Path MTU discovery. Suddenly, my mom's pictures can't load and I'm fielding midday calls that I want nothing to do with. To address this, this patch set introduces the ability to schedule icmp packets which is gated by a sysctl net.ipv4.vs.schedule_icmp. If set to 0, the old behavior is maintained -- otherwise ICMP packets are scheduled." 2) Add another proc entry to ignore tunneled packets to avoid routing loops from IPVS, also from Alex. 3) Fifteen patches from Eric Biederman to: * Stop passing nf_hook_ops as parameter to the hook and use the state hook object instead all around the netfilter code, so only the private data pointer is passed to the registered hook function. * Now that we've got state->net, propagate the netns pointer to netfilter hook clients to avoid its computation over and over again. A good example of how this has been simplified is the former TEE target (now nf_dup infrastructure) since it has killed the ugly pick_net() function. There's another round of netns updates from Eric Biederman making the line. To avoid the patchbomb again to almost all the networking mailing list (that is 84 patches) I'd suggest we send you a pull request with no patches or let me know if you prefer a better way. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-22 13:11:43 -07:00
Johannes Berg	5359d112dc	Revert "mac80211: add pointer for driver use to key" This reverts commit `f9a060f4b2`. No driver has turned up needing this functionality, and I've just implemented the functionality I wanted this for in a different way. Thus, remove it again, until somebody shows up with a need for having it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-22 15:21:28 +02:00
Emmanuel Grumbach	99e7ca44bb	mac80211: allow the driver to advertise A-MSDU within A-MPDU Rx support Drivers may be interested in receiving A-MSDU within A-MDPU. Not all the devices may be able to do so, make it configurable. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-22 15:21:24 +02:00
Emmanuel Grumbach	e3abc8ff0f	mac80211: allow to transmit A-MSDU within A-MPDU Advertise the capability to send A-MSDU within A-MPDU in the AddBA request sent by mac80211. Let the driver know about the peer's capabilities. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-22 15:21:23 +02:00
Andrei Otcheretianski	1b09b5568e	mac80211: introduce per vif frame registration API Currently the cfg80211's frame registration api receives wdev, however mac80211 assumes per device filter configuration and ignores wdev. Per device filtering is too wasteful, especially for multi-channel devices. Introduce new per vif frame registration API and use it for probe request registrations in ieee80211_mgmt_frame_register() Also call directly to ieee80211_configure_filter instead of using a work since it is now allowed to sleep in ieee80211_mgmt_frame_register. Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-22 15:21:22 +02:00
Johannes Berg	7bdbe400d1	nl80211: support vendor dumpit commands In order to transfer many items in vendor commands, support the dumpit netlink method for them. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-22 15:21:22 +02:00
Alexander Aring	87a93e4ece	ieee802154: change needed headroom/tailroom This patch cleanups needed_headroom, needed_tailroom and hard_header_len fields for wpan and lowpan interfaces. For wpan interfaces the worst case mac header len should be part of needed_headroom, currently this is set as hard_header_len, but hard_header_len should be set to the minimum header length which xmit call assumes and this is the minimum frame length of 802.15.4. The hard_header_len value will check inside send callbacl of AF_PACKET raw sockets. For lowpan interfaces, if fragmentation isn't needed the skb will call dev_hard_header for 802154 layer and queue it afterwards. This happens without new skb allocation, so we need the same headroom and tailroom lengths like 802154 inside 802154 6lowpan layer. At least we assume as minimum header length an ipv6 header size. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-22 11:51:20 +02:00
Alexander Aring	838b83d63d	ieee802154: introduce wpan_dev_header_ops The current header_ops callback structure of net device are used mostly from 802.15.4 upper-layers. Because this callback structure is a very generic one, which is also used by e.g. DGRAM AF_PACKET sockets, we can't make this callback structure 802.15.4 specific which is currently is. I saw the smallest "constraint" for calling this callback with dev_hard_header/dev_parse_header by AF_PACKET which assign a 8 byte array for address void pointers. Currently 802.15.4 specific protocols like af802154 and 6LoWPAN will assign the "struct ieee802154_addr" as these parameters which is greater than 8 bytes. The current callback implementation for header_ops.create assumes always a complete "struct ieee802154_addr" which AF_PACKET can't never handled and is greater than 8 bytes. For that reason we introduce now a "generic" create/parse header_ops callback which allows handling with intra-pan extended addresses only. This allows a small use-case with AF_PACKET to send "somehow" a valid dataframe over DGRAM. To keeping the current dev_hard_header behaviour we introduce a similar callback structure "wpan_dev_header_ops" which contains 802.15.4 specific upper-layer header creation functionality, which can be called by wpan_dev_hard_header. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-22 11:51:20 +02:00
Alexander Aring	a1da67b811	ieee802154: header_ops: fix frame control setting Sometimes upper-layer protocols wants to generate a new mac header by filling "struct ieee802154_hdr" only. These upper-layers sets for the address settings the source and dest fields, but not the fc fields for indicate the source and dest address mode. This patch changes the "ieee802154_hdr_push" function so the fc address fields are set according the source and dest fields of "struct ieee802154_hdr". Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-22 11:51:20 +02:00
Eric Dumazet	ed2e923945	tcp/dccp: fix timewait races in timer handling When creating a timewait socket, we need to arm the timer before allowing other cpus to find it. The signal allowing cpus to find the socket is setting tw_refcnt to non zero value. As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to call inet_twsk_schedule() first. This also means we need to remove tw_refcnt changes from inet_twsk_schedule() and let the caller handle it. Note that because we use mod_timer_pinned(), we have the guarantee the timer wont expire before we set tw_refcnt as we run in BH context. To make things more readable I introduced inet_twsk_reschedule() helper. When rearming the timer, we can use mod_timer_pending() to make sure we do not rearm a canceled timer. Note: This bug can possibly trigger if packets of a flow can hit multiple cpus. This does not normally happen, unless flow steering is broken somehow. This explains this bug was spotted ~5 months after its introduction. A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(), but will be provided in a separate patch for proper tracking. Fixes: `789f558cfb` ("tcp/dccp: get rid of central timewait timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ying Cai <ycai@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:32:29 -07:00
Yuchung Cheng	0f1c28ae74	tcp: usec resolution SYN/ACK RTT Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK RTT is often measured as 0ms or sometimes 1ms, which would affect RTT estimation and min RTT samping used by some congestion control. This patch improves SYN/ACK RTT to be usec resolution if platform supports it. While the timestamping of SYN/ACK is done in request sock, the RTT measurement is carefully arranged to avoid storing another u64 timestamp in tcp_sock. For regular handshake w/o SYNACK retransmission, the RTT is sampled right after the child socket is created and right before the request sock is released (tcp_check_req() in tcp_minisocks.c) For Fast Open the child socket is already created when SYN/ACK was sent, the RTT is sampled in tcp_rcv_state_process() after processing the final ACK an right before the request socket is released. If the SYN/ACK was retransmistted or SYN-cookie was used, we rely on TCP timestamps to measure the RTT. The sample is taken at the same place in tcp_rcv_state_process() after the timestamp values are validated in tcp_validate_incoming(). Note that we do not store TS echo value in request_sock for SYN-cookies, because the value is already stored in tp->rx_opt used by tcp_ack_update_rtt(). One side benefit is that the RTT measurement now happens before initializing congestion control (of the passive side). Therefore the congestion control can use the SYN/ACK RTT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:19:01 -07:00
Ursula Braun	91e60eb60b	s390/iucv: do not use arrays as argument The iucv code uses arrays as arguments. Even though this does not really cause a problem, it could be misleading, since the compiler turns array arguments into just a pointer argument. To be more precise this patch changes the array arguments into pointers. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:03:04 -07:00
David S. Miller	5dcd246107	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-09-18 Here's the first bluetooth-next pull request for the 4.4 kernel: - ieee802154 cleanups & fixes - debugfs support for the at86rf230 driver - Support for quirky (seemingly counterfeit) CSR Bluetooth controllers - Power management and device config improvements for Intel controllers - Fix for devices with incorrect advertising data length - Fix for closing HCI user channel socket Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-21 16:00:44 -07:00
Nicolas Dichtel	83cf9a2521	ip6tunnel: make rx/tx bytes counters consistent Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part. Before the patch, the external ipv6 header + gre header were included on tx. After the patch: $ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1 PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data. 64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms --- 192.168.6.121 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms 7: ip6gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21 RX: bytes packets errors dropped overrun mcast 84 1 0 0 0 0 TX: bytes packets errors dropped carrier collsns 84 1 0 0 0 0 $ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1 PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data. 64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms --- 192.168.1.121 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms 8: ip6tnl1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121 RX: bytes packets errors dropped overrun mcast 84 1 0 0 0 0 TX: bytes packets errors dropped carrier collsns 84 1 0 0 0 0 Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 22:36:42 -07:00
Nikola Forró	0315e38270	net: Fix behaviour of unreachable, blackhole and prohibit routes Man page of ip-route(8) says following about route types: unreachable - these destinations are unreachable. Packets are dis‐ carded and the ICMP message host unreachable is generated. The local senders get an EHOSTUNREACH error. blackhole - these destinations are unreachable. Packets are dis‐ carded silently. The local senders get an EINVAL error. prohibit - these destinations are unreachable. Packets are discarded and the ICMP message communication administratively prohibited is generated. The local senders get an EACCES error. In the inet6 address family, this was correct, except the local senders got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route. In the inet address family, all three route types generated ICMP message net unreachable, and the local senders got ENETUNREACH error. In both address families all three route types now behave consistently with documentation. Signed-off-by: Nikola Forró <nforro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-20 21:45:08 -07:00
Eric W. Biederman	c7af6483b9	netfilter: Pass net into nf_xfrm_me_harder Instead of calling dev_net on a likley looking network device pass state->net into nf_xfrm_me_harder. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 22:00:22 +02:00
Eric W. Biederman	06198b34a3	netfilter: Pass priv instead of nf_hook_ops to netfilter hooks Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 22:00:16 +02:00
Eric W. Biederman	a31f1adc09	netfilter: nf_conntrack: Add a struct net parameter to l4_pkt_to_tuple As gre does not have the srckey in the packet gre_pkt_to_tuple needs to perform a lookup in it's per network namespace tables. Pass in the proper network namespace to all pkt_to_tuple implementations to ensure gre (and any similar protocols) can get this right. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 22:00:04 +02:00
Eric W. Biederman	a4ffe319ae	act_connmark: Remember the struct net instead of guessing it. Stop guessing the struct net instead of remember it. Guessing is just silly and will be problematic in the future when I implement routes between network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:59:31 +02:00
Eric W. Biederman	206e8c0075	netfilter: Pass net to nf_dup_ipv4 and nf_dup_ipv6 This allows them to stop guessing the network namespace with pick_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:59:11 +02:00
Eric W. Biederman	46448d0093	netfilter: nf_tables: Pass struct net in nft_pktinfo nft_pktinfo is passed on the stack so this does not bloat any in core data structures. By centrally computing this information this makes maintence of the code simpler, and understading of the code easier. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:38 +02:00
Eric W. Biederman	156c196f60	netfilter: x_tables: Pass struct net in xt_action_param As xt_action_param lives on the stack this does not bloat any persistent data structures. This is a first step in making netfilter code that needs to know which network namespace it is executing in simpler. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:14 +02:00
Eric W. Biederman	6aa187f21c	netfilter: nf_tables: kill nft_pktinfo.ops - Add nft_pktinfo.pf to replace ops->pf - Add nft_pktinfo.hook to replace ops->hooknum This simplifies the code, makes it more readable, and likely reduces cache line misses. Maintainability is enhanced as the details of nft_hook_ops are of no concern to the recpients of nft_pktinfo. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:01 +02:00
Pablo Neira Ayuso	36aea585a1	Merge tag 'ipvs-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== IPVS Updates for v4.4 please consider these IPVS Updates for v4.4. The updates include the following from Alex Gartrell: * Scheduling of ICMP * Sysctl to ignore tunneled packets; and hence some packet-looping scenarios ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:05:03 +02:00
Szymon Janc	e781b7f7fc	Bluetooth: Add BT_ERR_RATELIMITED This patch adds ratelimited version of the BT_ERR macro. Signed-off-by: Szymon Janc <ext.szymon.janc@tieto.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-18 09:53:19 +02:00
Alexei Starovoitov	27b29f6305	bpf: add bpf_redirect() helper Existing bpf_clone_redirect() helper clones skb before redirecting it to RX or TX of destination netdev. Introduce bpf_redirect() helper that does that without cloning. Benchmarked with two hosts using 10G ixgbe NICs. One host is doing line rate pktgen. Another host is configured as: $ tc qdisc add dev $dev ingress $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop so it receives the packet on $dev and immediately xmits it on $dev + 1 The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program that does bpf_clone_redirect() and performance is 2.0 Mpps $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \ action bpf run object-file tcbpf1_kern.o section redirect_xmit drop which is using bpf_redirect() - 2.4 Mpps and using cls_bpf with integrated actions as: $ tc filter add dev $dev root pref 10 \ bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1 performance is 2.5 Mpps To summarize: u32+act_bpf using clone_redirect - 2.0 Mpps u32+act_bpf using redirect - 2.4 Mpps cls_bpf using redirect - 2.5 Mpps For comparison linux bridge in this setup is doing 2.1 Mpps and ixgbe rx + drop in ip_rcv - 7.8 Mpps Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:09:07 -07:00
Daniel Borkmann	045efa82ff	cls_bpf: introduce integrated actions Often cls_bpf classifier is used with single action drop attached. Optimize this use case and let cls_bpf return both classid and action. For backwards compatibility reasons enable this feature under TCA_BPF_FLAG_ACT_DIRECT flag. Then more interesting programs like the following are easier to write: int cls_bpf_prog(struct __sk_buff skb) { / classify arp, ip, ipv6 into different traffic classes * and drop all other packets */ switch (skb->protocol) { case htons(ETH_P_ARP): skb->tc_classid = 1; break; case htons(ETH_P_IP): skb->tc_classid = 2; break; case htons(ETH_P_IPV6): skb->tc_classid = 3; break; default: return TC_ACT_SHOT; } return TC_ACT_OK; } Joint work with Daniel Borkmann. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:09:06 -07:00
Eric Dumazet	58d607d3e5	tcp: provide skb->hash to synack packets In commit `b73c3d0e4f` ("net: Save TX flow hash in sock and set in skbuf on xmit"), Tom provided a l4 hash to most outgoing TCP packets. We'd like to provide one as well for SYNACK packets, so that all packets of a given flow share same txhash, to later enable bonding driver to also use skb->hash to perform slave selection. Note that a SYNACK retransmit shuffles the tx hash, as Tom did in commit `265f94ff54` ("net: Recompute sk_txhash on negative routing advice") for established sockets. This has nice effect making TCP flows resilient to some kind of black holes, even at connection establish phase. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Mahesh Bandewar <maheshb@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 21:01:04 -07:00
Eric W. Biederman	0c4b51f005	netfilter: Pass net into okfn This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	5a70649e0d	net: Merge dst_output and dst_output_sk Add a sock paramter to dst_output making dst_output_sk superfluous. Add a skb->sk parameter to all of the callers of dst_output Have the callers of dst_output_sk call dst_output. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:32 -07:00
Eric W. Biederman	a6568b2425	xfrm: Remove unused afinfo method init_dst Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:32 -07:00
David Ahern	58189ca7b2	net: Fix vti use case with oif in dst lookups Steffen reported that the recent change to add oif to dst lookups breaks the VTI use case. The problem is that with the oif set in the flow struct the comparison to the nh_oif is triggered. Fix by splitting the FLOWI_FLAG_VRFSRC into 2 flags -- one that triggers the vrf device cache bypass (FLOWI_FLAG_VRFSRC) and another telling the lookup to not compare nh oif (FLOWI_FLAG_SKIP_NH_OIF). Fixes: `42a7b32b73` ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 16:36:34 -07:00
Roopa Prabhu	37a1d3611c	ipv6: include NLM_F_REPLACE in route replace notifications This patch adds NLM_F_REPLACE flag to ipv6 route replace notifications. This makes nlm_flags in ipv6 replace notifications consistent with ipv4. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 15:00:27 -07:00
Stefan Schmidt	bfe08a875a	ieee802154: af_ieee802154: fix typo in comment. Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-17 13:20:05 +02:00
Alexander Aring	187625e184	ieee802154: 6lowpan: remove tx full-size calc workaround This patch removes a workaround for datagram_size calculation while doing fragmentation on transmit. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-17 13:20:04 +02:00
Alexander Aring	54552d0302	ieee802154: 6lowpan: check on valid 802.15.4 frame This patch adds frame control checks to check if the received frame is something which could contain a 6LoWPAN packet. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-17 13:20:04 +02:00
Alexander Aring	72a5e6bb51	ieee820154: 6lowpan: dispatch evaluation rework This patch complete reworks the evaluation of 6lowpan dispatch value by introducing a receive handler mechanism for each dispatch value. A list of changes: - Doing uncompression on-the-fly when FRAG1 is received, this require some special handling for 802.15.4 lltype in generic 6lowpan branch for setting the payload length correct. - Fix dispatch mask for fragmentation. - Add IPv6 dispatch evaluation for FRAG1. - Add skb_unshare for dispatch which might manipulate the skb data buffer. Cc: Jukka Rissanen <jukka.rissanen@linux.intel.com> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-17 13:20:03 +02:00
Simon Fels	6b3cc1db68	Bluetooth: close HCI device when user channel socket gets closed With `9380f9eacf` the order of unsetting the HCI_USER_CHANNEL flag of the HCI device was reverted to ensure the device is first closed before making it available again. Due to hci_dev_close checking for HCI_USER_CHANNEL being set on the device it was never really closed and was kept opened. We're now calling hci_dev_do_close directly to make sure the device is correctly closed and we keep the correct order to unset the flag on our device object. Signed-off-by: Simon Fels <simon.fels@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-17 13:20:02 +02:00
Loic Poulain	6f558b70fb	Bluetooth: Add bt_dev logging macros Add specific bluetooth device logging macros since hci device name is repeatedly referred in bluetooth subsystem logs. Signed-off-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-09-17 13:20:00 +02:00
Alex Gartrell	4e478098ac	ipvs: add sysctl to ignore tunneled packets This is a way to avoid nasty routing loops when multiple ipvs instances can forward to eachother. Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-17 11:50:02 +09:00
Sowmini Varadhan	d5566fd72e	rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats Many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the the AF_INET6 statististics that are always returned by default from rtnl_fill_ifinfo(). Computing the statistics can be an expensive operation that impacts scaling, so it is desirable to avoid this if the information is not needed. This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that can be passed with netlink_request() to avoid statistics computation for the ifinfo path. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 15:25:02 -07:00
Martin KaFai Lau	70da5b5c53	ipv6: Replace spinlock with seqlock and rcu in ip6_tunnel This patch uses a seqlock to ensure consistency between idst->dst and idst->cookie. It also makes dst freeing from fib tree to undergo a rcu grace period. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 14:53:05 -07:00
Martin KaFai Lau	cdf3464e6c	ipv6: Fix dst_entry refcnt bugs in ip6_tunnel Problems in the current dst_entry cache in the ip6_tunnel: 1. ip6_tnl_dst_set is racy. There is no lock to protect it: - One major problem is that the dst refcnt gets messed up. F.e. the same dst_cache can be released multiple times and then triggering the infamous dst refcnt < 0 warning message. - Another issue is the inconsistency between dst_cache and dst_cookie. It can be reproduced by adding and removing the ip6gre tunnel while running a super_netperf TCP_CRR test. 2. ip6_tnl_dst_get does not take the dst refcnt before returning the dst. This patch: 1. Create a percpu dst_entry cache in ip6_tnl 2. Use a spinlock to protect the dst_cache operations 3. ip6_tnl_dst_get always takes the dst refcnt before returning Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 14:53:05 -07:00
Martin KaFai Lau	f230d1e891	ipv6: Rename the dst_cache helper functions in ip6_tunnel It is a prep work to fix the dst_entry refcnt bugs in ip6_tunnel. This patch rename: 1. ip6_tnl_dst_check() to ip6_tnl_dst_get() to better reflect that it will take a dst refcnt in the next patch. 2. ip6_tnl_dst_store() to ip6_tnl_dst_set() to have a more conventional name matching with ip6_tnl_dst_get(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 14:53:04 -07:00
David Ahern	b7503e0cdb	net: Add FIB table id to rtable Add the FIB table id to rtable to make the information available for IPv4 as it is for IPv6. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-15 12:01:41 -07:00
Linus Torvalds	65c61bc5db	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix out-of-bounds array access in netfilter ipset, from Jozsef Kadlecsik. 2) Use correct free operation on netfilter conntrack templates, from Daniel Borkmann. 3) Fix route leak in SCTP, from Marcelo Ricardo Leitner. 4) Fix sizeof(pointer) in mac80211, from Thierry Reding. 5) Fix cache pointer comparison in ip6mr leading to missed unlock of mrt_lock. From Richard Laing. 6) rds_conn_lookup() needs to consider network namespace in key comparison, from Sowmini Varadhan. 7) Fix deadlock in TIPC code wrt broadcast link wakeups, from Kolmakov Dmitriy. 8) Fix fd leaks in bpf syscall, from Daniel Borkmann. 9) Fix error recovery when installing ipv6 multipath routes, we would delete the old route before we would know if we could fully commit to the new set of nexthops. Fix from Roopa Prabhu. 10) Fix run-time suspend problems in r8152, from Hayes Wang. 11) In fec, don't program the MAC address into the chip when the clocks are gated off. From Fugang Duan. 12) Fix poll behavior for netlink sockets when using rx ring mmap, from Daniel Borkmann. 13) Don't allocate memory with GFP_KERNEL from get_stats64 in r8169 driver, from Corinna Vinschen. 14) In TCP Cubic congestion control, handle idle periods better where we are application limited, in order to keep cwnd from growing out of control. From Eric Dumzet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits) tcp_cubic: better follow cubic curve after idle period tcp: generate CA_EVENT_TX_START on data frames xen-netfront: respect user provided max_queues xen-netback: respect user provided max_queues r8169: Fix sleeping function called during get_stats64, v2 ether: add IEEE 1722 ethertype - TSN netlink, mmap: fix edge-case leakages in nf queue zero-copy netlink, mmap: don't walk rx ring on poll if receive queue non-empty cxgb4: changes for new firmware 1.14.4.0 net: fec: add netif status check before set mac address r8152: fix the runtime suspend issues r8152: split DRIVER_VERSION ipv6: fix ifnullfree.cocci warnings add microchip LAN88xx phy driver stmmac: fix check for phydev being open net: qlcnic: delete redundant memsets net: mv643xx_eth: use kzalloc net: jme: use kzalloc() instead of kmalloc+memset net: cavium: liquidio: use kzalloc in setup_glist() net: ipv6: use common fib_default_rule_pref ...	2015-09-10 13:53:15 -07:00
Phil Sutter	f53de1e9a4	net: ipv6: use common fib_default_rule_pref This switches IPv6 policy routing to use the shared fib_default_rule_pref() function of IPv4 and DECnet. It is also used in multicast routing for IPv4 as well as IPv6. The motivation for this patch is a complaint about iproute2 behaving inconsistent between IPv4 and IPv6 when adding policy rules: Formerly, IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the assigned priority value was decreased with each rule added. Since then all users of the default_pref field have been converted to assign the generic function fib_default_rule_pref(), fib_nl_newrule() may just use it directly instead. Therefore get rid of the function pointer altogether and make fib_default_rule_pref() static, as it's not used outside fib_rules.c anymore. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-09 14:19:50 -07:00
Linus Torvalds	26d2177e97	Changes for 4.3 - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc. mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt 30ZYFuW8evTL+YQcaV65 =EHLg -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...	2015-09-09 08:33:31 -07:00
Michal Hocko	e752eb6881	memcg: move memcg_proto_active from sock.h The only user is sock_update_memcg which is living in memcontrol.c so it doesn't make much sense to pollute sock.h by this inline helper. Move it to memcontrol.c and open code it into its only caller. Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
Michal Hocko	33398cf2f3	memcg: export struct mem_cgroup mem_cgroup structure is defined in mm/memcontrol.c currently which means that the code outside of this file has to use external API even for trivial access stuff. This patch exports mm_struct with its dependencies and makes some of the exported functions inlines. This even helps to reduce the code size a bit (make defconfig + CONFIG_MEMCG=y) text data bss dec hex filename 12355346 1823792 1089536 15268674 e8fb42 vmlinux.before 12354970 1823792 1089536 15268298 e8f9ca vmlinux.after This is not much (370B) but better than nothing. We also save a function call in some hot paths like callers of mem_cgroup_count_vm_event which is used for accounting. The patch doesn't introduce any functional changes. [vdavykov@parallels.com: inline memcg_kmem_is_active] [vdavykov@parallels.com: do not expose type outside of CONFIG_MEMCG] [akpm@linux-foundation.org: memcontrol.h needs eventfd.h for eventfd_ctx] [akpm@linux-foundation.org: export mem_cgroup_from_task() to modules] Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-08 15:35:28 -07:00
David S. Miller	080fff50a3	For the first round of fixes, we have this: * fix for the sizeof() pointer type issue * a fix for regulatory getting into a restore loop * a fix for rfkill global 'all' state, it needs to be stored everywhere to apply correctly to new rfkill instances * properly refuse CQM RSSI when it cannot actually be used * protect HT TDLS traffic properly in non-HT networks * don't incorrectly advertise 80 MHz support when not allowed -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJV6av+AAoJEDBSmw7B7bqrNYYP/1kZjdk9s4wPHpQndfSWK/sj yTjysh/VXZ5Jyzkcmg337AjEuuJ/SntDXjMXEKTxHHrt5fmHFBvmPErPUY7p7I/t MJqxVnYGypGW4LImyXYRJbejabmPrxtHC+jcWibw24yBCzJGCXF/MeKCotksnvIn IoJDvUHrVFZhfzFRepdozJi1Qy2Y04Q7zXywplW6XFYWxdfDWaWyno/xh+rM8oO7 xSgpkuj4FDfimDBjzJOHwBH3xOlt/uHoB97G0TJbJ6k0R6a8aTz8eK5OWB4rFcn+ BsuQfaWcrGg9fJsoqz4mYlWf/F4Qj/gU9Gr5sBUWWa0xQ7HctcgJ/t27XjUWP7oV VnQ9YQt/SukG3BVf8GuNqaSo7/7oprWHdDYRH0tnvImXCWsUlHM8gv8MrG0UOHGC 4I4qgPLHon6Ui3lNHfivl8iSDFT+jfsMDMWiB6kKZRpUOdEicRrepMoDARVbTYck h6uY4Tx6OQ39evLSXjyrCWb0aOr2WoDCypxEkpE+fgDsWr8yTV0Ti0QZfVqatO3d d4GQb6U5Zkd14ymKtzULa5qjOw1TmnKpxVzSSdrONaPBVdUDsl4QH38gIppFjGsV 61Gn3PXRRgy+VAib/YyIYqilvdhZ2TdXaSVAEyKv6WjKavOFA/8KsDPmJsyKoCSn 1dDOwaiqgfu2bK1PTU9A =xw8r -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== For the first round of fixes, we have this: * fix for the sizeof() pointer type issue * a fix for regulatory getting into a restore loop * a fix for rfkill global 'all' state, it needs to be stored everywhere to apply correctly to new rfkill instances * properly refuse CQM RSSI when it cannot actually be used * protect HT TDLS traffic properly in non-HT networks * don't incorrectly advertise 80 MHz support when not allowed ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-06 19:49:55 -07:00
David S. Miller	53cfd053e4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Conflicts: include/net/netfilter/nf_conntrack.h The conflict was an overlap between changing the type of the zone argument to nf_ct_tmpl_alloc() whilst exporting nf_ct_tmpl_free. Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Oneliner to restore maps in nf_tables since we support addressing registers at 32 bits level. 2) Restore previous default behaviour in bridge netfilter when CONFIG_IPV6=n, oneliner from Bernhard Thaler. 3) Out of bound access in ipset hash:net* set types, reported by Dave Jones' KASan utility, patch from Jozsef Kadlecsik. 4) Fix ipset compilation with gcc 4.4.7 related to C99 initialization of unnamed unions, patch from Elad Raz. 5) Add a workaround to address inconsistent endianess in the res_id field of nfnetlink batch messages, reported by Florian Westphal. 6) Fix error paths of CT/synproxy since the conntrack template was moved to use kmalloc, patch from Daniel Borkmann. All of them look good to me to reach 4.2, I can route this to -stable myself too, just let me know what you prefer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-05 21:57:42 -07:00
Avri Altman	22f66895e6	mac80211: protect non-HT BSS when HT TDLS traffic exists HT TDLS traffic should be protected in a non-HT BSS to avoid collisions. Therefore, when TDLS peers join/leave, check if protection is (now) needed and set the ht_operation_mode of the virtual interface according to the HT capabilities of the TDLS peer(s). This works because a non-HT BSS connection never sets (or otherwise uses) the ht_operation_mode; it just means that drivers must be aware that this field applies to all HT traffic for this virtual interface, not just the traffic within the BSS. Document that. Signed-off-by: Avri Altman <avri.altman@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-09-04 14:25:46 +02:00
Daniel Borkmann	62da98656b	netfilter: nf_conntrack: make nf_ct_zone_dflt built-in Fengguang reported, that some randconfig generated the following linker issue with nf_ct_zone_dflt object involved: [...] CC init/version.o LD init/built-in.o net/built-in.o: In function `ipv4_conntrack_defrag': nf_defrag_ipv4.c:(.text+0x93e95): undefined reference to `nf_ct_zone_dflt' net/built-in.o: In function `ipv6_defrag': nf_defrag_ipv6_hooks.c:(.text+0xe3ffe): undefined reference to `nf_ct_zone_dflt' make: *** [vmlinux] Error 1 Given that configurations exist where we have a built-in part, which is accessing nf_ct_zone_dflt such as the two handlers nf_ct_defrag_user() and nf_ct6_defrag_user(), and a part that configures nf_conntrack as a module, we must move nf_ct_zone_dflt into a fixed, guaranteed built-in area when netfilter is configured in general. Therefore, split the more generic parts into a common header under include/linux/netfilter/ and move nf_ct_zone_dflt into the built-in section that already holds parts related to CONFIG_NF_CONNTRACK in the netfilter core. This fixes the issue on my side. Fixes: `308ac9143e` ("netfilter: nf_conntrack: push zone object into functions") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-02 16:32:56 -07:00
David S. Miller	20a17bf6c0	flow_dissector: Use 'const' where possible. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 21:19:17 -07:00
David S. Miller	4b36993d3d	flow_dissector: Don't use bit fields. Just have a flags member instead. In file included from include/linux/linkage.h:4:0, from include/linux/kernel.h:6, from net/core/flow_dissector.c:1: In function 'flow_keys_hash_start', inlined from 'flow_hash_from_keys' at net/core/flow_dissector.c:553:34: >> include/linux/compiler.h:447:38: error: call to '__compiletime_assert_459' declared with attribute error: BUILD_BUG_ON failed: FLOW_KEYS_HASH_OFFSET % sizeof(u32) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 16:46:08 -07:00
Tom Herbert	823b969395	flow_dissector: Add control/reporting of encapsulation Add an input flag to flow dissector on rather dissection should stop when encapsulation is detected (IP/IP or GRE). Also, add a key_control flag that indicates encapsulation was encountered during the dissection. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:23 -07:00
Tom Herbert	872b1abb1e	flow_dissector: Add flag to stop parsing when an IPv6 flow label is seen Add an input flag to flow dissector on rather dissection should be stopped when a flow label is encountered. Presumably, the flow label is derived from a sufficient hash of an inner transport packet so further dissection is not needed (that is ports are not included in the flow hash). Using the flow label instead of ports has the additional benefit that packet fragments should hash to same value as non-fragments for a flow (assuming that the same flow label is used). We set this flag by default in for skb_get_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:23 -07:00
Tom Herbert	8306b688f1	flow_dissector: Add flag to stop parsing at L3 Add an input flag to flow dissector on rather dissection should be stopped when an L3 packet is encountered. This would be useful if a caller just wanted to get IP addresses of the outermost header (e.g. to do an L3 hash). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:23 -07:00
Tom Herbert	807e165dc4	flow_dissector: Add control/reporting of fragmentation Add an input flag to flow dissector on rather dissection should be attempted on a first fragment. Also add key_control flags to indicate that a packet is a fragment or first fragment. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:22 -07:00
Tom Herbert	c6cc1ca7f4	flowi: Abstract out functions to get flow hash based on flowi Create __get_hash_from_flowi6 and __get_hash_from_flowi4 to get the flow keys and hash based on flowi structures. These are called by __skb_get_hash_flowi6 and __skb_get_hash_flowi4. Also, created get_hash_from_flowi6 and get_hash_from_flowi4 which can be called when just the hash value for a flowi is needed. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:22 -07:00
Tom Herbert	bcc83839ff	skbuff: Make __skb_set_sw_hash a general function Move __skb_set_sw_hash to skbuff.h and add __skb_set_hash which is a common method (between __skb_set_sw_hash and skb_set_hash) to set the hash in an skbuff. Also, move skb_clear_hash to be closer to __skb_set_hash. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:22 -07:00
Tom Herbert	e5276937ae	flow_dissector: Move skb related functions to skbuff.h Move the flow dissector functions that are specific to skbuffs into skbuff.h out of flow_dissector.h. This makes flow_dissector.h have no dependencies on skbuff.h. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 15:06:21 -07:00
David Ahern	9b8ff51822	net: Make table id type u32 A number of VRF patches used 'int' for table id. It should be u32 to be consistent with the rest of the stack. Fixes: `4e3c89920c` ("net: Introduce VRF related flags and helpers") `15be405eb2` ("net: Add inet_addr lookup by table") `30bbaa1950` ("net: Fix up inet_addr_type checks") `021dd3b8a1` ("net: Add routes to the table associated with the device") `dc028da54e` ("inet: Move VRF table lookup to inlined function") `f6d3c19274` ("net: FIB tracepoints") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-01 14:32:44 -07:00
Daniel Borkmann	9cf94eab8b	netfilter: conntrack: use nf_ct_tmpl_free in CT/synproxy error paths Commit `0838aa7fcf` ("netfilter: fix netns dependencies with conntrack templates") migrated templates to the new allocator api, but forgot to update error paths for them in CT and synproxy to use nf_ct_tmpl_free() instead of nf_conntrack_free(). Due to that, memory is being freed into the wrong kmemcache, but also we drop the per net reference count of ct objects causing an imbalance. In Brad's case, this leads to a wrap-around of net->ct.count and thus lets __nf_conntrack_alloc() refuse to create a new ct object: [ 10.340913] xt_addrtype: ipv6 does not support BROADCAST matching [ 10.810168] nf_conntrack: table full, dropping packet [ 11.917416] r8169 0000:07:00.0 eth0: link up [ 11.917438] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 12.815902] nf_conntrack: table full, dropping packet [ 15.688561] nf_conntrack: table full, dropping packet [ 15.689365] nf_conntrack: table full, dropping packet [ 15.690169] nf_conntrack: table full, dropping packet [ 15.690967] nf_conntrack: table full, dropping packet [...] With slab debugging, it also reports the wrong kmemcache (kmalloc-512 vs. nf_conntrack_ffffffff81ce75c0) and reports poison overwrites, etc. Thus, to fix the problem, export and use nf_ct_tmpl_free() instead. Fixes: `0838aa7fcf` ("netfilter: fix netns dependencies with conntrack templates") Reported-by: Brad Jackson <bjackson0971@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-01 12:15:08 +02:00
Pravin B Shelar	63b6c13dbb	tun_dst: Remove opts_size opts_size is only written and never read. Following patch removes this unused variable. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 21:23:42 -07:00
Alex Gartrell	94485fedcb	ipvs: add schedule_icmp sysctl This sysctl will be used to enable the scheduling of icmp packets. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-01 10:33:44 +09:00
Alex Gartrell	802c41adcf	ipvs: drop inverse argument to conn_{in,out}_get No longer necessary since the information is included in the ip_vs_iphdr itself. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-01 10:33:37 +09:00
Alex Gartrell	4fd9beef37	ipvs: Add hdr_flags to iphdr These flags contain information like whether or not the addresses are inverted or from icmp. The first will allow us to drop an inverse param all over the place, and the second will later be useful in scheduling icmp. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-01 10:33:26 +09:00
Alex Gartrell	b0e010c527	ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_off This removes some duplicated code and makes the ICMPv6 path look more like the ICMP path. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-01 10:33:21 +09:00
Eric Dumazet	c42858eaf4	gro_cells: remove spinlock protecting receive queues As David pointed out, spinlock are no longer needed to protect the per cpu queues used in gro cells infrastructure. Also use new napi_complete_done() API so that gro_flush_timeout tweaks have an effect. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 15:17:17 -07:00
Daniel Borkmann	c3a8d94746	tcp: use dctcp if enabled on the route to the initiator Currently, the following case doesn't use DCTCP, even if it should: A responder has f.e. Cubic as system wide default, but for a specific route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The initiating host then uses DCTCP as congestion control, but since the initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok, and we have to fall back to Reno after 3WHS completes. We were thinking on how to solve this in a minimal, non-intrusive way without bloating tcp_ecn_create_request() needlessly: lets cache the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0) is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows to only do a single metric feature lookup inside tcp_ecn_create_request(). Joint work with Florian Westphal. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:34:00 -07:00
Pravin B Shelar	4c22279848	ip-tunnel: Use API to access tunnel metadata options. Currently tun-info options pointer is used in few cases to pass options around. But tunnel options can be accessed using ip_tunnel_info_opts() API without using the pointer. Following patch removes the redundant pointer and consistently make use of API. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-31 12:28:56 -07:00
Raghavendra K T	c4c6bc3146	net: Introduce helper functions to get the per cpu data Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-30 21:48:58 -07:00
Matan Barak	e999869548	net/bonding: Export bond_option_active_slave_get_rcu Some consumers of the netdev events API would like to know who is the active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER events occur. For example, when managing RoCE GIDs, GIDs based on the bond's ips should only be set on the port which corresponds to active slave netdevice. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:50 -04:00
Matan Barak	399e6f9581	net/ipv6: Export addrconf_ifid_eui48 For loopback purposes, RoCE devices should have a default GID in the port GID table, even when the interface is down. In order to do so, we use the IPv6 link local address which would have been genenrated for the related Ethernet netdevice when it goes up as a default GID. addrconf_ifid_eui48 is used to gernerate this address, export it. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-08-30 18:08:49 -04:00
Jiri Benc	a43a9ef6a2	vxlan: do not receive IPv4 packets on IPv6 socket By default (subject to the sysctl settings), IPv6 sockets listen also for IPv4 traffic. Vxlan is not prepared for that and expects IPv6 header in packets received through an IPv6 socket. In addition, it's currently not possible to have both IPv4 and IPv6 vxlan tunnel on the same port (unless bindv6only sysctl is enabled), as it's not possible to create and bind both IPv4 and IPv6 vxlan interfaces and there's no way to specify both IPv4 and IPv6 remote/group IP addresses. Set IPV6_V6ONLY on vxlan sockets to fix both of these issues. This is not done globally in udp_tunnel, as l2tp and tipc seems to work okay when receiving IPv4 packets on IPv6 socket and people may rely on this behavior. The other tunnels (geneve and fou) do not support IPv6. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
Jiri Benc	7f9562a1f4	ip_tunnels: record IP version in tunnel info There's currently nothing preventing directing packets with IPv6 encapsulation data to IPv4 tunnels (and vice versa). If this happens, IPv6 addresses are incorrectly interpreted as IPv4 ones. Track whether the given ip_tunnel_key contains IPv4 or IPv6 data. Store this in ip_tunnel_info. Reject packets at appropriate places if they are supposed to be encapsulated into an incompatible protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
Jiri Benc	46fa062ad6	ip_tunnels: convert the mode field of ip_tunnel_info to flags The mode field holds a single bit of information only (whether the ip_tunnel_info struct is for rx or tx). Change the mode field to bit flags. This allows more mode flags to be added. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-29 13:07:54 -07:00
David S. Miller	581a5f2a61	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. In sum, patches to address fallout from the previous round plus updates from the IPVS folks via Simon Horman, they are: 1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight. From Raducu Deaconu. 2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch from Julian Anastasov. 3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From Julian Anastasov. 4) Enhance multicast configuration for the IPVS state sync daemon. Also from Julian. 5) Resolve sparse warnings in the nf_dup modules. 6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set. 7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative subsets of code 1. From Andreas Herz. 8) Revert the jumpstack size calculation from mark_source_chains due to chain depth miscalculations, from Florian Westphal. 9) Calm down more sparse warning around the Netfilter tree, again from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 16:29:59 -07:00
David Ahern	192132b9a0	net: Add support for VRFs to inetpeer cache inetpeer caches based on address only, so duplicate IP addresses within a namespace return the same cached entry. Enhance the ipv4 address key to contain both the IPv4 address and VRF device index. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 13:32:36 -07:00
David Ahern	5345c2e12d	net: Refactor inetpeer address struct Move the inetpeer_addr_base union to inetpeer_addr and drop inetpeer_addr_base. Both the a6 and in6_addr overlays are not needed; drop the __be32 version and rename in6 to a6 for consistency with ipv4. Add a new u32 array to the union which removes the need for the typecast in the compare function and the use of a consistent arg for both ipv4 and ipv6 addresses which makes the compare function more readable. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 13:32:36 -07:00
David Ahern	d39d14ffa2	net: Add helper function to compare inetpeer addresses tcp_metrics and inetpeer both have functions to compare inetpeer addresses. Consolidate into 1 version. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 13:32:36 -07:00
David Ahern	3abef286cf	net: Add set,get helpers for inetpeer addresses Use inetpeer set,get helpers in tcp_metrics rather than peeking into the inetpeer_addr struct. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 13:32:36 -07:00
David Ahern	72afa352d6	net: Introduce ipv4_addr_hash and use it for tcp metrics Refactors a common line into helper function. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 13:32:35 -07:00
Phil Sutter	d66d6c3152	net: sched: register noqueue qdisc This way users can attach noqueue just like any other qdisc using tc without having to mess with tx_queue_len first. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 17:14:30 -07:00
Pravin B Shelar	371bd1061d	geneve: Consolidate Geneve functionality in single module. geneve_core module handles send and receive functionality. This way OVS could use the Geneve API. Now with use of tunnel meatadata mode OVS can directly use Geneve netdevice. So there is no need for separate module for Geneve. Following patch consolidates Geneve protocol processing in single module. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Reviewed-by: Jesse Gross <jesse@nicira.com> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 15:42:48 -07:00
Pravin B Shelar	e305ac6cf5	geneve: Add support to collect tunnel metadata. Following patch create new tunnel flag which enable tunnel metadata collection on given device. These devices can be used by tunnel metadata based routing or by OVS. Geneve Consolidation patch get rid of collect_md_tun to simplify tunnel lookup further. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Reviewed-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 15:42:47 -07:00
Pravin B Shelar	c29a70d2ca	tunnel: introduce udp_tun_rx_dst() Introduce function udp_tun_rx_dst() to initialize tunnel dst on receive path. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Reviewed-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 15:42:47 -07:00
Daniel Borkmann	3b3ae88026	net: sched: consolidate tc_classify{,_compat} For classifiers getting invoked via tc_classify(), we always need an extra function call into tc_classify_compat(), as both are being exported as symbols and tc_classify() itself doesn't do much except handling of reclassifications when tp->classify() returned with TC_ACT_RECLASSIFY. CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(), all others use tc_classify(). When tc actions are being configured out in the kernel, tc_classify() effectively does nothing besides delegating. We could spare this layer and consolidate both functions. pktgen on single CPU constantly pushing skbs directly into the netif_receive_skb() path with a dummy classifier on ingress qdisc attached, improves slightly from 22.3Mpps to 23.1Mpps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 14:18:48 -07:00
Joe Stringer	86ca02e774	netfilter: connlabels: Export setting connlabel length Add functions to change connlabel length into nf_conntrack_labels.c so they may be reused by other modules like OVS and nftables without needing to jump through xt_match_check() hoops. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 11:40:43 -07:00
Joe Stringer	e79e259588	dst: Add __skb_dst_copy() variation This variation on skb_dst_copy() doesn't require two skbs. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 11:40:42 -07:00
Alexei Starovoitov	cff82457c5	net_sched: act_bpf: remove spinlock in fast path Similar to act_gact/act_mirred, act_bpf can be lockless in packet processing with extra care taken to free bpf programs after rcu grace period. Replacement of existing act_bpf (very rare) is done with synchronize_rcu() and final destruction is done from tc_action_ops->cleanup() callback that is called from tcf_exts_destroy()->tcf_action_destroy()->__tcf_hash_release() when bind and refcnt reach zero which is only possible when classifier is destroyed. Previous two patches fixed the last two classifiers (tcindex and rsvp) to call tcf_exts_destroy() from rcu callback. Similar to gact/mirred there is a race between prog->filter and prog->tcf_action. Meaning that the program being replaced may use previous default action if it happened to return TC_ACT_UNSPEC. act_mirred race betwen tcf_action and tcfm_dev is similar. In all cases the race is harmless. Long term we may want to improve the situation by replacing the whole tc_action->priv as single pointer instead of updating inner fields one by one. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-26 11:01:45 -07:00
Alexei Starovoitov	3c645621b7	net_sched: make tcf_hash_destroy() static tcf_hash_destroy() used once. Make it static. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-26 11:01:44 -07:00
Jiri Benc	48e92c44bd	vxlan: fix multiple inclusion of vxlan.h The vxlan_get_sk_family inline function was added after the last #endif, making multiple inclusion of net/vxlan.h fail. Move it to the proper place. Reported-by: Mark Rustad <mark.d.rustad@intel.com> Fixes: `705cc62f67` ("vxlan: provide access function for vxlan socket address family") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-25 14:32:04 -07:00
David Ahern	2c0027cd54	inetpeer: remove dead code Remove various inlined functions not referenced in the kernel. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-25 13:45:52 -07:00
Eric Dumazet	43e122b014	tcp: refine pacing rate determination When TCP pacing was added back in linux-3.12, we chose to apply a fixed ratio of 200 % against current rate, to allow probing for optimal throughput even during slow start phase, where cwnd can be doubled every other gRTT. At Google, we found it was better applying a different ratio while in Congestion Avoidance phase. This ratio was set to 120 %. We've used the normal tcp_in_slow_start() helper for a while, then tuned the condition to select the conservative ratio as soon as cwnd >= ssthresh/2 : - After cwnd reduction, it is safer to ramp up more slowly, as we approach optimal cwnd. - Initial ramp up (ssthresh == INFINITY) still allows doubling cwnd every other RTT. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-25 11:33:54 -07:00
Eric Dumazet	6f021c62d6	tcp: fix slow start after idle vs TSO/GSO slow start after idle might reduce cwnd, but we perform this after first packet was cooked and sent. With TSO/GSO, it means that we might send a full TSO packet even if cwnd should have been reduced to IW10. Moving the SSAI check in skb_entail() makes sense, because we slightly reduce number of times this check is done, especially for large send() and TCP Small queue callbacks from softirq context. As Neal pointed out, we also need to perform the check if/when receive window opens. Tested: Following packetdrill test demonstrates the problem // Test of slow start after idle `sysctl -q net.ipv4.tcp_slow_start_after_idle=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.100 < . 1:1(0) ack 1 win 511 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0 +0 write(4, ..., 26000) = 26000 +0 > . 1:5001(5000) ack 1 +0 > . 5001:10001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10 }% +.100 < . 1:1(0) ack 10001 win 511 +0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }% +0 > . 10001:20001(10000) ack 1 +0 > P. 20001:26001(6000) ack 1 +.100 < . 1:1(0) ack 26001 win 511 +0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }% +4 write(4, ..., 20000) = 20000 // If slow start after idle works properly, we should send 5 MSS here (cwnd/2) +0 > . 26001:31001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }% +0 > . 31001:36001(5000) ack 1 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-25 11:22:50 -07:00
Tom Herbert	127eb7cd3c	lwt: Add cfg argument to build_state Add cfg and family arguments to lwt build state functions. cfg is a void pointer and will either be a pointer to a fib_config or fib6_config structure. The family parameter indicates which one (either AF_INET or AF_INET6). LWT encpasulation implementation may use the fib configuration to build the LWT state. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-24 10:34:40 -07:00
David S. Miller	d9893d1351	NFC 4.3 pull request This is the NFC pull request for 4.3. With this one we have: - A new driver for Samsung's S3FWRN5 NFC chipset. In order to properly support this driver, a few NCI core routines needed to be exported. Future drivers like Intel's Fields Peak will benefit from this. - SPI support as a physical transport for STM st21nfcb. - An additional netlink API for sending replies back to userspace from vendor commands. - 2 small fixes for TI's trf7970a - A few st-nci fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJV1l7oAAoJEIqAPN1PVmxK7DwP+QF3A6wCBzaNQKcla6LOl+Ru lGquPpFyihlDT/916IP7MnMNZYOP3ENdGll5lKts2yKxuty327Bb2UWNkaFP63Ei +zZwhoZXwh6dbK35kwnd87Cwgn0E8vTF+zHhC2MmP8uGDIgOuevb//2GBDayG88T rw4QMZsunT4o9x/bNK1uTlYaKPDs5pxXYFYUPXOQ2F5GqpUVFag3pLbZcJhpSZg7 9Y1KNLxwi/1rwO90JTXH9CQ+oWgVcH86nIlzqGznxgNdOoCF/V/0hdGlTzj8sE6T A3a0Qy1gaWQw9+9QoqE7YWu6JfEIgEhRgFx8dm4SsmUbrnlmgYBavEFeG7++1AG5 QByWh/h2po0MysNRCfhey2EExlZgdvc1WyLQlS+0w+aWmM5MYq+J4lx+sqXdUDdu MZyNDytRfGgRimBPSxyjuHtzrBWR8MyenjsbpraNoVDlFD8wVnGa5OcAv2gi7dkD wloOSZj5jJq9WoMeGWEwp2TsQMySJDydBSTvgqovlk2K8gmeY69g2YUUmwR2Truf fulBsO8upet/v2cRbCepI2X4NvS37wZBRcJtPpGcCoXUagA1vWH8eBKfjoduGERt vc5c9DYjSA0VEJ3kzu3Atro/oNrZDDqvX1wEn+64fk1B8v53Lvf2X0ESQQUcfWpz k7hPYzOt2IOA7d41CY59 =KMsn -----END PGP SIGNATURE----- Merge tag 'nfc-next-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.3 pull request This is the NFC pull request for 4.3. With this one we have: - A new driver for Samsung's S3FWRN5 NFC chipset. In order to properly support this driver, a few NCI core routines needed to be exported. Future drivers like Intel's Fields Peak will benefit from this. - SPI support as a physical transport for STM st21nfcb. - An additional netlink API for sending replies back to userspace from vendor commands. - 2 small fixes for TI's trf7970a - A few st-nci fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-23 20:42:57 -07:00
Jiri Benc	751a587ac9	route: fix breakage after moving lwtunnel state __recnt and related fields need to be in its own cacheline for performance reasons. Commit `61adedf3e3` ("route: move lwtunnel state to dst_entry") broke that on 32bit archs, causing BUILD_BUG_ON in dst_hold to be triggered. This patch fixes the breakage by moving the lwtunnel state to the end of dst_entry on 32bit archs. Unfortunately, this makes it share the cacheline with __refcnt and may affect performance, thus further patches may be needed. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: `61adedf3e3` ("route: move lwtunnel state to dst_entry") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-23 16:51:17 -07:00

... 5 6 7 8 9 ...

9375 Commits