linux

Author	SHA1	Message	Date
Sebastian Siewior	fe420d87bb	net/core: remove explicit do_softirq() from busy_poll_stop() Since commit `217f697436` ("net: busy-poll: allow preemption in sk_busy_loop()") there is an explicit do_softirq() invocation after local_bh_enable() has been invoked. I don't understand why we need this because local_bh_enable() will invoke do_softirq() once the softirq counter reached zero and we have softirq-related work pending. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-20 13:09:33 -04:00
Serhey Popovych	bdaf32c3ce	fib_rules: Resolve goto rules target on delete We should avoid marking goto rules unresolved when their target is actually reachable after rule deletion. Consolder following sample scenario: # ip -4 ru sh 0: from all lookup local 32000: from all goto 32100 32100: from all lookup main 32100: from all lookup default 32766: from all lookup main 32767: from all lookup default # ip -4 ru del pref 32100 table main # ip -4 ru sh 0: from all lookup local 32000: from all goto 32100 [unresolved] 32100: from all lookup default 32766: from all lookup main 32767: from all lookup default After removal of first rule with preference 32100 we mark all goto rules as unreachable, even when rule with same preference as removed one still present. Check if next rule with same preference is available and make all rules with goto action pointing to it. Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-20 12:39:18 -04:00
Xin Long	86fdb3448c	sctp: ensure ep is not destroyed before doing the dump Now before dumping a sock in sctp_diag, it only holds the sock while the ep may be already destroyed. It can cause a use-after-free panic when accessing ep->asocs. This patch is to set sctp_sk(sk)->ep NULL in sctp_endpoint_destroy, and check if this ep is already destroyed before dumping this ep. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdrver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-19 15:13:43 -04:00
Gao Feng	9745e362ad	net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev The register_vlan_device would invoke free_netdev directly, when register_vlan_dev failed. It would trigger the BUG_ON in free_netdev if the dev was already registered. In this case, the netdev would be freed in netdev_run_todo later. So add one condition check now. Only when dev is not registered, then free it directly. The following is the part coredump when netdev_upper_dev_link failed in register_vlan_dev. I removed the lines which are too long. [ 411.237457] ------------[ cut here ]------------ [ 411.237458] kernel BUG at net/core/dev.c:7998! [ 411.237484] invalid opcode: 0000 [#1] SMP [ 411.237705] [last unloaded: 8021q] [ 411.237718] CPU: 1 PID: 12845 Comm: vconfig Tainted: G E 4.12.0-rc5+ #6 [ 411.237737] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 [ 411.237764] task: ffff9cbeb6685580 task.stack: ffffa7d2807d8000 [ 411.237782] RIP: 0010:free_netdev+0x116/0x120 [ 411.237794] RSP: 0018:ffffa7d2807dbdb0 EFLAGS: 00010297 [ 411.237808] RAX: 0000000000000002 RBX: ffff9cbeb6ba8fd8 RCX: 0000000000001878 [ 411.237826] RDX: 0000000000000001 RSI: 0000000000000282 RDI: 0000000000000000 [ 411.237844] RBP: ffffa7d2807dbdc8 R08: 0002986100029841 R09: 0002982100029801 [ 411.237861] R10: 0004000100029980 R11: 0004000100029980 R12: ffff9cbeb6ba9000 [ 411.238761] R13: ffff9cbeb6ba9060 R14: ffff9cbe60f1a000 R15: ffff9cbeb6ba9000 [ 411.239518] FS: 00007fb690d81700(0000) GS:ffff9cbebb640000(0000) knlGS:0000000000000000 [ 411.239949] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 411.240454] CR2: 00007f7115624000 CR3: 0000000077cdf000 CR4: 00000000003406e0 [ 411.240936] Call Trace: [ 411.241462] vlan_ioctl_handler+0x3f1/0x400 [8021q] [ 411.241910] sock_ioctl+0x18b/0x2c0 [ 411.242394] do_vfs_ioctl+0xa1/0x5d0 [ 411.242853] ? sock_alloc_file+0xa6/0x130 [ 411.243465] SyS_ioctl+0x79/0x90 [ 411.243900] entry_SYSCALL_64_fastpath+0x1e/0xa9 [ 411.244425] RIP: 0033:0x7fb69089a357 [ 411.244863] RSP: 002b:00007ffcd04e0fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [ 411.245445] RAX: ffffffffffffffda RBX: 00007ffcd04e2884 RCX: 00007fb69089a357 [ 411.245903] RDX: 00007ffcd04e0fd0 RSI: 0000000000008983 RDI: 0000000000000003 [ 411.246527] RBP: 00007ffcd04e0fd0 R08: 0000000000000000 R09: 1999999999999999 [ 411.246976] R10: 000000000000053f R11: 0000000000000202 R12: 0000000000000004 [ 411.247414] R13: 00007ffcd04e1128 R14: 00007ffcd04e2888 R15: 0000000000000001 [ 411.249129] RIP: free_netdev+0x116/0x120 RSP: ffffa7d2807dbdb0 Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-19 14:10:20 -04:00
Ivan Delalande	8917a777be	tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix Replace first padding in the tcp_md5sig structure with a new flag field and address prefix length so it can be specified when configuring a new key for TCP MD5 signature. The tcpm_flags field will only be used if the socket option is TCP_MD5SIG_EXT to avoid breaking existing programs, and tcpm_prefixlen only when the TCP_MD5SIG_FLAG_PREFIX flag is set. Signed-off-by: Bob Gilligan <gilligan@arista.com> Signed-off-by: Eric Mowat <mowat@arista.com> Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-19 13:51:34 -04:00
Ivan Delalande	6797318e62	tcp: md5: add an address prefix for key lookup This allows the keys used for TCP MD5 signature to be used for whole range of addresses, specified with a prefix length, instead of only one address as it currently is. Signed-off-by: Bob Gilligan <gilligan@arista.com> Signed-off-by: Eric Mowat <mowat@arista.com> Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-19 13:50:55 -04:00
Pablo Neira Ayuso	04ba724b65	netfilter: nfnetlink: extended ACK reporting Pass down struct netlink_ext_ack as parameter to all of our nfnetlink subsystem callbacks, so we can work on follow up patches to provide finer grain error reporting using the new infrastructure that `2d4bc93368` ("netlink: extended ACK reporting") provides. No functional change, just pass down this new object to callbacks. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:38:24 +02:00
Florian Westphal	d8297d4f3e	netfilter: nf_tables: reduce chain type table size text data bss dec hex filename old: 151590 2240 1152 154982 25d66 net/netfilter/nf_tables_api.o new: 151666 2240 416 154322 25ad2 net/netfilter/nf_tables_api.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:20:59 +02:00
Florian Westphal	b7b5fda468	netfilter: conntrack: use NFPROTO_MAX to size array We don't support anything larger than NFPROTO_MAX, so we can shrink this a bit: text data dec hex filename old: 8259 1096 9355 248b net/netfilter/nf_conntrack_proto.o new: 8259 624 8883 22b3 net/netfilter/nf_conntrack_proto.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:20:49 +02:00
Liping Zhang	d53e3fc390	netfilter: use nf_conntrack_helpers_register when possible amanda_helper, nf_conntrack_helper_ras and nf_conntrack_helper_q931 are all arrays, so we can use nf_conntrack_helpers_register to register the ct helper, this will help us to eliminate some "goto errX" statements. Also introduce h323_helper_init/exit helper function to register the ct helpers, this is prepared for the followup patch, which will add net namespace support for ct helper. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:13:21 +02:00
Jike Song	2becbbc547	netfilter, kbuild: use canonical method to specify objs. Should use ":=" instead of "+=". Signed-off-by: Jike Song <jike.song@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:09:20 +02:00
Gao Feng	e15b9c50c4	netfilter: ebt: Use new helper ebt_invalid_target to check target Use the new helper function ebt_invalid_target instead of the old macro INVALID_TARGET and other duplicated codes to enhance the readability. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:09:19 +02:00
Florian Westphal	7866cc57b5	netns: add and use net_ns_barrier Quoting Joe Stringer: If a user loads nf_conntrack_ftp, sends FTP traffic through a network namespace, destroys that namespace then unloads the FTP helper module, then the kernel will crash. Events that lead to the crash: 1. conntrack is created with ftp helper in netns x 2. This netns is destroyed 3. netns destruction is scheduled 4. netns destruction wq starts, removes netns from global list 5. ftp helper is unloaded, which resets all helpers of the conntracks via for_each_net() but because netns is already gone from list the for_each_net() loop doesn't include it, therefore all of these conntracks are unaffected. 6. helper module unload finishes 7. netns wq invokes destructor for rmmod'ed helper CC: "Eric W. Biederman" <ebiederm@xmission.com> Reported-by: Joe Stringer <joe@ovn.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:09:19 +02:00
Florian Westphal	2c41f33c1b	netfilter: move table iteration out of netns exit paths We only need to iterate & remove in case of module removal; for netns destruction all conntracks will be removed anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:09:19 +02:00
Xin Long	202f59afd4	netfilter: ipt_CLUSTERIP: do not hold dev It's a terrible thing to hold dev in iptables target. When the dev is being removed, unregister_netdevice has to wait for the dev to become free. dmesg will keep logging the err: kernel:unregister_netdevice: waiting for veth0_in to become free. \ Usage count = 1 until iptables rules with this target are removed manually. The worse thing is when deleting a netns, a virtual nic will be deleted instead of reset to init_net in default_device_ops exit/exit_batch. As it is earlier than to flush the iptables rules in iptable_filter_net_ops exit, unregister_netdevice will block to wait for the nic to become free. As unregister_netdevice is actually waiting for iptables rules flushing while iptables rules have to be flushed after unregister_netdevice. This 'dead lock' will cause unregister_netdevice to block there forever. As the netns is not available to operate at that moment, iptables rules can not even be flushed manually either. The reproducer can be: # ip netns add test # ip link add veth0_in type veth peer name veth0_out # ip link set veth0_in netns test # ip netns exec test ip link set lo up # ip netns exec test ip link set veth0_in up # ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \ CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \ --local-node 1 --hashmode sourceip-sourceport # ip netns del test This issue can be triggered by all virtual nics with ipt_CLUSTERIP. This patch is to fix it by not holding dev in ipt_CLUSTERIP, but saving the dev->ifindex instead of the dev. As Pablo Neira Ayuso's suggestion, it will refresh c->ifindex and dev's mc by registering a netdevice notifier, just as what xt_TEE does. So it removes the old codes updating dev's mc, and also no need to initialize c->ifindex with dev->ifindex. But as one config can be shared by more than one targets, and the netdev notifier is per config, not per target. It couldn't get e->ip.iniface in the notifier handler. So e->ip.iniface has to be saved into config. Note that for backwards compatibility, this patch doesn't remove the codes checking if the dev exists before creating a config. v1->v2: - As Pablo Neira Ayuso's suggestion, register a netdevice notifier to manage c->ifindex and dev's mc. Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2017-06-19 19:06:20 +02:00
David S. Miller	4b153ca989	Here's just the fix for that ancient bug: * remove wext calling ndo_do_ioctl, since nobody needs that now and it makes the type change easier * use struct iwreq instead of struct ifreq almost everywhere in wireless extensions code * copy only struct iwreq from userspace in dev_ioctl for the wireless extensions, since it's smaller than struct ifreq -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAllDhUsACgkQa3t4Rpy0 AB2ttw//SepU66meFuzYy+bFbR38Q2sguKADmSN9jjng3oPyKhHKfEJwRusZZ3zg eEIk/NNB2iPTMLSaa4kR1Wclcae0jq5KgO8HwJBLvS7peCXWKx03vnP4Dy7yJ/6U VvOk+3JoudnQFhdDnIg+RVsGbwLx0hlq2l727U1Sp6kFyChK2etLikPzVKkEgVnG 2R/l1BhDqXdQ6Lh7nXWa6O9pwaqkpnPOuJvipJzmUQRB/4GBNjBxSK6J+ac98sm6 +KCCONBvBBMBago0xySTVURzMTrhW2UH1cE6ITQYjlShB/zsyilYkECvFzOSAYZL u9ob1yCAmZwDqhtvEUSi7CEfLtcO43I0XDF4oL00xfmYD9alm9dJPAlvZ1ihsrw7 ojBDjyykUstWRSeP8zETTdYDIMSPVsed1Y6NzQiy+el/6U3//+o2FcOShqUh89lx OIlQwX5i9LBRC/POQ6L8R4VPelNZ/czKMNlq1Z+ubNM9i3PT/8gGf6WapbMPpNUk AqAsB13tR17QmLjNpdVxHtoNvD9aceYaFkN+GXRNSb3pJNoJouedx6d5maFYJAju GRdZXBV14Z7bamKB3x9EAjpD3DHplJw4m8BvwnBr9zWkGyAvoNsHIC5h8ynzjWSp J7KpXPB9IKX6ne+1gCNrrPod2AmK4sWIaAT/SaWMCoHjV4m74k4= =O240 -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Here's just the fix for that ancient bug: * remove wext calling ndo_do_ioctl, since nobody needs that now and it makes the type change easier * use struct iwreq instead of struct ifreq almost everywhere in wireless extensions code * copy only struct iwreq from userspace in dev_ioctl for the wireless extensions, since it's smaller than struct ifreq ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-19 00:03:51 -04:00
Haishuang Yan	46f8cd9d2f	ip6_tunnel: Correct tos value in collect_md mode Same as ip_gre, geneve and vxlan, use key->tos as traffic class value. CC: Peter Dawson <petedaws@gmail.com> Fixes: `0e9a709560` ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets”) Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Peter Dawson <peter.a.dawson@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-18 23:56:57 -04:00
Johan Hovold	20777bc57c	NFC: fix broken device allocation Commit `7eda8b8e96` ("NFC: Use IDR library to assing NFC devices IDs") moved device-id allocation and struct-device initialisation from nfc_allocate_device() to nfc_register_device(). This broke just about every nfc-device-registration error path, which continue to call nfc_free_device() that tries to put the device reference of the now uninitialised (but zeroed) struct device: kobject: '(null)' (ce316420): is not initialized, yet kobject_put() is being called. The late struct-device initialisation also meant that various work queues whose names are derived from the nfc device name were also misnamed: 421 root 0 SW< [(null)_nci_cmd_] 422 root 0 SW< [(null)_nci_rx_w] 423 root 0 SW< [(null)_nci_tx_w] Move the id-allocation and struct-device initialisation back to nfc_allocate_device() and fix up the single call site which did not use nfc_free_device() in its error path. Fixes: `7eda8b8e96` ("NFC: Use IDR library to assing NFC devices IDs") Cc: stable <stable@vger.kernel.org> # 3.8 Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2017-06-18 23:57:58 +02:00
Florian Fainelli	06d4d450db	net: dsa: Fix legacy probing After commit `6d3c8c0dd8` ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev") and `a29342e739` ("net: dsa: Associate slave network device with CPU port") we would be seeing NULL pointer dereferences when accessing dst->cpu_dp->netdev too early. In the legacy code, we actually know early in advance the master network device, so pass it down to the relevant functions. Fixes: `6d3c8c0dd8` ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev") Fixes: `a29342e739` ("net: dsa: Associate slave network device with CPU port") Reported-by: Jason Cobham <jcobham@questertangent.com> Tested-by: Jason Cobham <jcobham@questertangent.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:59:45 -04:00
Dave Watson	d807ec656f	tls: update Kconfig Missing crypto deps for some platforms. Default to n for new module. config: m68k-amcore_defconfig (attached as .config) compiler: m68k-linux-gcc (GCC) 4.9.0 make.cross ARCH=m68k All errors (new ones prefixed by >>): net/built-in.o: In function `tls_set_sw_offload': >> (.text+0x732f8): undefined reference to `crypto_alloc_aead' net/built-in.o: In function `tls_set_sw_offload': >> (.text+0x7333c): undefined reference to `crypto_aead_setkey' net/built-in.o: In function `tls_set_sw_offload': >> (.text+0x73354): undefined reference to `crypto_aead_setauthsize' Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:56:46 -04:00
Wei Wang	a4c2fd7f78	net: remove DST_NOCACHE flag DST_NOCACHE flag check has been removed from dst_release() and dst_hold_safe() in a previous patch because all the dst are now ref counted properly and can be released based on refcnt only. Looking at the rest of the DST_NOCACHE use, all of them can now be removed or replaced with other checks. So this patch gets rid of all the DST_NOCACHE usage and remove this flag completely. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:01 -04:00
Wei Wang	b2a9c0ed75	net: remove DST_NOGC flag Now that all the components have been changed to release dst based on refcnt only and not depend on dst gc anymore, we can remove the temporary flag DST_NOGC. Note that we also need to remove the DST_NOCACHE check in dst_release() and dst_hold_safe() because now all the dst are released based on refcnt and behaves as DST_NOCACHE. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:01 -04:00
Wei Wang	5b7c9a8ff8	net: remove dst gc related code This patch removes all dst gc related code and all the dst free functions Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:01 -04:00
Wei Wang	560fd93bca	decnet: take dst->__refcnt when struct dn_route is created struct dn_route is inserted into dn_rt_hash_table but no dst->__refcnt is taken. This patch makes sure the dn_rt_hash_table's reference to the dst is ref counted. As the dst is always ref counted properly, we can safely mark DST_NOGC flag so dst_release() will release dst based on refcnt only. And dst gc is no longer needed and all dst_free() or its related function calls should be replaced with dst_release() or dst_release_immediate(). And dst_dev_put() is called when removing dst from the hash table to release the reference on dst->dev before we lose pointer to it. Also, correct the logic in dn_dst_check_expire() and dn_dst_gc() to check dst->__refcnt to be > 1 to indicate it is referenced by other users. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:01 -04:00
Wei Wang	52df157f17	xfrm: take refcnt of dst when creating struct xfrm_dst bundle During the creation of xfrm_dst bundle, always take ref count when allocating the dst. This way, xfrm_bundle_create() will form a linked list of dst with dst->child pointing to a ref counted dst child. And the returned dst pointer is also ref counted. This makes the link from the flow cache to this dst now ref counted properly. As the dst is always ref counted properly, we can safely mark DST_NOGC flag so dst_release() will release dst based on refcnt only. And dst gc is no longer needed and all dst_free() and its related function calls should be replaced with dst_release() or dst_release_immediate(). The special handling logic for dst->child in dst_destroy() can be replaced with a simple dst_release_immediate() call on the child to release the whole list linked by dst->child pointer. Previously used DST_NOHASH flag is not needed anymore as well. The reason that DST_NOHASH is used in the existing code is mainly to prevent the dst inserted in the fib tree to be wrongly destroyed during the deletion of the xfrm_dst bundle. So in the existing code, DST_NOHASH flag is marked in all the dst children except the one which is in the fib tree. However, with this patch series to remove dst gc logic and release dst only based on ref count, it is safe to release all the children from a xfrm_dst bundle as long as the dst children are all ref counted properly which is already the case in the existing code. So, this patch removes the use of DST_NOHASH flag. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	db916649b5	ipv6: get rid of icmp6 dst garbage collector icmp6 dst route is currently ref counted during creation and will be freed by user during its call of dst_release(). So no need of a garbage collector for it. Remove all icmp6 dst garbage collector related code. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	587fea7411	ipv6: mark DST_NOGC and remove the operation of dst_free() With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv6 code and release dst based on refcnt only. So this patch adds DST_NOGC flag for all IPv6 dst and remove the calls to dst_free() and its related functions. At this point, all dst created in ipv6 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0. Also, as icmp6 dst route is refcounted during creation and will be freed by user during its call of dst_release(), there is no need to add this dst to the icmp6 gc list as well. Instead, we need to add it into uncached list so that when a NETDEV_DOWN/NETDEV_UNREGISRER event comes, we can properly go through these icmp6 dst as well and release the net device properly. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	ad65a2f056	ipv6: call dst_hold_safe() properly Similar as ipv4, ipv6 path also needs to call dst_hold_safe() when necessary to avoid double free issue on the dst. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	9514528d92	ipv6: call dst_dev_put() properly As the intend of this patch series is to completely remove dst gc, we need to call dst_dev_put() to release the reference to dst->dev when removing routes from fib because we won't keep the gc list anymore and will lose the dst pointer right after removing the routes. Without the gc list, there is no way to find all the dst's that have dst->dev pointing to the going-down dev. Hence, we are doing dst_dev_put() immediately before we lose the last reference of the dst from the routing code. The next dst_check() will trigger a route re-lookup to find another route (if there is any). Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	1cfb71eeb1	ipv6: take dst->__refcnt for insertion into fib6 tree In IPv6 routing code, struct rt6_info is created for each static route and RTF_CACHE route and inserted into fib6 tree. In both cases, dst ref count is not taken. As explained in the previous patch, this leads to the need of the dst garbage collector. This patch holds ref count of dst before inserting the route into fib6 tree and properly releases the dst when deleting it from the fib6 tree as a preparation in order to fully get rid of dst gc later. Also, correct fib6_age() logic to check dst->__refcnt to be 1 to indicate no user is referencing the dst. And remove dst_hold() in vrf_rt6_create() as ip6_dst_alloc() already puts dst->__refcnt to 1. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	b838d5e1c5	ipv4: mark DST_NOGC and remove the operation of dst_free() With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv4 code and release dst based on refcnt only. So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls to dst_free(). At this point, all dst created in ipv4 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	9df16efadd	ipv4: call dst_hold_safe() properly This patch checks all the calls to dst_hold()/skb_dst_force()/dst_clone()/dst_use() to see if dst_hold_safe() is needed to avoid double free issue if dst gc is removed and dst_release() directly destroys dst when dst->__refcnt drops to 0. In tx path, TCP hold sk->sk_rx_dst ref count and also hold sock_lock(). UDP and other similar protocols always hold refcount for skb->_skb_refdst. So both paths seem to be safe. In rx path, as it is lockless and skb_dst_set_noref() is likely to be used, dst_hold_safe() should always be used when trying to hold dst. In the routing code, if dst is held during an rcu protected session, it is necessary to call dst_hold_safe() as the current dst might be in its rcu grace period. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:54:00 -04:00
Wei Wang	95c47f9cf5	ipv4: call dst_dev_put() properly As the intend of this patch series is to completely remove dst gc, we need to call dst_dev_put() to release the reference to dst->dev when removing routes from fib because we won't keep the gc list anymore and will lose the dst pointer right after removing the routes. Without the gc list, there is no way to find all the dst's that have dst->dev pointing to the going-down dev. Hence, we are doing dst_dev_put() immediately before we lose the last reference of the dst from the routing code. The next dst_check() will trigger a route re-lookup to find another route (if there is any). Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Wei Wang	0830106c53	ipv4: take dst->__refcnt when caching dst in fib In IPv4 routing code, fib_nh and fib_nh_exception can hold pointers to struct rtable but they never increment dst->__refcnt. This leads to the need of the dst garbage collector because when user is done with this dst and calls dst_release(), it can only decrement dst->__refcnt and can not free the dst even it sees dst->__refcnt drops from 1 to 0 (unless DST_NOCACHE flag is set) because the routing code might still hold reference to it. And when the routing code tries to delete a route, it has to put the dst to the gc_list if dst->__refcnt is not yet 0 and have a gc thread running periodically to check on dst->__refcnt and finally to free dst when refcnt becomes 0. This patch increments dst->__refcnt when fib_nh/fib_nh_exception holds reference to this dst and properly release the dst when fib_nh/fib_nh_exception has been updated with a new dst. This patch is a preparation in order to fully get rid of dst gc later. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Wei Wang	4a6ce2b6f2	net: introduce a new function dst_dev_put() This function should be called when removing routes from fib tree after the dst gc is no longer in use. We first mark DST_OBSOLETE_DEAD on this dst to make sure next dst_ops->check() fails and returns NULL. Secondly, as we no longer keep the gc_list, we need to properly release dst->dev right at the moment when the dst is removed from the fib/fib6 tree. It does the following: 1. change dst->input and output pointers to dst_discard/dst_dscard_out to discard all packets 2. replace dst->dev with loopback interface Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Wei Wang	5f56f409b5	net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt The current mechanism of freeing dst is a bit complicated. dst has its ref count and when user grabs the reference to the dst, the ref count is properly taken in most cases except in IPv4/IPv6/decnet/xfrm routing code due to some historic reasons. If the reference to dst is always taken properly, we should be able to simplify the logic in dst_release() to destroy dst when dst->__refcnt drops from 1 to 0. And this should be the only condition to determine if we can call dst_destroy(). And as dst is always ref counted, there is no need for a dst garbage list to hold the dst entries that already get removed by the routing code but are still held by other users. And the task to periodically check the list to free dst if ref count become 0 is also not needed anymore. This patch introduces a temporary flag DST_NOGC(no garbage collector). If it is set in the dst, dst_release() will call dst_destroy() when dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag and do atomic_inc_not_zero() similar as DST_NOCACHE to avoid double free issue. This temporary flag is mainly used so that we can make the transition component by component without breaking other parts. This flag will be removed after all components are properly transitioned. This patch also introduces a new function dst_release_immediate() which destroys dst without waiting on the rcu when refcnt drops to 0. It will be used in later patches. Follow-up patches will correct all the places to properly take ref count on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will be used to release the dst instead of dst_free() and its related functions. And final clean-up patch will remove the DST_NOGC flag. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Wei Wang	1dbe32525e	net: use loopback dev when generating blackhole route Existing ipv4/6_blackhole_route() code generates a blackhole route with dst->dev pointing to the passed in dst->dev. It is not necessary to hold reference to the passed in dst->dev because the packets going through this route are dropped anyway. A loopback interface is good enough so that we don't need to worry about releasing this dst->dev when this dev is going down. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Wei Wang	d24406c85d	udp: call dst_hold_safe() in udp_sk_rx_set_dst() In udp_v4/6_early_demux() code, we try to hold dst->__refcnt for dst with DST_NOCACHE flag. This is because later in udp_sk_rx_dst_set() function, we will try to cache this dst in sk for connected case. However, a better way to achieve this is to not try to hold dst in early_demux(), but in udp_sk_rx_dst_set(), call dst_hold_safe(). This approach is also more consistant with how tcp is handling it. And it will make later changes simpler. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Wei Wang	1758fd4688	ipv6: remove unnecessary dst_hold() in ip6_fragment() In ipv6 tx path, rcu_read_lock() is taken so that dst won't get freed during the execution of ip6_fragment(). Hence, no need to hold dst in it. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-17 22:53:59 -04:00
Vivien Didelot	a1a6b7ea7f	net: dsa: add cross-chip multicast support Similarly to how cross-chip VLAN works, define a bitmap of multicast group members for a switch, now including its DSA ports, so that multicast traffic can be sent to all switches of the fabric. A switch may drop the frames if no user port is a member. This brings support for multicast in a multi-chip environment. As of now, all switches of the fabric must support the multicast operations in order to program a single fabric port. Reported-by: Jason Cobham <jcobham@questertangent.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Tested-by: Jason Cobham <jcobham@questertangent.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 15:21:14 -04:00
Wei Wang	247488c0a4	decnet: always not take dst->__refcnt when inserting dst into hash table In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 15:00:00 -04:00
Wei Wang	76371d2e3a	decnet: always not take dst->__refcnt when inserting dst into hash table In the existing dn_route.c code, dn_route_output_slow() takes dst->__refcnt before calling dn_insert_route() while dn_route_input_slow() does not take dst->__refcnt before calling dn_insert_route(). This makes the whole routing code very buggy. In dn_dst_check_expire(), dnrt_free() is called when rt expires. This makes the routes inserted by dn_route_output_slow() not able to be freed as the refcnt is not released. In dn_dst_gc(), dnrt_drop() is called to release rt which could potentially cause the dst->__refcnt to be dropped to -1. In dn_run_flush(), dst_free() is called to release all the dst. Again, it makes the dst inserted by dn_route_output_slow() not able to be released and also, it does not wait on the rcu and could potentially cause crash in the path where other users still refer to this dst. This patch makes sure both input and output path do not take dst->__refcnt before calling dn_insert_route() and also makes sure dnrt_free()/dst_free() is called when removing dst from the hash table. The only difference between those 2 calls is that dnrt_free() waits on the rcu while dst_free() does not. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 14:59:36 -04:00
Sowmini Varadhan	10beea7d74	rds: tcp: Set linger when rejecting an incoming conn in rds_tcp_accept_one Each time we get an incoming SYN to the RDS_TCP_PORT, the TCP layer accepts the connection and then the rds_tcp_accept_one() callback is invoked to process the incoming connection. rds_tcp_accept_one() may reject the incoming syn for a number of reasons, e.g., commit `1a0e100fb2` ("RDS: TCP: Force every connection to be initiated by numerically smaller IP address"), or because we are getting spammed by a malicious node that is triggering a flood of connection attempts to RDS_TCP_PORT. If the incoming syn is rejected, no data would have been sent on the TCP socket, and we do not need to be in TIME_WAIT state, so we set linger on the TCP socket before closing, thereby closing the socket efficiently with a RST. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 12:45:15 -04:00
Sowmini Varadhan	00354de577	rds: tcp: various endian-ness fixes Found when testing between sparc and x86 machines on different subnets, so the address comparison patterns hit the corner cases and brought out some bugs fixed by this patch. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 12:45:15 -04:00
Sowmini Varadhan	41500c3e2a	rds: tcp: remove cp_outgoing After commit `1a0e100fb2` ("RDS: TCP: Force every connection to be initiated by numerically smaller IP address") we no longer need the logic associated with cp_outgoing, so clean up usage of this field. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Imanti Mendez <imanti.mendez@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 12:45:14 -04:00
Haishuang Yan	f1925ca50d	ip6_tunnel: fix potential issue in __ip6_tnl_rcv When __ip6_tnl_rcv fails, the tun_dst won't be freed, so call dst_release to free it in error code path. Fixes: `8d79266bc4` ("ip6_tunnel: add collect_md mode to IPv6 tunnels") CC: Alexei Starovoitov <ast@fb.com> Tested-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 12:01:29 -04:00
Haishuang Yan	469f87e158	ip_tunnel: fix potential issue in ip_tunnel_rcv When ip_tunnel_rcv fails, the tun_dst won't be freed, so call dst_release to free it in error code path. Fixes: `2e15ea390e` ("ip_gre: Add support to collect tunnel metadata.") Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Tested-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 12:01:29 -04:00
Martin KaFai Lau	58038695e6	net: Add IFLA_XDP_PROG_ID Expose prog_id through IFLA_XDP_PROG_ID. This patch makes modification to generic_xdp. The later patches will modify other xdp-supported drivers. prog_id is added to struct net_dev_xdp. iproute2 patch will be followed. Here is how the 'ip link' will look like: > ip link show eth0 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp(prog_id:1) qdisc fq_codel state UP mode DEFAULT group default qlen 1000 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:58:36 -04:00
Johannes Berg	634fef6107	networking: add and use skb_put_u8() Joe and Bjørn suggested that it'd be nicer to not have the cast in the fairly common case of doing (u8 )skb_put(skb, 1) = c; Add skb_put_u8() for this case, and use it across the code, using the following spatch: @@ expression SKB, C, S; typedef u8; identifier fn = {skb_put}; fresh identifier fn2 = fn ## "_u8"; @@ - (u8 )fn(SKB, S) = C; + fn2(SKB, C); Note that due to the "S", the spatch isn't perfect, it should have checked that S is 1, but there's also places that use a sizeof expression like sizeof(var) or sizeof(u8) etc. Turns out that nobody ever did something like (u8 )skb_put(skb, 2) = c; which would be wrong anyway since the second byte wouldn't be initialized. Suggested-by: Joe Perches <joe@perches.com> Suggested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:48:40 -04:00
Johannes Berg	d58ff35122	networking: make skb_push & __skb_push return void pointers It seems like a historic accident that these return unsigned char , and in many places that means casts are required, more often than not. Make these functions return void and remove all the casts across the tree, adding a (u8 ) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - (fn(SKB, LEN)) + (u8 )fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; type T; @@ - E = ((T )(fn(SKB, LEN))) + E = fn(SKB, LEN) @@ expression SKB, LEN; identifier fn = { skb_push, __skb_push, skb_push_rcsum }; @@ - fn(SKB, LEN)[0] + (u8 )fn(SKB, LEN) Note that the last part there converts from push(...)[0] to the more idiomatic (u8 *)push(...). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:48:40 -04:00
Johannes Berg	af72868b90	networking: make skb_pull & friends return void pointers It seems like a historic accident that these return unsigned char , and in many places that means casts are required, more often than not. Make these functions return void and remove all the casts across the tree, adding a (u8 ) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_pull, __skb_pull, skb_pull_inline, __pskb_pull_tail, __pskb_pull, pskb_pull }; @@ - (fn(SKB, LEN)) + (u8 )fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_pull, __skb_pull, skb_pull_inline, __pskb_pull_tail, __pskb_pull, pskb_pull }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:48:39 -04:00
Johannes Berg	4df864c1d9	networking: make skb_put & friends return void pointers It seems like a historic accident that these return unsigned char , and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void and remove all the casts across the tree, adding a (u8 ) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - (fn(SKB, LEN)) + (u8 )fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:48:39 -04:00
Johannes Berg	59ae1d127a	networking: introduce and use skb_put_data() A common pattern with skb_put() is to just want to memcpy() some data into the new space, introduce skb_put_data() for this. An spatch similar to the one for skb_put_zero() converts many of the places using it: @@ identifier p, p2; expression len, skb, data; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_data(skb, data, len); \| -p = (t)skb_put(skb, len); +p = skb_put_data(skb, data, len); ) ( p2 = (t2)p; -memcpy(p2, data, len); \| -memcpy(p, data, len); ) @@ type t, t2; identifier p, p2; expression skb, data; @@ t p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); \| -p = (t )skb_put(skb, sizeof(t)); +p = skb_put_data(skb, data, sizeof(t)); ) ( p2 = (t2)p; -memcpy(p2, data, sizeof(p)); \| -memcpy(p, data, sizeof(p)); ) @@ expression skb, len, data; @@ -memcpy(skb_put(skb, len), data, len); +skb_put_data(skb, data, len); (again, manually post-processed to retain some comments) Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:48:37 -04:00
Johannes Berg	b080db5853	networking: convert many more places to skb_put_zero() There were many places that my previous spatch didn't find, as pointed out by yuan linyu in various patches. The following spatch found many more and also removes the now unnecessary casts: @@ identifier p, p2; expression len; expression skb; type t, t2; @@ ( -p = skb_put(skb, len); +p = skb_put_zero(skb, len); \| -p = (t)skb_put(skb, len); +p = skb_put_zero(skb, len); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, len); \| -memset(p, 0, len); ) @@ type t, t2; identifier p, p2; expression skb; @@ t p; ... ( -p = skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); \| -p = (t )skb_put(skb, sizeof(t)); +p = skb_put_zero(skb, sizeof(t)); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, sizeof(p)); \| -memset(p, 0, sizeof(p)); ) @@ expression skb, len; @@ -memset(skb_put(skb, len), 0, len); +skb_put_zero(skb, len); Apply it to the tree (with one manual fixup to keep the comment in vxlan.c, which spatch removed.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:48:35 -04:00
David S. Miller	54144b4825	tls: Depend upon INET not plain NET. We refer to TCP et al. symbols so have to use INET as the dependency. ERROR: "tcp_prot" [net/tls/tls.ko] undefined! >> ERROR: "tcp_rate_check_app_limited" [net/tls/tls.ko] undefined! ERROR: "tcp_register_ulp" [net/tls/tls.ko] undefined! ERROR: "tcp_unregister_ulp" [net/tls/tls.ko] undefined! ERROR: "do_tcp_sendpages" [net/tls/tls.ko] undefined! Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-16 11:28:49 -04:00
Vivien Didelot	e4b7778769	net: dsa: assign default CPU port to all ports The current code only assigns the default cpu_dp to all user ports of the switch to which the CPU port belongs. The user ports of the other switches of the fabric thus don't have a default CPU port. This patch fixes this by assigning the cpu_dp of all user ports of all switches of the fabric when the tree is fully parsed. Fixes: `a29342e739` ("net: dsa: Associate slave network device with CPU port") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 17:23:35 -04:00
Xin Long	988c732211	sctp: return next obj by passing pos + 1 into sctp_transport_get_idx In sctp_for_each_transport, pos is used to save how many objs it has dumped. Now it gets the last obj by sctp_transport_get_idx, then gets the next obj by sctp_transport_get_next. The issue is that in the meanwhile if some objs in transport hashtable are removed and the objs nums are less than pos, sctp_transport_get_idx would return NULL and hti.walker.tbl is NULL as well. At this moment it should stop hti, instead of continue getting the next obj. Or it would cause a NULL pointer dereference in sctp_transport_get_next. This patch is to pass pos + 1 into sctp_transport_get_idx to get the next obj directly, even if pos > objs nums, it would return NULL and stop hti. Fixes: `626d16f50f` ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 14:40:30 -04:00
David Howells	5f2f97656a	rxrpc: Fix several cases where a padded len isn't checked in ticket decode This fixes CVE-2017-7482. When a kerberos 5 ticket is being decoded so that it can be loaded into an rxrpc-type key, there are several places in which the length of a variable-length field is checked to make sure that it's not going to overrun the available data - but the data is padded to the nearest four-byte boundary and the code doesn't check for this extra. This could lead to the size-remaining variable wrapping and the data pointer going over the end of the buffer. Fix this by making the various variable-length data checks use the padded length. Reported-by: 石磊 <shilei-c@360.cn> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.c.dionne@auristor.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 14:23:44 -04:00
Jiri Benc	86087e170c	net: sched: act_tunnel_key: make UDP checksum configurable Allow requesting of zero UDP checksum for encapsulated packets. The name and meaning of the attribute is "NO_CSUM" in order to have the same meaning of the attribute missing and being 0. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 14:21:03 -04:00
Jiri Benc	63fe4c39d2	net: sched: act_tunnel_key: request UDP checksum by default There's currently no way to request (outer) UDP checksum with act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key action with IPv6 does not work without going through hassles: both sides have to have udp6zerocsumrx configured on the tunnel interface. This is obviously not a good solution universally. It makes more sense to compute the UDP checksum by default even for IPv4. Just set the default to request the checksum when using act_tunnel_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 14:21:03 -04:00
Dave Watson	3c4d755915	tls: kernel TLS support Software implementation of transport layer security, implemented using ULP infrastructure. tcp proto_ops are replaced with tls equivalents of sendmsg and sendpage. Only symmetric crypto is done in the kernel, keys are passed by setsockopt after the handshake is complete. All control messages are supported via CMSG data - the actual symmetric encryption is the same, just the message type needs to be passed separately. For user API, please see Documentation patch. Pieces that can be shared between hw and sw implementation are in tls_main.c Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 12:12:40 -04:00
Dave Watson	e3b5616a34	tcp: export do_tcp_sendpages and tcp_rate_check_app_limited functions Export do_tcp_sendpages and tcp_rate_check_app_limited, since tls will need to sendpages while the socket is already locked. tcp_sendpage is exported, but requires the socket lock to not be held already. Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 12:12:40 -04:00
Dave Watson	734942cc4e	tcp: ULP infrastructure Add the infrustructure for attaching Upper Layer Protocols (ULPs) over TCP sockets. Based on a similar infrastructure in tcp_cong. The idea is that any ULP can add its own logic by changing the TCP proto_ops structure to its own methods. Example usage: setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")); modules will call: tcp_register_ulp(&tcp_tls_ulp_ops); to register/unregister their ulp, with an init function and name. A list of registered ulps will be returned by tcp_get_available_ulp, which is hooked up to /proc. Example: $ cat /proc/sys/net/ipv4/tcp_available_ulp tls There is currently no functionality to remove or chain ULPs, but it should be possible to add these in the future if needed. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 12:12:40 -04:00
David S. Miller	0ddead90b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The conflicts were two cases of overlapping changes in batman-adv and the qed driver. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 11:59:32 -04:00
Xin Long	f8a894b218	ipv6: fix calling in6_ifa_hold incorrectly for dad work Now when starting the dad work in addrconf_mod_dad_work, if the dad work is idle and queued, it needs to hold ifa. The problem is there's one gap in [1], during which if the pending dad work is removed elsewhere. It will miss to hold ifa, but the dad word is still idea and queue. if (!delayed_work_pending(&ifp->dad_work)) in6_ifa_hold(ifp); <--------------[1] mod_delayed_work(addrconf_wq, &ifp->dad_work, delay); An use-after-free issue can be caused by this. Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in net6_ifa_finish_destroy was hit because of it. As Hannes' suggestion, this patch is to fix it by holding ifa first in addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if the dad_work is already in queue. Note that this patch did not choose to fix it with: if (!mod_delayed_work(delay)) in6_ifa_hold(ifp); As with it, when delay == 0, dad_work would be scheduled immediately, all addrconf_mod_dad_work(0) callings had to be moved under ifp->lock. Reported-by: Wei Chen <weichen@redhat.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-15 11:26:16 -04:00
David Howells	f7aec129a3	rxrpc: Cache the congestion window setting Cache the congestion window setting that was determined during a call's transmission phase when it finishes so that it can be used by the next call to the same peer, thereby shortcutting the slow-start algorithm. The value is stored in the rxrpc_peer struct and is accessed without locking. Each call takes the value that happens to be there when it starts and just overwrites the value when it finishes. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 15:42:45 -04:00
Jesper Dangaard Brouer	849a44de91	net: don't global ICMP rate limit packets originating from loopback Florian Weimer seems to have a glibc test-case which requires that loopback interfaces does not get ICMP ratelimited. This was broken by commit `c0303efeab` ("net: reduce cycles spend on ICMP replies that gets rate limited"). An ICMP response will usually be routed back-out the same incoming interface. Thus, take advantage of this and skip global ICMP ratelimit when the incoming device is loopback. In the unlikely event that the outgoing it not loopback, due to strange routing policy rules, ICMP rate limiting still works via peer ratelimiting via icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812 (section 4.3.2.8 "Rate Limiting"). This seems to fix the reproducer given by Florian. While still avoiding to perform expensive and unneeded outgoing route lookup for rate limited packets (in the non-loopback case). Fixes: `c0303efeab` ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: "H.J. Lu" <hjl.tools@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 15:33:58 -04:00
Dan Carpenter	c4f65b09b4	net/act_pedit: fix an error code I'm reviewing static checker warnings where we do ERR_PTR(0), which is the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL) here. Sometimes these bugs lead to a NULL dereference but I don't immediately see that problem here. Fixes: `71d0ed7079` ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 15:24:18 -04:00
Paolo Abeni	7608894e43	net: use skb_unref() in napi_consume_skb() The commit 83ada39bb79d ("net: factor out a helper to decrement the skb refcount") provided and used a helper for decrementing skb usage, but I missed at least a spot for it. This change remove some more duplicated code reusing skb_unref() in napi_consume_skb(), too. The helper uses an additional, unneeded unlikely(!skb) test - napi_consume_skb() already check it a few lines above - but the compiler is smart enough to optimize the duplicated test out. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 15:23:51 -04:00
David S. Miller	4cbf87c789	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-06-14 Here's another batch of Bluetooth patches for the 4.13 kernel: - Fix for Broadcom controllers not supporting Event Mask Page 2 - New QCA ROME USB ID for btusb - Fix for Security Manager Protocol to use constant-time memcmp - Improved support for TI WiLink chips Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 15:22:17 -04:00
Yonghong Song	31fd85816d	bpf: permits narrower load from bpf program context fields Currently, verifier will reject a program if it contains an narrower load from the bpf context structure. For example, __u8 h = __sk_buff->hash, or __u16 p = __sk_buff->protocol __u32 sample_period = bpf_perf_event_data->sample_period which are narrower loads of 4-byte or 8-byte field. This patch solves the issue by: . Introduce a new parameter ctx_field_size to carry the field size of narrower load from prog type specific *__is_valid_access validator back to verifier. . The non-zero ctx_field_size for a memory access indicates (1). underlying prog type specific convert_ctx_accesses supporting non-whole-field access (2). the current insn is a narrower or whole field access. . In verifier, for such loads where load memory size is less than ctx_field_size, verifier transforms it to a full field load followed by proper masking. . Currently, __sk_buff and bpf_perf_event_data->sample_period are supporting narrowing loads. . Narrower stores are still not allowed as typical ctx stores are just normal stores. Because of this change, some tests in verifier will fail and these tests are removed. As a bonus, rename some out of bound __sk_buff->cb access to proper field name and remove two redundant "skb cb oob" tests. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 14:56:25 -04:00
WANG Cong	74030603df	net_sched: move tcf_lock down after gen_replace_estimator() Laura reported a sleep-in-atomic kernel warning inside tcf_act_police_init() which calls gen_replace_estimator() with spinlock protection. It is not necessary in this case, we already have RTNL lock here so it is enough to protect concurrent writers. For the reader, i.e. tcf_act_police(), it needs to make decision based on this rate estimator, in the worst case we drop more/less packets than necessary while changing the rate in parallel, it is still acceptable. Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Nick Huber <nicholashuber@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-14 14:39:19 -04:00
Johannes Berg	68dd02d19c	dev_ioctl: copy only the smaller struct iwreq for wext Unfortunately, struct iwreq isn't a proper subset of struct ifreq, but is still handled by the same code path. Robert reported that then applications may (randomly) fault if the struct iwreq they pass happens to land within 8 bytes of the end of a mapping (the struct is only 32 bytes, vs. struct ifreq's 40 bytes). To fix this, pull out the code handling wireless extension ioctls and copy only the smaller structure in this case. This bug goes back a long time, I tracked that it was introduced into mainline in 2.1.15, over 20 years ago! This fixes https://bugzilla.kernel.org/show_bug.cgi?id=195869 Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-14 13:52:44 +02:00
Johannes Berg	4f39a1f587	wireless: wext: use struct iwreq earlier in the call chain To make it clear that we never use struct ifreq, cast from it directly in the wext entrypoint and use struct iwreq from there on. The next patch will remove the cast again and pass the correct struct from the beginning. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-14 13:37:42 +02:00
Dan Carpenter	e747f64336	xfrm: NULL dereference on allocation failure The default error code in pfkey_msg2xfrm_state() is -ENOBUFS. We added a new call to security_xfrm_state_alloc() which sets "err" to zero so there several places where we can return ERR_PTR(0) if kmalloc() fails. The caller is expecting error pointers so it leads to a NULL dereference. Fixes: `df71837d50` ("[LSM-IPSec]: Security association restriction.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-06-14 12:40:49 +02:00
Dan Carpenter	1e3d0c2c70	xfrm: Oops on error in pfkey_msg2xfrm_state() There are some missing error codes here so we accidentally return NULL instead of an error pointer. It results in a NULL pointer dereference. Fixes: `df71837d50` ("[LSM-IPSec]: Security association restriction.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-06-14 12:40:49 +02:00
Johannes Berg	8bfb367660	wireless: wext: remove ndo_do_ioctl fallback There are no longer any drivers (in the tree proper, I didn't check all the staging drivers) that take WEXT ioctls through this API, the only remaining ones that even have ndo_do_ioctl are using it only for private ioctls. Therefore, we can remove this call. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-14 09:17:48 +02:00
Florian Fainelli	3cc9f2573c	net: dsa: Introduce dsa_get_cpu_port() Introduce a helper function which will return a reference to the CPU port used in a dsa_switch_tree. Right now this is a singleton, but this will change once we introduce multi-CPU port support, so ease the transition by converting the affected code paths. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 16:35:03 -04:00
Florian Fainelli	a29342e739	net: dsa: Associate slave network device with CPU port In preparation for supporting multiple CPU ports with DSA, have the dsa_port structure know which CPU it is associated with. This will be important in order to make sure the correct CPU is used for transmission of the frames. If not for functional reasons, for performance (e.g: load balancing) and forwarding decisions. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 16:35:03 -04:00
Florian Fainelli	67dbb9d433	net: dsa: Relocate master ethtool operations Relocate master_ethtool_ops and master_orig_ethtool_ops into struct dsa_port in order to be both consistent, and make things self contained within the dsa_port structure. This is a preliminary change to supporting multiple CPU port interfaces. Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 16:35:02 -04:00
Florian Fainelli	6d3c8c0dd8	net: dsa: Remove master_netdev and use dst->cpu_dp->netdev In preparation for supporting multiple CPU ports, remove dst->master_netdev and ds->master_netdev and replace them with only one instance of the common object we have for a port: struct dsa_port::netdev. ds->master_netdev is currently write only and would be helpful in the case where we have two switches, both with CPU ports, and also connected within each other, which the multi-CPU port patch series would address. While at it, introduce a helper function used in net/dsa/slave.c to immediately get a reference on the master network device called dsa_master_netdev(). Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 16:35:02 -04:00
Mateusz Jurczyk	20a3d5bf5e	caif: Add sockaddr length check before accessing sa_family in connect handler Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in the connect() handler of the AF_CAIF socket. Since the syscall doesn't enforce a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 16:16:11 -04:00
Johannes Berg	aa9f979c41	networking: use skb_put_zero() Use the recently introduced helper to replace the pattern of skb_put() && memset(), this transformation was done with the following spatch: @@ identifier p; expression len; expression skb; @@ -p = skb_put(skb, len); -memset(p, 0, len); +p = skb_put_zero(skb, len); Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 13:54:03 -04:00
David S. Miller	0e74008b66	A couple of weeks worth of updates - looks like things are quiet: * merged net-next back to get a patch from net that another patch here depends on * various small improvements/cleanups across the board * 4-way handshake offload (many thanks to Arend for shepherding that) * mesh CSA/DFS support in mac80211 * the skb_put_zero() we discussed previously -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlk/2HoACgkQa3t4Rpy0 AB3psA/8CVT+cJHH6fQoP2ev17LMB5CF/bBaRh8jeYRg/RslofwptLaG6CVi/Eri RSf036q1pUqpS7BlBguCUwqtNGIKvhr3AUIuN0nQsrH4iPJMl8DaCHM4a7BigdtG Cq4N7GTS5gJcUcjpxcOIoCsrpdkp8Lvnz6z7nBIxemYAyGuxrW2Z9ES38fh4TTlS k+8h+c8+K0Q3WsT5BB3i7zTTBLLhpR9r1YcbNf4Y984vF/Blc4M1ggbWMPZZG/y8 CdOMH3dM9FHrzyHeyRC2ppVah6GBUgeccSlJP5KcF2vsMi2fVRwfxWTFXaQzgJy7 lS2bKuqAiLopaYAmq/fSMBygxm2GPSsKtc2lz+TLXXTL18fqpIq7ZTjZLE+gYTCv DB0GamoaFciEKJ+jOvy95y2WRMnYia2whBrzsUzQ4Uful6vXbr5Q5ue5xCj4t4Qe bbveAdVl7n7m1pqtq9A3YP0m/lX2f7BIv2DF5bM1XoHohZHDdvETDF7NE2BIsT/I QFem5ffcBQRZPmdg7Tkh3K79tA4JA/ML4cx8W7Te9k+aOtaFR+ojA4pnH/8fI9d/ 6hIPuLwxI+OWGYNglxyIbuzZ4KiQr5JnZe4OFk4+/Y2g01ALY3DAbXnCVIXJIh8e bqUf+1Bai6EnxLDWx4qehB+bPVHzHYmvlZeJud+KJPUU/NZ9YSw= =x2vs -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2017-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== A couple of weeks worth of updates - looks like things are quiet: * merged net-next back to get a patch from net that another patch here depends on * various small improvements/cleanups across the board * 4-way handshake offload (many thanks to Arend for shepherding that) * mesh CSA/DFS support in mac80211 * the skb_put_zero() we discussed previously ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 13:52:37 -04:00
David S. Miller	5952b0200e	This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - decrease maximum fragment size, by Matthias Schiffer - Clean up seqfile writing, by Markus Elfring (2 patches) - use __func__ in debug messages, by Sven Eckelmann - Mark tpmeter initializers with __init, by Antonio Quartulli - ignore loop detection MAC addresses, by Simon Wunderlich - clean up some return handling, by Simon Wunderlich - improve ELP throughput value handling for WiFi neighbors in BATMAN V/ELP, by Sven Eckelmann (2 patches) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlk/zdgWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoT9dEADWePo0E4gtqX8sZ9f/asqCUjiA RRcg6oHjhKl0yy8bAh+86/iEEJIsWs0IKt/gT7N8v6y3/rE7ZWrXarTvZUEcu+lb pCbl3/e4m+VE4ROvZA/iUFxbzrp0SGP/DO9b6GUP/Lo2SOFzr4ATYpw2HVmUnQmt 7l3do7KSA6WVZs4KTno98ydZW5sFhBIJnnJcwuWGs6TWt7XDnFKOKeTg4QAcaycn /qZQ7NB/25Q9cu1sPF5V+DqlIGhYTl3d4KF4ENUTSClb5OLq0QnBIvfRBoJMwU+q I6roLhd9oYSycC6+6nw70EFbXokxaVz2OyCG2Yffv3BnoMOrwy3eOCmk7reZr0x0 lGhGjUo+baHBbXr/wGWoDI5gUr7OWtkFGJ4vsdGqkWbmwvjf1yTQyOr9mC6y9lAV EiQ+yphIxt7UXbQxu6Yi8xON4v4/YG0CGtb6tiuZVoqXl6k/VZR3I6gluVkVG/oy k/wBnXkdAfcOK+0x1pEgkfrwew42oVXl/xgtA/Al/TuRSwg1BCVcC0Jh9TdHQpqE DWa4UApMlIk8W21gRf49ixxmjwAJ4qGey8YnqTE38uNv/06Lt+U9Bq6qMMGhY9v/ 1rTv9kTlcgAr0LjEwTXpboRAOOVrZxW1PXtWLc5Lim2ge5+vydA2R7R+R4EgP8Kf gpM7YbosYwaO+MAw5Q== =i0xq -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20170613' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - decrease maximum fragment size, by Matthias Schiffer - Clean up seqfile writing, by Markus Elfring (2 patches) - use __func__ in debug messages, by Sven Eckelmann - Mark tpmeter initializers with __init, by Antonio Quartulli - ignore loop detection MAC addresses, by Simon Wunderlich - clean up some return handling, by Simon Wunderlich - improve ELP throughput value handling for WiFi neighbors in BATMAN V/ELP, by Sven Eckelmann (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 13:47:16 -04:00
David S. Miller	7de84403a2	Here are two batman-adv bugfixes: - fix rx packet counters for local ARP replies, by Sven Eckelmann - fix memory leaks for unicast packetes received from another gateway in bridge loop avoidance, by Andreas Pape -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlk/yMgWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoedFD/9POmDgXWG5pYhz1/NG51zWguxF X4p04AuDj9rAfxiXt80AH1MQnmqo8e2ZArGRA0x+wqr7QVT9CiiUcVbRdWuqAmGu cm2zE+2JaBYtSfRTbRTjuHMO5htY8Q7UK7DZr0OVyT6ApLcC44zsbTUEQnaYxEar zhEt5n80XodRhk8TPXbphYaRG3udtr0ULpqYP96CTL/0HScaF5xmYl7+QF8lEajE AgxAm2K8kp1fPptrCLIKJMCRw7IMoJsLGGwWIQYL2TTnHJ9ZOfzdV0zq7yTFGp6s UVHL5SXu1esckv4LaJgWn54mFyVyBY35US6b8Xkk/LYDEO4NNin1Qa3X8ObPEIG2 Xqun6BqeUjDYNEYQYBRJ0Zxem3TXQlNevPbAAsPjwlFy6t6ArpT267KPZH7u2wu4 F7QgPBlsBtymeIj1yYRNwhzbRDjRTvNq+8N39hf1fBijpJANM7iYwJ+rGet/HzZA UOsggnq4lV5CsdXcqobT4F4Ru2am/8SB2wwPlydOfCNOdlMr5qAu40dEJ5TxWHgq 5nkOhDQHKznGzk+9QMItKCeakhq119GRL7TCKQj4fcYG/jFp9HPtVSb3OmAz2UGH fb/g+myOTCrwPctIE65A7GUTMhPCRckcQfTJwOWI0AGDbun2fwGhUzgZknNz6KwE 2J+twzFipw3E31vJUg== =AbYj -----END PGP SIGNATURE----- Merge tag 'batadv-net-for-davem-20170613' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here are two batman-adv bugfixes: - fix rx packet counters for local ARP replies, by Sven Eckelmann - fix memory leaks for unicast packetes received from another gateway in bridge loop avoidance, by Andreas Pape ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 13:46:01 -04:00
David S. Miller	c5549ee401	Some fixes: * Avi fixes some fallout from my mac80211 RX flags changes * Emmanuel fixes an issue with adhering to the spec, and an oversight in the SMPS management code * Jason's patch makes mac80211 use constant-time memory comparisons for message authentication, to avoid having potentially observable timing differences * my fix makes mac80211 set the basic rates bitmap before the channel so the next update to the driver has more consistent data - this required another rework patch to remove some useless 5/10 MHz code that can never be hit -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlk/s1cACgkQa3t4Rpy0 AB336g//dkuRslWLyTzPt57t9VFI9q3sfDCg7ATj9cOrExqlukB9M7/Bc2e8FxXm 5JycdNg7iw4ysYgh2BHf1bRHROx006aNyaRzCMMsLDMkGl1iuB3W9ZSUPueNeyvV xA+OU1ZIA2ze0SrI4DXuotRoj7cHIMr280drZJaq9wFmxV5hr4NIpwFY5syjI8dG K8Net9LLYaRWAdQUjEwW778ONut738qONt+kg5dPw4tbjJUbaeO2HN4l0zjIMyEZ LGa0KOSVbarMaY6S3xniW5gheap4qEJyhoVPw1UO+dLAH8LSDQlu7SVviDAadpim ufjdQdVYir/zxO317gRu80oEyLDgl7U/E8PaSCIl/c+P+TwOM8RqQ4I2lleg9wA3 NHEPGTDRLllfSFjDhOQSHCQD6MwHYVBgKTrfmi97da8IqHOoR25cHH16muSixwKI DrMw4DOiVDxwuOoV7TgOiadQ9Rx6C8l+U0zlKVsQk/j3zJyNZXSkNIQTGAQ13ZZj Otm4WRXX0Bgm6ViRTXcRkekh//3ZA87SNbRNfKYzBwH8pOX+mDAraxKBsX4h4HGb KLiTKRKVIFnVQTJlzDoKwqSuQRSzkZ3f6jgTeOmaysPAIkwewivh6aqyROxImAsi 9GXZOrcUBG34aNRXB6FReojzqpJR3x48fawFc5qXAv/O5RWbuJc= =S5/1 -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2017-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Some fixes: * Avi fixes some fallout from my mac80211 RX flags changes * Emmanuel fixes an issue with adhering to the spec, and an oversight in the SMPS management code * Jason's patch makes mac80211 use constant-time memory comparisons for message authentication, to avoid having potentially observable timing differences * my fix makes mac80211 set the basic rates bitmap before the channel so the next update to the driver has more consistent data - this required another rework patch to remove some useless 5/10 MHz code that can never be hit ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 13:34:13 -04:00
yuval.shaia@oracle.com	5514174fe9	net: phy: Make phy_ethtool_ksettings_get return void Make return value void since function never return meaningfull value Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 12:59:06 -04:00
WANG Cong	c38b7d327a	igmp: acquire pmc lock for ip_mc_clear_src() Andrey reported a use-after-free in add_grec(): for (psf = *psf_list; psf; psf = psf_next) { ... psf_next = psf->sf_next; where the struct ip_sf_list's were already freed by: kfree+0xe8/0x2b0 mm/slub.c:3882 ip_mc_clear_src+0x69/0x1c0 net/ipv4/igmp.c:2078 ip_mc_dec_group+0x19a/0x470 net/ipv4/igmp.c:1618 ip_mc_drop_socket+0x145/0x230 net/ipv4/igmp.c:2609 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:411 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1072 This happens because we don't hold pmc->lock in ip_mc_clear_src() and a parallel mr_ifc_timer timer could jump in and access them. The RCU lock is there but it is merely for pmc itself, this spinlock could actually ensure we don't access them in parallel. Thanks to Eric and Long for discussion on this bug. Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 12:51:37 -04:00
Ashwanth Goli	97d8b6e3b8	net: rps: fix uninitialized symbol warning This patch fixes uninitialized symbol warning that got introduced by the following commit `773fc8f6e8` ("net: rps: send out pending IPI's on CPU hotplug") Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-13 11:31:22 -04:00
Sven Eckelmann	d62890885e	batman-adv: Accept only filled wifi station info The wifi driver can decide to not provide parts of the station info. For example, the expected throughput of the station can be omitted when the used rate control doesn't provide this kind of information. The B.A.T.M.A.N. V implementation must therefore check the filled bitfield before it tries to access the expected_throughput of the returned station_info. Reported-by: Alvaro Antelo <alvaro.antelo@gmail.com> Fixes: `c833484e5f` ("batman-adv: ELP - compute the metric based on the estimated throughput") Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2017-06-13 12:25:43 +02:00
Sven Eckelmann	3f3f87325d	batman-adv: Use default throughput value on cfg80211 error A wifi interface should never be handled like an ethernet devices. The parser of the cfg80211 output must therefore skip the ethtool code when cfg80211_get_station returned an error. Fixes: `f44a3ae9a2` ("batman-adv: refactor wifi interface detection") Signed-off-by: Sven Eckelmann <sven@narfation.org> Reviewed-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2017-06-13 12:24:02 +02:00
Andy Shevchenko	4524667b1e	net: rfkill: gpio: Switch to devm_acpi_dev_add_driver_gpios() Switch to use managed variant of acpi_dev_add_driver_gpios() to simplify error path and fix potentially wrong assingment if ->probe() fails. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 11:07:51 +02:00
Emmanuel Grumbach	6dad28ae19	mac80211: add the action to the drv_ampdu_action tracepoint It is very useful to know what ampdu action is currently happening. Add this information to the tracepoint. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 11:06:39 +02:00
Avraham Stern	f45cbe6e69	nl80211: add authorized flag to ROAM event Drivers that initiate roaming while being connected to a network that uses 802.1X authentication need to inform user space if 802.1X authentication is further required after roaming. For example, when using the Fast transition protocol, roaming within the mobility domain does not require new 802.1X authentication, but roaming to another mobility domain does. In addition, some drivers may not support 802.1X authentication (so it has to be done in user space), while other drivers do. Add a flag to the roaming notification to indicate if user space is required to do 802.1X authentication after the roaming or not. This flag will only be used for networks that use 802.1X authentication. For networks that do not use 802.1X authentication it is assumed that no further action is required from user space after the roaming notification. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> [arend.vanspriel@broadcom.com reuse NL80211_ATTR_PORT_AUTHORIZED] Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> [rebase to apply w/o the flag in CONNECT] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 11:04:37 +02:00
Avraham Stern	3a00df5707	cfg80211: support 4-way handshake offloading for 802.1X Add API for setting the PMK to the driver. For FT support, allow setting also the PMK-R0 Name. This can be used by drivers that support 4-Way handshake offload while IEEE802.1X authentication is managed by upper layers. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> [arend.vanspriel@broadcom.com: add WANT_1X_4WAY_HS attribute] Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> [reword NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_1X docs a bit to say that the device may require it] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:44:09 +02:00
Eliad Peller	91b5ab6289	cfg80211: support 4-way handshake offloading for WPA/WPA2-PSK Let drivers advertise support for station-mode 4-way handshake offloading with a new NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK flag. Extend use of NL80211_ATTR_PMK attribute indicating it might be passed as part of NL80211_CMD_CONNECT command, and contain the PSK (which is the PMK, hence the name.) The driver/device is assumed to handle the 4-way handshake by itself in this case (including key derivations, etc.), instead of relying on the supplicant. This patch is somewhat based on this one (by Vladimir Kondratiev): https://patchwork.kernel.org/patch/1309561/. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> [arend.vanspriel@broadcom.com rebase dealing with existing ATTR_PMK] Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> [reword NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK docs to indicate that this offload might be required] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:43:56 +02:00
Emmanuel Grumbach	b3dd827965	mac80211: don't send SMPS action frame in AP mode when not needed mac80211 allows to modify the SMPS state of an AP both, when it is started, and after it has been started. Such a change will trigger an action frame to all the peers that are currently connected, and will be remembered so that new peers will get notified as soon as they connect (since the SMPS setting in the beacon may not be the right one). This means that we need to remember the SMPS state currently requested as well as the SMPS state that was configured initially (and advertised in the beacon). The former is bss->req_smps and the latter is sdata->smps_mode. Initially, the AP interface could only be started with SMPS_OFF, which means that sdata->smps_mode was SMPS_OFF always. Later, a nl80211 API was added to be able to start an AP with a different AP mode. That code forgot to update bss->req_smps and because of that, if the AP interface was started with SMPS_DYNAMIC, we had: sdata->smps_mode = SMPS_DYNAMIC bss->req_smps = SMPS_OFF That configuration made mac80211 think it needs to fire off an action frame to any new station connecting to the AP in order to let it know that the actual SMPS configuration is SMPS_OFF. Fix that by properly setting bss->req_smps in ieee80211_start_ap. Fixes: `f699317487` ("mac80211: set smps_mode according to ap params") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:24:35 +02:00
Jason A. Donenfeld	98c67d187d	mac80211/wpa: use constant time memory comparison for MACs Otherwise, we enable all sorts of forgeries via timing attack. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:24:34 +02:00
Johannes Berg	c87905bec5	mac80211: set bss_info data before configuring the channel When mac80211 changes the channel, it also calls into the driver's bss_info_changed() callback, e.g. with BSS_CHANGED_IDLE. The driver may, like iwlwifi does, access more data from bss_info in that case and iwlwifi accesses the basic_rates bitmap, but if changing from a band with more (basic) rates to one with fewer, an out-of-bounds access of the rate array may result. While we can't avoid having invalid data at some point in time, we can avoid having it while we call the driver - so set up all the data before configuring the channel, and then apply it afterwards. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=195677 Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Debugged-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:24:33 +02:00
Johannes Berg	44f6d42cbd	mac80211: remove 5/10 MHz rate code from station MLME There's no need for the station MLME code to handle bitrates for 5 or 10 MHz channels when it can't ever create such a configuration. Remove the unnecessary code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:24:32 +02:00
Avraham Stern	204a7dbcb2	mac80211: Fix incorrect condition when checking rx timestamp If the driver reports the rx timestamp at PLCP start, mac80211 can only handle legacy encoding, but the code checks that the encoding is not legacy. Fix this. Fixes: `da6a4352e7` ("mac80211: separate encoding/bandwidth from flags") Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:24:32 +02:00
Emmanuel Grumbach	769dc04db3	mac80211: don't look at the PM bit of BAR frames When a peer sends a BAR frame with PM bit clear, we should not modify its PM state as madated by the spec in 802.11-20012 10.2.1.2. Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2017-06-13 10:24:31 +02:00
Karicheri, Muralidharan	675c8da049	hsr: fix incorrect warning When HSR interface is setup using ip link command, an annoying warning appears with the trace as below:- [ 203.019828] hsr_get_node: Non-HSR frame [ 203.019833] Modules linked in: [ 203.019848] CPU: 0 PID: 158 Comm: sd-resolve Tainted: G W 4.12.0-rc3-00052-g9fa6bf70 #2 [ 203.019853] Hardware name: Generic DRA74X (Flattened Device Tree) [ 203.019869] [<c0110280>] (unwind_backtrace) from [<c010c2f4>] (show_stack+0x10/0x14) [ 203.019880] [<c010c2f4>] (show_stack) from [<c04b9f64>] (dump_stack+0xac/0xe0) [ 203.019894] [<c04b9f64>] (dump_stack) from [<c01374e8>] (__warn+0xd8/0x104) [ 203.019907] [<c01374e8>] (__warn) from [<c0137548>] (warn_slowpath_fmt+0x34/0x44) root@am57xx-evm:~# [ 203.019921] [<c0137548>] (warn_slowpath_fmt) from [<c081126c>] (hsr_get_node+0x148/0x170) [ 203.019932] [<c081126c>] (hsr_get_node) from [<c0814240>] (hsr_forward_skb+0x110/0x7c0) [ 203.019942] [<c0814240>] (hsr_forward_skb) from [<c0811d64>] (hsr_dev_xmit+0x2c/0x34) [ 203.019954] [<c0811d64>] (hsr_dev_xmit) from [<c06c0828>] (dev_hard_start_xmit+0xc4/0x3bc) [ 203.019963] [<c06c0828>] (dev_hard_start_xmit) from [<c06c13d8>] (__dev_queue_xmit+0x7c4/0x98c) [ 203.019974] [<c06c13d8>] (__dev_queue_xmit) from [<c0782f54>] (ip6_finish_output2+0x330/0xc1c) [ 203.019983] [<c0782f54>] (ip6_finish_output2) from [<c0788f0c>] (ip6_output+0x58/0x454) [ 203.019994] [<c0788f0c>] (ip6_output) from [<c07b16cc>] (mld_sendpack+0x420/0x744) As this is an expected path to hsr_get_node() with frame coming from the master interface, add a check to ensure packet is not from the master port and then warn. Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-12 15:21:20 -04:00
Paolo Abeni	b65ac44674	udp: try to avoid 2 cache miss on dequeue when udp_recvmsg() is executed, on x86_64 and other archs, most skb fields are on cold cachelines. If the skb are linear and the kernel don't need to compute the udp csum, only a handful of skb fields are required by udp_recvmsg(). Since we already use skb->dev_scratch to cache hot data, and there are 32 bits unused on 64 bit archs, use such field to cache as much data as we can, and try to prefetch on dequeue the relevant fields that are left out. This can save up to 2 cache miss per packet. v1 -> v2: - changed udp_dev_scratch fields types to u{32,16} variant, replaced bitfiled with bool Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-12 10:01:29 -04:00
Paolo Abeni	0a463c78d2	udp: avoid a cache miss on dequeue Since UDP no more uses sk->destructor, we can clear completely the skb head state before enqueuing. Amend and use skb_release_head_state() for that. All head states share a single cacheline, which is not normally used/accesses on dequeue. We can avoid entirely accessing such cacheline implementing and using in the UDP code a specialized skb free helper which ignores the skb head state. This saves a cacheline miss at skb deallocation time. v1 -> v2: replaced secpath_reset() with skb_release_head_state() Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-12 10:01:29 -04:00
Paolo Abeni	3889a803e1	net: factor out a helper to decrement the skb refcount The same code is replicated in 3 different places; move it to a common helper. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-12 10:01:29 -04:00
Christian Perle	3500cd73df	proc: snmp6: Use correct type in memset Reading /proc/net/snmp6 yields bogus values on 32 bit kernels. Use "u64" instead of "unsigned long" in sizeof(). Fixes: `4a4857b1c8` ("proc: Reduce cache miss in snmp6_seq_show") Signed-off-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-12 09:53:14 -04:00
Hangbin Liu	138437f591	xfrm: move xfrm_garbage_collect out of xfrm_policy_flush Now we will force to do garbage collection if any policy removed in xfrm_policy_flush(). But during xfrm_net_exit(). We call flow_cache_fini() first and set set fc->percpu to NULL. Then after we call xfrm_policy_fini() -> frxm_policy_flush() -> flow_cache_flush(), we will get NULL pointer dereference when check percpu_empty. The code path looks like: flow_cache_fini() - fc->percpu = NULL xfrm_policy_fini() - xfrm_policy_flush() - xfrm_garbage_collect() - flow_cache_flush() - flow_cache_percpu_empty() - fcp = per_cpu_ptr(fc->percpu, cpu) To reproduce, just add ipsec in netns and then remove the netns. v2: As Xin Long suggested, since only two other places need to call it. move xfrm_garbage_collect() outside xfrm_policy_flush(). v3: Fix subject mismatch after v2 fix. Fixes: `35db069121` ("xfrm: do the garbage collection after flushing policy") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2017-06-12 11:51:21 +02:00
Marcel Holtmann	313f6888c8	Bluetooth: Send HCI Set Event Mask Page 2 command only when needed The Broadcom BCM20702 Bluetooth controller in ThinkPad-T530 devices report support for the Set Event Mask Page 2 command, but actually do return an error when trying to use it. < HCI Command: Read Local Supported Commands (0x04\|0x0002) plen 0 > HCI Event: Command Complete (0x0e) plen 68 Read Local Supported Commands (0x04\|0x0002) ncmd 1 Status: Success (0x00) Commands: 162 entries ... Set Event Mask Page 2 (Octet 22 - Bit 2) ... < HCI Command: Set Event Mask Page 2 (0x03\|0x0063) plen 8 Mask: 0x0000000000000000 > HCI Event: Command Complete (0x0e) plen 4 Set Event Mask Page 2 (0x03\|0x0063) ncmd 1 Status: Unknown HCI Command (0x01) Since these controllers do not support any feature that would require the event mask page 2 to be modified, it is safe to not send this command at all. The default value is all bits set to zero. T: Bus=01 Lev=02 Prnt=02 Port=03 Cnt=03 Dev#= 9 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0a5c ProdID=21e6 Rev= 1.12 S: Manufacturer=Broadcom Corp S: Product=BCM20702A0 S: SerialNumber=F82FA8E8CFC0 C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr= 0mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=btusb E: Ad=84(I) Atr=02(Bulk) MxPS= 32 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 32 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none) Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl>	2017-06-12 11:45:30 +02:00
Donald Sharp	4b1f0d33db	net: ipmr: Fix some mroute forwarding issues in vrf's This patch fixes two issues: 1) When forwarding on *,G mroutes that are in a vrf, the kernel was dropping information about the actual incoming interface when calling ip_mr_forward from ip_mr_input. This caused ip_mr_forward to send the multicast packet back out the incoming interface. Fix this by modifying ip_mr_forward to be handed the correctly resolved dev. 2) When a unresolved cache entry is created we store the incoming skb on the unresolved cache entry and upon mroute resolution from the user space daemon, we attempt to forward the packet. Again we were not resolving to the correct incoming device for a vrf scenario, before calling ip_mr_forward. Fix this by resolving to the correct interface and calling ip_mr_forward with the result. Fixes: `e58e415968` ("net: Enable support for VRF with ipv4 multicast") Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-11 18:15:06 -04:00
Daniel Borkmann	ded092cd73	bpf: add bpf_set_hash helper for tc progs Allow for tc BPF programs to set a skb->hash, apart from clearing and triggering a recalc that we have right now. It allows for BPF to implement a custom hashing routine for skb_get_hash(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 19:05:47 -04:00
Daniel Borkmann	966789fb86	bpf: remove cg_skb_func_proto and use sk_filter_func_proto directly Since cg_skb_func_proto() doesn't do anything else than just calling into sk_filter_func_proto(), remove it and set sk_filter_func_proto() directly for .get_func_proto callback. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 19:05:46 -04:00
Jia-Ju Bai	343eba69c6	net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the function call path is: tipc_l2_rcv_msg (acquire the lock by rcu_read_lock) tipc_rcv tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep tipc_node_broadcast tipc_node_xmit_skb tipc_node_xmit tipc_sk_rcv tipc_msg_reverse pskb_expand_head(GFP_KERNEL) --> may sleep To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC". Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 18:20:38 -04:00
Jia-Ju Bai	f146e872eb	net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx The kernel may sleep under a rcu read lock in cfpkt_create_pfx, and the function call path is: cfcnfg_linkup_rsp (acquire the lock by rcu_read_lock) cfctrl_linkdown_req cfpkt_create cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep cfserl_receive (acquire the lock by rcu_read_lock) cfpkt_split cfpkt_create_pfx alloc_skb(GFP_KERNEL) --> may sleep There is "in_interrupt" in cfpkt_create_pfx to decide use "GFP_KERNEL" or "GFP_ATOMIC". In this situation, "GFP_KERNEL" is used because the function is called under a rcu read lock, instead in interrupt. To fix it, only "GFP_ATOMIC" is used in cfpkt_create_pfx. Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 18:19:45 -04:00
Chenbo Feng	89dfba3e1b	Remove the redundant skb->dev initialization in ip6_fragment After moves the skb->dev and skb->protocol initialization into ip6_output, setting the skb->dev inside ip6_fragment is unnecessary. Fixes: 97a7a37a7b7b("ipv6: Initial skb->dev and skb->protocol in ip6_output") Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:25:21 -04:00
Xin Long	4abf5a653b	sctp: no need to check assoc id before calling sctp_assoc_set_id sctp_assoc_set_id does the assoc id check in the beginning when processing dupcookie, no need to do the same check before calling it. v1->v2: fix some typo errs Marcelo pointed in changelog. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:23:31 -04:00
Xin Long	c0a4c2d1cd	sctp: use read_lock_bh in sctp_eps_seq_show This patch is to use read_lock_bh instead of local_bh_disable and read_lock in sctp_eps_seq_show. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:22:26 -04:00
Xin Long	6dfe4b97e0	sctp: fix recursive locking warning in sctp_do_peeloff Dmitry got the following recursive locking report while running syzkaller fuzzer, the Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:52 print_deadlock_bug kernel/locking/lockdep.c:1729 [inline] check_deadlock kernel/locking/lockdep.c:1773 [inline] validate_chain kernel/locking/lockdep.c:2251 [inline] __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755 lock_sock_nested+0xcb/0x120 net/core/sock.c:2536 lock_sock include/net/sock.h:1460 [inline] sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432 sock_release+0x8d/0x1e0 net/socket.c:597 __sock_create+0x38b/0x870 net/socket.c:1226 sock_create+0x7f/0xa0 net/socket.c:1237 sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879 sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline] sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690 SYSC_getsockopt net/socket.c:1817 [inline] SyS_getsockopt+0x240/0x380 net/socket.c:1799 entry_SYSCALL_64_fastpath+0x1f/0xc2 This warning is caused by the lock held by sctp_getsockopt() is on one socket, while the other lock that sctp_close() is getting later is on the newly created (which failed) socket during peeloff operation. This patch is to avoid this warning by use lock_sock with subclass SINGLE_DEPTH_NESTING as Wang Cong and Marcelo's suggestion. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:22:25 -04:00
Xin Long	581409dacc	sctp: disable BH in sctp_for_each_endpoint Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling BH. If CPU schedules to another thread A at this moment, the thread A may be trying to hold the write_lock with disabling BH. As BH is disabled and CPU cannot schedule back to the thread holding the read_lock, while the thread A keeps waiting for the read_lock. A dead lock would be triggered by this. This patch is to fix this dead lock by calling read_lock_bh instead to disable BH when holding the read_lock in sctp_for_each_endpoint. Fixes: `626d16f50f` ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:18:10 -04:00
Rosen, Rami	664f46a290	net/packet: remove unneeded declaraion of tpacket_snd(). This patch removes unneeded forward declaration of tpacket_snd() in net/packet/af_packet.c. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:15:25 -04:00
Dominik Heidler	9b3dc0a17d	l2tp: cast l2tp traffic counter to unsigned This fixes a counter problem on 32bit systems: When the rx_bytes counter reached 2 GiB, it jumpd to (2^64 Bytes - 2GiB) Bytes. rtnl_link_stats64 has __u64 type and atomic_long_read returns atomic_long_t which is signed. Due to the conversation we get an incorrect value on 32bit systems if the MSB of the atomic_long_t value is set. CC: Tom Parkin <tparkin@katalix.com> Fixes: `7b7c0719cd` ("l2tp: avoid deadlock in l2tp stats update") Signed-off-by: Dominik Heidler <dheidler@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:14:27 -04:00
Chenbo Feng	384abed1fe	bpf: Remove duplicate tcp_filter hook in ipv6 There are two tcp_filter hooks in tcp_ipv6 ingress path currently. One is at tcp_v6_rcv and another is in tcp_v6_do_rcv. It seems the tcp_filter() call inside tcp_v6_do_rcv is redundent and some packet will be filtered twice in this situation. This will cause trouble when using eBPF filters to account traffic data. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 16:08:02 -04:00
Nicolas Dichtel	10d486a30c	netns: fix error code when the nsid is already used When the user tries to assign a specific nsid, idr_alloc() is called with the range [nsid, nsid+1]. If this nsid is already used, idr_alloc() returns ENOSPC (No space left on device). In our case, it's better to return EEXIST to make it clear that the nsid is not available. CC: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 15:58:50 -04:00
Nicolas Dichtel	4a7f7bc600	netns: define extack error msg for nsis cmds It helps the user to identify errors. CC: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-10 15:58:50 -04:00
Jason A. Donenfeld	329d823098	Bluetooth: use constant time memory comparison for secret values This file is filled with complex cryptography. Thus, the comparisons of MACs and secret keys and curve points and so forth should not add timing attacks, which could either result in a direct forgery, or, given the complexity, some other type of attack. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org	2017-06-10 15:45:40 +02:00
David S. Miller	f6d4c71332	linux-can-fixes-for-4.12-20170609 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlk6mLwTHG1rbEBwZW5n dXRyb25peC5kZQAKCRAe/sog7Dgc9vx5CADTRLfMxgmmGsLcM8nO0sKn/7X/o4Hy qprZy1Nu4J49kG/93pumIvWkcG9fzZ4hbDy+4DiEmL/fIumbvg56aaW90fyvW9Cl HzL3S7Id8Db6Qwfr4Wd7r1IRYn08+om4ukO09GdskmMpS4wOr68+3WhutGGu1mEX ZS2sTE/dKnozZqUw6agcwXJmAJWFbYBaVcF+x2GT07ako0jYXF29zqe/GlpmzVBJ Wmib9DO2BeDm/QpNIot/I3Se86xS0AIH6ds++YRJX6k9dlFb4T2Liw6UysPSVCz6 QkAT3SiTu3TKoT1TXEyMQEQSpnrFtpMowdTk/s04bKibVIpTTskZm44c =DNKW -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-4.12-20170609' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2017-06-09 this is a pull request of 6 patches for net/master. There's a patch by Stephane Grosjean that fixes an uninitialized symbol warning in the peak_canfd driver. A patch by Johan Hovold to fix the product-id endianness in an error message in the the peak_usb driver. A patch by Oliver Hartkopp to enable CAN FD for virtual CAN devices by default. Three patches by me, one makes the helper function can_change_state() robust to be called with cf == NULL. The next patch fixes a memory leak in the gs_usb driver. And the last one fixes a lockdep splat by properly initialize the per-net can_rcvlists_lock spin_lock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 15:41:57 -04:00
Johannes Berg	c7a61cba71	mac80211: free netdev on dev_alloc_name() error The change to remove free_netdev() from ieee80211_if_free() erroneously didn't add the necessary free_netdev() for when ieee80211_if_free() is called directly in one place, rather than as the priv_destructor. Add the missing call. Fixes: `cf124db566` ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 15:40:15 -04:00
ashwanth@codeaurora.org	773fc8f6e8	net: rps: send out pending IPI's on CPU hotplug IPI's from the victim cpu are not handled in dev_cpu_callback. So these pending IPI's would be sent to the remote cpu only when NET_RX is scheduled on the victim cpu and since this trigger is unpredictable it would result in packet latencies on the remote cpu. This patch add support to send the pending ipi's of victim cpu. Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 15:35:01 -04:00
Chenbo Feng	97a7a37a7b	ipv6: Initial skb->dev and skb->protocol in ip6_output Move the initialization of skb->dev and skb->protocol from ip6_finish_output2 to ip6_output. This can make the skb->dev and skb->protocol information avalaible to the CGROUP eBPF filter. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 15:07:56 -04:00
Krister Johansen	f186ce61bb	Fix an intermittent pr_emerg warning about lo becoming free. It looks like this: Message from syslogd@flamingo at Apr 26 00:45:00 ... kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4 They seem to coincide with net namespace teardown. The message is emitted by netdev_wait_allrefs(). Forced a kdump in netdev_run_todo, but found that the refcount on the lo device was already 0 at the time we got to the panic. Used bcc to check the blocking in netdev_run_todo. The only places where we're off cpu there are in the rcu_barrier() and msleep() calls. That behavior is expected. The msleep time coincides with the amount of time we spend waiting for the refcount to reach zero; the rcu_barrier() wait times are not excessive. After looking through the list of callbacks that the netdevice notifiers invoke in this path, it appears that the dst_dev_event is the most interesting. The dst_ifdown path places a hold on the loopback_dev as part of releasing the dev associated with the original dst cache entry. Most of our notifier callbacks are straight-forward, but this one a) looks complex, and b) places a hold on the network interface in question. I constructed a new bcc script that watches various events in the liftime of a dst cache entry. Note that dst_ifdown will take a hold on the loopback device until the invalidated dst entry gets freed. [ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183 __dst_free rcu_nocb_kthread kthread ret_from_fork Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 12:27:28 -04:00
Krister Johansen	3ad7d2468f	Ipvlan should return an error when an address is already in use. The ipvlan code already knows how to detect when a duplicate address is about to be assigned to an ipvlan device. However, that failure is not propogated outward and leads to a silent failure. Introduce a validation step at ip address creation time and allow device drivers to register to validate the incoming ip addresses. The ipvlan code is the first consumer. If it detects an address in use, we can return an error to the user before beginning to commit the new ifa in the networking code. This can be especially useful if it is necessary to provision many ipvlans in containers. The provisioning software (or operator) can use this to detect situations where an ip address is unexpectedly in use. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 12:26:07 -04:00
Mateusz Jurczyk	defbcf2dec	af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the AF_UNIX socket. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing .sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-09 10:10:24 -04:00
Simon Wunderlich	75ae84a4fe	batman-adv: simplify return handling in some TT functions Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2017-06-09 15:57:00 +02:00
Simon Wunderlich	1227c9ae01	batman-adv: do not add loop detection mac addresses to global tt This change has been made for local TT already, add another one for global TT - but only for temporary entries (aka speedy join), to prevent inconsistencies between local and global tables in case an older batman-adv version is still announcing those entries from its local table. Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sven Eckelmann <sven@narfation.org>	2017-06-09 15:56:59 +02:00
Antonio Quartulli	d1aa514220	batman-adv: tp_meter: mark init function with __init batadv_tp_meter_init() is invoked in batadv_init() only which is marked with __init. For this reason batadv_tp_meter_init() can be marked with __init as well and dropped after module load. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2017-06-09 15:56:59 +02:00
Marc Kleine-Budde	74b7b49088	can: af_can: namespace support: fix lockdep splat: properly initialize spin_lock This patch uses spin_lock_init() instead of __SPIN_LOCK_UNLOCKED() to initialize the per namespace net->can.can_rcvlists_lock lock to fix this lockdep warning: \| INFO: trying to register non-static key. \| the code is fine but needs lockdep annotation. \| turning off the locking correctness validator. \| CPU: 0 PID: 186 Comm: candump Not tainted 4.12.0-rc3+ #47 \| Hardware name: Marvell Kirkwood (Flattened Device Tree) \| [<c0016644>] (unwind_backtrace) from [<c00139a8>] (show_stack+0x18/0x1c) \| [<c00139a8>] (show_stack) from [<c0058c8c>] (register_lock_class+0x1e4/0x55c) \| [<c0058c8c>] (register_lock_class) from [<c005bdfc>] (__lock_acquire+0x148/0x1990) \| [<c005bdfc>] (__lock_acquire) from [<c005deec>] (lock_acquire+0x174/0x210) \| [<c005deec>] (lock_acquire) from [<c04a6780>] (_raw_spin_lock+0x50/0x88) \| [<c04a6780>] (_raw_spin_lock) from [<bf02116c>] (can_rx_register+0x94/0x15c [can]) \| [<bf02116c>] (can_rx_register [can]) from [<bf02a868>] (raw_enable_filters+0x60/0xc0 [can_raw]) \| [<bf02a868>] (raw_enable_filters [can_raw]) from [<bf02ac14>] (raw_enable_allfilters+0x2c/0xa0 [can_raw]) \| [<bf02ac14>] (raw_enable_allfilters [can_raw]) from [<bf02ad38>] (raw_bind+0xb0/0x250 [can_raw]) \| [<bf02ad38>] (raw_bind [can_raw]) from [<c03b5fb8>] (SyS_bind+0x70/0xac) \| [<c03b5fb8>] (SyS_bind) from [<c000f8c0>] (ret_fast_syscall+0x0/0x1c) Cc: Mario Kicherer <dev@kicherer.org> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-06-09 11:39:23 +02:00
Willem de Bruijn	fff88030b3	skbuff: only inherit relevant tx_flags When inheriting tx_flags from one skbuff to another, always apply a mask to avoid overwriting unrelated other bits in the field. The two SKBTX_SHARED_FRAG cases clears all other bits. In practice, tx_flags are zero at this point now. But this is fragile. Timestamp flags are set, for instance, if in tcp_gso_segment, after this clear in skb_segment. The SKBTX_ANY_TSTAMP mask in __skb_tstamp_tx ensures that new skbs do not accidentally inherit flags such as SKBTX_SHARED_FRAG. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 16:12:08 -04:00
Arnd Bergmann	0db47e3d32	ila_xlat: add missing hash secret initialization While discussing the possible merits of clang warning about unused initialized functions, I found one function that was clearly meant to be called but never actually is. __ila_hash_secret_init() initializes the hash value for the ila locator, apparently this is intended to prevent hash collision attacks, but this ends up being a read-only zero constant since there is no caller. I could find no indication of why it was never called, the earliest patch submission for the module already was like this. If my interpretation is right, we certainly want to backport the patch to stable kernels as well. I considered adding it to the ila_xlat_init callback, but for best effect the random data is read as late as possible, just before it is first used. The underlying net_get_random_once() is already highly optimized to avoid overhead when called frequently. Fixes: `7f00feaf10` ("ila: Add generic ILA translation facility") Cc: stable@vger.kernel.org Link: https://www.spinics.net/lists/kernel/msg2527243.html Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 15:36:56 -04:00
Nikolay Aleksandrov	772c344dbb	net: ipmr: add getlink support Currently there's no way to dump the VIF table for an ipmr table other than the default (via proc). This is a major issue when debugging ipmr issues and in general it is good to know which interfaces are configured. This patch adds support for RTM_GETLINK for the ipmr family so we can dump the VIF table and the ipmr table's current config for each table. We're protected by rtnl so no need to acquire RCU or mrt_lock. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:38:59 -04:00
Arkadi Sharshevsky	eca59f6915	net: Remove support for bridge bypass ndos from stacked devices Remove support for bridge bypass ndos from stacked devices. At this point no driver which supports stack device behavior offload supports operation with SELF flag. The case for upper device is already taken care of in both of the following cases: 1. FDB add/del - driver should check at the notification cb if the stacked device contains his ports. 2. Port attribute - calls switchdev code directly which checks for case of stack device. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:16:28 -04:00
Arkadi Sharshevsky	9fe8bcec0d	net: bridge: Receive notification about successful FDB offload When a new static FDB is added to the bridge a notification is sent to the driver for offload. In case of successful offload the driver should notify the bridge back, which in turn should mark the FDB as offloaded. Currently, externally learned is equivalent for being offloaded which is not correct due to the fact that FDBs which are added from user-space are also marked as externally learned. In order to specify if an FDB was successfully offloaded a new flag is introduced. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:16:25 -04:00
Arkadi Sharshevsky	6b26b51b1d	net: bridge: Add support for notifying devices about FDB add/del Currently the bridge doesn't notify the underlying devices about new FDBs learned. The FDB sync is placed on the switchdev notifier chain because devices may potentially learn FDB that are not directly related to their ports, for example: 1. Mixed SW/HW bridge - FDBs that point to the ASICs external devices should be offloaded as CPU traps in order to perform forwarding in slow path. 2. EVPN - Externally learned FDBs for the vtep device. Notification is sent only about static FDB add/del. This is done due to fact that currently this is the only scenario supported by switch drivers. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:16:25 -04:00
Arkadi Sharshevsky	ff5cf10011	net: switchdev: Change notifier chain to be atomic In order to use the switchdev notifier chain for FDB sync with the device it has to be changed to atomic. The is done because the bridge can learn new FDBs in atomic context. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:16:24 -04:00
Arkadi Sharshevsky	0baa10fff2	net: bridge: Add support for calling FDB external learning under rcu This is done as a preparation to moving the switchdev notifier chain to be atomic. The FDB external learning should be called under rtnl or rcu. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:16:24 -04:00
Arkadi Sharshevsky	3922285d96	net: bridge: Add support for offloading port attributes Currently the flood, learning and learning_sync port attributes are offloaded by setting the SELF flag. Add support for offloading the flood and learning attribute through the bridge code. In case of setting an unsupported flag on a offloded port the operation will fail. The learning_sync attribute doesn't have any software representation and cannot be offloaded through the bridge code. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 14:16:24 -04:00
Vivien Didelot	b2f81d304c	net: dsa: add CPU and DSA ports as VLAN members In a multi-chip switch fabric, it is currently the responsibility of the driver to add the CPU or DSA (interconnecting chips together) ports as members of a new VLAN entry. This makes the drivers more complicated. We want the DSA drivers to be stupid and the DSA core being the one responsible for caring about the abstracted switch logic and topology. Make the DSA core program the CPU and DSA ports as part of the VLAN. This makes all chips of the data path to be aware of VIDs spanning the the whole fabric and thus, seamlessly add support for cross-chip VLAN. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 11:43:32 -04:00
Vivien Didelot	1ca4aa9cd4	net: dsa: check VLAN capability of every switch Now that the VLAN object is propagated to every switch chip of the switch fabric, we can easily ensure that they all support the required VLAN operations before modifying an entry on a single switch. To achieve that, remove the condition skipping other target switches, and add a bitmap of VLAN members, eventually containing the target port, if we are programming the switch target. This will allow us to easily add other VLAN members, such as the DSA or CPU ports (to introduce cross-chip VLAN support) or the other port members if we want to reduce hardware accesses later. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 11:43:31 -04:00
David S. Miller	7eca9cc539	RxRPC rewrite -----BEGIN PGP SIGNATURE----- iQIVAwUAWThq9/Sw1s6N8H32AQLfQhAAikphSQnfbT4SkZsVmcZefNMlThGgX2EE 5nDNsDiZnXqAOY5ivMnLlb7JXjby2Ckb3coTa8gVK2RmvgIOqGAVdKqYNJQNqYvi +plwZFHlx+qWBbQRmucAfGorhmdoG3mRyksHHcpeQ4c/9bcfOJXY9QwAwiSZcPXl RDS5QsNVI0nKL/PB8hbKBSp+40/joMJFVSAnBn5X/zxyL5jcoj0Gj7HXj/EKnlfq qO5GiheISjJJ47cTR+J3JXl1OrJqG0Dd17BdgK85S+G2bWy9o7MsotMKd1XHHIkQ IxuQ7oUa3QVKNUF+Lp1Kxx7ve/V6PPzbaFAY2RGyqwImD4iy2dBNpfgzL4/3rpT3 IeFBP57N5f2J2EBKeA90GOXVB71LN520e9WytjjD+NMcyJHaFKjjv4xbr5lUhRPp 6psJHLld6s92NwwPN4YVcT7RrqMFxPC0NmD8xymrm+XnKizdvJQ9TMbD+33nhlV3 yf1DDYBtPq8/hVyMmgywwy/la8KSCv3pybu1GcXx5MsTAoqLOeXcUcWr2d/ljTsg m5xRtjbsw200exf65lc+083W/xXRFGQ9XbFvCPqcefQ+LSE3A4yInTEyzMl0X4WC 2ciqgM11TYrexw+OwDM5oXQWmp58GZlpSDNlvXvWK8RsCJxwYPrF2Fw8/fw7/wcK 7EVfvAA+j0k= =0fbW -----END PGP SIGNATURE----- Merge tag 'rxrpc-rewrite-20170607-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Tx length parameter Here's a set of patches that allows someone initiating a client call with AF_RXRPC to indicate upfront the total amount of data that will be transmitted. This will allow AF_RXRPC to encrypt directly from source buffer to packet rather than having to copy into the buffer and only encrypt when it's full (the encrypted portion of the packet starts with a length and so we can't encrypt until we know what the length will be). The three patches are: (1) Provide a means of finding out what control message types are actually supported. EINVAL is reported if an unsupported cmsg type is seen, so we don't want to set the new cmsg unless we know it will be accepted. (2) Consolidate some stuff into a struct to reduce the parameter count on the function that parses the cmsg buffer. (3) Introduce the RXRPC_TX_LENGTH cmsg. This can be provided on the first sendmsg() that contributes data to a client call request or a service call reply. If provided, the user must provide exactly that amount of data or an error will be incurred. Changes in version 2: (*) struct rxrpc_send_params::tx_total_len should be s64 not u64. Thanks to Julia Lawall for reporting this. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 11:41:41 -04:00
Bjorn Andersson	b24844b1b5	net: qrtr: Inform open sockets about new controller As the higher level communication only deals with "services" the a service directory is required to keep track of local and remote services. In order for qrtr clients to be informed about when the service directory implementation is available some event needs to be passed to them. Rather than introducing support for broadcasting such a message in-band to all open local sockets we flag each socket with ENETRESET, as there are no other expected operations that would benefit from having support from locally broadcasting messages. Cc: Courtney Cavin <ccavin@gmail.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-06-08 11:34:57 -04:00

1 2 3 4 5 ...

46895 Commits