linux

Author	SHA1	Message	Date
David S. Miller	b35560e485	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2019-02-21 1) Don't do TX bytes accounting for the esp trailer when sending from a request socket as this will result in an out of bounds memory write. From Martin Willi. 2) Destroy xfrm_state synchronously on net exit path to avoid nested gc flush callbacks that may trigger a warning in xfrm6_tunnel_net_exit(). From Cong Wang. 3) Do an unconditional clone in pfkey_broadcast_one() to avoid a race when freeing the skb. From Sean Tranchetti. 4) Fix inbound traffic via XFRM interfaces across network namespaces. We did the lookup for interfaces and policies in the wrong namespace. From Tobias Brunner. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 16:08:52 -08:00
Lorenzo Bianconi	103d0244d2	net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap Report erspan version field to userspace in ip6gre_fill_info just for erspan_v6 tunnels. Moreover report IFLA_GRE_ERSPAN_INDEX only for erspan version 1. The issue can be triggered with the following reproducer: $ip link add name gre6 type ip6gre local 2001::1 remote 2002::2 $ip link set gre6 up $ip -d link sh gre6 14: grep6@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1448 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/gre6 2001::1 peer 2002::2 promiscuity 0 minmtu 0 maxmtu 0 ip6gre remote 2002::2 local 2001::1 hoplimit 64 encaplimit 4 tclass 0x00 flowlabel 0x00000 erspan_index 0 erspan_ver 0 addrgenmode eui64 Fixes: `94d7d8f292` ("ip6_gre: add erspan v2 support") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 16:02:10 -08:00
Lorenzo Bianconi	2bdf700e53	net: ip_gre: do not report erspan_ver for gre or gretap Report erspan version field to userspace in ipgre_fill_info just for erspan tunnels. The issue can be triggered with the following reproducer: $ip link add name gre1 type gre local 192.168.0.1 remote 192.168.1.1 $ip link set dev gre1 up $ip -d link sh gre1 13: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/gre 192.168.0.1 peer 192.168.1.1 promiscuity 0 minmtu 0 maxmtu 0 gre remote 192.168.1.1 local 192.168.0.1 ttl inherit erspan_ver 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 Fixes: `f551c91de2` ("net: erspan: introduce erspan v2 for ip_gre") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 16:02:10 -08:00
Florian Fainelli	010c8f01aa	net: Get rid of switchdev_port_attr_get() With the bridge no longer calling switchdev_port_attr_get() to obtain the supported bridge port flags from a driver but instead trying to set the bridge port flags directly and relying on driver to reject unsupported configurations, we can effectively get rid of switchdev_port_attr_get() entirely since this was the only place where it was called. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 14:55:14 -08:00
Florian Fainelli	cc0c207a5d	net: Remove SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT Now that we have converted the bridge code and the drivers to check for bridge port(s) flags at the time we try to set them, there is no need for a get() -> set() sequence anymore and SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused. Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 14:55:14 -08:00
Florian Fainelli	1ef0764486	net: bridge: Stop calling switchdev_port_attr_get() Now that all switchdev drivers have been converted to check the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS flags and report flags that they do not support accordingly, we can migrate the bridge code to try to set that attribute first, check the results and then do the actual setting. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 14:55:14 -08:00
Florian Fainelli	ea87005a00	net: dsa: Add setter for SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS In preparation for removing SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT, add support for a function that processes the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attributes and returns not supported for any flag set, since DSA does not currently support toggling those bridge port attributes (yet). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 14:55:14 -08:00
Russell King	c138806344	net: dsa: enable flooding for bridge ports Switches work by learning the MAC address for each attached station by monitoring traffic from each station. When a station sends a packet, the switch records which port the MAC address is connected to. With IPv4 networking, before communication commences with a neighbour, an ARP packet is broadcasted to all stations asking for the MAC address corresponding with the IPv4. The desired station responds with an ARP reply, and the ARP reply causes the switch to learn which port the station is connected to. With IPv6 networking, the situation is rather different. Rather than broadcasting ARP packets, a "neighbour solicitation" is multicasted rather than broadcasted. This multicast needs to reach the intended station in order for the neighbour to be discovered. Once a neighbour has been discovered, and entered into the sending stations neighbour cache, communication can restart at a point later without sending a new neighbour solicitation, even if the entry in the neighbour cache is marked as stale. This can be after the MAC address has expired from the forwarding cache of the DSA switch - when that occurs, there is a long pause in communication. Our DSA implementation for mv88e6xxx switches disables flooding of multicast and unicast frames for bridged ports. As per the above description, this is fine for IPv4 networking, since the broadcasted ARP queries will be sent to and received by all stations on the same network. However, this breaks IPv6 very badly - blocking neighbour solicitations and later causing connections to stall. The defaults that the Linux bridge code expect from bridges are for unknown unicast and unknown multicast frames to be flooded to all ports on the bridge, which is at odds to the defaults adopted by our DSA implementation for mv88e6xxx switches. 
This commit enables by default flooding of both unknown unicast and unknown multicast frames whenever a port is added to a bridge, and disables the flooding when a port leaves the bridge. This means that mv88e6xxx DSA switches now behave as per the bridge(8) man page, and IPv6 works flawlessly through such a switch. Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 14:53:07 -08:00
Russell King	57652796aa	net: dsa: add support for bridge flags The Linux bridge implementation allows various properties of the bridge to be controlled, such as flooding unknown unicast and multicast frames. This patch adds the necessary DSA infrastructure to allow the Linux bridge support to control these properties for DSA switches. Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> [florian: Add missing dp and ds variables declaration to fix build] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 14:53:07 -08:00
Tung Nguyen	48766a583c	tipc: improve function tipc_wait_for_rcvmsg() This commit replaces schedule_timeout() with wait_woken() in function tipc_wait_for_rcvmsg(). wait_woken() uses memory barriers in its implementation to avoid potential race condition when putting a process into sleeping state and then waking it up. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 13:58:05 -08:00
Tung Nguyen	223b7329ec	tipc: improve function tipc_wait_for_cond() Commit `844cf763fb` ("tipc: make macro tipc_wait_for_cond() smp safe") replaced finish_wait() with remove_wait_queue() but still used prepare_to_wait(). This causes unnecessary conditional checking before adding to wait queue in prepare_to_wait(). This commit replaces prepare_to_wait() with add_wait_queue() as the pair function with remove_wait_queue(). Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 13:58:05 -08:00
Li RongQing	08e71623c8	bridge: remove redundant check on err in br_multicast_ipv4_rcv br_ip4_multicast_mrd_rcv only returns 0 or -ENOMSG, never any other negative value. Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 13:48:59 -08:00
Li RongQing	a2b5a3fa2c	net: remove unneeded switch fall-through This case block is already terminated by a return, so there is no need for a switch fall-through. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 13:48:00 -08:00
Dan Carpenter	af736bf071	net: sched: potential NULL dereference in tcf_block_find() The error code isn't set on this path so it would result in returning ERR_PTR(0) and a NULL dereference in the caller. Fixes: `18d3eefb17` ("net: sched: refactor tcf_block_find() into standalone functions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 13:11:07 -08:00
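The failure mode described in the tcf_block_find() fix can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `err_ptr()`, `is_err()` and `lookup()` are hypothetical stand-ins modelled on the general ERR_PTR()/IS_ERR() convention, and the value 4095 for the highest error number is an assumption. It shows why an error path that forgets to set the error code hands the caller a plain NULL pointer that an error check does not catch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on error numbers encoded into pointers. */
#define MAX_ERRNO 4095

/* Userspace stand-in: encode a negative error code as a pointer value. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Userspace stand-in: true only for pointers in the top MAX_ERRNO addresses. */
static bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/*
 * Hypothetical lookup that always fails. 'set_error' models whether the
 * failing path remembered to assign an error code before returning.
 */
static void *lookup(bool set_error)
{
    long err = 0;

    if (set_error)
        err = -22; /* -EINVAL */

    /* With err == 0 this returns a NULL pointer, not an error pointer. */
    return err_ptr(err);
}
```

A caller that only tests `is_err(lookup(false))` sees "no error" and goes on to dereference NULL, which matches the path the commit message describes.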
Callum Sinclair	ca8d4794f6	ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs Currently the only way to clear the forwarding cache was to delete the entries one by one using the MRT_DEL_MFC socket option or to destroy and recreate the socket. Create a new socket option which with the use of optional flags can clear any combination of multicast entries (static or not static) and multicast vifs (static or not static). Calling the new socket option MRT_FLUSH with the flags MRT_FLUSH_MFC and MRT_FLUSH_VIFS will clear all entries and vifs on the socket except for static entries. Signed-off-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 13:05:05 -08:00
Aya Levin	574b1e1f45	devlink: Modify reply of DEVLINK_CMD_HEALTH_REPORTER_GET Avoid sending attributes related to recovery: DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD and DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER in reply to DEVLINK_CMD_HEALTH_REPORTER_GET for a reporter which didn't register a recover operation. These parameters can't be configured on a reporter that did not provide a recover operation, thus not needed to return them. Fixes: `7afe335a8b` ("devlink: Add health get command") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:38:51 -08:00
Aya Levin	54719527fd	devlink: Rename devlink health attributes Rename devlink health attributes to better reflect their use. Add a COUNT prefix to the error counter attribute and the recovery counter attribute. Fixes: `7afe335a8b` ("devlink: Add health get command") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:38:51 -08:00
Hans Wippel	af5f60c7e3	net/smc: allow PCI IDs as ib device names in the pnet table SMC-D devices are identified by their PCI IDs in the pnet table. In order to make usage of the pnet table more consistent for users, this patch adds this form of identification for ib devices as well. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:34:37 -08:00
Hans Wippel	64e28b52c7	net/smc: add pnet table namespace support This patch adds namespace support to the pnet table code. Each network namespace gets its own pnet table. Infiniband and smcd device pnetids can only be modified in the initial namespace. In other namespaces they can still be used as if they were set by the underlying hardware. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:34:37 -08:00
Hans Wippel	f3d74b2245	net/smc: add smcd support to the pnet table Currently, users can only set pnetids for netdevs and ib devices in the pnet table. This patch adds support for smcd devices to the pnet table. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:34:37 -08:00
Hans Wippel	890a2cb4a9	net/smc: rework pnet table If a device does not have a pnetid, users can set a temporary pnetid for said device in the pnet table. This patch reworks the pnet table to make it more flexible. Multiple entries with the same pnetid but differing devices are now allowed. Additionally, the netlink interface now sends each mapping from pnetid to device separately to the user while maintaining the message format existing applications might expect. Also, the SMC data structure for ib devices already has a pnetid attribute. So, it is used to store the user defined pnetids. As a result, the pnet table entries are only used for netdevs. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:34:37 -08:00
Ursula Braun	cecc7a317d	net/smc: cleanup for smcr_tx_sndbuf_nonempty Use local variable pflags from the beginning of function smcr_tx_sndbuf_nonempty Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:34:37 -08:00
Ursula Braun	d7cf4a3bf3	net/smc: fix smc_poll in SMC_INIT state smc_poll() returns with mask bit EPOLLPRI if the connection urg_state is SMC_URG_VALID. Since SMC_URG_VALID is zero, smc_poll signals EPOLLPRI erroneously if called in state SMC_INIT before the connection is created, for instance in a non-blocking connect scenario. This patch switches to non-zero values for the urg states. Reviewed-by: Karsten Graul <kgraul@linux.ibm.com> Fixes: `de8474eb9d` ("net/smc: urgent data support") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 10:19:20 -08:00
Paolo Abeni	bf1dc8bad1	ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink() We need an RCU critical section around the rt6_info->from dereference, and proper annotation. Fixes: `4ed591c8ab` ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 09:54:35 -08:00
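The smc_poll() fix rests on a general C pitfall: a state whose numeric value is zero is indistinguishable from a zero-initialized (not yet set up) structure. The sketch below illustrates this with hypothetical enum and struct names; only the zero versus non-zero numbering mirrors what the commit message describes, and the actual SMC definitions are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical "before" numbering: the VALID state is zero. */
enum urg_state_old { OLD_URG_VALID = 0, OLD_URG_NOTYET, OLD_URG_READ };

/* Hypothetical "after" numbering: every state is non-zero. */
enum urg_state_new { NEW_URG_VALID = 1, NEW_URG_NOTYET, NEW_URG_READ };

struct conn_old { enum urg_state_old urg_state; };
struct conn_new { enum urg_state_new urg_state; };

/* Would a poll routine report urgent data for this connection? */
static bool old_signals_urgent(const struct conn_old *c)
{
    return c->urg_state == OLD_URG_VALID;
}

static bool new_signals_urgent(const struct conn_new *c)
{
    return c->urg_state == NEW_URG_VALID;
}
```

With the old numbering a freshly zeroed connection already compares equal to the VALID state; with the new numbering it matches no state at all, so nothing is signalled until the state is set explicitly.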
Paolo Abeni	193f3685d0	ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt() We must access rt6_info->from under RCU read lock: move the dereference under such lock, with proper annotation. v1 -> v2: - avoid using multiple, racy, fetch operations for rt->from Fixes: `a68886a691` ("net/ipv6: Make from in rt6_info rcu protected") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-21 09:54:35 -08:00
Linus Torvalds	8a61716ff2	Merge tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Two bug fixes for old issues, both marked for stable" * tag 'ceph-for-5.0-rc8' of git://github.com/ceph/ceph-client: ceph: avoid repeatedly adding inode to mdsc->snap_flush_list libceph: handle an empty authorize reply	2019-02-21 09:43:37 -08:00
Björn Töpel	11fe9262ed	Revert "xsk: simplify AF_XDP socket teardown" This reverts commit `e2ce367488`. It turns out that the sock destructor xsk_destruct was needed after all. The cleanup simplification broke the skb transmit cleanup path, due to that the umem was prematurely destroyed. The umem cannot be destroyed until all outstanding skbs are freed, which means that we cannot remove the umem until the sk_destruct has been called. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-21 16:32:25 +01:00
J. Bruce Fields	b7e5034cbe	svcrpc: fix UDP on servers with lots of threads James Pearson found that an NFS server stopped responding to UDP requests if started with more than 1017 threads. sv_max_mesg is about 2^20, so that is probably where the calculation performed by svc_sock_setbufsize(svsk->sk_sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg, (serv->sv_nrthreads+3) * serv->sv_max_mesg); starts to overflow an int. Reported-by: James Pearson <jcpearson@gmail.com> Tested-by: James Pearson <jcpearson@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2019-02-21 10:17:36 -05:00
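The arithmetic behind the svcrpc fix can be checked with a short sketch. It assumes a 32-bit `int` and takes sv_max_mesg as exactly 2^20, whereas the commit only says "about 2^20"; the `multiplier` parameter is a hypothetical stand-in for any further scaling applied to the size. The sketch does not try to reproduce the exact 1017-thread threshold, which depends on the real value of sv_max_mesg.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the commit message only says "about 2^20". */
#define ASSUMED_SV_MAX_MESG (1u << 20)

/* Compute (nrthreads + 3) * max_mesg in 64 bits so the product cannot wrap. */
static int64_t wide_bufsize(unsigned int nrthreads, unsigned int max_mesg)
{
    return ((int64_t)nrthreads + 3) * (int64_t)max_mesg;
}

/* True if size * multiplier can still be stored in a plain int. */
static bool fits_in_int(int64_t size, int multiplier)
{
    return size * multiplier <= INT_MAX;
}
```

Under these assumptions, 1017 threads give 1020 * 2^20 = 1,069,547,520, which fits in an `int` on its own. Any additional doubling brings the limit down to roughly a thousand threads, which is the range the report describes.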
Cong Wang	51dcb69de6	net_sched: fix a memory leak in cls_tcindex (cherry picked from commit `033b228e7f`) When tcindex_destroy() destroys all the filter results in the perfect hash table, it invokes the walker to delete each of them. However, results with class==0 are skipped in either tcindex_walk() or tcindex_delete(), which causes a memory leak reported by kmemleak. This patch fixes it by skipping the walker and directly deleting these filter results so we don't miss any filter result. As a result of this change, we have to initialize exts->net properly in tcindex_alloc_perfect_hash(). For net-next, we need to consider whether we should initialize ->net in tcf_exts_init() instead, before that just directly test CONFIG_NET_CLS_ACT=y. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-20 20:11:10 -08:00
Cong Wang	3d210534cc	net_sched: fix a race condition in tcindex_destroy() (cherry picked from commit `8015d93ebd`) tcindex_destroy() invokes tcindex_destroy_element() via a walker to delete each filter result in its perfect hash table, and tcindex_destroy_element() calls tcindex_delete() which schedules tcf RCU works to do the final deletion work. Unfortunately this races with the RCU callback __tcindex_destroy(), which could lead to use-after-free as reported by Adrian. Fix this by migrating this RCU callback to tcf RCU work too, as that workqueue is ordered, we will not have use-after-free. Note, we don't need to hold netns refcnt because we don't call tcf_exts_destroy() here. Fixes: `27ce4f05e2` ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <bugs@abtelecom.ro> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-20 20:11:10 -08:00
Al Viro	ae3b564179	missing barriers in some of unix_sock ->addr and ->path accesses Several u->addr and u->path users are not holding any locks in common with unix_bind(). unix_state_lock() is useless for those purposes. u->addr is assign-once and (u->addr) is fully set up by the time we set u->addr (all under unix_table_lock). u->path is also set in the same critical area, also before setting u->addr, and any unix_sock with ->path filled will have non-NULL ->addr. So setting ->addr with smp_store_release() is all we need for those "lockless" users - just have them fetch ->addr with smp_load_acquire() and don't even bother looking at ->path if they see NULL ->addr. Users of ->addr and ->path fall into several classes now: 1) ones that do smp_load_acquire(u->addr) and access (u->addr) and u->path only if smp_load_acquire() has returned non-NULL. 2) places holding unix_table_lock. These are guaranteed that (u->addr) is seen fully initialized. If unix_sock is in one of the "bound" chains, so's ->path. 3) unix_sock_destructor() using ->addr is safe. All places that set u->addr are guaranteed to have seen all stores (u->addr) while holding a reference to u and unix_sock_destructor() is called when (atomic) refcount hits zero. 4) unix_release_sock() using ->path is safe. unix_bind() is serialized wrt unix_release() (normally - by struct file refcount), and for the instances that had ->path set by unix_bind() unix_release_sock() comes from unix_release(), so they are fine. Instances that had it set in unix_stream_connect() either end up attached to a socket (in unix_accept()), in which case the call chain to unix_release_sock() and serialization are the same as in the previous case, or they never get accept'ed and unix_release_sock() is called when the listener is shut down and its queue gets purged. 
In that case the listener's queue lock provides the barriers needed - unix_stream_connect() shoves our unix_sock into listener's queue under that lock right after having set ->path and eventual unix_release_sock() caller picks them from that queue under the same lock right before calling unix_release_sock(). 5) unix_find_other() use of ->path is pointless, but safe - it happens with successful lookup by (abstract) name, so ->path.dentry is guaranteed to be NULL there. earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-20 20:06:28 -08:00
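The publication pattern described for class 1 above (store ->path, then publish ->addr with a release store; lockless readers fetch ->addr with an acquire load and look at ->path only if it is non-NULL) can be sketched in userspace with C11 atomics. This is an analogy, not the kernel code: `struct sock_like`, `struct addr`, `publish()` and `lockless_path()` are hypothetical names, and `memory_order_release` / `memory_order_acquire` stand in for smp_store_release() / smp_load_acquire().

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical address object; fully initialized before it is published. */
struct addr {
    int len;
    char name[16];
};

/* Hypothetical socket-like object with an assign-once addr pointer. */
struct sock_like {
    const char *path;            /* plain field, written before addr is set */
    _Atomic(struct addr *) addr; /* NULL until the object is bound */
};

/* Writer: set path first, then publish addr with release semantics. */
static void publish(struct sock_like *u, struct addr *a, const char *path)
{
    u->path = path;
    atomic_store_explicit(&u->addr, a, memory_order_release);
}

/* Lockless reader: touch path only after an acquire load saw non-NULL addr. */
static const char *lockless_path(struct sock_like *u)
{
    struct addr *a = atomic_load_explicit(&u->addr, memory_order_acquire);

    return a ? u->path : NULL;
}
```

The release store orders the earlier write to `path` before the pointer becomes visible, and the acquire load guarantees that a reader who sees the pointer also sees that write. That is why such readers need no lock in common with the writer.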
Trond Myklebust	6f903b111e	SUNRPC: Remove the redundant 'zerocopy' argument to xs_sendpages() Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	c87dc4c73b	SUNRPC: Further cleanups of xs_sendpages() Now that we send the pages using a struct msghdr, instead of using sendpage(), we no longer need to 'prime the socket' with an address for unconnected UDP messages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	0472e47660	SUNRPC: Convert socket page send code to use iov_iter() Simplify the page send code using iov_iter and bvecs. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	e791f8e938	SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec() Prepare the socket transmission code to use iov_iter. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	5f52a9d429	SUNRPC: Initiate a connection close on an ESHUTDOWN error in stream receive If the client stream receive code receives an ESHUTDOWN error either because the server closed the connection, or because it sent a callback which cannot be processed, then we should shut down the connection. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	727fcc64a0	SUNRPC: Don't suppress socket errors when a message read completes If the message read completes, but the socket returned an error condition, we should ensure to propagate that error. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	e92053a52e	SUNRPC: Handle zero length fragments correctly A zero length fragment is really a bug, but let's ensure we don't go nuts when one turns up. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:35:58 -05:00
Trond Myklebust	ae05355151	SUNRPC: Don't reset the stream record info when the receive worker is running To ensure that the receive worker has exclusive access to the stream record info, we must not reset the contents other than when holding the transport->recv_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:33:55 -05:00
NeilBrown	e3735c8998	SUNRPC: remove pointless test in unx_match() As reported by Dan Carpenter, this test for acred->cred being set is inconsistent with the dereference of the pointer a few lines earlier. An 'auth_cred' always has ->cred set - every place that creates one initializes this field, often as the first thing done. So remove this test. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:33:55 -05:00
Trond Myklebust	b9779a54bb	SUNRPC: Ensure rq_bytes_sent is reset before request transmission When we resend a request, ensure that the 'rq_bytes_sent' is reset to zero. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:33:55 -05:00
Trond Myklebust	0ffe86f480	SUNRPC: Use poll() to fix up the socket requeue races Because we clear XPRT_SOCK_DATA_READY before reading, we can end up with a situation where new data arrives, causing xs_data_ready() to queue up a second receive worker job for the same socket, which then immediately gets stuck waiting on the transport receive mutex. The fix is to only clear XPRT_SOCK_DATA_READY once we're done reading, and then to use poll() to check if we might need to queue up a new job in order to deal with any new data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 17:33:54 -05:00
Trond Myklebust	a1231fda7e	SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs Set memalloc_nofs_save() on all the rpciod/xprtiod jobs so that we ensure memory allocations for asynchronous rpc calls don't ever end up recursing back to the NFS layer for memory reclaim. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-02-20 15:14:21 -05:00
Willem de Bruijn	418e897e07	gso: validate gso_type on ipip style tunnels Commit `121d57af30` ("gso: validate gso_type in GSO handlers") added gso_type validation to existing gso_segment callback functions, to filter out illegal and potentially dangerous SKB_GSO_DODGY packets. Convert tunnels that now call inet_gso_segment and ipv6_gso_segment directly to have their own callbacks and extend validation to these. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-20 11:24:27 -08:00
Russell King	9c2054a5cf	net: dsa: fix unintended change of bridge interface STP state When a DSA port is added to a bridge and brought up, the resulting STP state programmed into the hardware depends on the order that these operations are performed. However, the Linux bridge code believes that the port is in disabled mode. If the DSA port is first added to a bridge and then brought up, it will be in blocking mode. If it is brought up and then added to the bridge, it will be in disabled mode. This difference is caused by DSA always setting the STP mode in dsa_port_enable() whether or not this port is part of a bridge. Since bridge always sets the STP state when the port is added, brought up or taken down, it is unnecessary for us to manipulate the STP state. Apparently, this code was copied from Rocker, and the very next day a similar fix for Rocker was merged but was not propagated to DSA. See `e47172ab7e` ("rocker: put port in FORWADING state after leaving bridge") Fixes: `b73adef677` ("net: dsa: integrate with SWITCHDEV for HW bridging") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-20 11:08:26 -08:00
David S. Miller	375ca548f7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-20 00:34:07 -08:00
Linus Torvalds	40e196a906	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix suspend and resume in mt76x0u USB driver, from Stanislaw Gruszka. 2) Missing memory barriers in xsk, from Magnus Karlsson. 3) rhashtable fixes in mac80211 from Herbert Xu. 4) 32-bit MIPS eBPF JIT fixes from Paul Burton. 5) Fix for_each_netdev_feature() on big endian, from Hauke Mehrtens. 6) GSO validation fixes from Willem de Bruijn. 7) Endianness fix for dwmac4 timestamp handling, from Alexandre Torgue. 8) More strict checks in tcp_v4_err(), from Eric Dumazet. 9) af_alg_release should NULL out the sk after the sock_put(), from Mao Wenan. 10) Missing unlock in mac80211 mesh error path, from Wei Yongjun. 11) Missing device put in hns driver, from Salil Mehta. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) sky2: Increase D3 delay again vhost: correctly check the return value of translate_desc() in log_used() net: netcp: Fix ethss driver probe issue net: hns: Fixes the missing put_device in positive leg for roce reset net: stmmac: Fix a race in EEE enable callback qed: Fix iWARP syn packet mac address validation. qed: Fix iWARP buffer size provided for syn packet processing. r8152: Add support for MAC address pass through on RTL8153-BD mac80211: mesh: fix missing unlock on error in table_path_del() net/mlx4_en: fix spelling mistake: "quiting" -> "quitting" net: crypto set sk to NULL when af_alg_release. net: Do not allocate page fragments that are not skb aligned mm: Use fixed constant in page_frag_alloc instead of size + 1 tcp: tcp_v4_err() should be more careful tcp: clear icsk_backoff in tcp_write_queue_purge() net: mv643xx_eth: disable clk on error path in mv643xx_eth_shared_probe() qmi_wwan: apply SET_DTR quirk to Sierra WP7607 net: stmmac: handle endianness in dwmac4_get_timestamp doc: Mention MSG_ZEROCOPY implementation for UDP mlxsw: __mlxsw_sp_port_headroom_set(): Fix a use of local variable ...	2019-02-19 16:13:19 -08:00
David S. Miller	d2cf821ff6	Merge branch 'ieee802154-for-davem-2019-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2019-02-19 An update from ieee802154 for net-next Another quite quiet cycle in the ieee802154 subsystem. Peter did a rework of the IP frag queue handling to make it use rbtree and get in line with the core IPv4 and IPv6 implementations in the kernel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-19 14:10:11 -08:00
YueHaibing	3b9c9f3b0b	net: rose: add missing dev_put() on error in rose_bind When the capable() check fails, dev_put() should be called before returning -EACCES. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-19 13:22:46 -08:00
Jesper Dangaard Brouer	74e31ca850	bpf: add skb->queue_mapping write access from tc clsact The skb->queue_mapping already have read access, via __sk_buff->queue_mapping. This patch allow BPF tc qdisc clsact write access to the queue_mapping via tc_cls_act_is_valid_access. Also handle that the value NO_QUEUE_MAPPING is not allowed. It is already possible to change this via TC filter action skbedit tc-skbedit(8). Due to the lack of TC examples, lets show one: # tc qdisc add dev ixgbe1 clsact # tc filter add dev ixgbe1 ingress matchall action skbedit queue_mapping 5 # tc filter list dev ixgbe1 ingress The most common mistake is that XPS (Transmit Packet Steering) takes precedence over setting skb->queue_mapping. XPS is configured per DEVICE via /sys/class/net/DEVICE/queues/tx-*/xps_cpus via a CPU hex mask. To disable set mask=00. The purpose of changing skb->queue_mapping is to influence the selection of the net_device "txq" (struct netdev_queue), which influence selection of the qdisc "root_lock" (via txq->qdisc->q.lock) and txq->_xmit_lock. When using the MQ qdisc the txq->qdisc points to different qdiscs and associated locks, and HARD_TX_LOCK (txq->_xmit_lock), allowing for CPU scalability. Due to lack of TC examples, lets show howto attach clsact BPF programs: # tc qdisc add dev ixgbe2 clsact # tc filter add dev ixgbe2 egress bpf da obj XXX_kern.o sec tc_qmap2cpu # tc filter list dev ixgbe2 egress Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-19 21:56:05 +01:00

61760 Commits