linux

Author	SHA1	Message	Date
Vlad Buslov	97394bef56	net: sched: change tcf block offload counter type to atomic_t As a preparation for running proto ops functions without rtnl lock, change offload counter type to atomic. This is necessary to allow updating the counter by multiple concurrent users when offloading filters to hardware from unlocked classifiers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-26 14:17:43 -07:00
Vlad Buslov	4f8116c850	net: sched: protect block offload-related fields with rw_semaphore In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock' rwsem and use it to protect flow_block->cb_list and related counters from concurrent modification. The lock is taken in read mode for read-only traversal of cb_list in tc_setup_cb_call() and write mode in all other cases. This approach ensures that: - cb_list is not changed concurrently while filters are being offloaded on block. - block->nooffloaddevcnt is checked while holding the lock in read mode, but is only changed by bind/unbind code when holding the cb_lock in write mode to prevent concurrent modification. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-26 14:17:43 -07:00
Chuck Lever	98ef77d1aa	xprtrdma: Send Queue size grows after a reconnect Eli Dorfman reports that after a series of idle disconnects, an RPC/RDMA transport becomes unusable (rdma_create_qp returns -ENOMEM). Problem was tracked down to increasing Send Queue size after each reconnect. The rdma_create_qp() API does not promise to leave its @qp_init_attr parameter unaltered. In fact, some drivers do modify one or more of its fields. Thus our calls to rdma_create_qp must use a fresh copy of ib_qp_init_attr each time. This fix is appropriate for kernels dating back to late 2007, though it will have to be adapted, as the connect code has changed over the years. Reported-by: Eli Dorfman <eli@vastdata.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-26 15:45:38 -04:00
Chuck Lever	f9e1afe0fa	xprtrdma: Clear xprt->reestablish_timeout on close Ensure that the re-establishment delay does not grow exponentially on each good reconnect. This probably should have been part of commit `675dd90ad0` ("xprtrdma: Modernize ops->connect"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-26 15:34:59 -04:00
Trond Myklebust	c82e5472c9	SUNRPC: Handle connection breakages correctly in call_status() If the connection breaks while we're waiting for a reply from the server, then we want to immediately try to reconnect. Fixes: `ec6017d903` ("SUNRPC fix regression in umount of a secure mount") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-08-26 15:31:29 -04:00
Trond Myklebust	d5711920ec	Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" This reverts commit `a79f194aa4`. The mechanism for aborting I/O is racy, since we are not guaranteed that the request is asleep while we're changing both task->tk_status and task->tk_action. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1	2019-08-26 15:31:29 -04:00
Trond Myklebust	80f455da6c	SUNRPC: Handle EADDRINUSE and ENOBUFS correctly If a connect or bind attempt returns EADDRINUSE, that means we want to retry with a different port. It is not a fatal connection error. Similarly, ENOBUFS is not fatal, but just indicates a memory allocation issue. Retry after a short delay. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2019-08-26 15:31:29 -04:00
Trond Myklebust	bd736ed3e2	SUNRPC: Don't handle errors if the bind/connect succeeded Don't handle errors in call_bind_status()/call_connect_status() if it turns out that a previous call caused it to succeed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v5.1+	2019-08-26 15:31:29 -04:00
Chuck Lever	ee2f412ece	xprtrdma: Recycle MRs after disconnect The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a bit too optimistic. MRs left over after a reconnect still need to be recycled, not added back to the free list, since they could be in flight or actually fully registered. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-26 15:22:31 -04:00
Michael Braun	65af4a1074	netfilter: nfnetlink_log: add support for VLAN information Currently, no VLAN information (e.g. when used with a VLAN-aware bridge) is passed to userspace; HWHEADER will contain an 08 00 (IP) suffix even for tagged IP packets. Therefore, add an extra netlink attribute that passes the VLAN information to userspace similarly to `15824ab29f` for nfqueue. Signed-off-by: Michael Braun <michael-dev@fami-braun.de> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-26 11:06:07 +02:00
Ander Juaristi	63d10e12b0	netfilter: nft_meta: support for time matching This patch introduces meta matches in the kernel for time (a UNIX timestamp), day (a day of week, represented as an integer between 0-6), and hour (an hour in the current day, or: number of seconds since midnight). All values are taken as unsigned 64-bit integers. The 'time' keyword is internally converted to nanoseconds by nft in userspace, and hence the timestamp is taken in nanoseconds as well. Signed-off-by: Ander Juaristi <a@juaristi.eus> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-26 11:03:14 +02:00
Ander Juaristi	a1b840adaf	netfilter: nf_tables: Introduce new 64-bit helper register functions Introduce new helper functions to load/store 64-bit values onto/from registers: - nft_reg_store64 - nft_reg_load64 This commit also re-orders all these helpers from smallest to largest target bit size. Signed-off-by: Ander Juaristi <a@juaristi.eus> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2019-08-26 11:01:00 +02:00
Yi-Hung Wei	7177895154	openvswitch: Fix conntrack cache with timeout This patch addresses a conntrack cache issue with timeout policy. Currently, we do not check if the timeout extension is set properly in the cached conntrack entry. Thus, after packet recirculate from conntrack action, the timeout policy is not applied properly. This patch fixes the aforementioned issue. Fixes: `06bd2bdf19` ("openvswitch: Add timeout support to ct action") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-25 14:48:43 -07:00
Alexey Kodanev	803f3e22ae	ipv4: mpls: fix mpls_xmit for iptunnel When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the same for rt->rt_gw_family. Therefore, when rt->rt_gw_family is checked in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup doesn't work anymore. This issue was found with LTP mpls03 tests. Fixes: `1550c17193` ("ipv4: Prepare rtable for IPv6 gateway") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-25 14:34:08 -07:00
Zhu Yanjun	e0e6d06282	net: rds: add service level support in rds-info From IB specification 7.6.5 SERVICE LEVEL, Service Level (SL) is used to identify different flows within an IBA subnet. It is carried in the local route header of the packet. Before this commit, run "rds-info -I". The outputs are as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " After this commit, the output is as below: " RDS IB Connections: LocalAddr RemoteAddr Tos SL LocalDev RemoteDev 192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9 " The commit `fe3475af3b` ("net: rds: add per rds connection cache statistics") adds cache_allocs in struct rds_info_rdma_connection as below: struct rds_info_rdma_connection { ... __u32 rdma_mr_max; __u32 rdma_mr_size; __u8 tos; __u32 cache_allocs; }; The peer struct in rds-tools of struct rds_info_rdma_connection is as below: struct rds_info_rdma_connection { ... uint32_t rdma_mr_max; uint32_t rdma_mr_size; uint8_t tos; uint8_t sl; uint32_t cache_allocs; }; The difference between userspace and kernel is the member variable sl. In the kernel struct, the member variable sl is missing. This will introduce risks. So it is necessary to use this commit to avoid this risk. Fixes: `fe3475af3b` ("net: rds: add per rds connection cache statistics") CC: Joe Jin <joe.jin@oracle.com> CC: JUNXIAO_BI <junxiao.bi@oracle.com> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 16:55:25 -07:00
John Fastabend	e93fb3e952	net: route dump netlink NLM_F_MULTI flag missing An excerpt from netlink(7) man page, In multipart messages (multiple nlmsghdr headers with associated payload in one byte stream) the first and all following headers have the NLM_F_MULTI flag set, except for the last header which has the type NLMSG_DONE. but, after (`ee28906`) there is a missing NLM_F_MULTI flag in the middle of a FIB dump. The result is user space applications following above man page excerpt may get confused and may stop parsing msg believing something went wrong. In the golang netlink lib [0] the library logic stops parsing believing the message is not a multipart message. Found this running Cilium[1] against net-next while adding a feature to auto-detect routes. I noticed with multiple route tables we no longer could detect the default routes on net tree kernels because the library logic was not returning them. Fix this by handling the fib_dump_info_fnhe() case the same way the fib_dump_info() handles it by passing the flags argument through the call chain and adding a flags argument to rt_fill_info(). Tested with Cilium stack and auto-detection of routes works again. Also annotated libs to dump netlink msgs and inspected NLM_F_MULTI and NLMSG_DONE flags look correct after this. Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same as is done for fib_dump_info() so this looks correct to me. [0] https://github.com/vishvananda/netlink/ [1] https://github.com/cilium/ Fixes: `ee28906fd7` ("ipv4: Dump route exceptions if requested") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 16:49:48 -07:00
zhanglin	b45ce32135	sock: fix potential memory leak in proto_register() If the number of registered protocols exceeds PROTO_INUSE_NR, prot will be added to proto_list, but there is no available bit left for prot in proto_inuse_idx. Changes since v2: * Propagate the error code properly Signed-off-by: zhanglin <zhang.lin16@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 16:33:14 -07:00
Markus Elfring	dd016aca28	net/core/skmsg: Delete an unnecessary check before the function call “consume_skb” The consume_skb() function also performs input parameter validation. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 16:24:53 -07:00
Hangbin Liu	c3b4c3a47e	xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode In decode_session{4,6} there is a possibility that the skb dst dev is NULL, e.g. with tunnel collect_md mode, which will cause a kernel crash. Here is what the code path looks like, for GRE: - ip6gre_tunnel_xmit - ip6gre_xmit_ipv6 - __gre6_xmit - ip6_tnl_xmit - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE - icmpv6_send - icmpv6_route_lookup - xfrm_decode_session_reverse - decode_session4 - oif = skb_dst(skb)->dev->ifindex; <-- here - decode_session6 - oif = skb_dst(skb)->dev->ifindex; <-- here The reason is that __metadata_dst_init() initializes dst->dev to NULL by default. We could not fix it in __metadata_dst_init() as there is no dev supplied. On the other hand, the skb_dst(skb)->dev is actually not needed as we called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif; So adding a dst dev check here should be clean and safe. v4: No changes. v3: No changes. v2: fix the issue in decode_session{4,6} instead of updating shared dst dev in {ip_md, ip6}_tunnel_xmit. Fixes: `8d79266bc4` ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 14:49:35 -07:00
Hangbin Liu	e2c6939341	ipv4/icmp: fix rt dst dev null pointer dereference In __icmp_send() there is a possibility that the rt->dst.dev is NULL, e.g. with tunnel collect_md mode, which will cause a kernel crash. Here is what the code path looks like, for GRE: - ip6gre_tunnel_xmit - ip6gre_xmit_ipv4 - __gre6_xmit - ip6_tnl_xmit - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE - icmp_send - net = dev_net(rt->dst.dev); <-- here The reason is that __metadata_dst_init() initializes dst->dev to NULL by default. We could not fix it in __metadata_dst_init() as there is no dev supplied. On the other hand, the reason we need rt->dst.dev is to get the net. So we can just try to get it from skb->dev when rt->dst.dev is NULL. v4: Julian Anastasov pointed out that skb->dev could also be NULL. We'd better still use dst.dev and do a check to avoid a crash. v3: No changes. v2: fix the issue in __icmp_send() instead of updating shared dst dev in {ip_md, ip6}_tunnel_xmit. Fixes: `c8b34e680a` ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 14:49:35 -07:00
Yi-Hung Wei	12c6bc38f9	openvswitch: Fix log message in ovs conntrack Fixes: `06bd2bdf19` ("openvswitch: Add timeout support to ct action") Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 14:18:59 -07:00
David S. Miller	12e2e15d83	Merge branch 'ieee802154-for-davem-2019-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== pull-request: ieee802154 for net 2019-08-24 An update from ieee802154 for your net tree. Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for ieee802154 and Colin Ian King cleaned up a redundant variable assignment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-24 13:46:57 -07:00
David S. Miller	211c462452	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-08-24 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei. 2) Fix a use-after-free in prog symbol exposure, from Daniel. 3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya. 4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan. 5) Fix a potential use-after-free on flow dissector detach, from Jakub. 6) Fix bpftool to close prog fd after showing metadata, from Quentin. 7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-23 17:34:11 -07:00
Ilya Leoshkevich	2c238177bd	bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 test_select_reuseport fails on s390 due to verifier rejecting test_select_reuseport_kern.o with the following message: ; data_check.eth_protocol = reuse_md->eth_protocol; 18: (69) r1 = (u16 )(r6 +22) invalid bpf_context access off=22 size=2 This is because on big-endian machines casts from __u32 to __u16 are generated by referencing the respective variable as __u16 with an offset of 2 (as opposed to 0 on little-endian machines). The verifier already has all the infrastructure in place to allow such accesses, it's just that they are not explicitly enabled for eth_protocol field. Enable them for eth_protocol field by using bpf_ctx_range instead of offsetof. Ditto for ip_protocol, bind_inany and len, since they already allow narrowing, and the same problem can arise when working with them. Fixes: `2dbb9b9e6d` ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-24 01:25:41 +02:00
Jakub Sitnicki	db38de3968	flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to free the program once a grace period has elapsed. The callback can run together with new RCU readers that started after the last grace period. New RCU readers can potentially see the "old" to-be-freed or already-freed pointer to the program object before the RCU update-side NULLs it. Reorder the operations so that the RCU update-side resets the protected pointer before the end of the grace period after which the program will be freed. Fixes: `d58e468b11` ("flow_dissector: implements flow dissector BPF hook") Reported-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-24 01:15:34 +02:00
Ido Schimmel	bd1200b795	drop_monitor: Make timestamps y2038 safe Timestamps are currently communicated to user space as 'struct timespec', which is not considered y2038 safe since it uses a 32-bit signed value for seconds. Fix this while the API is still not part of any official kernel release by using 64-bit nanoseconds timestamps instead. Fixes: `ca30707dee` ("drop_monitor: Add packet alert mode") Fixes: `5e58109b1e` ("drop_monitor: Add support for packet alert mode for hardware drops") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-23 14:58:07 -07:00
Dag Moxnes	bf1867db9b	net/rds: Whitelist rdma_cookie and rx_tstamp for usercopy Add the RDMA cookie and RX timestamp to the usercopy whitelist. After the introduction of hardened usercopy whitelisting (https://lwn.net/Articles/727322/), a warning is displayed when the RDMA cookie or RX timestamp is copied to userspace: kernel: WARNING: CPU: 3 PID: 5750 at mm/usercopy.c:81 usercopy_warn+0x8e/0xa6 [...] kernel: Call Trace: kernel: __check_heap_object+0xb8/0x11b kernel: __check_object_size+0xe3/0x1bc kernel: put_cmsg+0x95/0x115 kernel: rds_recvmsg+0x43d/0x620 [rds] kernel: sock_recvmsg+0x43/0x4a kernel: ___sys_recvmsg+0xda/0x1e6 kernel: ? __handle_mm_fault+0xcae/0xf79 kernel: __sys_recvmsg+0x51/0x8a kernel: SyS_recvmsg+0x12/0x1c kernel: do_syscall_64+0x79/0x1ae When the whitelisting feature was introduced, the memory for the RDMA cookie and RX timestamp in RDS was not added to the whitelist, causing the warning above. Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Tested-by: Jenny <jenny.x.xu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-23 14:55:52 -07:00
Sabrina Dubroca	db0b99f59a	ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails, ignoring the specific error value. This results in addrconf_add_dev returning ENOBUFS in all cases, which is unfortunate in cases such as: # ip link add dummyX type dummy # ip link set dummyX mtu 1200 up # ip addr add 2000::/64 dev dummyX RTNETLINK answers: No buffer space available Commit `a317a2f19d` ("ipv6: fail early when creating netdev named all or default") introduced error returns in ipv6_add_dev. Before that, that function would simply return NULL for all failures. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-23 14:53:06 -07:00
Xin Long	c7a42eb492	net: ipv6: fix listify ip6_rcv_finish in case of forwarding We need a similar fix for ipv6 as Commit `0761680d52` ("net: ipv4: fix listify ip_rcv_finish in case of forwarding") does for ipv4. This issue can be reproduced by syzbot since Commit `323ebb61e3` ("net: use listified RX for handling GRO_NORMAL skbs") on net-next. The call trace was: kernel BUG at include/linux/skbuff.h:2225! RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline] RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902 Call Trace: sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202 sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385 sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80 sctp_rcv+0x2807/0x3590 net/sctp/input.c:256 sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049 ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447 dst_input include/net/dst.h:442 [inline] ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84 ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline] ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282 ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316 __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline] __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097 __netif_receive_skb_list net/core/dev.c:5149 [inline] netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244 gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757 gro_normal_list net/core/dev.c:5755 [inline] gro_normal_one net/core/dev.c:5769 [inline] napi_frags_finish net/core/dev.c:5782 [inline] napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855 tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020 Fixes: `d8269e2cbf` ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()") Fixes: `323ebb61e3` ("net: use listified RX for handling GRO_NORMAL skbs") Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-23 14:42:14 -07:00
Sven Eckelmann	0ff0f15a32	batman-adv: Only read OGM2 tvlv_len after buffer len check Multiple batadv_ogm2_packet can be stored in an skbuff. The function batadv_v_ogm_send_to_if() uses batadv_v_ogm_aggr_packet() to check if there is another additional batadv_ogm2_packet in the skb or not before it continues processing the packet. The length for such an OGM2 is BATADV_OGM2_HLEN + batadv_ogm2_packet->tvlv_len. The check must first check that at least BATADV_OGM2_HLEN bytes are available before it accesses tvlv_len (which is part of the header). Otherwise it might try to read outside of the currently available skbuff to get the content of tvlv_len. Fixes: `9323158ef9` ("batman-adv: OGMv2 - implement originators logic") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2019-08-23 18:20:31 +02:00
Sven Eckelmann	a15d56a607	batman-adv: Only read OGM tvlv_len after buffer len check Multiple batadv_ogm_packet can be stored in an skbuff. The functions batadv_iv_ogm_send_to_if()/batadv_iv_ogm_receive() use batadv_iv_ogm_aggr_packet() to check if there is another additional batadv_ogm_packet in the skb or not before they continue processing the packet. The length for such an OGM is BATADV_OGM_HLEN + batadv_ogm_packet->tvlv_len. The check must first check that at least BATADV_OGM_HLEN bytes are available before it accesses tvlv_len (which is part of the header). Otherwise it might try to read outside of the currently available skbuff to get the content of tvlv_len. Fixes: `ef26157747` ("batman-adv: tvlv - basic infrastructure") Reported-by: syzbot+355cab184197dbbfa384@syzkaller.appspotmail.com Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>	2019-08-23 18:20:17 +02:00
Linus Torvalds	4e56394490	Merge tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Three important fixes tagged for stable (an indefinite hang, a crash on an assert and a NULL pointer dereference) plus a small series from Luis fixing instances of vfree() under spinlock" * tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client: libceph: fix PG split vs OSD (re)connect race ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply ceph: clear page dirty before invalidate page ceph: fix buffer free while holding i_ceph_lock in fill_inode() ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr() libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer	2019-08-23 09:19:38 -07:00
Ben Wei	6d24e14140	net/ncsi: update response packet length for GCPS/GNS/GNPTS commands Update response packet length for the following commands per NC-SI spec - Get Controller Packet Statistics - Get NC-SI Statistics - Get NC-SI Pass-through Statistics command Signed-off-by: Ben Wei <benwei@fb.com> Reviewed-by: Justin Lee <justin.lee1@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-22 19:28:04 -07:00
Justin.Lee1@Dell.com	f6edbf2d61	net/ncsi: Fix the payload copying for the request coming from Netlink The request coming from Netlink should use the OEM generic handler. The standard command handler expects payload in bytes/words/dwords but the actual payload is stored in data if the request is coming from Netlink. Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Vijay Khemka <vijaykhemka@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-22 19:27:02 -07:00
Matthew Wang	7010998c6c	nl80211: add NL80211_CMD_UPDATE_FT_IES to supported commands Add NL80211_CMD_UPDATE_FT_IES to supported commands. In mac80211 drivers, this can be implemented via existing NL80211_CMD_AUTHENTICATE and NL80211_ATTR_IE, but non-mac80211 drivers have a separate command for this. A driver supports FT if it either is mac80211 or supports this command. Signed-off-by: Matthew Wang <matthewmwang@chromium.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Link: https://lore.kernel.org/r/20190822174806.2954-1-matthewmwang@chromium.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-08-22 21:59:02 +02:00
Colin Ian King	b26af93044	mac80211: minstrel_ht: fix infinite loop because supported is not being shifted Currently the for-loop will spin forever if variable supported is non-zero because supported is never changed. Fix this by adding in the missing right shift of supported. Addresses-Coverity: ("Infinite loop") Fixes: `48cb39522a` ("mac80211: minstrel_ht: improve rate probing for devices with static fallback") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20190822122034.28664-1-colin.king@canonical.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-08-22 21:58:18 +02:00
Colin Ian King	c76c992525	nexthops: remove redundant assignment to variable err Variable err is initialized to a value that is never read and it is re-assigned later. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused Value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-22 12:14:05 -07:00
Ingo Molnar	6c06b66e95	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU and LKMM changes from Paul E. McKenney: - A few more RCU flavor consolidation cleanups. - Miscellaneous fixes. - Updates to RCU's list-traversal macros improving lockdep usability. - Torture-test updates. - Forward-progress improvements for no-CBs CPUs: Avoid ignoring incoming callbacks during grace-period waits. - Forward-progress improvements for no-CBs CPUs: Use ->cblist structure to take advantage of others' grace periods. - Also added a small commit that avoids needlessly inflicting scheduler-clock ticks on callback-offloaded CPUs. - Forward-progress improvements for no-CBs CPUs: Reduce contention on ->nocb_lock guarding ->cblist. - Forward-progress improvements for no-CBs CPUs: Add ->nocb_bypass list to further reduce contention on ->nocb_lock guarding ->cblist. - LKMM updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-08-22 20:52:04 +02:00
Ilya Dryomov	a561372405	libceph: fix PG split vs OSD (re)connect race We can't rely on ->peer_features in calc_target() because it may be called both when the OSD session is established and open and when it's not. ->peer_features is not valid unless the OSD session is open. If this happens on a PG split (pg_num increase), that could mean we don't resend a request that should have been resent, hanging the client indefinitely. In userspace this was fixed by looking at require_osd_release and get_xinfo[osd].features fields of the osdmap. However these fields belong to the OSD section of the osdmap, which the kernel doesn't decode (only the client section is decoded). Instead, let's drop this feature check. It effectively checks for luminous, so only pre-luminous OSDs would be affected in that on a PG split the kernel might resend a request that should not have been resent. Duplicates can occur in other scenarios, so both sides should already be prepared for them: see dup/replay logic on the OSD side and retry_attempt check on the client side. Cc: stable@vger.kernel.org Fixes: `7de030d6b1` ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT") Link: https://tracker.ceph.com/issues/41162 Reported-by: Jerry Lee <leisurelysw24@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Jerry Lee <leisurelysw24@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>	2019-08-22 10:47:41 +02:00
Li RongQing	0f404bbdaf	net: fix icmp_socket_deliver argument 2 input It expects an unsigned int, but got a __be32. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-21 20:43:22 -07:00
Hangbin Liu	f17f7648a4	ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set In commit `93a714d6b5` ("multicast: Extend ip address command to enable multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN to let users add a multicast address on an ethernet interface. This works for IPv4, but not for IPv6. See the inet6_addr_add code. static int inet6_addr_add() { ... if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...) } ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr if (!IS_ERR(ifp)) { ... } else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) { ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...) } } But ipv6_add_addr() checks the address type and rejects multicast addresses directly. So this feature has never worked for IPv6. We should not remove the multicast address check totally in ipv6_add_addr(), but could accept a multicast address only when the IFA_F_MCAUTOJOIN flag is supplied. v2: update commit description Fixes: `93a714d6b5` ("multicast: Extend ip address command to enable multicast group join/leave on") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-21 20:39:29 -07:00
David S. Miller	6e2866a9df	Merge tag 'batadv-net-for-davem-20190821' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-21 13:49:20 -07:00
Chuck Lever	435eba4ae0	xprtrdma: Optimize rpcrdma_post_recvs() Micro-optimization: In rpcrdma_post_recvs, since commit `e340c2d6ef` ("xprtrdma: Reduce the doorbell rate (Receive)"), the common case is to return without doing anything. Found with perf. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 15:04:59 -04:00
Chuck Lever	1738de336e	xprtrdma: Inline XDR chunk encoder functions Micro-optimization: Save the cost of three function calls during transport header encoding. These were "noinline" before to generate more meaningful call stacks during debugging, but this code is now pretty stable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 14:48:41 -04:00
Chuck Lever	17d47f93bc	xprtrdma: Fix bc_max_slots return value For the moment the returned value just happens to be correct because the current backchannel server implementation does not vary the number of credits it offers. The spec does permit this value to change during the lifetime of a connection, however. The actual maximum is fixed for all RPC/RDMA transports, because each transport instance has to pre-allocate the resources for processing BC requests. That's the value that should be returned. Fixes: `7402a4fedc` ("SUNRPC: Fix up backchannel slot table ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 14:35:49 -04:00
Jason Gunthorpe	868df536f5	Merge branch 'odp_fixes' into rdma.git for-next Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:10:36 -03:00
Chuck Lever	2a7f77c7be	xprtrdma: Clean up xprt_rdma_set_connect_timeout() Clean up: The function name should match the documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 11:53:42 -04:00
Chuck Lever	b0b227f071	xprtrdma: Use an llist to manage free rpcrdma_reps rpcrdma_rep objects are removed from their free list by only a single thread: the Receive completion handler. Thus that free list can be converted to an llist, where a single-threaded consumer and a multi-threaded producer (rpcrdma_buffer_put) can both access the llist without the need for any serialization. This eliminates spin lock contention between the Receive completion handler and rpcrdma_buffer_get, and makes the rep consumer wait-free. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 11:45:27 -04:00
Chuck Lever	4d6b8890dd	xprtrdma: Remove rpcrdma_buffer::rb_mrlock Clean up: Now that the free list is used sparingly, get rid of the separate spin lock protecting it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 11:40:00 -04:00
Chuck Lever	6dc6ec9e04	xprtrdma: Cache free MRs in each rpcrdma_req Instead of a globally-contended MR free list, cache MRs in each rpcrdma_req as they are released. This means acquiring and releasing an MR will be lock-free in the common case, even outside the transport send lock. The original idea of per-rpcrdma_req MR free lists was suggested by Shirley Ma <shirley.ma@oracle.com> several years ago. I just now figured out how to make that idea work with on-demand MR allocation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2019-08-21 11:06:24 -04:00
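
Note on commit b0b227f071 above: the single-consumer, multi-producer free list it describes can be sketched in userspace with C11 atomics. This is only an illustration of the general technique, not the kernel's llist code or the xprtrdma change itself; `struct node`, `struct free_list`, `free_list_push` and `free_list_pop` are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an object kept on a free list. */
struct node {
	struct node *next;
	int id;
};

/* Hypothetical list head; not the kernel's struct llist_head. */
struct free_list {
	_Atomic(struct node *) first;
};

/* Producer side: any number of threads may push concurrently. */
static void free_list_push(struct free_list *fl, struct node *n)
{
	struct node *old;

	assert(n != NULL);
	old = atomic_load_explicit(&fl->first, memory_order_relaxed);
	do {
		/* A failed CAS refreshes 'old', so relink before retrying. */
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&fl->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/*
 * Consumer side: assumes exactly one thread ever pops. Under that
 * assumption the node read as 'old' cannot be removed by anyone else,
 * so reading old->next before the CAS is safe; concurrent pushes only
 * make the CAS fail and reload 'old'.
 */
static struct node *free_list_pop(struct free_list *fl)
{
	struct node *old = atomic_load_explicit(&fl->first, memory_order_acquire);

	while (old != NULL &&
	       !atomic_compare_exchange_weak_explicit(&fl->first, &old, old->next,
						      memory_order_acquire,
						      memory_order_acquire))
		;	/* 'old' was reloaded by the failed CAS; try again */
	return old;
}
```

Neither side takes a lock, which is the property the commit message relies on. Nodes come back in last-in, first-out order, and the pop path is only correct while the single-consumer assumption holds.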

61760 Commits