linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-19 01:23:20 +00:00

Author	SHA1	Message	Date
Stephen Suryaputra	e1ae5c2ea4	vrf: Increment Icmp6InMsgs on the original netdev Get the ingress interface and increment ICMP counters based on that instead of skb->dev when the the dev is a VRF device. This is a follow up on the following message: https://www.spinics.net/lists/netdev/msg560268.html v2: Avoid changing skb->dev since it has unintended effect for local delivery (David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-12 11:00:11 -07:00
Maxime Chevallier	f0d2ca1531	net: ethtool: Allow matching on vlan DEI bit Using ethtool, users can specify a classification action matching on the full vlan tag, which includes the DEI bit (also previously called CFI). However, when converting the ethool_flow_spec to a flow_rule, we use dissector keys to represent the matching patterns. Since the vlan dissector key doesn't include the DEI bit, this information was silently discarded when translating the ethtool flow spec in to a flow_rule. This commit adds the DEI bit into the vlan dissector key, and allows propagating the information to the driver when parsing the ethtool flow spec. Fixes: `eca4205f9e` ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Reported-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-12 10:09:56 -07:00
Masanari Iida	bb2e05e0c8	linux-next: DOC: RDS: Fix a typo in rds.txt This patch fixes a spelling typo in rds.txt Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-12 09:56:29 -07:00
Matteo Croce	ec66854c83	mpls: fix af_mpls dependencies for real Randy reported that selecting MPLS_ROUTING without PROC_FS breaks the build, because since commit `c1a9d65954` ("mpls: fix af_mpls dependencies"), MPLS_ROUTING selects PROC_SYSCTL, but Kconfig's select doesn't recursively handle dependencies. Change the select into a dependency. Fixes: `c1a9d65954` ("mpls: fix af_mpls dependencies") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-12 09:42:34 -07:00
David S. Miller	93c65f83f2	Merge branch 'vxlan-geneve-linear' Stefano Brivio says: ==================== Don't assume linear buffers in error handlers for VXLAN and GENEVE Guillaume noticed the same issue fixed by commit `26fc181e6c` ("fou, fou6: do not assume linear skbs") for fou and fou6 is also present in VXLAN and GENEVE error handlers: we can't assume linear buffers there, we need to use pskb_may_pull() instead. ==================== Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:08:05 -07:00
Stefano Brivio	eccc73a6b2	geneve: Don't assume linear buffers in error handler In commit `a07966447f` ("geneve: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits `26fc181e6c` ("fou, fou6: do not assume linear skbs") and `5355ed6388` ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the GENEVE header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault <gnault@redhat.com> Fixes: `a07966447f` ("geneve: ICMP error lookup handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:07:33 -07:00
Stefano Brivio	8399a6930d	vxlan: Don't assume linear buffers in error handler In commit `c3a43b9fec` ("vxlan: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits `26fc181e6c` ("fou, fou6: do not assume linear skbs") and `5355ed6388` ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the VXLAN header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault <gnault@redhat.com> Fixes: `c3a43b9fec` ("vxlan: ICMP error lookup handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:07:33 -07:00
Taehee Yoo	309b66970e	net: openvswitch: do not free vport if register_netdevice() is failed. In order to create an internal vport, internal_dev_create() is used and that calls register_netdevice() internally. If register_netdevice() fails, it calls dev->priv_destructor() to free private data of netdev. actually, a private data of this is a vport. Hence internal_dev_create() should not free and use a vport after failure of register_netdevice(). Test command ovs-dpctl add-dp bonding_masters Splat looks like: [ 1035.667767] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 1035.675958] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1035.676916] CPU: 1 PID: 1028 Comm: ovs-vswitchd Tainted: G B 5.2.0-rc3+ #240 [ 1035.676916] RIP: 0010:internal_dev_create+0x2e5/0x4e0 [openvswitch] [ 1035.676916] Code: 48 c1 ea 03 80 3c 02 00 0f 85 9f 01 00 00 4c 8b 23 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 60 05 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 86 01 00 00 49 8b bc 24 60 05 00 00 e8 e4 68 f4 [ 1035.713720] RSP: 0018:ffff88810dcb7578 EFLAGS: 00010206 [ 1035.713720] RAX: dffffc0000000000 RBX: ffff88810d13fe08 RCX: ffffffff84297704 [ 1035.713720] RDX: 00000000000000ac RSI: 0000000000000000 RDI: 0000000000000560 [ 1035.713720] RBP: 00000000ffffffef R08: fffffbfff0d3b881 R09: fffffbfff0d3b881 [ 1035.713720] R10: 0000000000000001 R11: fffffbfff0d3b880 R12: 0000000000000000 [ 1035.768776] R13: 0000607ee460b900 R14: ffff88810dcb7690 R15: ffff88810dcb7698 [ 1035.777709] FS: 00007f02095fc980(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 [ 1035.777709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1035.777709] CR2: 00007ffdf01d2f28 CR3: 0000000108258000 CR4: 00000000001006e0 [ 1035.777709] Call Trace: [ 1035.777709] ovs_vport_add+0x267/0x4f0 [openvswitch] [ 1035.777709] new_vport+0x15/0x1e0 [openvswitch] [ 1035.777709] ovs_vport_cmd_new+0x567/0xd10 [openvswitch] [ 1035.777709] ? ovs_dp_cmd_dump+0x490/0x490 [openvswitch] [ 1035.777709] ? __kmalloc+0x131/0x2e0 [ 1035.777709] ? genl_family_rcv_msg+0xa54/0x1030 [ 1035.777709] genl_family_rcv_msg+0x63a/0x1030 [ 1035.777709] ? genl_unregister_family+0x630/0x630 [ 1035.841681] ? debug_show_all_locks+0x2d0/0x2d0 [ ... ] Fixes: `cf124db566` ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 11:54:01 -07:00
Willem de Bruijn	522924b583	net: correct udp zerocopy refcnt also when zerocopy only on append The below patch fixes an incorrect zerocopy refcnt increment when appending with MSG_MORE to an existing zerocopy udp skb. send(.., MSG_ZEROCOPY \| MSG_MORE); // refcnt 1 send(.., MSG_ZEROCOPY \| MSG_MORE); // refcnt still 1 (bar frags) But it missed that zerocopy need not be passed at the first send. The right test whether the uarg is newly allocated and thus has extra refcnt 1 is not !skb, but !skb_zcopy. send(.., MSG_MORE); // <no uarg> send(.., MSG_ZEROCOPY); // refcnt 1 Fixes: `100f6d8e09` ("net: correct zerocopy refcnt with udp MSG_MORE") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 11:40:54 -07:00
John Hurley	dce5ccccd1	nfp: ensure skb network header is set for packet redirect Packets received at the NFP driver may be redirected to egress of another netdev (e.g. in the case of OvS internal ports). On the egress path, some processes, like TC egress hooks, may expect the network header offset field in the skb to be correctly set. If this is not the case there is potential for abnormal behaviour and even the triggering of BUG() calls. Set the skb network header field before the mac header pull when doing a packet redirect. Fixes: `27f54b5825` ("nfp: allow fallback packets from non-reprs") Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 20:08:09 -07:00
Yuchung Cheng	fcc2202a9d	tcp: fix undo spurious SYNACK in passive Fast Open Commit `794200d662` ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") may cause tcp_fastretrans_alert() to warn about pending retransmission in Open state. This is triggered when the Fast Open server both sends data and has spurious SYNACK retransmission during the handshake, and the data packets were lost or reordered. The root cause is a bit complicated: (1) Upon receiving SYN-data: a full socket is created with snd_una = ISN + 1 by tcp_create_openreq_child() (2) On SYNACK timeout the server/sender enters CA_Loss state. (3) Upon receiving the final ACK to complete the handshake, sender does not mark FLAG_SND_UNA_ADVANCED since (1) Sender then calls tcp_process_loss since state is CA_loss by (2) (4) tcp_process_loss() does not invoke undo operations but instead mark REXMIT_LOST to force retransmission (5) tcp_rcv_synrecv_state_fastopen() calls tcp_try_undo_loss(). It changes state to CA_Open but has positive tp->retrans_out (6) Next ACK triggers the WARN_ON in tcp_fastretrans_alert() The step that goes wrong is (4) where the undo operation should have been invoked because the ACK successfully acknowledged the SYN sequence. This fixes that by specifically checking undo when the SYN-ACK sequence is acknowledged. Then after tcp_process_loss() the state would be further adjusted based in tcp_fastretrans_alert() to avoid triggering the warning in (6). Fixes: `794200d662` ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 20:04:11 -07:00
Matteo Croce	c1a9d65954	mpls: fix af_mpls dependencies MPLS routing code relies on sysctl to work, so let it select PROC_SYSCTL. Reported-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:57:24 -07:00
David S. Miller	7f0b44a42e	Merge branch 'ibmvnic-Fixes-for-device-reset-handling' Thomas Falcon says: ==================== ibmvnic: Fixes for device reset handling This series contains three unrelated fixes to issues seen during device resets. The first patch fixes an error when the driver requests to deactivate the link of an uninitialized device, resulting in a failure to reset. Next, a patch to fix multicast transmission failures seen after a driver reset. The final patch fixes mishandling of memory allocation failures during device initialization, which caused a kernel oops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:51:28 -07:00
Thomas Falcon	7c940b1a52	ibmvnic: Fix unchecked return codes of memory allocations The return values for these memory allocations are unchecked, which may cause an oops if the driver does not handle them after a failure. Fix by checking the function's return code. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:51:28 -07:00
Thomas Falcon	be32a24372	ibmvnic: Refresh device multicast list after reset It was observed that multicast packets were no longer received after a device reset. The fix is to resend the current multicast list to the backing device after recovery. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:51:28 -07:00
Thomas Falcon	1f94608b0c	ibmvnic: Do not close unopened driver during reset Check driver state before halting it during a reset. If the driver is not running, do nothing. Otherwise, a request to deactivate a down link can cause an error and the reset will fail. Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:51:28 -07:00
David S. Miller	4172eadb08	mlx5-fixes-2019-06-07 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAlz62dYACgkQSD+KveBX +j6zVggAw6zCNNIoLES7++v6utSmZ+3Vs9Qo420OiHrPJ/IgzWmKRhMbKUh1OZQW 88Clc1hJAkGlRvoj9fhUEgLebHtUllyf4NKwcaufbwWWlt3ihi2TFi8IgzB+MNkn ak8b0zx3/RsEA1Ee2APFPuAc/ybER0lcdHucAqVA8GtfMaHoE6LmRTGeXIXo0tQ5 Rp8O499B5qTUm0jd5j3UnJr8YBUVio0bSZk9TbOEvDGokq+U+bslyxcfjGiMCV41 +jfqfr5rjiW/75i7UNS5w2D320UTSUqccefc0zBBTu4/t2ySRllLW5/GGE1/Bjqp UihNI2R63fxoYkg/bYCbF1T2QRaJnw== =JTmU -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2019-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-06-07 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.17 ('net/mlx5: Avoid reloading already removed devices') For -stable v5.0 ('net/mlx5e: Avoid detaching non-existing netdev under switchdev mode') For -stable v5.1 ('net/mlx5e: Fix source port matching in fdb peer flow rule') ('net/mlx5e: Support tagged tunnel over bond') ('net/mlx5e: Add ndo_set_feature for uplink representor') ('net/mlx5: Update pci error handler entries and command translation') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:45:54 -07:00
David S. Miller	62f42a114b	linux-can-fixes-for-5.2-20190607 -----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAlz60fwTHG1rbEBwZW5n dXRyb25peC5kZQAKCRBaxiGjkeSdILI8B/41j6UxwMSYMetM/Vw2AyqPt0z671Gu mAVzDK3PZF5WIzD5JsDlUhwN1dqvfvAZeSS1tQwQ18ZpQLkRSxlAKhkLLlxBVKpJ 0uDwYuFWwXF8DbdlRwZq+Db0GGAXW5+8WtA4gp9GTiIP6kTgmqnnaZvWPnQzQmKJ RCY8YAzF52jXa4hBfLsXiG7NoO6cZHQJMFKLsQEpHV+GMqTcAYJHzBP6YVVXC8PG 495TREmTq5lvRKc2U3QS6//yPWleFYr5ViW/psEcBPCGdruyJI8LjDRaqnGWRBJX vPyEVSFHtQZu49Vue3g4jbjhVr8OWTV/MsFgE/zK5Ahe+2G5969pID3n =iCgZ -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-5.2-20190607' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2019-06-07 this is a pull reqeust of 9 patches for net/master. The first patch is by Alexander Dahl and removes a duplicate menu entry from the Kconfig. The next patch by Joakim Zhang fixes the timeout in the flexcan driver when setting small bit rates. Anssi Hannula's patch for the xilinx_can driver fixes the bittiming_const for CAN FD core. The two patches by Sean Nyekjaer bring mcp25625 to the existing mcp251x driver. The patch by Eugen Hristev implements an errata for the m_can driver. YueHaibing's patch fixes the error handling ing can_init(). The patch by Fabio Estevam for the flexcan driver removes an unneeded registration message during flexcan_probe(). And the last patch is by Willem de Bruijn and adds the missing purging the socket error queue on sock destruct. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 19:44:01 -07:00
George Wilkie	2f3f7d1fa0	mpls: fix warning with multi-label encap If you configure a route with multiple labels, e.g. ip route add 10.10.3.0/24 encap mpls 16/100 via 10.10.2.2 dev ens4 A warning is logged: kernel: [ 130.561819] netlink: 'ip': attribute type 1 has an invalid length. This happens because mpls_iptunnel_policy has set the type of MPLS_IPTUNNEL_DST to fixed size NLA_U32. Change it to a minimum size. nla_get_labels() does the remaining validation. Fixes: `e3e4712ec0` ("mpls: ip tunnel support") Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 13:26:34 -07:00
Michael Schmitz	a9520543b1	net: phy: rename Asix Electronics PHY driver [Resent to net instead of net-next - may clash with Anders Roxell's patch series addressing duplicate module names] Commit `31dd83b966` ("net-next: phy: new Asix Electronics PHY driver") introduced a new PHY driver drivers/net/phy/asix.c that causes a module name conflict with a pre-existiting driver (drivers/net/usb/asix.c). The PHY driver is used by the X-Surf 100 ethernet card driver, and loaded by that driver via its PHY ID. A rename of the driver looks unproblematic. Rename PHY driver to ax88796b.c in order to resolve name conflict. Signed-off-by: Michael Schmitz <schmitzmic@gmail.com> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Fixes: `31dd83b966` ("net-next: phy: new Asix Electronics PHY driver") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 13:24:17 -07:00
Eric Dumazet	65a3c497c0	ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero Before taking a refcount, make sure the object is not already scheduled for deletion. Same fix is needed in ipv6_flowlabel_opt() Fixes: `18367681a1` ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 13:07:14 -07:00
Enrico Weigelt	c3fee640bc	net: ipv4: fib_semantics: fix uninitialized variable fix an uninitialized variable: CC net/ipv4/fib_semantics.o net/ipv4/fib_semantics.c: In function 'fib_check_nh_v4_gw': net/ipv4/fib_semantics.c:1027:12: warning: 'err' may be used uninitialized in this function [-Wmaybe-uninitialized] if (!tbl \|\| err) { ^~ Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-09 12:47:30 -07:00
David S. Miller	38e406f600	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2019-06-07 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Fix several bugs in riscv64 JIT code emission which forgot to clear high 32-bits for alu32 ops, from Björn and Luke with selftests covering all relevant BPF alu ops from Björn and Jiong. 2) Two fixes for UDP BPF reuseport that avoid calling the program in case of __udp6_lib_err and UDP GRO which broke reuseport_select_sock() assumption that skb->data is pointing to transport header, from Martin. 3) Two fixes for BPF sockmap: a use-after-free from sleep in psock's backlog workqueue, and a missing restore of sk_write_space when psock gets dropped, from Jakub and John. 4) Fix unconnected UDP sendmsg hook API which is insufficient as-is since it breaks standard applications like DNS if reverse NAT is not performed upon receive, from Daniel. 5) Fix an out-of-bounds read in __bpf_skc_lookup which in case of AF_INET6 fails to verify that the length of the tuple is long enough, from Lorenz. 6) Fix libbpf's libbpf__probe_raw_btf to return an fd instead of 0/1 (for {un,}successful probe) as that is expected to be propagated as an fd to load_sk_storage_btf() and thus closing the wrong descriptor otherwise, from Michal. 7) Fix bpftool's JSON output for the case when a lookup fails, from Krzesimir. 8) Minor misc fixes in docs, samples and selftests, from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-07 14:46:47 -07:00
Eli Britstein	45e7d4c0c1	net/mlx5e: Support tagged tunnel over bond Stacked devices like bond interface may have a VLAN device on top of them. Detect lag state correctly under this condition, and return the correct routed net device, according to it the encap header is built. Fixes: `e32ee6c78e` ("net/mlx5e: Support tunnel encap over tagged Ethernet") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:37 -07:00
Alaa Hleihel	47c9d2c99d	net/mlx5e: Avoid detaching non-existing netdev under switchdev mode After introducing dedicated uplink representor, the netdev instance set over the esw manager vport (PF) became no longer in use, so it was removed in the cited commit once we're on switchdev mode. However, the mlx5e_detach function was not updated accordingly, and it still tries to detach a non-existing netdev, causing a kernel crash. This patch fixes this issue. Fixes: `aec002f6f8` ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:37 -07:00
Raed Salem	b83c073016	net/mlx5e: Fix source port matching in fdb peer flow rule The cited commit changed the initialization placement of the eswitch attributes so it is done prior to parse tc actions function call, including among others the in_rep and in_mdev fields which are mistakenly reassigned inside the parse actions function. This breaks the source port matching criteria of the peer redirect rule. Fix by removing the now redundant reassignment of the already initialized fields. Fixes: `988ab9c736` ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:37 -07:00
Shay Agroskin	57c70d8740	net/mlx5e: Replace reciprocal_scale in TX select queue function The TX queue index returned by the fallback function ranges between [0,NUM CHANNELS - 1] if QoS isn't set and [0, (NUM CHANNELS)(NUM TCs) -1] otherwise. Our HW uses different TC mapping than the fallback function (which is denoted as 'up', user priority) so we only need to extract a channel number out of the returned value. Since (NUM CHANNELS)(NUM TCs) is a relatively small number, using reciprocal scale almost always returns zero. We instead access the 'txq2sq' table to extract the sq (and with it the channel number) associated with the tx queue, thus getting a more evenly distributed channel number. Perf: Rx/Tx side with Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz and ConnectX-5. Used 'iperf' UDP traffic, 10 threads, and priority 5. Before: 0.566Mpps After: 2.37Mpps As expected, releasing the existing bottleneck of steering all traffic to TX queue zero significantly improves transmission rates. Fixes: `7ccdd0841b` ("net/mlx5e: Fix select queue callback") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:37 -07:00
Chris Mi	d3cbd4254d	net/mlx5e: Add ndo_set_feature for uplink representor After we have a dedicated uplink representor, the new netdev ops doesn't support ndo_set_feature. Because of that, we can't change some features, eg. rxvlan. Now add it back. In this patch, I also do a cleanup for the features flag handling, eg. remove duplicate NETIF_F_HW_TC flag setting. Fixes: `aec002f6f8` ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode") Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:37 -07:00
Alaa Hleihel	dd80857bf3	net/mlx5: Avoid reloading already removed devices Prior to reloading a device we must first verify that it was not already removed. Otherwise, the attempt to remove the device will do nothing, and in that case we will end up proceeding with adding an new device that no one was expecting to remove, leaving behind used resources such as EQs that causes a failure to destroy comp EQs and syndrome (0x30f433). Fix that by making sure that we try to remove and add a device (based on a protocol) only if the device is already added. Fixes: `c5447c7059` ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:36 -07:00
Edward Srouji	6a6fabbfa3	net/mlx5: Update pci error handler entries and command translation Add missing entries for create/destroy UCTX and UMEM commands. This could get us wrong "unknown FW command" error in flows where we unbind the device or reset the driver. Also the translation of these commands from opcodes to string was missing. Fixes: `6e3722baac` ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation") Signed-off-by: Edward Srouji <edwards@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-06-07 14:40:36 -07:00
Willem de Bruijn	fd704bd5ee	can: purge socket error queue on sock destruct CAN supports software tx timestamps as of the below commit. Purge any queued timestamp packets on socket destroy. Fixes: `51f31cabe3` ("ip: support for TX timestamps on UDP and RAW sockets") Reported-by: syzbot+a90604060cb40f5bdd16@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:54 +02:00
Fabio Estevam	eb503004a7	can: flexcan: Remove unneeded registration message Currently the following message is observed when the flexcan driver is probed: flexcan 2090000.flexcan: device registered (reg_base=(ptrval), irq=23) The reason for printing 'ptrval' is explained at Documentation/core-api/printk-formats.rst: "Pointers printed without a specifier extension (i.e unadorned %p) are hashed to prevent leaking information about the kernel memory layout. This has the added benefit of providing a unique identifier. On 64-bit machines the first 32 bits are zeroed. The kernel will print ``(ptrval)`` until it gathers enough entropy." Instead of passing %pK, which can print the correct address, simply remove the entire message as it is not really that useful. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:54 +02:00
YueHaibing	c5a3aed1cd	can: af_can: Fix error path of can_init() This patch add error path for can_init() to avoid possible crash if some error occurs. Fixes: `0d66548a10` ("[CAN]: Add PF_CAN core module") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:54 +02:00
Eugen Hristev	3e82f2f34c	can: m_can: implement errata "Needless activation of MRAF irq" During frame reception while the MCAN is in Error Passive state and the Receive Error Counter has thevalue MCAN_ECR.REC = 127, it may happen that MCAN_IR.MRAF is set although there was no Message RAM access failure. If MCAN_IR.MRAF is enabled, an interrupt to the Host CPU is generated. Work around: The Message RAM Access Failure interrupt routine needs to check whether MCAN_ECR.RP = '1' and MCAN_ECR.REC = '127'. In this case, reset MCAN_IR.MRAF. No further action is required. This affects versions older than 3.2.0 Errata explained on Sama5d2 SoC which includes this hardware block: http://ww1.microchip.com/downloads/en/DeviceDoc/SAMA5D2-Family-Silicon-Errata-and-Data-Sheet-Clarification-DS80000803B.pdf chapter 6.2 Reproducibility: If 2 devices with m_can are connected back to back, configuring different bitrate on them will lead to interrupt storm on the receiving side, with error "Message RAM access failure occurred". Another way is to have a bad hardware connection. Bad wire connection can lead to this issue as well. This patch fixes the issue according to provided workaround. Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com> Reviewed-by: Ludovic Desroches <ludovic.desroches@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:54 +02:00
Sean Nyekjaer	35b7fa4d07	can: mcp251x: add support for mcp25625 Fully compatible with mcp2515, the mcp25625 have integrated transceiver. This patch adds support for the mcp25625 to the existing mcp251x driver. Signed-off-by: Sean Nyekjaer <sean@geanix.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:53 +02:00
Sean Nyekjaer	0df82dcd55	dt-bindings: can: mcp251x: add mcp25625 support Fully compatible with mcp2515, the mcp25625 have integrated transceiver. This patch add the mcp25625 to the device tree bindings documentation. Signed-off-by: Sean Nyekjaer <sean@geanix.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:53 +02:00
Anssi Hannula	904044dd8f	can: xilinx_can: use correct bittiming_const for CAN FD core Commit `9e5f1b273e` ("can: xilinx_can: add support for Xilinx CAN FD core") added a new can_bittiming_const structure for CAN FD cores that support larger values for tseg1, tseg2, and sjw than previous Xilinx CAN cores, but the commit did not actually take that into use. Fix that. Tested with CAN FD core on a ZynqMP board. Fixes: `9e5f1b273e` ("can: xilinx_can: add support for Xilinx CAN FD core") Reported-by: Shubhrajyoti Datta <shubhrajyoti.datta@gmail.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Cc: Michal Simek <michal.simek@xilinx.com> Reviewed-by: Shubhrajyoti Datta <shubhrajyoti.datta@gmail.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:53 +02:00
Joakim Zhang	247e5356a7	can: flexcan: fix timeout when set small bitrate Current we can meet timeout issue when setting a small bitrate like 10000 as follows on i.MX6UL EVK board (ipg clock = 66MHZ, per clock = 30MHZ): \| root@imx6ul7d:~# ip link set can0 up type can bitrate 10000 A link change request failed with some changes committed already. Interface can0 may have been left with an inconsistent configuration, please check. \| RTNETLINK answers: Connection timed out It is caused by calling of flexcan_chip_unfreeze() timeout. Originally the code is using usleep_range(10, 20) for unfreeze operation, but the patch (`8badd65` can: flexcan: avoid calling usleep_range from interrupt context) changed it into udelay(10) which is only a half delay of before, there're also some other delay changes. After double to FLEXCAN_TIMEOUT_US to 100 can fix the issue. Meanwhile, Rasmus Villemoes reported that even with a timeout of 100, flexcan_probe() fails on the MPC8309, which requires a value of at least 140 to work reliably. 250 works for everyone. Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:53 +02:00
Alexander Dahl	0ed89d777d	can: usb: Kconfig: Remove duplicate menu entry This seems to have slipped in by accident when sorting the entries. Fixes: `ffbdd9172e` Signed-off-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2019-06-07 23:03:53 +02:00
David S. Miller	c7e3c93abb	wireless-drivers fixes for 5.2 First set of fixes for 5.2. Most important here are buffer overflow fixes for mwifiex. rtw88 * fix out of bounds compiler warning * fix rssi handling to get 4x more throughput * avoid circular locking rsi * fix unitilised data warning, these are hopefully the last ones so that the warning can be enabled by default mwifiex * fix buffer overflows iwlwifi * remove not used debugfs file * various fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJc+iptAAoJEG4XJFUm622b+cQH/R/k3d55ldIADljn8P1N2JEb Ogc5weW2DkUgfWdeKpJ1rp7JLaRPypmY4GipA7930SEEk8Ac46WShErH/6TLxEZD 1TGDV/qTtb4QJAZzdd/bs1J8IwaCUK1BolJfcpe5cdE/4/owe2pxiYbvFxAUCGRD IRK/FJ/ohiv6SB1zUc4ATQ0HWI2OQjiiA4O6WctUSPdHHME7u4/7/MfhD2JBvjBW 454uLf8g2QhMWPdSw/PIXcND+FwgfrS7gCO7pZz4FUy7y6062LLaM+wcgmBxEN9t SfaH1Ur14Hb0nw4QwYAvY2loJk15iowm7jvuAfqZP9vfpxdt4LYw4KrV/Y7FU8E= =N7kc -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-for-davem-2019-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 5.2 First set of fixes for 5.2. Most important here are buffer overflow fixes for mwifiex. rtw88 * fix out of bounds compiler warning * fix rssi handling to get 4x more throughput * avoid circular locking rsi * fix unitilised data warning, these are hopefully the last ones so that the warning can be enabled by default mwifiex * fix buffer overflows iwlwifi * remove not used debugfs file * various fixes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-07 12:16:26 -07:00
Linus Torvalds	1e1d926369	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Free AF_PACKET po->rollover properly, from Willem de Bruijn. 2) Read SFP eeprom in max 16 byte increments to avoid problems with some SFP modules, from Russell King. 3) Fix UDP socket lookup wrt. VRF, from Tim Beale. 4) Handle route invalidation properly in s390 qeth driver, from Julian Wiedmann. 5) Memory leak on unload in RDS, from Zhu Yanjun. 6) sctp_process_init leak, from Neil HOrman. 7) Fix fib_rules rule insertion semantic change that broke Android, from Hangbin Liu. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) pktgen: do not sleep with the thread lock held. net: mvpp2: Use strscpy to handle stat strings net: rds: fix memory leak in rds_ib_flush_mr_pool ipv6: fix EFAULT on sendto with icmpv6 and hdrincl ipv6: use READ_ONCE() for inet->hdrincl as in ipv4 Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied" net: aquantia: fix wol configuration not applied sometimes ethtool: fix potential userspace buffer overflow Fix memory leak in sctp_process_init net: rds: fix memory leak when unload rds_rdma ipv6: fix the check before getting the cookie in rt6_get_cookie ipv4: not do cache for local delivery if bc_forwarding is enabled s390/qeth: handle error when updating TX queue count s390/qeth: fix VLAN attribute in bridge_hostnotify udev event s390/qeth: check dst entry before use s390/qeth: handle limited IPv4 broadcast in L3 TX path net: fix indirect calls helpers for ptype list hooks. net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set udp: only choose unbound UDP socket for multicast when not in a VRF net/tls: replace the sleeping lock around RX resync with a bit lock ...	2019-06-07 09:29:14 -07:00
Linus Torvalds	6e38335dcc	5.2 First rc pull request The usual driver bug fixes and fixes for a couple of regressions introduced in 5.2: - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to rename its internal sys files - Fix a memory leak in hns - Don't leak resources in efa on certain error unwinds - Don't panic in certain error unwinds in ib_register_device - Various small user visible bug fix patches for the hfi and efa drivers - Fix the 32 bit compilation break -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlz5c5oACgkQOG33FX4g mxpEEhAAk64phaKUih9KVT0MpC1zezB1C0EGKg45GKuMOFUFJQ5tZ0g4s6aDEbG3 374ZE9h82HMgKn4tQ95110AKvCI+VAbbKOS7kzk1rLWE1ruJxk5DNsvp1v5/S3FE GXBSws+HtZtdiRAMTYyEOfz0MqpvghFg0vor4PugrmOuqIe2a0bkYPEzYPjYbaNH jSctd/q4s/o02n6gfbCrFpXsW0Va3OIaDX5a+Fx5+lWW+GPr/Uzk/3kN95mFbDRp XsCE80V+n3ceKSQUp0lYtxU3tm2mT1JpiiZjXuKyjRV8IMUS+xkdJ8scEz0upGcg +Jr74mN/xKT3toHaMv7fZ3RGlYgFsSsZcAApm6LrIlTNQXKjJ8hl+2BWdi4nRfYZ X89RRWEl3j8i6URu65iH7y7IlfFEhjJGmATUQFdrfECR9hBMJ8VHzBfcz7aYgoac Ggi+2Vjm7GQlr9mzW/phXb25PWqP5yVTW6/3BUtMs3oY7kd6vE2n9XzGIy13uBpX fzY/tnIMrgZMjphYPPbBAbwl+tBKZCu4k6lpP7cLsVsIwY0NIWS26JCnCdO0efqR SnAUPjoAV7nkpG3mMO9Qv7h7yar3HrG7ED15hfmB4VowRNQMfDoTLc8jVWDvGk4/ aFBSH8dEjszZ5tMO9HL+RXnvpkRcDyQpfVQJttY5adZFQlUOd+0= =RmxY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "Things are looking pretty quiet here in RDMA, not too many bug fixes rolling in right now. The usual driver bug fixes and fixes for a couple of regressions introduced in 5.2: - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to rename its internal sys files - Fix a memory leak in hns - Don't leak resources in efa on certain error unwinds - Don't panic in certain error unwinds in ib_register_device - Various small user visible bug fix patches for the hfi and efa drivers - Fix the 32 bit compilation break" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/efa: Remove MAYEXEC flag check from mmap flow mlx5: avoid 64-bit division IB/hfi1: Validate page aligned for a given virtual address IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value IB/hfi1: Insure freeze_work work_struct is canceled on shutdown IB/rdmavt: Fix alloc_qpn() WARN_ON() RDMA/core: Fix panic when port_data isn't initialized RDMA/uverbs: Pass udata on uverbs error unwind RDMA/core: Clear out the udata before error unwind RDMA/hns: Fix PD memory leak for internal allocation RDMA/srp: Rename SRP sysfs name after IB device rename trigger	2019-06-07 09:25:27 -07:00
Linus Torvalds	a02a532c2a	arm64 fixes for -rc4 - Fix boot crash on platforms with SVE2 due to missing register encoding - Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y - Move cpu_logical_map into smp.h for use by upcoming irqchip drivers - Trivial typo fix in comment - Disable some useless, noisy warnings from GCC 9 -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlz6QlIACgkQt6xw3ITB YzQJRAf/e2PwVOHLwUIbPg//rKKNlJmBMnIUROEiAPjY7atTfjuYd3/65pIq7ZuO RjMIUT1A2kg+pMOFzmXObLICq3Xl1/7LUUPIQ1iDvEeWIRb7HKQXoJkg9lvUEy89 T4sR1EBkK7uYh0w+/L7k1LESGgl4+VFY/ZY+4NmwsXEK4jfty3by7zE7Vy35MvQ6 XuEbYuMNjfskgBGbZPVqV8qHUlRurfhWWXjdgdAe9E7+fsHPuOrIr4+uXwAyVtRR 1/4tLySNW2GE8o0Ftun82ZasFTTgPc18ozASBYX1FMlQRaXznNREmSHYaIiartJE VorEdRe25ztdI54fVM2dL0KKmTZi1g== =ErME -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Another round of mostly-benign fixes, the exception being a boot crash on SVE2-capable CPUs (although I don't know where you'd find such a thing, so maybe it's benign too). We're in the process of resolving some big-endian ptrace breakage, so I'll probably have some more for you next week. Summary: - Fix boot crash on platforms with SVE2 due to missing register encoding - Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y - Move cpu_logical_map into smp.h for use by upcoming irqchip drivers - Trivial typo fix in comment - Disable some useless, noisy warnings from GCC 9" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Silence gcc warnings about arch ABI drift ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix arm64: arch_timer: mark functions as __always_inline arm64: smp: Moved cpu_logical_map[] to smp.h arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()	2019-06-07 09:21:48 -07:00
Alexei Starovoitov	4aeba32801	Merge branch 'fix-unconnected-udp' Daniel Borkmann says: ==================== Please refer to the patch 1/6 as the main patch with the details on the current sendmsg hook API limitations and proposal to fix it in order to work with basic applications like DNS. Remaining patches are the usual uapi and tooling updates as well as test cases. Thanks a lot! v2 -> v3: - Add attach types to test_section_names.c and libbpf (Andrey) - Added given Acks, rest as-is v1 -> v2: - Split off uapi header sync and bpftool bits (Martin, Alexei) - Added missing bpftool doc and bash completion as well ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:13 -07:00
Daniel Borkmann	b714560f7b	bpf: expand section tests for test_section_names Add cgroup/recvmsg{4,6} to test_section_names as well. Test run output: # ./test_section_names libbpf: failed to guess program type based on ELF section name 'InvAliD' libbpf: supported section(type) names are: [...] libbpf: failed to guess attach type based on ELF section name 'InvAliD' libbpf: attachable section(type) names are: [...] libbpf: failed to guess program type based on ELF section name 'cgroup' libbpf: supported section(type) names are: [...] libbpf: failed to guess attach type based on ELF section name 'cgroup' libbpf: attachable section(type) names are: [...] Summary: 38 PASSED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:12 -07:00
Daniel Borkmann	1812291e76	bpf: more msg_name rewrite tests to test_sock_addr Extend test_sock_addr for recvmsg test cases, bigger parts of the sendmsg code can be reused for this. Below are the strace view of the recvmsg rewrites; the sendmsg side does not have a BPF prog connected to it for the context of this test: IPv4 test case: [pid 4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x13 /* BPF_??? /, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0 [pid 4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 5 [pid 4846] bind(5, {sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, 128) = 0 [pid 4846] socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 6 [pid 4846] sendmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(4444), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999995}) [pid 4846] recvmsg(5, {msg_name={sa_family=AF_INET, sin_port=htons(4040), sin_addr=inet_addr("192.168.1.254")}, msg_namelen=128->16, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] close(6) = 0 [pid 4846] close(5) = 0 [pid 4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x13 / BPF_??? /}, 112) = 0 IPv6 test case: [pid 4846] bpf(BPF_PROG_ATTACH, {target_fd=3, attach_bpf_fd=4, attach_type=0x14 / BPF_??? /, attach_flags=BPF_F_ALLOW_OVERRIDE}, 112) = 0 [pid 4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 5 [pid 4846] bind(5, {sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 128) = 0 [pid 4846] socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 6 [pid 4846] sendmsg(6, {msg_name={sa_family=AF_INET6, sin6_port=htons(6666), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128, msg_iov=[{iov_base="a", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] select(6, [5], NULL, NULL, {tv_sec=2, tv_usec=0}) = 1 (in [5], left {tv_sec=1, tv_usec=999996}) [pid 4846] recvmsg(5, {msg_name={sa_family=AF_INET6, sin6_port=htons(6060), inet_pton(AF_INET6, "face:b00c:1234:5678::abcd", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=128->28, msg_iov=[{iov_base="a", iov_len=64}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 1 [pid 4846] close(6) = 0 [pid 4846] close(5) = 0 [pid 4846] bpf(BPF_PROG_DETACH, {target_fd=3, attach_type=0x14 / BPF_??? */}, 112) = 0 test_sock_addr run w/o strace view: # ./test_sock_addr.sh [...] Test case: recvmsg4: return code ok .. [PASS] Test case: recvmsg4: return code !ok .. [PASS] Test case: recvmsg6: return code ok .. [PASS] Test case: recvmsg6: return code !ok .. [PASS] Test case: recvmsg4: rewrite IP & port (asm) .. [PASS] Test case: recvmsg6: rewrite IP & port (asm) .. [PASS] [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:12 -07:00
Daniel Borkmann	000aa1250d	bpf, bpftool: enable recvmsg attach types Trivial patch to bpftool in order to complete enabling attaching programs to BPF_CGROUP_UDP{4,6}_RECVMSG. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:12 -07:00
Daniel Borkmann	9bb59ac1f6	bpf, libbpf: enable recvmsg attach types Another trivial patch to libbpf in order to enable identifying and attaching programs to BPF_CGROUP_UDP{4,6}_RECVMSG by section name. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:12 -07:00
Daniel Borkmann	3dbc6adac1	bpf: sync tooling uapi header Sync BPF uapi header in order to pull in BPF_CGROUP_UDP{4,6}_RECVMSG attach types. This is done and preferred as an extra patch in order to ease sync of libbpf. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:12 -07:00
Daniel Borkmann	983695fa67	bpf: fix unconnected udp hooks Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in `7828f20e37` ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: `1cedee13d2` ("bpf: Hooks for sys_sendmsg") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-06-06 16:53:12 -07:00

1 2 3 4 5 ...

840668 Commits