When token lookup on MP_JOIN 3rd ack fails, the server
socket closes with a reset the incoming child. Such socket
has the 'is_mptcp' flag set, but no msk socket associated
- due to the failed lookup.
While crafting the reset packet mptcp_established_options_mp()
will try to dereference the child's master socket, causing
a NULL ptr dereference.
This change addresses the issue with explicit fallback to
TCP in such error path.
Fixes: 729cd6436f ("mptcp: cope better with MP_JOIN failure")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the other fq-based qdiscs take advantage of skb->hash and doesn't
recompute it if it is already set, sch_cake does not.
This was a deliberate choice because sch_cake hashes various parts of the
packet header to support its advanced flow isolation modes. However,
foregoing the use of skb->hash entirely loses a few important benefits:
- When skb->hash is set by hardware, a few CPU cycles can be saved by not
hashing again in software.
- Tunnel encapsulations will generally preserve the value of skb->hash from
before the encapsulation, which allows flow-based qdiscs to distinguish
between flows even though the outer packet header no longer has flow
information.
It turns out that we can preserve these desirable properties in many cases,
while still supporting the advanced flow isolation properties of sch_cake.
This patch does so by reusing the skb->hash value as the flow_hash part of
the hashing procedure in cake_hash() only in the following conditions:
- If the skb->hash is marked as covering the flow headers (skb->l4_hash is
set)
AND
- NAT header rewriting is either disabled, or did not change any values
used for hashing. The latter is important to match local-origin packets
such as those of a tunnel endpoint.
The immediate motivation for fixing this was the recent patch to WireGuard
to preserve the skb->hash on encapsulation. As such, this is also what I
tested against; with this patch, added latency under load for competing
flows drops from ~8 ms to sub-1ms on an RRUL test over a WireGuard tunnel
going through a virtual link shaped to 1Gbps using sch_cake. This matches
the results we saw with a similar setup using sch_fq_codel when testing the
WireGuard patch.
Fixes: 046f6fd5da ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-fetch send buffer for the CDC validation message before entering the
send_lock. Without that the send call might fail with -EBUSY because
there are no free buffers and waiting for buffers is not possible under
send_lock.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Accumulated cleanup patches and sparse warning fixes for mlx5 driver.
1) sync with mlx5-next branch
2) Eli Cohen declares mpls_entry_encode() helper in mpls.h as suggested
by Jakub Kicinski and David Ahern, and use it in mlx5
3) Jesper Fixes xdp data_meta setup in mlx5
4) Many sparse and build warnings cleanup
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7R3wcACgkQSD+KveBX
+j6/ZQf/QD39naPeImfLjemkRK9L+TKbS4nU6wpUwf1jC33Wdm4HhkhsWEnR6C4l
OwU/Pae3I9EtKP4gRE0W1o8h7zC9h4hY7+IKZOdyQ32iUY55PX/H25oqAiCj1NCM
xzWpXOTwK/vkqmkCedAd+YpNdYlbOhfycr+KVPSsvFdaPqjzfNO1PJcLsUbAbzrX
A+8pYdhUYTtx1N3YHJL5abLN6WzMAKxgwlm9GG8YCXACTJT6CBWWHGebVsC5TDUk
Lj5hJj38mI8/3dcu6vWP0kLGVfRZo0HS/gpPGxbKQFpP+1uBYaRENAQONxkY++6S
GDPix7ccvN+yNMlON893PC/Cogw3Yg==
=WaCJ
-----END PGP SIGNATURE-----
Merge tag 'mlx5-cleanup-2020-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-cleanup-2020-05-29
Accumulated cleanup patches and sparse warning fixes for mlx5 driver.
1) sync with mlx5-next branch
2) Eli Cohen declares mpls_entry_encode() helper in mpls.h as suggested
by Jakub Kicinski and David Ahern, and use it in mlx5
3) Jesper Fixes xdp data_meta setup in mlx5
4) Many sparse and build warnings cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move mpls_entry_encode() from net/mpls/internal.h to include/net/mpls.h
and make it available for other users. Specifically, hardware driver that
offload MPLS can benefit from that.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As of commit 98fa6271cf
("tcp: refactor setting the initial congestion window") this is called
only from tcp_input.c, so it can be static.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kbuild test robot is reporting:
net/ethtool/cabletest.c:230:5: warning: no previous prototype for
Mark the function as static.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
net/tipc/msg.c: In function 'tipc_msg_append':
net/tipc/msg.c:215:24: warning:
variable 'prev' set but not used [-Wunused-but-set-variable]
commit 0a3e060f34 ("tipc: add test for Nagle algorithm effectiveness")
left behind this, remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Right now, our only tag_8021q user, sja1105, has the ability to restore
bridge VLANs on its own, so this logic is unnecessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No users left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP protocol allows to bind multiple address to a socket. That
feature is currently only exposed as a socket option. Add a bind_add
method struct proto that allows to bind additional addresses, and
switch the dlm code to use the method instead of going through the
socket option from kernel space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split out a sctp_setsockopt_bindx_kernel that takes a kernel pointer
to the sockaddr and make sctp_setsockopt_bindx a small wrapper around
it. This prepares for adding a new bind_add proto op.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2020-05-29
1) Add IPv6 encapsulation support for ESP over UDP and TCP.
From Sabrina Dubroca.
2) Remove unneeded reference when initializing xfrm interfaces.
From Nicolas Dichtel.
3) Remove some indirect calls from the state_afinfo.
From Florian Westphal.
Please note that this pull request has two merge conflicts
between commit:
0c922a4850 ("xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish")
from Linus' tree and commit:
2ab6096db2 ("xfrm: remove output_finish indirection from xfrm_state_afinfo")
from the ipsec-next tree.
and between commit:
3986912f6a ("ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctl")
from the net-next tree and commit:
0146dca70b ("xfrm: add support for UDPv6 encapsulation of ESP")
from the ipsec-next tree.
Both conflicts can be resolved as done in linux-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid using kernel_setsockopt for the TIPC_IMPORTANCE option when we can
just use the internal helper. The only change needed is to pass a struct
sock instead of tipc_sock, which is private to socket.c
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the RXRPC_MIN_SECURITY_LEVEL sockopt from
kernel space without going through a fake uaccess.
Thanks to David Howells for the documentation updates.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IPV6_RECVPKTINFO sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IPV6_ADD_PREFERENCES sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IPV6_RECVERR sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IPV6_V6ONLY sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IP_PKTINFO sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IP_MTU_DISCOVER sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com> [rxrpc bits]
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IP_RECVERR sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IP_FREEBIND sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the IP_TOS sockopt from kernel space without
going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_KEEPCNT sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_KEEPINTVL sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_KEEP_IDLE sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_USER_TIMEOUT sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_SYNCNT sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_QUICKACK sockopt from kernel space
without going through a fake uaccess. Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_NODELAY sockopt from kernel space
without going through a fake uaccess. Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the TCP_CORK sockopt from kernel space
without going through a fake uaccess. Cleanup the callers to avoid
pointless wrappers now that this is a simple function call.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_REUSEPORT sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_RCVBUFFORCE sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_KEEPALIVE sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly enable timestamps instead of setting the
SO_TIMESTAMP* sockopts from kernel space and going through a fake
uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_BINDTOIFINDEX sockopt from kernel
space without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_SNDTIMEO_NEW sockopt from kernel
space without going through a fake uaccess. The interface is
simplified to only pass the seconds value, as that is the only
thing needed at the moment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_PRIORITY sockopt from kernel space
without going through a fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_LINGER sockopt from kernel space
with onoff set to true and a linger time of 0 without going through a
fake uaccess.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to directly set the SO_REUSEADDR sockopt from kernel space
without going through a fake uaccess.
For this the iscsi target now has to formally depend on inet to avoid
a mostly theoretical compile failure. For actual operation it already
did depend on having ipv4 or ipv6 support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make tcp_ld_RTO_revert() helper available to IPv6, and
implement RFC 6069 :
Quoting this RFC :
3. Connectivity Disruption Indication
For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
the ICMP destination unreachable message of code 0 (net unreachable)
and of code 1 (host unreachable) is the ICMPv6 destination
unreachable message of code 0 (no route to destination) [RFC4443].
As with IPv4, a router should generate an ICMPv6 destination
unreachable message of code 0 in response to a packet that cannot be
delivered to its destination address because it lacks a matching
entry in its routing table.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
change typo in function name "nofity" to "notify"
sctp_ulpevent_nofity_peer_addr_change ->
sctp_ulpevent_notify_peer_addr_change
Signed-off-by: Jonas Falkevik <jonas.falkevik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a field to the tls rx offload context which enables
drivers to force a send_resync call.
This field can be used by drivers to request a resync at the next
possible tls record. It is beneficial for hardware that provides the
resync sequence number asynchronously. In such cases, the packet that
triggered the resync does not contain the information required for a
resync. Instead, the driver requests resync for all the following
TLS record until the asynchronous notification with the resync request
TCP sequence arrives.
A following series for mlx5e ConnectX-6DX TLS RX offload support will
use this mechanism.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resetting old qdisc on dev_queue->qdisc_sleeping in
dev_qdisc_reset() is redundant, because this qdisc,
even if not same with dev_queue->qdisc, is reset via
qdisc_put() right after calling dev_graft_qdisc() when
hitting refcnt 0.
This is very easy to observe with qdisc_reset() tracepoint
and stack traces.
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Except for sch_mq and sch_mqprio, each dev queue points to the
same root qdisc, so when we reset the dev queues with
netdev_for_each_tx_queue() we end up resetting the same instance
of the root qdisc for multiple times.
Avoid this by checking the __QDISC_STATE_DEACTIVATED bit in
each iteration, so for sch_mq/sch_mqprio, we still reset all
of them like before, for the rest, we only reset it once.
Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Tested-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qdisc_destroy() calls ops->reset() and cleans up qdisc->gso_skb
and qdisc->skb_bad_txq, these are nearly same with qdisc_reset(),
so just call it directly, and cosolidate the code for the next
patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>