Commit Graph

51038 Commits

Author SHA1 Message Date
Ursula Braun
9b67e26f93 net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD
SIOCINQ returns the amount of unread data in the RMB.
SIOCOUTQ returns the amount of unsent or unacked sent data in the send
buffer.
SIOCOUTQNSD returns the amount of data prepared for sending, but
not yet sent.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02 13:29:12 -04:00
Karsten Graul
ed75986f4a net/smc: ipv6 support for smc_diag.c
Update smc_diag.c to support ipv6 addresses on the diagnosis interface.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02 13:29:12 -04:00
Karsten Graul
877ae5be42 net/smc: periodic testlink support
Add periodic LLC testlink support to ensure the link is still active.
The interval time is initialized using the value of
sysctl_tcp_keepalive_time.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02 13:29:12 -04:00
Sean Tranchetti
6c035ba7e7 udp: Complement partial checksum for GSO packet
Using the udp_v4_check() function to calculate the pseudo header
for the newly segmented UDP packets results in assigning the complement
of the value to the UDP header checksum field.

Always undo the complement the partial checksum value in order to
match the case where GSO is not used on the UDP transmit path.

Fixes: ee80d1ebe5 ("udp: add udp gso")
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-02 10:59:32 -04:00
Soheil Hassas Yeganeh
b75eba76d3 tcp: send in-queue bytes in cmsg upon read
Applications with many concurrent connections, high variance
in receive queue length and tight memory bounds cannot
allocate worst-case buffer size to drain sockets. Knowing
the size of receive queue length, applications can optimize
how they allocate buffers to read from the socket.

The number of bytes pending on the socket is directly
available through ioctl(FIONREAD/SIOCINQ) and can be
approximated using getsockopt(MEMINFO) (rmem_alloc includes
skb overheads in addition to application data). But, both of
these options add an extra syscall per recvmsg. Moreover,
ioctl(FIONREAD/SIOCINQ) takes the socket lock.

Add the TCP_INQ socket option to TCP. When this socket
option is set, recvmsg() relays the number of bytes available
on the socket for reading to the application via the
TCP_CM_INQ control message.

Calculate the number of bytes after releasing the socket lock
to include the processed backlog, if any. To avoid an extra
branch in the hot path of recvmsg() for this new control
message, move all cmsg processing inside an existing branch for
processing receive timestamps. Since the socket lock is not held
when calculating the size of receive queue, TCP_INQ is a hint.
For example, it can overestimate the queue size by one byte,
if FIN is received.

With this method, applications can start reading from the socket
using a small buffer, and then use larger buffers based on the
remaining data when needed.

V3 change-log:
	As suggested by David Miller, added loads with barrier
	to check whether we have multiple threads calling recvmsg
	in parallel. When that happens we lock the socket to
	calculate inq.
V4 change-log:
	Removed inline from a static function.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 18:56:29 -04:00
Florian Fainelli
e283de3a4f net: core: Inline netdev_features_size_check()
We do not require this inline function to be used in multiple different
locations, just inline it where it gets used in register_netdevice().

Suggested-by: David Miller <davem@davemloft.net>
Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 14:24:19 -04:00
Willem de Bruijn
a8c744a8b4 udp: disable gso with no_check_tx
Syzbot managed to send a udp gso packet without checksum offload into
the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This
triggered the skb_warn_bad_offload.

  RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658
   skb_gso_segment include/linux/netdevice.h:4038 [inline]
   validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120
   __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577
   dev_queue_xmit+0x17/0x20 net/core/dev.c:3618

UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the
udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to
catch and fail in this case.

After the integrity checks jump directly to the CHECKSUM_PARTIAL case
to avoid reading the no_check_tx flags again (a TOCTTOU race).

Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 14:20:14 -04:00
Paul Blakey
05cd271fd6 cls_flower: Support multiple masks per priority
Currently flower doesn't support inserting filters with different masks
on a single priority, even if the actual flows (key + mask) inserted
aren't overlapping, as with the use case of offloading openvswitch
datapath flows. Instead one must go up one level, and assign different
priorities for each mask, which will create a different flower
instances.

This patch opens flower to support more than one mask per priority,
and a single flower instance. It does so by adding another hash table
on top of the existing one which will store the different masks,
and the filters that share it.

The user is left with the responsibility of ensuring non overlapping
flows, otherwise precedence is not guaranteed.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 14:14:15 -04:00
Marcelo Ricardo Leitner
8914f4bace sctp: add sctp_make_op_error_limited and reuse inner functions
The idea is quite similar to the old functions, but note that the _fixed
function wasn't "fixed" as in that it would generate a packet with a fixed
size, but rather limited/bounded to PMTU.

Also, now with sctp_mtu_payload(), we have a more accurate limit.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 12:09:35 -04:00
Marcelo Ricardo Leitner
6d3e8aa876 sctp: allow sctp_init_cause to return errors
And do so if the skb doesn't have enough space for the payload.
This is a preparation for the next patch.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 12:09:35 -04:00
Ilya Lesokhin
e8f6979981 net/tls: Add generic NIC offload infrastructure
This patch adds a generic infrastructure to offload TLS crypto to a
network device. It enables the kernel TLS socket to skip encryption
and authentication operations on the transmit side of the data path.
Leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to
the TCP layer just like the SW KTLS implementation and using the same
API.
TCP segmentation is mostly unaffected. Currently the only exception is
that we prevent mixed SKBs where only part of the payload requires
offload. In the future we are likely to add a similar restriction
following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS
implementations are as follows:
1. The offloaded implementation builds "plaintext TLS record", those
records contain plaintext instead of ciphertext and place holder bytes
instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence
number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
TLS socket, we can use the tls NIC offload infrastructure to obtain
enough context to encrypt the payload of the SKB.
A TLS record is released when the last byte of the record is ack'ed,
this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload
implementations.  However it is currently written with the
implementation below in mind:
The NIC assumes that packets from each offloaded stream are sent as
plaintext and in-order. It keeps track of the TLS records in the TCP
stream. When a packet marked for offload is transmitted, the NIC
encrypts the payload in-place and puts authentication tags in the
relevant place holders.

The responsibility for handling out-of-order packets (i.e. TCP
retransmission, qdisc drops) falls on the netdev driver.

The netdev driver keeps track of the expected TCP SN from the NIC's
perspective.  If the next packet to transmit matches the expected TCP
SN, the driver advances the expected TCP SN, and transmits the packet
with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN. The
driver calls the TLS layer to obtain the TLS record that includes the
TCP of the packet for transmission. Using this TLS record, the driver
posts a work entry on the transmit queue to reconstruct the NIC TLS
state required for the offload of the out-of-order packet. It updates
the expected TCP SN accordingly and transmits the now in-order packet.
The same queue is used for packet transmission and TLS context
reconstruction to avoid the need for flushing the transmit queue before
issuing the context reconstruction request.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:47 -04:00
Boris Pismenny
f66de3ee2c net/tls: Split conf to rx + tx
In TLS inline crypto, we can have one direction in software
and another in hardware. Thus, we split the TLS configuration to separate
structures for receive and transmit.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:47 -04:00
Ilya Lesokhin
2342a8512a net: Add TLS TX offload features
This patch adds a netdev feature to configure TLS TX offloads.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:47 -04:00
Ilya Lesokhin
ebf4e808fa net: Add Software fallback infrastructure for socket dependent offloads
With socket dependent offloads we rely on the netdev to transform
the transmitted packets before sending them to the wire.
When a packet from an offloaded socket is rerouted to a different
device we need to detect it and do the transformation in software.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:46 -04:00
Ilya Lesokhin
08303c1895 net: Rename and export copy_skb_header
copy_skb_header is renamed to skb_copy_header and
exported. Exposing this function give more flexibility
in copying SKBs.
skb_copy and skb_copy_expand do not give enough control
over which parts are copied.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:46 -04:00
Ilya Lesokhin
6dac152355 tcp: Add clean acked data hook
Called when a TCP segment is acknowledged.
Could be used by application protocols who hold additional
metadata associated with the stream data.

This is required by TLS device offload to release
metadata associated with acknowledged TLS records.

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-01 09:42:46 -04:00
Petr Machata
4d4fd36126 net: bridge: Publish bridge accessor functions
Add a couple new functions to allow querying FDB and vlan settings of a
bridge.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 12:42:40 -04:00
Ahmed Abdelsalam
6df93462c2 ipv6: sr: extract the right key values for "seg6_make_flowlabel"
The seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the
flowlabel from a given skb. It relies on skb_get_hash() which eventually
calls __skb_flow_dissect() to extract the flow_keys struct values from
the skb.

In case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(),
skb_reset_network_header(), and skb_mac_header_rebuild() will results in
flow_keys struct of all key values set to zero.

This patch calls seg6_make_flowlabel() before resetting the headers of skb
to get the right key values.

Extracted Key values are based on the type inner packet as follows:
1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of inner packet.
2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port
3) L2 traffic: depends on what kind of traffic carried into the L2
frame. IPv6 and IPv4 traffic works as discussed 1) and 2)

Here a hex_dump of struct flow_keys for IPv4 and IPv6 traffic
10.100.1.100: 47302 > 30.0.0.2: 5001
00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
00000020: 0a 64 01 64

fc00:a1:a > b2::2
00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
00000030: 00 00 00 00 00 00 00 00 00 00 00 0a

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 12:13:43 -04:00
William Tu
1baf5ebf89 erspan: auto detect truncated packets.
Currently the truncated bit is set only when the mirrored packet
is larger than mtu.  For certain cases, the packet might already
been truncated before sending to the erspan tunnel.  In this case,
the patch detect whether the IP header's total length is larger
than the actual skb->len.  If true, this indicated that the
mirrored packet is truncated and set the erspan truncate bit.

I tested the patch using bpf_skb_change_tail helper function to
shrink the packet size and send to erspan tunnel.

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 11:43:45 -04:00
Florian Fainelli
3ac305c386 net: core: Assert the size of netdev_featres_t
We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when an u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:50:36 -04:00
Alexander Duyck
1b837d489e net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx as that will improve performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:01:33 -04:00
Eric Dumazet
05255b823a tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
 the transfert of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAS
   (without mmap()/munmap() calls) since mmap_sem wont be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfuly mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and setsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.

Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
Willem de Bruijn
af201bab50 udp: remove stray export symbol
UDP GSO needs to export __udp_gso_segment to call it from ipv6.

I accidentally exported static ipv4 function __udp4_gso_segment.
Remove that EXPORT_SYMBOL_GPL.

Fixes: ee80d1ebe5 ("udp: add udp gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 20:32:39 -04:00
Bjorn Andersson
28fb4e59a4 net: qrtr: Expose tunneling endpoint to user space
This implements a misc character device named "qrtr-tun" for the purpose
of allowing user space applications to implement endpoints in the qrtr
network.

This allows more advanced (and dynamic) testing of the qrtr code as well
as opens up the ability of tunneling qrtr over a network or USB link.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 15:06:10 -04:00
Marcelo Ricardo Leitner
38687b56c5 sctp: allow unsetting sockopt MAXSEG
RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user
is not limiting it, and not that it should set to the *current* maximum,
as we are doing.

This patch thus allow setting it as 0, effectively removing the user
limit.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:24 -04:00
Marcelo Ricardo Leitner
439ef0309c sctp: consider idata chunks when setting SCTP_MAXSEG
When setting SCTP_MAXSEG sock option, it should consider which kind of
data chunk is being used if the asoc is already available, so that the
limit better reflect reality.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
63d01330aa sctp: honor PMTU_DISABLED when handling icmp
sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was
set, as pmtu_pending could be set unconditionally during icmp handling
if the socket was in use by the application.

This patch fixes it by checking for PMTU_DISABLED when handling such
deferred updates.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
6e91b578bf sctp: re-use sctp_transport_pmtu in sctp_transport_route
sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.

This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.

Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
22d7be267e sctp: remove sctp_transport_pmtu_check
We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, lets remove it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
6ff0f871c2 sctp: introduce sctp_dst_mtu
Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
2521680e18 sctp: remove sctp_assoc_pending_pmtu
No need for this helper.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
2f5e3c9df6 sctp: introduce sctp_assoc_update_frag_point
and avoid the open-coded versions of it.

Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
feddd6c1af sctp: introduce sctp_mtu_payload
When given a MTU, this function calculates how much payload we can carry
on it. Without a MTU, it calculates the amount of header overhead we
have.

So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
c4b2893dae sctp: introduce sctp_assoc_set_pmtu
All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:22 -04:00
Marcelo Ricardo Leitner
c88da20f95 sctp: remove an if() that is always true
As noticed by Xin Long, the if() here is always true as PMTU can never
be 0.

Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:22 -04:00
Marcelo Ricardo Leitner
800e00c127 sctp: move transport pathmtu calc away of sctp_assoc_add_peer
There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:22 -04:00
Wei Wang
c36207bd87 tcp: remove mss check in tcp_select_initial_window()
In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.

In a TCP throughput test where receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:05:36 -04:00
Ursula Braun
abb190f194 net/smc: handle sockopt TCP_DEFER_ACCEPT
If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till
data is available.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:52 -04:00
Ursula Braun
01d2f7e2cd net/smc: sockopts TCP_NODELAY and TCP_CORK
Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK
triggers data transfer.

For a corked SMC socket RDMA writes are deferred, if there is
still sufficient send buffer space available.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:52 -04:00
Ursula Braun
ee9dfbef02 net/smc: handle sockopts forcing fallback
Several TCP sockopts do not work for SMC. One example are the
TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP
three-way-handshake.
If the SMC socket is still in state SMC_INIT, such sockopts trigger
fallback to TCP. Otherwise an error is returned.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:52 -04:00
Karsten Graul
3382576106 net/smc: fix structure size
The struct smc_cdc_msg must be defined as packed so the
size is 44 bytes.
And change the structure size check so sizeof is checked.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:51 -04:00
Kirill Tkhai
3f5ecd8a90 net: Fix coccinelle warning
kbuild test robot says:

  >coccinelle warnings: (new ones prefixed by >>)
  >>> net/core/dev.c:1588:2-3: Unneeded semicolon

So, let's remove it.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 13:53:14 -04:00
YueHaibing
d8fb1648fc bridge: use hlist_entry_safe
Use hlist_entry_safe() instead of open-coding it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 13:20:48 -04:00
Florian Fainelli
cf96357303 net: dsa: Allow providing PHY statistics from CPU port
Implement the same type of ethtool diversion that we have for
ETH_SS_STATS and make it work with ETH_SS_PHY_STATS. This allows
providing PHY level statistics for CPU ports that are directly
connecting to a PHY device.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:03 -04:00
Florian Fainelli
6207a78c09 net: dsa: Add helper function to obtain PHY device of a given port
In preparation for having more call sites attempting to obtain a
reference against a PHY device corresponding to a particular port,
introduce a helper function for that purpose.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:03 -04:00
Florian Fainelli
89f0904834 net: dsa: Pass stringset to ethtool operations
Up until now we largely assumed that we were interested in ETH_SS_STATS
type of strings for all ethtool operations, this is about to change with
the introduction of additional string sets, e.g: ETH_SS_PHY_STATS.
Update all functions to take an appropriate stringset argument and act
on it when it is different than ETH_SS_STATS for now.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:03 -04:00
Florian Fainelli
1d1e79f1c6 net: dsa: Do not check for ethtool_ops validity
This is completely redundant with what netdev_set_default_ethtool_ops()
does, we are always guaranteed to have a valid dev->ethtool_ops pointer,
however, within that structure, not all function calls may be populated,
so we still have to check them individually.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:02 -04:00
Florian Fainelli
9994338227 net: Allow network devices to have PHY statistics
Add a new callback: get_ethtool_phy_stats() which allows network device
drivers not making use of the PHY library to return PHY statistics.
Update ethtool_get_phy_stats(), __ethtool_get_sset_count() and
__ethtool_get_strings() accordingly to interogate the network device
about ETH_SS_PHY_STATS.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:02 -04:00
Florian Fainelli
c59530d0d5 net: Move PHY statistics code into PHY library helpers
In order to make it possible for network device drivers that do not
necessarily have a phy_device attached, but still report PHY statistics,
have a preliminary refactoring consisting in creating helper functions
that encapsulate the PHY device driver knowledge within PHYLIB.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:02 -04:00
Guillaume Nault
8349440733 l2tp: consistent reference counting in procfs and debufs
The 'pppol2tp' procfs and 'l2tp/tunnels' debugfs files handle reference
counting of sessions differently than for tunnels.

For consistency, use the same mechanism for handling both sessions and
tunnels. That is, drop the reference on the previous session just
before looking up the next one (rather than in .show()). If necessary
(if dump stops before *_next_session() returns NULL), drop the last
reference in .stop().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:06:35 -04:00