Commit Graph

57946 Commits

Author SHA1 Message Date
Ahmed Abdelsalam
6df93462c2 ipv6: sr: extract the right key values for "seg6_make_flowlabel"
The seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the
flowlabel from a given skb. It relies on skb_get_hash() which eventually
calls __skb_flow_dissect() to extract the flow_keys struct values from
the skb.

In case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(),
skb_reset_network_header(), and skb_mac_header_rebuild() results in a
flow_keys struct with all key values set to zero.

This patch calls seg6_make_flowlabel() before resetting the headers of skb
to get the right key values.

Extracted key values are based on the type of inner packet as follows:
1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of inner packet.
2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port
3) L2 traffic: depends on what kind of traffic is carried in the L2
frame. IPv6 and IPv4 traffic works as discussed in 1) and 2)

Here is a hex dump of struct flow_keys for IPv4 and IPv6 traffic
10.100.1.100: 47302 > 30.0.0.2: 5001
00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
00000020: 0a 64 01 64

fc00:a1:a > b2::2
00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
00000030: 00 00 00 00 00 00 00 00 00 00 00 0a

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 12:13:43 -04:00
William Tu
1baf5ebf89 erspan: auto detect truncated packets.
Currently the truncated bit is set only when the mirrored packet
is larger than the MTU.  In certain cases, the packet might already
have been truncated before being sent to the erspan tunnel.  For such
cases, the patch detects whether the IP header's total length is larger
than the actual skb->len.  If so, the mirrored packet is truncated
and the erspan truncate bit is set.

I tested the patch using bpf_skb_change_tail helper function to
shrink the packet size and send to erspan tunnel.
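
The check described above can be sketched in user space as follows. This is
only an illustration under stated assumptions, not the kernel code:
ipv4_is_truncated() and its raw-buffer interface are hypothetical, and
captured_len stands in for skb->len.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether the IPv4 packet held in buf was
 * truncated, i.e. whether its header claims more bytes than were
 * actually captured (captured_len plays the role of skb->len).
 */
static bool ipv4_is_truncated(const uint8_t *buf, size_t captured_len)
{
	assert(buf != NULL);

	/* The total-length field occupies bytes 2-3 of the IPv4 header. */
	if (captured_len < 4)
		return true;	/* too short to even hold the field */

	uint16_t tot_len = (uint16_t)((buf[2] << 8) | buf[3]);

	return tot_len > captured_len;
}
```

A caller would set the truncate bit in the outgoing header whenever the
helper returns true.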

Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-30 11:43:45 -04:00
Florian Fainelli
3ac305c386 net: core: Assert the size of netdev_features_t
We have about 53 netdev_features_t bits defined and counting, add a
build time check to catch when a u64 type will not be enough and we
will have to convert that to a bitmap. This is done in
register_netdevice() for convenience.
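
A build-time check of this kind might look like the sketch below. The
names features_t, FEATURE_*_BIT and FEATURE_COUNT are hypothetical
stand-ins, and the sketch uses the C11 static_assert rather than whatever
macro the patch itself uses.

```c
#include <assert.h>	/* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

/* Stand-in for netdev_features_t, which the message describes as a u64. */
typedef uint64_t features_t;

/* Hypothetical feature list; only the trailing count matters here. */
enum {
	FEATURE_SG_BIT,
	FEATURE_CSUM_BIT,
	FEATURE_TSO_BIT,
	/* further bits would be added here */
	FEATURE_COUNT	/* must stay last */
};

/* Fails the build as soon as the bit count outgrows the storage type. */
static_assert(sizeof(features_t) * CHAR_BIT >= FEATURE_COUNT,
	      "features_t too small, convert to a bitmap");

/* Number of feature bits still unused in features_t. */
static unsigned int feature_bits_left(void)
{
	return (unsigned int)(sizeof(features_t) * CHAR_BIT) - FEATURE_COUNT;
}
```

Because the assertion is evaluated by the compiler, it costs nothing at
run time and triggers exactly when a 65th bit is added to the enum.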

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:50:36 -04:00
Alexander Duyck
1b837d489e net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash
I am dropping the export of __skb_tx_hash as after my patches nobody is
using it outside of the net/core/dev.c file. In addition I am renaming and
repurposing it to just be a static declaration of skb_tx_hash since that
was the only user for it at this point. By doing this the compiler can
inline it into __netdev_pick_tx, which should improve performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 22:01:33 -04:00
Eric Dumazet
05255b823a tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

Since we can not lock the socket in tcp mmap() handler we have to
split the operation in two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
  This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
  the transfer of pages from skbs to one VMA.
  This operation only uses down_read(&current->mm->mmap_sem) after
  holding TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are :

- Better scalability, in case multiple threads reuse VMAs
   (without mmap()/munmap() calls) since mmap_sem won't be write locked.

- Better error recovery.
   The previous mmap() model had to provide the expected size of the
   mapping. If for some reason one part could not be mapped (partial MSS),
   the whole operation had to be aborted.
   With the tcp_zerocopy_receive struct, kernel can report how
   many bytes were successfully mapped, and how many bytes should
   be read to skip the problematic sequence.

- No more memory allocation to hold an array of page pointers.
  16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/

- skbs are freed while mmap_sem has been released

Following patch makes the change in tcp_mmap tool to demonstrate
one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...)

Note that memcg might require additional changes.
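
The second phase described above can be sketched from user space roughly
as follows. This assumes kernel headers that already define
TCP_ZEROCOPY_RECEIVE and struct tcp_zerocopy_receive with the address,
length and recv_skip_hint fields; zerocopy_map() is a hypothetical
wrapper, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/tcp.h>	/* TCP_ZEROCOPY_RECEIVE, struct tcp_zerocopy_receive */

/*
 * Ask the kernel to map up to len bytes of received payload into the
 * region at addr, which the caller reserved earlier with mmap() on the
 * same TCP socket.  On success returns 0 and stores the number of bytes
 * mapped and the number of bytes to consume with a regular read.
 * Returns -1 with errno set on failure.
 */
static int zerocopy_map(int fd, void *addr, uint32_t len,
			uint32_t *mapped, uint32_t *skip)
{
	struct tcp_zerocopy_receive zc = {
		.address = (uint64_t)(uintptr_t)addr,
		.length = len,
	};
	socklen_t zc_len = sizeof(zc);

	assert(mapped != NULL && skip != NULL);

	if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len) < 0)
		return -1;

	*mapped = zc.length;		/* bytes actually mapped */
	*skip = zc.recv_skip_hint;	/* bytes to read() past the problem area */
	return 0;
}
```

A receiver would call this in a loop, process the mapped bytes in place,
and fall back to read() for the number of bytes reported in skip.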

Fixes: 93ab6cc691 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:29:55 -04:00
Hangbin Liu
e8238fc2bd bridge: check iface upper dev when setting master via ioctl
When we set a bond slave's master to bridge via ioctl, we only check
the IFF_BRIDGE_PORT flag. Although we will find the slave's real master
at netdev_master_upper_dev_link() later, it already does some settings
and allocates some resources. It would be better to return as early
as possible.

v1 -> v2:
use netdev_master_upper_dev_get() instead of netdev_has_any_upper_dev()
to check if we have a master, because not all upper devs are masters,
e.g. vlan device.

Reported-by: syzbot+de73361ee4971b6e6f75@syzkaller.appspotmail.com
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-29 21:08:02 -04:00
Willem de Bruijn
af201bab50 udp: remove stray export symbol
UDP GSO needs to export __udp_gso_segment to call it from ipv6.

I accidentally exported static ipv4 function __udp4_gso_segment.
Remove that EXPORT_SYMBOL_GPL.

Fixes: ee80d1ebe5 ("udp: add udp gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 20:32:39 -04:00
Lance Richardson
988bf7243e net: support compat 64-bit time in {s,g}etsockopt
For the x32 ABI, struct timeval has two 64-bit fields. However
the kernel currently interprets the user-space values used for
the SO_RCVTIMEO and SO_SNDTIMEO socket options as having a pair
of 32-bit fields.

When the seconds portion of the requested timeout is less than 2**32,
the seconds portion of the effective timeout is correct but the
microseconds portion is zero.  When the seconds portion of the
requested timeout is zero and the microseconds portion is non-zero,
the kernel interprets the timeout as zero (never timeout).

Fix by using 64-bit time for SO_RCVTIMEO/SO_SNDTIMEO as required
for the ABI.

The code included below demonstrates the problem.

Results before patch:
    $ gcc -m64 -Wall -O2 -o socktmo socktmo.c && ./socktmo
    recv time: 2.008181 seconds
    send time: 2.015985 seconds

    $ gcc -m32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
    recv time: 2.016763 seconds
    send time: 2.016062 seconds

    $ gcc -mx32 -Wall -O2 -o socktmo socktmo.c && ./socktmo
    recv time: 1.007239 seconds
    send time: 1.023890 seconds

Results after patch:
    $ gcc -m64 -O2 -Wall -o socktmo socktmo.c && ./socktmo
    recv time: 2.010062 seconds
    send time: 2.015836 seconds

    $ gcc -m32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
    recv time: 2.013974 seconds
    send time: 2.015981 seconds

    $ gcc -mx32 -O2 -Wall -o socktmo socktmo.c && ./socktmo
    recv time: 2.030257 seconds
    send time: 2.013383 seconds

 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/socket.h>
 #include <sys/types.h>
 #include <sys/time.h>

 void checkrc(char *str, int rc)
 {
         if (rc >= 0)
                 return;

         perror(str);
         exit(1);
 }

 static char buf[1024];
 int main(int argc, char **argv)
 {
         int rc;
         int socks[2];
         struct timeval tv;
         struct timeval start, end, delta;

         rc = socketpair(AF_UNIX, SOCK_STREAM, 0, socks);
         checkrc("socketpair", rc);

         /* set timeout to 1.999999 seconds */
         tv.tv_sec = 1;
         tv.tv_usec = 999999;
         rc = setsockopt(socks[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
         checkrc("setsockopt(SO_RCVTIMEO)", rc);
         rc = setsockopt(socks[0], SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
         checkrc("setsockopt(SO_SNDTIMEO)", rc);

         /* measure actual receive timeout */
         gettimeofday(&start, NULL);
         rc = recv(socks[0], buf, sizeof buf, 0);
         gettimeofday(&end, NULL);
         timersub(&end, &start, &delta);

         printf("recv time: %ld.%06ld seconds\n",
                (long)delta.tv_sec, (long)delta.tv_usec);

         /* fill send buffer */
         do {
                 rc = send(socks[0], buf, sizeof buf, 0);
         } while (rc > 0);

         /* measure actual send timeout */
         gettimeofday(&start, NULL);
         rc = send(socks[0], buf, sizeof buf, 0);
         gettimeofday(&end, NULL);
         timersub(&end, &start, &delta);

         printf("send time: %ld.%06ld seconds\n",
                (long)delta.tv_sec, (long)delta.tv_usec);
         exit(0);
 }

Fixes: 515c7af85e ("x32: Use compat shims for {g,s}etsockopt")
Reported-by: Gopal RajagopalSai <gopalsr83@gmail.com>
Signed-off-by: Lance Richardson <lance.richardson.net@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 19:46:06 -04:00
Bjorn Andersson
28fb4e59a4 net: qrtr: Expose tunneling endpoint to user space
This implements a misc character device named "qrtr-tun" for the purpose
of allowing user space applications to implement endpoints in the qrtr
network.

This allows more advanced (and dynamic) testing of the qrtr code as well
as opens up the ability of tunneling qrtr over a network or USB link.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 15:06:10 -04:00
Marcelo Ricardo Leitner
38687b56c5 sctp: allow unsetting sockopt MAXSEG
RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user
is not limiting it, and not that it should set to the *current* maximum,
as we are doing.

This patch thus allows setting it as 0, effectively removing the user
limit.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:24 -04:00
Marcelo Ricardo Leitner
439ef0309c sctp: consider idata chunks when setting SCTP_MAXSEG
When setting SCTP_MAXSEG sock option, it should consider which kind of
data chunk is being used if the asoc is already available, so that the
limit better reflects reality.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
63d01330aa sctp: honor PMTU_DISABLED when handling icmp
sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was
set, as pmtu_pending could be set unconditionally during icmp handling
if the socket was in use by the application.

This patch fixes it by checking for PMTU_DISABLED when handling such
deferred updates.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
6e91b578bf sctp: re-use sctp_transport_pmtu in sctp_transport_route
sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.

This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.

Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
22d7be267e sctp: remove sctp_transport_pmtu_check
We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, let's remove it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
6ff0f871c2 sctp: introduce sctp_dst_mtu
Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.
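
The two guarantees can be sketched as a clamp followed by a round-down.
The value 512 for the minimum and the 4-byte alignment are assumptions
made for this illustration, and clamp_align_mtu() is a hypothetical name.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stdint.h>

/* Assumed minimum for this sketch; the commit only names the constant. */
#define MIN_SEGMENT 512u

/* The minimum must itself be aligned, or rounding could undercut it. */
static_assert(MIN_SEGMENT % 4 == 0, "minimum must be 4-byte aligned");

/* Clamp mtu to the minimum, then round down to a multiple of 4. */
static uint32_t clamp_align_mtu(uint32_t mtu)
{
	if (mtu < MIN_SEGMENT)
		mtu = MIN_SEGMENT;
	return mtu & ~3u;
}
```

Clamping before rounding keeps the result at or above the minimum for
every input.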

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
2521680e18 sctp: remove sctp_assoc_pending_pmtu
No need for this helper.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
2f5e3c9df6 sctp: introduce sctp_assoc_update_frag_point
and avoid the open-coded versions of it.

Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
feddd6c1af sctp: introduce sctp_mtu_payload
When given a MTU, this function calculates how much payload we can carry
on it. Without a MTU, it calculates the amount of header overhead we
have.

So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.
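
A minimal sketch of the calculation, assuming the overhead is the network
header plus the 12-byte SCTP common header plus any extra bytes. The
function name and parameter list are hypothetical and simpler than the
kernel helper.

```c
#include <assert.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN 12u	/* size of the SCTP common header */

/*
 * With a nonzero mtu, return how much payload fits in it.
 * With mtu == 0, return the header overhead itself.
 */
static uint32_t mtu_payload(uint32_t mtu, uint32_t net_hdr_len, uint32_t extra)
{
	uint32_t overhead = net_hdr_len + SCTP_COMMON_HDR_LEN + extra;

	/* A nonzero MTU is expected to exceed the headers it must carry. */
	assert(mtu == 0 || mtu > overhead);

	return mtu ? mtu - overhead : overhead;
}
```

For a 1500-byte MTU over IPv4 (20-byte header, no options) this gives
1468 bytes of payload; adding 8 bytes of extra overhead lowers it to 1460.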

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:23 -04:00
Marcelo Ricardo Leitner
c4b2893dae sctp: introduce sctp_assoc_set_pmtu
All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:22 -04:00
Marcelo Ricardo Leitner
c88da20f95 sctp: remove an if() that is always true
As noticed by Xin Long, the if() here is always true as PMTU can never
be 0.

Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:22 -04:00
Marcelo Ricardo Leitner
800e00c127 sctp: move transport pathmtu calc away of sctp_assoc_add_peer
There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:35:22 -04:00
Wei Wang
c36207bd87 tcp: remove mss check in tcp_select_initial_window()
In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.

In a TCP throughput test where receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:05:36 -04:00
Ursula Braun
abb190f194 net/smc: handle sockopt TCP_DEFER_ACCEPT
If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till
data is available.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:52 -04:00
Ursula Braun
01d2f7e2cd net/smc: sockopts TCP_NODELAY and TCP_CORK
Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK
triggers data transfer.

For a corked SMC socket RDMA writes are deferred, if there is
still sufficient send buffer space available.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:52 -04:00
Ursula Braun
ee9dfbef02 net/smc: handle sockopts forcing fallback
Several TCP sockopts do not work for SMC. One example is the
TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP
three-way-handshake.
If the SMC socket is still in state SMC_INIT, such sockopts trigger
fallback to TCP. Otherwise an error is returned.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:52 -04:00
Karsten Graul
3382576106 net/smc: fix structure size
The struct smc_cdc_msg must be defined as packed so the
size is 44 bytes.
And change the structure size check so sizeof is checked.
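
The effect of packing, and a size check based on sizeof, can be
illustrated with a much smaller structure. struct wire_msg below is
hypothetical and unrelated to the real layout of struct smc_cdc_msg; the
packed attribute is the GCC/Clang spelling.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire message: 1 + 4 + 2 = 7 bytes of fields. */
struct wire_msg {
	uint8_t  type;
	uint32_t seqno;
	uint16_t flags;
} __attribute__((packed));

/* Same fields without the attribute; the compiler may insert padding. */
struct wire_msg_padded {
	uint8_t  type;
	uint32_t seqno;
	uint16_t flags;
};

#define WIRE_MSG_SIZE 7

/* Check the real size of the type rather than a hand-maintained constant. */
static_assert(sizeof(struct wire_msg) == WIRE_MSG_SIZE,
	      "wire_msg must match the on-wire size");

static size_t wire_msg_size(void)
{
	return sizeof(struct wire_msg);
}

static size_t padded_size(void)
{
	return sizeof(struct wire_msg_padded);
}
```

On common ABIs the unpacked variant is larger than 7 bytes, which is the
kind of mismatch a sizeof-based check catches at build time.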

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:02:51 -04:00
Linus Torvalds
64ebe3126c Merge tag 'ceph-for-4.17-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
 "A CephFS quota follow-up and fixes for two older issues in the
  messenger layer, marked for stable"

* tag 'ceph-for-4.17-rc3' of git://github.com/ceph/ceph-client:
  libceph: validate con->state at the top of try_write()
  libceph: reschedule a tick in finish_hunting()
  libceph: un-backoff on tick when we have a authenticated session
  ceph: check if mds create snaprealm when setting quota
2018-04-27 10:56:29 -07:00
Kirill Tkhai
3f5ecd8a90 net: Fix coccinelle warning
kbuild test robot says:

  >coccinelle warnings: (new ones prefixed by >>)
  >>> net/core/dev.c:1588:2-3: Unneeded semicolon

So, let's remove it.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 13:53:14 -04:00
Xin Long
6a9a27d539 sctp: clear the new asoc's stream outcnt in sctp_stream_update
When processing a duplicate cookie-echo chunk, sctp moves the new
temp asoc's stream out/in into the old asoc, and later frees this
new temp asoc.

But now after this move, the new temp asoc's stream->outcnt is not
cleared while stream->out is set to NULL, which would cause the same
crash as the one fixed in Commit 79d0895140 ("sctp: fix error
path in sctp_stream_init") when freeing this asoc later.

This fix is to clear this outcnt in sctp_stream_update.

Fixes: f952be79ce ("sctp: introduce struct sctp_stream_out_ext")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 13:34:34 -04:00
Xin Long
d625329b06 sctp: handle two v4 addrs comparison in sctp_inet6_cmp_addr
Since sctp ipv6 socket also supports v4 addrs, it's possible to
compare two v4 addrs in pf v6 .cmp_addr, sctp_inet6_cmp_addr.

However after Commit 1071ec9d45 ("sctp: do not check port in
sctp_inet6_cmp_addr"), it no longer calls af1->cmp_addr, which
in this case is sctp_v4_cmp_addr, but calls __sctp_v6_cmp_addr
where it handles them as two v6 addrs. It would cause an
out-of-bounds crash.

syzbot found this crash when trying to bind two v4 addrs to a
v6 socket.

This patch fixes it by adding the process for two v4 addrs in
sctp_inet6_cmp_addr.
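
The idea of dispatching on the address family before comparing can be
sketched in user space as follows. union addr is a simplified stand-in
for the kernel's address union, addr_equal() and addr_from_v4() are
hypothetical helpers, and mixed-family comparison is deliberately left
out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified stand-in for a union holding either address family. */
union addr {
	struct sockaddr     sa;
	struct sockaddr_in  v4;
	struct sockaddr_in6 v6;
};

/* Compare addresses only (ports ignored), dispatching on the family. */
static bool addr_equal(const union addr *a, const union addr *b)
{
	assert(a != NULL && b != NULL);

	if (a->sa.sa_family != b->sa.sa_family)
		return false;	/* mixed families are out of scope here */

	if (a->sa.sa_family == AF_INET)
		/* Two v4 addresses: compare 4 bytes, never read v6 fields. */
		return a->v4.sin_addr.s_addr == b->v4.sin_addr.s_addr;

	if (a->sa.sa_family == AF_INET6)
		return memcmp(&a->v6.sin6_addr, &b->v6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;

	return false;
}

/* Build a v4 entry from dotted-quad text; false on parse failure. */
static bool addr_from_v4(union addr *out, const char *text)
{
	memset(out, 0, sizeof(*out));
	out->v4.sin_family = AF_INET;
	return inet_pton(AF_INET, text, &out->v4.sin_addr) == 1;
}
```

Treating a v4 entry as v6 would read 16 address bytes from a structure
that only carries 4, which is the class of bug the patch addresses.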

Fixes: 1071ec9d45 ("sctp: do not check port in sctp_inet6_cmp_addr")
Reported-by: syzbot+cd494c1dd681d4d93ebb@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 13:21:50 -04:00
YueHaibing
d8fb1648fc bridge: use hlist_entry_safe
Use hlist_entry_safe() instead of open-coding it.
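
For readers unfamiliar with the helper, the pattern it replaces is a NULL
check followed by a container_of() conversion. The user-space sketch below
is modeled on that idea with a simplified struct hlist_node; struct item
and item_value_or() are hypothetical, and the macro relies on GCC/Clang
statement expressions.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>

/* Simplified node; the kernel's version carries an extra back pointer. */
struct hlist_node {
	struct hlist_node *next;
};

/* Recover the enclosing structure from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like container_of(), but yields NULL when the node pointer is NULL. */
#define hlist_entry_safe(ptr, type, member) \
	({ __typeof__(ptr) ____ptr = (ptr); \
	   ____ptr ? container_of(____ptr, type, member) : NULL; })

/* Hypothetical element type used only for this illustration. */
struct item {
	int value;
	struct hlist_node node;
};

/* node is deliberately not first, so the offset arithmetic matters. */
static_assert(offsetof(struct item, node) != 0, "node must not be first");

/* Value of the item embedding n, or fallback when n is NULL. */
static int item_value_or(struct hlist_node *n, int fallback)
{
	struct item *it = hlist_entry_safe(n, struct item, node);

	return it ? it->value : fallback;
}
```

The open-coded equivalent is `n ? container_of(n, struct item, node) : NULL`
repeated at every call site.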

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 13:20:48 -04:00
Florian Fainelli
cf96357303 net: dsa: Allow providing PHY statistics from CPU port
Implement the same type of ethtool diversion that we have for
ETH_SS_STATS and make it work with ETH_SS_PHY_STATS. This allows
providing PHY level statistics for CPU ports that are directly
connecting to a PHY device.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:03 -04:00
Florian Fainelli
6207a78c09 net: dsa: Add helper function to obtain PHY device of a given port
In preparation for having more call sites attempting to obtain a
reference against a PHY device corresponding to a particular port,
introduce a helper function for that purpose.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:03 -04:00
Florian Fainelli
89f0904834 net: dsa: Pass stringset to ethtool operations
Up until now we largely assumed that we were interested in ETH_SS_STATS
type of strings for all ethtool operations, this is about to change with
the introduction of additional string sets, e.g: ETH_SS_PHY_STATS.
Update all functions to take an appropriate stringset argument and act
on it when it is different than ETH_SS_STATS for now.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:03 -04:00
Florian Fainelli
1d1e79f1c6 net: dsa: Do not check for ethtool_ops validity
This is completely redundant with what netdev_set_default_ethtool_ops()
does, we are always guaranteed to have a valid dev->ethtool_ops pointer,
however, within that structure, not all function calls may be populated,
so we still have to check them individually.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:02 -04:00
Florian Fainelli
9994338227 net: Allow network devices to have PHY statistics
Add a new callback: get_ethtool_phy_stats() which allows network device
drivers not making use of the PHY library to return PHY statistics.
Update ethtool_get_phy_stats(), __ethtool_get_sset_count() and
__ethtool_get_strings() accordingly to interrogate the network device
about ETH_SS_PHY_STATS.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:02 -04:00
Florian Fainelli
c59530d0d5 net: Move PHY statistics code into PHY library helpers
In order to make it possible for network device drivers that do not
necessarily have a phy_device attached, but still report PHY statistics,
have a preliminary refactoring consisting in creating helper functions
that encapsulate the PHY device driver knowledge within PHYLIB.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:53:02 -04:00
Yuchung Cheng
16ae6aa170 tcp: ignore Fast Open on repair mode
The TCP repair sequence of operations is to first set the socket in
repair mode, then inject the TCP stats into the socket with repair
socket options, then call connect() to re-activate the socket. The
connect syscall simply returns and sets the state to ESTABLISHED.
As a result Fast Open is meaningless for TCP repair.

However allowing sendto() system call with MSG_FASTOPEN flag half-way
during the repair operation could unexpectedly cause data to be
sent, before the operation finishes changing the internal TCP stats
(e.g. MSS).  This in turn triggers TCP warnings on inconsistent
packet accounting.

The fix is to simply disallow Fast Open operation once the socket
is in the repair mode.

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:49:31 -04:00
Guillaume Nault
8349440733 l2tp: consistent reference counting in procfs and debugfs
The 'pppol2tp' procfs and 'l2tp/tunnels' debugfs files handle reference
counting of sessions differently than for tunnels.

For consistency, use the same mechanism for handling both sessions and
tunnels. That is, drop the reference on the previous session just
before looking up the next one (rather than in .show()). If necessary
(if dump stops before *_next_session() returns NULL), drop the last
reference in .stop().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:06:35 -04:00
Jon Maloy
3e5cf362c3 tipc: introduce ioctl for fetching node identity
After the introduction of a 128-bit node identity it may be difficult
for a user to correlate between this identity and the generated node
hash address.

We now try to make this easier by introducing a new ioctl() call for
fetching a node identity by using the hash value as key. This will
be particularly useful when we extend some of the commands in the
'tipc' tool, but we also expect regular user applications to need
this feature.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:05:41 -04:00
Jon Maloy
7dbc73e612 tipc: fix bug in function tipc_nl_node_dump_monitor
Commit 36a50a989e ("tipc: fix infinite loop when dumping link monitor
summary") intended to fix a problem with user tool looping when max
number of bearers are enabled.

Unfortunately, the wrong version of the commit was posted, so the
problem was not solved at all.

This commit adds the missing part.

Fixes: 36a50a989e ("tipc: fix infinite loop when dumping link monitor summary")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 11:03:56 -04:00
Stefano Brivio
b4331a6818 vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too
A vti6 interface can carry IPv4 as well, so it makes no sense to
enforce a minimum MTU of IPV6_MIN_MTU.

If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be
disabled on the interface, courtesy of addrconf_notify().

Reported-by: Xin Long <lucien.xin@gmail.com>
Fixes: b96f9afee4 ("ipv4/6: use core net MTU range checking")
Fixes: c6741fbed6 ("vti6: Properly adjust vti6 MTU from MTU of lower device")
Fixes: 53c81e95df ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-04-27 07:29:23 +02:00
David S. Miller
79741a38b4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-04-27

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add extensive BPF helper description into include/uapi/linux/bpf.h
   and a new script bpf_helpers_doc.py which allows for generating a
   man page out of it. Thus, every helper in BPF now comes with proper
   function signature, detailed description and return code explanation,
   from Quentin.

2) Migrate the BPF collect metadata tunnel tests from BPF samples over
   to the BPF selftests and further extend them with v6 vxlan, geneve
   and ipip tests, simplify the ipip tests, improve documentation and
   convert to bpf_ntoh*() / bpf_hton*() api, from William.

3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only
   access stack and packet memory. Extend this to allow such helpers
   to also use map values, which enabled use cases where value from
   a first lookup can be directly used as a key for a second lookup,
   from Paul.

4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in
   order to retrieve XFRM state information containing SPI, peer
   address and reqid values, from Eyal.

5) Various optimizations in nfp driver's BPF JIT in order to turn ADD
   and SUB instructions with negative immediate into the opposite
   operation with a positive immediate such that nfp can better fit
   small immediates into instructions. Savings in instruction count
   up to 4% have been observed, from Jakub.

6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info
   and add support for dumping this through bpftool, from Jiri.

7) Move the BPF sockmap samples over into BPF selftests instead since
   sockmap was rather a series of tests than sample anyway and this way
   this can be run from automated bots, from John.

8) Follow-up fix for bpf_adjust_tail() helper in order to make it work
   with generic XDP, from Nikita.

9) Some follow-up cleanups to BTF, namely, removing unused defines from
   BTF uapi header and renaming 'name' struct btf_* members into name_off
   to make it more clear they are offsets into string section, from Martin.

10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since
    not run directly but invoked from test_sock_addr.sh, from Yonghong.

11) Remove redundant ret assignment in sample BPF loader, from Wang.

12) Add couple of missing files to BPF selftest's gitignore, from Anders.

There are two trivial merge conflicts while pulling:

  1) Remove samples/sockmap/Makefile since all sockmap tests have been
     moved to selftests.
  2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the
     file since git should ignore all of them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-26 21:19:50 -04:00
Florian Westphal
2f99aa31cd netfilter: nf_tables: skip synchronize_rcu if transaction log is empty
After processing the transaction log, the remaining entries of the log
need to be released.

However, in some cases no entries remain, e.g. because the transaction
did not remove anything.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:40:12 +02:00
Florian Westphal
dceb48d86b netfilter: x_tables: check name length in find_match/target, too
ebtables uses find_match() rather than find_request_match in one case
(see bcf4934288,
 "netfilter: ebtables: Fix extension lookup with identical name"), so
 extend the check on name length to those functions too.
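
A name-length check of this kind amounts to requiring a terminating NUL
inside the fixed-size name buffer. The sketch below is a generic
illustration with a hypothetical name_fits() helper; the buffer size is
passed in because the kernel constant is not quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when name is NUL-terminated within its first buf_len bytes,
 * i.e. when it fits a buffer of that size including the terminator.
 * Never reads past buf_len bytes.
 */
static bool name_fits(const char *name, size_t buf_len)
{
	assert(name != NULL);

	for (size_t i = 0; i < buf_len; i++)
		if (name[i] == '\0')
			return true;
	return false;
}
```

A lookup function would reject the request up front when name_fits()
returns false, before comparing the name against registered extensions.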

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:40:11 +02:00
Jozsef Kadlecsik
72d4d3e398 netfilter: Fix handling simultaneous open in TCP conntrack
Dominique Martinet reported a TCP hang problem when simultaneous open was used.
The problem is that the tcp_conntracks state table is not smart enough
to handle the case. The state table could be fixed by introducing a new state,
but that would require more lines of code compared to this patch, due to the
required backward compatibility with ctnetlink.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:39:29 +02:00
Cong Wang
8b2ebb6cf0 ipvs: initialize tbl->entries in ip_vs_lblc_init_svc()
Similarly, tbl->entries is not initialized after kmalloc(),
therefore causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.

Reported-by: <syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:20:33 +02:00
Cong Wang
3aa1409a7b ipvs: initialize tbl->entries after allocation
tbl->entries is not initialized after kmalloc(), therefore
causes an uninit-value warning in ip_vs_lblc_check_expire()
as reported by syzbot.

Reported-by: <syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:20:33 +02:00
Pablo Neira Ayuso
146cd6b5d5 Merge tag 'ipvs-for-v4.18' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Simon Horman says:

====================
IPVS Updates for v4.18

please consider these IPVS enhancements for v4.18.

* Whitespace cleanup

* Add Maglev hashing algorithm as an IPVS scheduler

  Inju Song says "Implements the Google's Maglev hashing algorithm as a
  IPVS scheduler.  Basically it provides consistent hashing but offers some
  special features about disruption and load balancing.

  1) minimal disruption: when the set of destinations changes,
     a connection will likely be sent to the same destination
     as it was before.

  2) load balancing: each destination will receive an almost
     equal number of connections.

 See also: [3.4 Consistent Hashing] in
 https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
 "

* Fix to correct implementation of Knuth's multiplicative hashing
  which is used in sh/dh/lblc/lblcr algorithms. Instead the
  implementation provided by the hash_32() macro is used.
====================
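
For reference, multiplicative hashing in the style of hash_32() multiplies
by a golden-ratio constant and keeps the top bits of the product. The
sketch below is a user-space approximation; hash32() is a hypothetical
name, and the constant is the 32-bit golden-ratio value used for this
purpose.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit golden-ratio multiplier for multiplicative hashing. */
#define GOLDEN_RATIO_32 0x61C88647u

/* Multiply, then keep the top `bits` bits (1 <= bits <= 32). */
static uint32_t hash32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);

	return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}
```

Taking the high bits matters: the low bits of the product depend only on
the low bits of the input, so masking them instead gives a poor spread.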

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:16:14 +02:00
Florian Westphal
d0103158cf netfilter: nf_tables: merge exthdr expression into nft core
before:
   text    data     bss     dec     hex filename
   5056     844       0    5900    170c net/netfilter/nft_exthdr.ko
 102456    2316     401  105173   19ad5 net/netfilter/nf_tables.ko

after:
 106410    2392     401  109203   1aa93 net/netfilter/nf_tables.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-27 00:00:56 +02:00