UDP GSO needs to export __udp_gso_segment to call it from ipv6.
I accidentally exported static ipv4 function __udp4_gso_segment.
Remove that EXPORT_SYMBOL_GPL.
Fixes: ee80d1ebe5 ("udp: add udp gso")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a documentation for seg_flowlabel sysctl into
Documentation/networking/ip-sysctl.txt
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
U64_MAX is well defined now while the UINT64_MAX is not, so we fall
back to drivers' own definition as below:
#ifndef UINT64_MAX
#define UINT64_MAX (u64)(~((u64)0))
#endif
I believe this is in one phy driver then copied and pasted to other phy
drivers.
Replace the UINT64_MAX with U64_MAX to clean up the source code.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use ns_to_timespec64() and timespec64_to_ns() instead of open coding
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a misc character device named "qrtr-tun" for the purpose
of allowing user space applications to implement endpoints in the qrtr
network.
This allows more advanced (and dynamic) testing of the qrtr code as well
as opens up the ability of tunneling qrtr over a network or USB link.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata says:
====================
selftests: Add tests for mirroring to gretap
This suite tests GRE-encapsulated mirroring. The general topology that
most of the tests use is as follows, but each test defines details of
the topology based on its needs, and some tests actually use a somewhat
different topology.
+---------------------+ +---------------------+
| H1 | | H2 |
| + $h1 | | $h2 + |
+-----|---------------+ +---------------|-----+
| |
+-----|------------------------------------------------------|-----+
| SW o---> mirror | |
| +---|------------------------------------------------------|---+ |
| | + $swp1 BR $swp2 + | |
| +--------------------------------------------------------------+ |
| |
| + $swp3 + gt6 (ip6gretap) + gt4 (gretap) |
+-----|----------------:--------------------:----------------------+
| : :
+-----|----------------:--------------------:----------------------+
| + $h3 + h3-gt6(ip6gretap) + h3-gt4 (gretap) |
| H3 |
+------------------------------------------------------------------+
The following axes of configuration space are tested:
- ingress and egress mirroring
- mirroring triggered by matchall and flower
- mirroring to ipgretap and ip6gretap
- remote tunnel reachable directly or through a next-hop route
- skip_sw as well as skip_hw configurations
Apart from basic tests with the above mentioned features, the following
tests are included:
- handling of changes to neighbors pertinent to routing decisions in
mirrored underlay
- handling of configuration changes at the mirrored-to tunnel (endpoint
addresses, upness)
A suite of mlxsw-specific tests will be part of a separate submission
through linux-mlxsw patch queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These tests set up mirroring in a situation that the configuration is
incorrect, i.e. mirrored packets, if any, are not supposed to reach
destination tunnel device. Then the configuration is rectified and
mirroring is checked to have started working.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that when a mirror to gretap or ip6gretap netdevice is configured,
changes to neighbors are reflected.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test for mirroring to a gretap and an ip6gretap netdevices such
that the mirroring action is triggered by a flower match.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test mirroring to a gretap and an ip6gretap netdevice with a bound
device, where the tunnel device and the bound device are in different
VRFs (an overlay / underlay configuration).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test mirror to a gretap and an ip6gretap netdevice such that the remote
address of the tunnel is reachable through a next-hop route.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test for basic mirroring to gretap and ip6gretap netdevices.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To simplify implementation of mirror-to-gretap tests, extend lib.sh with
several new functions that might potentially be useful more
broadly (although right now the mirroring tests will be the only
client).
Also add mirror_lib.sh with code useful for mirroring tests,
mirror_gre_lib.sh with code specifically useful for mirror-to-gretap
tests, and mirror_gre_topo.sh that primes a given test with a good
baseline topology that the test can then tweak to its liking.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Net-next updates.
This series has 3 main features. The first is to add mqprio TC to
hardware queue mapping to avoid reprogramming hardware CoS queue
watermarks during run-time. The second is DIM improvements from
Andy Gospo. The third is some improvements to VF resource allocations
when supporting large numbers of VFs with more limited resources.
There are some additional minor improvements and a new function level
discard counter.
v2: Fixed EEPROM typo noted by Andrew Lunn.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add logic to reserve default rings at driver open time if none was
reserved during probe time. This will happen when the PF driver did
not provision minimum rings to the VF, due to more limited resources.
Driver open will only succeed if some minimum rings can be reserved.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For completeness and correctness, the VF driver needs to reserve these
RSS and L2 contexts.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rings are more limited and the PF has not provisioned minimum
guaranteed rings to the VF, do not reserve rings during driver probe.
Wait till device open before reserving rings when they will be used.
Device open will succeed if some minimum rings can be successfully
reserved and allocated.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code does not reserve rings during ethtool -L when the device
is down. The rings will be reserved when the device is later opened.
Change it to reserve rings during ethtool -L when the device is down.
This provides a better guarantee that the device open will be successful
when the rings are reserved ahead of time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds debugfs support for bnxt_en with the purpose of allowing users
to examine the current DIM profile in use for each receive queue. This
was instrumental in debugging issues found with DIM and ensuring that
the profiles we expect to use are the profiles being used.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Testing with DIM enabled on older kernels indicated that firmware calls
were slower than expected. More detailed analysis indicated that the
default 25us delay was higher than necessary. Reducing the time spend in
usleep_range() for the first several calls would reduce the overall
latency of firmware calls on newer Intel processors.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This keeps the RING_IDLE flag set in hardware for higher coalesce
settings by default and improved latency.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware does not allow the operation and would return failure, causing
a warning in dmesg. So check for VF and disallow it in the driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add counters to display sum of rx/tx_discard_pkts of all rings as
function level statistics via ethtool.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace switch statements printing different messages for every ring type
with a common message.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Older firmware will reject this call and cause an error message to
be printed by the VF driver.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware messages that are forwarded from PF to VFs are encapsulated.
The size of these encapsulated messages must not exceed the maximum
defined message size. Add appropriate checks to avoid oversize
messages. Firmware messages may be expanded in future specs and
this will provide some guardrails to avoid data corruption.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initially, the MQPRIO TCs are mapped 1:1 directly to the hardware
queues. Some of these hardware queues are configured to be lossless.
When PFC is enabled on one of more TCs, we now need to remap the
TCs that have PFC enabled to the lossless hardware queues.
After remapping, we need to close and open the NIC for the new
mapping to take effect. We also need to reprogram all ETS parameters.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current driver maps MQPRIO traffic classes directly 1:1 to the
internal hardware queues (TC0 maps to hardware queue 0, etc). This
direct mapping requires the internal hardware queues to be reconfigured
from lossless to lossy and vice versa when necessary. This
involves reconfiguring internal buffer thresholds which is
disruptive and not always reliable.
Implement a new scheme to map TCs to internal hardware queues by
matching up their PFC requirements. This will eliminate the need
to reconfigure a hardware queue internal buffers at run time. After
remapping, the NIC is closed and opened for the new TC to hardware
queues to take effect.
This patch only adds the basic mapping logic.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calls up from the napi poll reading the receive ring had many
places where an argument was being recreated. I.e the caller already
had the value and wasn't passing it, then the callee would use
known relationship to determine the same value. Simpler and faster
to just pass arguments needed.
Also, add const in a couple places where message is being only read.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcelo Ricardo Leitner says:
====================
sctp: refactor MTU handling
Currently MTU handling is spread over SCTP stack. There are multiple
places doing same/similar calculations and updating them is error prone
as one spot can easily be left out.
This patchset converges it into a more concise and consistent code. In
general, it moves MTU handling from functions with bigger objectives,
such as sctp_assoc_add_peer(), to specific functions.
It's also a preparation for the next patchset, which removes the
duplication between sctp_make_op_error_space and
sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.
More details on each patch.
====================
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 6458 Section 8.1.16 says that setting MAXSEG as 0 means that the user
is not limiting it, and not that it should set to the *current* maximum,
as we are doing.
This patch thus allow setting it as 0, effectively removing the user
limit.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting SCTP_MAXSEG sock option, it should consider which kind of
data chunk is being used if the asoc is already available, so that the
limit better reflect reality.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_sendmsg() could trigger PMTU updates even when PMTU_DISABLED was
set, as pmtu_pending could be set unconditionally during icmp handling
if the socket was in use by the application.
This patch fixes it by checking for PMTU_DISABLED when handling such
deferred updates.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_transport_route currently is very similar to sctp_transport_pmtu plus
a few other bits.
This patch reuses sctp_transport_pmtu in sctp_transport_route and removes
the duplicated code.
Also, as all calls to sctp_transport_route were forcing the dst release
before calling it, let's just include such release too.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are now keeping the MTU information synced between asoc, transport
and dst, which makes the check at sctp_packet_config() not needed
anymore. As it was the sole caller to this function, lets remove it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Which makes sure that the MTU respects the minimum value of
SCTP_DEFAULT_MINSEGMENT and that it is correctly aligned.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
and avoid the open-coded versions of it.
Now sctp_datamsg_from_user can just re-use asoc->frag_point as it will
always be updated.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When given a MTU, this function calculates how much payload we can carry
on it. Without a MTU, it calculates the amount of header overhead we
have.
So that when we have extra overhead, like the one added for IP options
on SELinux patches, it is easier to handle it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All changes to asoc PMTU should now go through this wrapper, making it
easier to track them and to do other actions upon it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noticed by Xin Long, the if() here is always true as PMTU can never
be 0.
Reported-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There was only one case that sctp_assoc_add_peer couldn't handle, which
is when SPP_PMTUD_DISABLE is set and pathmtu not initialized.
So add this situation to sctp_transport_route and reuse what was
already in there.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This value is not used anywhere in the code. In essence it is a
duplicate of SCTP_DEFAULT_MINSEGMENT, which is used by the stack.
SCTP_MIN_PMTU value makes more sense, but we should not change to it now
as it would risk breaking applications.
So this patch removes SCTP_MIN_PMTU and adjust the comment above it.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A vti6 interface can carry IPv4 packets too.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tcp_select_initial_window(), we only set rcv_wnd to
tcp_default_init_rwnd() if current mss > (1 << wscale). Otherwise,
rcv_wnd is kept at the full receive space of the socket which is a
value way larger than tcp_default_init_rwnd().
With larger initial rcv_wnd value, receive buffer autotuning logic
takes longer to kick in and increase the receive buffer.
In a TCP throughput test where receiver has rmem[2] set to 125MB
(wscale is 11), we see the connection gets recvbuf limited at the
beginning of the connection and gets less throughput overall.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun says:
====================
smc fixes from 2018-04-17 - v3
in the mean time we challenged the benefit of these CLC handshake
optimizations for the sockopts TCP_NODELAY and TCP_CORK.
We decided to give up on them for now, since SMC still works
properly without.
There is now version 3 of the patch series with patches 2-4 implementing
sockopts that require special handling in SMC.
Version 3 changes
* no deferring of setsockopts TCP_NODELAY and TCP_CORK anymore
* allow fallback for some sockopts eliminating SMC usage
* when setting TCP_NODELAY always enforce data transmission
(not only together with corked data)
Version 2 changes of Patch 2/4 (and 3/4):
* return error -EOPNOTSUPP for TCP_FASTOPEN sockopts
* fix a kernel_setsockopt() usage bug by switching parameter
variable from type "u8" to "int"
* add return code validation when calling kernel_setsockopt()
* propagate a setsockopt error on the internal CLC socket
to the SMC socket.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If sockopt TCP_DEFER_ACCEPT is set, the accept is delayed till
data is available.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting sockopt TCP_NODELAY or resetting sockopt TCP_CORK
triggers data transfer.
For a corked SMC socket RDMA writes are deferred, if there is
still sufficient send buffer space available.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several TCP sockopts do not work for SMC. One example are the
TCP_FASTOPEN sockopts, since SMC-connection setup is based on the TCP
three-way-handshake.
If the SMC socket is still in state SMC_INIT, such sockopts trigger
fallback to TCP. Otherwise an error is returned.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct smc_cdc_msg must be defined as packed so the
size is 44 bytes.
And change the structure size check so sizeof is checked.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>