Commit Graph

948892 Commits

Author SHA1 Message Date
Steven Rostedt (VMware)
74e879373b ring-buffer: Move the add_timestamp into its own function
Make a helper function rb_add_timestamp() that moves the adding of the
extended time stamps into its own function. Also, remove the noinline and
inline for the functions it calls, as recent benchmarks appear they do not
make a difference (just let gcc decide).

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-01 22:12:06 -04:00
Steven Rostedt (VMware)
58fbc3c632 ring-buffer: Consolidate add_timestamp to remove some branches
Reorganize a little the logic to handle adding the absolute time stamp,
extended and forced time stamps, in such a way to remove a branch or two.
This is just a micro optimization.

Also add before and after time stamps to the rb_event_info structure to
display those values in the rb_check_timestamps() code, if something were to
go wrong.

Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-07-01 22:11:22 -04:00
Ronnie Sahlberg
19e888678b cifs: prevent truncation from long to int in wait_for_free_credits
The wait_event_... defines evaluate to long so we should not assign it an int as this may truncate
the value.

Reported-by: Marshall Midden <marshallmidden@gmail.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-07-01 20:01:26 -05:00
Helmut Grohne
e4b9a72d76 net: dsa: microchip: enable ksz9893 via i2c in the ksz9477 driver
The KSZ9893 3-Port Gigabit Ethernet Switch can be controlled via SPI,
I²C or MDIO (very limited and not supported by this driver). While there
is already a compatible entry for the SPI bus, it was missing for I²C.

Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:48:47 -07:00
David S. Miller
23212a7007 Merge branch 'mptcp-add-receive-buffer-auto-tuning'
Florian Westphal says:

====================
mptcp: add receive buffer auto-tuning

First patch extends the test script to allow for reproducible results.
Second patch adds receive auto-tuning.  Its based on what TCP is doing,
only difference is that we use the largest RTT of any of the subflows
and that we will update all subflows with the new value.

Else, we get spurious packet drops because the mptcp work queue might
not be able to move packets from subflow socket to master socket
fast enough.  Without the adjustment, TCP may drop the packets because
the subflow socket is over its rcvbuffer limit.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:47:55 -07:00
Florian Westphal
a6b118febb mptcp: add receive buffer auto-tuning
When mptcp is used, userspace doesn't read from the tcp (subflow)
socket but from the parent (mptcp) socket receive queue.

skbs are moved from the subflow socket to the mptcp rx queue either from
'data_ready' callback (if mptcp socket can be locked), a work queue, or
the socket receive function.

This means tcp_rcv_space_adjust() is never called and thus no receive
buffer size auto-tuning is done.

An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
function that moves skbs from subflow to mptcp socket.
While this enabled autotuning, it also meant tuning was done even if
userspace was reading the mptcp socket very slowly.

This adds mptcp_rcv_space_adjust() and calls it after userspace has
read data from the mptcp socket rx queue.

Its very similar to tcp_rcv_space_adjust, with two differences:

1. The rtt estimate is the largest one observed on a subflow
2. The rcvbuf size and window clamp of all subflows is adjusted
   to the mptcp-level rcvbuf.

Otherwise, we get spurious drops at tcp (subflow) socket level if
the skbs are not moved to the mptcp socket fast enough.

Before:
time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
[..]
ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration 40823ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration 23119ms) [ OK ]
ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5421ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration 41446ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration 23427ms) [ OK ]
ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5426ms) [ OK ]
Time: 1396 seconds

After:
ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration  5417ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5427ms) [ OK ]
ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5422ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration  5415ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5422ms) [ OK ]
ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5423ms) [ OK ]
Time: 296 seconds

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:47:55 -07:00
Florian Westphal
767659f650 selftests: mptcp: add option to specify size of file to transfer
The script generates two random files that are then sent via tcp and
mptcp connections.

In order to compare throughput over consecutive runs add an option
to provide the file size on the command line: "-f 128000".

Also add an option, -t, to enable tcp tests. This is useful to
compare throughput of mptcp connections and tcp connections.

Example: run tests with a 4mb file size, 300ms delay 0.01% loss,
default gso/tso/gro settings and with large write/blocking io:

mptcp_connect.sh -t -f $((4 * 1024 * 1024)) -d 300 -l 0.01%  -r 0 -e "" -m mmap

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:47:55 -07:00
Eric Dumazet
ba3bb0e76c tcp: fix SO_RCVLOWAT possible hangs under high mem pressure
Whenever tcp_try_rmem_schedule() returns an error, we are under
trouble and should make sure to wakeup readers so that they
can drain socket queues and eventually make room.

Fixes: 03f45c883c ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:46:04 -07:00
Danny Lin
b97e9d9d67 net: sched: Allow changing default qdisc to FQ-PIE
Similar to fq_codel and the other qdiscs that can set as default,
fq_pie is also suitable for general use without explicit configuration,
which makes it a valid choice for this.

This is useful in situations where a painless out-of-the-box solution
for reducing bufferbloat is desired but fq_codel is not necessarily the
best choice. For example, fq_pie can be better for DASH streaming, but
there could be more cases where it's the better choice of the two simple
AQMs available in the kernel.

Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:43:27 -07:00
Zhang Xiaoxu
9ffad9263b cifs: Fix the target file was deleted when rename failed.
When xfstest generic/035, we found the target file was deleted
if the rename return -EACESS.

In cifs_rename2, we unlink the positive target dentry if rename
failed with EACESS or EEXIST, even if the target dentry is positived
before rename. Then the existing file was deleted.

We should just delete the target file which created during the
rename.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:41:56 -05:00
David S. Miller
d8c8a96ce5 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2020-07-01

This series contains updates to all Intel drivers, but a majority of the
changes are to the i40e driver.

Jeff converts 'fall through' comments to the 'fallthrough;' keyword for
all Intel drivers. Removed unnecessary delay in the ixgbe ethtool
diagnostics test.

Arkadiusz implements Total Port Shutdown for i40e. This is the revised
patch based on Jakub's feedback from an earlier submission of this
patch, where additional code comments and description was needed to
describe the functionality.

Wei Yongjun fixes return error code for iavf_init_get_resources().

Magnus optimizes XDP code in i40e; starting with AF_XDP zero-copy
transmit completion path. Then by only executing a division when
necessary in the napi_poll data path. Move the check for transmit ring
full outside the send loop to increase performance.

Ciara add XDP ring statistics to i40e and the ability to dump these
statistics and descriptors.

Tony fixes reporting iavf statistics.

Radoslaw adds support for 2.5 and 5 Gbps by implementing the newer ethtool
ksettings API in ixgbe.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:41:56 -07:00
Paul Aurich
5391b8e1b7 SMB3: Honor 'posix' flag for multiuser mounts
The flag from the primary tcon needs to be copied into the volume info
so that cifs_get_tcon will try to enable extensions on the per-user
tcon. At that point, since posix extensions must have already been
enabled on the superblock, don't try to needlessly adjust the mount
flags.

Fixes: ce558b0e17 ("smb3: Add posix create context for smb3.11 posix mounts")
Fixes: b326614ea2 ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:41:36 -05:00
Paul Aurich
6b356f6cf9 SMB3: Honor 'handletimeout' flag for multiuser mounts
Fixes: ca567eb2b3 ("SMB3: Allow persistent handle timeout to be configurable on mount")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:40:33 -05:00
Paul Aurich
ad35f169db SMB3: Honor lease disabling for multiuser mounts
Fixes: 3e7a02d478 ("smb3: allow disabling requesting leases")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:40:17 -05:00
Paul Aurich
00dfbc2f9c SMB3: Honor persistent/resilient handle flags for multiuser mounts
Without this:

- persistent handles will only be enabled for per-user tcons if the
  server advertises the 'Continuous Availabity' capability
- resilient handles would never be enabled for per-user tcons

Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:40:06 -05:00
Paul Aurich
cc15461c73 SMB3: Honor 'seal' flag for multiuser mounts
Ensure multiuser SMB3 mounts use encryption for all users' tcons if the
mount options are configured to require encryption. Without this, only
the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user
tcons would only be encrypted if the server was configured to require
encryption.

Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:38:46 -05:00
Willem de Bruijn
0da7536fb4 ip: Fix SO_MARK in RST, ACK and ICMP packets
When no full socket is available, skbs are sent over a per-netns
control socket. Its sk_mark is temporarily adjusted to match that
of the real (request or timewait) socket or to reflect an incoming
skb, so that the outgoing skb inherits this in __ip_make_skb.

Introduction of the socket cookie mark field broke this. Now the
skb is set through the cookie and cork:

<caller>		# init sockc.mark from sk_mark or cmsg
ip_append_data
  ip_setup_cork		# convert sockc.mark to cork mark
ip_push_pending_frames
  ip_finish_skb
    __ip_make_skb	# set skb->mark to cork mark

But I missed these special control sockets. Update all callers of
__ip(6)_make_skb that were originally missed.

For IPv6, the same two icmp(v6) paths are affected. The third
case is not, as commit 92e55f412c ("tcp: don't annotate
mark on control socket from tcp_v6_send_response()") replaced
the ctl_sk->sk_mark with passing the mark field directly as a
function argument. That commit predates the commit that
introduced the bug.

Fixes: c6af0c227a ("ip: support SO_MARK cmsg")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:38:30 -07:00
Paul Aurich
aadd69cad0 cifs: Display local UID details for SMB sessions in DebugData
This is useful for distinguishing SMB sessions on a multiuser mount.

Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2020-07-01 19:38:19 -05:00
Eric Dumazet
e114e1e8ac tcp: md5: do not send silly options in SYNCOOKIES
Whenever cookie_init_timestamp() has been used to encode
ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK.

Otherwise, tcp_synack_options() will still advertize options like WSCALE
that we can not deduce later when receiving the packet from the client
to complete 3WHS.

Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet,
but we can not know for sure that all TCP stacks have the same logic.

Before the fix a tcpdump would exhibit this wrong exchange :

10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0
10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0
10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0
10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12
10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0

After this patch the exchange looks saner :

11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0
11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0
11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0
11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12
11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0
11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12
11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0

Of course, no SACK block will ever be added later, but nothing should break.
Technically, we could remove the 4 nops included in MD5+TS options,
but again some stacks could break seeing not conventional alignment.

Fixes: 4957faade1 ("TCPCT part 1g: Responder Cookie => Initiator")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:36:23 -07:00
Rao Shoaib
9ef845f894 rds: If one path needs re-connection, check all and re-connect
In testing with mprds enabled, Oracle Cluster nodes after reboot were
not able to communicate with others nodes and so failed to rejoin
the cluster. Peers with lower IP address initiated connection but the
node could not respond as it choose a different path and could not
initiate a connection as it had a higher IP address.

With this patch, when a node sends out a packet and the selected path
is down, all other paths are also checked and any down paths are
re-connected.

Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com>
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com>
Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:35:17 -07:00
Eric Dumazet
e6ced831ef tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers
My prior fix went a bit too far, according to Herbert and Mathieu.

Since we accept that concurrent TCP MD5 lookups might see inconsistent
keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb()

Clearing all key->key[] is needed to avoid possible KMSAN reports,
if key->keylen is increased. Since tcp_md5_do_add() is not fast path,
using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler.

data_race() was added in linux-5.8 and will prevent KCSAN reports,
this can safely be removed in stable backports, if data_race() is
not yet backported.

v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add()

Fixes: 6a2febec33 ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Marco Elver <elver@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:29:45 -07:00
David S. Miller
11a20c7152 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2020-07-01

This series contains updates to the ice driver only.

Jacob implements a devlink region for device capabilities.

Bruce removes structs containing only one-element arrays that are either
unused or only used for indexing. Instead, use pointer arithmetic or
other indexing to access the elements. Converts "C struct hack"
variable-length types to the preferred C99 flexible array member.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 17:25:00 -07:00
Bruce Allan
66486d8943 ice: replace single-element array used for C struct hack
Convert the pre-C90-extension "C struct hack" method (using a single-
element array at the end of a structure for implementing variable-length
types) to the preferred use of C99 flexible array member.

Additional code cleanups were done near areas affected by this change.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01 16:35:23 -07:00
Bruce Allan
b3c3890489 ice: avoid unnecessary single-member variable-length structs
There are a number of structures that consist of a one-element array as the
only struct member.  Some of those are unused so remove them. Others are
used to index into a buffer/array consisting of a variable number of a
different data or structure type.  Those are unnecessary since we can use
simple pointer arithmetic or index directly into the buffer to access
individual elements of the buffer/array.

Additional code cleanups were done near areas affected by this change.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01 16:33:29 -07:00
Matt Atwood
680c45c767 drm/i915/dp: Correctly advertise HBR3 for GEN11+
intel_dp_set_source_rates() calls intel_dp_is_edp(), which is unsafe to
use before encoder_type is set. This caused GEN11+ to incorrectly strip
HBR3 from source rates for edp. Move intel_dp_set_source_rates() to
after encoder_type is set. Add comment to intel_dp_is_edp() describing
unsafe usages.

v2: Alter intel_dp_set_source_rates final position (Ville/Manasi).
    Remove outdated comment (Ville).
    Slight optimization of control flow in intel_dp_init_connector.
    Slight rewording in commit message.

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200630233310.10191-1-matthew.s.atwood@intel.com
2020-07-01 16:24:45 -07:00
Mathieu Poirier
49cff12568 Revert "remoteproc: Add support for runtime PM"
This reverts commit a99a37f6cd.

Removing PM runtime operations from the remoteproc core in order to:

1) Keep all power management operations in platform drivers.  That way we
do not loose flexibility in an area that is very HW specific.

2) Avoid making the support for remote processor managed by external
entities more complex that it already is.

3) Fix regression introduced for the Omap remoteproc driver.

Acked-by: Suman Anna <s-anna@ti.com>
Tested-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200630163118.3830422-3-mathieu.poirier@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-07-01 16:05:32 -07:00
Mathieu Poirier
4605ad8f45 remoteproc: ingenic: Move clock handling to prepare/unprepare callbacks
This patch moves clock related operations to the remoteproc prepare()
and unprepare() callbacks so that the PM runtime framework doesn't
have to be involved needlessly.  This provides a simpler approach and
requires less code.

Based on the work from Paul Cercueil published here:
https://lore.kernel.org/linux-remoteproc/20191116170846.67220-4-paul@crapouillou.net/

Reviewed-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: https://lore.kernel.org/r/20200630163118.3830422-2-mathieu.poirier@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2020-07-01 16:05:18 -07:00
Jarod Wilson
a3b658cfb6 bonding: allow xfrm offload setup post-module-load
At the moment, bonding xfrm crypto offload can only be set up if the bonding
module is loaded with active-backup mode already set. We need to be able to
make this work with bonds set to AB after the bonding driver has already
been loaded.

So what's done here is:

1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used
by both bond_main.c and bond_options.c
2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than
only when loading in AB mode
3) wire up xfrmdev_ops universally too
4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB
5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a
performance hit from traversing into the underlying drivers
5) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call
netdev_change_features() from bond_option_mode_set()

In my local testing, I can change bonding modes back and forth on the fly,
have hardware offload work when I'm in AB, and see no performance penalty
to non-AB software encryption, despite having xfrm bits all wired up for
all modes now.

Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Reported-by: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:53:32 -07:00
Sean Tranchetti
1e82a62fec genetlink: remove genl_bind
A potential deadlock can occur during registering or unregistering a
new generic netlink family between the main nl_table_lock and the
cb_lock where each thread wants the lock held by the other, as
demonstrated below.

1) Thread 1 is performing a netlink_bind() operation on a socket. As part
   of this call, it will call netlink_lock_table(), incrementing the
   nl_table_users count to 1.
2) Thread 2 is registering (or unregistering) a genl_family via the
   genl_(un)register_family() API. The cb_lock semaphore will be taken for
   writing.
3) Thread 1 will call genl_bind() as part of the bind operation to handle
   subscribing to GENL multicast groups at the request of the user. It will
   attempt to take the cb_lock semaphore for reading, but it will fail and
   be scheduled away, waiting for Thread 2 to finish the write.
4) Thread 2 will call netlink_table_grab() during the (un)registration
   call. However, as Thread 1 has incremented nl_table_users, it will not
   be able to proceed, and both threads will be stuck waiting for the
   other.

genl_bind() is a noop, unless a genl_family implements the mcast_bind()
function to handle setting up family-specific multicast operations. Since
no one in-tree uses this functionality as Cong pointed out, simply removing
the genl_bind() function will remove the possibility for deadlock, as there
is no attempt by Thread 1 above to take the cb_lock semaphore.

Fixes: c380d9a7af ("genetlink: pass multicast bind/unbind to families")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:49:11 -07:00
Jacob Keller
8d7aab3515 ice: implement snapshot for device capabilities
Add a new devlink region used for capturing a snapshot of the device
capabilities buffer which is reported by the firmware over the AdminQ.
This information can useful in debugging driver and firmware
interactions.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-01 15:41:45 -07:00
David S. Miller
651f8bd4da Merge branch 'net-ipa-endpoint-configuration-updates'
Alex Elder says:

====================
net: ipa: endpoint configuration updates

This series updates code that configures IPA endpoints.  The changes
made mainly affect access to registers that are valid only for RX, or
only for TX endpoints.

The first three patches avoid writing endpoint registers if they are
not defined to be valid.  The fourth patch slightly modifies the
parameters for the offset macros used for these endpoint registers,
to make it explicit when only some endpoints are valid.

The last patch just tweaks one line of code so it uses a convention
used everywhere else in the driver.

Version 2 of this series eliminates some of the "assert()" comments
that Jakub inquired about.  The ones removed will actually go away
in an upcoming (not-yet-posted) patch series anyway.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:30:34 -07:00
Alex Elder
547c878854 net: ipa: HOL_BLOCK_EN_FMASK is a 1-bit mask
The convention throughout the IPA driver is to directly use
single-bit field mask values, rather than using (for example)
u32_encode_bits() to set or clear them.

Fix the one place that doesn't follow that convention, which sets
HOL_BLOCK_EN_FMASK in ipa_endpoint_init_hol_block_enable().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:30:34 -07:00
Alex Elder
8b97bcb7bb net: ipa: clarify endpoint register macro constraints
A handful of registers are valid only for RX endpoints, and some
others are valid only for TX endpoints.  For these endpoints, add
a comment above their defined offset macro that indicates the
endpoints to which they apply.

Extend the endpoint parameter naming convention as well, to make
these constraints more explicit.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:30:34 -07:00
Alex Elder
00b9102afa net: ipa: mode register is TX only
The INIT_MODE endpoint configuration register is only valid for TX
endpoints.  Rather than writing a zero to that register for RX
endpoints, avoid writing the register at all.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:30:34 -07:00
Alex Elder
9b63f09378 net: ipa: metadata_mask register is RX only
The INIT_HDR_METADATA_MASK endpoint configuration register is only
valid for RX endpoints.  Rather than writing a zero to that register
for TX endpoints, avoid writing the register at all.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:30:34 -07:00
Alex Elder
f8d34dfdf3 net: ipa: head-of-line block registers are RX only
The INIT_HOL_BLOCK_EN and INIT_HOL_BLOCK_TIMER endpoint registers
are only valid for RX endpoints.

Have ipa_endpoint_modem_hol_block_clear_all() skip writing these
registers for TX endpoints.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:30:33 -07:00
Fabio Estevam
0115e6c98c dt-bindings: clock: imx: Fix e-mail address
The freescale.com domain is gone for quite some time.

Use the nxp.com domain instead.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Link: https://lore.kernel.org/r/20200701005346.1008-1-festevam@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
2020-07-01 16:29:11 -06:00
David S. Miller
21ddff5c95 Merge branch 'net-ipa-small-improvements'
Alex Elder says:

====================
net: ipa: small improvements

This series contains two patches that improve the error output
that's reported when an error occurs while changing the state of a
GSI channel or event ring.  The first ensures all such error
conditions report an error, and the second simplifies the messages a
little and ensures they are all consistent.

A third (independent) patch gets rid of an unused symbol in the
microcontroller code.

Version 2 fixes two alignment problems pointed out by checkpatch.pl,
as requested by Jakub Kicinski.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:29:07 -07:00
Alex Elder
722208ea3e net: ipa: kill IPA_MEM_UC_OFFSET
The microcontroller shared memory area is at the beginning of the
IPA resident memory.  IPA_MEM_UC_OFFSET was defined as the offset
within that region where it's found, but it's 0, and it's never
actually used.  Just get rid of the definition, and move some of the
description it had to be above the definition of the ipa_uc_mem_area
structure.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:29:07 -07:00
Alex Elder
8463488af4 net: ipa: standarize more GSI error messages
Make minor updates to error messages reported in "gsi.c":
  - Use local variables to reduce multi-line function calls
  - Don't use parentheses in messages
  - Do some slight rewording in a few cases

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:29:07 -07:00
Alex Elder
a442b3c755 net: ipa: always report GSI state errors
We check the state of an event ring or channel both before and after
any GSI command issued that will change that state.  In most--but
not all--cases, if the state is something different than expected we
report an error message.

Add error messages where missing, so that all unexpected states
provide information about what went wrong.  Drop the parentheses
around the state value shown in all cases.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:29:07 -07:00
David S. Miller
6f6746d7ba Merge branch 'net-ipa-simple-refactorizations'
Alex Elder says:

====================
net: ipa: simple refactorizations

This series makes three small changes to some endpoint configuration
code.  The first uses a constant to represent the frequency of an
internal clock used for timers in the IPA.  The second modifies a
limit used so it matches Qualcomm's internal code.  And the third
reworks a few lines of code, eliminating a multi-line function call.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:27:09 -07:00
Alex Elder
9e88cb5ff7 net: ipa: reuse a local variable in ipa_endpoint_init_aggr()
Reuse the "limit" local variable in ipa_endpoint_init_aggr() when
setting the aggregation size limit.  Simple cleanup.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:27:09 -07:00
Alex Elder
1d86652b13 net: ipa: reduce aggregation time limit
Halve the time limit used when aggregation is enabled on an RX
endpoint, to half a millisecond.

Use DIV_ROUND_CLOSEST() to compute the value that represents the
time period, to get better accuracy in the event the time limit is
not an even multiple of the granularity.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:27:09 -07:00
Alex Elder
317a5740b7 net: ipa: rework ipa_aggr_granularity_val()
The timer used for aggregation makes use of an internal 32 KHz clock.
The granularity of the timer is programmed by a field whose value is
computed by ipa_aggr_granularity_val().  Redefine the way that value
is computed by using a new TIMER_FREQUENCY constant representing the
underlying clock frequency.

Add two BUILD_BUG_ON() calls to ensure the value used is valid.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:27:09 -07:00
David S. Miller
8c96439724 Merge branch 'add-XDP-support-to-xen-netfront'
Denis Kirjanov says:

====================
xen networking: add XDP support to xen-netfront

The first patch adds a new extra type to enable proper synchronization
between an RX request/response pair.
The second patch implements BFP interface for xen-netfront.
The third patch enables extra space for XDP processing.

v14:
- fixed compilation warnings

v13:
- fixed compilation due to previous rename

v12:
- xen-netback: rename netfront_xdp_headroom to xdp_headroom

v11:
- add the new headroom constant to netif.h
- xenbus_scanf check
- lock a bulk of puckets in xennet_xdp_xmit()

v10:
- add a new xen_netif_extra_info type to enable proper synchronization
 between an RX request/response pair.
- order local variable declarations

v9:
- assign an xdp program before switching to Reconfiguring
- minor cleanups
- address checkpatch issues

v8:
- add PAGE_POOL config dependency
- keep the state of XDP processing in netfront_xdp_enabled
- fixed allocator type in xdp_rxq_info_reg_mem_model()
- minor cleanups in xen-netback

v7:
- use page_pool_dev_alloc_pages() on page allocation
- remove the leftover break statement from netback_changed

v6:
- added the missing SOB line
- fixed subject

v5:
- split netfront/netback changes
- added a sync point between backend/frontend on switching to XDP
- added pagepool API

v4:
- added verbose patch descriprion
- don't expose the XDP headroom offset to the domU guest
- add a modparam to netback to toggle XDP offset
- don't process jumbo frames for now

v3:
- added XDP_TX support (tested with xdping echoserver)
- added XDP_REDIRECT support (tested with modified xdp_redirect_kern)
- moved xdp negotiation to xen-netback

v2:
- avoid data copying while passing to XDP
- tell xen-netback that we need the headroom space
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:25:14 -07:00
Denis Kirjanov
1c9535c701 xen networking: add XDP offset adjustment to xen-netback
the patch basically adds the offset adjustment and netfront
state reading to make XDP work on netfront side.

Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:25:14 -07:00
Denis Kirjanov
6c5aa6fc4d xen networking: add basic XDP support for xen-netfront
The patch adds a basic XDP processing to xen-netfront driver.

We ran an XDP program for an RX response received from netback
driver. Also we request xen-netback to adjust data offset for
bpf_xdp_adjust_head() header space for custom headers.

synchronization between frontend and backend parts is done
by using xenbus state switching:
Reconfiguring -> Reconfigured- > Connected

UDP packets drop rate using xdp program is around 310 kpps
using ./pktgen_sample04_many_flows.sh and 160 kpps without the patch.

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:25:14 -07:00
Denis Kirjanov
2cef30d7bd xen: netif.h: add a new extra type for XDP
The patch adds a new extra type to be able to diffirentiate
between RX responses on xen-netfront side with the adjusted offset
required for XDP processing.

The offset value from a guest is passed via xenstore.

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-01 15:25:14 -07:00
Alexei Starovoitov
91f77560e4 Merge branch 'test_progs-improvements'
Jesper Dangaard Brouer says:

====================
V3: Reorder patches to cause less code churn.

The BPF selftest 'test_progs' contains many tests, that cover all the
different areas of the kernel where BPF is used.  The CI system sees this
as one test, which is impractical for identifying what team/engineer is
responsible for debugging the problem.

This patchset add some options that makes it easier to create a shell
for-loop that invoke each (top-level) test avail in test_progs. Then each
test FAIL/PASS result can be presented the CI system to have a separate
bullet. (For Red Hat use-case in Beaker https://beaker-project.org/)

Created a public script[1] that uses these features in an advanced way.
Demonstrating howto reduce the number of (top-level) tests by grouping tests
together via using the existing test pattern selection feature, and then
using the new --list feature combined with exclude (-b) to get a list of
remaining test names that was not part of the groups.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/scripts/bpf_selftests_grouping.sh
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-07-01 15:22:13 -07:00