Commit Graph

781771 Commits

Author SHA1 Message Date
Manish Chopra
2ce9c93eac qede: Ingress tc flower offload (drop action) support.
The main motive of this patch is to lay down driver's
tc offload infrastructure in place.

With these changes tc can offload various supported flow
profiles (4 tuples, src-ip, dst-ip, l4 port) for the drop
action. Dropped flows statistic is a global counter for
all the offloaded flows for drop action and is populated
in ethtool statistics as common "gft_filter_drop".

Examples -

tc qdisc add dev p4p1 ingress
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
	skip_sw ip_proto tcp dst_ip 192.168.40.200 action drop
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
	skip_sw ip_proto udp src_ip 192.168.40.100 action drop
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
	skip_sw ip_proto tcp src_ip 192.168.40.100 dst_ip 192.168.40.200 \
	src_port 453 dst_port 876 action drop
tc filter add dev p4p1 protocol ipv4 parent ffff: flower \
	skip_sw ip_proto tcp dst_port 98 action drop

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:05:30 -07:00
Manish Chopra
91a56adbf1 qede: Add destination ip based flow profile.
This patch adds support for dropping and redirecting
the flows based on destination IP in the packet.

This also moves the profile mode settings in their own
functions which can be used through tc flows in successive
patch.

For example -

ethtool -N p5p1 flow-type tcp4 dst-ip 192.168.40.100 action -1
ethtool -N p5p1 flow-type udp4 dst-ip 192.168.50.100 action 1
ethtool -N p5p1 flow-type tcp4 dst-ip 192.168.60.100 action 0x100000000

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:05:30 -07:00
Manish Chopra
5e7baf0fcb qed/qede: Multi CoS support.
This patch adds support for tc mqprio offload,
using this different traffic classes on the adapter
can be utilized based on configured priority to tc map.

For example -

tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3

This will cause SKBs with priority 0,1,2,3 to transmit
over tc 0,1,2,3 hardware queues respectively.

Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:05:30 -07:00
David S. Miller
2eee32a74f Merge branch 's390-qeth-next'
Julian Wiedmann says:

====================
s390/qeth: updates 2018-08-09

one more set of patches for net-next. This is all sorts of cleanups and
more refactoring on the way to using netdev_priv. Please apply.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Gustavo A. R. Silva
1a363b0d3b s390/qeth: use true and false for boolean values
Return statements in functions returning bool should use true or false
instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Julian Wiedmann
f15cdaf237 s390/qeth: don't restrict qeth_card to DMA memory
Allocating the main qeth_card struct with GFP_DMA blocks us from moving
it into netdev_priv(). But the only reason why we need DMA memory is the
ccw1 structs embedded into each ccw channel. So extract those into
separate allocations, like we already do for the cmd buffers.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Julian Wiedmann
95f4d8b75a s390/qeth: clean up card initialization
The qeth_card struct is kzalloc-ed, so remove all the redundant
0-initializations. While at it, split up what's left of
qeth_determine_card_type().

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Julian Wiedmann
24142fd8d8 s390/qeth: do basic setup for data channel
The data channel currently doesn't need a setup operation, because we
don't use pre-allocated cmd buffers for its IO. But subsequent changes
will introduce further setup that also applies to the data channel.
This refactors things a bit, so that the new stuff can then be
automatically applied to all channels.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Julian Wiedmann
45ca2fd646 s390/qeth: use qeth_setup_ccw() to set up all CCWs
Re-work the helper a little bit, so that it can be used for all CCWs
that qeth issues.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Julian Wiedmann
750b162598 s390/qeth: reduce hard-coded access to ccw channels
Where possible use accessor macros and local pointers to access the ccw
channels. This makes it less likely to miss a spot.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Julian Wiedmann
73657a3e5b s390/qeth: extract helper for MPC protocol type
Just a little code deduplication.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 14:02:50 -07:00
Jens Axboe
61884de08f null_blk: add lock drop/acquire annotation
sparse complains:

drivers/block/null_blk_main.c:816:24: sparse: context imbalance in 'null_insert_page' - unexpected unlock

Fix it by adding the necessary annotations to the function.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-09 14:22:41 -06:00
Alex Williamson
51ba09452d PCI: Delay after FLR of Intel DC P3700 NVMe
Add a device-specific reset for Intel DC P3700 NVMe device which exhibits a
timeout failure in drivers waiting for the ready status to update after
NVMe enable if the driver interacts with the device too soon after FLR.  As
this has been observed in device assignment scenarios, resolve this with a
device-specific reset quirk to add an additional, heuristically determined,
delay after the FLR completes.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1592654
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-09 15:20:52 -05:00
Alex Williamson
ffb0863426 PCI: Disable Samsung SM961/PM961 NVMe before FLR
The Samsung SM961/PM961 (960 EVO) sometimes fails to return from FLR with
the PCI config space reading back as -1.  A reproducible instance of this
behavior is resolved by clearing the enable bit in the NVMe configuration
register and waiting for the ready status to clear (disabling the NVMe
controller) prior to FLR.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1542494
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-09 15:18:33 -05:00
Alex Williamson
2d2917f774 PCI: Export pcie_has_flr()
pcie_flr() suggests pcie_has_flr() to ensure that PCIe FLR support is
present prior to calling.  pcie_flr() is exported while pcie_has_flr()
is not.  Resolve this.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-08-09 15:18:27 -05:00
zhong jiang
ac5bb5b3b0 rpc: remove unneeded variable 'ret' in rdma_listen_handler
The ret is not modified after initalization, So just remove the variable
and return 0.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Gustavo A. R. Silva
a677a78325 nfsd: use true and false for boolean values
Return statements in functions returning bool should use true or false
instead of an integer value.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Eric Biggers
c2cdc2ab24 nfsd: constify write_op[]
write_op[] is never modified, so make it 'const'.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
nixiaoming
5ed96bc545 fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
READ_BUF(8);
dummy = be32_to_cpup(p++);
dummy = be32_to_cpup(p++);
...
READ_BUF(4);
dummy = be32_to_cpup(p++);

Assigning value to "dummy" here, but that stored value
is overwritten before it can be used.
At the same time READ_BUF() will re-update the pointer p.

delete invalid assignment statements

Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Chuck Lever
11b4d66ea3 NFSD: Handle full-length symlinks
I've given up on the idea of zero-copy handling of SYMLINK on the
server side. This is because the Linux VFS symlink API requires the
symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
NUL-termination is going to be problematic (watching out for
landing on a page boundary and dealing with a 4096-byte pathname).

I don't believe that SYMLINK creation is on a performance path or is
requested frequently enough that it will cause noticeable CPU cache
pollution due to data copies.

There will be two places where a transport callout will be necessary
to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
helper that is used by NFSv2 and NFSv3, and the other will be in
nfsd4_decode_create().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Chuck Lever
3fd9557aec NFSD: Refactor the generic write vector fill helper
fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.

svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.

The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Chuck Lever
07d0ff3b0c svcrdma: Clean up Read chunk path
Simplify the error handling at the tail of recv_read_chunk() by
re-arranging rq_pages[] housekeeping and documenting it properly.

NB: In this path, svc_rdma_recvfrom returns zero. Therefore no
subsequent reply processing is done on the svc_rqstp, and thus the
rq_respages field does not need to be updated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Chuck Lever
a53d5cb064 svcrdma: Avoid releasing a page in svc_xprt_release()
svc_xprt_release() invokes svc_free_res_pages(), which releases
pages between rq_respages and rq_next_page.

Historically, the RPC/RDMA transport has set these two pointers to
be different by one, which means:

- one page gets released when svc_recv returns 0. This normally
happens whenever one or more RDMA Reads need to be dispatched to
complete construction of an RPC Call.

- one page gets released after every call to svc_send.

In both cases, this released page is immediately refilled by
svc_alloc_arg. There does not seem to be a reason for releasing this
page.

To avoid this unnecessary memory allocator traffic, set rq_next_page
more carefully.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Gustavo A. R. Silva
7b4d6da4bb nfsd: Mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Warning level 2 was used: -Wimplicit-fallthrough=2

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
YueHaibing
9cc3b98d1f sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
Variables 'checksumlen','blocksize' and 'data' are being assigned,
but are never used, hence they are redundant and can be removed.

Fix the following warning:

  net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable]
  net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable]
  net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Amir Goldstein
64bed6cbe3 nfsd: fix leaked file lock with nfs exported overlayfs
nfsd and lockd call vfs_lock_file() to lock/unlock the inode
returned by locks_inode(file).

Many places in nfsd/lockd code use the inode returned by
file_inode(file) for lock manipulation. With Overlayfs, file_inode()
(the underlying inode) is not the same object as locks_inode() (the
overlay inode). This can result in "Leaked POSIX lock" messages
and eventually to a kernel crash as reported by Eddie Horng:
https://marc.info/?l=linux-unionfs&m=153086643202072&w=2

Fix all the call sites in nfsd/lockd that should use locks_inode().
This is a correctness bug that manifested when overlayfs gained
NFS export support in v4.16.

Reported-by: Eddie Horng <eddiehorng.tw@gmail.com>
Tested-by: Eddie Horng <eddiehorng.tw@gmail.com>
Cc: Jeff Layton <jlayton@kernel.org>
Fixes: 8383f17488 ("ovl: wire up NFS export operations")
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-08-09 16:11:21 -04:00
Guenter Roeck
cd6a2064db hwmon: k10temp: Support Threadripper 2920X, 2970WX; simplify offset table
All announced Threadripper 29xx models have a temperature offset of
27 degrees C. Simplify temperature offset table to match all 29xx
Threadripper models with a single entry. Also simplify the table to match
all 19xx Threadripper models with a single entry. This effectively drops
entries for Threadripper 1910/1920/1950 which never saw the light of day.

Cc: Michael Larabel <Michael@phoronix.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2018-08-09 12:56:56 -07:00
Daniel Borkmann
9c95420117 Merge branch 'bpf-fix-cpu-and-devmap-teardown'
Jesper Dangaard Brouer says:

====================
Removing entries from cpumap and devmap, goes through a number of
syncronization steps to make sure no new xdp_frames can be enqueued.
But there is a small chance, that xdp_frames remains which have not
been flushed/processed yet.  Flushing these during teardown, happens
from RCU context and not as usual under RX NAPI context.

The optimization introduced in commt 389ab7f01a ("xdp: introduce
xdp_return_frame_rx_napi"), missed that the flush operation can also
be called from RCU context.  Thus, we cannot always use the
xdp_return_frame_rx_napi call, which take advantage of the protection
provided by XDP RX running under NAPI protection.

The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
adjusted to easier reproduce (verified by Red Hat QA).
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09 21:50:45 +02:00
Jesper Dangaard Brouer
1bf9116d08 xdp: fix bug in devmap teardown code path
Like cpumap teardown, the devmap teardown code also flush remaining
xdp_frames, via bq_xmit_all() in case map entry is removed.  The code
can call xdp_return_frame_rx_napi, from the the wrong context, in-case
ndo_xdp_xmit() fails.

Fixes: 389ab7f01a ("xdp: introduce xdp_return_frame_rx_napi")
Fixes: 735fc4054b ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09 21:50:44 +02:00
Jesper Dangaard Brouer
37d7ff2595 samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier
The teardown race in cpumap is really hard to reproduce.  These changes
makes it easier to reproduce, for QA.

The --stress-mode now have a case of a very small queue size of 8, that helps
to trigger teardown flush to encounter a full queue, which results in calling
xdp_return_frame API, in a non-NAPI protect context.

Also increase MAX_CPUS, as my QA department have larger machines than me.

Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09 21:50:44 +02:00
Jesper Dangaard Brouer
ad0ab027fc xdp: fix bug in cpumap teardown code path
When removing a cpumap entry, a number of syncronization steps happen.
Eventually the teardown code __cpu_map_entry_free is invoked from/via
call_rcu.

The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
The issues is that the teardown code is not running in the RX NAPI
code path.  Thus, it is not allowed to invoke the NAPI variant of
xdp_return_frame.

This bug was found and triggered by using the --stress-mode option to
the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
because the ptr_ring have to be full and cpumap bulk queue max
contains 8 packets, and a remote CPU is racing to empty the ptr_ring
queue.

Fixes: 389ab7f01a ("xdp: introduce xdp_return_frame_rx_napi")
Tested-by: Jean-Tsung Hsiao <jhsiao@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-09 21:50:44 +02:00
David S. Miller
a736e07468 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Overlapping changes in RXRPC, changing to ktime_get_seconds() whilst
adding some tracepoints.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:52:36 -07:00
Liu Bo
991f61fe7e Blk-throttle: reduce tail io latency when iops limit is enforced
When an application's iops has exceeded its cgroup's iops limit, surely it
is throttled and kernel will set a timer for dispatching, thus IO latency
includes the delay.

However, the dispatch delay which is calculated by the limit and the
elapsed jiffies is suboptimal.  As the dispatch delay is only calculated
once the application's iops is (iops limit + 1), it doesn't need to wait
any longer than the remaining time of the current slice.

The difference can be proved by the following fio job and cgroup iops
setting,
-----
$ echo 4 > /mnt/config/nullb/disk1/mbps    # limit nullb's bandwidth to 4MB/s for testing.
$ echo "253:1 riops=100 rbps=max" > /sys/fs/cgroup/unified/cg1/io.max
$ cat r2.job
[global]
name=fio-rand-read
filename=/dev/nullb1
rw=randread
bs=4k
direct=1
numjobs=1
time_based=1
runtime=60
group_reporting=1

[file1]
size=4G
ioengine=libaio
iodepth=1
rate_iops=50000
norandommap=1
thinktime=4ms
-----

wo patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process

   read: IOPS=99, BW=400KiB/s (410kB/s)(23.4MiB/60001msec)
    slat (usec): min=10, max=336, avg=27.71, stdev=17.82
    clat (usec): min=2, max=28887, avg=5929.81, stdev=7374.29
     lat (usec): min=24, max=28901, avg=5958.73, stdev=7366.22
    clat percentiles (usec):
     |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    6], 60.00th=[11731],
     | 70.00th=[11863], 80.00th=[11994], 90.00th=[12911], 95.00th=[22676],
     | 99.00th=[23725], 99.50th=[23987], 99.90th=[23987], 99.95th=[25035],
     | 99.99th=[28967]

w/ patch:
file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.7-66-gedfc
Starting 1 process

   read: IOPS=100, BW=400KiB/s (410kB/s)(23.4MiB/60005msec)
    slat (usec): min=10, max=155, avg=23.24, stdev=16.79
    clat (usec): min=2, max=12393, avg=5961.58, stdev=5959.25
     lat (usec): min=23, max=12412, avg=5985.91, stdev=5951.92
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    4], 40.00th=[    5], 50.00th=[   47], 60.00th=[11863],
     | 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
     | 99.00th=[11994], 99.50th=[11994], 99.90th=[12125], 99.95th=[12125],
     | 99.99th=[12387]

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-09 12:43:16 -06:00
Joerg Roedel
a29dba161a x86/relocs: Add __end_rodata_aligned to S_REL
This new symbol needs to be in the workaround-list for buggy
binutils, otherwise the build with gcc-4.6 fails.

Fixes: 39d668e04e ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit')
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux-Next Mailing List <linux-next@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180809094449.ddmnrkz7qkvo3j2x@suse.de
2018-08-09 20:42:07 +02:00
David S. Miller
192e91d244 Merge branch 'Add-support-for-XGMAC2-in-stmmac'
Jose Abreu says:

====================
Add support for XGMAC2 in stmmac

This series adds support for 10Gigabit IP in stmmac. The IP is called XGMAC2
and has many similarities with GMAC4. Due to this, its relatively easy to
incorporate this new IP into stmmac driver by adding a new block and
filling the necessary callbacks.

The functionality added by this series is still reduced but its only a
starting point which will later be expanded.

I splitted the patches into funcionality and to ease the review. Only the
patch 8/9 really enables the XGMAC2 block by adding a new compatible string.

Version 4 addresses review comments of Florian Fainelli and Rob Herring.

NOTE: Although the IP supports 10G, for now it was only possible to test it
at 1G speed due to 10G PHY HW shipping problems. Here follows iperf3
results at 1G:

Connecting to host 192.168.0.10, port 5201
[  4] local 192.168.0.3 port 39178 connected to 192.168.0.10 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   110 MBytes   920 Mbits/sec    0    482 KBytes
[  4]   1.00-2.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   2.00-3.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   3.00-4.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   4.00-5.00   sec   112 MBytes   935 Mbits/sec    0    482 KBytes
[  4]   5.00-6.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   6.00-7.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   7.00-8.00   sec   113 MBytes   946 Mbits/sec    0    482 KBytes
[  4]   8.00-9.00   sec   112 MBytes   937 Mbits/sec    0    482 KBytes
[  4]   9.00-10.00  sec   113 MBytes   946 Mbits/sec    0    482 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.09 GBytes   940 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.09 GBytes   938 Mbits/sec                  receiver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:29 -07:00
Jose Abreu
80dfb28641 dt-bindings: net: stmmac: Add the bindings documentation for XGMAC2.
Adds the documentation for XGMAC2 DT bindings.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: devicetree@vger.kernel.org
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
a3f142478a net: stmmac: Add the bindings parsing for XGMAC2
Add the bindings parsing for XGMAC2 IP block.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
7d9e6c5afa net: stmmac: Integrate XGMAC into main driver flow
Now that we have all the XGMAC related callbacks, lets start integrating
this IP block into main driver.

Also, we corrected the initialization flow to only start DMA after
setting descriptors length.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
4bb7aff9e6 net: stmmac: Add PTP support for XGMAC2
XGMAC2 uses the same engine of timestamping as GMAC4. Let's use the same
callbacks.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
6fc21117b7 net: stmmac: Add MDIO related functions for XGMAC2
Add the MDIO related funcionalities for the new IP block XGMAC2.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
874dfb65a4 net: stmmac: Add descriptor related callbacks for XGMAC2
Add the descriptor related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
d6ddfacd95 net: stmmac: Add DMA related callbacks for XGMAC2
Add the DMA related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
2142754f8b net: stmmac: Add MAC related callbacks for XGMAC2
Add the MAC related callbacks for the new IP block XGMAC2.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
Jose Abreu
48ae5554a0 net: stmmac: Add XGMAC 2.10 HWIF entry
Add a new entry to HWIF table for XGMAC 2.10. For now we fill it with
empty callbacks which will be added in posterior patches.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:16:28 -07:00
David S. Miller
ff50eda44d Merge branch 'More-complete-PHYLINK-support-for-mv88e6xxx'
Andrew Lunn says:

====================
More complete PHYLINK support for mv88e6xxx

Previous patches added sufficient PHYLINK support to the mv88e6xxx
that it did not break existing use cases, basically fixed-link phys.

This patchset builds out the support so that SFP modules, up to
2.5Gbps can be supported, on mv88e6390X, on ports 9 and 10. It also
provides a framework which can be extended to support SFPs on ports
2-8 of mv88e6390X, 10Gbps PHYs, and SFP support on the 6352 family.

Russell King did much of the initial work, implementing the validate
and mac_link_state calls. However, there is an important TODO in the
commit message:

needs to call phylink_mac_change() when the port link comes up/goes down.

The remaining patches implement this, by adding more support for the
SERDES interfaces, in particular, interrupt support so we get notified
when the SERDES gains/looses sync.

This has been tested on the ZII devel C, using a Clearfog as peer
device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:08:21 -07:00
Andrew Lunn
734447d4ed net: dsa: mv88e6xxx: Re-setup interrupts on CMODE change.
When a port changes CMODE, the SERDES interface being used can change.
Disable interrupts for the old SERDES interface, and enable interrupts
on the new.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:08:20 -07:00
Andrew Lunn
efd1ba6af9 net: dsa: mv88e6xxx: Add SERDES phydev_mac_change up for 6390
phylink wants to know when the MAC layers notices a change in the
link. For the 6390 family, this is a change in the SERDES state.

Add interrupt support for the SERDES interface used to implement
SGMII/1000Base-X/2500Base-X. This is currently limited to ports 9 and
10. Support for the 10G SERDES and other ports will be added later,
building on this basic framework.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:08:20 -07:00
Andrew Lunn
7b898469b9 net: dsa: mv88e6xxx: link mv88e6xxx_port to mv88e6xxx_chip
An up coming change will register interrupts for individual switch
ports, using the mv88e6xxx_port as the interrupt context information.
Add members to the mv88e6xxx_port structure so we can link it back to
the mv88e6xxx_chip member the port belongs to and the port number of
the port.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:08:20 -07:00
Andrew Lunn
364e9d7776 net: dsa: mv88e6xxx: Power on/off SERDES on cmode change
The 6390 family has a number of SERDES interfaces per port. When the
cmode changes, eg 1000Base-X to XAUI, the SERDES interface in use will
also change. Power down the old SERDES interface and power up the new
SERDES interface.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:08:20 -07:00
Andrew Lunn
2d2e1dd299 net: dsa: mv88e6xxx: Cache the port cmode
The ports CMODE indicates the type of link between the MAC and the
PHY. It is used often in the SERDES code. Rather than read it each
time, cache its value.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-09 11:08:20 -07:00