Commit Graph

4379 Commits

Author SHA1 Message Date
Colin Ian King
24bb4d8688 RDMA/bnxt_re: fix spelling mistake: "Deallocte" -> "Deallocate"
Trivial fix to spelling mistake in dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 11:31:30 -04:00
Colin Ian King
a63aa5dbed IB/hfi1: fix spelling mistake in variable name continious
Trivial fix to spelling mistake, rename variable 'continious'
to the correct spelling 'continuous'

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 11:29:22 -04:00
Colin Ian King
055bb27906 IB/qib: fix spelling mistake: "failng" -> "failing"
Trivial fix to spelling mistake in qib_dev_err error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 11:22:20 -04:00
Christophe Jaillet
836daee2df cxgb4: Remove some dead code
This 'BUG_ON(!ep)' can never trigger because we have:
   if (!ep)
      return 0;
just a few lines above. So it can be removed safely.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-17 18:06:37 -04:00
Steve Wise
d4ba61d218 iw_cxgb4: fix misuse of integer variable
Fixes: ee30f7d507 ("iw_cxgb4: Max fastreg depth depends on DSGL support")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:43:21 -04:00
Colin Ian King
5b59a3969e IB/hns: fix memory leak on ah on error return path
When dmac is NULL, ah is not being freed on the error return path. Fix
this by kfree'ing it.

Detected by CoverityScan, CID#1452636 ("Resource Leak")

Fixes: d8966fcd4c ("IB/core: Use rdma_ah_attr accessor functions")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:30:33 -04:00
Christopher N Bednarz
aa939c12ab i40iw: Fix potential fcn_id_array out of bounds
Avoid out of bounds error by utilizing I40IW_MAX_STATS_COUNT
instead of I40IW_INVALID_FCN_ID.

Signed-off-by: Christopher N Bednarz <christoper.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:27:44 -04:00
Christopher N Bednarz
a28f047e5f i40iw: Use correct alignment for CQ0 memory
Utilize correct alignment variable when allocating
DMA memory for CQ0.

Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:27:44 -04:00
Mustafa Ismail
29c2415a66 i40iw: Fix typecast of tcp_seq_num
The typecast of tcp_seq_num incorrectly uses u8. Fix by
casting to u32.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:27:44 -04:00
Mustafa Ismail
8129331f01 i40iw: Correct variable names
Fix incorrect naming of status code and struct. Use inline
instead of immediate.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:27:44 -04:00
Chien Tin Tung
f67ace2d88 i40iw: Fix parsing of query/commit FPM buffers
Parsing of commit/query Host Memory Cache Function Private Memory
is not skipping over reserved fields and incorrectly assigning
those values into object's base/cnt/max_cnt fields. Skip over
reserved fields and set correct values. Also correct memory
alignment requirement for commit/query FPM buffers.

Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Christopher N Bednarz <christopher.n.bednarz@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:27:44 -04:00
Bryan Tan
a7d2e03928 RDMA/vmw_pvrdma: Report CQ missed events
There is a chance of a race between arming the CQ and receiving
completions. By reporting CQ missed events any ULPs should poll
again to get the completions.

Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Acked-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 10:56:04 -04:00
Matan Barak
3e5f0881f1 IB/hns: Avoid compile test under non 64bit environments
The hns driver uses __raw_writeq which is only defined in 64BIT
environments. Trying to compile the driver in a 32BIT environment
results in errors. Only COMPILE_TEST when 64BIT is defined.

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-14 11:16:53 -04:00
Doug Ledford
8ccf098104 Revert "RDMA/hns: fix build regression"
This reverts commit ecd840ff9b.
2017-08-14 11:15:26 -04:00
Doug Ledford
d0d62c34fb Merge branch 'rdma-netlink' into k.o/merge-test
Conflicts:
	include/rdma/ib_verbs.h - Modified a function signature adjacent
	to a newly added function signature from a previous merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:34:18 -04:00
Doug Ledford
320438301b Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test
Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Both add new code
	include/rdma/ib_verbs.h - Both add new code

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:31:29 -04:00
Leon Romanovsky
9abb0d1bbd RDMA: Simplify get firmware interface
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.

The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.

The introduction of this define allows us to remove the string size
from get_fw_str function signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:10 +03:00
Sagi Grimberg
40b24403f3 mlx5: support ->get_vector_affinity
Simply refer to the generic affinity mask helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:58:03 -04:00
Hiatt, Don
7db20ecd1d IB/core: Change wc.slid from 16 to 32 bits
slid field in struct ib_wc is increased to 32 bits.
This enables core components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Dasaratharaman Chandramouli
582faf3150 IB/core: Change port_attr.lid size from 16 to 32 bits
lid field in struct ib_port_attr is increased to 32 bits. This enables core
components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Dan Carpenter
5db465f235 IB/hns: checking for IS_ERR() instead of NULL
The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers.

Fixes: bfcc681bd0 ("IB/hns: Fix the bug when free mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 15:09:56 -04:00
Leon Romanovsky
931b3c1a83 RDMA/mlx5: Fix existence check for extended address vector
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92 ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 14:24:05 -04:00
Sebastian Sanchez
913cc67159 IB/hfi1: Always perform offline transition
Always initiate an offline transition request
when a link down occurs. The firmware will
use this request to confirm that the driver
has seen the link down message. A host version
is set to indicate this driver behavior to the
firmware.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Sebastian Sanchez
626c077c02 IB/hfi1: Prevent link down request double queuing
When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.

Only allow one link down request to be queued at any one time.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Sebastian Sanchez
71d47008ca IB/hfi1: Create workqueue for link events
Currently, link down interrupts queue link entries
on a workqueue intended for sending events only.
Create a workqueue for queuing link events.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Mike Marciniszyn
3ffea7d8cd IB/{rdmavt, hfi1, qib}: Fix panic with post receive and SGE compression
The server side of qperf panics as follows:

[242446.336860] IP: report_bug+0x64/0x10
[242446.341031] PGD 1c0c067
[242446.341032] P4D 1c0c067
[242446.343951] PUD 1c0d063
[242446.346870] PMD 8587ea067
[242446.349788] PTE 800000083e14016
[242446.352901]
[242446.358352] Oops: 0003 [#1] SM
[242446.437919] CPU: 1 PID: 7442 Comm: irq/92-hfi1_0 k Not tainted 4.12.0-mam-asm #1
[242446.446365] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/201
[242446.458397] task: ffff8808392d2b80 task.stack: ffffc9000664000
[242446.465097] RIP: 0010:report_bug+0x64/0x10
[242446.469859] RSP: 0018:ffffc900066439c0 EFLAGS: 0001000
[242446.475784] RAX: ffffffffa06647e4 RBX: ffffffffa06461e1 RCX: 000000000000000
[242446.483840] RDX: 0000000000000907 RSI: ffffffffa0675040 RDI: ffffffffffff740
[242446.491897] RBP: ffffc900066439e0 R08: 0000000000000001 R09: 000000000000025
[242446.499953] R10: ffffffff81a253df R11: 0000000000000133 R12: ffffc90006643b3
[242446.508010] R13: ffffffffa065bbf0 R14: 00000000000001e5 R15: 000000000000000
[242446.516067] FS:  0000000000000000(0000) GS:ffff88085f640000(0000) knlGS:000000000000000
[242446.525191] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003
[242446.531698] CR2: ffffffffa06647ee CR3: 0000000001c09000 CR4: 00000000001406e
[242446.539756] Call Trace
[242446.542582]  fixup_bug+0x2c/0x5
[242446.546277]  do_trap+0x12b/0x18
[242446.549972]  do_error_trap+0x89/0x11
[242446.554171]  ? hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.559324]  ? ttwu_do_wakeup+0x1e/0x14
[242446.563795]  ? ttwu_do_activate+0x77/0x8
[242446.568363]  do_invalid_op+0x20/0x3
[242446.572448]  invalid_op+0x1e/0x3
[242446.576247] RIP: 0010:hfi1_copy_sge+0x271/0x2b0 [hfi1
[242446.582075] RSP: 0018:ffffc90006643be8 EFLAGS: 0001004
[242446.587999] RAX: 0000000000000000 RBX: ffff88083e0fa240 RCX: 000000000000000
[242446.596058] RDX: 0000000000000000 RSI: ffff880842508000 RDI: ffff88083e0fa24
[242446.604116] RBP: ffffc90006643c28 R08: 0000000000000000 R09: 000000000000000
[242446.612172] R10: ffffc90009473640 R11: 0000000000000133 R12: 000000000000000
[242446.620228] R13: 0000000000000000 R14: 0000000000002000 R15: ffff88084250800
[242446.628293]  ? hfi1_copy_sge+0x1a1/0x2b0 [hfi1
[242446.633449]  hfi1_rc_rcv+0x3da/0x1270 [hfi1
[242446.638312]  ? sc_buffer_alloc+0x113/0x150 [hfi1
[242446.643662]  hfi1_ib_rcv+0x1c9/0x2e0 [hfi1
[242446.648428]  process_receive_ib+0x19a/0x270 [hfi1
[242446.653866]  ? process_rcv_qp_work+0xd2/0x160 [hfi1
[242446.659505]  handle_receive_interrupt_nodma_rtail+0x184/0x2e0 [hfi1
[242446.666693]  ? irq_finalize_oneshot+0x100/0x10
[242446.671846]  receive_context_thread+0x1b/0x140 [hfi1
[242446.677576]  irq_thread_fn+0x1e/0x4
[242446.681659]  irq_thread+0x13c/0x1b
[242446.685646]  ? irq_forced_thread_fn+0x60/0x6
[242446.690604]  kthread+0x112/0x15
[242446.694298]  ? irq_thread_check_affinity+0xe0/0xe
[242446.699738]  ? kthread_park+0x60/0x6
[242446.703919]  ? do_syscall_64+0x67/0x15
[242446.708292]  ret_from_fork+0x25/0x3
[242446.712374] Code: 63 78 04 44 0f b7 70 08 41 89 d0 4c 8d 2c 38 41 83 e0 01 f6 c2 02 74 17 66 45 85 c0 74 11 f6 c2 04 b9 01 00 00 00 75 bb 83 ca 04 <66> 89 50 0a 66 45 85 c0 74 52 0f b6 48 0b 41 0f b7 f6 4d 89 e0
[242446.733527] RIP: report_bug+0x64/0x100 RSP: ffffc900066439c
[242446.739935] CR2: ffffffffa06647e
[242446.743763] ---[ end trace 0e90a20d0aa494f7 ]--

The root cause is that the qib/hfi1 post receive call to rvt_lkey_ok()
doesn't interpret the new return value from rvt_lkey_ok() properly
leading to an mr reference count underrun.

Additionally, remove an unused argument in rvt_sge_adjacent()
aw well as an unneeded incr local in rvt_post_one_wr().

Fixes: Commit 14fe13fcd3 ("IB/rdmavt: Compress adjacent SGEs in rvt_lkey_ok()")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Jan Sokolowski
c822652ea6 IB/hfi1: Disambiguate corruption and uninitialized error cases
The error messages when checksum validation of the platform
configuration fields populated into the ASIC scratch registers fails are
ambiguous. Disambiguate them.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Michael J. Ruhl
e87473bc1b IB/hfi1: Only set fd pointer when base context is completely initialized
The allocate_ctxt() function adds the context to the fd data structure.
Since the context is not completely initialized, this can cause confusion
as to whether the context is valid or not.

Move the fd reference from allocate_ctxt() to setup_base_ctxt().
Update the necessary functions to be aware of this move.

Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Jan Sokolowski
96603ed865 IB/hfi1: Do not enable disabled port on cable insert
Fix issue where a disabled port can be enabled by
inserting a cable. The port should be explicitly
enabled instead.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Alex Estrin
5efd40cad4 IB/hfi1: Harden state transition to Armed and Active
There is a window that allows other threads to read state of
'host_link_state' as a new, before the hardware actual state is set.
This patch closes the window by indicating a new state only after
hardware transition is complete.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Michael J. Ruhl
f13a6e5e2e IB/hfi1: Split copy_to_user data copy for better security
A copy_to_user() call assumes that two members of a data structure
are sequential.  Since this may not always be true, separate the copies
to ensure a safe copy.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Alex Estrin
5e2d6764a7 IB/hfi1: Verify port data VLs credits on transition to Armed
There is a window where the FM can read the buffer control table
and decide not to program buffers. When a port goes down, the code
clears the table and if it is not programmed, posted SDMA descriptors
will never complete due to no buffer credits.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Bartlomiej Dudek
a618b7e40a IB/hfi1: Move saving PCI values to a separate function
During PCIe initialization some registers' values from
PCI config space are saved in order to restore them later
(i.e. after reset). Restoring those value is done by a
function called restore_pci_variables, while saving them
is put directly into function hfi1_pcie_ddinit.
Move saving values to a separate function in the image
of restoring functionality.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:37 -04:00
Byczkowski, Jakub
a156abb3cf IB/hfi1: Fix initialization failure for debug firmware
Loading debug signed firmware fails if started immediately after
failed attempt to load production firmware. A short delay is
required so add about a 100us delay after RSA check failure.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Jan Sokolowski
59ec873695 IB/hfi1: Fix code consistency for if/else blocks in chip.c
Code structure is not consistent for if/else blocks and break
instructions in set_link_state for case HLS_UP_INIT. Physical
state uses break in case of an error and if/else blocks for
logical use cases. These blocks should be implemented consistently.

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
bf90aadd63 IB/hfi1: Send MAD traps until repressed
A trap should be sent to the FM until the FM sends a repress message.
This is in line with the IBTA 13.4.9.

Add the ability to resend traps until a repress message is received.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael N. Henry <michael.n.henry@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
2250563e2c IB/hfi1: Pass the context pointer rather than the index
The hfi1_rcvctrl() function receives an index which it then converts
to an rcd.  Since most functions have the rcd, use that instead.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
17573972f4 IB/hfi1: Use context pointer rather than context index
The hfi1_<set|clear>_ctxt_<j|p>key functions take a context index and
look up the context based on that index.

Since the context index is being retrieved from the context, this
doesn't seem optimal.

Pass the context pointer for use, rather than the context index.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
e6f7622df1 IB/hfi1: Size rcd array index correctly and consistently
The array index for the rcd array is sized several different ways
throughout the code.

Use the user interface size (u16) as the standard size and update the
necessary code to reflect this.

u16 is large enough for the largest amount of supported contexts.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:55 -04:00
Michael J. Ruhl
91d970abe8 IB/hfi1: Remove unused user context data members
Several data members of the user context have become unused over time.
Cleaning them up.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Michael J. Ruhl
42492011ab IB/hfi1: Assign context does not clean up file descriptor correctly on error
In the error path for context allocation, the file descriptor pointer
should not point to a context when an error occurs.

Clean up the appropriate references on error.

Fixes: Commit 62239fc6e5 ("IB/hfi1: Clean up on context initialization failure")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Kaike Wan
bcad29137a IB/hfi1: Serve the most starved iowait entry first
When an egress resource(SDMA descriptors, pio credits) is not available,
a sending thread will be put on the resource's wait queue. When the
resource becomes available again, up to a fixed number of sending threads
can be awakened sequentially and removed from the wait queue, depending
on the number of waiting threads and the number of free resources. Since
each awakened sending thread will send as many packets as possible, it
is highly likely that the first sending thread will consume all the
egress resources. Subsequently, it will be put back to the end of the wait
queue. Depending on the timing when the later sending threads wake up,
they may not be able to send any packet and be again put back to the end
of the wait queue sequentially, right behind the first sending thread.
This starvation cycle continues until some sending threads exceed their
retry limit and consequently fail.

This patch fixes the issue by two simple approaches:
(1) Any starved sending thread will be put to the head of the wait queue
while a served sending thread will be put to the tail;
(2) The most starved sending thread will be served first.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Mike Marciniszyn
cb51c5d2cd IB/hfi1: Fix bar0 mapping to use write combining
When the debugpat kernel boot flag is turned on the following
traces are printed:

[ 1884.793168] x86/PAT: Overlap at 0x90000000-0x92000000
[ 1884.803510] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track uncached-minus, req write-combining, ret uncached-minus
[ 1884.818167] hfi1 0000:05:00.0: hfi1_0: WC Remapped RcvArray:
ffffc9000a980000

The ioremap_wc() clearly is not returning a write combining mapping due
to an overlap where the RcvArray is mapped in a uncached mapping prior
to creating the proposed write combining mapping.

The patch replaces the single base register for uncached CSRs that
used to overlap the RcvArray with two mappings.   One, kregbase1, from the
bar0 up to the RcvArray and another, kregbase2, from the end of the
RcvArray to the pio send buffer space.  A new dd field, base2_start,
is used to convert the zero-based offset in the CSR routines to the
correct kregbase1/kregbase2 mapping.  A single direct write of the
RcvArray CSRs is replaced with hfi1_put_tid() to insure correct access
using the new disjoint mapping.

Additionally, the kregend field is deleted since it is only ever written.

patdebug now shows the RcvArray as write combining:
[   35.688990] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff],
track write-combining, req write-combining, ret write-combining

To insulate from any potential issues with write combining, all
writeq are now flushed in hfi1_put_tid() and rcv_array_wc_fill().

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:17:54 -04:00
Bartlomiej Dudek
c53df62c7a IB/hfi1: Check return values from PCI config API calls
Ensure that return values from kernel PCI config access
API calls in HFI driver are checked and react properly if
they are not expected (i.e. not successful).

Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Bartlomiej Dudek <bartlomiej.dudek@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:01:36 -04:00
Arnd Bergmann
967de35826 IB/hns: include linux/interrupt.h
I ran into this build error on linux-next:

drivers/infiniband/hw/hns/hns_roce_eq.c:477:8: error: unknown type name 'irqreturn_t'
 static irqreturn_t hns_roce_msi_x_interrupt(int irq, void *eq_ptr)
        ^~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_eq.c: In function 'hns_roce_msi_x_interrupt':
drivers/infiniband/hw/hns/hns_roce_eq.c:485:9: error: implicit declaration of function 'IRQ_RETVAL'; did you mean 'BPF_RVAL'? [-Werror=implicit-function-declaration]
  return IRQ_RETVAL(int_work);

I have bisected this to a seemingly unrelated change that happened
to remove some indirect header inclusions. Simply including the
required header explicitly fixes the build failure.

Fixes: 09c7570480 ("xfrm: remove flow cache")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 14:44:47 -04:00
Doug Ledford
ecd840ff9b RDMA/hns: fix build regression
The 0day build system flags implicit includes as errors.  A patch from
Matan Barak to allow hns_roce, an aarch64 specific RDMA driver, to be
built on other arches, but it resulted in build regressions.  The
problem is that hns_roce_device.h needs a definition for __raw_writeq
but did not have an include to provide it.  Add <linux/io.h> as an
include to resolve the issue.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-28 15:17:18 -04:00
kbuild test robot
d1d71499a6 IB/hns: fix semicolon.cocci warnings
drivers/infiniband/hw/hns/hns_roce_eq.c:295:3-4: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-28 14:52:05 -04:00
kbuild test robot
87809f83a4 IB/hns: fix returnvar.cocci warnings
drivers/infiniband/hw/hns/hns_roce_hw_v1.c:2026:5-8: Unneeded variable: "ret". Return "0" on line 2046

 Remove unneeded variable used to store return value.

Generated by: scripts/coccinelle/misc/returnvar.cocci

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-28 14:51:36 -04:00
kbuild test robot
3756c7f59a IB/hns: fix boolreturn.cocci warnings
drivers/infiniband/hw/hns/hns_roce_qp.c:802:9-10: WARNING: return of 0/1 in function 'hns_roce_wq_overflow' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

Fixes: 7d1b6a678e0b ("IB/hns: Support compile test for hns RoCE driver")
CC: Matan Barak <matanb@mellanox.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-28 14:51:13 -04:00
Matan Barak
e89bf462b6 IB/hns: Support compile test for hns RoCE driver
Compiling the hns RoCE driver requires ARM architecture.
In order to simplify development of IB/core, support
compile test. Add the necessary includes for that too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-28 14:48:26 -04:00
Doug Ledford
a5f66725c7 Merge branch 'misc' into k.o/for-next 2017-07-27 09:00:38 -04:00
Amrani, Ram
67cbe3532c RDMA/qedr: notify user application of supported WIDs
The number of supported WIDs, if they are supported at all, can be
limited due to resources. Notifying the user space application the
number of available WIDs allows it to utilize them correctly.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-27 08:59:52 -04:00
Amrani, Ram
ad84dad216 RDMA/qedr: notify user application if DPM is supported
Direct Packet Mode support may be disabled, e.g, due to limited
resources. Notifying the user application prevents wasting cycles
on attempting to send these kind of packets.

Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-27 08:59:52 -04:00
Doug Ledford
f55c1e6608 Merge branches 'rxe' and 'mlx' into k.o/for-next 2017-07-26 20:13:33 -04:00
Colin Ian King
9c2d33d44d net/mlx5: fix spelling mistake: "alloated" -> "allocated"
Trivial fix to spelling mistake in mlx5_ib_dbg message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:46:46 -04:00
Kaike Wan
ff8d836efe IB/hfi1: Add receiving queue info to qp_stats
This patch adds qp->s_ack_queue indices and size to qp_stats printout.
This information will provide information about the receiving side.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:46:46 -04:00
Guy Levi
6afff1c715 IB/mlx4: Expose RSS capabilities
As a final step to support RSS feature, expose the RSS capabilities in
query device verb.

It includes:
- Max rwq indirection tables.
- Max rwq indirection table size.
- Supported qp types.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:45:53 -04:00
Guy Levi
3078f5f1bd IB/mlx4: Add support for RSS QP
Add support to work with a RSS QP by using an indirection table object
upon QP creation. Other related QP verbs (e.g. modify/destroy/query) were
updated as well for that QP mode.

Notes:
- The RX hash properties are supplied as driver private data.
- The RSS QP port is used on the associated WQs in its indirection
  table. Applying different ports during WQ life time is not allowed.
- The expected RSS QP flow is: create, modify(RST->INIT),
  modify(RST->RTR), destroy.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:45:53 -04:00
Guy Levi
b8d46ca035 IB/mlx4: Add support for WQ indirection table related verbs
To enable RSS functionality the IB indirection table object (i.e.
ib_rwq_ind_table) should be used.
This patch implements the related verbs as of create and destroy an
indirection table.

In downstream patches the indirection table will be used as part of RSS
QP creation.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:45:50 -04:00
Guy Levi
400b1ebcfe IB/mlx4: Add support for WQ related verbs
Support create/modify/destroy WQ related verbs.

The base IB object to enable RSS functionality is a WQ (i.e. ib_wq).
This patch implements the related WQ verbs as of create, modify and
destroy.

In downstream patches the WQ will be used as part of an indirection
table (i.e. ib_rwq_ind_table) to enable RSS QP creation.

Notes:
ConnectX-3 hardware requires consecutive WQNs list as receive descriptor
queues for the RSS QP. Hence, the driver manages consecutive ranges lists
per context which the user must respect.
Destroying the WQ does not return its WQN back to its range for
reusing. However, destroying all WQs from the same range releases the
range and in turn releases its WQNs for reusing.

Since the WQ object is not a natural object in the hardware, the driver
implements the WQ by the hardware QP.

As such, the WQ inherits its port from its RSS QP parent upon its
RST->INIT transition and by that time its state is applied to the
hardware.

Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:45:01 -04:00
Moshe Shemesh
f330187016 (IB, net)/mlx4: Add resource utilization support
Adding visibility of resource usage of QPs, CQs and counters used by
virtual functions. This feature will be used to give the PF administrator
more data while debugging VF status. Usage info was added to ALLOC_RES
command, to notify the PF if the resource which is being reserved or
allocated for the VF will be used by kernel driver or by user verbs.

Updated reservation and allocation functions of QP, CQ and counter with
additional usage parameter.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:35 -04:00
Maor Gottlieb
ea30b966f7 IB/mlx4: Add inline-receive support
When inline-receive is enabled, the HCA may write received
data into the receive WQE.

Inline-receive is enabled by setting its matching bit in
the QP context and each single-packet message with payload
not exceeding the receive WQE size will be delivered to
the WQE.

The completion report will indicate that the payload was placed to the WQE.

It includes:
1) Return maximum supported size of inline-receive by the hardware
in query_device vendor's data part.
2) Enable the feature when requested by the vendor data input.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:02 -04:00
Parav Pandit
58dcb60a22 IB/mlx5: Expose extended error counters
This patch adds below requester and responder side error counters,
which will be exposed by hardware counters interface and are supported
as part of query Q counters command extension.

 +---------------------------+-------------------------------------+
 |      Name                 |           Description               |
 |---------------------------+-------------------------------------|
 |resp_local_length_error    | Number of times responder detected  |
 |                           | local length errors                 |
 |---------------------------+-------------------------------------|
 |resp_cqe_error             | Number of CQEs completed with error |
 |                           | at responder                        |
 |---------------------------+-------------------------------------|
 |req_cqe_error              | Number of CQEs completed with error |
 |                           | at requester                        |
 |---------------------------+-------------------------------------|
 |req_remote_invalid_request | Number of times requester detected  |
 |                           | remote invalid request error        |
 |---------------------------+-------------------------------------|
 |req_remote_access_error    | Number of times requester detected  |
 |                           | remote access error                 |
 |---------------------------+-------------------------------------|
 |resp_remote_access_error   | Number of times responder detected  |
 |                           | remote access error                 |
 |---------------------------+-------------------------------------|
 |resp_cqe_flush_error       | Number of CQEs completed with       |
 |                           | flushed with error at responder     |
 |---------------------------+-------------------------------------|
 |req_cqe_flush_error        | Number of CQEs completed with       |
 |                           | flushed with error at requester     |
 +---------------------------+-------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Leon Romanovsky
3fffc82ad6 IB/mlx5: Fix existence check for extended address vector
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92 ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Majd Dibbiny
4c25b7a390 IB/mlx5: Fix cached MR allocation flow
When we have a miss in one order of the mkey cache, we try to get
an mkey from a higher order.

We still need to check that the higher order can be used with UMR
before using it. Otherwise, we will get an mkey with 0 entries and
the post send operation that is used to fill it will complete with
the following error:

mlx5_0:dump_cqe:275:(pid 0): dump error cqe
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 0f007806 25000025 49ce59d2

Fixes: 49780d42df ("IB/mlx5: Expose MR cache for mlx5_ib")
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Yishai Hadas
1d54f89094 IB/mlx5: Report RX checksum capabilities for IPoIB
Report RX checksum capabilities when 'ipoib_enhanced_offloads'
capabilities are set and support checksum.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:41:01 -04:00
Yishai Hadas
81e308804b IB/mlx5: Add multicast flow steering support for underlay QP
In order to add multicast flow steering support, there is need
to block the attaching of mcg flow for underlay QP, recognize
multicast IB_FLOW_SPEC_IPV4 based on its IP and enable
creating/destroying flow for IB layer.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:46 -04:00
Yishai Hadas
c2e53b2ce1 IB/mlx5: Add support for QP with a given source QPN
Allow user space applications to accelerate send and receive
traffic which is typically handled by IPoIB ULP by creating
a UD QP with a given source QPN of the IPoIB UD QP.

UD QP with a given source QPN should basically be similar to
RAW QP from point of view of its created resources.

However,
- Its TIS should point to the source QPN.
- Modify can be done only on its state as the transport attributes
  are managed by its source QP.

This patch manages below:
- Creating/destroying/modifying UD QP with a given source QPN.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:46 -04:00
Maor Gottlieb
fe248c3a58 IB/mlx5: Add delay drop configuration and statistics
Add debugfs interface for monitor the number of delay drop timeout
events and the number of existing dropless RQs in the system.

In addition add debugfs interface for configuring the global timeout value
which is used in the SET_DELAY_DROP command.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:40:22 -04:00
Maor Gottlieb
03404e8ae6 IB/mlx5: Add support to dropless RQ
RQs that were configured for "delay drop" will prevent packet drops
when their WQEs are depleted.
Marking an RQ to be drop-less is done by setting delay_drop_en in RQ
context using CREATE_RQ command.

Since this feature is globally activated/deactivated by using the
SET_DELAY_DROP command on all the marked RQs, we activated/deactivated
it according to the number of RQs with 'delay_drop' enabled.

When timeout is expired, then the feature is deactivated. Therefore
the driver handles the delay drop timeout event and reactivate it.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:39:53 -04:00
Bodong Wang
7ecf6d8ff1 IB/mlx5: Restore IB guid/policy for virtual functions
When a user sets port_guid, node_guid or policy of an IB virtual
function, save this information in "struct mlx5_vf_context".

This information will be restored later when pci_resume is called.
To make sure this works, one can use aer-inject to generate PCI
errors on mlx5 devices and verify if relevant fields are restored
after PCI resume.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:34:28 -04:00
Parav Pandit
4a2da0b8c0 IB/mlx5: Add debug control parameters for congestion control
This patch adds debug control parameters for congestion control which
can be read or written through debugfs. They are for reaction point and
notification point nodes.

These control parameters are as below:
 +------------------------------+-----------------------------------------+
 |      Name                    |           Description                   |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate             | When set target rate is updated to      |
 |                              | current rate                            |
 |------------------------------+-----------------------------------------|
 |rp_clamp_tgt_rate_ati         | When set update target rate based on    |
 |                              | timer as well                           |
 |------------------------------+-----------------------------------------|
 |rp_time_reset                 | time between rate increase if no        |
 |                              | CNP is received unit in usec            |
 |------------------------------+-----------------------------------------|
 |rp_byte_reset                 | Number of bytes between rate inease if  |
 |                              | no CNP is received                      |
 |------------------------------+-----------------------------------------|
 |rp_threshold                  | Threshold for reaction point rate       |
 |                              | control                                 |
 |------------------------------+-----------------------------------------|
 |rp_ai_rate                    | Rate for target rate, unit in Mbps      |
 |------------------------------+-----------------------------------------|
 |rp_hai_rate                   | Rate for hyper increase state           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_min_dec_fac                | Minimum factor by which the current     |
 |                              | transmit rate can be changed when       |
 |                              | processing a CNP, unit is percerntage   |
 |------------------------------+-----------------------------------------|
 |rp_min_rate                   | Minimum value for rate limit,           |
 |                              | unit in Mbps                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_to_set_on_first_cnp   | Rate that is set when first CNP is      |
 |                              | received, unit is Mbps                  |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_g                  | Used to calculate alpha                 |
 |------------------------------+-----------------------------------------|
 |rp_dce_tcp_rtt                | Time between updates of alpha value,    |
 |                              | unit is usec                            |
 |------------------------------+-----------------------------------------|
 |rp_rate_reduce_monitor_period | Minimum time between consecutive rate   |
 |                              | reductions                              |
 |------------------------------+-----------------------------------------|
 |rp_initial_alpha_value        | Initial value of alpha                  |
 |------------------------------+-----------------------------------------|
 |rp_gd                         | When CNP is received, flow rate is      |
 |                              | reduced based on gd, rp_gd is given as  |
 |                              | log2(rp_gd)                             |
 |------------------------------+-----------------------------------------|
 |np_cnp_dscp                   | dscp code point for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio_mode              | 802.1p priority for generated cnp       |
 |------------------------------+-----------------------------------------|
 |np_cnp_prio                   | cnp priority mode                       |
 +------------------------------+-----------------------------------------+

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:34:28 -04:00
Moni Shoua
fd65f1b8ee IB/mlx5: Change logic for dispatching IB events for port state
The old logic ignored link state. This led to missing IB events like
when link goes down on the switch while admin state is up or to redundant
events like when admin state goes up while link is down.
To fix that, probe the port state on NETDEV events and compare to last
known state to decide if IB events needs to be dispatched.

FIxes: 5ec8c83e3a ("IB/mlx5: Port events in RoCE now rely on netdev events")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:29:18 -04:00
Huy Nguyen
c85023e153 IB/mlx5: Add raw ethernet local loopback support
Currently, unicast/multicast loopback raw ethernet
(non-RDMA) packets are sent back to the vport.
A unicast loopback packet is the packet with destination
MAC address the same as the source MAC address.
For multicast, the destination MAC address is in the
vport's multicast filter list.

Moreover, the local loopback is not needed if
there is one or none user space context.

After this patch, the raw ethernet unicast and multicast
local loopback are disabled by default. When there is more
than one user space context, the local loopback is enabled.

Note that when local loopback is disabled, raw ethernet
packets are not looped back to the vport and are forwarded
to the next routing level (eswitch, or multihost switch,
or out to the wire depending on the configuration).

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 10:29:18 -04:00
Selvin Xavier
f218d67ef0 RDMA/bnxt_re: Allow posting when QPs are in error
This  patch allows driver to post send and receive
requests on QPs which are in  error state.

Instead of flushing the QP in the context of polling
error CQEs, the QPs will be added to a flush list
maintained per CQ. QP state is moved to error.
QP is added to flush list if the user moves it
to error state using modify_qp also. After polling the HW
CQ in poll_cq routine, this flush list is traversed
and driver completes work requests on each QP in the flush
list, till the budget expires. The QP is moved out of
flush list during QP destroy or during modify_QP to RESET.

When ULPs post Work Requests while QP is in error state,
driver will store the ULP data and then increment the
QP producer s/w index, without ringing doorbell. It then
schedules a worker to invoke the CQ handler since the
interrupts wont be generated from the HW for this request.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:12 -04:00
Kalesh AP
5fac5b1b29 RDMA/bnxt_re: Add vlan tag for untagged RoCE traffic when PFC is configured
Current implementation does not program vlan header insertion
in RoCE packet if no vlan is configured. Firmware does not add
prority when there is no vlan tag in the packet. Modify the code
to insert vlan header when PFC is enabled on the interface.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:12 -04:00
Leon Romanovsky
e1267b0124 RDMA: Remove useless MODULE_VERSION
All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.

Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Yuval Shaia
d41861942f IB/core: Add generic function to extract IB speed from netdev
Logic of retrieving netdev speed from net_device and translating it to
IB speed is implemented in rxe, in usnic and in bnxt drivers.

Define new function which merges all.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:45:11 -04:00
Yuval Shaia
44b0b7455f IB/usnic: Implement get_netdev hook
usnic's get_netdev hook for struct ib_device is missing - add it.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Christian Benvenuti <benve@cisco.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:44:46 -04:00
Gustavo A. R. Silva
f201777a53 IB/qib: remove duplicate code
Remove duplicate code.

Addresses-Coverity-ID: 1226951
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:44:46 -04:00
Leon Romanovsky
4619ed40d9 RDMA/bnxt_re: Delete unsupported modify_port function
There is no need to return always zero for function which is not
supported. The IB stack treats uninitialized ib_device->functions as
not implemented.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24 08:44:46 -04:00
Doug Ledford
03da084ed8 Merge branch 'hfi1' into k.o/for-4.14 2017-07-24 08:33:43 -04:00
Amrani, Ram
c75d3ec8c0 RDMA/qedr: Prevent memory overrun in verbs' user responses
Wrap ib_copy_to_udata with a function that ensures that the data
being copied over to user space isn't longer than the allowed.

Fixes: cecbcddf64 ("qedr: Add support for QP verbs")
Fixes: a7efd7773e ("qedr: Add support for PD,PKEY and CQ verbs")
Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Ganesh Goudar
720336c42e iw_cxgb4: don't use WR keys/addrs for 0 byte reads
Only use the read sge lkey/addr and the remote rkey/addr if the
length of the read is not zero. Otherwise the read response might
be treated as the RTR read response and not delivered to the
application. Or worse Terminator hardware will fail a 0B read
if the STAG is 0 even if the read length is 0.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Håkon Bugge
4542e3c79a IB/mlx4: Fix CM REQ retries in paravirt mode
CM REQs cannot be successfully retried, because a new pv_cm_id is
created for each request, without checking if one already exists.

By checking if an id exists before creating one, the bug is fixed.

This bug can be provoked by running an RDMA CM user-land application,
but inserting a five seconds delay before the rdma_accept() call on
the passive side. This delay is larger than the default CMA timeout,
and triggers a retry from the active side. The retried REQ will use
another pv_cm_id (the cm_id on the wire). This confuses the CM
protocol and two REJs are sent from the passive side.

Here is an excerpt from ibdump running without the patch:

3.285092       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.382711       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.382861       LID: 4 -> LID: 4       InfiniBand 290 CM: ConnectReject
7.387644       LID: 4 -> LID: 4       InfiniBand 290 CM: ConnectReject

and here is the same with bug fix applied:

3.251010       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
7.349387       LID: 4 -> LID: 4       SDP 290 CM: ConnectRequest(SDP Hello)
8.258443       LID: 4 -> LID: 4       SDP 290 CM: ConnectReply(SDP Hello)
8.259890       LID: 4 -> LID: 4       InfiniBand 290 CM: ReadyToUse

Suggested-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reported-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Tested-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Selvin Xavier
601577b7d1 RDMA/bnxt_re: Fix the value reported for local ack delay
Local ack delay exposed by the driver is 0 which means infinite QP
timeout. Reporting the default value to 16 (approx 260ms)

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Selvin Xavier
499e456981 RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq
While invoking the req_notify_cq hook, ULPs can request
whether the CQs have any CQEs pending. If CQEs are pending,
drivers can indicate  it by returning 1 for req_notify_cq.
The stack will poll CQ again till CQ is empty.

This patch peeks the CQ for any valid entries and return accordingly.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Devesh Sharma
10d1dedf9b RDMA/bnxt_re: Fix return value of poll routine
Fix the incorrect reporting of number of polled
entries by taking into account the max CQ depth
in the driver.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Devesh Sharma
254cd2590d RDMA/bnxt_re: Enable atomics only if host bios supports
Driver shall check if the host system bios has enabled
Atomic operations capability in PCI Device Control 2
register of the pci-device. Expose the ATOMIC_HCA
flag only if the Atomic operations capability is set.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Somnath Kotur
536f092805 RDMA/bnxt_re: Specify RDMA component when allocating stats context
Starting FW version 20.6.47, firmware is keeping separate statistics
for L2 and RDMA. However, driver needs to specify RDMA or not when
allocating stat_ctx.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Eddie Wai
a25d112fe9 RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP
There's a couple of bugs in the support of max_rd_atomic and
max_dest_rd_atomic. In the modify_qp, if the requested max_rd_atomic,
which is the ORRQ size, is greater than what the chip can support,
then we have to cap the request to chip max as we can't have the HW
overflow the ORRQ. Capping the max_rd_atomic support internally is okay
to do as the remaining read/atomic WRs will still be sitting in the SQ.
However, for the max_dest_rd_atomic, the driver has to error out as
this dictates the IRRQ size and we can't control what the remote
side sends.

Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Selvin Xavier
58d4a671d0 RDMA/bnxt_re: Report supported value to IB stack in query_device
- Report supported value for max_mr_size to IB stack in query_device.
   Also, check and log if MR size requested by application in
   reg_user_mr() is greater than value currently supported by driver.
 - Report only 4K page size support for now
 - Fix Max_QP value returned by ibv_devinfo -vv.
   In case of PF, FW reserves 129 QPs for creating QP1s of VFs
   and PF. So the max_qp value reported by FW for PF doesn'tt include
   the QP1. Fixing this issue by adding 1 with the value reported
   by FW.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Selvin Xavier
4a62c5e9e2 RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails
This fix is added only to avoid system crash in some a
specific scenario. When bnxt_re driver is loaded and if
user tries to change interface mac address, delete GID
fails because QP1 is still associated with existing MAC
(default GID). If the above command fails GID tables are
not modified in the h/w or driver, but the GID context memory
is freed. Now, if the user changes the mac back to the original
value, another add_gid comes to the driver where the driver
reports that the GID is already present in its table
and tries to access the context which was already freed.

So, in this case, in order to  avoid NULL pointer de-reference,
this patch removes the context memory free  if delete_gid fails
and the same context memory is re-used in new add_gid.
Memory cleanup will be taken care during driver unload, while
deleting the GID table.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:50 -04:00
Somnath Kotur
ab69d4c8da RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error
Posting WQE size of 2 results in a WQE_FORMAT_ERROR
thrown by the HW as it requires host to supply WQE Size with room
for atleast one SGE so that the resulting WQE size be atleast 3.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Devesh Sharma
b3b2c7c550 RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext
The driver must free the DPI during the dealloc_ucontext
instead of freeing it during dealloc_pd. However, the DPI
allocation scheme remains unchanged.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Dan Carpenter
396551eb00 IB/mlx5: Fix a warning message
"umem" is a valid pointer.  We intended to print "*umem" or even just
"err" instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Dan Carpenter
f0c6e88288 RDMA/ocrdma: Fix error codes in ocrdma_create_srq()
If either of these allocations fail then we return ERR_PTR(0).  That's
equivalent to NULL and results in a NULL pointer dereference in the
caller.

Fixes: fe2caefcdf ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Dan Carpenter
dd75cfa6d3 RDMA/ocrdma: Fix an error code in ocrdma_alloc_pd()
We should preserve the original "status" error code instead of resetting
it to zero.  Returning ERR_PTR(0) is the same as NULL and results in a
NULL dereference in the callers.  I added a printk() on error instead.

Fixes: 45e86b33ec ("RDMA/ocrdma: Cache recv DB until QP moved to RTR")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Dan Carpenter
9064d6055c IB/cxgb3: Fix error codes in iwch_alloc_mr()
We accidentally don't set the error code on some error paths.  It means
return ERR_PTR(0) which is NULL and results in a NULL dereference in the
caller.

Fixes: 13a239330a ("RDMA/cxgb3: Don't ignore insert_handle() failures")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00
Dan Carpenter
6ebedacbb4 cxgb4: Fix error codes in c4iw_create_cq()
If one of these kmalloc() calls fails then we return ERR_PTR(0) which is
NULL.  It results in a NULL dereference in the callers.

Fixes: cfdda9d764 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-20 11:20:49 -04:00