Commit Graph

770 Commits

Author SHA1 Message Date
Dennis Dalessandro
09fbca8e62 IB/hfi1: No need to use try_module_get for debugfs
The call in debugfs.c for try_module_get() is not needed. A reference to
the module will be taken by the VFS layer as long as the owner field is
set in the file ops struct. So set this as well as remove the call.

Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Mike Marciniszyn
aa9b79ec37 IB/hfi1: Add missing INVALIDATE opcodes for trace
This was missed in the original implementation of the memory management
extensions.

Fixes: 0db3dfa03c ("IB/hfi1: Work request processing for fast register mr and invalidate")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
bf3b1e0ce0 IB/hfi1: Reduce excessive aspm inlines
Uninline the aspm API since it increases code space for no reason.

Move the aspm module param to the new aspm C file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
2b0ad2da8f IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR details
Add some helper functions to hide struct rvt_swqe details.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Michael J. Ruhl
d310c4bf8a IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPs
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a
non-zero reference count.  IBTA 11.2.2 notes no such return value or error
case:

	Output Modifiers:
	- Verb results:
	- Operation completed successfully.
	- Invalid HCA handle.
	- Invalid address handle.

ULPs never test for this error and this will leak memory.

The reference count exists to allow for driver independent progress
mechanisms to process UD SWQEs in parallel with post sends.  The SWQE will
hold a reference count until the UD SWQE completes and then drops the
reference.

Fix by removing need to reference count the AH.  Add a UD specific
allocation to each SWQE entry to cache the necessary information for
independent progress.  Copy the information during the post send
processing.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
5136bfea7e IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is full
When a completion queue is full, the associated queue pairs are not put
into the error state. According to the IBTA specification, this is a
violation.

Quote from IBTA spec:
C9-218: A Requester Class F error occurs when the CQ is inaccessible or
full and an attempt is made to complete a WQE.  The Affected QP shall be
moved to the error state and affiliated asynchronous errors generated as
described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The
current WQE and any subsequent WQEs are left in an unknown state.

C11-37: The CI shall generate a CQ Error when a CQ overrun is
detected. This condition will result in an Affiliated Asynchronous Error
for any associated Work Queues when they attempt to use that
CQ. Completions can no longer be added to the CQ. It is not guaranteed
that completions present in the CQ at the time the error occurred can be
retrieved. Possible causes include a CQ overrun or a CQ protection error.

Put the qp in error state when cq is full. Implement a state called full
to continue to put other associated QPs in error state.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:34:26 -03:00
Kamenee Arumugam
239b0e52d8 IB/hfi1: Move rvt_cq_wc struct into uapi directory
The rvt_cq_wc struct elements are shared between rdmavt and the providers
but not in uapi directory.  As per the comment in
https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and
the rdma core driver are not using shared structures in the uapi
directory.

In that case, move rvt_cq_wc struct into the rvt-abi.h header file and
create a rvt_k_cq_w for the kernel completion queue.

Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 22:32:16 -03:00
Jason Gunthorpe
371bb62158 Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into rdma.git for-next

For dependencies in next patches.

Resolve conflicts:
- Use uverbs_get_cleared_udata() with new cq allocation flow
- Continue to delete nes despite SPDX conflict
- Resolve list appends in mlx5_command_str()
- Use u16 for vport_rule stuff
- Resolve list appends in struct ib_client

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28 21:18:23 -03:00
Mike Marciniszyn
4a9ceb7dba IB/{rdmavt, qib, hfi1}: Convert to new completion API
Convert all completions to use the new completion routine that
fixes a race between post send and completion where fields from
a SWQE can be read after SWQE has been freed.

This patch also addresses issues reported in
https://marc.info/?l=linux-kernel&m=155656897409107&w=2.

The reserved operation path has no need for any barrier.

The barrier for the other path is addressed by the
smp_load_acquire() barrier.

Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20 22:35:09 -04:00
Geert Uytterhoeven
5a3113d19c IB/hfi1: Spelling s/statisfied/satisfied/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-18 22:44:35 -04:00
Mike Marciniszyn
942a899335 IB/hfi1: Handle port down properly in pio
The call to sc_buffer_alloc currently returns NULL (no buffer) or
a buffer descriptor.

There is a third case when the port is down.  Currently that
returns NULL and this prevents the caller from properly handling the
sc_buffer_alloc() failure.  A verbs code link test after the call is
racy so the indication needs to come from the state check inside the allocation
routine to be valid.

Fix by encoding the ECOMM failure like SDMA.   IS_ERR_OR_NULL() tests
are added at all call sites.  For verbs send, this needs to treat any
error by returning a completion without any MMIO copy.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
099a884ba4 IB/hfi1: Handle wakeup of orphaned QPs for pio
Once a send context is taken down due to a link failure, any QPs waiting
for pio credits will stay on the waitlist indefinitely.

Fix by wakeing up all QPs linked to piowait list.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
f972775b1c IB/hfi1: Wakeup QPs orphaned on wait list after flush
Once an SDMA engine is taken down due to a link failure, any waiting QPs
that do not have outstanding descriptors in the ring will stay
on the dmawait list as long as the port is down.

Since there is no timer running, they will stay there for a long time.

The fix is to wake up all iowaits linked to dmawait. The send engine
will build and post packets that get flushed back.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
4bb02e9572 IB/hfi1: Use aborts to trigger RC throttling
SDMA and pio flushes will cause a lot of packets to be transmitted
after a link has gone down, using a lot of CPU to retransmit
packets.

Fix for RC QPs by recognizing the flush status and:
- Forcing a timer start
- Putting the QP into a "send one" mode

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
9755f72496 IB/hfi1: Create inline to get extended headers
This paves the way for another patch that reacts to a
flush sdma completion for RC.

Fixes: 81cd3891f0 ("IB/hfi1: Add support for 16B Management Packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
3230f4a8d4 IB/hfi1: Silence txreq allocation warnings
The following warning can happen when a memory shortage
occurs during txreq allocation:

[10220.939246] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[10220.939246] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0018.C4.072020161249 07/20/2016
[10220.939247]   cache: mnt_cache, object size: 384, buffer size: 384, default order: 2, min order: 0
[10220.939260] Workqueue: hfi0_0 _hfi1_do_send [hfi1]
[10220.939261]   node 0: slabs: 1026568, objs: 43115856, free: 0
[10220.939262] Call Trace:
[10220.939262]   node 1: slabs: 820872, objs: 34476624, free: 0
[10220.939263]  dump_stack+0x5a/0x73
[10220.939265]  warn_alloc+0x103/0x190
[10220.939267]  ? wake_all_kswapds+0x54/0x8b
[10220.939268]  __alloc_pages_slowpath+0x86c/0xa2e
[10220.939270]  ? __alloc_pages_nodemask+0x2fe/0x320
[10220.939271]  __alloc_pages_nodemask+0x2fe/0x320
[10220.939273]  new_slab+0x475/0x550
[10220.939275]  ___slab_alloc+0x36c/0x520
[10220.939287]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939299]  ? __get_txreq+0x54/0x160 [hfi1]
[10220.939310]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939312]  __slab_alloc+0x40/0x61
[10220.939323]  ? hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939325]  kmem_cache_alloc+0x181/0x1b0
[10220.939336]  hfi1_make_rc_req+0x90/0x18b0 [hfi1]
[10220.939348]  ? hfi1_verbs_send_dma+0x386/0xa10 [hfi1]
[10220.939359]  ? find_prev_entry+0xb0/0xb0 [hfi1]
[10220.939371]  hfi1_do_send+0x1d9/0x3f0 [hfi1]
[10220.939372]  process_one_work+0x171/0x380
[10220.939374]  worker_thread+0x49/0x3f0
[10220.939375]  kthread+0xf8/0x130
[10220.939377]  ? max_active_store+0x80/0x80
[10220.939378]  ? kthread_bind+0x10/0x10
[10220.939379]  ret_from_fork+0x35/0x40
[10220.939381] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)

The shortage is handled properly so the message isn't needed. Silence by
adding the no warn option to the slab allocation.

Fixes: 45842abbb2 ("staging/rdma/hfi1: move txreq header code")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
cf131a8196 IB/hfi1: Avoid hardlockup with flushlist_lock
Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.

Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-17 21:15:40 -04:00
Mike Marciniszyn
cc78076af1 IB/hfi1: Correct tid qp rcd to match verbs context
The qp priv rcd pointer doesn't match the context being used for verbs
causing issues when 9B and kdeth packets are processed by different
receive contexts and hence different CPUs.

When running on different CPUs the following panic can occur:

 WARNING: CPU: 3 PID: 2584 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
 list_del corruption. prev->next should be ffff9a7ac31f7a30, but was ffff9a7c3bc89230
 CPU: 3 PID: 2584 Comm: z_wr_iss Kdump: loaded Tainted: P           OE  ------------   3.10.0-862.2.3.el7_lustre.x86_64 #1
 Call Trace:
  <IRQ>  [<ffffffffb7b0d78e>] dump_stack+0x19/0x1b
  [<ffffffffb74916d8>] __warn+0xd8/0x100
  [<ffffffffb749175f>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffffb7768671>] __list_del_entry+0xa1/0xd0
  [<ffffffffc0c7a945>] process_rcv_qp_work+0xb5/0x160 [hfi1]
  [<ffffffffc0c7bc2b>] handle_receive_interrupt_nodma_rtail+0x20b/0x2b0 [hfi1]
  [<ffffffffc0c70683>] receive_context_interrupt+0x23/0x40 [hfi1]
  [<ffffffffb7540a94>] __handle_irq_event_percpu+0x44/0x1c0
  [<ffffffffb7540c42>] handle_irq_event_percpu+0x32/0x80
  [<ffffffffb7540ccc>] handle_irq_event+0x3c/0x60
  [<ffffffffb7543a1f>] handle_edge_irq+0x7f/0x150
  [<ffffffffb742d504>] handle_irq+0xe4/0x1a0
  [<ffffffffb7b23f7d>] do_IRQ+0x4d/0xf0
  [<ffffffffb7b16362>] common_interrupt+0x162/0x162
  <EOI>  [<ffffffffb775a326>] ? memcpy+0x6/0x110
  [<ffffffffc109210d>] ? abd_copy_from_buf_off_cb+0x1d/0x30 [zfs]
  [<ffffffffc10920f0>] ? abd_copy_to_buf_off_cb+0x30/0x30 [zfs]
  [<ffffffffc1093257>] abd_iterate_func+0x97/0x120 [zfs]
  [<ffffffffc10934d9>] abd_copy_from_buf_off+0x39/0x60 [zfs]
  [<ffffffffc109b828>] arc_write_ready+0x178/0x300 [zfs]
  [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
  [<ffffffffb7b11032>] ? mutex_lock+0x12/0x2f
  [<ffffffffc1164d05>] zio_ready+0x65/0x3d0 [zfs]
  [<ffffffffc04d725e>] ? tsd_get_by_thread+0x2e/0x50 [spl]
  [<ffffffffc04d1318>] ? taskq_member+0x18/0x30 [spl]
  [<ffffffffc115ef22>] zio_execute+0xa2/0x100 [zfs]
  [<ffffffffc04d1d2c>] taskq_thread+0x2ac/0x4f0 [spl]
  [<ffffffffb74cee80>] ? wake_up_state+0x20/0x20
  [<ffffffffc115ee80>] ? zio_taskq_member.isra.7.constprop.10+0x80/0x80 [zfs]
  [<ffffffffc04d1a80>] ? taskq_thread_spawn+0x60/0x60 [spl]
  [<ffffffffb74bae31>] kthread+0xd1/0xe0
  [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40
  [<ffffffffb7b1f5f7>] ret_from_fork_nospec_begin+0x21/0x21
  [<ffffffffb74bad60>] ? insert_kthread_work+0x40/0x40

Fix by reading the map entry in the same manner as the hardware so that
the kdeth and verbs contexts match.

Cc: <stable@vger.kernel.org>
Fixes: 5190f052a3 ("IB/hfi1: Allow the driver to initialize QP priv struct")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:45 -03:00
Mike Marciniszyn
da9de5f852 IB/hfi1: Close PSM sdma_progress sleep window
The call to sdma_progress() is called outside the wait lock.

In this case, there is a race condition where sdma_progress() can return
false and the sdma_engine can idle.  If that happens, there will be no
more sdma interrupts to cause the wakeup and the user_sdma xmit will hang.

Fix by moving the lock to enclose the sdma_progress() call.

Also, delete busycount. The need for this was removed by:
commit bcad29137a ("IB/hfi1: Serve the most starved iowait entry first")

Cc: <stable@vger.kernel.org>
Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:45 -03:00
Kaike Wan
5f90677ed3 IB/hfi1: Validate fault injection opcode user input
The opcode range for fault injection from user should be validated before
it is applied to the fault->opcodes[] bitmap to avoid out-of-bound
error.

Cc: <stable@vger.kernel.org>
Fixes: a74d5307ca ("IB/hfi1: Rework fault injection machinery")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-11 17:06:37 -03:00
Jason Gunthorpe
7a15414252 RDMA: Move owner into struct ib_device_ops
This more closely follows how other subsytems work, with owner being a
member of the structure containing the function pointers.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:03 -03:00
Jason Gunthorpe
b9560a419b RDMA: Move driver_id into struct ib_device_ops
No reason for every driver to emit code to set this, just make it part of
the driver's existing static const ops structure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10 16:56:02 -03:00
Linus Torvalds
6e38335dcc 5.2 First rc pull request
The usual driver bug fixes and fixes for a couple of regressions introduced in
 5.2:
 
 - Fix a race on bootup with RDMA device renaming and srp. SRP also needs to
   rename its internal sys files
 
 - Fix a memory leak in hns
 
 - Don't leak resources in efa on certain error unwinds
 
 - Don't panic in certain error unwinds in ib_register_device
 
 - Various small user visible bug fix patches for the hfi and efa drivers
 
 - Fix the 32 bit compilation break
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlz5c5oACgkQOG33FX4g
 mxpEEhAAk64phaKUih9KVT0MpC1zezB1C0EGKg45GKuMOFUFJQ5tZ0g4s6aDEbG3
 374ZE9h82HMgKn4tQ95110AKvCI+VAbbKOS7kzk1rLWE1ruJxk5DNsvp1v5/S3FE
 GXBSws+HtZtdiRAMTYyEOfz0MqpvghFg0vor4PugrmOuqIe2a0bkYPEzYPjYbaNH
 jSctd/q4s/o02n6gfbCrFpXsW0Va3OIaDX5a+Fx5+lWW+GPr/Uzk/3kN95mFbDRp
 XsCE80V+n3ceKSQUp0lYtxU3tm2mT1JpiiZjXuKyjRV8IMUS+xkdJ8scEz0upGcg
 +Jr74mN/xKT3toHaMv7fZ3RGlYgFsSsZcAApm6LrIlTNQXKjJ8hl+2BWdi4nRfYZ
 X89RRWEl3j8i6URu65iH7y7IlfFEhjJGmATUQFdrfECR9hBMJ8VHzBfcz7aYgoac
 Ggi+2Vjm7GQlr9mzW/phXb25PWqP5yVTW6/3BUtMs3oY7kd6vE2n9XzGIy13uBpX
 fzY/tnIMrgZMjphYPPbBAbwl+tBKZCu4k6lpP7cLsVsIwY0NIWS26JCnCdO0efqR
 SnAUPjoAV7nkpG3mMO9Qv7h7yar3HrG7ED15hfmB4VowRNQMfDoTLc8jVWDvGk4/
 aFBSH8dEjszZ5tMO9HL+RXnvpkRcDyQpfVQJttY5adZFQlUOd+0=
 =RmxY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Things are looking pretty quiet here in RDMA, not too many bug fixes
  rolling in right now. The usual driver bug fixes and fixes for a
  couple of regressions introduced in 5.2:

   - Fix a race on bootup with RDMA device renaming and srp. SRP also
     needs to rename its internal sys files

   - Fix a memory leak in hns

   - Don't leak resources in efa on certain error unwinds

   - Don't panic in certain error unwinds in ib_register_device

   - Various small user visible bug fix patches for the hfi and efa
     drivers

   - Fix the 32 bit compilation break"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/efa: Remove MAYEXEC flag check from mmap flow
  mlx5: avoid 64-bit division
  IB/hfi1: Validate page aligned for a given virtual address
  IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
  IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
  IB/rdmavt: Fix alloc_qpn() WARN_ON()
  RDMA/core: Fix panic when port_data isn't initialized
  RDMA/uverbs: Pass udata on uverbs error unwind
  RDMA/core: Clear out the udata before error unwind
  RDMA/hns: Fix PD memory leak for internal allocation
  RDMA/srp: Rename SRP sysfs name after IB device rename trigger
2019-06-07 09:25:27 -07:00
Gustavo A. R. Silva
6fe1a9b9b6 IB/hfi1: Use struct_size() helper
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.

So, replace the following form:

sizeof(struct opa_port_status_rsp) + num_vls * sizeof(struct _vls_pctrs)

with:

struct_size(rsp, vls, num_vls)

and so on...

Also, notice that variable size is unnecessary, hence it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-30 15:40:50 -03:00
Dennis Dalessandro
5f5e4eb4fb IB/hfi1: Remove extra brackets from an if
A recent patch to hfi1 left behind a checkpatch error.

Fixes: fb24ea52f7 ("drivers: Remove explicit invocations of mmiowb()")
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29 12:59:48 -03:00
Kamenee Arumugam
97736f36db IB/hfi1: Validate page aligned for a given virtual address
User applications can register memory regions for TID buffers that are not
aligned on page boundaries. Hfi1 is expected to pin those pages in memory
and cache the pages with mmu_rb. The rb tree will fail to insert pages
that are not aligned correctly.

Validate whether a given virtual address is page aligned before pinning.

Fixes: 7e7a436ecb ("staging/hfi1: Add TID entry program function body")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29 12:56:05 -03:00
Mike Marciniszyn
35164f5259 IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
The command 'ibv_devinfo -v' reports 0 for max_mr.

Fix by assigning the query values after the mr lkey_table has been built
rather than early on in the driver.

Fixes: 7b1e2099ad ("IB/rdmavt: Move memory registration into rdmavt")
Reviewed-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29 12:56:05 -03:00
Mike Marciniszyn
6d517353c7 IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
By code inspection, the freeze_work is never canceled.

Fix by adding a cancel_work_sync in the shutdown path to insure it is no
longer running.

Fixes: 7724105686 ("IB/hfi1: add driver files")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-29 12:56:05 -03:00
John Hubbard
ea99697458 RDMA: Convert put_page() to put_user_page*()
For infiniband code that retains pages via get_user_pages*(), release
those pages via the new put_user_page(), or put_user_pages*(), instead of
put_page()

This is a tiny part of the second step of fixing the problem described in
[1]. The steps are:

1) Provide put_user_page*() routines, intended to be used for releasing
   pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to invoke
   put_user_page*(), instead of put_page(). This involves dozens of call
   sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem. Again, [1] provides details as to why that is
   desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-27 20:11:11 -03:00
YueHaibing
cfcc048ca7 IB/hfi1: Remove set but not used variables 'offset' and 'fspsn'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/infiniband/hw/hfi1/tid_rdma.c: In function tid_rdma_rcv_error:
drivers/infiniband/hw/hfi1/tid_rdma.c:2029:7: warning: variable offset set but not used [-Wunused-but-set-variable]
drivers/infiniband/hw/hfi1/tid_rdma.c: In function hfi1_rc_rcv_tid_rdma_ack:
drivers/infiniband/hw/hfi1/tid_rdma.c:4555:35: warning: variable fspsn set but not used [-Wunused-but-set-variable]

'offset' is never used since introduction in
commit d0d564a1ca ("IB/hfi1: Add functions to receive TID RDMA READ request")

'fspsn' is never used since introduciotn in
commit 9e93e967f7 ("IB/hfi1: Add a function to receive TID RDMA ACK packet")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-27 20:09:16 -03:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Ira Weiny
9fdf4aa156 IB/hfi1: use the new FOLL_LONGTERM flag to get_user_pages_fast()
Use the new FOLL_LONGTERM to get_user_pages_fast() to protect against FS
DAX pages being mapped.

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-6-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-6-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny
73b0140bf0 mm/gup: change GUP fast to use flags rather than a write 'bool'
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Linus Torvalds
dce45af5c2 5.2 Merge Window pull request
This has been a smaller cycle than normal. One new driver was accepted,
 which is unusual, and at least one more driver remains in review on the
 list.
 
 - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma
 
 - Many patches from MatthewW converting radix tree and IDR users to use
   xarray
 
 - Introduction of tracepoints to the MAD layer
 
 - Build large SGLs at the start for DMA mapping and get the driver to
   split them
 
 - Generally clean SGL handling code throughout the subsystem
 
 - Support for restricting RDMA devices to net namespaces for containers
 
 - Progress to remove object allocation boilerplate code from drivers
 
 - Change in how the mlx5 driver shows representor ports linked to VFs
 
 - mlx5 uapi feature to access the on chip SW ICM memory
 
 - Add a new driver for 'EFA'. This is HW that supports user space packet
   processing through QPs in Amazon's cloud
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzTIU0ACgkQOG33FX4g
 mxrGKQ/8CqpyvuCyZDW5ovO4DI4YlzYSPXehWlwxA4CWhU1AYTujutnNOdZdngnz
 atTthOlJpZWJV26orvvzwIOi4qX/5UjLXEY3HYdn07JP1Z4iT7E3P4W2sdU3vdl3
 j8bU7xM7ZWmnGxrBZ6yQlVRadEhB8+HJIZWMw+wx66cIPnvU+g9NgwouH67HEEQ3
 PU8OCtGBwNNR508WPiZhjqMDfi/3BED4BfCihFhMbZEgFgObjRgtCV0M33SSXKcR
 IO2FGNVuDAUBlND3vU9guW1+M77xE6p1GvzkIgdCp6qTc724NuO5F2ngrpHKRyZT
 CxvBhAJI6tAZmjBVnmgVJex7rA8p+y/8M/2WD6GE3XSO89XVOkzNBiO2iTMeoxXr
 +CX6VvP2BWwCArxsfKMgW3j0h/WVE9w8Ciej1628m1NvvKEV4AGIJC1g93lIJkRN
 i3RkJ5PkIrdBrTEdKwDu1FdXQHaO7kGgKvwzJ7wBFhso8BRMrMfdULiMbaXs2Bw1
 WdL5zoSe/bLUpPZxcT9IjXRxY5qR0FpIOoo6925OmvyYe/oZo1zbitS5GGbvV90g
 tkq6Jb+aq8ZKtozwCo+oMcg9QPLYNibQsnkL3QirtURXWCG467xdgkaJLdF6s5Oh
 cp+YBqbR/8HNMG/KQlCfnNQKp1ci8mG3EdthQPhvdcZ4jtbqnSI=
 =TS64
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a smaller cycle than normal. One new driver was
  accepted, which is unusual, and at least one more driver remains in
  review on the list.

  Summary:

   - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
     vmw_pvrdma

   - Many patches from MatthewW converting radix tree and IDR users to
     use xarray

   - Introduction of tracepoints to the MAD layer

   - Build large SGLs at the start for DMA mapping and get the driver to
     split them

   - Generally clean SGL handling code throughout the subsystem

   - Support for restricting RDMA devices to net namespaces for
     containers

   - Progress to remove object allocation boilerplate code from drivers

   - Change in how the mlx5 driver shows representor ports linked to VFs

   - mlx5 uapi feature to access the on chip SW ICM memory

   - Add a new driver for 'EFA'. This is HW that supports user space
     packet processing through QPs in Amazon's cloud"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
  RDMA/ipoib: Allow user space differentiate between valid dev_port
  IB/core, ipoib: Do not overreact to SM LID change event
  RDMA/device: Don't fire uevent before device is fully initialized
  lib/scatterlist: Remove leftover from sg_page_iter comment
  RDMA/efa: Add driver to Kconfig/Makefile
  RDMA/efa: Add the efa module
  RDMA/efa: Add EFA verbs implementation
  RDMA/efa: Add common command handlers
  RDMA/efa: Implement functions that submit and complete admin commands
  RDMA/efa: Add the ABI definitions
  RDMA/efa: Add the com service API definitions
  RDMA/efa: Add the efa_com.h file
  RDMA/efa: Add the efa.h header file
  RDMA/efa: Add EFA device definitions
  RDMA: Add EFA related definitions
  RDMA/umem: Remove hugetlb flag
  RDMA/bnxt_re: Use core helpers to get aligned DMA address
  RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
  RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
  RDMA/umem: Add API to find best driver supported page size in an MR
  ...
2019-05-09 09:02:46 -07:00
Linus Torvalds
80f232121b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

   2) Add fib_sync_mem to control the amount of dirty memory we allow to
      queue up between synchronize RCU calls, from David Ahern.

   3) Make flow classifier more lockless, from Vlad Buslov.

   4) Add PHY downshift support to aquantia driver, from Heiner
      Kallweit.

   5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
      contention on SLAB spinlocks in heavy RPC workloads.

   6) Partial GSO offload support in XFRM, from Boris Pismenny.

   7) Add fast link down support to ethtool, from Heiner Kallweit.

   8) Use siphash for IP ID generator, from Eric Dumazet.

   9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
      entries, from David Ahern.

  10) Move skb->xmit_more into a per-cpu variable, from Florian
      Westphal.

  11) Improve eBPF verifier speed and increase maximum program size,
      from Alexei Starovoitov.

  12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
      spinlocks. From Neil Brown.

  13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

  14) Improve link partner cap detection in generic PHY code, from
      Heiner Kallweit.

  15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
      Maguire.

  16) Remove SKB list implementation assumptions in SCTP, your's truly.

  17) Various cleanups, optimizations, and simplifications in r8169
      driver. From Heiner Kallweit.

  18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

  19) Switch PHY drivers over to use dynamic featue detection, from
      Heiner Kallweit.

  20) Support flow steering without masking in dpaa2-eth, from Ioana
      Ciocoi.

  21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
      Pirko.

  22) Increase the strict parsing of current and future netlink
      attributes, also export such policies to userspace. From Johannes
      Berg.

  23) Allow DSA tag drivers to be modular, from Andrew Lunn.

  24) Remove legacy DSA probing support, also from Andrew Lunn.

  25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
      Haabendal.

  26) Add a generic tracepoint for TX queue timeouts to ease debugging,
      from Cong Wang.

  27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
2019-05-07 22:03:58 -07:00
Linus Torvalds
dd4e5d6106 Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())
Remove mmiowb() from the kernel memory barrier API and instead, for
 architectures that need it, hide the barrier inside spin_unlock() when
 MMIO has been performed inside the critical section.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFaUACgkQt6xw3ITB
 YzRICQgAiv7wF/yIbBhDOmCNCAKDO59chvFQWxXWdGk/aAB56kwKAMXJgLOvlMG/
 VRuuLyParTFQETC3jaxKgnO/1hb+PZLDt2Q2KqixtjIzBypKUPWvK2sf6THhSRF1
 GK0DBVUd1rCrWrR815+SPb8el4xXtdBzvAVB+Fx35PXVNpdRdqCkK+EQ6UnXGokm
 rXXHbnfsnquBDtmb4CR4r2beH+aNElXbdt0Kj8VcE5J7f7jTdW3z6Q9WFRvdKmK7
 yrsxXXB2w/EsWXOwFp0SLTV5+fgeGgTvv8uLjDw+SG6t0E0PebxjNAflT7dPrbYL
 WecjKC9WqBxrGY+4ew6YJP70ijLBCw==
 =aC8m
 -----END PGP SIGNATURE-----

Merge tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull mmiowb removal from Will Deacon:
 "Remove Mysterious Macro Intended to Obscure Weird Behaviours (mmiowb())

  Remove mmiowb() from the kernel memory barrier API and instead, for
  architectures that need it, hide the barrier inside spin_unlock() when
  MMIO has been performed inside the critical section.

  The only relatively recent changes have been addressing review
  comments on the documentation, which is in a much better shape thanks
  to the efforts of Ben and Ingo.

  I was initially planning to split this into two pull requests so that
  you could run the coccinelle script yourself, however it's been plain
  sailing in linux-next so I've just included the whole lot here to keep
  things simple"

* tag 'arm64-mmiowb' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (23 commits)
  docs/memory-barriers.txt: Update I/O section to be clearer about CPU vs thread
  docs/memory-barriers.txt: Fix style, spacing and grammar in I/O section
  arch: Remove dummy mmiowb() definitions from arch code
  net/ethernet/silan/sc92031: Remove stale comment about mmiowb()
  i40iw: Redefine i40iw_mmiowb() to do nothing
  scsi/qla1280: Remove stale comment about mmiowb()
  drivers: Remove explicit invocations of mmiowb()
  drivers: Remove useless trailing comments from mmiowb() invocations
  Documentation: Kill all references to mmiowb()
  riscv/mmiowb: Hook up mmwiob() implementation to asm-generic code
  powerpc/mmiowb: Hook up mmwiob() implementation to asm-generic code
  ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  mips/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  sh/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()
  m68k/io: Remove useless definition of mmiowb()
  nds32/io: Remove useless definition of mmiowb()
  x86/io: Remove useless definition of mmiowb()
  arm64/io: Remove useless definition of mmiowb()
  ARM/io: Remove useless definition of mmiowb()
  mmiowb: Hook up mmiowb helpers to spinlocks and generic I/O accessors
  ...
2019-05-06 16:57:52 -07:00
Mike Marciniszyn
4c4b1996b5 IB/hfi1: Fix WQ_MEM_RECLAIM warning
The work_item cancels that occur when a QP is destroyed can elicit the
following trace:

 workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
 WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
 Call Trace:
  __flush_work.isra.29+0x8c/0x1a0
  ? __switch_to_asm+0x40/0x70
  __cancel_work_timer+0x103/0x190
  ? schedule+0x32/0x80
  iowait_cancel_work+0x15/0x30 [hfi1]
  rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
  rvt_destroy_qp+0x65/0x1f0 [rdmavt]
  ? _cond_resched+0x15/0x30
  ib_destroy_qp+0xe9/0x230 [ib_core]
  ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
  process_one_work+0x171/0x370
  worker_thread+0x49/0x3f0
  kthread+0xf8/0x130
  ? max_active_store+0x80/0x80
  ? kthread_bind+0x10/0x10
  ret_from_fork+0x35/0x40

Since QP destruction frees memory, hfi1_wq should have the WQ_MEM_RECLAIM.

The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
entangled with memory reclaim, so this flag is appropriate.

Fixes: 0a226edd20 ("staging/rdma/hfi1: Use parallel workqueue for SDMA engines")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-06 12:57:45 -03:00
Jason Gunthorpe
449a224c10 Merge branch 'rdma_mmap' into rdma.git for-next
Jason Gunthorpe says:

====================
Upon review it turns out there are some long standing problems in BAR
mapping area:
 * BAR pages intended for read-only can be switched to writable via mprotect.
 * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page.
 * Disassociate causes SIGBUS when touching the pages.
 * CPU pages are being mapped through to the process via remap_pfn_range
   instead of the more appropriate vm_insert_page, causing weird behaviors
   during disassociation.

This series adds the missing VM_* flag manipulation, adds faulting a zero
page for disassociation and revises the CPU page mappings to use
vm_insert_page.
====================

For dependencies this branch is based on for-rc from
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

* branch 'rdma_mmap':
  RDMA: Remove rdma_user_mmap_page
  RDMA/mlx5: Use get_zeroed_page() for clock_info
  RDMA/ucontext: Fix regression with disassociate
  RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages
  RDMA/mlx5: Do not allow the user to write to the clock page

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 16:20:34 -03:00
John Fleck
3c176c9d72 IB/hfi1: Remove reference to RHF.VCRCErr
The bit VCRCErr in the receive header flag is actually a
reserved field. Remove bit operations on this field.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:48:11 -03:00
Mike Marciniszyn
a9c62e0078 IB/hfi1: Add selected Rcv counters
These counters are required for error analysis and debug.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:48:10 -03:00
Mike Marciniszyn
d40f69c9b9 IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts
The reference count adjustments on reference count completion
are open coded throughout.

Add a routine to do all reference count adjustments and use.

Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Mike Marciniszyn
62644c1d2b IB/hfi1: Make opfn.h self sufficient
The opfn.h include file build-ablility depends on the including file
having the correct includes.

Fix by making opfn.h self sufficient.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:49 -03:00
Kaike Wan
ea752bc5e5 IB/{rdmavt, hfi1): Miscellaneous comment fixes
This patch fixes miscellaneous comment errors.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:31:48 -03:00
Josh Collier
07c5ba9124 IB/hfi1: Add debugfs to control expansion ROM write protect
Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple
handles to PCI resource0. In order to continue to support expansion ROM
updates while the driver is loaded, the driver must now provide an
interface to control the expansion ROM write protection.

This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom
user tool to disable the expansion ROM write protection by opening the
file and writing a '1'.  The write protection is released when writing a
'0' or automatically re-enabled when the file handle is closed.  The
current implementation will only allow one handle to be opened at a time
across all hfi1 devices.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Josh Collier <josh.d.collier@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-24 11:26:41 -03:00
David S. Miller
6b0a7f84ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflict resolution of af_smc.c from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17 11:26:25 -07:00
Kaike Wan
d737b25b1a IB/hfi1: Do not flush send queue in the TID RDMA second leg
When a QP is put into error state, the send queue will be flushed.
This mechanism is implemented in both the first and the second leg
of the send engine. Since the second leg is only responsible for
data transactions in the KDETH space for the TID RDMA WRITE request,
it should not perform the flushing of the send queue.

This patch removes the flushing function of the second leg, but
still keeps the bailing out of the QP if it is put into error state.

Fixes: 70dcb2e3dc ("IB/hfi1: Add the TID second leg send packet builder")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-10 15:09:30 -03:00
Will Deacon
fb24ea52f7 drivers: Remove explicit invocations of mmiowb()
mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done

NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation. If you've ended up bisecting to this commit, you can
reintroduce the mmiowb() calls using wmb() instead, which should restore
the old behaviour on all architectures other than some esoteric ia64
systems.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:01:02 +01:00
Will Deacon
949b8c7276 drivers: Remove useless trailing comments from mmiowb() invocations
In preparation for using coccinelle to remove all mmiowb() instances
from drivers, remove all trailing comments since they won't be picked up
by spatch later on and will end up being preserved in the code.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:00:56 +01:00
Jason Gunthorpe
1c726c4421 Merge HFI1 updates into k.o/for-next
Based on rdma.git for-rc for dependencies.

From Dennis Dalessandro:

====================

Here are some code improvement patches and fixes for less serious bugs to
TID RDMA than we sent for RC.

====================

* HFI1 updates:
  IB/hfi1: Implement CCA for TID RDMA protocol
  IB/hfi1: Remove WARN_ON when freeing expected receive groups
  IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE
  IB/hfi1: Add a function to read next expected psn from hardware flow
  IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:28:05 -03:00
Kaike Wan
747b931fbe IB/hfi1: Implement CCA for TID RDMA protocol
Currently, FECN handling is not implemented on TID RDMA expected receive
packets and therefore CCA can't be turned on when TID RDMA is
enabled. This patch adds the CCA support to TID RDMA protocol by:

- modifying FECN RSM rule to include kernel receive contexts

- For TID_RDMA READ RESP or TID RDMA ACK packet, a CNP will be sent out if
  the FECN bit is set. For other TID RDMA packets that generate at least
  one response packet, the BECN bit will be set in the first response
  packet

- Copying expected packet data to destination buffer when FECN bit is set
  in the TID RDMA READ RESP or TID RDMA WRITE DATA packet. In this case,
  the expected packet is received as an eager packet

- Handling the TID sequence error for subsequent normal expected packets.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-03 15:27:39 -03:00