Commit Graph

12625 Commits

Author SHA1 Message Date
Mark Zhang
64733956eb RDMA/sa_query: Use strscpy_pad instead of memcpy to copy a string
When copying the device name, the length of the data memcpy copied exceeds
the length of the source buffer, which cause the KASAN issue below.  Use
strscpy_pad() instead.

 BUG: KASAN: slab-out-of-bounds in ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core]
 Read of size 64 at addr ffff88811a10f5e0 by task rping/140263
 CPU: 3 PID: 140263 Comm: rping Not tainted 5.15.0-rc1+ #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Call Trace:
  dump_stack_lvl+0x57/0x7d
  print_address_description.constprop.0+0x1d/0xa0
  kasan_report+0xcb/0x110
  kasan_check_range+0x13d/0x180
  memcpy+0x20/0x60
  ib_nl_set_path_rec_attrs+0x136/0x320 [ib_core]
  ib_nl_make_request+0x1c6/0x380 [ib_core]
  send_mad+0x20a/0x220 [ib_core]
  ib_sa_path_rec_get+0x3e3/0x800 [ib_core]
  cma_query_ib_route+0x29b/0x390 [rdma_cm]
  rdma_resolve_route+0x308/0x3e0 [rdma_cm]
  ucma_resolve_route+0xe1/0x150 [rdma_ucm]
  ucma_write+0x17b/0x1f0 [rdma_ucm]
  vfs_write+0x142/0x4d0
  ksys_write+0x133/0x160
  do_syscall_64+0x43/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f26499aa90f
 Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 29 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 5c fd ff ff 48
 RSP: 002b:00007f26495f2dc0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 00000000000007d0 RCX: 00007f26499aa90f
 RDX: 0000000000000010 RSI: 00007f26495f2e00 RDI: 0000000000000003
 RBP: 00005632a8315440 R08: 0000000000000000 R09: 0000000000000001
 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f26495f2e00
 R13: 00005632a83154e0 R14: 00005632a8315440 R15: 00005632a830a810

 Allocated by task 131419:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc+0x7c/0x90
  proc_self_get_link+0x8b/0x100
  pick_link+0x4f1/0x5c0
  step_into+0x2eb/0x3d0
  walk_component+0xc8/0x2c0
  link_path_walk+0x3b8/0x580
  path_openat+0x101/0x230
  do_filp_open+0x12e/0x240
  do_sys_openat2+0x115/0x280
  __x64_sys_openat+0xce/0x140
  do_syscall_64+0x43/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 2ca546b92a ("IB/sa: Route SA pathrecord query through netlink")
Link: https://lore.kernel.org/r/72ede0f6dab61f7f23df9ac7a70666e07ef314b0.1635055496.git.leonro@nvidia.com
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-25 11:51:51 -03:00
Mustafa Ismail
2dace185ca RDMA/irdma: Do not hold qos mutex twice on QP resume
When irdma_ws_add fails, irdma_ws_remove is used to cleanup the leaf node.
This lead to holding the qos mutex twice in the QP resume path. Fix this
by avoiding the call to irdma_ws_remove and unwinding the error in
irdma_ws_add. This skips the call to irdma_tc_in_use function which is not
needed in the error unwind cases.

Fixes: 3ae331c751 ("RDMA/irdma: Add QoS definitions")
Link: https://lore.kernel.org/r/20211019151654.1943-2-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19 20:22:01 -03:00
Mustafa Ismail
cc07b73ef1 RDMA/irdma: Set VLAN in UD work completion correctly
Currently VLAN is reported in UD work completion when VLAN id is zero,
i.e. no VLAN case.

Report VLAN in UD work completion only when VLAN id is non-zero.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20211019151654.1943-1-shiraz.saleem@intel.com
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19 20:22:01 -03:00
Aharon Landau
5508546631 RDMA/mlx5: Initialize the ODP xarray when creating an ODP MR
Normally the zero fill would hide the missing initialization, but an
errant set to desc_size in reg_create() causes a crash:

  BUG: unable to handle page fault for address: 0000000800000000
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  CPU: 5 PID: 890 Comm: ib_write_bw Not tainted 5.15.0-rc4+ #47
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:mlx5_ib_dereg_mr+0x14/0x3b0 [mlx5_ib]
  Code: 48 63 cd 4c 89 f7 48 89 0c 24 e8 37 30 03 e1 48 8b 0c 24 eb a0 90 0f 1f 44 00 00 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 30 <48> 8b 2f 65 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 8b 87 c8
  RSP: 0018:ffff88811afa3a60 EFLAGS: 00010286
  RAX: 000000000000001c RBX: 0000000800000000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000800000000
  RBP: 0000000800000000 R08: 0000000000000000 R09: c0000000fffff7ff
  R10: ffff88811afa38f8 R11: ffff88811afa38f0 R12: ffffffffa02c7ac0
  R13: 0000000000000000 R14: ffff88811afa3cd8 R15: ffff88810772fa00
  FS:  00007f47b9080740(0000) GS:ffff88852cd40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000800000000 CR3: 000000010761e003 CR4: 0000000000370ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   mlx5_ib_free_odp_mr+0x95/0xc0 [mlx5_ib]
   mlx5_ib_dereg_mr+0x128/0x3b0 [mlx5_ib]
   ib_dereg_mr_user+0x45/0xb0 [ib_core]
   ? xas_load+0x8/0x80
   destroy_hw_idr_uobject+0x1a/0x50 [ib_uverbs]
   uverbs_destroy_uobject+0x2f/0x150 [ib_uverbs]
   uobj_destroy+0x3c/0x70 [ib_uverbs]
   ib_uverbs_cmd_verbs+0x467/0xb00 [ib_uverbs]
   ? uverbs_finalize_object+0x60/0x60 [ib_uverbs]
   ? ttwu_queue_wakelist+0xa9/0xe0
   ? pty_write+0x85/0x90
   ? file_tty_write.isra.33+0x214/0x330
   ? process_echoes+0x60/0x60
   ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs]
   __x64_sys_ioctl+0x10d/0x8e0
   ? vfs_write+0x17f/0x260
   do_syscall_64+0x3c/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Add the missing xarray initialization and remove the desc_size set.

Fixes: a639e66703 ("RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr")
Link: https://lore.kernel.org/r/a4846a11c9de834663e521770da895007f9f0d30.1634642730.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19 20:19:59 -03:00
Prabhakar Kushwaha
60fab10766 rdma/qedr: Fix crash due to redundant release of device's qp memory
Device's QP memory should only be allocated and released by IB layer.
This patch removes the redundant release of the device's qp memory and
uses completion APIs to make sure that .destroy_qp() only return, when qp
reference becomes 0.

Fixes: 514aee660d ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/20211019082212.7052-1-pkushwaha@marvell.com
Acked-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19 20:17:44 -03:00
Dan Carpenter
663991f328 RDMA/rdmavt: Fix error code in rvt_create_qp()
Return negative -ENOMEM instead of positive ENOMEM.  Returning a postive
value will cause an Oops because it becomes an ERR_PTR() in the
create_qp() function.

Fixes: 514aee660d ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/20211013080645.GD6010@kili
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-13 13:59:47 -03:00
Mike Marciniszyn
13bac86195 IB/hfi1: Fix abba locking issue with sc_disable()
sc_disable() after having disabled the send context wakes up any waiters
by calling hfi1_qp_wakeup() while holding the waitlock for the sc.

This is contrary to the model for all other calls to hfi1_qp_wakeup()
where the waitlock is dropped and a local is used to drive calls to
hfi1_qp_wakeup().

Fix by moving the sc->piowait into a local list and driving the wakeup
calls from the list.

Fixes: 099a884ba4 ("IB/hfi1: Handle wakeup of orphaned QPs for pio")
Link: https://lore.kernel.org/r/20211013141852.128104.2682.stgit@awfm-01.cornelisnetworks.com
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-13 13:33:22 -03:00
Mike Marciniszyn
d39bf40e55 IB/qib: Protect from buffer overflow in struct qib_user_sdma_pkt fields
Overflowing either addrlimit or bytes_togo can allow userspace to trigger
a buffer overflow of kernel memory. Check for overflows in all the places
doing math on user controlled buffers.

Fixes: f931551baf ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters")
Link: https://lore.kernel.org/r/20211012175519.7298.77738.stgit@awfm-01.cornelisnetworks.com
Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-13 13:26:04 -03:00
Patrisious Haddad
1ab52ac1e9 RDMA/mlx5: Set user priority for DCT
Currently, the driver doesn't set the PCP-based priority for DCT, hence
DCT response packets are transmitted without user priority.

Fix it by setting user provided priority in the eth_prio field in the DCT
context, which in turn sets the value in the transmitted packet.

Fixes: 776a3906b6 ("IB/mlx5: Add support for DC target QP")
Link: https://lore.kernel.org/r/5fd2d94a13f5742d8803c218927322257d53205c.1633512672.git.leonro@nvidia.com
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06 16:39:52 -03:00
Shiraz Saleem
e93c7d8e8c RDMA/irdma: Process extended CQ entries correctly
The valid bit for extended CQE's written by HW is retrieved from the
incorrect quad-word. This leads to missed completions for any UD traffic
particularly after a wrap-around.

Get the valid bit for extended CQE's from the correct quad-word in the
descriptor.

Fixes: 551c46edc7 ("RDMA/irdma: Add user/kernel shared libraries")
Link: https://lore.kernel.org/r/20211005182302.374-1-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06 16:38:07 -03:00
Wenpeng Liang
e671f0ecfe RDMA/hns: Add the check of the CQE size of the user space
If the CQE size of the user space is not the size supported by the
hardware, the creation of CQ should be stopped.

Fixes: 09a5f210f6 ("RDMA/hns: Add support for CQE in size of 64 Bytes")
Link: https://lore.kernel.org/r/20210927125557.15031-3-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 14:49:49 -03:00
Wenpeng Liang
cc26aee100 RDMA/hns: Fix the size setting error when copying CQE in clean_cq()
The size of CQE is different for different versions of hardware, so the
driver needs to specify the size of CQE explicitly.

Fixes: 09a5f210f6 ("RDMA/hns: Add support for CQE in size of 64 Bytes")
Link: https://lore.kernel.org/r/20210927125557.15031-2-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 14:49:48 -03:00
Guo Zhi
7d5cfafe8b RDMA/hfi1: Fix kernel pointer leak
Pointers should be printed with %p or %px rather than cast to 'unsigned
long long' and printed with %llx.  Change %llx to %p to print the secured
pointer.

Fixes: 042a00f93a ("IB/{ipoib,hfi1}: Add a timeout handler for rdma_netdev")
Link: https://lore.kernel.org/r/20210922134857.619602-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Acked-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-27 14:32:14 -03:00
Leon Romanovsky
a86cd017a4 RDMA/usnic: Lock VF with mutex instead of spinlock
Usnic VF doesn't need lock in atomic context to create QPs, so it is safe
to use mutex instead of spinlock. Such change fixes the following smatch
error.

Smatch static checker warning:

   lib/kobject.c:289 kobject_set_name_vargs()
    warn: sleeping in atomic context

Fixes: 514aee660d ("RDMA: Globally allocate and release QP memory")
Link: https://lore.kernel.org/r/2a0e295786c127e518ebee8bb7cafcb819a625f6.1631520231.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:55:28 -03:00
Jason Gunthorpe
14351f08ed RDMA/hns: Work around broken constant propagation in gcc 8
gcc 8.3 and 5.4 throw this:

In function 'modify_qp_init_to_rtr',
././include/linux/compiler_types.h:322:38: error: call to '__compiletime_assert_1859' declared with attribute error: FIELD_PREP: value too large for the field
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
[..]
drivers/infiniband/hw/hns/hns_roce_common.h:91:52: note: in expansion of macro 'FIELD_PREP'
   *((__le32 *)ptr + (field_h) / 32) |= cpu_to_le32(FIELD_PREP(   \
                                                    ^~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write'
 #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val)
                                       ^~~~~~~~~~~~~
drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4412:2: note: in expansion of macro 'hr_reg_write'
  hr_reg_write(context, QPC_LP_PKTN_INI, lp_pktn_ini);

Because gcc has miscalculated the constantness of lp_pktn_ini:

	mtu = ib_mtu_enum_to_int(ib_mtu);
	if (WARN_ON(mtu < 0)) [..]
	lp_pktn_ini = ilog2(MAX_LP_MSG_LEN / mtu);

Since mtu is limited to {256,512,1024,2048,4096} lp_pktn_ini is between 4
and 8 which is compatible with the 4 bit field in the FIELD_PREP.

Work around this broken compiler by adding a 'can never be true'
constraint on lp_pktn_ini's value which clears out the problem.

Fixes: f0cb411aad ("RDMA/hns: Use new interface to modify QP context")
Link: https://lore.kernel.org/r/0-v1-c773ecb137bc+11f-hns_gcc8_jgg@nvidia.com
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 08:47:55 -03:00
Jason Gunthorpe
305d568b72 RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests
The FSM can run in a circle allowing rdma_resolve_ip() to be called twice
on the same id_priv. While this cannot happen without going through the
work, it violates the invariant that the same address resolution
background request cannot be active twice.

       CPU 1                                  CPU 2

rdma_resolve_addr():
  RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)  #1

			 process_one_req(): for #1
                          addr_handler():
                            RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                            mutex_unlock(&id_priv->handler_mutex);
                            [.. handler still running ..]

rdma_resolve_addr():
  RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
  rdma_resolve_ip(addr_handler)
    !! two requests are now on the req_list

rdma_destroy_id():
 destroy_id_handler_unlock():
  _destroy_id():
   cma_cancel_operation():
    rdma_addr_cancel()

                          // process_one_req() self removes it
		          spin_lock_bh(&lock);
                           cancel_delayed_work(&req->work);
	                   if (!list_empty(&req->list)) == true

      ! rdma_addr_cancel() returns after process_on_req #1 is done

   kfree(id_priv)

			 process_one_req(): for #2
                          addr_handler():
	                    mutex_lock(&id_priv->handler_mutex);
                            !! Use after free on id_priv

rdma_addr_cancel() expects there to be one req on the list and only
cancels the first one. The self-removal behavior of the work only happens
after the handler has returned. This yields a situations where the
req_list can have two reqs for the same "handle" but rdma_addr_cancel()
only cancels the first one.

The second req remains active beyond rdma_destroy_id() and will
use-after-free id_priv once it inevitably triggers.

Fix this by remembering if the id_priv has called rdma_resolve_ip() and
always cancel before calling it again. This ensures the req_list never
gets more than one item in it and doesn't cost anything in the normal flow
that never uses this strange error path.

Link: https://lore.kernel.org/r/0-v1-3bc675b8006d+22-syz_cancel_uaf_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: e51060f08a ("IB: IP address based RDMA connection manager")
Reported-by: syzbot+dc3dfba010d7671e05f5@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-23 17:03:09 -03:00
Jason Gunthorpe
bc0bdc5afa RDMA/cma: Do not change route.addr.src_addr.ss_family
If the state is not idle then rdma_bind_addr() will immediately fail and
no change to global state should happen.

For instance if the state is already RDMA_CM_LISTEN then this will corrupt
the src_addr and would cause the test in cma_cancel_operation():

		if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)

To view a mangled src_addr, eg with a IPv6 loopback address but an IPv4
family, failing the test.

This would manifest as this trace from syzkaller:

  BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
  Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204

  CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x141/0x1d7 lib/dump_stack.c:120
   print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
   __kasan_report mm/kasan/report.c:399 [inline]
   kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
   __list_add_valid+0x93/0xa0 lib/list_debug.c:26
   __list_add include/linux/list.h:67 [inline]
   list_add_tail include/linux/list.h:100 [inline]
   cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
   rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
   ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
   ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
   vfs_write+0x28e/0xa30 fs/read_write.c:603
   ksys_write+0x1ee/0x250 fs/read_write.c:658
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Which is indicating that an rdma_id_private was destroyed without doing
cma_cancel_listens().

Instead of trying to re-use the src_addr memory to indirectly create an
any address build one explicitly on the stack and bind to that as any
other normal flow would do.

Link: https://lore.kernel.org/r/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c545 ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+6bb0528b13611047209c@syzkaller.appspotmail.com
Tested-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-22 13:28:32 -03:00
Sindhu Devale
9f7fa37a6b RDMA/irdma: Report correct WC error when there are MW bind errors
Report the correct WC error when MW bind error related asynchronous events
are generated by HW.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20210916191222.824-5-shiraz.saleem@intel.com
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:23 -03:00
Sindhu Devale
d3bdcd5963 RDMA/irdma: Report correct WC error when transport retry counter is exceeded
When the retry counter exceeds, as the remote QP didn't send any Ack or
Nack an asynchronous event (AE) for too many retries is generated. Add
code to handle the AE and set the correct IB WC error code
IB_WC_RETRY_EXC_ERR.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20210916191222.824-4-shiraz.saleem@intel.com
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:23 -03:00
Sindhu Devale
f4475f2494 RDMA/irdma: Validate number of CQ entries on create CQ
Add lower bound check for CQ entries at creation time.

Fixes: b48c24c2d7 ("RDMA/irdma: Implement device supported verb APIs")
Link: https://lore.kernel.org/r/20210916191222.824-3-shiraz.saleem@intel.com
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:23 -03:00
Sindhu Devale
5b1e985f76 RDMA/irdma: Skip CQP ring during a reset
Due to duplicate reset flags, CQP commands are processed during reset.

This leads CQP failures such as below:

 irdma0: [Delete Local MAC Entry Cmd Error][op_code=49] status=-27 waiting=1 completion_err=0 maj=0x0 min=0x0

Remove the redundant flag and set the correct reset flag so CPQ is paused
during reset

Fixes: 8498a30e1b ("RDMA/irdma: Register auxiliary driver and implement private channel OPs")
Link: https://lore.kernel.org/r/20210916191222.824-2-shiraz.saleem@intel.com
Reported-by: LiLiang <liali@redhat.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-20 14:13:22 -03:00
Tao Liu
ca465e1f1f RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure
If cma_listen_on_all() fails it leaves the per-device ID still on the
listen_list but the state is not set to RDMA_CM_ADDR_BOUND.

When the cmid is eventually destroyed cma_cancel_listens() is not called
due to the wrong state, however the per-device IDs are still holding the
refcount preventing the ID from being destroyed, thus deadlocking:

 task:rping state:D stack:   0 pid:19605 ppid: 47036 flags:0x00000084
 Call Trace:
  __schedule+0x29a/0x780
  ? free_unref_page_commit+0x9b/0x110
  schedule+0x3c/0xa0
  schedule_timeout+0x215/0x2b0
  ? __flush_work+0x19e/0x1e0
  wait_for_completion+0x8d/0xf0
  _destroy_id+0x144/0x210 [rdma_cm]
  ucma_close_id+0x2b/0x40 [rdma_ucm]
  __destroy_id+0x93/0x2c0 [rdma_ucm]
  ? __xa_erase+0x4a/0xa0
  ucma_destroy_id+0x9a/0x120 [rdma_ucm]
  ucma_write+0xb8/0x130 [rdma_ucm]
  vfs_write+0xb4/0x250
  ksys_write+0xb5/0xd0
  ? syscall_trace_enter.isra.19+0x123/0x190
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Ensure that cma_listen_on_all() atomically unwinds its action under the
lock during error.

Fixes: c80a0c52d8 ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20210913093344.17230-1-thomas.liu@ucloud.cn
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-15 11:32:32 -03:00
Christoph Lameter
2cc74e1ee3 IB/cma: Do not send IGMP leaves for sendonly Multicast groups
ROCE uses IGMP for Multicast instead of the native Infiniband system where
joins are required in order to post messages on the Multicast group.  On
Ethernet one can send Multicast messages to arbitrary addresses without
the need to subscribe to a group.

So ROCE correctly does not send IGMP joins during rdma_join_multicast().

F.e. in cma_iboe_join_multicast() we see:

   if (addr->sa_family == AF_INET) {
                if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
                        ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
                        if (!send_only) {
                                err = cma_igmp_send(ndev, &ib.rec.mgid,
                                                    true);
                        }
                }
        } else {

So the IGMP join is suppressed as it is unnecessary.

However no such check is done in destroy_mc(). And therefore leaving a
sendonly multicast group will send an IGMP leave.

This means that the following scenario can lead to a multicast receiver
unexpectedly being unsubscribed from a MC group:

1. Sender thread does a sendonly join on MC group X. No IGMP join
   is sent.

2. Receiver thread does a regular join on the same MC Group x.
   IGMP join is sent and the receiver begins to get messages.

3. Sender thread terminates and destroys MC group X.
   IGMP leave is sent and the receiver no longer receives data.

This patch adds the same logic for sendonly joins to destroy_mc() that is
also used in cma_iboe_join_multicast().

Fixes: ab15c95a17 ("IB/core: Support for CMA multicast join flags")
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2109081340540.668072@gentwo.de
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-14 15:39:03 -03:00
Jason Gunthorpe
3110b942d3 IB/qib: Fix clang confusion of NULL pointer comparison
clang becomes confused due to the comparison to NULL in a integer constant
expression context:

 >> drivers/infiniband/hw/qib/qib_sysfs.c:413:1: error: static_assert expression is not an integral constant expression
    QIB_DIAGC_ATTR(rc_resends);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~
    drivers/infiniband/hw/qib/qib_sysfs.c:406:16: note: expanded from macro 'QIB_DIAGC_ATTR'
            static_assert(&((struct qib_ibport *)0)->rvp.n_##N != (u64 *)NULL);    \

Nathan found __same_type that solves this problem nicely, so use it instead.

Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-13 16:28:39 -03:00
Linus Torvalds
4b105f4a25 RDMA v5.15 merge window 2nd Pull Request
An important error case regression fixes in mlx5:
 
 - Wrong size used when computing the error path smaller allocation request
   leads to corruption
 
 - Confusing but ultimately harmless alignment mis-calculation
 
 - Static checker warnings:
     Null pointer subtraction in qib
     kcalloc in bnxt_re
     Missing static on global variable in hfi1
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmE4qagACgkQOG33FX4g
 mxoueg//SfLuxirMifDHOpH2Db5vC2a/BwYW3AuT78zAXzlUmmEWYIUw1fbJA2vK
 vt/fsjaxW2FBsluiIgV8J961Mlrj+/Y1KPxbV+76CKFoBkaYcjEJcIajGXkm71r5
 bXviUuFEHV2810Nxs6QEitMPyEP3sODmcK5EoKIuifVfjhdcJ3XrLbJb0NA5PKk1
 voD2zpAfhumiAzjZ2q3rJgpCv08WWqts30DzAKdii1asdG9tDF8WBS9BJuPC7MHF
 B2AmYVTgwhsepCAUaYGathKxHblqakRXF3gjOIAB4TKuZEv0OwGVzH3flFbukg5L
 icEw/Ai0H2GM7e3yT3BI3QDoXMg52DfCeChgwttkNsJJidjlc7cUCwaaBCJ1oVo/
 860foW9YFA14ftAlBt2BtvfNfOaqYnmz4tmavVNxqW7NiZCqT/+uyV5BSDPD5b5M
 Xtsa8NK/6u65rz92GtNpVYd8W4ZfjOCAc8cWj6+ELeSDsU92sFWBWalWsl59Hq+t
 AdCUi25uwUAlLPiPKRlYxSXoCCOG80iDgrj8sntIGoVYT0nr4ITVv5VmmVWNQLPw
 V5xfDhCmALrNlp1nCBRsmtBHA/Legqu82/lWpkl12WZGOytU/jTEb2gLWPDglwr+
 BWmaq3Us/g36NX9h/GOvqD5tZCogzkjHvWLW0biMRebDyO1DW1Y=
 =Sg6A
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "I don't usually send a second PR in the merge window, but the fix to
  mlx5 is significant enough that it should start going through the
  process ASAP. Along with it comes some of the usual -rc stuff that
  would normally wait for a -rc2 or so.

  Summary:

  Important error case regression fixes in mlx5:

   - Wrong size used when computing the error path smaller allocation
     request leads to corruption

   - Confusing but ultimately harmless alignment mis-calculation

  Static checker warning fixes:

   - NULL pointer subtraction in qib

   - kcalloc in bnxt_re

   - Missing static on global variable in hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/hfi1: make hist static
  RDMA/bnxt_re: Prefer kcalloc over open coded arithmetic
  IB/qib: Fix null pointer subtraction compiler warning
  RDMA/mlx5: Fix xlt_chunk_align calculation
  RDMA/mlx5: Fix number of allocated XLT entries
2021-09-09 11:14:14 -07:00
chongjiapeng
2169b90889 IB/hfi1: make hist static
This symbol is not used outside of trace.c, so marks it static.

Fix the following sparse warning:

 drivers/infiniband/hw/hfi1/trace.c:491:23: warning: symbol 'hist' was not declared. Should it be static?

Link: https://lore.kernel.org/r/1630921723-21545-1-git-send-email-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08 08:33:04 -03:00
Len Baker
f1b195ce81 RDMA/bnxt_re: Prefer kcalloc over open coded arithmetic
As noted in the "Deprecated Interfaces, Language Features, Attributes, and
Conventions" documentation [1], size calculations (especially
multiplication) should not be performed in memory allocator (or similar)
function arguments due to the risk of them overflowing. This could lead to
values wrapping around and a smaller allocation being made than the caller
was expecting. Using those allocations could lead to linear overflows of
heap memory and other misbehaviors.

In this case this is not actually dynamic sizes: both sides of the
multiplication are constant values. However it is best to refactor this
anyway, just to keep the open-coded math idiom out of code.

So, use the purpose specific kcalloc() function instead of the argument
size * count in the kzalloc() function.

Also, remove the unnecessary initialization of the sqp_tbl variable since
it is set a few lines later.

[1] https://www.kernel.org/doc/html/v5.14/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments

Link: https://lore.kernel.org/r/20210905081812.17113-1-len.baker@gmx.com
Signed-off-by: Len Baker <len.baker@gmx.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08 08:32:53 -03:00
Jason Gunthorpe
84f969e1c4 IB/qib: Fix null pointer subtraction compiler warning
>> drivers/infiniband/hw/qib/qib_sysfs.c:411:1: warning: performing pointer subtraction with a null pointer has undefined behavior
+[-Wnull-pointer-subtraction]
   QIB_DIAGC_ATTR(rc_resends);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/infiniband/hw/qib/qib_sysfs.c:408:51: note: expanded from macro 'QIB_DIAGC_ATTR'
                   .counter = &((struct qib_ibport *)0)->rvp.n_##N - (u64 *)0,    \

Use offsetof and accomplish the type check using static_assert.

Fixes: 4a7aaf88c8 ("RDMA/qib: Use attributes for the port sysfs")
Link: https://lore.kernel.org/r/0-v1-43ae3c759177+65-qib_type_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08 08:32:04 -03:00
Niklas Schnelle
f4c6f31011 RDMA/mlx5: Fix xlt_chunk_align calculation
The XLT chunk alignment depends on ent_size not sizeof(ent_size) aka
sizeof(size_t). The incoming ent_size is either 8 or 16, so the
miscalculation when 16 is required is only an over-alignment and
functional harmless.

Fixes: 8010d74b99 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/20210908081849.7948-2-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08 08:31:10 -03:00
Niklas Schnelle
9660dcbe0d RDMA/mlx5: Fix number of allocated XLT entries
In commit 8010d74b99 ("RDMA/mlx5: Split the WR setup out of
mlx5_ib_update_xlt()") the allocation logic was split out of
mlx5_ib_update_xlt() and the logic was changed to enable better OOM
handling. Sadly this change introduced a miscalculation of the number of
entries that were actually allocated when under memory pressure where it
can actually become 0 which on s390 lets dma_map_single() fail.

It can also lead to corruption of the free pages list when the wrong
number of entries is used in the calculation of sg->length which is used
as argument for free_pages().

Fix this by using the allocation size instead of misusing get_order(size).

Cc: stable@vger.kernel.org
Fixes: 8010d74b99 ("RDMA/mlx5: Split the WR setup out of mlx5_ib_update_xlt()")
Link: https://lore.kernel.org/r/20210908081849.7948-1-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-08 08:31:10 -03:00
Linus Torvalds
a9c9a6f741 SCSI misc on 20210902
This series consists of the usual driver updates (ufs, qla2xxx,
 target, smartpqi, lpfc, mpt3sas).  The core change causing the most
 churn was replacing the command request field request with a macro,
 allowing us to offset map to it and remove the redundant field; the
 same was also done for the tag field.  The most impactful change is
 the final removal of scsi_ioctl, which has been deprecated for over a
 decade.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYTD/TiYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdUkAQCjb3Ux
 4K9438mMelHlzM4er1S1IJ0WNnvObaVMNO9LBwD+JUz+rHsrKvuEX9j3g3C3u6JH
 hC3BUEW8f2LLnujWanQ=
 =lC5o
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (ufs, qla2xxx,
  target, smartpqi, lpfc, mpt3sas).

  The core change causing the most churn was replacing the command
  request field request with a macro, allowing us to offset map to it
  and remove the redundant field; the same was also done for the tag
  field.

  The most impactful change is the final removal of scsi_ioctl, which
  has been deprecated for over a decade"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
  scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
  scsi: ufs: ufs-exynos: Fix static checker warning
  scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Copyright updates for 14.0.0.1 patches
  scsi: lpfc: Update lpfc version to 14.0.0.1
  scsi: lpfc: Add bsg support for retrieving adapter cmf data
  scsi: lpfc: Add cmf_info sysfs entry
  scsi: lpfc: Add debugfs support for cm framework buffers
  scsi: lpfc: Add support for maintaining the cm statistics buffer
  scsi: lpfc: Add rx monitoring statistics
  scsi: lpfc: Add support for the CM framework
  scsi: lpfc: Add cmfsync WQE support
  scsi: lpfc: Add support for cm enablement buffer
  scsi: lpfc: Add cm statistics buffer support
  scsi: lpfc: Add EDC ELS support
  scsi: lpfc: Expand FPIN and RDF receive logging
  scsi: lpfc: Add MIB feature enablement support
  scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
  scsi: fc: Add EDC ELS definition
  ...
2021-09-02 15:09:46 -07:00
Linus Torvalds
23852bec53 RDMA v5.15 merge window Pull Request
- Various cleanup and small features for rtrs
 
 - kmap_local_page() conversions
 
 - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns
 
 - Cache the IB subnet prefix
 
 - Rework how CRC is calcuated in rxe
 
 - Clean reference counting in iwpm's netlink
 
 - Pull object allocation and lifecycle for user QPs to the uverbs core
   code
 
 - Several small hns features and continued general code cleanups
 
 - Fix the scatterlist confusion of orig_nents/nents introduced in an
   earlier patch creating the append operation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmEudRgACgkQOG33FX4g
 mxraJA//c6bMxrrTVrzmrtrkyYD4tYWE8RDfgvoyZtleZnnEOJeunCQWakQrpJSv
 ukSnOGCA3PtnmRMdV54f/11YJ/7otxOJodSO7jWsIoBrqG/lISAdX8mn2iHhrvJ0
 dIaFEFPLy0WqoMLCJVIYIupR0IStVHb/mWx0uYL4XnnoYKyt7f7K5JMZpNWMhDN2
 ieJw0jfrvEYm8pipWuxUvB16XARlzAWQrjqLpMRI+jFRpbDVBY21dz2/LJvOJPrA
 LcQ+XXsV/F659ibOAGm6bU4BMda8fE6Lw90B/gmhSswJ205NrdziF5cNYHP0QxcN
 oMjrjSWWHc9GEE7MTipC2AH8e36qob16Q7CK+zHEJ+ds7R6/O/8XmED1L8/KFpNA
 FGqnjxnxsl1y27mUegfj1Hh8PfoDp2oVq0lmpEw0CYo4cfVzHSMRrbTR//XmW628
 Ie/mJddpFK4oLk+QkSNjSLrnxOvdTkdA58PU0i84S5eUVMNm41jJDkxg2J7vp0Zn
 sclZsclhUQ9oJ5Q2so81JMWxu4JDn7IByXL0ULBaa6xwQTiVEnyvSxSuPlflhLRW
 0vI2ylATYKyWkQqyX7VyWecZJzwhwZj5gMMWmoGsij8bkZhQ/VaQMaesByzSth+h
 NV5UAYax4GqyOQ/tg/tqT6e5nrI1zof87H64XdTCBpJ7kFyQ/oA=
 =ZwOe
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is quite a small cycle, no major series stands out. The HNS and
  rxe drivers saw the most activity this cycle, with rxe being broken
  for a good chunk of time. The significant deleted line count is due to
  a SPDX cleanup series.

  Summary:

   - Various cleanup and small features for rtrs

   - kmap_local_page() conversions

   - Driver updates and fixes for: efa, rxe, mlx5, hfi1, qed, hns

   - Cache the IB subnet prefix

   - Rework how CRC is calcuated in rxe

   - Clean reference counting in iwpm's netlink

   - Pull object allocation and lifecycle for user QPs to the uverbs
     core code

   - Several small hns features and continued general code cleanups

   - Fix the scatterlist confusion of orig_nents/nents introduced in an
     earlier patch creating the append operation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (90 commits)
  RDMA/mlx5: Relax DCS QP creation checks
  RDMA/hns: Delete unnecessary blank lines.
  RDMA/hns: Encapsulate the qp db as a function
  RDMA/hns: Adjust the order in which irq are requested and enabled
  RDMA/hns: Remove RST2RST error prints for hw v1
  RDMA/hns: Remove dqpn filling when modify qp from Init to Init
  RDMA/hns: Fix QP's resp incomplete assignment
  RDMA/hns: Fix query destination qpn
  RDMA/hfi1: Convert to SPDX identifier
  IB/rdmavt: Convert to SPDX identifier
  RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
  RDMA/hns: Bugfix for the missing assignment for dip_idx
  RDMA/hns: Bugfix for data type of dip_idx
  RDMA/hns: Fix incorrect lsn field
  RDMA/irdma: Remove the repeated declaration
  RDMA/core/sa_query: Retry SA queries
  RDMA: Use the sg_table directly and remove the opencoded version from umem
  lib/scatterlist: Fix wrong update of orig_nents
  lib/scatterlist: Provide a dedicated function to support table append
  RDMA/hns: Delete unused hns bitmap interface
  ...
2021-09-02 14:47:21 -07:00
Jason Gunthorpe
6a217437f9 Merge branch 'sg_nents' into rdma.git for-next
From Maor Gottlieb
====================

Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA mapped
sges, the orig_nents is the number of CPU sges.

Since the sg append logic doesn't always create a SGL with exactly
orig_nents entries store a total_nents as well to allow the table to be
properly free'd and reorganize the freeing logic to share across all the
use cases.

====================

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* 'sg_nents':
  RDMA: Use the sg_table directly and remove the opencoded version from umem
  lib/scatterlist: Fix wrong update of orig_nents
  lib/scatterlist: Provide a dedicated function to support table append
2021-08-30 09:49:59 -03:00
Lior Nahmanson
65f90c8e38 RDMA/mlx5: Relax DCS QP creation checks
In order to create DCS QPs, we don't need to rely on both
log_max_dci_stream_channels and log_max_dci_errored_streams capabilities.

Fixes: 11656f593a ("RDMA/mlx5: Add DCS offload support")
Link: https://lore.kernel.org/r/3e7b3363fd73686176cc584295e86832a7cf99b2.1630320354.git.leonro@nvidia.com
Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-30 09:47:40 -03:00
Jakub Kicinski
97c78d0af5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/wwan/mhi_wwan_mbim.c - drop the extra arg.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-26 17:57:57 -07:00
Xinhao Liu
1a0182785a RDMA/hns: Delete unnecessary blank lines.
Just delete unnecessary blank lines.

Link: https://lore.kernel.org/r/1629985056-57004-8-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Xinhao Liu <liuxinhao5@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:21 -03:00
Yixing Liu
ae2854c5d3 RDMA/hns: Encapsulate the qp db as a function
Encapsulate qp db into two functions: user and kernel.

Link: https://lore.kernel.org/r/1629985056-57004-7-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:21 -03:00
Wenpeng Liang
7fac71691b RDMA/hns: Adjust the order in which irq are requested and enabled
It should first alloc workqueue and request irq, and finally enable irq.

Link: https://lore.kernel.org/r/1629985056-57004-6-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:20 -03:00
Weihang Li
ab5cbb9d28 RDMA/hns: Remove RST2RST error prints for hw v1
There is no need to prints error for hw_v1.

Link: https://lore.kernel.org/r/1629985056-57004-5-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:20 -03:00
Wenpeng Liang
fe164fc8d7 RDMA/hns: Remove dqpn filling when modify qp from Init to Init
According to the IB specification, the destination qpn is allowed to be
filled into the qpc only when the qp transitions from Init to RTR, so this
code is unused.

Link: https://lore.kernel.org/r/1629985056-57004-4-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:20 -03:00
Wenpeng Liang
d2e0ccffcd RDMA/hns: Fix QP's resp incomplete assignment
The resp passed to the user space represents the enable flag of qp,
incomplete assignment will cause some features of the user space to be
disabled.

Fixes: 90ae0b57e4 ("RDMA/hns: Combine enable flags of qp")
Fixes: aba457ca89 ("RDMA/hns: Support owner mode doorbell")
Link: https://lore.kernel.org/r/1629985056-57004-3-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:20 -03:00
Wenpeng Liang
e788a3cd57 RDMA/hns: Fix query destination qpn
The bit width of dqpn is 24 bits, using u8 will cause truncation error.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1629985056-57004-2-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-26 12:12:20 -03:00
Cai Huoqing
145eba1aae RDMA/hfi1: Convert to SPDX identifier
use SPDX-License-Identifier instead of a verbose license text

Link: https://lore.kernel.org/r/20210823042622.109-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 14:56:48 -03:00
Cai Huoqing
d164bf64a9 IB/rdmavt: Convert to SPDX identifier
use SPDX-License-Identifier instead of a verbose license text

Link: https://lore.kernel.org/r/20210823023530.48-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 14:55:49 -03:00
Junxian Huang
eb653eda1e RDMA/hns: Bugfix for incorrect association between dip_idx and dgid
dip_idx and dgid should be a one-to-one mapping relationship, but when
qp_num loops back to the start number, it may happen that two different
dgid are assiociated to the same dip_idx incorrectly.

One solution is to store the qp_num that is not assigned to dip_idx in an
array. When a dip_idx needs to be allocated to a new dgid, an spare qp_num
is extracted and assigned to dip_idx.

Fixes: f91696f2f0 ("RDMA/hns: Support congestion control type selection according to the FW")
Link: https://lore.kernel.org/r/1629884592-23424-4-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com>
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:55:30 -03:00
Junxian Huang
074f315fc5 RDMA/hns: Bugfix for the missing assignment for dip_idx
When the dgid-dip_idx mapping relationship exists, dip should be assigned.

Fixes: f91696f2f0 ("RDMA/hns: Support congestion control type selection according to the FW")
Link: https://lore.kernel.org/r/1629884592-23424-3-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:55:29 -03:00
Junxian Huang
4303e61264 RDMA/hns: Bugfix for data type of dip_idx
dip_idx is associated with qp_num whose data type is u32. However, dip_idx
is incorrectly defined as u8 data in the hns_roce_dip struct, which leads
to data truncation during value assignment.

Fixes: f91696f2f0 ("RDMA/hns: Support congestion control type selection according to the FW")
Link: https://lore.kernel.org/r/1629884592-23424-2-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Junxian Huang <huangjunxian4@hisilicon.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:55:29 -03:00
Yixing Liu
9bed8a7071 RDMA/hns: Fix incorrect lsn field
In RNR NAK screnario, according to the specification, when no credit is
available, only the first fragment of the send request can be sent. The
LSN(Limit Sequence Number) field should be 0 or the entire packet will be
resent.

Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1629883169-2306-1-git-send-email-liangwenpeng@huawei.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:55:11 -03:00
Shaokun Zhang
fc3bf30f1b RDMA/irdma: Remove the repeated declaration
Functions 'irdma_alloc_ws_node_id' and 'irdma_free_ws_node_id' are
declared twice, so remove the repeated declaration.

Link: https://lore.kernel.org/r/1629861674-53343-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:43:21 -03:00
Håkon Bugge
5f5a650999 RDMA/core/sa_query: Retry SA queries
A MAD packet is sent as an unreliable datagram (UD). SA requests are sent
as MAD packets. As such, SA requests or responses may be silently dropped.

IB Core's MAD layer has a timeout and retry mechanism, which amongst
other, is used by RDMA CM. But it is not used by SA queries. The lack of
retries of SA queries leads to long specified timeout, and error being
returned in case of packet loss. The ULP or user-land process has to
perform the retry.

Fix this by taking advantage of the MAD layer's retry mechanism.

First, a check against a zero timeout is added in rdma_resolve_route(). In
send_mad(), we set the MAD layer timeout to one tenth of the specified
timeout and the number of retries to 10. The special case when timeout is
less than 10 is handled.

With this fix:

 # ucmatose -c 1000 -S 1024 -C 1

runs stable on an Infiniband fabric. Without this fix, we see an
intermittent behavior and it errors out with:

cmatose: event: RDMA_CM_EVENT_ROUTE_ERROR, error: -110

(110 is ETIMEDOUT)

Link: https://lore.kernel.org/r/1628784755-28316-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:42:47 -03:00