linux

Author	SHA1	Message	Date
Leon Romanovsky	119181d1d4	RDMA: Restore ability to fail on SRQ destroy In similar way to other IB objects, restore the ability to return error on SRQ destroy. Strictly speaking, this change is not necessary, and provided here to ensure a symmetrical interface like other destroy functions. Fixes: `68e326dea1` ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 14:14:24 -03:00
Leon Romanovsky	fd89099d63	RDMA/mlx5: Issue FW command to destroy SRQ on reentry The HW release can fail and leave the system in limbo state, where SRQ is removed from the table, but can't be destroyed later. In every reentry, the initial xa_erase_irq() check will fail. Rewrite the erase logic to keep index, but don't store the entry itself. By doing it, we can safely reinsert entry back in the case of destroy failure. Link: https://lore.kernel.org/r/20200907120921.476363-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 14:04:13 -03:00
Leon Romanovsky	9a9ebf8cd7	RDMA: Restore ability to fail on AH destroy Like any other IB verbs objects, AH are refcounted by ib_core. The release of those objects are controlled by ib_core with promise that AH destroy can't fail. Being SW object for now, this change makes dealloc_ah() to behave like any other destroy IB flows. Fixes: `d345691471` ("RDMA: Handle AH allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 13:57:22 -03:00
Leon Romanovsky	91a7c58fce	RDMA: Restore ability to fail on PD deallocate The IB verbs objects are counted by the kernel and ib_core ensures that deallocate PD will success so it will be called once all other objects that depends on PD will be released. This is achieved by managing various reference counters on such objects. The mlx5 driver didn't follow this standard flow when allowed DEVX objects that are not managed by ib_core to be interleaved with the ones under ib_core responsibility. In such interleaved scenarios deallocate command can fail and ib_core will leave uobject in internal DB and attempt to clean it later to free resources anyway. This change partially restores returned value from dealloc_pd() for all drivers, but keeping in mind that non-DEVX devices and kernel verbs paths shouldn't fail. Fixes: `21a428a019` ("RDMA: Handle PD allocations by IB/core") Link: https://lore.kernel.org/r/20200907120921.476363-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 13:57:22 -03:00
Md Haris Iqbal	558d52b297	RDMA/rtrs-srv: Incorporate ib_register_client into rtrs server init The rnbd_server module's communication manager (cm) initialization depends on the registration of the "network namespace subsystem" of the RDMA CM agent module. As such, when the kernel is configured to load the rnbd_server and the RDMA cma module during initialization; and if the rnbd_server module is initialized before RDMA cma module, a null ptr dereference occurs during the RDMA bind operation. Call trace: Call Trace: ? xas_load+0xd/0x80 xa_load+0x47/0x80 cma_ps_find+0x44/0x70 rdma_bind_addr+0x782/0x8b0 ? get_random_bytes+0x35/0x40 rtrs_srv_cm_init+0x50/0x80 rtrs_srv_open+0x102/0x180 ? rnbd_client_init+0x6e/0x6e rnbd_srv_init_module+0x34/0x84 ? rnbd_client_init+0x6e/0x6e do_one_initcall+0x4a/0x200 kernel_init_freeable+0x1f1/0x26e ? rest_init+0xb0/0xb0 kernel_init+0xe/0x100 ret_from_fork+0x22/0x30 Modules linked in: CR2: 0000000000000015 All this happens cause the cm init is in the call chain of the module init, which is not a preferred practice. So remove the call to rdma_create_id() from the module init call chain. Instead register rtrs-srv as an ib client, which makes sure that the rdma_create_id() is called only when an ib device is added. Fixes: `9cb8374804` ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20200907103106.104530-1-haris.iqbal@cloud.ionos.com Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Reviewed-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 13:31:08 -03:00
Lijun Ou	a2f3d4479f	RDMA/hns: Avoid unncessary initialization Some variables have been initialized when used. As a result, here removes some unncessary initial assignment. Link: https://lore.kernel.org/r/1599547944-30671-1-git-send-email-oulijun@huawei.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 13:23:15 -03:00
Jason Gunthorpe	f553246f7f	RDMA/core: Change how failing destroy is handled during uobj abort Currently it triggers a WARN_ON and then goes ahead and destroys the uobject anyhow, leaking any driver memory. The only place that leaks driver memory should be during FD close() in uverbs_destroy_ufile_hw(). Drivers are only allowed to fail destroy uobjects if they guarantee destroy will eventually succeed. uverbs_destroy_ufile_hw() provides the loop to give the driver that chance. Link: https://lore.kernel.org/r/20200902081708.746631-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-09 13:16:48 -03:00
Allen Pais	00b3c11879	RDMA/rxe: Convert tasklets to use new tasklet_setup() API In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-6-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-03 12:01:53 -03:00
Allen Pais	a23afb448b	RDMA/qib: Convert tasklets to use new tasklet_setup() API In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-5-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-03 12:01:53 -03:00
Allen Pais	4e95f84999	RDMA/i40iw: Convert tasklets to use new tasklet_setup() API In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-4-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-03 12:01:52 -03:00
Allen Pais	55db47d082	RDMA/hfi1: Convert tasklets to use new tasklet_setup() API In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-3-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-03 12:01:52 -03:00
Allen Pais	53c2a706ae	RDMA/bnxt_re: Convert tasklets to use new tasklet_setup() API In preparation for unconditionally passing the struct tasklet_struct pointer to all tasklet callbacks, switch to using the new tasklet_setup() and from_tasklet() to pass the tasklet pointer explicitly. Link: https://lore.kernel.org/r/20200903060637.424458-2-allen.lkml@gmail.com Signed-off-by: Romain Perier <romain.perier@gmail.com> Signed-off-by: Allen Pais <allen.lkml@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-03 12:01:52 -03:00
Leon Romanovsky	4b916ed9f9	RDMA/mlx5: Fix potential race between destroy and CQE poll The SRQ can be destroyed right before mlx5_cmd_get_srq is called. In such case the latter will return NULL instead of expected SRQ. Fixes: `e126ba97db` ("mlx5: Add driver for Mellanox Connect-IB adapters") Link: https://lore.kernel.org/r/20200830084010.102381-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-03 10:12:57 -03:00
Alex Dewar	4f680cb9f1	RDMA/ucma: Fix resource leak on error path In ucma_process_join(), if the call to xa_alloc() fails, the function will return without freeing mc. Fix this by jumping to the correct line. In the process I renamed the jump labels to something more memorable for extra clarity. Link: https://lore.kernel.org/r/20200902162454.332828-1-alex.dewar90@gmail.com Addresses-Coverity-ID: 1496814 ("Resource leak") Fixes: `95fe51096b` ("RDMA/ucma: Remove mc_list and rely on xarray") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-02 17:06:48 -03:00
Kamal Heib	7d11b4787d	RDMA/qedr: Fix reported max_pkeys As qedr driver supports both RoCE and iWarp, make sure to set the max_pkeys only when running in RoCE mode. Link: https://lore.kernel.org/r/20200827141655.406185-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-09-02 15:42:36 -03:00
Alex Dewar	524d8ffd07	RDMA/qib: Tidy up process_cc() This function has a lot of gotos which could be replaced by simple returns, making the function tidier and less bug prone. Link: https://lore.kernel.org/r/20200825171242.448447-2-alex.dewar90@gmail.com Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 13:21:06 -03:00
Alex Dewar	d2598bb809	RDMA/qib: Remove superfluous fallthrough statements Commit `36a8f01cd2` ("IB/qib: Add congestion control agent implementation") erroneously marked a couple of switch cases as /* FALLTHROUGH */, which were later converted to fallthrough statements by commit `df561f6688` ("treewide: Use fallthrough pseudo-keyword"). This triggered a Coverity warning about unreachable code. Remove the fallthrough statements. Link: https://lore.kernel.org/r/20200825171242.448447-1-alex.dewar90@gmail.com Addresses-Coverity: ("Unreachable code") Fixes: `36a8f01cd2` ("IB/qib: Add congestion control agent implementation") Signed-off-by: Alex Dewar <alex.dewar90@gmail.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 13:21:06 -03:00
Jason Gunthorpe	6989aa62d3	Linux 5.9-rc3 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl9ML+IeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGA8EIAIy/kTbFS0yrE9yV hb98oX0z9+EU9YQg9vhaRWwPd+rJF/JMQZLqYcwbhjG9abaUL3T3fEcSAefMHw8E LAt+hYzA38dHt7tqhsFQX3vV1VorvDVICBVN0yRPRWKKikq4OPIHzaAR9tleGAF5 8btQisl1PjN+obwYmLuNb6aX16OCwAF+uXOwehcoJs9dvMNhwtXRzfOflWzOvOo6 tE0bHErlylLDfLv4ZzEfczTdks4QJZ7C0xLSf3oN9AAynW42Xnhct4hi8qZY/hCf CMaqeN4hdpub6TvQIqBdDqMMjEXGFgeNSnAEBQY9VpvUqz8NTu6sQxwgJEKDF5tg d81lv2c= =uW/F -----END PGP SIGNATURE----- Merge tag 'v5.9-rc3' into rdma.git for-next Required due to dependencies in following patches. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 12:28:12 -03:00
Bob Pearson	7672dac304	RDMA/rxe: Address an issue with hardened user copy Change rxe pools to use kzalloc instead of kmem_cache to allocate memory for rxe objects. The pools are not really necessary and they trigger hardened user copy warnings as the ioctl framework copies the QP number directly to userspace. Also the general project to move object alloation to the core code will eventually clean these out anyhow. Link: https://lore.kernel.org/r/20200827163535.2632-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 12:21:16 -03:00
Bob Pearson	63fa15dbd4	RDMA/rxe: Add SPDX hdrs to rxe source files Add SPDX headers to all rxe .c and .h files. Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 12:20:02 -03:00
Jason Gunthorpe	5d985d724b	RDMA/core: Trigger a WARN_ON if the driver causes uobjects to become leaked Drivers that fail destroy can cause uverbs to leak uobjects. Drivers are required to always eventually destroy their ubojects, so trigger a WARN_ON to detect this driver bug. Link: https://lore.kernel.org/r/0-v1-b1e0ed400ba9+f7-warn_destroy_ufile_hw_jgg@nvidia.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 12:10:11 -03:00
Weihang Li	074bf2c2c7	RDMA/hns: Get udp sport num dynamically instead of using a fixed value The UDP source port number in RoCE v2 is used to create entropy for network routers (ECMP), load balancers and 802.3ad link aggregation switching that are not aware of RoCE IB headers. Considering that the IB core has achieved a new interface to get a hashed value of it, the fixed value of it in QPC and UD WQE in hns driver could be fixed and the port number is to be set dynamically now. For QPC of RC, the value could be hashed from flow_lable if the user pass it in or from remote qpn and local qpn. For WQE of UD, it is set according to fl or as a random value. Link: https://lore.kernel.org/r/1598002289-8611-1-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-31 12:03:17 -03:00
Bob Pearson	5f9e2822d1	RDMA/rxe: Fix style warnings Fixed several minor checkpatch warnings in existing rxe source. Link: https://lore.kernel.org/r/20200820224638.3212-3-rpearson@hpe.com Signed-off-by: Bob Pearson <rpearson@hpe.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 09:51:08 -03:00
Lang Cheng	e0ef0f68c4	RDMA/hns: Add a check for current state before modifying QP It should be considered an illegal operation if the ULP attempts to modify a QP from another state to the current hardware state. Otherwise, the ULP can modify some fields of QPC at any time. For example, for a QP in state of RTS, modify it from RTR to RTS can change the PSN, which is always not as expected. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/1598353674-24270-1-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 09:46:07 -03:00
Kamal Heib	b9caebb290	RDMA/usnic: Remove the query_pkey callback Now that the query_pkey() isn't mandatory by the RDMA core, this callback can be removed from the usnic provider. The libfabric userspace never touches the pkey. Link: https://lore.kernel.org/r/20200820125346.111902-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 09:02:42 -03:00
Jason Gunthorpe	657360d6c7	RDMA/ucma: Remove closing and the close_wq Use cancel_work_sync() to ensure that the wq is not running and simply assign NULL to ctx->cm_id to indicate if the work ran or not. Delete the close_wq since flush_workqueue() is no longer needed. Link: https://lore.kernel.org/r/20200818120526.702120-15-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	a1d33b70db	RDMA/ucma: Rework how new connections are passed through event delivery When a new connection is established the RDMA CM creates a new cm_id and passes it through to the event handler. However inside the UCMA the new ID is not assigned a ucma_context until the user retrieves the event from a syscall. This creates a weird edge condition where a cm_id's context can continue to point at the listening_id that created it, and a number of additional edge conditions on event list clean up related to destroying half created IDs. There is also a race condition in ucma_get_events() where the cm_id->context is being assigned without holding the handler_mutex. Simplify all of this by creating the ucma_context inside the event handler itself and eliminating the edge case of a half created cm_id. All cm_id's can be uniformly destroyed via __destroy_id() or via the close_work. Link: https://lore.kernel.org/r/20200818120526.702120-14-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	310ca1a7dc	RDMA/ucma: Narrow file->mut in ucma_event_handler() Since the backlog is now an atomic the file->mut is now only protecting the event_list and ctx_list. Narrow its scope to make it clear Link: https://lore.kernel.org/r/20200818120526.702120-13-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	26c15dec49	RDMA/ucma: Change backlog into an atomic There is no reason to grab the file->mut just to do this inc/dec work. Use an atomic. Link: https://lore.kernel.org/r/20200818120526.702120-12-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:16 -03:00
Jason Gunthorpe	38e03d0926	RDMA/ucma: Add missing locking around rdma_leave_multicast() All entry points to the rdma_cm from a ULP must be single threaded, even this error unwinds. Add the missing locking. Fixes: `7c11910783` ("RDMA/ucma: Put a lock around every call to the rdma_cm layer") Link: https://lore.kernel.org/r/20200818120526.702120-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	98837c6c3d	RDMA/ucma: Fix locking for ctx->events_reported This value is locked under the file->mut, ensure it is held whenever touching it. The case in ucma_migrate_id() is a race, while in ucma_free_uctx() it is already not possible for the write side to run, the movement is just for clarity. Fixes: `88314e4dda` ("RDMA/cma: add support for rdma_migrate_id()") Link: https://lore.kernel.org/r/20200818120526.702120-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	09e328e47a	RDMA/ucma: Fix the locking of ctx->file ctx->file is changed under the file->mut lock by ucma_migrate_id(), which is impossible to lock correctly. Instead change ctx->file under the handler_lock and ctx_table lock and revise all places touching ctx->file to use this locking when reading ctx->file. Link: https://lore.kernel.org/r/20200818120526.702120-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	308571debc	RDMA/ucma: Do not use file->mut to lock destroying The only reader of destroying is inside a handler under the handler_mutex, so directly use the handler_mutex when setting it instead of the larger file->mut. As the refcount could be zero here, and the cm_id already freed, and additional refcount grab around the locking is required to touch the cm_id. Link: https://lore.kernel.org/r/20200818120526.702120-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	d114c6feed	RDMA/cma: Add missing locking to rdma_accept() In almost all cases rdma_accept() is called under the handler_mutex by ULPs from their handler callbacks. The one exception was ucma which did not get the handler_mutex. To improve the understand-ability of the locking scheme obtain the mutex for ucma as well. This improves how ucma works by allowing it to directly use handler_mutex for some of its internal locking against the handler callbacks intead of the global file->mut lock. There does not seem to be a serious bug here, other than a DISCONNECT event can be delivered concurrently with accept succeeding. Link: https://lore.kernel.org/r/20200818120526.702120-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:15 -03:00
Jason Gunthorpe	95fe51096b	RDMA/ucma: Remove mc_list and rely on xarray It is not really necessary to keep a linked list of mcs associated with each context when we can just scan the xarray to find the right things. The removes another overloading of file->mut by relying on the xarray locking for mc instead. Link: https://lore.kernel.org/r/20200818120526.702120-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:14 -03:00
Jason Gunthorpe	620db1a118	RDMA/ucma: Fix error cases around ucma_alloc_ctx() The store to ctx->cm_id was based on the idea that _ucma_find_context() would not return the ctx until it was fully setup. Without locking this doesn't work properly. Split things so that the xarray is allocated with NULL to reserve the ID and once everything is final set the cm_id and store. Along the way this shows that the error unwind in ucma_get_event() if a new ctx is created is wrong, fix it up. Link: https://lore.kernel.org/r/20200818120526.702120-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:14 -03:00
Jason Gunthorpe	c07e12d8e9	RDMA/ucma: Consolidate the two destroy flows ucma_close() is open coding the tail end of ucma_destroy_id(), consolidate this duplicated code into a function. Link: https://lore.kernel.org/r/20200818120526.702120-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:14 -03:00
Jason Gunthorpe	07e266a775	RDMA/ucma: Remove unnecessary locking of file->ctx_list in close During the file_operations release function it is already not possible that write() can be running concurrently, remove the extra locking around the ctx_list. Link: https://lore.kernel.org/r/20200818120526.702120-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:13 -03:00
Jason Gunthorpe	ca2968c1ef	RDMA/ucma: Fix refcount 0 incr in ucma_get_ctx() Both ucma_destroy_id() and ucma_close_id() (triggered from an event via a wq) can drive the refcount to zero. ucma_get_ctx() was wrongly assuming that the refcount can only go to zero from ucma_destroy_id() which also removes it from the xarray. Use refcount_inc_not_zero() instead. Link: https://lore.kernel.org/r/20200818120526.702120-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:38:13 -03:00
Mark Zhang	7c4b1ab9f1	IB/mlx5: Add DCT RoCE LAG support When DCT QPs work in RoCE LAG mode: 1. DCT creation is allowed only when it is supported 2. The "port" of a DCT QP is assigned in a round-robin way Link: https://lore.kernel.org/r/20200818115245.700581-3-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:34:28 -03:00
Mark Zhang	8f3243a047	IB/mlx5: Add tx_affinity support for DCI QP DCI QP supports tx_affinity as well. Link: https://lore.kernel.org/r/20200818115245.700581-2-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-27 08:34:28 -03:00
Chuck Lever	8dc105befe	RDMA/cm: Add tracepoints to track MAD send operations Surface the operation of MAD exchanges during connection establishment. Some samples: [root@klimt ~]# trace-cmd report -F ib_cma cpus=4 kworker/0:4-123 [000] 60.677388: icm_send_rep: local_id=1965336542 remote_id=1096195961 state=REQ_RCVD lap_state=LAP_UNINIT kworker/u8:11-391 [002] 60.678808: icm_send_req: local_id=1982113758 remote_id=0 state=IDLE lap_state=LAP_UNINIT kworker/0:4-123 [000] 60.679652: icm_send_rtu: local_id=1982113758 remote_id=1079418745 state=REP_RCVD lap_state=LAP_UNINIT nfsd-1954 [001] 60.691350: icm_send_rep: local_id=1998890974 remote_id=1129750393 state=MRA_REQ_SENT lap_state=LAP_UNINIT nfsd-1954 [003] 62.017931: icm_send_drep: local_id=1998890974 remote_id=1129750393 state=TIMEWAIT lap_state=LAP_UNINIT Link: https://lore.kernel.org/r/159767240197.2968.12048458026453596018.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 19:41:41 -03:00
Chuck Lever	75874b3d50	RDMA/cm: Replace pr_debug() call sites with tracepoints In the interest of converging on a common instrumentation infrastructure, modernize the pr_debug() call sites added by commit `119bf81793` ("IB/cm: Add debug prints to ib_cm"). The new tracepoints appear in a new "ib_cma" subsystem. The conversion is somewhat mechanical. Someone more familiar with the semantics of the recorded information might suggest additional data capture. Some benefits include: - Tracepoints enable "always on" reporting of these errors - The error records are structured and compact - Tracepoints provide hooks for eBPF scripts Sample output: nfsd-1954 [003] 62.017901: icm_dreq_skipped: local_id=1998890974 remote_id=1129750393 state=DREQ_RCVD lap_state=LAP_UNINIT Link: https://lore.kernel.org/r/159767239665.2968.10613294222688696646.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 19:41:41 -03:00
Chuck Lever	b3d03daa7c	RDMA/core: Move the rdma_show_ib_cm_event() macro Refactor: Make it globally available in the utilities header. Link: https://lore.kernel.org/r/159767239131.2968.9520990257041764685.stgit@klimt.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 16:01:47 -03:00
Gal Pressman	8d9290a4a8	RDMA/efa: Remove redundant udata check from alloc ucontext response The alloc ucontext flow is always called with a valid udata, there's no need to test whether it's NULL. While at it, the 'udata->outlen' check is removed as well as we copy the minimum between the size of the response and outlen, so in case of zero outlen, zero bytes will be copied. Link: https://lore.kernel.org/r/20200818110835.54299-1-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 14:45:54 -03:00
Kamal Heib	62cbff3267	RDMA/vmw_pvrdma: Fix kernel-doc documentation Fix the kernel-doc documentation by matching between the functions definitions and documentation. Link: https://lore.kernel.org/r/20200820123512.105193-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 14:34:18 -03:00
Mohammad Heib	fd49ddaf7e	RDMA/rxe: prevent rxe creation on top of vlan interface Creating rxe device on top of vlan interface will create a non-functional device that has an empty gids table and can't be used for rdma cm communication. This is caused by the logic in enum_all_gids_of_dev_cb()/is_eth_port_of_netdev(), which only considers networks connected to "upper devices" of the configured network device, resulting in an empty set of gids for a vlan interface, and attempts to connect via this rdma device fail in cm_init_av_for_response because no gids can be resolved. Apparently, this behavior was implemented to fit the HW-RoCE devices that create RoCE device per port, therefore RXE must behave the same like HW-RoCE devices and create rxe device per real device only. In order to communicate via a vlan interface, the user must use the gid index of the vlan address instead of creating rxe over vlan. Link: https://lore.kernel.org/r/20200811150415.3693-1-goody698@gmail.com Signed-off-by: Mohammad Heib <goody698@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 14:12:18 -03:00
Håkon Bugge	785167a114	IB/mlx4: Adjust delayed work when a dup is observed When scheduling delayed work to clean up the cache, if the entry already has been scheduled for deletion, we adjust the delay. Fixes: `3cf69cc8db` ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-7-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 11:31:22 -03:00
Håkon Bugge	227a0e142e	IB/mlx4: Add support for REJ due to timeout A CM REJ packet with its reason equal to timeout is a special beast in the sense that it doesn't have a Remote Communication ID nor does it have a Remote Port GID. Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. This proxying doesn't not handle said REJ packets. If the active side abandons its connection attempt after having sent a REQ, it will send a REJ with the reason being timeout. This example can be provoked by a simple user-verbs program, which ends up doing: rdma_connect(cm_id, &conn_param); rdma_destroy_id(cm_id); using the async librdmacm API. Having dynamic debug prints enabled in the mlx4_ib driver, we will then see: mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12 The solution is to introduce a radix-tree. When a REQ packet is received and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's para-virtual cm_id and the destination slave. We then insert an entry into the tree with said information. We also schedule work to remove this entry from the tree and free it, in order to avoid memory leak. When a REJ packet with reason timeout is received, we can look up the slave in the tree, and deliver the packet to the correct slave. When a duplicate REQ packet is received, the entry is in the tree. In this case, we adjust the delayed work in order to avoid a too premature eviction of the entry. When cleaning up, we simply traverse the tree and modify any delayed work to use a zero delay. A subsequent flush of the system_wq will ensure all entries being wiped out. Fixes: `3cf69cc8db` ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 11:31:22 -03:00
Håkon Bugge	7fd1507df7	IB/mlx4: Fix starvation in paravirt mux/demux The mlx4 driver will proxy MAD packets through the PF driver. A VM or an instantiated VF will send its MAD packets to the PF driver using loop-back. The PF driver will be informed by an interrupt, but defer the handling and polling of CQEs to a worker thread running on an ordered work-queue. Consider the following scenario: the VMs will in short proximity in time, for example due to a network event, send many MAD packets to the PF driver. Lets say there are K VMs, each sending N packets. The interrupt from the first VM will start the worker thread, which will poll N CQEs. A common case here is where the PF driver will multiplex the packets received from the VMs out on the wire QP. But before the wire QP has returned a send CQE and associated interrupt, the other K - 1 VMs have sent their N packets as well. The PF driver has to multiplex K * N packets out on the wire QP. But the send-queue on the wire QP has a finite capacity. So, in this scenario, if K * N is larger than the send-queue capacity of the wire QP, we will get MAD packets dropped on the floor with this dynamic debug message: mlx4_ib_multiplex_mad: failed sending GSI to wire on behalf of slave 2 (-11) and this despite the fact that the wire send-queue could have capacity, but the PF driver isn't aware, because the wire send CQEs have not yet been polled. We can also have a similar scenario inbound, with a wire recv-queue larger than the tunnel QP's send-queue. If many remote peers send MAD packets to the very same VM, the tunnel send-queue destined to the VM could allegedly be construed to be full by the PF driver. This starvation is fixed by introducing separate work queues for the wire QPs vs. the tunnel QPs. With this fix, using a dual ported HCA, 8 VFs instantiated, we could run cmtime on each of the 18 interfaces towards a similar configured peer, each cmtime instance with 800 QPs (all in all 14400 QPs) without a single CM packet getting lost. Fixes: `3cf69cc8db` ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-5-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-08-24 11:31:22 -03:00

1 2 3 4 5 ...

11501 Commits