There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB. When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.
Fix this by introducing new variants
int rdma_addr_size_in6(struct sockaddr_in6 *addr);
int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in. We can use
these new variants to check what size userspace has passed in before
copying any addresses.
Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- fix trace_hfi1_ctxt_info() to pass large struct by reference instead of by value
- convert 'type array[]' tracepoint arguments into 'type *array',
since compiler will warn that sizeof('type array[]') == sizeof('type *array')
and later should be used instead
The CAST_TO_U64 macro in the later patch will enforce that tracepoint
arguments can only be integers, pointers, or less than 8 byte structures.
Larger structures should be passed by reference.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in
order to use it in a downstream patch from mlx5e.
In addition, use print_hex_dump instead of manual dumping of the buffer.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move query SQ state function from mlx5_ib to mlx5_core in order to
have it in shared code.
It will be used in a downstream patch from mlx5e.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
IB core maintains the GID cache entries for the GID table.
This cache table has to be maintained regardless of HCA's
support of GID table.
For IB and iWarp ports, cache is created by querying the HCA.
For RoCE cache is created based on netdev events.
Therefore just refer to the RoCE port property of the {device, port} to
decide whether to build cache by querying HCA or from netdev events.
There is no need to check if HCA support GID table or not.
ib_cache_update() referred to RoCE attribute before validating
port. Though in all current callers port is valid, it is incorrect
to query RoCE port property before validating the port. Therefore,
rdma_protocol_roce() check is done after rdma_is_port_valid() verifies
that port is valid.
Fixes: 115b68aa6e ("IB/ocrdma: Removed GID add/del null routines")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Even though API is only used by IPoIB driver, its incorrect to refer
RoCE GID table property to search for GID.
Look for only IB link layer to search for the GID.
Fixes: dbb12562f7 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_find_gid_by_filter() searches GID with filter only for RoCE link
layer regardless of HCA's support for GID table.
Therefore, right way to lookup is compare RoCE port property and not
the GID table property.
Fixes: 99b27e3b5d ("IB/cache: Add ib_find_gid_by_filter cache API")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Due to following reasons, GID table event is generated regardless of GID
table property.
1. GID table cache is maintained at ib core layer regardless of link layer.
2. GID change event has no relation with IB link layer.
3. GID change event also doesn't depend on whether HCA supports GID table
or not.
Fixes: f3906bd360 ("IB/core: Refactor GID cache's ib_dispatch_event")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Due to below reasons, it is better to not support alternate path receive
messages for RoCE in near term.
1. Alternate path for RoCE is not supported at rdmacm layer.
2. It is not supported in uverbs/core layer for RoCE.
3. Alternate path for IPv6 for link local address cannot resolve route
determinstically without a valid incoming interface id whose usecase
make sense only with dual port mode.
4. init_av_from_path while processing LAP messages for IB and RoCE can
lead to adding duplicate entry of AV into the port list, leads to list
corruption.
5. rdma-core userspace a well known userspace implementation has removed
support of libucm which use ucm.ko module, which is the only module that
can trigger alternate path related messages.
6. ucm kernel module is requested to be removed from the IB core in
patch [1].
[1] https://patchwork.kernel.org/patch/10268503/
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently access to hardware stats buffer isn't protected, this can
result in multiple writes and reads at the same time to the same
memory location. This can lead to providing an incorrect value to
the user. Add a mutex to protect against it.
Fixes: b40f4757da ("IB/core: Make device counter infrastructure dynamic")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The current for-loop zeros variable i and only loops once, hence
not all the buffers are free'd. Fix this by setting i correctly.
Detected by CoverityScan, CID#1463415 ("Operands don't affect result")
Fixes: a5073d6054 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In some firmware configuration, UMR usage from Virtual Functions is restricted.
This information is published to the driver using new capability bits.
Avoid using UMRs in these cases and use the Firmware slow-path flow to create
mkeys and populate them with Virtual to Physical address translation.
Older drivers that do not have this patch, will end up using memory keys that
aren't populated with Virtual to Physical address translation that is done
part of the UMR work.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When working with RC QPs, the FW sets the ECN capable bits for all
the RoCE v2 packets. On the other hand, for UD QPs, the driver needs
to set the the ECN capable bits in the Address Handler since the HW
generates each packet according to the Address Handler and not
the QP context.
If ECN is not enabled in NIC or switch, these bits are ignored.
Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
With 32 bit compilation several of the fields become misaligned here.
Fixing this is an ABI break for 32 bit rxe and it is in well used
portions of the rxe ABI.
To handle this we bump the ABI version, as expected. However the user
space driver doesn't handle it properly today, so all existing user
space continues to work.
Updated userspace will start to require the necessary kernel version.
We don't expect there to be any 32 bit users of rxe. Most likely cases,
such as ARM 32 already generally don't work because rxe does not handle
the CPU cache properly on its shared with userspace pages.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.
The kernel requires it to be the expected length or longer so 32 bit
builds running on a 64 bit kernel will not work.
Retain full compat by having all kernels accept a struct with or without
the trailing reserved field.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Ensure that device exists prior to accessing its properties.
Reported-by: <syzbot+71655d44855ac3e76366@syzkaller.appspotmail.com>
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last user is gone after bdf5bd7f21 "rds: tcp: remove
register_netdevice_notifier infrastructure.", so we can
remove this netdevice command. This allows to delete
rtnl_lock() in netdev_run_todo(), which is hot path for
net namespace unregistration.
dev_change_net_namespace() and netdev_wait_allrefs()
have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
and the source commits say they were introduced to
delemit the call with NETDEV_UNREGISTER, but this patch
leaves them on the places, since they require additional
analysis, whether we need in them for something else.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is preparation to drop NETDEV_UNREGISTER_FINAL.
Since the cmd is used in usnic_ib_netdev_event_to_string()
to get cmd name, after plain removing NETDEV_UNREGISTER_FINAL
from everywhere, we'd have holes in event2str[] in this
function.
Instead of that, let's make NETDEV_XXX commands names
available for everyone, and to define netdev_cmd_to_name()
in the way we won't have to shaffle names after their
numbers are changed.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently CM request for RoCE follows following flow.
rdma_create_id()
rdma_resolve_addr()
rdma_resolve_route()
For RC QPs:
rdma_connect()
->cma_connect_ib()
->ib_send_cm_req()
->cm_init_av_by_path()
->ib_init_ah_attr_from_path()
For UD QPs:
rdma_connect()
->cma_resolve_ib_udp()
->ib_send_cm_sidr_req()
->cm_init_av_by_path()
->ib_init_ah_attr_from_path()
In both the flows, route is already resolved before sending CM requests.
Therefore, code is refactored to avoid resolving route second time in
ib_cm layer.
ib_init_ah_attr_from_path() is extended to resolve route when it is not
yet resolved for RoCE link layer. This is achieved by caller setting
route_resolved field in path record whenever it has route already
resolved.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fun set of conflict resolutions here...
For the mac80211 stuff, these were fortunately just parallel
adds. Trivially resolved.
In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.
In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.
The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.
The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:
====================
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ib_query_gid() in commit [1] refers to RoCE GID table capability of
the HCA using rdma_cap_roce_gid_table().
ib_core maintains the GID table cache regardless of the HCA provider
drivers capability to maintain RoCE GID table.
Therefore, whether to return a GID table entry from the software cache or
from HCA should be done based on whether the port is RoCE or not.
[1] commit 03db3a2d81 ("IB/core: Add RoCE GID table management")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The restrack clean routine had simple, but powerful WARN_ON check
to see if all resources are cleared prior to releasing device.
The WARN_ON check performed very well, but lack of information
which device caused to resource leak, the object type and origin
made debug to be fun and challenging at the same time.
The fact that all dumps were the same because restrack_clean() is
called in dealloc() didn't help either.
So let's fix spelling error and convert WARN_ON to be more debug
friendly. The dmesg cut below gives example of how the output
will look output for the case fixed in patch [1]
[ 438.421372] restrack: ------------[ cut here ]------------
[ 438.423448] restrack: BUG: RESTRACK detected leak of resources on mlx5_2
[ 438.425600] restrack: Kernel PD object allocated by mlx5_ib is not freed
[ 438.427753] restrack: Kernel CQ object allocated by mlx5_ib is not freed
[ 438.429660] restrack: ------------[ cut here ]------------
[1] https://patchwork.kernel.org/patch/10298695/
Cc: Michal Kalderon <Michal.Kalderon@cavium.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Upon detecting both kernel and user space support record doorbell,
the kernel needs to enable this capability in hardware by db_en,
and it should take place before cq context configuration in
hns_roce_cq_alloc. Currently, db_en is configured after cq alloc
and db_map_user has similar problem.
Reported-by: Xiping Zhang <zhangxiping3@huawei.com>
Fixes: 9b44703d0a ("RDMA/hns: Support cq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Once the FW is transitioned to error, FLUSH cqes can be received.
We want the driver to be aware of the fact that QP is already in error.
Without this fix, a user may see false error messages in the dmesg log,
mentioning that a FLUSH cqe was received while QP is not in error state.
Fixes: cecbcddf ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Return code wasn't set properly when CNQ allocation failed.
This only affect error message logging, currently user will
receive an error message that says the qedr driver load failed
with rc '0', instead of ENOMEM
Fixes: ec72fce4 ("qedr: Add support for RoCE HW init")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
QPs that were configured with ack timeout value lower than 1
msec will not implement re-transmission timeout.
This means that if a packet / ACK were dropped, the QP
will not retransmit this packet.
This can lead to an application hang.
Fixes: cecbcddf6 ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The option size check is using optval instead of optlen
causing the set option call to fail. Use the correct
field, optlen, for size check.
Fixes: 6a21dfc0d0 ("RDMA/ucma: Limit possible option size")
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case we failed to create UMR resources, mark them as invalid so we
won't try to destroy them on the unwind path.
Add the relevant checks to destroy_umrc_res(), this is done for the
unlikely event ib_register_device() or create_umr_res() err out and we
try to destroy invalid objects.
Fixes: 42cea83f95 ("IB/mlx5: Fix cleanup order on unload")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- Many bug fixes related to syzkaller from Leon Romanovsky.
These are still for the mlx driver and ucma interface.
- Fix a situation with port reuse for iWarp, discovered during scale-up
testing
- Bug fixes for the profile and restrack patches accepted during this merge
window
- Compile warning cleanups from Arnd, this is apparently the last warning
to make 32 bit builds quite.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJasZXlAAoJEDht9xV+IJsaOv8QAJ7FwiS/DVadNBD9/Gjksb/V
co41g9eK4s0qEKXJgMLjSr/aaK7gCG2waeXRSSJm52e1ERyiyceRaoLWqhJ98ZOt
xs/rAj4aBKmRgQ5j082O3YYVaAHETTIcPYVjlULDNhkqH1xhtm7LmKxTHPxGxLdU
IMKZeQStY46or3fuOyDmHQ7RbxRbl+7LhL1LEE9+dd7u6fSipgGWv6i3Wrqex/AV
XC1C1Pabq7Qo+d516mNX2JSjo9jT4kuamLprxQtUJxMeU5UJ+7IlZlTOGgU0q2Zv
x8W7744zVvifUc1gs3AFRvwhDkTrwnkhFullCyVN86r1jrduG1kxX+N0ksCY79yB
h9A5r+qV63XPelDJAFQIllkLPh8p3raCfvfQZAMkVqh2Lqn2s1tlukN/6NLAaguW
YAwbRk0Q1XgXRj4mGW+3vEH7UGMgaIqF2JlnU25hOuoyVSUkgvy88NG9aVx//a5h
KCdRa/iqTDJthKfnCAu+yYa4k5AKeRkdNKB0GebiPrrdpgJHMBKuPCjLrd4NP9QG
As1gi9N3gtNoZL7QEyYBL8NIXNpiiY4YANFf7otoZwvFBSzILKWJI74WOg8HWGJT
jKDQk6WQfYS3Xe3WVy0WOXhsdvYyiCXdag63ErUlzAMffhpp1GZoBEEk+4z+lYL8
W69w1xQcntpG9N+EBqBT
=Gal+
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Not much exciting here, almost entirely syzkaller fixes.
This is going to be on ongoing theme for some time, I think. Both
Google and Mellanox are now running syzkaller on different parts of
the user API.
Summary:
- Many bug fixes related to syzkaller from Leon Romanovsky. These are
still for the mlx driver and ucma interface.
- Fix a situation with port reuse for iWarp, discovered during
scale-up testing
- Bug fixes for the profile and restrack patches accepted during this
merge window
- Compile warning cleanups from Arnd, this is apparently the last
warning to make 32 bit builds quiet"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/ucma: Ensure that CM_ID exists prior to access it
RDMA/verbs: Remove restrack entry from XRCD structure
RDMA/ucma: Fix use-after-free access in ucma_close
RDMA/ucma: Check AF family prior resolving address
infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
infiniband: qplib_fp: fix pointer cast
IB/mlx5: Fix cleanup order on unload
RDMA/ucma: Don't allow join attempts for unsupported AF family
RDMA/ucma: Fix access to non-initialized CM_ID object
RDMA/core: Do not use invalid destination in determining port reuse
RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
Enable the ioctl() uAPI for IB by default if the standard write()
uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are
also available under the old write() uAPI are put inside a new
INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, all objects are declared in uverbs_std_types. This could lead
to a huge file once we implement all objects, methods and handlers.
Moving each object to its own file to keep the files smaller and more
readable. uverbs_std_types.c will only contain the parsing tree
definition and objects without any methods.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The ioctl() based uverbs is based on merging feature trees. This teaches
the generic parser how to parse methods according to the provider's
support. In order to support merging with the common objects, exporting
the common-object-tree to the provider drivers.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Previously, we've used UVERBS_ATTR_SPEC_F_MIN_SZ for extending existing
attributes. The behavior of this flag was the kernel accepts anything
bigger than the minimum size it specified. This is unsafe, since in
order to safely extend an attribute, we need to make sure unknown size
is zeroed. Replacing UVERBS_ATTR_SPEC_F_MIN_SZ with
UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO, which essentially checks that the
unknown size is zero. In addition, attributes are now decorated with
UVERBS_ATTR_TYPE and UVERBS_ATTR_STRUCT, so we can provide the minimum
and known length.
Users of this flag needs to use copy_from_or_zero functions/macros.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Downstream patches extend uverbs_attr_spec with new fields.
In order to save space, we move the type and flags fields to
the various attribute flavors contained in the union.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use macros to make names consistent in ioctl() uAPI:
The ioctl() uAPI works with object-method hierarchy. The method part
also states which handler should be executed when this method is called
from user-space. Therefore, we need to tie method, method's id, method's
handler and the object owning this method together.
Previously, this was done through explicit developer chosen names.
This makes grepping the code harder. Changing the method's name,
method's handler and object's name to be automatically generated based
on the ids.
The headers are split in a way so they be included and used by
user-space. One header strictly contains structures that are used
directly by user-space applications, where another header is used for
internal library (i.e. libibverbs) to form the ioctl() commands.
Other header simply contains the required general command structure.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to the SRP standard the INITIATOR and TARGET PORT IDENTIFIER
fields from the login request specify the I_T nexus. Whether or not an
SRP target closes an existing connection for an I_T nexus when a login
request is received depends on the value of the MULTICHANNEL field in
the login request. The SRP initiator derives the value of the
INITIATOR and TARGET PORT IDENTIFIER fields from the .id_ext,
.ioc_guid, .initiator_ext .sgid members of the srp_target_port
structure. This means that the .rdma_cm.dst check must be removed from
srp_conn_unique(). This patch avoids that for target ports that have
multiple addresses, e.g. an IPv4 and an IPv6 address, and if a
connection is established to both target port addresses, that the
initiator logs in alternatingly every 10 seconds to the other target
port address. An SRP target must namely terminate all but one
connections for a given I_T nexus if the MULTICHANNEL field has not
been set in the login request.
Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Enable RAW QP to be able to configure burst control by modify_qp. By
using burst control with rate limiting, user can achieve best
performance and accuracy. The burst control information is passed by
user through udata.
This patch also reports burst control capability for mlx5 related
hardwares, burst control is only marked as supported when both
packet_pacing_burst_bound and packet_pacing_typical_size are
supported.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The data in resp will be copied from kernel to userspace, thus it needs to
be initialized to zeros to avoid copying uninited stack memory.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e088a685ea ("RDMA/hns: Support rq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use rdma_is_port_valid() which performs port validity check instead of
open coding the same check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set the default active_width
and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.
When the RoCE port is down, the RoCE port does not negotiate the active
width with the remote side, causing the active width to be zero. When
running userspace ibstat to view the port status, ibstat will panic as it
reads an invalid width from sys file.
This patch restores the original behavior.
Fixes: f1b65df5a2 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set default active_width and
active_speed to IB_WIDTH_4X and IB_SPEED_QDR.
Now, the active_width and active_speed are zeros if the RoCE port
is in DOWN state. The speed string should be set to " SDR" instead of
a blank string when active_speed is zero.
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
notify uld drivers if the adapter encounters fatal
error.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a default, for Ethernet packets, the device scatters only the payload
of ingress packets. The scatter FCS feature lets the user to get the FCS
(Ethernet's frame check sequence) in the received WR's buffer as a 4
Bytes trailer following the packet's payload.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Report to the user area the TSO device capabilities, it includes the
max_tso size and the QP types that support it.
The TSO is applicable only when when of the ports is ETH and the device
supports it.
uresp logic around rss_caps is updated to fix a till-now harmless bug
computing the length of the structure to copy. The code did not handle the
implicit padding before rss_caps correctly. This is necessay to copy
tss_caps successfully.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no explicit tear-down sequence initiated on
connections if the Control QP OP, Modify QP to close,
fails. Fix this by triggering a driver generated
Asynchronous Event (AE) on Modify QP failures and
tear-down the connection on receipt of the AE.
This fix can be generalized to other Modify QP failures
(i.e. RTS->TERM, IDLE->RTS, etc) as any modify failure
will require a connection tear-down.
Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The flush CQP OP can be used to optionally generate
Asynchronous Events (AEs) in addition to QP flush.
Consolidate all HW AE generation code under a new
function i40iw_gen_ae which use the flush CQP OP
to only generate AEs.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
All of these defines are part of the uABI for the driver, this
header duplicates providers/i40iw/i40iw-abi.h in rdma-core.
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps)
and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via
mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to
userspace and form part of the uAPI.
Move them to the uapi header where they belong.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Open coding pointer math is not acceptable for describing the uABI in
RDMA. Provide structs for all the cases.
The udata is casted to the struct as close to the verbs entry point
as possible for maximum clarity. Function signatures and so forth
are revised to allow for this.
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It isn't used and it couldn't possibly ever be used correctly.
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rvt_mregion uses percpu_ref for reference counting and RCU to protect
accesses from lkey_table. When a rvt_mregion needs to be freed, it
first gets unregistered from lkey_table and then rvt_check_refs() is
called to wait for in-flight usages before the rvt_mregion is freed.
rvt_check_refs() seems to have a couple issues.
* It has a fast exit path which tests percpu_ref_is_zero(). However,
a percpu_ref reading zero doesn't mean that the object can be
released. In fact, the ->release() callback might not even have
started executing yet. Proceeding with freeing can lead to
use-after-free.
* lkey_table is RCU protected but there is no RCU grace period in the
free path. percpu_ref uses RCU internally but it's sched-RCU whose
grace periods are different from regular RCU. Also, it generally
isn't a good idea to depend on internal behaviors like this.
To address the above issues, this patch removes the fast exit and adds
an explicit synchronize_rcu().
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: linux-rdma@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Code includes wmb() followed by writel() in multiple places. writel()
already has a barrier on some architectures like arm64.
This ends up CPU observing two barriers back to back before executing the
register write.
Since code already has an explicit barrier call, changing writel() to
writel_relaxed().
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch changes the type of cqn from u32 to u64 to keep
userspace and kernel consistent, initializes resp both for
cq and qp to zeros, and also changes the condition judgment
of outlen considering future caps extension.
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Fixes: e088a685ea (hns: Support rq record doorbell for the user space)
Fixes: 9b44703d0a (hns: Support cq record doorbell for the user space)
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Before commit [1], rdma_addr_find_l2_eth_by_grh() was an exported function
and therefore declaration in include/rdma/ib_addr.h was fine.
But now that its scope is limited to ib_core module, its better to have it
in core_priv.h.
[1] commit 1060f86534 ("IB/{core/cm}: Fix generating a return AH for
RoCEE")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce and use helper function get_cm_port_from_path() to get
cm_port based on the the path record entry.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Resolving route for RoCE for a path record is needed only for the
received CM requests.
Therefore,
(a) ib_init_ah_attr_from_path() is refactored first to isolate the
code of resolving route.
(b) Setting dlid, path bits is not needed for RoCE.
Additionally ah attribute initialization is done from the path record
entry, so it is better to refer to path record entry type for
different link layer instead of ah attribute type while initializing
ah attribute itself.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add and use helper function add_cm_id_to_port_list() to attach
cm_id to port list.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
add_gid() and del_gid() are optional callback routines.
ib_core ignores invoking them while updating GID table entries if
they are not implemented by provider drivers. Therefore remove them.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma_resolve_ip_route() is used only by ib_core module. Therefore it is
removed as an exported symbol.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma_protocol_roce() API from the ib_core already provides a way to
detect whether a given device+port is RoCE or not.
Therefore, make use of it and avoid implementing it again in rdmacm
module.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ah_attr contains the port number to which cm_id is bound. However, while
searching for GID table for matching GID entry, the port number is
ignored.
This could cause the wrong GID to be used when the ah_attr is converted to
an AH.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The return status of ib_init_ah_from_mcmember() is ignored by
cma_ib_mc_handler(). Honor it and return error event if ah attribute
initialization failed.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table
entries are not based on netdevice. Netdevice parameter is unused here.
Therefore, it is removed.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Exported symbol's comments should be with function definition and not in
the header file. Therefore comments of ib_find_cached_gid() and
ib_find_cached_gid_by_port() functions are moved closer to their
definitions.
The function name in then comment is different than the actual function
name, fix it to be same as ib_cache_gid_find_by_filter().
Also current comment section of ib_find_cached_gid_by_port() contains the
desciption of ib_find_cached_gid(), fix that as well.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The failure to destroy the MRs is printed on mlx5_core layer
as error and it makes warning prints useless.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
"live" is needed for ODP only and is better to be guarded
by appropriate CONFIG.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
According to the IBTA spec 1.3, the driver failure in
MR reregister shall release old and new MRs.
C11-20: If the CI returns any other error, the CI shall
invalidate both "old" and "new" registrations, and release
any associated resources.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Return -EOPNOTSUPP value to the user for unsupported reg_user_mr.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx5_ib_alloc_implicit_mr() can fail to acquire pages
and the returned mr pointer won't be valid. Ensure that it
is not error prior to access.
Cc: <stable@vger.kernel.org> # 4.10
Fixes: 81713d3788 ("IB/mlx5: Add implicit MR support")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
Signed-off-by: Doug Ledford <dledford@redhat.com>
On 32-bit targets, we otherwise get a warning about an impossible constant
integer expression:
In file included from include/linux/kernel.h:11,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39:
drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device':
include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow]
#define BIT(nr) (1UL << (nr))
^~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT'
#define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39)
^~~
drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH'
#define BNXT_RE_MAX_MR_SIZE BNXT_RE_MAX_MR_SIZE_HIGH
^~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE'
ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE;
^~~~~~~~~~~~~~~~~~~
Fixes: 872f357824 ("RDMA/bnxt_re: Add support for MRs with Huge pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Building for a 32-bit target results in a couple of warnings from casting
between a 32-bit pointer and a 64-bit integer:
drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq':
drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle,
^
drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
(struct bnxt_qplib_srq *)q_handle,
^
In file included from include/linux/byteorder/little_endian.h:5,
from arch/arm/include/uapi/asm/byteorder.h:22,
from include/asm-generic/bitops/le.h:6,
from arch/arm/include/asm/bitops.h:342,
from include/linux/bitops.h:38,
from include/linux/kernel.h:11,
from include/linux/interrupt.h:6,
from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39:
drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq':
include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
#define __cpu_to_le64(x) ((__force __le64)(__u64)(x))
^
include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64'
#define cpu_to_le64 __cpu_to_le64
^~~~~~~~~~~~~
drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64'
req.srq_handle = cpu_to_le64(srq);
Using a uintptr_t as an intermediate works on all architectures.
Fixes: 37cb11acf1 ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
gcc-4.4.4 has issues with initialization of anonymous unions:
drivers/infiniband/ulp/srpt/ib_srpt.c: In function 'srpt_zerolength_write':
drivers/infiniband/ulp/srpt/ib_srpt.c:854: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/ulp/srpt/ib_srpt.c:854: warning: initialization makes integer from pointer without a cast
Work aound this.
Fixes: 2a78cb4db4 ("IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()")
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
gcc-4.4.4 has issues with initialization of anonymous unions.
drivers/infiniband/core/verbs.c: In function '__ib_drain_sq':
drivers/infiniband/core/verbs.c:2204: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/core/verbs.c:2204: warning: initialization makes integer from pointer without a cast
Work around this.
Fixes: a1ae7d0345 ("RDMA/core: Avoid that ib_drain_qp() triggers an out-of-bounds stack access")
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
On load we create private CQ/QP/PD in order to be used by UMR, we create
those resources after we register ourself as an IB device, and we destroy
them after we unregister as an IB device. This was changed by commit
16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove
stages") which moved the destruction before we unregistration. This
allowed to trigger an invalid memory access when unloading mlx5_ib while
there are open resources:
BUG: unable to handle kernel paging request at 00000001002c012c
...
Call Trace:
mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib]
__slab_free+0x9a/0x2d0
delay_time_func+0x10/0x10 [mlx5_ib]
unreg_umr.isra.15+0x4b/0x50 [mlx5_ib]
mlx5_mr_cache_free+0x46/0x150 [mlx5_ib]
clean_mr+0xc9/0x190 [mlx5_ib]
dereg_mr+0xba/0xf0 [mlx5_ib]
ib_dereg_mr+0x13/0x20 [ib_core]
remove_commit_idr_uobject+0x16/0x70 [ib_uverbs]
uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs]
ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs]
ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs]
ib_unregister_device+0xd4/0x190 [ib_core]
__mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
mlx5_remove_device+0xf5/0x120 [mlx5_core]
mlx5_unregister_interface+0x37/0x90 [mlx5_core]
mlx5_ib_cleanup+0xc/0x225 [mlx5_ib]
SyS_delete_module+0x153/0x230
do_syscall_64+0x62/0x110
entry_SYSCALL_64_after_hwframe+0x21/0x86
...
We restore the original behavior by breaking the UMR stage into two parts,
pre and post IB registration stages, this way we can restore the original
functionality and maintain clean separation of logic between stages.
Fixes: 16c1975f10 ("IB/mlx5: Create profile infrastructure to add and remove stages")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch fixes RDMA/rxe over 802.1q VLAN devices.
Without it, I observed the following behavior:
a) adding a VLAN device to RXE via rxe_net_add() creates a non-functional
RDMA device. This is caused by the logic in enum_all_gids_of_dev_cb() /
is_eth_port_of_netdev(), which only considers networks connected to
"upper devices" of the configured network device, resulting in an empty
set of gids for a VLAN interface that is an "upper device" itself.
Later attempts to connect via this rdma device fail in cma_acuire_dev()
because no gids can be resolved.
b) adding the master device of the VLAN device instead seems to work
initially, target addresses via VLAN devices are resolved successfully.
But the connection times out because no 802.1q VLAN headers are
inserted in the ethernet packets, which are therefore never received.
This happens because the RXE layer sends the packets via the master
device rather than the VLAN device.
The problem could be solved by changing either a) or b). My thinking was
that the logic in a) was created deliberately, thus I decided to work on
b). It turns out that the information about the VLAN interface for the gid
at hand is available in the AV information. My patch converts the RXE code
to use this netdev instead of rxe->ndev. With this change, RXE over vlan
works on my test system.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We get a build failure on ARM unless the header is included explicitly:
drivers/infiniband/hw/i40iw/i40iw_verbs.c: In function 'i40iw_get_vector_affinity':
drivers/infiniband/hw/i40iw/i40iw_verbs.c:2747:9: error: implicit declaration of function 'irq_get_affinity_mask'; did you mean 'irq_create_affinity_masks'? [-Werror=implicit-function-declaration]
return irq_get_affinity_mask(msix_vec->irq);
^~~~~~~~~~~~~~~~~~~~~
irq_create_affinity_masks
drivers/infiniband/hw/i40iw/i40iw_verbs.c:2747:9: error: returning 'int' from a function with return type 'const struct cpumask *' makes pointer from integer without a cast [-Werror=int-conversion]
return irq_get_affinity_mask(msix_vec->irq);
Fixes: 7e952b19eb ("i40iw: Implement get_vector_affinity API")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx5 driver needs to be able to issue invalidation to ODP MRs
even if it cannot allocate memory. To this end it preallocates
emergency pages to use when the situation arises.
This flow should be extremely rare enough, that we don't need
to worry about contention and therefore a single emergency page
is good enough.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead synchronizing RCU in a loop when removing mkeys in a batch do it
once at the end before freeing them. The result is only waiting for one
RCU grace period instead of many serially.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
cma_port_is_unique() allows local port reuse if the quad (source
address and port, destination address and port) for this connection
is unique. However, if the destination info is zero or unspecified, it
can't make a correct decision but still allows port reuse. For example,
sometimes rdma_bind_addr() is called with unspecified destination and
reusing the port can lead to creating a connection with a duplicate quad,
after the destination is resolved. The issue manifests when MPI scale-up
tests hang after the duplicate quad is used.
Set the destination address family and add checks for zero destination
address and port to prevent source port reuse based on invalid destination.
Fixes: 19b752a19d ("IB/cma: Allow port reuse for rdma_id")
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Split IPv6 addresses at the colon that separates the IPv6 address
and the port number instead of at a colon in the middle of the IPv6
address. Check whether the IPv6 address is surrounded with square
brackets.
Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state
makes the checks of out-of-scope redundant. Let's remove them
together with updating function signature to return boolean result.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>