linux

Author	SHA1	Message	Date
Chuhong Yuan	44a7b67590	RDMA/cma: add missed unregister_pernet_subsys in init failure The driver forgets to call unregister_pernet_subsys() in the error path of cma_init(). Add the missed call to fix it. Fixes: `4be74b42a6` ("IB/cma: Separate port allocation to network namespaces") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Link: https://lore.kernel.org/r/20191206012426.12744-1-hslester96@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-12-09 12:02:11 -05:00
Sabrina Dubroca	6c8991f415	net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: `5f81bd2e5d` ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-04 12:27:13 -08:00
Linus Torvalds	0da522107e	compat_ioctl: remove most of fs/compat_ioctl.c As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite. Signed-off-by: Arnd Bergmann <arnd@arndb.de> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl bN65armTm7bFFR32Avnu =lgCl -----END PGP SIGNATURE----- Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...	2019-12-01 13:46:15 -08:00
Linus Torvalds	aa32f11691	hmm related patches for 5.5 This is another round of bug fixing and cleanup. This time the focus is on the driver pattern to use mmu notifiers to monitor a VA range. This code is lifted out of many drivers and hmm_mirror directly into the mmu_notifier core and written using the best ideas from all the driver implementations. This removes many bugs from the drivers and has a very pleasing diffstat. More drivers can still be converted, but that is for another cycle. - A shared branch with RDMA reworking the RDMA ODP implementation - New mmu_interval_notifier API. This is focused on the use case of monitoring a VA and simplifies the process for drivers - A common seq-count locking scheme built into the mmu_interval_notifier API usable by drivers that call get_user_pages() or hmm_range_fault() with the VA range - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev drivers to the new API. This deletes a lot of wonky driver code. - Two improvements for hmm_range_fault(), from testing done by Ralph -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3cCjQACgkQOG33FX4g mxpp8xAAiR9iOdT28m/tx1GF31XludrMhRZVIiz0vmCIxIiAkWekWEfAEVm9PDnh wdrxTJohSs+B65AK3sfToOM3AIuNCuFVWmbbHI5qmOO76vaSvcZa905Z++pNsawO Bn8mgRCprYoFHcxWLvTvnA5U0g1S2BSSOwBSZI43CbEnVvHjYAR6MnvRqfGMk+NF bf8fTk/x+fl0DCemhynlBLuJkogzoE2Hgl0yPY5bFna4PktOxdpa1yPaQsiqZ7e6 2s2NtM3pbMBJk0W42q5BU+aPhiqfxFFszasPSLBduXrD2xDsG76HJdHj5VydKmfL nelG4BvqJozXTEZWvTEePYhCqaZ41eJZ7Asw8BXtmacVqE5mDlTXo/Zdgbz7yEOR mI5MVyjD5rauZJldUOWXbwrPoWVFRvboauehiSgqvxvT9HvlFp9GKObSuu4gubBQ mzxs4t48tPhA7bswLmw0/pETSogFuVDfaB7hsyY0gi8EwxMFMpw2qFypm1PEEF+C BuUxCSShzvNKrraNe5PWaNNFd3AzIwAOWJHE+poH4bCoXQVr5nA+rq2gnHkdY5vq /xrBCyxkf0U05YoFGYembPVCInMehzp9Xjy8V+SueSvCg2/TYwGDCgGfsbe9dNOP Bc40JpS7BDn5w9nyLUJmOx7jfruNV6kx1QslA7NDDrB/rzOlsEc= =Hj8a -----END PGP SIGNATURE----- Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is another round of bug fixing and cleanup. This time the focus is on the driver pattern to use mmu notifiers to monitor a VA range. This code is lifted out of many drivers and hmm_mirror directly into the mmu_notifier core and written using the best ideas from all the driver implementations. This removes many bugs from the drivers and has a very pleasing diffstat. More drivers can still be converted, but that is for another cycle. - A shared branch with RDMA reworking the RDMA ODP implementation - New mmu_interval_notifier API. This is focused on the use case of monitoring a VA and simplifies the process for drivers - A common seq-count locking scheme built into the mmu_interval_notifier API usable by drivers that call get_user_pages() or hmm_range_fault() with the VA range - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev drivers to the new API. This deletes a lot of wonky driver code. - Two improvements for hmm_range_fault(), from testing done by Ralph" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap mm/hmm: make full use of walk_page_range() xen/gntdev: use mmu_interval_notifier_insert mm/hmm: remove hmm_mirror and related drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror drm/amdgpu: Call find_vma under mmap_sem nouveau: use mmu_interval_notifier instead of hmm_mirror nouveau: use mmu_notifier directly for invalidate_range_start drm/radeon: use mmu_interval_notifier_insert RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv RDMA/odp: Use mmu_interval_notifier_insert() mm/hmm: define the pre-processor related parts of hmm.h even if disabled mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror mm/mmu_notifier: add an interval tree notifier mm/mmu_notifier: define the header pre-processor parts even if disabled mm/hmm: allow snapshot of the special zero page	2019-11-30 10:33:14 -08:00
Linus Torvalds	9a3d7fd275	Driver core patches for 5.5-rc1 Here is the "big" set of driver core patches for 5.5-rc1 There's a few minor cleanups and fixes in here, but the majority of the patches in here fall into two buckets: - debugfs api cleanups and fixes - driver core device link support for boot dependancy issues The debugfs api cleanups are working to slowly refactor the debugfs apis so that it is even harder to use incorrectly. That work has been happening for the past few kernel releases and will continue over time, it's a long-term project/goal The driver core device link support missed 5.4 by just a bit, so it's been sitting and baking for many months now. It's from Saravana Kannan to help resolve the problems that DT-based systems have at boot time with dependancy graphs and kernel modules. Turns out that no one has actually tried to build a generic arm64 kernel with loads of modules and have it "just work" for a variety of platforms (like a distro kernel) The big problem turned out to be a lack of depandancy information between different areas of DT entries, and the work here resolves that problem and now allows devices to boot properly, and quicker than a monolith kernel. All of these patches have been in linux-next for a long time with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXd6m6Q8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yntJQCcCqg6RQ7LTdHuZv1ETeefXlsfk00An1Jtean6 42bWGx52bGFvAcpjWy8R =P7hq -----END PGP SIGNATURE----- Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 5.5-rc1 There's a few minor cleanups and fixes in here, but the majority of the patches in here fall into two buckets: - debugfs api cleanups and fixes - driver core device link support for boot dependancy issues The debugfs api cleanups are working to slowly refactor the debugfs apis so that it is even harder to use incorrectly. That work has been happening for the past few kernel releases and will continue over time, it's a long-term project/goal The driver core device link support missed 5.4 by just a bit, so it's been sitting and baking for many months now. It's from Saravana Kannan to help resolve the problems that DT-based systems have at boot time with dependancy graphs and kernel modules. Turns out that no one has actually tried to build a generic arm64 kernel with loads of modules and have it "just work" for a variety of platforms (like a distro kernel). The big problem turned out to be a lack of dependency information between different areas of DT entries, and the work here resolves that problem and now allows devices to boot properly, and quicker than a monolith kernel. All of these patches have been in linux-next for a long time with no reported issues" * tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits) tracing: Remove unnecessary DEBUG_FS dependency of: property: Add device link support for interrupt-parent, dmas and -gpio(s) debugfs: Fix !DEBUG_FS debugfs_create_automount of: property: Add device link support for "iommu-map" of: property: Fix the semantics of of_is_ancestor_of() i2c: of: Populate fwnode in of_i2c_get_board_info() drivers: base: Fix Kconfig indentation firmware_loader: Fix labels with comma for builtin firmware driver core: Allow device link operations inside sync_state() driver core: platform: Declare ret variable only once cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h> crypto: hisilicon: no need to check return value of debugfs_create functions driver core: platform: use the correct callback type for bus_find_device firmware_class: make firmware caching configurable driver core: Clarify documentation for fwnode_operations.add_links() mailbox: tegra: Fix superfluous IRQ error message net: caif: Fix debugfs on 64-bit platforms mac80211: Use debugfs_create_xul() helper media: c8sectpfe: no need to check return value of debugfs_create functions of: property: Add device link support for iommus, mboxes and io-channels ...	2019-11-27 11:06:20 -08:00
Linus Torvalds	d768869728	RDMA subsystem updates for 5.5 Mainly a collection of smaller of driver updates this cycle. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl3dwFYACgkQOG33FX4g mxohHRAAnkUr0SKh1uV7vuhA8sUlejkwAaw2V2Sm3E1y4GiPpfRAak/FcbEu9F8l E7Q7VI04DRKTTpTRmREVyKYjlKr5QOA4mSryNMfBZobjkp+t++U1GC9YKL3zh65U odyJedtv3zKCLzV9RBwX06C8Vi1PQNQp7RbQ42xH/IyKXJIZPHeCUeKv7PsfTWVo vhuXFc9pPwqHDfRVTbj6s1g+OwVuRHc+SWep6eTKLGYvt8CqdN9WEpA0sJrlwPet NLTZTrFDpuC12RPc4Lo5c7R5MeAzHgojZCeSFaL2DhJLOx3kfmU30wG+rV94Lvsq Y2yt6BwKeLleKzpR5ApkuIVHQt2KECPrJbmVLDFqi+rT7yvzsd2AB+uiCS50+LPG h2UpWJdKWtrnSzTqbJQieXd+oT253Dk+ciy7zbdPSGPwz1dc/Cna9TTn26X2SezR bmRhHOykrh7LCNrv/7jiSq/zWApGTlR9YZ86x9Li+lJ3kkG+7d2DBEofDU+oKE5F p0+1Cer1td5IkN3N8oDpvyXAbCIv7qdWwXB2Cm7dSfUETIpAKnXiBxFCHyvmsuus 7gGyKZh2490N3SoIL1muu12dIvhPC2Obg9ckCNU3ldw9+uPJkCzrs+n+JtkIW0mi i1fMjRQ6FumVT6/KdavYgeCsHmItVqv03SGdFsHF3SGwvWo6y8Y= =UV2u -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...	2019-11-27 10:17:28 -08:00
Peter Zijlstra	04ae87a520	ftrace: Rework event_create_dir() Rework event_create_dir() to use an array of static data instead of function pointers where possible. The problem is that it would call the function pointer on module load before parse_args(), possibly even before jump_labels were initialized. Luckily the generated functions don't use jump_labels but it still seems fragile. It also gets in the way of changing when we make the module map executable. The generated function are basically calling trace_define_field() with a bunch of static arguments. So instead of a function, capture these arguments in a static array, avoiding the function call. Now there are a number of cases where the fields are dynamic (syscall arguments, kprobes and uprobes), in which case a static array does not work, for these we preserve the function call. Luckily all these cases are not related to modules and so we can retain the function call for them. Also fix up all broken tracepoint definitions that now generate a compile error. Tested-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191111132458.342979914@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-11-27 07:44:25 +01:00
Yixian Liu	f295e4cece	RDMA/hns: Delete unnecessary callback functions for cq Currently, when cq event occurred, we first call our own callback functions in the event process function, then call ib callback functions. Actually, we can directly call ib callback functions. Link: https://lore.kernel.org/r/1574044493-46984-5-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:48 -04:00
Yixian Liu	707783ab5f	RDMA/hns: Rename the functions used inside creating cq Current names of functions are not proper, such as hns_roce_free_cq, actually it means free cqc, thus we rename them. Furthermore, functions used inside one file can be named without the prefix hns_roce_ which will make the functions for verbs symbols more eye-catching. Link: https://lore.kernel.org/r/1574044493-46984-4-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:48 -04:00
Yixian Liu	18a96d25ce	RDMA/hns: Redefine the member of hns_roce_cq struct There is no need to package buf and mtt into hns_roce_cq_buf, which will make code more complex, just delete this struct and move buf and mtt into hns_roce_cq. Furthermore, we add size member for hns_roce_buf to avoid repeatly calculating where needed it. Link: https://lore.kernel.org/r/1574044493-46984-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:48 -04:00
Yixian Liu	e2b2744a06	RDMA/hns: Redefine interfaces used in creating cq Some interfaces defined with unnecessary input parameters, such as "nent" and "vector". This patch redefined these interfaces to make the code more readable and simple. Link: https://lore.kernel.org/r/1574044493-46984-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:48 -04:00
Daniel Kranzdorf	666e8ff535	RDMA/efa: Expose RDMA read related attributes Query the device attributes for RDMA operations, including maximum transfer size and maximum number of SGEs per RDMA WR, and report them back to the userspace library. Link: https://lore.kernel.org/r/20191121141509.59297-4-galpress@amazon.com Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:48 -04:00
Daniel Kranzdorf	e6c4f3ff43	RDMA/efa: Support remote read access in MR registration Enable remote read access for memory regions in order to support RDMA operations. Link: https://lore.kernel.org/r/20191121141509.59297-3-galpress@amazon.com Signed-off-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Gal Pressman	bcf7cc534c	RDMA/efa: Store network attributes in device attributes There's no reason to separate the network attributes from all other device attributes. Embed the fields inside the device attributes and query them all in one function. Link: https://lore.kernel.org/r/20191121141509.59297-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Colin Ian King	25d24f4241	IB/hfi1: remove redundant assignment to variable ret The variable ret is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Link: https://lore.kernel.org/r/20191122154814.87257-1-colin.king@canonical.com Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Devesh Sharma	fca5b9dc09	RDMA/bnxt_re: Fix missing le16_to_cpu From sparse: drivers/infiniband/hw/bnxt_re/main.c:1274:18: warning: cast from restricted __le16 drivers/infiniband/hw/bnxt_re/main.c:1275:18: warning: cast from restricted __le16 drivers/infiniband/hw/bnxt_re/main.c:1276:18: warning: cast from restricted __le16 drivers/infiniband/hw/bnxt_re/main.c:1277:21: warning: restricted __le16 degrades to integer Fixes: `2b827ea192` ("RDMA/bnxt_re: Query HWRM Interface version from FW") Link: https://lore.kernel.org/r/1574317343-23300-4-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Devesh Sharma	98998ffe52	RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices Due to recent advances in the firmware for Broadcom's gen p5 series of adaptors the driver code to report hardware counters has been broken w.r.t. roce devices. The new firmware command expects dma length to be specified during stat dma buffer allocation. Fixes: `2792b5b95e` ("bnxt_en: Update firmware interface spec. to 1.10.0.89.") Link: https://lore.kernel.org/r/1574317343-23300-3-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Luke Starrett	e284b159c6	RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series In the first version of Gen P5 ASIC, chip-id was always set to 0x1750 for all adaptor port configurations. This has been fixed in the new chip rev. Due to this missing fix users are not able to use adaptors based on latest chip rev of Broadcom's Gen P5 adaptors. Fixes: `ae8637e131` ("RDMA/bnxt_re: Add chip context to identify 57500 series") Link: https://lore.kernel.org/r/1574317343-23300-2-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Naresh Kumar PBS <nareshkumar.pbs@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Luke Starrett <luke.starrett@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Krzysztof Kozlowski	6e419e35e6	RDMA/bnxt_re: Fix Kconfig indentation Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Link: https://lore.kernel.org/r/20191120134138.15245-1-krzk@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Jason Gunthorpe	3694e41e41	Merge branch 'ib-guids' into rdma.git for-next Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-25 10:31:47 -04:00
Jason Gunthorpe	3889551db2	RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv This converts one of the two users of mmu_notifiers to use the new API. The conversion is fairly straightforward, however the existing use of notifiers here seems to be racey. Link: https://lore.kernel.org/r/20191112202231.3856-7-jgg@ziepe.ca Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-23 19:56:44 -04:00
Jason Gunthorpe	f25a546e65	RDMA/odp: Use mmu_interval_notifier_insert() Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP: zap_page_range() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) unmap_single_vma() [..] __split_huge_page_pmd() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) // DEADLOCK mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem. Fixes: `ca748c39ea` ("RDMA/umem: Get rid of per_mm->notifier_count") Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca Tested-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-23 19:56:44 -04:00
Taehee Yoo	ab818362c9	net: use rhashtable_lookup() instead of rhashtable_lookup_fast() rhashtable_lookup_fast() internally calls rcu_read_lock() then, calls rhashtable_lookup(). So if rcu_read_lock() is already held, rhashtable_lookup() is enough. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-11-23 12:15:01 -08:00
Danit Goldberg	9c0015ef09	IB/mlx5: Implement callbacks for getting VFs GUID attributes Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW for VFs attributes and return node and port GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-11-22 18:17:24 +02:00
Danit Goldberg	2446887ed2	IB/ipoib: Add ndo operation for getting VFs GUID attributes Add ndo operation to the network driver that enables configuring ipoib_get_vf_guid operation. The operation allows to get a VF port and node GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-11-22 18:17:24 +02:00
Danit Goldberg	bfcb3c5d14	IB/core: Add interfaces to get VF node and port GUIDs Provide ability to get node and port GUIDs of VFs to be symmetrical to already existing set option. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-11-22 18:17:24 +02:00
Michal Kalderon	a25984f3ba	RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset When running against rdma-core that doesn't support doorbell recovery, the rdma_user_mmap_entry won't be allocated for doorbell recovery related mappings. We have a flag indicating whether rdma-core supports doorbell recovery or not which was used during initialization, however some cases didn't check that the rdma_user_mmap_entry exists before attempting to acquire it's offset. Fixes: `97f6125092` ("RDMA/qedr: Add doorbell overflow recovery support") Link: https://lore.kernel.org/r/20191118150645.26602-1-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-19 16:15:36 -04:00
Danit Goldberg	0acc637dac	RDMA/cm: Use refcount_t type for refcount variable This atomic in struct cm_id_private is being used as a refcount, change it to refcount_t for better clarity and to get the refcount protections. Link: https://lore.kernel.org/r/1573997601-4502-1-git-send-email-danitg@mellanox.com Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-19 16:10:04 -04:00
Mark Zhang	c16339b69c	IB/mlx5: Support extended number of strides for Striding RQ Extends the minimum single WQE strides from 64 to 8, which is exposed by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps. Choose right number of strides based on FW capability. Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-19 16:05:41 -04:00
Danit Goldberg	ff3195b3ed	IB/mlx4: Update HW GID table while adding vlan GID When adding a new GID compare the vlan along with the GID and type. This allows vlan's to have GIDs that alias each other, such as the default GID. Otherwise they the GID cache view can become inconsistent with the HW view. Link: https://lore.kernel.org/r/20191115154457.247763-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-19 15:58:55 -04:00
Christophe JAILLET	9067f2f0b4	RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()' We should jump to fail3 in order to undo the 'xa_insert_irq()' call. Link: https://lore.kernel.org/r/20190923190746.10964-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-17 10:37:00 -04:00
Dag Moxnes	e1ee1e62be	RDMA/cma: Use ACK timeout for RoCE packetLifeTime The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME, for the PacketLifeTime, as it can not be determined from the network. This value might not be optimal for all networks. The cma module supports the function rdma_set_ack_timeout to set the ACK timeout for a QP associated with a connection. As per IBTA 12.7.34 local ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay). Assuming a negligible local ACK delay, we can use PacketLifeTime = local ACK timeout/2 as a reasonable approximation for RoCE networks. Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-17 10:37:00 -04:00
Christoph Hellwig	72b894b09a	IB/umem: remove the dmasync argument to ib_umem_get The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-17 10:37:00 -04:00
David S. Miller	19b7e21c55	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Lots of overlapping changes and parallel additions, stuff like that. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-16 21:51:42 -08:00
Christoph Hellwig	7283fff8b5	dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flag This flag is not implemented by any backend and only set by the ib_umem module in a single instance. Link: https://lore.kernel.org/r/20191113073214.9514-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 12:01:54 -04:00
Gal Pressman	64c264872b	RDMA/efa: Clear the admin command buffer prior to its submission We cannot rely on the entry memcpy as we only copy the actual size of the command, the rest of the bytes must be memset to zero. Currently providing non-zero memory will not have any user visible impact. However, since admin commands are extendable (in a backwards compatible way) everything beyond the size of the command must be cleared to prevent issues in the future. Fixes: `0420e54256` ("RDMA/efa: Implement functions that submit and complete admin commands") Link: https://lore.kernel.org/r/20191112092608.46964-1-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:57:33 -04:00
Bernard Metzler	289b20b2a5	RDMA/siw: Cleanup unused mmap structures. Removes obsolete driver specific mmap information after generalization of RDMA driver mmap service. Also removes useless forward declaration of struct siw_mr. Fixes: `11f1a75567` ("RDMA/siw: Use the common mmap_xa helpers") Link: https://lore.kernel.org/r/20191113153404.7402-1-bmt@zurich.ibm.com Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:56:11 -04:00
Kamal Heib	9a5407d74c	RDMA/qedr: Make qedr_iw_load_qp() static The function qedr_iw_load_qp() is only used in qedr_iw_cm.c Fixes: `82af6d19d8` ("RDMA/qedr: Fix synchronization methods and memory leaks in qedr") Link: https://lore.kernel.org/r/20191110113645.20058-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:55:31 -04:00
Colin Ian King	6296665cee	RDMA/ocrdma: Fix spelling mistake in variable name There is a spelling mistake in the variable nak_invalid_requst_errors, rename it to nak_invalid_request_errors. Link: https://lore.kernel.org/r/20191107224855.417647-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:51:42 -04:00
Viresh Kumar	7ee23491b3	RDMA/qib: Validate ->show()/store() callbacks before calling them The permissions of the read-only or write-only sysfs files can be changed (as root) and the user can then try to read a write-only file or write to a read-only file which will lead to kernel crash here. Protect against that by always validating the show/store callbacks. Link: https://lore.kernel.org/r/d45cc26361a174ae12dbb86c994ef334d257924b.1573096807.git.viresh.kumar@linaro.org Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:49:15 -04:00
Pan Bian	da046d5f89	RDMA/i40iw: Fix potential use after free Release variable dst after logging dst->error to avoid possible use after free. Link: https://lore.kernel.org/r/1573022651-37171-1-git-send-email-bianpan2016@163.com Signed-off-by: Pan Bian <bianpan2016@163.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:47:18 -04:00
Pan Bian	960657b732	RDMA/qedr: Fix potential use after free Move the release operation after error log to avoid possible use after free. Link: https://lore.kernel.org/r/1573021434-18768-1-git-send-email-bianpan2016@163.com Signed-off-by: Pan Bian <bianpan2016@163.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-14 11:47:18 -04:00
Saeed Mahameed	c94ef13b04	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux 1) New generic devlink param "enable_roce", for downstream devlink reload support 2) Do vport ACL configuration on per vport basis when enabling/disabling a vport. This enables to have vports enabled/disabled outside of eswitch config for future 3) Split the code for legacy vs offloads mode and make it clear 4) Tide up vport locking and workqueue usage 5) Fix metadata enablement for ECPF 6) Make explicit use of VF property to publish IB_DEVICE_VIRTUAL_FUNCTION 7) E-Switch and flow steering core low level support and refactoring for netfilter flowtables offload Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-11-13 14:24:58 -08:00
Bart Van Assche	e88982ad1b	RDMA/srpt: Report the SCSI residual to the initiator The code added by this patch is similar to the code that already exists in ibmvscsis_determine_resid(). This patch has been tested by running the following command: strace sg_raw -r 1k /dev/sdb 12 00 00 00 60 00 -o inquiry.bin \|& grep resid= Link: https://lore.kernel.org/r/20191105214632.183302-1-bvanassche@acm.org Fixes: `a42d985bd5` ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-13 15:50:41 -04:00
Yevgeny Kliteynik	208d70f562	IB/mlx5: Support flow counters offset for bulk counters Add support for flow steering counters action with a non-base counter ID (offset) for bulk counters. When creating a flow counter object, save the bulk value. This value is used when a flow action with a non-base counter ID is requested - to validate that the required offset is in the range of the allocated bulk. Link: https://lore.kernel.org/r/20191103140723.77411-1-leon@kernel.org Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-13 15:42:36 -04:00
Leon Romanovsky	e26e7b88f6	RDMA: Change MAD processing function to remove extra casting and parameter All users of process_mad() converts input pointers from ib_mad_hdr to be ib_mad, update the function declaration to use ib_mad directly. Also remove not used input MAD size parameter. Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-12 20:20:15 -04:00
Leon Romanovsky	333ee7e2d0	RDMA/hfi1: Delete unreachable code All callers allocate MAD structures with proper sizes, there is no need to recheck it. Link: https://lore.kernel.org/r/20191029062745.7932-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-12 20:20:14 -04:00
Michael Guralnik	94de879c28	IB/mlx5: Load profile according to RoCE enablement state When RoCE is disabled load mlx5_ib in raw_eth profile. Clean pf_profile roce capability checks as it will not be used without roce capability. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-11-11 12:15:29 -08:00
Michael Guralnik	b5a498baf9	IB/mlx5: Rename profile and init methods Rename uplink_rep_profile and its unique init and cleanup stages to suit its upcoming use as the profile when RoCE is disabled. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-11-11 12:15:29 -08:00
Wenpeng Liang	d11769fdc1	RDMA/hns: Modify appropriate printings Modify some printings that is not in uniformed style, non-standard or with spelling errors. Link: https://lore.kernel.org/r/1572952082-6681-10-git-send-email-liweihang@hisilicon.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:55 -04:00
Yixian Liu	1ceb0b11a8	RDMA/hns: Fix non-standard error codes It is better to return a linux error code than define a private constant. Link: https://lore.kernel.org/r/1572952082-6681-9-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:54 -04:00
Lang Cheng	301cc7eb2c	RDMA/hns: Modify hns_roce_hw_v2_get_cfg to simplify the code Merge base configuration of hr_dev into hns_roce_hw_v2_get_cfg(). In addition, there is no need to return 0 at last, so we change return type of it to void. Link: https://lore.kernel.org/r/1572952082-6681-8-git-send-email-liweihang@hisilicon.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:54 -04:00
Lang Cheng	880f133c60	RDMA/hns: Simplify doorbell initialization code If a variable needs to be set to 0 before use, it can be directly initialized to 0. Link: https://lore.kernel.org/r/1572952082-6681-7-git-send-email-liweihang@hisilicon.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:54 -04:00
Yixing Liu	6eef524201	RDMA/hns: Replace not intuitive function/macro names Replace "sw2hw" and "hw2sw" which is hard to understand with "create" and "destroy". Link: https://lore.kernel.org/r/1572952082-6681-6-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:54 -04:00
Yixian Liu	d938d7856f	RDMA/hns: Modify fields of struct hns_roce_srq Use wqe_cnt instead of max which means the queue size of srq, and remove wqe_ctr which is not used. Link: https://lore.kernel.org/r/1572952082-6681-5-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:54 -04:00
Yixian Liu	03ccba5c2c	RDMA/hns: Delete unnecessary uar from hns_roce_cq The uar information is already recorded in priv_uar of hns_roce_dev, there is no need to record it in hns_roce_cq again. Link: https://lore.kernel.org/r/1572952082-6681-4-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:53 -04:00
Lang Cheng	16a11e0bff	RDMA/hns: Remove unnecessary structure hns_roce_sqp Special QP have no differences with normal qp in data structure, so definition of struct hns_roce_sqp should be removed and replaced by struct hns_roce_qp. Link: https://lore.kernel.org/r/1572952082-6681-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:53 -04:00
Yixian Liu	ec6adad0a1	RDMA/hns: Delete unnecessary variable max_post There is no need to define max_post in hns_roce_wq, as it does same thing as wqe_cnt. Link: https://lore.kernel.org/r/1572952082-6681-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-08 16:37:53 -04:00
Leon Romanovsky	ffa2fd1323	RDMA/mlx5: Rewrite MAD processing logic to be readable Multiple "if"s and "\|\|" make extension of process_mad() function as a tedious task, rewrite that function to be more readable. Link: https://lore.kernel.org/r/20191029062745.7932-14-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 16:00:06 -04:00
Leon Romanovsky	84b56d57cf	RDMA/ocrdma: Simplify process_mad function Change the switch with one case into a simple if statement so the code is less confusing. Link: https://lore.kernel.org/r/20191029062745.7932-12-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 16:00:06 -04:00
Leon Romanovsky	dd0b0159f7	RDMA/mad: Do not check MAD sizes in roce and ib drivers All callers for process_mad allocate MAD structures with proper sizes, there is no need to recheck it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:55:22 -04:00
Leon Romanovsky	6a42265c91	RDMA/ocrdma: Make ocrdma_pma_counters() return void This function always returns 0, so just use void and remove the bogus checking at the only call site. Link: https://lore.kernel.org/r/20191029062745.7932-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:38:57 -04:00
Leon Romanovsky	be4a8d4673	RDMA/mad: Allocate zeroed MAD buffer Ensure that MAD output buffer is zero-based allocated in all the callers of process_mad and remove the various memset()'s from the drivers. Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:38:28 -04:00
Leon Romanovsky	874e476ba9	RDMA/qib: Delete empty check_cc_key function Function always returns zero, just delete it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:25:24 -04:00
Leon Romanovsky	688eec9d3d	RDMA/qib: Delete extra line Trivial cleanup to fix the following warning: drivers/infiniband/hw/qib/qib_iba6120.c:1420: warning: bad line: Fixes: `f931551baf` ("IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters") Link: https://lore.kernel.org/r/20191029062745.7932-15-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:19:33 -04:00
Leon Romanovsky	8a80cf9310	RDMA/mad: Delete never implemented functions Delete never implemented and used MAD functions. Link: https://lore.kernel.org/r/20191029062745.7932-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:14:23 -04:00
Bart Van Assche	77cf98d4ec	Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal" Although the mentioned patch fixes a use-after-free bug, it introduces a hang during shutdown. Since the latter is worse, revert this patch. Link: https://lore.kernel.org/r/20191101204756.182162-1-bvanassche@acm.org Reported-by: Honggang Li <honli@redhat.com> Fixes: `9b64f7d0bb` ("RDMA/srpt: Postpone HCA removal until after configfs directory removal") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 15:13:03 -04:00
Wenpeng Liang	411c1e6774	RDMA/hns: Correct the value of srq_desc_size srq_desc_size should be rounded up to pow of two before used, or related calculation may cause allocating wrong size of memory for srq buffer. Fixes: `c7bcb13442` ("RDMA/hns: Add SRQ support for hip08 kernel mode") Link: https://lore.kernel.org/r/1572575610-52530-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:37:02 -04:00
Sirong Wang	531eb45b3d	RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN Size of pointer to buf field of struct hns_roce_hem_chunk should be considered when calculating HNS_ROCE_HEM_CHUNK_LEN, or sg table size will be larger than expected when allocating hem. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/1572575610-52530-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Sirong Wang <wangsirong@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:37:02 -04:00
Kamal Heib	ad0593ec89	RDMA/qedr: Remove unsupported modify_port callback There is no need to return always zero for function which is not supported. Fixes: `ac1b36e55a` ("qedr: Add support for user context verbs") Link: https://lore.kernel.org/r/20191028155931.1114-5-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:30:24 -04:00
Kamal Heib	6135b71159	RDMA/ocrdma: Remove unsupported modify_port callback There is no need to return always zero for function which is not supported. Fixes: `fe2caefcdf` ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Link: https://lore.kernel.org/r/20191028155931.1114-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:29:15 -04:00
Kamal Heib	25f3b49b92	RDMA/hns: Remove unsupported modify_port callback There is no need to return always zero for function which is not supported. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20191028155931.1114-3-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:29:15 -04:00
Kamal Heib	55bfe905fa	RDMA/core: Fix return code when modify_port isn't supported Improve return code from ib_modify_port() by doing the following: - Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code - Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that didn't implement the modify_port callback, otherwise return "-EOPNOTSUPP" Fixes: `61e0962d52` ("IB: Avoid ib_modify_port() failure for RoCE devices") Link: https://lore.kernel.org/r/20191028155931.1114-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:29:15 -04:00
Kaike Wan	ce8e8087cf	IB/hfi1: TID RDMA WRITE should not return IB_WC_RNR_RETRY_EXC_ERR Normal RDMA WRITE request never returns IB_WC_RNR_RETRY_EXC_ERR to ULPs because it does not need post receive buffer on the responder side. Consequently, as an enhancement to normal RDMA WRITE request inside the hfi1 driver, TID RDMA WRITE request should not return such an error status to ULPs, although it does receive RNR NAKs from the responder when TID resources are not available. This behavior is violated when qp->s_rnr_retry_cnt is set in current hfi1 implementation. This patch enforces these semantics by avoiding any reaction to the updates of the RNR QP attributes. Fixes: `3c6cb20a0d` ("IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs") Link: https://lore.kernel.org/r/20191025195842.106825.71532.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:15:36 -04:00
Kaike Wan	c2be3865a1	IB/hfi1: Calculate flow weight based on QP MTU for TID RDMA For a TID RDMA WRITE request, a QP on the responder side could be put into a queue when a hardware flow is not available. A RNR NAK will be returned to the requester with a RNR timeout value based on the position of the QP in the queue. The tid_rdma_flow_wt variable is used to calculate the timeout value and is determined by using a MTU of 4096 at the module loading time. This could reduce the timeout value by half from the desired value, leading to excessive RNR retries. This patch fixes the issue by calculating the flow weight with the real MTU assigned to the QP. Fixes: `07b923701e` ("IB/hfi1: Add functions to receive TID RDMA WRITE request") Link: https://lore.kernel.org/r/20191025195836.106825.77769.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:15:36 -04:00
Kaike Wan	c1abd865bd	IB/hfi1: Ensure r_tid_ack is valid before building TID RDMA ACK packet The index r_tid_ack is used to indicate the next TID RDMA WRITE request to acknowledge in the ring s_ack_queue[] on the responder side and should be set to a valid index other than its initial value before r_tid_tail is advanced to the next TID RDMA WRITE request and particularly before a TID RDMA ACK is built. Otherwise, a NULL pointer dereference may result: BUG: unable to handle kernel paging request at ffff9a32d27abff8 IP: [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1] PGD 2749032067 PUD 0 Oops: 0000 1 SMP Modules linked in: osp(OE) ofd(OE) lfsck(OE) ost(OE) mgc(OE) osd_zfs(OE) lquota(OE) lustre(OE) lmv(OE) mdc(OE) lov(OE) fid(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) lnet(OE) libcfs(OE) ib_ipoib(OE) hfi1(OE) rdmavt(OE) nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_mod target_core_mod ib_ucm dm_mirror dm_region_hash dm_log mlx5_ib dm_mod zfs(POE) rpcrdma sunrpc rdma_ucm ib_uverbs opa_vnic ib_iser zunicode(POE) ib_umad zavl(POE) icp(POE) sb_edac intel_powerclamp coretemp rdma_cm intel_rapl iosf_mbi iw_cm libiscsi scsi_transport_iscsi kvm ib_cm iTCO_wdt mxm_wmi iTCO_vendor_support irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd zcommon(POE) znvpair(POE) pcspkr spl(OE) mei_me sg mei ioatdma lpc_ich joydev i2c_i801 shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 mlx5_core drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ixgbe ahci ttm mlxfw ib_core libahci devlink mdio crct10dif_pclmul crct10dif_common drm ptp libata megaraid_sas crc32c_intel i2c_algo_bit pps_core i2c_core dca [last unloaded: rdmavt] CPU: 15 PID: 68691 Comm: kworker/15:2H Kdump: loaded Tainted: P W OE ------------ 3.10.0-862.2.3.el7_lustre.x86_64 #1 Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 Workqueue: hfi0_0 _hfi1_do_tid_send [hfi1] task: ffff9a01f47faf70 ti: ffff9a11776a8000 task.ti: ffff9a11776a8000 RIP: 0010:[<ffffffffc0d87ea6>] [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1] RSP: 0018:ffff9a11776abd08 EFLAGS: 00010002 RAX: ffff9a32d27abfc0 RBX: ffff99f2d27aa000 RCX: 00000000ffffffff RDX: 0000000000000000 RSI: 0000000000000220 RDI: ffff99f2ffc05300 RBP: ffff9a11776abd88 R08: 000000000001c310 R09: ffffffffc0d87ad4 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a117a423c00 R13: ffff9a117a423c00 R14: ffff9a03500c0000 R15: ffff9a117a423cb8 FS: 0000000000000000(0000) GS:ffff9a117e9c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff9a32d27abff8 CR3: 0000002748a0e000 CR4: 00000000001607e0 Call Trace: [<ffffffffc0d88874>] _hfi1_do_tid_send+0x194/0x320 [hfi1] [<ffffffffaf0b2dff>] process_one_work+0x17f/0x440 [<ffffffffaf0b3ac6>] worker_thread+0x126/0x3c0 [<ffffffffaf0b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0 [<ffffffffaf0bae31>] kthread+0xd1/0xe0 [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40 [<ffffffffaf71f5f7>] ret_from_fork_nospec_begin+0x21/0x21 [<ffffffffaf0bad60>] ? insert_kthread_work+0x40/0x40 hfi1 0000:05:00.0: hfi1_0: reserved_op: opcode 0xf2, slot 2, rsv_used 1, rsv_ops 1 Code: 00 00 41 8b 8d d8 02 00 00 89 c8 48 89 45 b0 48 c1 65 b0 06 48 8b 83 a0 01 00 00 48 01 45 b0 48 8b 45 b0 41 80 bd 10 03 00 00 00 <48> 8b 50 38 4c 8d 7a 50 74 45 8b b2 d0 00 00 00 85 f6 0f 85 72 RIP [<ffffffffc0d87ea6>] hfi1_make_tid_rdma_pkt+0x476/0xcb0 [hfi1] RSP <ffff9a11776abd08> CR2: ffff9a32d27abff8 This problem can happen if a RESYNC request is received before r_tid_ack is modified. This patch fixes the issue by making sure that r_tid_ack is set to a valid value before a TID RDMA ACK is built. Functions are defined to simplify the code. Fixes: `07b923701e` ("IB/hfi1: Add functions to receive TID RDMA WRITE request") Fixes: `7cf0ad679d` ("IB/hfi1: Add a function to receive TID RDMA RESYNC packet") Link: https://lore.kernel.org/r/20191025195830.106825.44022.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:15:35 -04:00
James Erwin	a9c3c4c597	IB/hfi1: Ensure full Gen3 speed in a Gen4 system If an hfi1 card is inserted in a Gen4 systems, the driver will avoid the gen3 speed bump and the card will operate at half speed. This is because the driver avoids the gen3 speed bump when the parent bus speed isn't identical to gen3, 8.0GT/s. This is not compatible with gen4 and newer speeds. Fix by relaxing the test to explicitly look for the lower capability speeds which inherently allows for gen4 and all future speeds. Fixes: `7724105686` ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20191101192059.106248.1699.stgit@awfm-01.aw.intel.com Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: James Erwin <james.erwin@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:13:43 -04:00
Michal Kalderon	b4bc766097	RDMA/qedr: Add iWARP doorbell recovery support This patch adds the iWARP specific doorbells to the doorbell recovery mechanism. Link: https://lore.kernel.org/r/20191030094417.16866-9-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:01 -04:00
Michal Kalderon	97f6125092	RDMA/qedr: Add doorbell overflow recovery support Use the doorbell recovery mechanism to register rdma related doorbells that will be restored in case there is a doorbell overflow attention. Link: https://lore.kernel.org/r/20191030094417.16866-8-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:01 -04:00
Michal Kalderon	4c6bb02d59	RDMA/qedr: Use the common mmap API Remove all functions related to mmap from qedr and use the common API. Link: https://lore.kernel.org/r/20191030094417.16866-7-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:01 -04:00
Michal Kalderon	11f1a75567	RDMA/siw: Use the common mmap_xa helpers Remove the functions related to managing the mmap_xa database. This code is now common in ib_core. Link: https://lore.kernel.org/r/20191030094417.16866-6-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Tested-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:01 -04:00
Michal Kalderon	e84d3c184e	RDMA/efa: Use the common mmap_xa helpers Remove the functions related to managing the mmap_xa database. This code was replaced with common code in ib_core. Link: https://lore.kernel.org/r/20191030094417.16866-5-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:01 -04:00
Michal Kalderon	c043ff2cfb	RDMA: Connect between the mmap entry and the umap_priv structure The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:01 -04:00
Michal Kalderon	3411f9f01b	RDMA/core: Create mmap database and cookie helper functions Create some common API's for adding entries to a xa_mmap. Searching for an entry and freeing one. The general approach is copied from the EFA driver and improved to be more general and do more to help the drivers. Integration with the core allows a reference counted scheme with a free function so that the driver can know when its mmaps are all gone. This significant new functionality will be helpful for drivers to have the correct lifetime model for mmap objects. Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-06 13:08:00 -04:00
Michal Kalderon	b86deba977	RDMA/core: Move core content from ib_uverbs to ib_core Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well. Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-11-05 09:59:26 -04:00
Greg Kroah-Hartman	09b0965ee8	IB: mlx5: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Leon Romanovsky <leon@kernel.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-11-04 08:38:58 +01:00
Parav Pandit	e53a9d26cf	IB/mlx5: Introduce and use mlx5_core_is_vf() Instead of deciding a given device is virtual function or not based on a device is PF or not, use already defined MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf(). This enables to clearly identify PF, VF and non virtual functions. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-11-01 14:40:27 -07:00
Michael Guralnik	11f552e217	IB/mlx5: Test write combining support Linux can run in all sorts of physical machines and VMs where write combining may or may not be supported. Currently there is no way to reliably tell if the system supports WC, or not. The driver uses WC to optimize posting work to the HCA, and getting this wrong in either direction can cause a significant performance loss. Add a test in mlx5_ib initialization process to test whether write-combining is supported on the machine. The test will run as part of the enable_driver callback to ensure that the test runs after the device is setup and can create and modify the QP needed, but runs before the device is exposed to the users. The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame is different from the WQE in memory, requesting CQE only on the BlueFlame WQE. By checking whether we received a completion on one of these WQEs we can know if BlueFlame succeeded and this write-combining must be supported. Change reporting of BlueFlame support to be dependent on write-combining support instead of the FW's guess as to what the machine can do. Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-31 15:52:51 -03:00
Leon Romanovsky	546d30099e	RDMA/mlx5: Return proper error value Returned value from mlx5_mr_cache_alloc() is checked to be error or real pointer. Return proper error code instead of NULL which is not checked later. Fixes: `81713d3788` ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191029055721.7192-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-31 15:31:49 -03:00
Arnd Bergmann	d5b60e26e8	RDMA/hns: Fix build error again This is not the first attempt to fix building random configurations, unfortunately the attempt in commit `a07fc0bb48` ("RDMA/hns: Fix build error") caused a new problem when CONFIG_INFINIBAND_HNS_HIP06=m and CONFIG_INFINIBAND_HNS_HIP08=y: drivers/infiniband/hw/hns/hns_roce_main.o:(.rodata+0xe60): undefined reference to `__this_module' Revert commits `a07fc0bb48` ("RDMA/hns: Fix build error") and `a3e2d4c7e7` ("RDMA/hns: remove obsolete Kconfig comment") to get back to the previous state, then fix the issues described there differently, by adding more specific dependencies: INFINIBAND_HNS can now only be built-in if at least one of HNS or HNS3 are built-in, and the individual back-ends are only available if that code is reachable from the main driver. Fixes: `a07fc0bb48` ("RDMA/hns: Fix build error") Fixes: `a3e2d4c7e7` ("RDMA/hns: remove obsolete Kconfig comment") Fixes: `dd74282df5` ("RDMA/hns: Initialize the PCI device for hip08 RoCE") Fixes: `08805fdbeb` ("RDMA/hns: Split hw v1 driver from hns roce driver") Link: https://lore.kernel.org/r/20191007211826.3361202-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-29 16:16:54 -03:00
Jason Gunthorpe	bb3dba3300	Merge branch 'odp_rework' into rdma.git for-next Jason Gunthorpe says: ==================== In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibily more that haven't been identified. ==================== Merge conflicts with the odp statistics patch resolved. * branch 'odp_rework': RDMA/odp: Remove broken debugging call to invalidate_range RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray RDMA/mlx5: Rework implicit ODP destroy RDMA/mlx5: Avoid double lookups on the pagefault path RDMA/mlx5: Reduce locking in implicit_mr_get_data() RDMA/mlx5: Use an xarray for the children of an implicit ODP RDMA/mlx5: Split implicit handling from pagefault_mr RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it RDMA/mlx5: Rework implicit_mr_get_data RDMA/mlx5: Delete struct mlx5_priv->mkey_table RDMA/mlx5: Use a dedicated mkey xarray for ODP RDMA/mlx5: Split sig_err MR data into its own xarray RDMA/mlx5: Use SRCU properly in ODP prefetch Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:47:52 -03:00
Jason Gunthorpe	46870b2391	RDMA/odp: Remove broken debugging call to invalidate_range invalidate_range() also obtains the umem_mutex which is being held at this point, so if this path were was ever called it would deadlock. Thus conclude the debugging never triggers and rework it into a simple WARN_ON and leave things as they are. While here add a note to explain how we could possibly get inconsistent page pointers. Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	09689703d2	RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy For creation, as soon as the umem_odp is created the notifier can be called, however the underlying MR may not have been setup yet. This would cause problems if mlx5_ib_invalidate_range() runs. There is some confusing/ulocked/racy code that might by trying to solve this, but without locks it isn't going to work right. Instead trivially solve the problem by short-circuiting the invalidation if there are not yet any DMA mapped pages. By definition there is nothing to invalidate in this case. The create code will have the umem fully setup before anything is DMA mapped, and npages is fully locked by the umem_mutex. For destroy, invalidate the entire MR at the HW to stop DMA then DMA unmap the pages before destroying the MR. This drives npages to zero and prevents similar racing with invalidate while the MR is undergoing destruction. Arguably it would be better if the umem was created after the MR and destroyed before, but that would require a big rework of the MR code. Fixes: `6aec21f6a8` ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191009160934.3143-15-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	d561987f34	RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray These mkeys are entirely internal and are never used by the HW for page fault. They should also never be used by userspace for prefetch. Simplify & optimize things by not including them in the xarray. Since the prefetch path can now never see a child mkey there is no need for the second synchronize_srcu() during imr destroy. Link: https://lore.kernel.org/r/20191009160934.3143-14-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	5256edcb98	RDMA/mlx5: Rework implicit ODP destroy Use SRCU in a sensible way by removing all MRs in the implicit tree from the two xarrays (the update operation), then a synchronize, followed by a normal single threaded teardown. This is only a little unusual from the normal pattern as there can still be some work pending in the unbound wq that may also require a workqueue flush. This is tracked with a single atomic, consolidating the redundant existing atomics and wait queue. For understand-ability the entire ODP implicit create/destroy flow now largely exists in a single pair of functions within odp.c, with a few support functions for tearing down an unused child. Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	b70d785d23	RDMA/mlx5: Avoid double lookups on the pagefault path Now that the locking is simplified combine pagefault_implicit_mr() with implicit_mr_get_data() so that we sweep over the idx range only once, and do the single xlt update at the end, after the child umems are setup. This avoids double iteration/xa_loads plus the sketchy failure path if the xa_load() fails. Link: https://lore.kernel.org/r/20191009160934.3143-12-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	3389baa831	RDMA/mlx5: Reduce locking in implicit_mr_get_data() Now that the child MRs are stored in an xarray we can rely on the SRCU lock to protect the xa_load and use xa_cmpxchg on the slow allocation path to resolve races with concurrent page fault. This reduces the scope of the critical section of umem_mutex for implicit MRs to only cover mlx5_ib_update_xlt, and avoids taking a lock at all if the child MR is already in the xarray. This makes it consistent with the normal ODP MR critical section for umem_lock, and the locking approach used for destroying an unusued implicit child MR. The MLX5_IB_UPD_XLT_ATOMIC is no longer needed in implicit_get_child_mr() since it is no longer called with any locks. Link: https://lore.kernel.org/r/20191009160934.3143-11-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	423f52d650	RDMA/mlx5: Use an xarray for the children of an implicit ODP Currently the child leaves are stored in the shared interval tree and every lookup for a child must be done under the interval tree rwsem. This is further complicated by dropping the rwsem during iteration (ie the odp_lookup(), odp_next() pattern), which requires a very tricky an difficult to understand locking scheme with SRCU. Instead reserve the interval tree for the exclusive use of the mmu notifier related code in umem_odp.c and give each implicit MR a xarray containing all the child MRs. Since the size of each child is 1GB of VA, a 1 level xarray will index 64G of VA, and a 2 level will index 2TB, making xarray a much better data structure choice than an interval tree. The locking properties of xarray will be used in the next patches to rework the implicit ODP locking scheme into something simpler. At this point, the xarray is locked by the implicit MR's umem_mutex, and read can also be locked by the odp_srcu. Link: https://lore.kernel.org/r/20191009160934.3143-10-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:14 -03:00
Jason Gunthorpe	54375e7382	RDMA/mlx5: Split implicit handling from pagefault_mr The single routine has a very confusing scheme to advance to the next child MR when working on an implicit parent. This scheme can only be used when working with an implicit parent and must not be triggered when working on a normal MR. Re-arrange things by directly putting all the single-MR stuff into one function and calling it in a loop for the implicit case. Simplify some of the error handling in the new pagefault_real_mr() to remove unneeded gotos. Link: https://lore.kernel.org/r/20191009160934.3143-9-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	9162420dde	RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree Instead of rewriting all the IOVA's to 0 as things progress down the tree make the IOVA of the children equal to placement in the tree. This makes things easier to understand by keeping mmkey.iova == HW configuration. Link: https://lore.kernel.org/r/20191009160934.3143-8-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	c2edcd6935	RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it This makes the routines easier to understand, particularly with respect the locking requirements of the entire sequence. The implicit_mr_alloc() had a lot of ifs specializing it to each of the callers, and only a very small amount of code was actually shared. Following patches will cause the flow in the two functions to diverge further. Link: https://lore.kernel.org/r/20191009160934.3143-7-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	3d5f3c54e7	RDMA/mlx5: Rework implicit_mr_get_data This function is intended to loop across each MTT chunk in the implicit parent that intersects the range [io_virt, io_virt+bnct). But it is has a confusing construction, so: - Consistently use imr and odp_imr to refer to the implicit parent to avoid confusion with the normal mr and odp of the child - Directly compute the inclusive start/end indexes by shifting. This is clearer to understand the intent and avoids any errors from unaligned values of addr - Iterate directly over the range of MTT indexes, do not make a loop out of goto - Follow 'success oriented flow', with goto error unwind - Directly calculate the range of idx's that need update_xlt - Ensure that any leaf MR added to the interval tree always results in an update to the XLT Link: https://lore.kernel.org/r/20191009160934.3143-6-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	74bddb3682	RDMA/mlx5: Delete struct mlx5_priv->mkey_table No users are left, delete it. Link: https://lore.kernel.org/r/20191009160934.3143-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	806b101b2b	RDMA/mlx5: Use a dedicated mkey xarray for ODP There is a per device xarray storing mkeys that is used to store every mkey in the system. However, this xarray is now only read by ODP for certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT). Create an xarray only for use by ODP, that only contains ODP related MKeys. This xarray is protected by SRCU and all erases are protected by a synchronize. This improves performance: - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp() can be deleted. The xarray will also consume fewer nodes. - normal MR's are never mixed with ODP MRs in a SRCU data structure so performance sucking synchronize_srcu() on every MR destruction is not needed. - No smp_load_acquire(live) and xa_load() double barrier on read Due to the SRCU locking scheme care must be taken with the placement of the xa_store(). Once it completes the MR is immediately visible to other threads and only through a xa_erase() & synchronize_srcu() cycle could it be destroyed. Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	50211ec944	RDMA/mlx5: Split sig_err MR data into its own xarray The locking model for signature is completely different than ODP, do not share the same xarray that relies on SRCU locking to support ODP. Simply store the active mlx5_core_sig_ctx's in an xarray when signature MRs are created and rely on trivial xarray locking to serialize everything. The overhead of storing only a handful of SIG related MRs is going to be much less than an xarray full of every mkey. Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	fb985e278a	RDMA/mlx5: Use SRCU properly in ODP prefetch When working with SRCU protected xarrays the xarray itself should be the SRCU 'update' point. Instead prefetch is using live as the SRCU update point and this prevents switching the locking design to use the xarray instead. To solve this the prefetch must only read from the xarray once, and hold on to the actual MR pointer for the duration of the async operation. Incrementing num_pending_prefetch delays destruction of the MR, so it is suitable. Prefetch calls directly to the pagefault_mr using the MR pointer and only does a single xarray lookup. All the testing if a MR is prefetchable or not is now done only in the prefetch code and removed from the pagefault critical path. Link: https://lore.kernel.org/r/20191009160934.3143-2-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:41:13 -03:00
Jason Gunthorpe	036313316d	Linux 5.4-rc5 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl210Z8eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGv+kIAKRpO7EuDokQL4qp hxEEaCMJA1T055EMlNU6FVAq/ZbmapzreUyNYiRMpPWKGTWNMkhIcZQfysYeGZz5 y/KRxAiVxlcB+3v3yRmoZd/XoQmhgvJmqD4zhaGI2Utonow4f/SGSEFFZqqs9WND 4HJROjZHgQ4JBxg9Z+QMo0FxbV/DCZpEOUq51N9WJywyyDRb18zotE83stpU+pE2 fjqT7mk0NLrnYXuDRAbFC1Aau9ed4H6LlwLmxaqxq/Pt5Rz7wIKwKL9HIT4Dm/0a qpani6phhHWL7MwUpa2wkEonFCD03rJFl3DUVJo64Ijh4up5D/jyXQ+GKV2P4WKJ 275Rb5Q= =WiZZ -----END PGP SIGNATURE----- Merge tag 'v5.4-rc5' into rdma.git for-next Linux 5.4-rc5 For dependencies in the next patches Conflict resolved by keeping the delete of the unlock. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:36:29 -03:00
Bryan Tan	a52dc3a100	RDMA/vmw_pvrdma: Use resource ids from physical device if available This change allows the RDMA stack to use physical resource numbers if they are passed up from the device. This is accomplished by separating the concept of the QP number from the QP handle. Previously, the two were the same, as the QP number was exposed to the guest and also used to reference a virtual QP in the device backend. With physical resource numbers exposed, the QP number given to the guest is the number assigned from the physical HCA's QP, while the QP handle is still the internal handle used to reference a virtual QP. Regardless of whether the device is exposing physical ids, the driver will still try to pick up the QP handle from the backend if possible. The MR keys exposed to the guest will also be the MR keys created by the physical HCA, instead of virtual MR keys. The distinction between handle and keys is already present for MRs so there is no need to do anything special here. A new version of the create QP response has been added to the device API to pass up the QP number and handle. The driver will also report these to userspace in the udata response if userspace supports it or not create the queuepair if not. I also had to do a refactor of the destroy qp code to reuse it if we fail to copy to userspace. Link: https://lore.kernel.org/r/20191028181444.19448-1-aditr@vmware.com Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Bryan Tan <bryantan@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 16:09:23 -03:00
Lijun Ou	b681a05299	RDMA/hns: Prevent memory leaks of eq->buf_list eq->buf_list->buf and eq->buf_list should also be freed when eqe_hop_num is set to 0, or there will be memory leaks. Fixes: `a5073d6054` ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/1572072995-11277-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 15:06:38 -03:00
Bart Van Assche	c9121262d5	RDMA/core: Set DMA parameters correctly The dma_set_max_seg_size() call in setup_dma_device() does not have any effect since device->dev.dma_parms is NULL. Fix this by initializing device->dev.dma_parms first. Link: https://lore.kernel.org/r/20191025225830.257535-5-bvanassche@acm.org Fixes: `d10bcf947a` ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:52:04 -03:00
Bart Van Assche	a401fb819c	RDMA/siw: Increase DMA max_segment_size parameter Increase the DMA max_segment_size parameter from 64 KB to 2 GB. Link: https://lore.kernel.org/r/20191025225830.257535-4-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:52:04 -03:00
Bart Van Assche	97458fd510	RDMA/rxe: Increase DMA max_segment_size parameter Increase the DMA max_segment_size parameter from 64 KB to 2 GB. Link: https://lore.kernel.org/r/20191025225830.257535-3-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:52:03 -03:00
Bernard Metzler	0edefddbae	RDMA/siw: Fix post_recv QP state locking Do not release qp state lock if not previously acquired. Fixes: `cf049bb31f` ("RDMA/siw: Fix SQ/RQ drain logic") Link: https://lore.kernel.org/r/20191025142903.20625-1-bmt@zurich.ibm.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:34:33 -03:00
Potnuri Bharat Teja	d4934f4569	RDMA/iw_cxgb4: Avoid freeing skb twice in arp failure case _put_ep_safe() and _put_pass_ep_safe() free the skb before it is freed by process_work(). fix double free by freeing the skb only in process_work(). Fixes: `1dad0ebeea` ("iw_cxgb4: Avoid touch after free error in ARP failure handlers") Link: https://lore.kernel.org/r/1572006880-5800-1-git-send-email-bharat@chelsio.com Signed-off-by: Dakshaja Uppalapati <dakshaja@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:13:33 -03:00
Potnuri Bharat Teja	5212c3fda2	RDMA/iw_cxgb4: Report correct port speed/width Query speed/width from corresponding netdev. Link: https://lore.kernel.org/r/1572001022-4533-1-git-send-email-bharat@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:11:46 -03:00
Jason Gunthorpe	1524b12a6e	RDMA/mlx5: Use irq xarray locking for mkey_table The mkey_table xarray is touched by the reg_mr_callback() function which is called from a hard irq. Thus all other uses of xa_lock must use the _irq variants. WARNING: inconsistent lock state 5.4.0-rc1 #12 Not tainted -------------------------------- inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. python3/343 [HC0[0]:SC0[0]:HE1:SE1] takes: ffff888182be1d40 (&(&xa->xa_lock)->rlock#3){?.-.}, at: xa_erase+0x12/0x30 {IN-HARDIRQ-W} state was registered at: lock_acquire+0xe1/0x200 _raw_spin_lock_irqsave+0x35/0x50 reg_mr_callback+0x2dd/0x450 [mlx5_ib] mlx5_cmd_exec_cb_handler+0x2c/0x70 [mlx5_core] mlx5_cmd_comp_handler+0x355/0x840 [mlx5_core] [..] Possible unsafe locking scenario: CPU0 ---- lock(&(&xa->xa_lock)->rlock#3); <Interrupt> lock(&(&xa->xa_lock)->rlock#3); * DEADLOCK * 2 locks held by python3/343: #0: ffff88818eb4bd38 (&uverbs_dev->disassociate_srcu){....}, at: ib_uverbs_ioctl+0xe5/0x1e0 [ib_uverbs] #1: ffff888176c76d38 (&file->hw_destroy_rwsem){++++}, at: uobj_destroy+0x2d/0x90 [ib_uverbs] stack backtrace: CPU: 3 PID: 343 Comm: python3 Not tainted 5.4.0-rc1 #12 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x86/0xca print_usage_bug.cold.50+0x2e5/0x355 mark_lock+0x871/0xb50 ? match_held_lock+0x20/0x250 ? check_usage_forwards+0x240/0x240 __lock_acquire+0x7de/0x23a0 ? __kasan_check_read+0x11/0x20 ? mark_lock+0xae/0xb50 ? mark_held_locks+0xb0/0xb0 ? find_held_lock+0xca/0xf0 lock_acquire+0xe1/0x200 ? xa_erase+0x12/0x30 _raw_spin_lock+0x2a/0x40 ? xa_erase+0x12/0x30 xa_erase+0x12/0x30 mlx5_ib_dealloc_mw+0x55/0xa0 [mlx5_ib] uverbs_dealloc_mw+0x3c/0x70 [ib_uverbs] uverbs_free_mw+0x1a/0x20 [ib_uverbs] destroy_hw_idr_uobject+0x49/0xa0 [ib_uverbs] [..] Fixes: `0417791536` ("RDMA/mlx5: Add missing synchronize_srcu() for MW cases") Link: https://lore.kernel.org/r/20191024234910.GA9038@ziepe.ca Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:08:48 -03:00
Michal Kalderon	24e412c1e0	RDMA/qedr: Fix memory leak in user qp and mr User QPs pbl's weren't freed properly. MR pbls weren't freed properly. Fixes: `e0290cce6a` ("qedr: Add support for memory registeration verbs") Link: https://lore.kernel.org/r/20191027200451.28187-5-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:01:36 -03:00
Michal Kalderon	82af6d19d8	RDMA/qedr: Fix synchronization methods and memory leaks in qedr Re-design of the iWARP CM related objects reference counting and synchronization methods, to ensure operations are synchronized correctly and that memory allocated for "ep" is properly released. Also makes sure QP memory is not released before ep is finished accessing it. Where as the QP object is created/destroyed by external operations, the ep is created/destroyed by internal operations and represents the tcp connection associated with the QP. QP destruction flow: - needs to wait for ep establishment to complete (either successfully or with error) - needs to wait for ep disconnect to be fully posted to avoid a race condition of disconnect being called after reset. - both the operations above don't always happen, so we use atomic flags to indicate whether the qp destruction flow needs to wait for these completions or not, if the destroy is called before these operations began, the flows will check the flags and not execute them ( connect / disconnect). We use completion structure for waiting for the completions mentioned above. The QP refcnt was modified to kref object. The EP has a kref added to it to handle additional worker thread accessing it. Memory Leaks - https://www.spinics.net/lists/linux-rdma/msg83762.html Concurrency not managed correctly - https://www.spinics.net/lists/linux-rdma/msg67949.html Fixes: `de0089e692` ("RDMA/qedr: Add iWARP connection management qp related callbacks") Link: https://lore.kernel.org/r/20191027200451.28187-4-michal.kalderon@marvell.com Reported-by: Chuck Lever <chuck.lever@oracle.com> Reported-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:01:35 -03:00
Michal Kalderon	5fdff18b4d	RDMA/qedr: Fix qpids xarray api used The qpids xarray isn't accessed from irq context and therefore there is no need to use the xa_XXX_irq version of the apis. Remove the _irq. Fixes: `b6014f9e5f` ("qedr: Convert qpidr to XArray") Link: https://lore.kernel.org/r/20191027200451.28187-3-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:01:35 -03:00
Michal Kalderon	73ab512f72	RDMA/qedr: Fix srqs xarray initialization There was a missing initialization for the srqs xarray. SRQs xarray can also be called from irq context when searching for an element and uses the xa_XXX_irq apis, therefore should be initialized with IRQ flags. Fixes: `9fd15987ed` ("qedr: Convert srqidr to XArray") Link: https://lore.kernel.org/r/20191027200451.28187-2-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 14:01:27 -03:00
Colin Ian King	994195e153	RDMA/hns: Fix memory leak on 'context' on error return path Currently, the error return path when the call to function dev->dfx->query_cqc_info fails will leak object 'context'. Fix this by making the error return path via 'err' return return codes rather than -EMSGSIZE, set ret appropriately for all error return paths and for the memory leak now return via 'err' rather than just returning without freeing context. Link: https://lore.kernel.org/r/20191024131034.19989-1-colin.king@canonical.com Addresses-Coverity: ("Resource leak") Fixes: `e1c9a0dc29` ("RDMA/hns: Dump detailed driver-specific CQ") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 13:41:23 -03:00
Yangyang Li	887803db86	RDMA/hns: Bugfix for qpc/cqc timer configuration qpc/cqc timer entry size needs one page, but currently they are fixedly configured to 4096, which is not appropriate in 64K page scenarios. So they should be modified to PAGE_SIZE. Fixes: `0e40dc2f70` ("RDMA/hns: Add timer allocation support for hip08") Link: https://lore.kernel.org/r/1571908917-16220-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 13:33:06 -03:00
Lijun Ou	5c7e76fb7c	RDMA/hns: Fix to support 64K page for srq SRQ's page size configuration of BA and buffer should depend on current PAGE_SHIFT, or it can't work in scenario of 64K page. Fixes: `c7bcb13442` ("RDMA/hns: Add SRQ support for hip08 kernel mode") Link: https://lore.kernel.org/r/1571908917-16220-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 13:33:06 -03:00
Bart Van Assche	79d81ef42c	RDMA/srpt: Fix TPG creation Unlike the iSCSI target driver, for the SRP target driver it is sufficient if a single TPG can be associated with each RDMA port name. However, users started associating multiple TPGs with RDMA port names. Support this by converting the single TPG in struct srpt_port_id into a list. This patch fixes the following list corruption issue: list_add corruption. prev->next should be next (ffffffffc0a080c0), but was ffffa08a994ce6f0. (prev=ffffa08a994ce6f0). WARNING: CPU: 2 PID: 2597 at lib/list_debug.c:28 __list_add_valid+0x6a/0x70 CPU: 2 PID: 2597 Comm: targetcli Not tainted 5.4.0-rc1.3bfa3c9602a7 #1 RIP: 0010:__list_add_valid+0x6a/0x70 Call Trace: core_tpg_register+0x116/0x200 [target_core_mod] srpt_make_tpg+0x3f/0x60 [ib_srpt] target_fabric_make_tpg+0x41/0x290 [target_core_mod] configfs_mkdir+0x158/0x3e0 vfs_mkdir+0x108/0x1a0 do_mkdirat+0x77/0xe0 do_syscall_64+0x55/0x1d0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Link: https://lore.kernel.org/r/20191023204106.23326-1-bvanassche@acm.org Reported-by: Honggang LI <honli@redhat.com> Fixes: `a42d985bd5` ("ib_srpt: Initial SRP Target merge for v3.3-rc1") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 13:30:00 -03:00
Leon Romanovsky	f9e66db143	RDMA/hns: Delete BITS_PER_BYTE redefinition HNS redefined available in bits.h define and didn't use it, we can safely delete it. Link: https://lore.kernel.org/r/20191023054239.31648-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 13:28:24 -03:00
Jason Gunthorpe	515f60004e	RDMA/hns: Prevent undefined behavior in hns_roce_set_user_sq_size() The "ucmd->log_sq_bb_count" variable is a user controlled variable in the 0-255 range. If we shift more than then number of bits in an int then it's undefined behavior (it shift wraps), and potentially the int could become negative. Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20190608092514.GC28890@mwanda Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>	2019-10-28 11:20:34 -03:00
Leon Romanovsky	24f5214923	RDMA/cm: Update copyright together with SPDX tag Add Mellanox to lust of copyright holders and replace copyright boilerplate with relevant SPDX tag. Link: https://lore.kernel.org/r/20191020071559.9743-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 10:15:11 -03:00
Leon Romanovsky	a916051191	RDMA/cm: Use specific keyword to check define There is a specific define keyword to check if define exists or not, let's use it instead of open-coded variant. Link: https://lore.kernel.org/r/20191020071559.9743-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 10:15:10 -03:00
Leon Romanovsky	8d625101a7	RDMA/cm: Delete unused cm_is_active_peer function Function cm_is_active_peer is not used, delete it. Link: https://lore.kernel.org/r/20191020071559.9743-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-28 10:15:10 -03:00
Parav Pandit	549af00833	IB/core: Avoid deadlock during netlink message handling When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence. [..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again Due to above call sequence, following call trace and deadlock is observed. kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented. 1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock. 2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class. Fixes: `0e2d00eb6f` ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-24 20:49:37 -03:00
Mark Zhang	a15542bb72	RDMA/nldev: Skip counter if port doesn't match The counter resource should return -EAGAIN if it was requested for a different port, this is similar to how QP works if the users provides a port filter. Otherwise port filtering in netlink will return broken counter nests. Fixes: `c4ffee7c9b` ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-24 15:31:23 -03:00
Leon Romanovsky	dc2f7edcc0	RDMA/rxe: Remove useless rxe_init_device_param assignments IB devices are allocated with kzalloc and don't need explicit zero assignments for their parameters. It can be removed safely. Link: https://lore.kernel.org/r/20191020055724.7410-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-24 15:20:18 -03:00
Leon Romanovsky	ac71ffcfb4	RDMA/core: Check that process is still alive before sending it to the users The PID information can disappear asynchronously because the task can be killed and moved to zombie state. In this case, PID will be zero in similar way to the kernel tasks. Recognize such situation where we are asking to return orphaned object and simply skip filling PID attribute. As part of this change, document the same scenario in counter.c code. Link: https://lore.kernel.org/r/20191010071105.25538-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-23 16:02:12 -03:00
Leon Romanovsky	cf7e93c12f	RDMA/restrack: Remove PID namespace support IB resources are bounded to IB device and file descriptors, both entities are unaware to PID namespaces and to task lifetime. The difference in model caused to unpredictable behavior for the following scenario: 1. Create FD and context 2. Share it with ephemeral child 3. Create any object and exit that child The end result of this flow, that those newly created objects will be tracked by restrack, but won't be visible for users because task_struct associated with them already exited. The right thing is to rely on net namespace only for any filtering purposes and drop PID namespace. Link: https://lore.kernel.org/r/20191010071105.25538-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-23 15:58:31 -03:00
Arnd Bergmann	1832f2d8ff	compat_ioctl: move more drivers to compat_ptr_ioctl The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2019-10-23 17:23:44 +02:00
Parav Pandit	c4c8aff5a9	IB/core: Do not notify GID change event of an unregistered device When IB device is undergoing unregistration, the GID cache is always cleaned up after all clients are unregistered with the below flow. __ib_unregister_device() disable_device() ib_cache_cleanup_one() gid_table_cleanup_one() cleanup_gid_table_port() There is no use in generating a GID change event at this stage, where there is no active client of the device and device is nearly unregistered. Link: https://lore.kernel.org/r/20191020065427.8772-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 16:58:22 -03:00
Michael Guralnik	3f89b01f4b	IB/mlx5: Align usage of QP1 create flags with rest of mlx5 defines There is little value in keeping separate function for one flag, provide it directly like any other mlx5 define. Link: https://lore.kernel.org/r/20191020064400.8344-2-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 16:39:49 -03:00
Ran Rozenstein	68abaa765e	IB/mlx5: Remove dead code mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the dead code. Fixes: `a60109dc9a` ("IB/mlx5: Add support for extended atomic operations") Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org Signed-off-by: Ran Rozenstein <ranro@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 16:37:48 -03:00
Chuhong Yuan	a29e1012c1	RDMA/uverbs: Add a check for uverbs_attr_get to uverbs_copy_to_struct_or_zero All current callers for uverbs_copy_to_struct_or_zero() already check that the attribute exists, but it make sense to verify the result like the other functions do. Link: https://lore.kernel.org/r/20191018081533.8544-1-hslester96@gmail.com Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 16:10:03 -03:00
Parav Pandit	d3bd939670	IB/cma: Honor traffic class from lower netdevice for RoCE When a macvlan netdevice is used for RoCE, consider the tos->prio->tc mapping as SL using its lower netdevice. 1. If the lower netdevice is a VLAN netdevice, consider the VLAN netdevice and it's parent netdevice for mapping 2. If the lower netdevice is not a VLAN netdevice, consider tc mapping directly from the lower netdevice Link: https://lore.kernel.org/r/20191015072058.17347-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 15:56:22 -03:00
Erez Alfasi	4061ff7aa3	RDMA/nldev: Provide MR statistics Add RDMA nldev netlink interface for dumping MR statistics information. Output example: $ ./ibv_rc_pingpong -o -P -s 500000000 local address: LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID :: $ rdma stat show mr dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0 Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 15:33:31 -03:00
Erez Alfasi	e1b95ae0b0	RDMA/mlx5: Return ODP type per MR Provide an ODP explicit/implicit type as part of 'rdma -dd resource show mr' dump. For example: $ rdma -dd resource show mr dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000 pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit For non-ODP MRs, we won't print "drv_odp ..." at all. Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 15:22:47 -03:00
Erez Alfasi	fb91069088	RDMA/nldev: Allow different fill function per resource So far res_get_common_{dumpit, doit} was using the default resource fill function which was defined as part of the nldev_fill_res_entry fill_entries. Add a fill function pointer as an argument allows us to use different fill function in case we want to dump different values then 'rdma resource' flow do, but still use the same existing general resources dumping flow. Link: https://lore.kernel.org/r/20191016062308.11886-3-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 15:22:47 -03:00
Erez Alfasi	a3de94e3d6	IB/mlx5: Introduce ODP diagnostic counters Introduce ODP diagnostic counters and count the following per MR within IB/mlx5 driver: 1) Page faults: Total number of faulted pages. 2) Page invalidations: Total number of pages invalidated by the OS during all invalidation events. The translations can be no longer valid due to either non-present pages or mapping changes. Link: https://lore.kernel.org/r/20191016062308.11886-2-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 15:22:47 -03:00
Dan Carpenter	a9018adfde	RDMA/uverbs: Prevent potential underflow The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that: if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) { But we don't check if "attr.comp_vector" is negative. It could potentially lead to an array underflow. My concern would be where cq->vector is used in the create_cq() function from the cxgb4 driver. And really "attr.comp_vector" is appears as a u32 to user space so that's the right type to use. Fixes: `9ee79fce36` ("IB/core: Add completion queue (cq) object actions") Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 15:05:36 -03:00
rd.dunlab@gmail.com	7c21072dde	infiniband: fix sw/rdmavt/ kernel-doc notation Add kernel-doc for missing function parameters. Remove excess kernel-doc descriptions. Fix expected kernel-doc formatting (use ':' instead of '-' after @funcarg). ../drivers/infiniband/sw/rdmavt/ah.c:138: warning: Excess function parameter 'udata' description in 'rvt_destroy_ah' ../drivers/infiniband/sw/rdmavt/vt.c:698: warning: Function parameter or member 'pkey_table' not described in 'rvt_init_port' ../drivers/infiniband/sw/rdmavt/cq.c:561: warning: Excess function parameter 'rdi' description in 'rvt_driver_cq_init' ../drivers/infiniband/sw/rdmavt/cq.c:575: warning: Excess function parameter 'rdi' description in 'rvt_cq_exit' ../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'qp' not described in 'rvt_add_rnr_timer' ../drivers/infiniband/sw/rdmavt/qp.c:2573: warning: Function parameter or member 'aeth' not described in 'rvt_add_rnr_timer' ../drivers/infiniband/sw/rdmavt/qp.c:2591: warning: Function parameter or member 'qp' not described in 'rvt_stop_rc_timers' ../drivers/infiniband/sw/rdmavt/qp.c:2624: warning: Function parameter or member 'qp' not described in 'rvt_del_timers_sync' ../drivers/infiniband/sw/rdmavt/qp.c:2697: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter_init' ../drivers/infiniband/sw/rdmavt/qp.c:2728: warning: Function parameter or member 'iter' not described in 'rvt_qp_iter_next' ../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'rdi' not described in 'rvt_qp_iter' ../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'v' not described in 'rvt_qp_iter' ../drivers/infiniband/sw/rdmavt/qp.c:2796: warning: Function parameter or member 'cb' not described in 'rvt_qp_iter' Link: https://lore.kernel.org/r/20191010035240.251184229@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com	d6537c1a9c	infiniband: fix core/ kernel-doc notation Correct function parameter names (typos or renames). Add kernel-doc notation for missing function parameters. ../drivers/infiniband/core/sa_query.c:1263: warning: Function parameter or member 'gid_attr' not described in 'ib_init_ah_attr_from_path' ../drivers/infiniband/core/sa_query.c:1263: warning: Excess function parameter 'sgid_attr' description in 'ib_init_ah_attr_from_path' ../drivers/infiniband/core/device.c:145: warning: Function parameter or member 'dev' not described in 'rdma_dev_access_netns' ../drivers/infiniband/core/device.c:145: warning: Excess function parameter 'device' description in 'rdma_dev_access_netns' ../drivers/infiniband/core/device.c:1333: warning: Function parameter or member 'name' not described in 'ib_register_device' ../drivers/infiniband/core/device.c:1461: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device' ../drivers/infiniband/core/device.c:1461: warning: Excess function parameter 'device' description in 'ib_unregister_device' ../drivers/infiniband/core/device.c:1483: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_and_put' ../drivers/infiniband/core/device.c:1550: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_queued' Link: https://lore.kernel.org/r/20191010035240.191542461@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com	b24da1a0d4	infiniband: fix ulp/iser/iser_initiator.c kernel-doc warnings Add kernel-doc notation for missing function parameters: ../drivers/infiniband/ulp/iser/iser_initiator.c:365: warning: Function parameter or member 'conn' not described in 'iser_send_command' ../drivers/infiniband/ulp/iser/iser_initiator.c:365: warning: Function parameter or member 'task' not described in 'iser_send_command' ../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'conn' not described in 'iser_send_data_out' ../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'task' not described in 'iser_send_data_out' ../drivers/infiniband/ulp/iser/iser_initiator.c:437: warning: Function parameter or member 'hdr' not described in 'iser_send_data_out' Link: https://lore.kernel.org/r/20191010035240.132033937@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com	134a42a66b	infiniband: fix ulp/iser/iser_verbs.c kernel-doc notation Various kernel-doc fixes: - fix typos - don't use /** for internal structs or functions - fix Return: kernel-doc formatting - add kernel-doc notation for missing function parameters ../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'ib_conn' not described in 'iser_alloc_fmr_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'cmds_max' not described in 'iser_alloc_fmr_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:159: warning: Function parameter or member 'size' not described in 'iser_alloc_fmr_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:221: warning: Function parameter or member 'ib_conn' not described in 'iser_free_fmr_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'ib_conn' not described in 'iser_alloc_fastreg_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'cmds_max' not described in 'iser_alloc_fastreg_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:304: warning: Function parameter or member 'size' not described in 'iser_alloc_fastreg_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:338: warning: Function parameter or member 'ib_conn' not described in 'iser_free_fastreg_pool' ../drivers/infiniband/ulp/iser/iser_verbs.c:568: warning: Function parameter or member 'iser_conn' not described in 'iser_conn_release' ../drivers/infiniband/ulp/iser/iser_verbs.c:603: warning: Function parameter or member 'iser_conn' not described in 'iser_conn_terminate' ../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'signal' not described in 'iser_post_send' ../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'ib_conn' not described in 'iser_post_send' ../drivers/infiniband/ulp/iser/iser_verbs.c:1040: warning: Function parameter or member 'tx_desc' not described in 'iser_post_send' Link: https://lore.kernel.org/r/20191010035240.070520193@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com	094c88f3c5	infiniband: fix core/verbs.c kernel-doc notation Add missing function parameter descriptions: ../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'flags' not described in '__ib_alloc_pd' ../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'caller' not described in '__ib_alloc_pd' Link: https://lore.kernel.org/r/20191010035240.011497492@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com	96f4b0b68d	infiniband: fix ulp/srpt/ib_srpt.h kernel-doc notation Fix kernel-doc warnings (typos or renames) in ib_srpt.h: ../drivers/infiniband/ulp/srpt/ib_srpt.h:419: warning: Function parameter or member 'port_guid_id' not described in 'srpt_port' ../drivers/infiniband/ulp/srpt/ib_srpt.h:419: warning: Function parameter or member 'port_gid_id' not described in 'srpt_port' Link: https://lore.kernel.org/r/20191010035239.950150496@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:52:56 -03:00
rd.dunlab@gmail.com	dfa4344da3	infiniband: fix ulp/opa_vnic/opa_vnic_internal.h kernel-doc notation Remove kernel-doc notation on 4 structs since they are internal and none of the struct fields/members are described. This removes 45 kernel-doc warnings. Link: https://lore.kernel.org/r/20191010035239.818405496@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com	28f2a6aeed	infiniband: fix ulp/iser/iscsi_iser.h kernel-doc warnings Fix kernel-doc warnings and typos/spellos. ../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'dma_addr' not described in 'iser_tx_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'cqe' not described in 'iser_tx_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'reg_wr' not described in 'iser_tx_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'send_wr' not described in 'iser_tx_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:254: warning: Function parameter or member 'inv_wr' not described in 'iser_tx_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:277: warning: Function parameter or member 'cqe' not described in 'iser_rx_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:296: warning: Function parameter or member 'rsp' not described in 'iser_login_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:339: warning: Function parameter or member 'reg_mem' not described in 'iser_reg_ops' ../drivers/infiniband/ulp/iser/iscsi_iser.h:399: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:413: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool' ../drivers/infiniband/ulp/iser/iscsi_iser.h:439: warning: Function parameter or member 'reg_cqe' not described in 'ib_conn' ../drivers/infiniband/ulp/iser/iscsi_iser.h:491: warning: Function parameter or member 'snd_w_inv' not described in 'iser_conn' This leaves 2 "member not described" warnings that I don't know how to fix: ../drivers/infiniband/ulp/iser/iscsi_iser.h:401: warning: Function parameter or member 'all_list' not described in 'iser_fr_desc' ../drivers/infiniband/ulp/iser/iscsi_iser.h:415: warning: Function parameter or member 'all_list' not described in 'iser_fr_pool' Link: https://lore.kernel.org/r/20191010035239.756365352@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com	526f2c5063	infiniband: fix core/ipwm_util.h kernel-doc warnings Fix kernel-doc warnings and expected formatting. ../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'a_sockaddr' not described in 'iwpm_compare_sockaddr' ../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'b_sockaddr' not described in 'iwpm_compare_sockaddr' ../drivers/infiniband/core/iwpm_util.h:280: warning: Function parameter or member 'iwpm_pid' not described in 'iwpm_send_hello' Link: https://lore.kernel.org/r/20191010035239.695604406@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:45:31 -03:00
rd.dunlab@gmail.com	df130f878e	infiniband: fix ulp/iser/iscsi_iser.[hc] kernel-doc notation Fix struct name in kernel-doc notation to match the struct name below it. Fix one typo (spello). Fix formatting as expected for kernel-doc notation. Fix parameter name to match the function's parameter name to eliminate a kernel-doc warning. ../drivers/infiniband/ulp/iser/iscsi_iser.c:815: warning: Function parameter or member 'non_blocking' not described in 'iscsi_iser_ep_connect' Link: https://lore.kernel.org/r/20191010035239.623888112@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:45:31 -03:00
Jason Gunthorpe	a2aca4d7f0	Merge branch 'mlx5-rd-sgl' into rdma.git for-next From Yamin Friedman: ==================== This series from Yamin implements long standing "TODO" existed in rw.c. It allows the driver to specify a cut-over point where it is faster to build a lkey MR rather than do a large SGL for RDMA READ operations. mlx5 HW gets a notable performane boost by switching to MRs. ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'mlx5-rd-sgl': (3 commits) RDMA/mlx5: Add capability for max sge to get optimized performance RDMA/rw: Support threshold for registration vs scattering to local pages net/mlx5: Expose optimal performance scatter entries capability	2019-10-22 14:27:25 -03:00
Yamin Friedman	366090564b	RDMA/mlx5: Add capability for max sge to get optimized performance Allows the IB device to provide a value of maximum scatter gather entries per RDMA READ. In certain cases it may be preferable for a device to perform UMR memory registration rather than have many scatter entries in a single RDMA READ. This provides a significant performance increase in devices capable of using different memory registration schemes based on the number of scatter gather entries. This general capability allows each device vendor to fine tune when it is better to use memory registration. Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:27:00 -03:00
Yamin Friedman	00bd1439f4	RDMA/rw: Support threshold for registration vs scattering to local pages If there are more scatter entries than the recommended limit provided by the ib device, UMR registration is used. This will provide optimal performance when performing large RDMA READs over devices that advertise the threshold capability. With ConnectX-5 running NVMeoF RDMA with FIO single QP 128KB writes: Without use of cap: 70Gb/sec With use of cap: 84Gb/sec Link: https://lore.kernel.org/r/20191007135933.12483-3-leon@kernel.org Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 14:26:52 -03:00
Bernard Metzler	cf049bb31f	RDMA/siw: Fix SQ/RQ drain logic Storage ULPs (e.g. iSER & NVMeOF) use ib_drain_qp() to drain QP/CQ. Current SIW's own drain routines do not properly wait until all SQ/RQ elements are completed and reaped from the CQ. This may cause touch after free issues. New logic relies on generic __ib_drain_sq()/__ib_drain_rq() posting a final work request, which SIW immediately flushes to CQ. Fixes: `303ae1cdfd` ("rdma/siw: application interface") Link: https://lore.kernel.org/r/20191004125356.20673-1-bmt@zurich.ibm.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-22 13:43:10 -03:00
Donald Dutile	5a0d523781	ib/srp: Add missing new line after displaying fast_io_fail_tmo param Long-time missing new-line in sysfs output. Simply add it. Signed-off-by: Donald Dutile <ddutile@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20191009164937.21989-1-ddutile@redhat.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 16:55:16 -04:00
Yangyang Li	d302c6e3a6	RDMA/hns: Release qp resources when failed to destroy qp Even if no response from hardware, we should make sure that qp related resources are released to avoid memory leaks. Fixes: `926a01dc00` ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1570584110-3659-1-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 15:45:22 -04:00
Yixing Liu	3dcad1f842	RDMA/hns: Fix a spelling mistake in a macro HNS_ROCE_ALOGN_UP should be HNS_ROCE_ALIGN_UP, this patch fix it. Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-6-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 15:29:38 -04:00
Lang Cheng	cfd82da4e7	RDMA/hns: Modify return value of restrack functions The restrack function return EINVAL instead of EMSGSIZE when the driver operation fails. Fixes: `4b42d05d0b` ("RDMA/hns: Remove unnecessary kzalloc") Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-5-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 15:29:38 -04:00
Weihang Li	32883228b9	RDMA/hns: Modify variable/field name from vlan to vlan_id The name of vlan and vlan_tag is not clear enough, it's actually means vlan id. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-4-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 15:29:38 -04:00
Weihang Li	e8a07de57e	RDMA/hns: Fix wrong parameters when initial mtt of srq->idx_que The parameters npages used to initial mtt of srq->idx_que shouldn't be same with srq's. And page_shift should be calculated from idx_buf_pg_sz. This patch fixes above issues and use field named npage and page_shift in hns_roce_buf instead of two temporary variables to let us use them anywhere. Fixes: `18df508c79` ("RDMA/hns: Remove if-else judgment statements for creating srq") Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 15:29:37 -04:00
Weihang Li	9f7d706400	RDMA/hns: remove a redundant le16_to_cpu Type of ah->av.vlan is u16, there will be a problem using le16_to_cpu on it. Fixes: `82e620d9c3` ("RDMA/hns: Modify the data structure of hns_roce_av") Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1567566885-23088-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-21 15:29:37 -04:00
Parav Pandit	777a8b32bc	IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fields Current code tries to derive VLAN ID and compares it with GID attribute for matching entry. This raw search fails on macvlan netdevice as its not a VLAN device, but its an upper device of a VLAN netdevice. Due to this limitation, incoming QP1 packets fail to match in the GID table. Such packets are dropped. Hence, to support it, use the existing rdma_read_gid_l2_fields() that takes care of diffferent device types. Fixes: `dbf727de74` ("IB/core: Use GID table in AH creation and dmac resolution") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-18 14:55:33 -04:00
Kamal Heib	b806c94ee4	RDMA/qedr: Fix reported firmware version Remove spaces from the reported firmware version string. Actual value: $ cat /sys/class/infiniband/qedr0/fw_ver 8. 37. 7. 0 Expected value: $ cat /sys/class/infiniband/qedr0/fw_ver 8.37.7.0 Fixes: `ec72fce401` ("qedr: Add support for RoCE HW init") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Link: https://lore.kernel.org/r/20191007210730.7173-1-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-18 14:50:14 -04:00
Krishnamraju Eraparaju	e17fa5c95e	RDMA/siw: free siw_base_qp in kref release routine As siw_free_qp() is the last routine to access 'siw_base_qp' structure, freeing this structure early in siw_destroy_qp() could cause touch-after-free issue. Hence, moved kfree(siw_base_qp) from siw_destroy_qp() to siw_free_qp(). Fixes: `303ae1cdfd` ("rdma/siw: application interface") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Link: https://lore.kernel.org/r/20191007104229.29412-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-18 14:49:01 -04:00
Krishnamraju Eraparaju	54102dd410	RDMA/iwcm: move iw_rem_ref() calls out of spinlock kref release routines usually perform memory release operations, hence, they should not be called with spinlocks held. one such case is: SIW kref release routine siw_free_qp(), which can sleep via vfree() while freeing queue memory. Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks. Fixes: `922a8e9fb2` ("RDMA: iWARP Connection Manager.") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-18 14:40:01 -04:00
Potnuri Bharat Teja	612e0486ad	iw_cxgb4: fix ECN check on the passive accept pass_accept_req() is using the same skb for handling accept request and sending accept reply to HW. Here req and rpl structures are pointing to same skb->data which is over written by INIT_TP_WR() and leads to accessing corrupt req fields in accept_cr() while checking for ECN flags. Reordered code in accept_cr() to fetch correct req fields. Fixes: `92e7ae7172` ("iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections") Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20191003104353.11590-1-bharat@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-18 14:35:19 -04:00
Mike Marciniszyn	22bb136534	IB/hfi1: Use a common pad buffer for 9B and 16B packets There is no reason for a different pad buffer for the two packet types. Expand the current buffer allocation to allow for both packet types. Fixes: `f8195f3b14` ("IB/hfi1: Eliminate allocation while atomic") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204934.26838.13099.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-17 16:32:25 -04:00
Kaike Wan	9ed5bd7d22	IB/hfi1: Avoid excessive retry for TID RDMA READ request A TID RDMA READ request could be retried under one of the following conditions: - The RC retry timer expires; - A later TID RDMA READ RESP packet is received before the next expected one. For the latter, under normal conditions, the PSN in IB space is used for comparison. More specifically, the IB PSN in the incoming TID RDMA READ RESP packet is compared with the last IB PSN of a given TID RDMA READ request to determine if the request should be retried. This is similar to the retry logic for noraml RDMA READ request. However, if a TID RDMA READ RESP packet is lost due to congestion, header suppresion will be disabled and each incoming packet will raise an interrupt until the hardware flow is reloaded. Under this condition, each packet KDETH PSN will be checked by software against r_next_psn and a retry will be requested if the packet KDETH PSN is later than r_next_psn. Since each TID RDMA READ segment could have up to 64 packets and each TID RDMA READ request could have many segments, we could make far more retries under such conditions, and thus leading to RETRY_EXC_ERR status. This patch fixes the issue by removing the retry when the incoming packet KDETH PSN is later than r_next_psn. Instead, it resorts to RC timer and normal IB PSN comparison for any request retry. Fixes: `9905bf06e8` ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20191004204035.26542.41684.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-17 16:31:17 -04:00
Rafi Wiener	c8973df2da	RDMA/mlx5: Clear old rate limit when closing QP Before QP is closed it changes to ERROR state, when this happens the QP was left with old rate limit that was already removed from the table. Fixes: `7d29f349a4` ("IB/mlx5: Properly adjust rate limit on QP state transitions") Signed-off-by: Rafi Wiener <rafiw@mellanox.com> Signed-off-by: Oleg Kuporosov <olegk@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002120243.16971-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-10-17 16:07:25 -04:00
Parav Pandit	03232cc43c	IB/mlx5: Introduce and use mkey context setting helper routine Introduce and use set_mkc_access_pd_addr_fields() which sets mkey context's access rights, PD, address fields. Thereby avoid the code duplication. Link: https://lore.kernel.org/r/20191006155443.31068-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-08 16:45:07 -03:00
Max Gurtovoy	3466c060ef	RDMA/iser: Use iser_err instead of pr_err for logging Make sure all the debug prints in ib_iser module use the common driver logger. Link: https://lore.kernel.org/r/1570366580-24097-1-git-send-email-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-08 16:35:52 -03:00
Devesh Sharma	39c48c5146	RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter series Broadcom's 575xx adapter series has support for SRIOV VFs. Making changes to enable SRIOV VF support. There are two major area where changes are done: - Added new DB location for control-path and data-path DB ring - New devices do not need to issue the sriov-config slow-path command thus, skipping to call that firmware command. For now enabling support for 64 RoCE VFs. Link: https://lore.kernel.org/r/1570081715-14301-1-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-08 16:22:24 -03:00
Honggang Li	b2e872f451	RDMA/srp: Calculate max_it_iu_size if remote max_it_iu length available The default maximum immediate size is too big for old srp clients, which do not support immediate data. According to the SRP and SRP-2 specifications, the IOControllerProfile attributes for SRP target ports contains the maximum initiator to target iu length. The maximum initiator to target iu length can be obtained by sending MAD packets to query subnet manager port and SRP target ports. We should calculate the max_it_iu_size instead of the default value, when remote maximum initiator to target iu length available. Link: https://lore.kernel.org/r/20190927174352.7800-2-honli@redhat.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-08 15:17:32 -03:00
Honggang Li	547ed331bb	RDMA/srp: Add parse function for maximum initiator to target IU size According to SRP specifications 'srp-r16a' and 'srp2r06', IOControllerProfile attributes for SRP target port include the maximum initiator to target IU size. SRP connection daemons, such as srp_daemon, can get the value from the subnet manager. The SRP connection daemon can pass this value to kernel. This patch adds a parse function for it. Upstream commit [1] enables the kernel parameter, 'use_imm_data', by default. [1] also use (8 * 1024) as the default value for kernel parameter 'max_imm_data'. With those default values, the maximum initiator to target IU size will be 8260. In case the SRPT modules, which include the in-tree 'ib_srpt.ko' module, do not support SRP-2 'immediate data' feature, the default maximum initiator to target IU size is significantly smaller than 8260. For 'ib_srpt.ko' module, which built from source before [2], the default maximum initiator to target IU is 2116. [1] introduces a regression issue for old srp targets with default kernel parameters, as the connection will be rejected because of a too large maximum initiator to target IU size. [1] commit `882981f4a4` ("RDMA/srp: Add support for immediate data") [2] commit `5dabcd0456` ("RDMA/srpt: Add support for immediate data") Link: https://lore.kernel.org/r/20190927174352.7800-1-honli@redhat.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Honggang Li <honli@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-08 15:17:32 -03:00
Jason Gunthorpe	0417791536	RDMA/mlx5: Add missing synchronize_srcu() for MW cases While MR uses live as the SRCU 'update', the MW case uses the xarray directly, xa_erase() causes the MW to become inaccessible to the pagefault thread. Thus whenever a MW is removed from the xarray we must synchronize_srcu() before freeing it. This must be done before freeing the mkey as re-use of the mkey while the pagefault thread is using the stale mkey is undesirable. Add the missing synchronizes to MW and DEVX indirect mkey and delete the bogus protection against double destroy in mlx5_core_destroy_mkey() Fixes: `534fd7aac5` ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Fixes: `6aec21f6a8` ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-7-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:54:22 -03:00
Jason Gunthorpe	aa603815c7	RDMA/mlx5: Put live in the correct place for ODP MRs live is used to signal to the pagefault thread that the MR is initialized and ready for use. It should be after the umem is assigned and all other setup is completed. This prevents races (at least) of the form: CPU0 CPU1 mlx5_ib_alloc_implicit_mr() implicit_mr_alloc() live = 1 imr->umem = umem num_pending_prefetch_inc() if (live) atomic_inc(num_pending_prefetch) atomic_set(num_pending_prefetch,0) // Overwrites other thread's store Further, live is being used with SRCU as the 'update' in an acquire/release fashion, so it can not be read and written raw. Move all live = 1's to after MR initialization is completed and use smp_store_release/smp_load_acquire() for manipulating it. Add a missing live = 0 when an implicit MR child is deleted, before queuing work to do synchronize_srcu(). The barriers in update_odp_mr() were some broken attempt to create a acquire/release, but were not even applied consistently and missed the point, delete it as well. Fixes: `6aec21f6a8` ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-6-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:54:22 -03:00
Jason Gunthorpe	aa116b810a	RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu During destroy setting live = 0 and then synchronize_srcu() prevents num_pending_prefetch from incrementing, and also, ensures that all work holding that count is queued on the WQ. Testing before causes races of the form: CPU0 CPU1 dereg_mr() mlx5_ib_advise_mr_prefetch() srcu_read_lock() num_pending_prefetch_inc() if (!live) live = 0 atomic_read() == 0 // skip flush_workqueue() atomic_inc() queue_work(); srcu_read_unlock() WARN_ON(atomic_read()) // Fails Swap the order so that the synchronize_srcu() prevents this. Fixes: `a6bc3875f1` ("IB/mlx5: Protect against prefetch of invalid MR") Link: https://lore.kernel.org/r/20191001153821.23621-5-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:54:22 -03:00
Jason Gunthorpe	9dc775e7f5	RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages() This fixes a race of the form: CPU0 CPU1 mlx5_ib_invalidate_range() mlx5_ib_invalidate_range() // This one actually makes npages == 0 ib_umem_odp_unmap_dma_pages() if (npages == 0 && !dying) // This one does nothing ib_umem_odp_unmap_dma_pages() if (npages == 0 && !dying) dying = 1; dying = 1; schedule_work(&umem_odp->work); // Double schedule of the same work schedule_work(&umem_odp->work); // BOOM npages and dying must be read and written under the umem_mutex lock. Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call mlx5_ib_update_xlt, and both need to be done in the same locking region, hoist the lock out of unmap. This avoids an expensive double critical section in mlx5_ib_invalidate_range(). Fixes: `81713d3788` ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:54:21 -03:00
Jason Gunthorpe	f28b1932ea	RDMA/mlx5: Fix a race with mlx5_ib_update_xlt on an implicit MR mlx5_ib_update_xlt() must be protected against parallel free of the MR it is accessing, also it must be called single threaded while updating the HW. Otherwise we can have races of the form: CPU0 CPU1 mlx5_ib_update_xlt() mlx5_odp_populate_klm() odp_lookup() == NULL pklm = ZAP implicit_mr_get_data() implicit_mr_alloc() <update interval tree> mlx5_ib_update_xlt mlx5_odp_populate_klm() odp_lookup() != NULL pklm = VALID mlx5_ib_post_send_wait() mlx5_ib_post_send_wait() // Replaces VALID with ZAP This can be solved by putting both the SRCU and the umem_mutex lock around every call to mlx5_ib_update_xlt(). This ensures that the content of the interval tree relavent to mlx5_odp_populate_klm() (ie mr->parent == mr) will not change while it is running, and thus the posted WRs to update the KLM will always reflect the correct information. The race above will resolve by either having CPU1 wait till CPU0 completes the ZAP or CPU0 will run after the add and instead store VALID. The pagefault path adding children already holds the umem_mutex and SRCU, so the only missed lock is during MR destruction. Fixes: `81713d3788` ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:54:21 -03:00
Jason Gunthorpe	880505cfef	RDMA/mlx5: Do not allow rereg of a ODP MR This code is completely broken, the umem of a ODP MR simply cannot be discarded without a lot more locking, nor can an ODP mkey be blithely destroyed via destroy_mkey(). Fixes: `6aec21f6a8` ("IB/mlx5: Page faults handling infrastructure") Link: https://lore.kernel.org/r/20191001153821.23621-2-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:54:21 -03:00
Mohamad Heib	1cbe866cbc	IB/core: Fix wrong iterating on ports rdma_for_each_port is already incrementing the iterator's value it receives therefore, after the first iteration the iterator is increased by 2 which eventually causing wrong queries and possible traces. Fix the above by removing the old redundant incrementation that was used before rdma_for_each_port() macro. Cc: <stable@vger.kernel.org> Fixes: `ea1075edcb` ("RDMA: Add and use rdma_for_each_port") Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org Signed-off-by: Mohamad Heib <mohamadh@mellanox.com> Reviewed-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:50:27 -03:00
Parav Pandit	909624d8db	IB/cm: Use container_of() instead of typecast Use container_of() macro to get to timewait info structure instead of typecasting. Link: https://lore.kernel.org/r/20191002122517.17721-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:38:16 -03:00
Erez Alfasi	6f26b2ac69	IB/mlx5: Remove unnecessary else statement 'else' is not generally useful after a break or return. Remove this unnecessary statement. Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:38:16 -03:00
Erez Alfasi	2d67c07988	IB/mlx5: Remove unnecessary return statement There is no reason to call return at the end of function which returns void. Remove this unnecessary statement. Link: https://lore.kernel.org/r/20191002122517.17721-3-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:38:16 -03:00
Leon Romanovsky	4b2a67362e	RDMA/mlx5: Group boolean parameters to take less space Clean the code to store all boolean parameters inside one variable. Link: https://lore.kernel.org/r/20191002122517.17721-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:38:16 -03:00
Bart Van Assche	9b64f7d0bb	RDMA/srpt: Postpone HCA removal until after configfs directory removal A shortcoming of the SCSI target core is that it does not have an API for removing tpg or wwn objects. Wait until these directories have been removed before allowing HCA removal to finish. See also Bart Van Assche, "Re: Why using configfs as the only interface is wrong for a storage target", 2011-02-07 (https://www.spinics.net/lists/linux-scsi/msg50248.html). This patch fixes the following kernel crash: ================================================================== BUG: KASAN: use-after-free in __configfs_open_file.isra.4+0x1a8/0x400 Read of size 8 at addr ffff88811880b690 by task restart-lio-srp/1215 CPU: 1 PID: 1215 Comm: restart-lio-srp Not tainted 5.3.0-dbg+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_address_description+0x74/0x32d __kasan_report.cold.6+0x1b/0x36 kasan_report+0x12/0x17 __asan_load8+0x54/0x90 __configfs_open_file.isra.4+0x1a8/0x400 configfs_open_file+0x13/0x20 do_dentry_open+0x2b1/0x770 vfs_open+0x58/0x60 path_openat+0x5fa/0x14b0 do_filp_open+0x115/0x180 do_sys_open+0x1d4/0x2a0 __x64_sys_openat+0x59/0x70 do_syscall_64+0x6b/0x2d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f2f2bd3fcce Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 19 d7 0d 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 RSP: 002b:00007ffd155f7850 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000564609ba88e0 RCX: 00007f2f2bd3fcce RDX: 0000000000000241 RSI: 0000564609ba8cf0 RDI: 00000000ffffff9c RBP: 00007ffd155f7950 R08: 0000000000000000 R09: 0000000000000020 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000003 R14: 0000000000000001 R15: 0000564609ba8cf0 Allocated by task 995: save_stack+0x21/0x90 __kasan_kmalloc.constprop.9+0xc7/0xd0 kasan_kmalloc+0x9/0x10 __kmalloc+0x153/0x370 srpt_add_one+0x4f/0x561 [ib_srpt] add_client_context+0x251/0x290 [ib_core] ib_register_client+0x1da/0x220 [ib_core] iblock_get_alignment_offset_lbas+0x6b/0x100 [target_core_iblock] do_one_initcall+0xcd/0x43a do_init_module+0x103/0x380 load_module+0x3b77/0x3eb0 __do_sys_finit_module+0x12d/0x1b0 __x64_sys_finit_module+0x43/0x50 do_syscall_64+0x6b/0x2d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1221: save_stack+0x21/0x90 __kasan_slab_free+0x139/0x190 kasan_slab_free+0xe/0x10 slab_free_freelist_hook+0x67/0x1e0 kfree+0xcb/0x2a0 srpt_remove_one+0x596/0x670 [ib_srpt] remove_client_context+0x9a/0xe0 [ib_core] disable_device+0x106/0x1b0 [ib_core] __ib_unregister_device+0x5f/0xf0 [ib_core] ib_unregister_driver+0x11a/0x170 [ib_core] 0xffffffffa087f666 __x64_sys_delete_module+0x1f8/0x2c0 do_syscall_64+0x6b/0x2d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff88811880b300 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 912 bytes inside of 4096-byte region [ffff88811880b300, ffff88811880c300) The buggy address belongs to the page: page:ffffea0004620200 refcount:1 mapcount:0 mapping:ffff88811ac0de00 index:0x0 compound_mapcount: 0 flags: 0x2fff000000010200(slab\|head) raw: 2fff000000010200 dead000000000100 dead000000000122 ffff88811ac0de00 raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88811880b580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88811880b600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88811880b680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88811880b700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88811880b780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Link: https://lore.kernel.org/r/20190930231707.48259-16-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:07 -03:00
Bart Van Assche	3236fd61ee	RDMA/srpt: Make the code for handling port identities more systematic Introduce a new data structure for the information about an RDMA port name. This patch does not change any functionality. Link: https://lore.kernel.org/r/20190930231707.48259-15-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:07 -03:00
Bart Van Assche	be408e65f5	RDMA/srpt: Rework the code that waits until an RDMA port is no longer in use The current implementation does not wait until srpt_release_channel() has finished and hence can trigger a use-after-free. Rework srpt_release_sport() such that it waits until srpt_release_channel() has finished. This patch fixes the following KASAN complaint: ================================================================== BUG: KASAN: use-after-free in srpt_free_ioctx.part.23+0x42/0x100 [ib_srpt] Read of size 8 at addr ffff888115c71100 by task kworker/4:3/807 CPU: 4 PID: 807 Comm: kworker/4:3 Not tainted 5.3.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Workqueue: events srpt_release_channel_work [ib_srpt] Call Trace: dump_stack+0x86/0xca print_address_description+0x74/0x32d __kasan_report.cold.6+0x1b/0x36 kasan_report+0x12/0x17 __asan_load8+0x54/0x90 srpt_free_ioctx.part.23+0x42/0x100 [ib_srpt] srpt_free_ioctx_ring.part.24+0x50/0x80 [ib_srpt] srpt_release_channel_work+0x2ad/0x390 [ib_srpt] process_one_work+0x51a/0xa60 worker_thread+0x67/0x5b0 kthread+0x1dc/0x200 ret_from_fork+0x24/0x30 Allocated by task 984: save_stack+0x21/0x90 __kasan_kmalloc.constprop.9+0xc7/0xd0 kasan_kmalloc+0x9/0x10 __kmalloc+0x153/0x370 srpt_add_one+0x4f/0x570 [ib_srpt] add_client_context+0x251/0x290 [ib_core] ib_register_client+0x1da/0x220 [ib_core] iblock_get_alignment_offset_lbas+0x6b/0x100 [target_core_iblock] do_one_initcall+0xcd/0x43a do_init_module+0x103/0x380 load_module+0x3b77/0x3eb0 __do_sys_finit_module+0x12d/0x1b0 __x64_sys_finit_module+0x43/0x50 do_syscall_64+0x6b/0x2d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1128: save_stack+0x21/0x90 __kasan_slab_free+0x139/0x190 kasan_slab_free+0xe/0x10 slab_free_freelist_hook+0x67/0x1e0 kfree+0xcb/0x2a0 srpt_remove_one+0x569/0x5b0 [ib_srpt] remove_client_context+0x9a/0xe0 [ib_core] disable_device+0x106/0x1b0 [ib_core] __ib_unregister_device+0x5f/0xf0 [ib_core] ib_unregister_device_and_put+0x48/0x60 [ib_core] nldev_dellink+0x120/0x180 [ib_core] rdma_nl_rcv+0x287/0x480 [ib_core] netlink_unicast+0x2cc/0x370 netlink_sendmsg+0x3b1/0x630 __sys_sendto+0x1db/0x290 __x64_sys_sendto+0x80/0xa0 do_syscall_64+0x6b/0x2d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888115c71100 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 0 bytes inside of 4096-byte region [ffff888115c71100, ffff888115c72100) The buggy address belongs to the page: page:ffffea0004571c00 refcount:1 mapcount:0 mapping:ffff88811ac0de00 index:0xffff888115c70000 compound_mapcount: 0 flags: 0x2fff000000010200(slab\|head) raw: 2fff000000010200 ffffea00045ac408 ffffea0004593208 ffff88811ac0de00 raw: ffff888115c70000 0000000000070002 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888115c71000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888115c71080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888115c71100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888115c71180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888115c71200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Link: https://lore.kernel.org/r/20190930231707.48259-14-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:07 -03:00
Bart Van Assche	6eaed91c67	RDMA/srpt: Rework the approach for closing an RDMA channel Instead of relying on a waitqueue, report when the identity of an RDMA channel can be reused through a completion. Link: https://lore.kernel.org/r/20190930231707.48259-13-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:07 -03:00
Bart Van Assche	b5948cfdde	RDMA/srpt: Improve a debug message The ib_srpt driver uses two different identifiers while registering a session with the LIO core. Report both identifiers if the modified pr_debug() statement is enabled. Link: https://lore.kernel.org/r/20190930231707.48259-12-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:07 -03:00
Bart Van Assche	cbca2442a0	RDMA/srpt: Fix handling of iWARP logins The path_rec pointer is NULL set for IB and RoCE logins but not for iWARP logins. Hence check the path_rec pointer before dereferencing it. Link: https://lore.kernel.org/r/20190930231707.48259-11-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:06 -03:00
Bart Van Assche	09f8a1486d	RDMA/srpt: Fix handling of SR-IOV and iWARP ports Management datagrams (MADs) are not supported by SR-IOV VFs nor by iWARP ports. Support SR-IOV VFs and iWARP ports by only logging an error message if MAD handler registration fails. Link: https://lore.kernel.org/r/20190930231707.48259-10-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:06 -03:00
Bart Van Assche	fdbcf5c026	RDMA/srp: Make route resolving error messages more informative The IPv6 scope ID is essential when setting up an iWARP connection between IPv6 link-local addresses. Report the scope ID in error messages. Link: https://lore.kernel.org/r/20190930231707.48259-9-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:06 -03:00
Bart Van Assche	bf58347061	RDMA/srp: Honor the max_send_sge device attribute Instead of assuming that max_send_sge >= 3, restrict the number of scatter gather elements to what is supported by the RDMA adapter. Link: https://lore.kernel.org/r/20190930231707.48259-8-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:06 -03:00
Bart Van Assche	14673778d0	RDMA/srp: Remove two casts This patch does not change any functionality. Link: https://lore.kernel.org/r/20190930231707.48259-7-bvanassche@acm.org Cc: Honggang LI <honli@redhat.com> Cc: Laurence Oberman <loberman@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:05 -03:00
Bart Van Assche	934f05b05d	RDMA/siw: Make node GUIDs valid EUI-64 identifiers >From the IBTA: "GUID (Global Unique Identifier): A globally unique EUI-64 compliant identifier." Make sure that siw GUIDs are valid EUI-64 identifiers. Link: https://lore.kernel.org/r/20190930231707.48259-6-bvanassche@acm.org Cc: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:35:05 -03:00
Leon Romanovsky	594e6c5d41	RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path Properly unwind QP counter rebinding in case of failure. Trying to rebind the counter after unbiding it is not going to work reliably, move the unbind to the end so it doesn't have to be unwound. Fixes: `b389327df9` ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:29:55 -03:00
Greg KH	3840c5b788	RDMA/cxgb4: Do not dma memory off of the stack Nicolas pointed out that the cxgb4 driver is doing dma off of the stack, which is generally considered a very bad thing. On some architectures it could be a security problem, but odds are none of them actually run this driver, so it's just a "normal" bug. Resolve this by allocating the memory for a message off of the heap instead of the stack. kmalloc() always will give us a proper memory location that DMA will work correctly from. Link: https://lore.kernel.org/r/20191001165611.GA3542072@kroah.com Reported-by: Nicolas Waisman <nico@semmle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:13:27 -03:00
Potnuri Bharat Teja	30e0f6cf5a	RDMA/iw_cxgb3: Remove the iw_cxgb3 module from kernel Remove iw_cxgb3 module from kernel as the corresponding HW Chelsio T3 has reached EOL. Link: https://lore.kernel.org/r/20190930074252.20133-1-bharat@chelsio.com Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 15:08:59 -03:00
Jack Morgenstein	94635c36f3	RDMA/cm: Fix memory leak in cm_add/remove_one In the process of moving the debug counters sysfs entries, the commit mentioned below eliminated the cm_infiniband sysfs directory. This sysfs directory was tied to the cm_port object allocated in procedure cm_add_one(). Before the commit below, this cm_port object was freed via a call to kobject_put(port->kobj) in procedure cm_remove_port_fs(). Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated. This, however, meant that kfree was never called for the cm_port buffers. Fix this by adding explicit kfree(port) calls to functions cm_add_one() and cm_remove_one(). Note: the kfree call in the first chunk below (in the cm_add_one error flow) fixes an old, undetected memory leak. Fixes: `c87e65cfb9` ("RDMA/cm: Move debug counters to be under relevant IB device") Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 14:58:31 -03:00
Christophe JAILLET	ab59ca3eb4	RDMA/core: Fix an error handling path in 'res_get_common_doit()' According to surrounding error paths, it is likely that 'goto err_get;' is expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be missing. Fixes: `c5dfe0ea6f` ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 14:55:18 -03:00
Shiraz, Saleem	ee4e4040ab	RDMA/i40iw: Associate ibdev to netdev before IB device registration i40iw IB device registration fails with ENODEV. ib_register_device setup_device/setup_port_data i40iw_port_immutable ib_query_port iw_query_port ib_device_get_netdev(ENODEV) ib_device_get_netdev() does not have a netdev associated with the ibdev and thus fails. Use ib_device_set_netdev() to associate netdev to ibdev in i40iw before IB device registration. Fixes: `4929116bdf` ("RDMA/core: Add common iWARP query port") Link: https://lore.kernel.org/r/20190925164524.856-1-shiraz.saleem@intel.com Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Reviewed-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-04 14:29:14 -03:00
Kamal Heib	f3fceba5da	RDMA/rxe: Verify modify_device mask Verify that the passed mask to rxe_modify_device() is supported. Link: https://lore.kernel.org/r/20190923104158.5331-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 13:06:10 -03:00
Kamal Heib	39ce85f3b1	RDMA/bnxt_re: Remove unsupported modify_device callback There is no need to return always zero for function which is not supported. Link: https://lore.kernel.org/r/20190923104158.5331-3-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 13:06:10 -03:00
Kamal Heib	d0f3ef36bf	RDMA/core: Fix return code when modify_device isn't supported The proper return code is "-EOPNOTSUPP" when modify_device callback is not supported. Link: https://lore.kernel.org/r/20190923104158.5331-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 13:06:10 -03:00
Bart Van Assche	050dbddf24	RDMA/siw: Fix port number endianness in a debug message sin_port and sin6_port are big endian member variables. Convert these port numbers into CPU endianness before printing. Link: https://lore.kernel.org/r/20190930231707.48259-5-bvanassche@acm.org Fixes: `6c52fdc244` ("rdma/siw: connection management") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 12:18:18 -03:00
Bart Van Assche	23c1c13cdd	RDMA/siw: Simplify several debug messages Do not print the remote address if it is not used. Use %pISp instead of %pI4 %d. Link: https://lore.kernel.org/r/20190930231707.48259-4-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 12:18:17 -03:00
Bart Van Assche	b66f31efbd	RDMA/iwcm: Fix a lock inversion issue This patch fixes the lock inversion complaint: ============================================ WARNING: possible recursive locking detected 5.3.0-rc7-dbg+ #1 Not tainted -------------------------------------------- kworker/u16:6/171 is trying to acquire lock: 00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm] but task is already holding lock: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&id_priv->handler_mutex); lock(&id_priv->handler_mutex); * DEADLOCK * May be due to missing lock nesting notation 3 locks held by kworker/u16:6/171: #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm] stack backtrace: CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x8a/0xd6 __lock_acquire.cold+0xe1/0x24d lock_acquire+0x106/0x240 __mutex_lock+0x12e/0xcb0 mutex_lock_nested+0x1f/0x30 rdma_destroy_id+0x78/0x4a0 [rdma_cm] iw_conn_req_handler+0x5c9/0x680 [rdma_cm] cm_work_handler+0xe62/0x1100 [iw_cm] process_one_work+0x56d/0xac0 worker_thread+0x7a/0x5d0 kthread+0x1bc/0x210 ret_from_fork+0x24/0x30 This is not a bug as there are actually two lock classes here. Link: https://lore.kernel.org/r/20190930231707.48259-3-bvanassche@acm.org Fixes: `de910bd921` ("RDMA/cma: Simplify locking needed for serialization of callbacks") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 12:11:50 -03:00
Potnuri Bharat Teja	91724c1e5a	RDMA/iw_cxgb4: fix SRQ access from dump_qp() dump_qp() is wrongly trying to dump SRQ structures as QP when SRQ is used by the application. This patch matches the QPID before dumping them. Also removes unwanted SRQ id addition to QP id xarray. Fixes: `2f43129127` ("cxgb4: Convert qpidr to XArray") Link: https://lore.kernel.org/r/20190930074119.20046-1-bharat@chelsio.com Signed-off-by: Rahul Kundu <rahul.kundu@chelsio.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 11:48:10 -03:00
Navid Emamdoost	34b3be18a0	RDMA/hfi1: Prevent memory leak in sdma_init In sdma_init if rhashtable_init fails the allocated memory for tmp_sdma_rht should be released. Fixes: `5a52a7acf7` ("IB/hfi1: NULL pointer dereference when freeing rhashtable") Link: https://lore.kernel.org/r/20190925144543.10141-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 11:34:55 -03:00
Michal Kalderon	390d3fdcae	RDMA/core: Fix use after free and refcnt leak on ndev in_device in iwarp_query_port If an iWARP driver is probed and removed while there are no ips set for the device, it will lead to a reference count leak on the inet device of the netdevice. In addition, the netdevice was accessed after already calling netdev_put, which could lead to using the netdev after already freed. Fixes: `4929116bdf` ("RDMA/core: Add common iWARP query port") Link: https://lore.kernel.org/r/20190925123332.10746-1-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 11:31:27 -03:00
Max Gurtovoy	6eeff06db9	IB/iser: remove redundant macro definitions Use the general linux definition for 4K and retrieve the rest from it. Link: https://lore.kernel.org/r/1569359148-12312-1-git-send-email-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 11:27:13 -03:00
Max Gurtovoy	7718cf03c3	IB/iser: bound protection_sg size by data_sg size In case we don't set the sg_prot_tablesize, the scsi layer assign the default size (65535 entries). We should limit this size since we should take into consideration the underlaying device capability. This cap is considered when calculating the sg_tablesize. Otherwise, for example, we can get that /sys/block/sdb/queue/max_segments is 128 and /sys/block/sdb/queue/max_integrity_segments is 65535. Link: https://lore.kernel.org/r/1569359027-10987-1-git-send-email-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 11:24:24 -03:00
Max Gurtovoy	70bcc63f84	IB/iser: add unlikely checks in the fast path ib_post_send, ib_post_recv and ib_dma_map_sg operations should succeed unless something unusual happened to the ib device. Link: https://lore.kernel.org/r/1569274369-29217-1-git-send-email-maxg@mellanox.com Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 11:23:22 -03:00
Krishnamraju Eraparaju	df791c54d6	RDMA/siw: Fix serialization issue in write_space() In siw_qp_llp_write_space(), 'sock' members should be accessed with sk_callback_lock held, otherwise, it could race with siw_sk_restore_upcalls(). And this could cause "NULL deref" panic. Below panic is due to the NULL cep returned from sk_to_cep(sk): Call Trace: <IRQ> siw_qp_llp_write_space+0x11/0x40 [siw] tcp_check_space+0x4c/0xf0 tcp_rcv_established+0x52b/0x630 tcp_v4_do_rcv+0xf4/0x1e0 tcp_v4_rcv+0x9b8/0xab0 ip_protocol_deliver_rcu+0x2c/0x1c0 ip_local_deliver_finish+0x44/0x50 ip_local_deliver+0x6b/0xf0 ? ip_protocol_deliver_rcu+0x1c0/0x1c0 ip_rcv+0x52/0xd0 ? ip_rcv_finish_core.isra.14+0x390/0x390 __netif_receive_skb_one_core+0x83/0xa0 netif_receive_skb_internal+0x73/0xb0 napi_gro_frags+0x1ff/0x2b0 t4_ethrx_handler+0x4a7/0x740 [cxgb4] process_responses+0x2c9/0x590 [cxgb4] ? t4_sge_intr_msix+0x1d/0x30 [cxgb4] ? handle_irq_event_percpu+0x51/0x70 ? handle_irq_event+0x41/0x60 ? handle_edge_irq+0x97/0x1a0 napi_rx_handler+0x14/0xe0 [cxgb4] net_rx_action+0x2af/0x410 __do_softirq+0xda/0x2a8 do_softirq_own_stack+0x2a/0x40 </IRQ> do_softirq+0x50/0x60 __local_bh_enable_ip+0x50/0x60 ip_finish_output2+0x18f/0x520 ip_output+0x6e/0xf0 ? __ip_finish_output+0x1f0/0x1f0 __ip_queue_xmit+0x14f/0x3d0 ? __slab_alloc+0x4b/0x58 __tcp_transmit_skb+0x57d/0xa60 tcp_write_xmit+0x23b/0xfd0 __tcp_push_pending_frames+0x2e/0xf0 tcp_sendmsg_locked+0x939/0xd50 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x57/0x80 siw_tx_hdt+0x894/0xb20 [siw] ? find_busiest_group+0x3e/0x5b0 ? common_interrupt+0xa/0xf ? common_interrupt+0xa/0xf ? common_interrupt+0xa/0xf siw_qp_sq_process+0xf1/0xe60 [siw] ? __wake_up_common_lock+0x87/0xc0 siw_sq_resume+0x33/0xe0 [siw] siw_run_sq+0xac/0x190 [siw] ? remove_wait_queue+0x60/0x60 kthread+0xf8/0x130 ? siw_sq_resume+0xe0/0xe0 [siw] ? kthread_bind+0x10/0x10 ret_from_fork+0x35/0x40 Fixes: `f29dd55b02` ("rdma/siw: queue pair methods") Link: https://lore.kernel.org/r/20190923101112.32685-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 10:55:27 -03:00
Adit Ranadive	18545e8b68	RDMA/vmw_pvrdma: Free SRQ only once An extra kfree cleanup was missed since these are now deallocated by core. Link: https://lore.kernel.org/r/1568848066-12449-1-git-send-email-aditr@vmware.com Cc: <stable@vger.kernel.org> Fixes: `68e326dea1` ("RDMA: Handle SRQ allocations by IB/core") Signed-off-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 10:47:58 -03:00
Mark Zhang	663912a637	RDMA/counter: Prevent QP counter manual binding in auto mode If auto mode is configured, manual counter allocation and QP bind is not allowed. Fixes: `1bd8e0a9d0` ("RDMA/counter: Allow manual mode configuration support") Link: https://lore.kernel.org/r/20190916071154.20383-3-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-10-01 10:30:16 -03:00
Linus Torvalds	02dc96ef6c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Sanity check URB networking device parameters to avoid divide by zero, from Oliver Neukum. 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6 don't work properly. Longer term this needs a better fix tho. From Vijay Khemka. 3) Small fixes to selftests (use ping when ping6 is not present, etc.) from David Ahern. 4) Bring back rt_uses_gateway member of struct rtable, it's semantics were not well understood and trying to remove it broke things. From David Ahern. 5) Move usbnet snaity checking, ignore endpoints with invalid wMaxPacketSize. From Bjørn Mork. 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan. 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel, Alex Vesker, and Yevgeny Kliteynik 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron. 9) Fix crash when removing sch_cbs entry while offloading is enabled, from Vinicius Costa Gomes. 10) Signedness bug fixes, generally in looking at the result given by of_get_phy_mode() and friends. From Dan Crapenter. 11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet. 12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern. 13) Fix quantization code in tcp_bbr, from Kevin Yang. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits) net: tap: clean up an indentation issue nfp: abm: fix memory leak in nfp_abm_u32_knode_replace tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state sk_buff: drop all skb extensions on free and skb scrubbing tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions Documentation: Clarify trap's description mlxsw: spectrum: Clear VLAN filters during port initialization net: ena: clean up indentation issue NFC: st95hf: clean up indentation issue net: phy: micrel: add Asym Pause workaround for KSZ9021 net: socionext: ave: Avoid using netdev_err() before calling register_netdev() ptp: correctly disable flags on old ioctls lib: dimlib: fix help text typos net: dsa: microchip: Always set regmap stride to 1 nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled net: sched: sch_sfb: don't call qdisc_put() while holding tree lock ...	2019-09-28 17:47:33 -07:00
Denis Efremov	7b0b692594	IB/hfi1: remove unlikely() from IS_ERR*() condition "unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-8-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Joe Perches <joe@perches.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-26 10:10:30 -07:00
Linus Torvalds	9c9fa97a8e	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few hot fixes - ocfs2 updates - almost all of -mm (slab-generic, slab, slub, kmemleak, kasan, cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug, sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy, oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap, zsmalloc) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (132 commits) mm/zsmalloc.c: fix a -Wunused-function warning zswap: do not map same object twice zswap: use movable memory if zpool support allocate movable memory zpool: add malloc_support_movable to zpool_driver shmem: fix obsolete comment in shmem_getpage_gfp() mm/madvise: reduce code duplication in error handling paths mm: mmap: increase sockets maximum memory size pgoff for 32bits mm/mmap.c: refine find_vma_prev() with rb_last() riscv: make mmap allocation top-down by default mips: use generic mmap top-down layout and brk randomization mips: replace arch specific way to determine 32bit task with generic version mips: adjust brk randomization offset to fit generic version mips: use STACK_TOP when computing mmap base address mips: properly account for stack randomization and stack guard gap arm: use generic mmap top-down layout and brk randomization arm: use STACK_TOP when computing mmap base address arm: properly account for stack randomization and stack guard gap arm64, mm: make randomization selected by generic topdown mmap layout arm64, mm: move generic mmap layout functions to mm arm64: consider stack randomization for mmap base only when necessary ...	2019-09-24 16:10:23 -07:00
akpm@linux-foundation.org	2d15eb31b5	mm/gup: add make_dirty arg to put_user_pages_dirty_lock() [11~From: John Hubbard <jhubbard@nvidia.com> Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock() Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()", v3. There are about 50+ patches in my tree [2], and I'll be sending out the remaining ones in a few more groups: * The block/bio related changes (Jerome mostly wrote those, but I've had to move stuff around extensively, and add a little code) * mm/ changes * other subsystem patches * an RFC that shows the current state of the tracking patch set. That can only be applied after all call sites are converted, but it's good to get an early look at it. This is part a tree-wide conversion, as described in `fc1d8e7cca` ("mm: introduce put_user_page(), placeholder versions"). This patch (of 3): Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following: 1. Lots of call sites become simpler if a bool is passed into put_user_page(), instead of making the call site choose which put_user_page() variant to call. 2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page() calling chain. This leads to the following API choices: * put_user_pages_dirty_lock(page, npages, make_dirty) * There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required. [jhubbard@nvidia.com: remove unused variable in siw_free_plist()] Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-09-24 15:54:08 -07:00
Linus Torvalds	299d14d4c3	pci-v5.4-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl2JNVAUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vyTOA/9EZeyS7J+ZcOwihWz5vNijf0kfpKp /jZ9VF9nHjsL9Pw3/Fzha605Ssrtwcqge8g/sze9f0g/pxZk99lLHokE6dEOurEA GyKpNNMdiBol4YZMCsSoYji0MpwW0uMCuASPMiEwv2LxZ72A2Tu1RbgYLU+n4m1T fQldDTxsUMXc/OH/8SL8QDEh6o8qyDRhmSXFAOv8RGqN8N3iUwVwhQobKpwpmEvx ddzqWMS8f91qkhIKO7fgc9P4NI/7yI7kkF+wcdwtfiMO8Qkr4IdcdF7qwNVAtpKA A+sMRi59i2XxDTqRFx+wXXMa+rt+Pf1pucv77SO74xXWwpuXSxLVDYjULP1YQugK FTBo4SNmico/ts+n5cgm+CGMq2P2E29VYeqkI1Un6eDDvQnQlBgQdpdcBoadJ0rW y31OInjhRJC1ZK5bATKfCMbmB+VQxFsbyeUA7PBlrALyAmXZfw30iNxX9iHBhWqc myPNVEJJGp0cWTxGxMAU9MhelzeQxDAd+Eb44J5gv51bx0w9yqmZHECSDrOVdtYi HpOyI7E3Cb8m23BOHvCdB/v8igaYMZl08LUUJqu1S9mFclYyYVuOOIB04Yc2Qrx1 3PHtT8TC47FbWuzKwo12RflzoAiNShJGw+tNKo6T1jC+r5jdbKWWtTnsoRqbSfaG rG5RJpB7EuQSP1Y= =/xB3 -----END PGP SIGNATURE----- Merge tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Consolidate _HPP/_HPX stuff in pci-acpi.c and simplify it (Krzysztof Wilczynski) - Fix incorrect PCIe device types and remove dev->has_secondary_link to simplify code that deals with upstream/downstream ports (Mika Westerberg) - After suspend, restore Resizable BAR size bits correctly for 1MB BARs (Sumit Saxena) - Enable PCI_MSI_IRQ_DOMAIN support for RISC-V (Wesley Terpstra) Virtualization: - Add ACS quirks for iProc PAXB (Abhinav Ratna), Amazon Annapurna Labs (Ali Saidi) - Move sysfs SR-IOV functions to iov.c (Kelsey Skunberg) - Remove group write permissions from sysfs sriov_numvfs, sriov_drivers_autoprobe (Kelsey Skunberg) Hotplug: - Simplify pciehp indicator control (Denis Efremov) Peer-to-peer DMA: - Allow P2P DMA between root ports for whitelisted bridges (Logan Gunthorpe) - Whitelist some Intel host bridges for P2P DMA (Logan Gunthorpe) - DMA map P2P DMA requests that traverse host bridge (Logan Gunthorpe) Amazon Annapurna Labs host bridge driver: - Add DT binding and controller driver (Jonathan Chocron) Hyper-V host bridge driver: - Fix hv_pci_dev->pci_slot use-after-free (Dexuan Cui) - Fix PCI domain number collisions (Haiyang Zhang) - Use instance ID bytes 4 & 5 as PCI domain numbers (Haiyang Zhang) - Fix build errors on non-SYSFS config (Randy Dunlap) i.MX6 host bridge driver: - Limit DBI register length (Stefan Agner) Intel VMD host bridge driver: - Fix config addressing issues (Jon Derrick) Layerscape host bridge driver: - Add bar_fixed_64bit property to endpoint driver (Xiaowei Bao) - Add CONFIG_PCI_LAYERSCAPE_EP to build EP/RC drivers separately (Xiaowei Bao) Mediatek host bridge driver: - Add MT7629 controller support (Jianjun Wang) Mobiveil host bridge driver: - Fix CPU base address setup (Hou Zhiqiang) - Make "num-lanes" property optional (Hou Zhiqiang) Tegra host bridge driver: - Fix OF node reference leak (Nishka Dasgupta) - Disable MSI for root ports to work around design problem (Vidya Sagar) - Add Tegra194 DT binding and controller support (Vidya Sagar) - Add support for sideband pins and slot regulators (Vidya Sagar) - Add PIPE2UPHY support (Vidya Sagar) Misc: - Remove unused pci_block_cfg_access() et al (Kelsey Skunberg) - Unexport pci_bus_get(), etc (Kelsey Skunberg) - Hide PM, VC, link speed, ATS, ECRC, PTM constants and interfaces in the PCI core (Kelsey Skunberg) - Clean up sysfs DEVICE_ATTR() usage (Kelsey Skunberg) - Mark expected switch fall-through (Gustavo A. R. Silva) - Propagate errors for optional regulators and PHYs (Thierry Reding) - Fix kernel command line resource_alignment parameter issues (Logan Gunthorpe)" * tag 'pci-v5.4-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (112 commits) PCI: Add pci_irq_vector() and other stubs when !CONFIG_PCI arm64: tegra: Add PCIe slot supply information in p2972-0000 platform arm64: tegra: Add configuration for PCIe C5 sideband signals PCI: tegra: Add support to enable slot regulators PCI: tegra: Add support to configure sideband pins PCI: vmd: Fix shadow offsets to reflect spec changes PCI: vmd: Fix config addressing when using bus offsets PCI: dwc: Add validation that PCIe core is set to correct mode PCI: dwc: al: Add Amazon Annapurna Labs PCIe controller driver dt-bindings: PCI: Add Amazon's Annapurna Labs PCIe host bridge binding PCI: Add quirk to disable MSI-X support for Amazon's Annapurna Labs Root Port PCI/VPD: Prevent VPD access for Amazon's Annapurna Labs Root Port PCI: Add ACS quirk for Amazon Annapurna Labs root ports PCI: Add Amazon's Annapurna Labs vendor ID MAINTAINERS: Add PCI native host/endpoint controllers designated reviewer PCI: hv: Use bytes 4 and 5 from instance ID as the PCI domain numbers dt-bindings: PCI: tegra: Add PCIe slot supplies regulator entries dt-bindings: PCI: tegra: Add sideband pins configuration entries PCI: tegra: Add Tegra194 PCIe support PCI: Get rid of dev->has_secondary_link flag ...	2019-09-23 19:16:01 -07:00
Linus Torvalds	018c6837f3	RDMA subsystem updates for 5.4 This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl2A1ugACgkQOG33FX4g mxp+EQ//Ug8CyyDs40SGZztItoGghuQ4TVA2pXtuZ9LkFVJRWhYPJGOadHIYXqGO KQJJpZPQ02HTZUPWBZNKmD5bwHfErm4cS73/mVmUpximnUqu/UsLRJp8SIGmBg1D W1Lz1BJX24MdV8aUnChYvdL5Hbl52q+BaE99Z0haOvW7I3YnKJC34mR8m/A5MiRf rsNIZNPHdq2U6pKLgCbOyXib8yBcJQqBb8x4WNAoB1h4MOx+ir5QLfr3ozrZs1an xXgqyiOBmtkUgCMIpXC4juPN/6gw3Y5nkk2VIWY+MAY1a7jZPbI+6LJZZ1Uk8R44 Lf2KSzabFMMYGKJYE1Znxk+JWV8iE+m+n6wWEfRM9l0b4gXXbrtKgaouFbsLcsQA CvBEQuiFSO9Kq01JPaAN1XDmhqyTarG6lHjXnW7ifNlLXnPbR1RJlprycExOhp0c axum5K2fRNW2+uZJt+zynMjk2kyjT1lnlsr1Rbgc4Pyionaiydg7zwpiac7y/bdS F7/WqdmPiff78KMi187EF5pjFqMWhthvBtTbWDuuxaxc2nrXSdiCUN+96j1amQUH yU/7AZzlnKeKEQQCR4xddsSs2eTrXiLLFRLk9GCK2eh4cUN212eHTrPLKkQf1cN+ ydYbR2pxw3B38LCCNBy+bL+u7e/Tyehs4ynATMpBuEgc5iocTwE= =zHXW -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull RDMA subsystem updates from Jason Gunthorpe: "This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA: Fix double-free in srq creation error flow RDMA/efa: Fix incorrect error print IB/mlx5: Free mpi in mp_slave mode IB/mlx5: Use the original address for the page during free_pages RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" RDMA/hns: Package operations of rq inline buffer into separate functions RDMA/hns: Optimize cmd init and mode selection for hip08 IB/hfi1: Define variables as unsigned long to fix KASAN warning IB/{rdmavt, hfi1, qib}: Add a counter for credit waits IB/hfi1: Add traces for TID RDMA READ RDMA/siw: Relax from kmap_atomic() use in TX path IB/iser: Support up to 16MB data transfer in a single command RDMA/siw: Fix page address mapping in TX path RDMA: Fix goto target to release the allocated memory RDMA/usnic: Avoid overly large buffers on stack RDMA/odp: Add missing cast for 32 bit RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Documentation/infiniband: update name of some functions RDMA/cma: Fix false error message RDMA/hns: Fix wrong assignment of qp_access_flags ...	2019-09-21 10:26:24 -07:00
Linus Torvalds	84da111de0	hmm related patches for 5.4 This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl1/nnkACgkQOG33FX4g mxqaRg//c6FqowV1pQlLutvAOAgMdpzfZ9eaaDKngy9RVQxz+k/MmJrdRH/p/mMA Pq93A1XfwtraGKErHegFXGEDk4XhOustVAVFwvjyXO41dTUdoFVUkti6ftbrl/rS 6CT+X90jlvrwdRY7QBeuo7lxx7z8Qkqbk1O1kc1IOracjKfNJS+y6LTamy6weM3g tIMHI65PkxpRzN36DV9uCN5dMwFzJ73DWHp1b0acnDIigkl6u5zp6orAJVWRjyQX nmEd3/IOvdxaubAoAvboNS5CyVb4yS9xshWWMbH6AulKJv3Glca1Aa7QuSpBoN8v wy4c9+umzqRgzgUJUe1xwN9P49oBNhJpgBSu8MUlgBA4IOc3rDl/Tw0b5KCFVfkH yHkp8n6MP8VsRrzXTC6Kx0vdjIkAO8SUeylVJczAcVSyHIo6/JUJCVDeFLSTVymh EGWJ7zX2iRhUbssJ6/izQTTQyCH3YIyZ5QtqByWuX2U7ZrfkqS3/EnBW1Q+j+gPF Z2yW8iT6k0iENw6s8psE9czexuywa/Lttz94IyNlOQ8rJTiQqB9wLaAvg9hvUk7a kuspL+JGIZkrL3ouCeO/VA6xnaP+Q7nR8geWBRb8zKGHmtWrb5Gwmt6t+vTnCC2l olIDebrnnxwfBQhEJ5219W+M1pBpjiTpqK/UdBd92A4+sOOhOD0= =FRGg -----END PGP SIGNATURE----- Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...	2019-09-21 10:07:42 -07:00
David Ahern	77d5bc7e6a	ipv4: Revert removal of rt_uses_gateway Julian noted that rt_uses_gateway has a more subtle use than 'is gateway set': https://lore.kernel.org/netdev/alpine.LFD.2.21.1909151104060.2546@ja.home.ssi.bg/ Revert that part of the commit referenced in the Fixes tag. Currently, there are no u8 holes in 'struct rtable'. There is a 4-byte hole in the second cacheline which contains the gateway declaration. So move rt_gw_family down to the gateway declarations since they are always used together, and then re-use that u8 for rt_uses_gateway. End result is that rtable size is unchanged. Fixes: `1550c17193` ("ipv4: Prepare rtable for IPv6 gateway") Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-09-20 18:23:33 -07:00
Linus Torvalds	53e5e7a7a7	Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs namei updates from Al Viro: "Pathwalk-related stuff" [ Audit-related cleanups, misc simplifications, and easier to follow nd->root refcounts - Linus ] * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: devpts_pty_kill(): don't bother with d_delete() infiniband: don't bother with d_delete() hypfs: don't bother with d_delete() fs/namei.c: keep track of nd->root refcount status fs/namei.c: new helper - legitimize_root() kill the last users of user_{path,lpath,path_dir}() namei.h: get the comments on LOOKUP_... in sync with reality kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h audit_inode(): switch to passing AUDIT_INODE_... filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there filename_lookup(): audit_inode() argument is always 0	2019-09-18 13:03:01 -07:00
Linus Torvalds	81160dda9a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from David Miller: 1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski. 2) Use bio_vec in the networking instead of custom skb_frag_t, from Matthew Wilcox. 3) Make use of xmit_more in r8169 driver, from Heiner Kallweit. 4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen. 5) Support all variants of 5750X bnxt_en chips, from Michael Chan. 6) More RTNL avoidance work in the core and mlx5 driver, from Vlad Buslov. 7) Add TCP syn cookies bpf helper, from Petar Penkov. 8) Add 'nettest' to selftests and use it, from David Ahern. 9) Add extack support to drop_monitor, add packet alert mode and support for HW drops, from Ido Schimmel. 10) Add VLAN offload to stmmac, from Jose Abreu. 11) Lots of devm_platform_ioremap_resource() conversions, from YueHaibing. 12) Add IONIC driver, from Shannon Nelson. 13) Several kTLS cleanups, from Jakub Kicinski. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits) mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer mlxsw: spectrum: Register CPU port with devlink mlxsw: spectrum_buffers: Prevent changing CPU port's configuration net: ena: fix incorrect update of intr_delay_resolution net: ena: fix retrieval of nonadaptive interrupt moderation intervals net: ena: fix update of interrupt moderation register net: ena: remove all old adaptive rx interrupt moderation code from ena_com net: ena: remove ena_restore_ethtool_params() and relevant fields net: ena: remove old adaptive interrupt moderation code from ena_netdev net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*() net: ena: enable the interrupt_moderation in driver_supported_features net: ena: reimplement set/get_coalesce() net: ena: switch to dim algorithm for rx adaptive interrupt moderation net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable ethtool: implement Energy Detect Powerdown support via phy-tunable xen-netfront: do not assume sk_buff_head list is empty in error handling s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb” net: ena: don't wake up tx queue when down drop_monitor: Better sanitize notified packets ...	2019-09-18 12:34:53 -07:00
Linus Torvalds	4feaab05dc	LED updates for 5.4-rc1 -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQUwxxKyE5l/npt8ARiEGxRG/Sl2wUCXYAIeQAKCRBiEGxRG/Sl 2/SzAQDEnoNxzV/R5kWFd+2kmFeY3cll0d99KMrWJ8om+kje6QD/cXxZHzFm+T1L UPF66k76oOODV7cyndjXnTnRXbeCRAM= =Szby -----END PGP SIGNATURE----- Merge tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: "In this cycle we've finally managed to contribute the patch set sorting out LED naming issues. Besides that there are many changes scattered among various LED class drivers and triggers. LED naming related improvements: - add new 'function' and 'color' fwnode properties and deprecate 'label' property which has been frequently abused for conveying vendor specific names that have been available in sysfs anyway - introduce a set of standard LED_FUNCTION* definitions - introduce a set of standard LED_COLOR_ID* definitions - add a new {devm_}led_classdev_register_ext() API with the capability of automatic LED name composition basing on the properties available in the passed fwnode; the function is backwards compatible in a sense that it uses 'label' data, if present in the fwnode, for creating LED name - add tools/leds/get_led_device_info.sh script for retrieving LED vendor, product and bus names, if applicable; it also performs basic validation of an LED name - update following drivers and their DT bindings to use the new LED registration API: - leds-an30259a, leds-gpio, leds-as3645a, leds-aat1290, leds-cr0014114, leds-lm3601x, leds-lm3692x, leds-lp8860, leds-lt3593, leds-sc27xx-blt Other LED class improvements: - replace {devm_}led_classdev_register() macros with inlines - allow to call led_classdev_unregister() unconditionally - switch to use fwnode instead of be stuck with OF one LED triggers improvements: - led-triggers: - fix dereferencing of null pointer - fix a memory leak bug - ledtrig-gpio: - GPIO 0 is valid Drop superseeded apu2/3 support from leds-apu since for apu2+ a newer, more complete driver exists, based on a generic driver for the AMD SOCs gpio-controller, supporting LEDs as well other devices: - drop profile field from priv data - drop iosize field from priv data - drop enum_apu_led_platform_types - drop superseeded apu2/3 led support - add pr_fmt prefix for better log output - fix error message on probing failure Other misc fixes and improvements to existing LED class drivers: - leds-ns2, leds-max77650: - add of_node_put() before return - leds-pwm, leds-is31fl32xx: - use struct_size() helper - leds-lm3697, leds-lm36274, leds-lm3532: - switch to use fwnode_property_count_uXX() - leds-lm3532: - fix brightness control for i2c mode - change the define for the fs current register - fixes for the driver for stability - add full scale current configuration - dt: Add property for full scale current. - avoid potentially unpaired regulator calls - move static keyword to the front of declarations - fix optional led-max-microamp prop error handling - leds-max77650: - add of_node_put() before return - add MODULE_ALIAS() - Switch to fwnode property API - leds-as3645a: - fix misuse of strlcpy - leds-netxbig: - add of_node_put() in netxbig_leds_get_of_pdata() - remove legacy board-file support - leds-is31fl319x: - simplify getting the adapter of a client - leds-ti-lmu-common: - fix coccinelle issue - move static keyword to the front of declaration - leds-syscon: - use resource managed variant of device register - leds-ktd2692: - fix a typo in the name of a constant - leds-lp5562: - allow firmware files up to the maximum length - leds-an30259a: - fix typo - leds-pca953x: - include the right header" * tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (72 commits) leds: lm3532: Fix optional led-max-microamp prop error handling led: triggers: Fix dereferencing of null pointer leds: ti-lmu-common: Move static keyword to the front of declaration leds: lm3532: Move static keyword to the front of declarations leds: trigger: gpio: GPIO 0 is valid leds: pwm: Use struct_size() helper leds: is31fl32xx: Use struct_size() helper leds: ti-lmu-common: Fix coccinelle issue in TI LMU leds: lm3532: Avoid potentially unpaired regulator calls leds: syscon: Use resource managed variant of device register leds: Replace {devm_}led_classdev_register() macros with inlines leds: Allow to call led_classdev_unregister() unconditionally leds: lm3532: Add full scale current configuration dt: lm3532: Add property for full scale current. leds: lm3532: Fixes for the driver for stability leds: lm3532: Change the define for the fs current register leds: lm3532: Fix brightness control for i2c mode leds: Switch to use fwnode instead of be stuck with OF one leds: max77650: Switch to fwnode property API led: triggers: Fix a memory leak bug ...	2019-09-17 18:40:42 -07:00
Jack Morgenstein	3eca7fc2d8	RDMA: Fix double-free in srq creation error flow The cited commit introduced a double-free of the srq buffer in the error flow of procedure __uverbs_create_xsrq(). The problem is that ib_destroy_srq_user() called in the error flow also frees the srq buffer. Thus, if uverbs_response() fails in __uverbs_create_srq(), the srq buffer will be freed twice. Cc: <stable@vger.kernel.org> Fixes: `68e326dea1` ("RDMA: Handle SRQ allocations by IB/core") Link: https://lore.kernel.org/r/20190916071154.20383-5-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 14:37:38 -03:00
Gal Pressman	a3f4b8e318	RDMA/efa: Fix incorrect error print The error print should indicate that it failed to get the queue attributes, not network attributes. Link: https://lore.kernel.org/r/20190910134301.4194-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 14:25:43 -03:00
Danit Goldberg	5d44adebbb	IB/mlx5: Free mpi in mp_slave mode ib_add_slave_port() allocates a multiport struct but never frees it. Don't leak memory, free the allocated mpi struct during driver unload. Cc: <stable@vger.kernel.org> Fixes: `32f69e4be2` ("{net, IB}/mlx5: Manage port association for multiport RoCE") Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 14:08:18 -03:00
Danit Goldberg	130c2c576e	IB/mlx5: Use the original address for the page during free_pages The removal of 'buffer' in the patch below caused free_page() to use a value that had been offset since the wqe pointer is adjusted while the routine runs. The current implementation of free_pages() rounds down to a pfn, discarding the adjustment, but this is not the right way to use the API. Preserve the initial value and use it for free_page(). Fixes: `0f51427bd0` ("RDMA/mlx5: Cleanup WQE page fault handler") Link: https://lore.kernel.org/r/20190916064818.19823-2-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 13:39:56 -03:00
Colin Ian King	d97a3e92f3	RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" There is a spelling mistake in a literal string, fix it. Fixes: `89f81008ba` ("RDMA/bnxt_re: expose detailed stats retrieved from HW") Link: https://lore.kernel.org/r/20190911092856.11146-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 10:58:57 -03:00
Lijun Ou	395b59a116	RDMA/hns: Package operations of rq inline buffer into separate functions Here packages the codes of allocating and freeing rq inline buffer in hns_roce_create_qp_common function in order to reduce the complexity. Link: https://lore.kernel.org/r/1567068102-56919-3-git-send-email-liweihang@hisilicon.com Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 10:52:20 -03:00
Yixian Liu	3d50503b3b	RDMA/hns: Optimize cmd init and mode selection for hip08 There are two modes for mailbox command (cmd) queue, i.e., event mode and poll mode. For each mode, we use corresponding semaphores to protect the cmd queue resource competition, so called event_sem and poll_sem. During cmd init, both semaphores are initialized and poll mode is selected. Thus, there is no need to up poll_sema again in cmd_use_polling. Furthermore, there is no need to down the sema of the other side while switching mode. This patch aims to decouple the switch between event mode and poll mode of cmd. Link: https://lore.kernel.org/r/1567068102-56919-2-git-send-email-liweihang@hisilicon.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-16 10:52:20 -03:00
Ira Weiny	f8659d68e2	IB/hfi1: Define variables as unsigned long to fix KASAN warning Define the working variables to be unsigned long to be compatible with for_each_set_bit and change types as needed. While we are at it remove unused variables from a couple of functions. This was found because of the following KASAN warning: ================================================================== BUG: KASAN: stack-out-of-bounds in find_first_bit+0x19/0x70 Read of size 8 at addr ffff888362d778d0 by task kworker/u308:2/1889 CPU: 21 PID: 1889 Comm: kworker/u308:2 Tainted: G W 5.3.0-rc2-mm1+ #2 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.02.04.0003.102320141138 10/23/2014 Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core] Call Trace: dump_stack+0x9a/0xf0 ? find_first_bit+0x19/0x70 print_address_description+0x6c/0x332 ? find_first_bit+0x19/0x70 ? find_first_bit+0x19/0x70 __kasan_report.cold.6+0x1a/0x3b ? find_first_bit+0x19/0x70 kasan_report+0xe/0x12 find_first_bit+0x19/0x70 pma_get_opa_portstatus+0x5cc/0xa80 [hfi1] ? ret_from_fork+0x3a/0x50 ? pma_get_opa_port_ectrs+0x200/0x200 [hfi1] ? stack_trace_consume_entry+0x80/0x80 hfi1_process_mad+0x39b/0x26c0 [hfi1] ? __lock_acquire+0x65e/0x21b0 ? clear_linkup_counters+0xb0/0xb0 [hfi1] ? check_chain_key+0x1d7/0x2e0 ? lock_downgrade+0x3a0/0x3a0 ? match_held_lock+0x2e/0x250 ib_mad_recv_done+0x698/0x15e0 [ib_core] ? clear_linkup_counters+0xb0/0xb0 [hfi1] ? ib_mad_send_done+0xc80/0xc80 [ib_core] ? mark_held_locks+0x79/0xa0 ? _raw_spin_unlock_irqrestore+0x44/0x60 ? rvt_poll_cq+0x1e1/0x340 [rdmavt] __ib_process_cq+0x97/0x100 [ib_core] ib_cq_poll_work+0x31/0xb0 [ib_core] process_one_work+0x4ee/0xa00 ? pwq_dec_nr_in_flight+0x110/0x110 ? do_raw_spin_lock+0x113/0x1d0 worker_thread+0x57/0x5a0 ? process_one_work+0xa00/0xa00 kthread+0x1bb/0x1e0 ? kthread_create_on_node+0xc0/0xc0 ret_from_fork+0x3a/0x50 The buggy address belongs to the page: page:ffffea000d8b5dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0x17ffffc0000000() raw: 0017ffffc0000000 0000000000000000 ffffea000d8b5dc8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected addr ffff888362d778d0 is located in stack of task kworker/u308:2/1889 at offset 32 in frame: pma_get_opa_portstatus+0x0/0xa80 [hfi1] this frame has 1 object: [32, 36) 'vl_select_mask' Memory state around the buggy address: ffff888362d77780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888362d77800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff888362d77880: 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2 00 00 ^ ffff888362d77900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888362d77980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2 ================================================================== Cc: <stable@vger.kernel.org> Fixes: `7724105686` ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20190911113053.126040.47327.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:59:55 -03:00
Kaike Wan	7199435414	IB/{rdmavt, hfi1, qib}: Add a counter for credit waits This patch adds a counter for credit waits to assist field debugging. Link: https://lore.kernel.org/r/20190911113047.126040.10857.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:59:55 -03:00
Kaike Wan	c05fc15634	IB/hfi1: Add traces for TID RDMA READ This patch adds traces to debug packet loss and retry for TID RDMA READ protocol. Link: https://lore.kernel.org/r/20190911113041.126040.64541.stgit@awfm-01.aw.intel.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:59:55 -03:00
Bernard Metzler	4db8fd4973	RDMA/siw: Relax from kmap_atomic() use in TX path Since the transmit path is never executed in an atomic context, we do not need kmap_atomic() and can always use less demanding kmap(). Link: https://lore.kernel.org/r/20190909132945.30462-1-bmt@zurich.ibm.com Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:59:55 -03:00
Jason Gunthorpe	75c66515e4	Merge tag 'v5.3-rc8' into rdma.git for-next To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:59:51 -03:00
Sergey Gorenko	1ba7c8f800	IB/iser: Support up to 16MB data transfer in a single command Maximum supported IO size is 8MB for the iSER driver. The current value is limited by the ISCSI_ISER_MAX_SG_TABLESIZE macro. But the driver is able to handle 16MB IOs without any significant changes. Increasing this limit can be useful for the storage arrays which are fine tuned for IOs larger than 8 MB. This commit allows to configure maximum IO size up to 16MB using the max_sectors module parameter. Link: https://lore.kernel.org/r/20190912103534.18210-1-sergeygo@mellanox.com Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
Bernard Metzler	0404bd629f	RDMA/siw: Fix page address mapping in TX path Use the correct kmap()/kunmap() flow to determine page address used for CRC computation. Using page_address() is wrong, since page might be in highmem. Fixes: `b9be6f18cf` ("rdma/siw: transmit path") Link: https://lore.kernel.org/r/20190909132427.30264-1-bmt@zurich.ibm.com Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
Navid Emamdoost	4a9d46a9fe	RDMA: Fix goto target to release the allocated memory In bnxt_re_create_srq(), when ib_copy_to_udata() fails allocated memory should be released by goto fail. Fixes: `37cb11acf1` ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Link: https://lore.kernel.org/r/20190910222120.16517-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
Arnd Bergmann	2ac5a6d3a9	RDMA/usnic: Avoid overly large buffers on stack It's never a good idea to put a 1000-byte buffer on the kernel stack. The compiler warns about this instance when usnic_ib_log_vf() gets inlined into usnic_ib_pci_probe(): drivers/infiniband/hw/usnic/usnic_ib_main.c:543:12: error: stack frame size of 1044 bytes in function 'usnic_ib_pci_probe' [-Werror,-Wframe-larger-than=] As this is only called for debugging purposes in the setup path, it's trivial to convert to a dynamic allocation. Link: https://lore.kernel.org/r/20190906155730.2750200-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
Jason Gunthorpe	b97b218b30	RDMA/odp: Add missing cast for 32 bit length is a size_t which is unsigned int on 32 bit: ../drivers/infiniband/core/umem_odp.c: In function 'ib_init_umem_odp': ../include/linux/overflow.h:59:15: warning: comparison of distinct pointer types lacks a cast 59 \| (void) (&__a == &__b); \ \| ^~ ../drivers/infiniband/core/umem_odp.c:220:7: note: in expansion of macro 'check_add_overflow' Fixes: `204e3e5630` ("RDMA/odp: Check for overflow when computing the umem_odp end") Link: https://lore.kernel.org/r/20190908080726.30017-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
YueHaibing	3b961b4f83	RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Use devm_platform_ioremap_resource() to simplify the code a bit. This is detected by coccinelle. Link: https://lore.kernel.org/r/20190906141727.26552-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
Håkon Bugge	a6e4d254c1	RDMA/cma: Fix false error message In addr_handler(), assuming status == 0 and the device already has been acquired (id_priv->cma_dev != NULL), we get the following incorrect "error" message: RDMA CM: ADDR_ERROR: failed to resolve IP. status 0 Fixes: `498683c6a7` ("IB/cma: Add debug messages to error flows") Link: https://lore.kernel.org/r/20190902092731.1055757-1-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-09-13 16:55:55 -03:00
David S. Miller	94810bd365	mlx5-updates-2019-09-01 (Software steering support) Abstract: -------- Mellanox ConnetX devices supports packet matching, packet modification and redirection. These functionalities are also referred to as flow-steering. To configure a steering rule, the rule is written to the device owned memory, this memory is accessed and cached by the device when processing a packet. Steering rules are constructed from multiple steering entries (STE). Rules are configured using the Firmware command interface. The Firmware processes the given driver command and translates them to STEs, then writes them to the device memory in the current steering tables. This process is slow due to the architecture of the command interface and the processing complexity of each rule. The highlight of this patchset is to cut the middle man (The firmware) and do steering rules programming into device directly from the driver, with no firmware intervention whatsoever. Motivation: ----------- Software (driver managed) steering allows for high rule insertion rates compared to the FW steering described above, this is achieved by using internal RDMA writes to the device owned memory instead of the slow command interface to program steering rules. Software (driver managed) steering, doesn't depend on new FW for new steering functionality, new implementations can be done in the driver skipping the FW layer. Performance: ------------ The insertion rate on a single core using the new approach allows programming ~300K rules per sec. (Done via direct raw test to the new mlx5 sw steering layer, without any kernel layer involved). Test: TC L2 rules 33K/s with Software steering (this patchset). 5K/s with FW and current driver. This will improve OVS based solution performance. Architecture and implementation details: ---------------------------------------- Software steering will be dynamically selected via devlink device parameter. Example: $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode pci/0000:06:00.0: name flow_steering_mode type driver-specific values: cmode runtime value smfs mlx5 software steering module a.k.a (DR - Direct Rule) is implemented and contained in mlx5/core/steering directory and controlled by MLX5_SW_STEERING kconfig flag. mlx5 core steering layer (fs_core) already provides a shim layer for implementing different steering mechanisms, software steering will leverage that as seen at the end of this series. When Software Steering for a specific steering domain (NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules targeting this domain to be created using SW steering instead of FW. The implementation includes: Domain - The steering domain is the object that all other object resides in. It holds the memory allocator, send engine, locks and other shared data needed by lower objects such as table, matcher, rule, action. Each domain can contain multiple tables. Domain is equivalent to namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented currently in mlx5_core fs_core (flow steering core). Table - Table objects are used for holding multiple matchers, each table has a level used to prevent processing loops. Packets are being directed to this table once it is set as the root table, this is done by fs_core using a FW command. A packet is being processed inside the table matcher by matcher until a successful hit, otherwise the packet will perform the default action. Matcher - Matchers objects are used to specify the fields mask for matching when processing a packet. A matcher belongs to a table, each matcher can hold multiple rules, each rule with different matching values corresponding to the matcher mask. Each matcher has a priority used for rule processing order inside the table. Action - Action objects are created to specify different steering actions such as count, reformat (encapsulate, decapsulate, ...), modify header, forward to table and many other actions. When creating a rule a sequence of actions can be provided to be executed on a successful match. Rule - Rule objects are used to specify a specific match on packets as well as the actions that should be executed. A rule belongs to a matcher. STE - This layer is used to hold the specific STE format for the device and to convert the requested rule to STEs. Each rule is constructed of an STE chain, Multiple rules construct a steering graph. Each node in the graph is a hash table containing multiple STEs. The index of each STE in the hash table is being calculated using a CRC32 hash function. Memory pool - Used for managing and caching device owned memory for rule insertion. The memory is being allocated using DM (device memory) API. Communication with device - layer for standard RDMA operation using RC QP to configure the device steering. Command utility - This module holds all of the FW commands that are required for SW steering to function. Patch planning and files: ------------------------- 1) First patch, adds the support to Add flow steering actions to fs_cmd shim layer. 2) Next 12 patch will add a file per each Software steering functionality/module as described above. (See patches with title: DR, ) 3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable build with the new files 4) Next two patches will add the support for software steering in mlx5 steering shim layer net/mlx5: Add API to set the namespace steering mode net/mlx5: Add direct rule fs_cmd implementation 5) Last two patches will add the new devlink parameter to select mlx5 steering mode, will be valid only for switchdev mode for now. Two modes are supported: 1. DMFS - Device managed flow steering 2. SMFS - Software/Driver managed flow steering. In the DMFS mode, the HW steering entities are created through the FW. In the SMFS mode this entities are created though the driver directly. The driver will use the devlink steering mode only if the steering domain supports it, for now SMFS will manages only the switchdev eswitch steering domain. User command examples: - Set SMFS flow steering mode:: $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime - Read device flow steering mode:: $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode pci/0000:06:00.0: name flow_steering_mode type driver-specific values: cmode runtime value smfs -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl1uxPAACgkQSD+KveBX +j5AkggAymoYqG2G+s8cLa4vQFySaD1Td3VzzWglp7PlpDBE3UcSoMAMg/gIDU1D 8F04PeCsJ6snt1ICk56vPNyAEHWfWeBUd56+QK5lEJBuwozyFvBh6HP81Bnr6T/n n6uTx45ljAFQPTHJjEOLBPSzEXecLu07+mvpzSoW0F3ehfGbELhL1IkVobr/RELx z4xZW9uM2vm5ylheWvjf4V1S/SvokgJazW9+4fh//rl8tfXgun5IfPoS0hqKie1/ h5sjcMSYkYR4gLVqrhKmBYHmHVl/h0TYROckW8iC/+XX7ailSo9uPG7lPa6cm+GE 7Bajlbz4oD/K5RWoByo+q+dmyjeVhQ== =M9bS -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2019-09-01-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-09-01 (Software steering support) Abstract: -------- Mellanox ConnetX devices supports packet matching, packet modification and redirection. These functionalities are also referred to as flow-steering. To configure a steering rule, the rule is written to the device owned memory, this memory is accessed and cached by the device when processing a packet. Steering rules are constructed from multiple steering entries (STE). Rules are configured using the Firmware command interface. The Firmware processes the given driver command and translates them to STEs, then writes them to the device memory in the current steering tables. This process is slow due to the architecture of the command interface and the processing complexity of each rule. The highlight of this patchset is to cut the middle man (The firmware) and do steering rules programming into device directly from the driver, with no firmware intervention whatsoever. Motivation: ----------- Software (driver managed) steering allows for high rule insertion rates compared to the FW steering described above, this is achieved by using internal RDMA writes to the device owned memory instead of the slow command interface to program steering rules. Software (driver managed) steering, doesn't depend on new FW for new steering functionality, new implementations can be done in the driver skipping the FW layer. Performance: ------------ The insertion rate on a single core using the new approach allows programming ~300K rules per sec. (Done via direct raw test to the new mlx5 sw steering layer, without any kernel layer involved). Test: TC L2 rules 33K/s with Software steering (this patchset). 5K/s with FW and current driver. This will improve OVS based solution performance. Architecture and implementation details: ---------------------------------------- Software steering will be dynamically selected via devlink device parameter. Example: $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode pci/0000:06:00.0: name flow_steering_mode type driver-specific values: cmode runtime value smfs mlx5 software steering module a.k.a (DR - Direct Rule) is implemented and contained in mlx5/core/steering directory and controlled by MLX5_SW_STEERING kconfig flag. mlx5 core steering layer (fs_core) already provides a shim layer for implementing different steering mechanisms, software steering will leverage that as seen at the end of this series. When Software Steering for a specific steering domain (NIC/RDMA/Vport/ESwitch, etc ..) is supported, it will cause rules targeting this domain to be created using SW steering instead of FW. The implementation includes: Domain - The steering domain is the object that all other object resides in. It holds the memory allocator, send engine, locks and other shared data needed by lower objects such as table, matcher, rule, action. Each domain can contain multiple tables. Domain is equivalent to namespaces e.g (NIC/RDMA/Vport/ESwitch, etc ..) as implemented currently in mlx5_core fs_core (flow steering core). Table - Table objects are used for holding multiple matchers, each table has a level used to prevent processing loops. Packets are being directed to this table once it is set as the root table, this is done by fs_core using a FW command. A packet is being processed inside the table matcher by matcher until a successful hit, otherwise the packet will perform the default action. Matcher - Matchers objects are used to specify the fields mask for matching when processing a packet. A matcher belongs to a table, each matcher can hold multiple rules, each rule with different matching values corresponding to the matcher mask. Each matcher has a priority used for rule processing order inside the table. Action - Action objects are created to specify different steering actions such as count, reformat (encapsulate, decapsulate, ...), modify header, forward to table and many other actions. When creating a rule a sequence of actions can be provided to be executed on a successful match. Rule - Rule objects are used to specify a specific match on packets as well as the actions that should be executed. A rule belongs to a matcher. STE - This layer is used to hold the specific STE format for the device and to convert the requested rule to STEs. Each rule is constructed of an STE chain, Multiple rules construct a steering graph. Each node in the graph is a hash table containing multiple STEs. The index of each STE in the hash table is being calculated using a CRC32 hash function. Memory pool - Used for managing and caching device owned memory for rule insertion. The memory is being allocated using DM (device memory) API. Communication with device - layer for standard RDMA operation using RC QP to configure the device steering. Command utility - This module holds all of the FW commands that are required for SW steering to function. Patch planning and files: ------------------------- 1) First patch, adds the support to Add flow steering actions to fs_cmd shim layer. 2) Next 12 patch will add a file per each Software steering functionality/module as described above. (See patches with title: DR, ) 3) Add CONFIG_MLX5_SW_STEERING for software steering support and enable build with the new files 4) Next two patches will add the support for software steering in mlx5 steering shim layer net/mlx5: Add API to set the namespace steering mode net/mlx5: Add direct rule fs_cmd implementation 5) Last two patches will add the new devlink parameter to select mlx5 steering mode, will be valid only for switchdev mode for now. Two modes are supported: 1. DMFS - Device managed flow steering 2. SMFS - Software/Driver managed flow steering. In the DMFS mode, the HW steering entities are created through the FW. In the SMFS mode this entities are created though the driver directly. The driver will use the devlink steering mode only if the steering domain supports it, for now SMFS will manages only the switchdev eswitch steering domain. User command examples: - Set SMFS flow steering mode:: $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime - Read device flow steering mode:: $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode pci/0000:06:00.0: name flow_steering_mode type driver-specific values: cmode runtime value smfs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-03 21:46:13 -07:00
Maor Gottlieb	2b688ea5ef	net/mlx5: Add flow steering actions to fs_cmd shim layer Add flow steering actions: modify header and packet reformat to the fs_cmd shim layer. This allows each namespace to define possibly different functionality for alloc/dealloc action commands. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-09-03 12:54:19 -07:00
Al Viro	6effcab4da	infiniband: don't bother with d_delete() Dentries are never retained there; d_delete() + dput() is no different from d_drop() + dput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2019-09-03 09:30:55 -04:00
David S. Miller	765b7590c9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net r8152 conflicts are the NAPI fixes in 'net' overlapping with some tasklet stuff in net-next Signed-off-by: David S. Miller <davem@davemloft.net>	2019-09-02 11:20:17 -07:00
Saeed Mahameed	a06ebb8d95	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Merge mlx5-next patches needed for upcoming mlx5 software steering. 1) Alex adds HW bits and definitions required for SW steering 2) Ariel moves device memory management to mlx5_core (From mlx5_ib) 3) Maor, Cleanups and fixups for eswitch mode and RoCE 4) Mark, Set only stag for match untagged packets Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-09-02 00:16:05 -07:00
Ariel Levkovich	c9b9dcb430	net/mlx5: Move device memory management to mlx5_core Move the device memory allocation and deallocation commands SW ICM memory to mlx5_core to expose this API for all mlx5_core users. This comes as preparation for supporting SW steering in kernel where it will be required to allocate and register device memory for direct rule insertion. In addition, an API to register this device memory for future remote access operations is introduced using the create_mkey commands. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-09-01 23:44:41 -07:00
Saeed Mahameed	537f321097	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5 HW spec and bits updates: 1) Aya exposes IP-in-IP capability in mlx5_core. 2) Maxim exposes lag tx port affinity capabilities. 3) Moshe adds VNIC_ENV internal rq counter bits. 4) ODP capabilities for DC transport Misc updates: 5) Saeed, two compiler warnings cleanups 6) Add XRQ legacy commands opcodes 7) Use refcount_t for refcount 8) fix a -Wstringop-truncation warning	2019-08-28 11:48:56 -07:00
Lijun Ou	98c09b8c97	RDMA/hns: Fix wrong assignment of qp_access_flags We used wrong shifts when set qp_attr->qp_access_flag, this patch exchange V2_QP_RRE_S and V2_QP_RWE_S to fix it. Fixes: `2a3d923f87` ("RDMA/hns: Replace magic numbers with #defines") Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-10-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Wenpeng Liang	afca2a2b83	RDMA/hns: Delete the not-used lines Delete the assignment of srq->ibsrq.ext.xrc.srq_num, beacause this value is not used. Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-9-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Wenpeng Liang	18df508c79	RDMA/hns: Remove if-else judgment statements for creating srq Because if the value of srqwqe_buf_pg_sz is zero, npages and ib_umem_page_count are equivalent, page_shif and PAGE_SHIFT are equivalent in hns_roce_create_srq. Here remove it. Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-8-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Lang Cheng	e075da5e7c	RDMA/hns: Add reset process for function-clear If the hardware is resetting, the driver should not perform the mailbox operation.Function-clear needs to add relevant judgment. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-7-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Lang Cheng	bfe860351e	RDMA/hns: Fix cast from or to restricted __le32 for driver Sparse is whining about the u32 and __le32 mixed usage in the driver. The roce_set_field() is used to __le32 data of hardware only. If a variable is not delivered to the hardware, the __le32 type and related operations are not required. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-6-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Lijun Ou	90c559b186	RDMA/hns: Remove the some magic number Here uses the meaningful macro instead of the magic number for readability. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Lang Chen <chenglang@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-5-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Lang Cheng	82e620d9c3	RDMA/hns: Modify the data structure of hns_roce_av we change type of some members to u32/u8 from __le32 as well as split sl_tclass_flowlabel into three variables in hns_roce_av. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1566393276-42555-4-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 11:57:26 -04:00
Bernard Metzler	531a64e4c3	RDMA/siw: Fix IPv6 addr_list locking Walking the address list of an inet6_dev requires appropriate locking. Since the called function siw_listen_address() may sleep, we have to use rtnl_lock() instead of read_lock_bh(). Also introduces sanity checks if we got a device from in_dev_get() or in6_dev_get(). Reported-by: Bart Van Assche <bvanassche@acm.org> Fixes: `6c52fdc244` ("rdma/siw: connection management") Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190828130355.22830-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-28 10:29:19 -04:00
Jason Gunthorpe	a0d8994b30	Merge branch 'mlx5-odp-dc' into rdma.git for-next Michael Guralnik says: ==================== The series adds support for on-demand paging for DC transport. As DC is a mlx-only transport, the capabilities are exposed to the user using DEVX objects and later on through mlx5dv_query_device. ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'mlx5-odp-dc': IB/mlx5: Add page fault handler for DC initiator WQE IB/mlx5: Remove check of FW capabilities in ODP page fault handling net/mlx5: Set ODP capabilities for DC transport to max	2019-08-28 11:25:37 -03:00
Michael Guralnik	75e46fc02c	IB/mlx5: Add page fault handler for DC initiator WQE Parsing DC initiator WQEs upon page fault requires skipping an address vector segment, as in UD WQEs. Link: https://lore.kernel.org/r/20190819120815.21225-4-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-28 11:18:40 -03:00
Michael Guralnik	29af94987b	IB/mlx5: Remove check of FW capabilities in ODP page fault handling As page fault handling is initiated by FW, there is no need to check that the ODP supports the operation and transport. Link: https://lore.kernel.org/r/20190819120815.21225-3-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-28 11:18:39 -03:00
David S. Miller	68aaf44595	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor conflict in r8169, bug fix had two versions in net and net-next, take the net-next hunks. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-27 14:23:31 -07:00
Markus Elfring	fd1a52f38c	RDMA/iwpm: Delete unnecessary checks before the macro call "dev_kfree_skb" The dev_kfree_skb() function performs also input parameter validation. Thus the test around the shown calls is not needed. This issue was detected by using the Coccinelle software. Link: https://lore.kernel.org/r/16df4c50-1f61-d7c4-3fc8-3073666d281d@web.de Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-27 13:09:23 -03:00
Gal Pressman	1bc5ba836e	RDMA/efa: Use existing FIELD_SIZEOF macro Use FIELD_SIZEOF macro instead of hard coding it in field_avail macro. Link: https://lore.kernel.org/r/20190826115350.21718-3-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-27 13:01:15 -03:00
Gal Pressman	958b6813f0	RDMA/efa: Remove umem check on dereg MR flow EFA driver is not a kverbs provider, the check for MR umem is redundant. Link: https://lore.kernel.org/r/20190826115350.21718-2-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-27 13:01:14 -03:00
Mark Zhang	d8abe88450	RDMA/mlx5: RDMA_RX flow type support for user applications Currently user applications can only steer TCP/IP(NIC RX/RX) traffic. This patch adds RDMA_RX as a new flow type to allow the user to insert steering rules to control RDMA traffic. Two destinations are supported(but not set at the same time): devx flow table object and QP. Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-26 10:30:00 -04:00
Bernard Metzler	c536277e0d	RDMA/siw: Fix 64/32bit pointer inconsistency Fixes improper casting between addresses and unsigned types. Changes siw_pbl_get_buffer() function to return appropriate dma_addr_t, and not u64. Also fixes debug prints. Now any potentially kernel private pointers are printed formatted as '%pK', to allow keeping that information secret. Fixes: d941bfe500be ("RDMA/siw: Change CQ flags from 64->32 bits") Fixes: `b0fff7317b` ("rdma/siw: completion queue methods") Fixes: `8b6a361b8c` ("rdma/siw: receive path") Fixes: `b9be6f18cf` ("rdma/siw: transmit path") Fixes: `f29dd55b02` ("rdma/siw: queue pair methods") Fixes: `2251334dca` ("rdma/siw: application buffer management") Fixes: `303ae1cdfd` ("rdma/siw: application interface") Fixes: `6c52fdc244` ("rdma/siw: connection management") Fixes: `a531975279` ("rdma/siw: main include file") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190822173738.26817-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-23 12:08:27 -04:00
Bernard Metzler	fab4f97e1f	RDMA/siw: Fix SGL mapping issues All user level and most in-kernel applications submit WQEs where the SG list entries are all of a single type. iSER in particular, however, will send us WQEs with mixed SG types: sge[0] = kernel buffer, sge[1] = PBL region. Check and set is_kva on each SG entry individually instead of assuming the first SGE type carries through to the last. This fixes iSER over siw. Fixes: `b9be6f18cf` ("rdma/siw: transmit path") Reported-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Tested-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190822150741.21871-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-22 11:21:06 -04:00
Selvin Xavier	d37b1e5340	RDMA/bnxt_re: Fix stack-out-of-bounds in bnxt_qplib_rcfw_send_message Driver copies FW commands to the HW queue as units of 16 bytes. Some of the command structures are not exact multiple of 16. So while copying the data from those structures, the stack out of bounds messages are reported by KASAN. The following error is reported. [ 1337.530155] ================================================================== [ 1337.530277] BUG: KASAN: stack-out-of-bounds in bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530413] Read of size 16 at addr ffff888725477a48 by task rmmod/2785 [ 1337.530540] CPU: 5 PID: 2785 Comm: rmmod Tainted: G OE 5.2.0-rc6+ #75 [ 1337.530541] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 1337.530542] Call Trace: [ 1337.530548] dump_stack+0x5b/0x90 [ 1337.530556] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530560] print_address_description+0x65/0x22e [ 1337.530568] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530575] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530577] __kasan_report.cold.3+0x37/0x77 [ 1337.530581] ? _raw_write_trylock+0x10/0xe0 [ 1337.530588] ? bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530590] kasan_report+0xe/0x20 [ 1337.530592] memcpy+0x1f/0x50 [ 1337.530600] bnxt_qplib_rcfw_send_message+0x40a/0x850 [bnxt_re] [ 1337.530608] ? bnxt_qplib_creq_irq+0xa0/0xa0 [bnxt_re] [ 1337.530611] ? xas_create+0x3aa/0x5f0 [ 1337.530613] ? xas_start+0x77/0x110 [ 1337.530615] ? xas_clear_mark+0x34/0xd0 [ 1337.530623] bnxt_qplib_free_mrw+0x104/0x1a0 [bnxt_re] [ 1337.530631] ? bnxt_qplib_destroy_ah+0x110/0x110 [bnxt_re] [ 1337.530633] ? bit_wait_io_timeout+0xc0/0xc0 [ 1337.530641] bnxt_re_dealloc_mw+0x2c/0x60 [bnxt_re] [ 1337.530648] bnxt_re_destroy_fence_mr+0x77/0x1d0 [bnxt_re] [ 1337.530655] bnxt_re_dealloc_pd+0x25/0x60 [bnxt_re] [ 1337.530677] ib_dealloc_pd_user+0xbe/0xe0 [ib_core] [ 1337.530683] srpt_remove_one+0x5de/0x690 [ib_srpt] [ 1337.530689] ? __srpt_close_all_ch+0xc0/0xc0 [ib_srpt] [ 1337.530692] ? xa_load+0x87/0xe0 ... [ 1337.530840] do_syscall_64+0x6d/0x1f0 [ 1337.530843] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1337.530845] RIP: 0033:0x7ff5b389035b [ 1337.530848] Code: 73 01 c3 48 8b 0d 2d 0b 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fd 0a 2c 00 f7 d8 64 89 01 48 [ 1337.530849] RSP: 002b:00007fff83425c28 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [ 1337.530852] RAX: ffffffffffffffda RBX: 00005596443e6750 RCX: 00007ff5b389035b [ 1337.530853] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005596443e67b8 [ 1337.530854] RBP: 0000000000000000 R08: 00007fff83424ba1 R09: 0000000000000000 [ 1337.530856] R10: 00007ff5b3902960 R11: 0000000000000206 R12: 00007fff83425e50 [ 1337.530857] R13: 00007fff8342673c R14: 00005596443e6260 R15: 00005596443e6750 [ 1337.530885] The buggy address belongs to the page: [ 1337.530962] page:ffffea001c951dc0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 1337.530964] flags: 0x57ffffc0000000() [ 1337.530967] raw: 0057ffffc0000000 0000000000000000 ffffffff1c950101 0000000000000000 [ 1337.530970] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 1337.530970] page dumped because: kasan: bad access detected [ 1337.530996] Memory state around the buggy address: [ 1337.531072] ffff888725477900: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f2 f2 f2 [ 1337.531180] ffff888725477980: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 [ 1337.531288] >ffff888725477a00: 00 f2 f2 f2 f2 f2 f2 00 00 00 f2 00 00 00 00 00 [ 1337.531393] ^ [ 1337.531478] ffff888725477a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531585] ffff888725477b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1337.531691] ================================================================== Fix this by passing the exact size of each FW command to bnxt_qplib_rcfw_send_message as req->cmd_size. Before sending the command to HW, modify the req->cmd_size to number of 16 byte units. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://lore.kernel.org/r/1566468170-489-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-22 10:52:15 -04:00
Jason Gunthorpe	47f725ee7b	RDMA/odp: remove ib_ucontext from ib_umem At this point the ucontext is only being stored to access the ib_device, so just store the ib_device directly instead. This is more natural and logical as the umem has nothing to do with the ucontext. Link: https://lore.kernel.org/r/20190806231548.25242-8-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 20:58:19 -03:00
Jason Gunthorpe	c571feca2d	RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' This is a significant simplification, no extra list is kept per FD, and the interval tree is now shared between all the ucontexts, reducing overhead if there are multiple ucontexts active. Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 20:58:18 -03:00
Jason Gunthorpe	868df536f5	Merge branch 'odp_fixes' into rdma.git for-next Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:10:36 -03:00
Jason Gunthorpe	fba0e448a2	RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr These are the same thing since mr always comes from odp->private. It is confusing to reference the same memory via two names. Link: https://lore.kernel.org/r/20190819111710.18440-13-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:43 -03:00
Jason Gunthorpe	a705f3e3a1	RDMA/mlx5: Use ib_umem_start instead of umem.address These are subtly different, the address is the original VA requested during umem_get, while ib_umem_start() is the version that is rounded to the proper page size, ie is the true start of the umem's dma map. Link: https://lore.kernel.org/r/20190819111710.18440-12-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:43 -03:00
Moni Shoua	ce51346fee	RDMA/core: Make invalidate_range a device operation The callback function 'invalidate_range' is implemented in a driver so the place for it is in the ib_device_ops structure and not in ib_ucontext. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:43 -03:00
Jason Gunthorpe	37824952dc	RDMA/odp: Use kvcalloc for the dma_list and page_list There is no specific need for these to be in the valloc space, let the system decide automatically how to do the allocation. Link: https://lore.kernel.org/r/20190819111710.18440-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:43 -03:00
Jason Gunthorpe	204e3e5630	RDMA/odp: Check for overflow when computing the umem_odp end Since the page size can be extended in the ODP case by IB_ACCESS_HUGETLB the existing overflow checks done by ib_umem_get() are not sufficient. Check for overflow again. Further, remove the unchecked math from the inlines and just use the precomputed value stored in the interval_tree_node. Link: https://lore.kernel.org/r/20190819111710.18440-9-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:43 -03:00
Jason Gunthorpe	0446cad9ca	RDMA/odp: Provide ib_umem_odp_release() to undo the allocs Now that there are allocator APIs that return the ib_umem_odp directly it should be freed through a umem_odp free'er as well. Link: https://lore.kernel.org/r/20190819111710.18440-8-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:42 -03:00
Jason Gunthorpe	261dc53f8e	RDMA/odp: Split creating a umem_odp from ib_umem_get This is the last creation API that is overloaded for both, there is very little code sharing and a driver has to be specifically ready for a umem_odp to be created to use the odp version. Link: https://lore.kernel.org/r/20190819111710.18440-7-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:42 -03:00
Jason Gunthorpe	f20bef6a95	RDMA/odp: Make the three ways to create a umem_odp clear The three paths to build the umem_odps are kind of muddled, they are: - As a normal ib_mr umem - As a child in an implicit ODP umem tree - As the root of an implicit ODP umem tree Only the first two are actually umem's, the last is an abuse. The implicit case can only be triggered by explicit driver request, it should never be co-mingled with the normal case. While we are here, make sensible function names and add some comments to make this clearer. Link: https://lore.kernel.org/r/20190819111710.18440-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:42 -03:00
Jason Gunthorpe	22d79c9a91	RMDA/odp: Consolidate umem_odp initialization This is done in two different places, consolidate all the post-allocation initialization into a single function. Link: https://lore.kernel.org/r/20190819111710.18440-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:42 -03:00
Jason Gunthorpe	fd7dbf035e	RDMA/odp: Make it clearer when a umem is an implicit ODP umem Implicit ODP umems are special, they don't have any page lists, they don't exist in the interval tree and they are never DMA mapped. Instead of trying to guess this based on a zero length use an explicit flag. Further, do not allow non-implicit umems to be 0 size. Link: https://lore.kernel.org/r/20190819111710.18440-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:42 -03:00
Jason Gunthorpe	f993de88a5	RDMA/odp: Iterate over the whole rbtree directly Instead of intersecting a full interval, just iterate over every element directly. This is faster and clearer. Link: https://lore.kernel.org/r/20190819111710.18440-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 14:08:24 -03:00
Jason Gunthorpe	7cc2e18f21	RDMA/odp: Use the common interval tree library instead of generic ODP is working with userspace VA's in the interval tree which always fit into an unsigned long, so we can use the common code. This comes at a cost of a 16 byte increase in ib_umem_odp struct size due to storing the interval tree start/last in addition to the umem addr/length. However these values were computed and are performance critical for the interval lookup, so this seems like a worthwhile trade off. Removes 2k of .text from the kernel. Link: https://lore.kernel.org/r/20190819111710.18440-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-08-21 13:34:09 -03:00
Wenwen Wang	2323d7baab	infiniband: hfi1: fix memory leaks In fault_opcodes_write(), 'data' is allocated through kcalloc(). However, it is not deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566154486-3713-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
Wenwen Wang	b08afa064c	infiniband: hfi1: fix a memory leak bug In fault_opcodes_read(), 'data' is not deallocated if debugfs_file_get() fails, leading to a memory leak. To fix this bug, introduce the 'free_data' label to free 'data' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/1566156571-4335-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
Wenwen Wang	5c1baaa82c	IB/mlx4: Fix memory leaks In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through kcalloc(). However, it is not always deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, free 'tun_qp->tx_ring' whenever an error occurs. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Acked-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/1566159781-4642-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
zhengbin	a7bfb93f02	RDMA/cma: fix null-ptr-deref Read in cma_cleanup In cma_init, if cma_configfs_init fails, need to free the previously memory and return fail, otherwise will trigger null-ptr-deref Read in cma_cleanup. cma_cleanup cma_configfs_exit configfs_unregister_subsystem Fixes: `045959db65` ("IB/cma: Add configfs for rdma_cm") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Link: https://lore.kernel.org/r/1566188859-103051-1-git-send-email-zhengbin13@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
Moni Shoua	841b07f99a	IB/mlx5: Block MR WR if UMR is not possible Check conditions that are mandatory to post_send UMR WQEs. 1. Modifying page size. 2. Modifying remote atomic permissions if atomic access is required. If either condition is not fulfilled then fail to post_send() flow. Fixes: `c8d75a980f` ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-9-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
Moni Shoua	25a4517214	IB/mlx5: Fix MR re-registration flow to use UMR properly The UMR WQE in the MR re-registration flow requires that modify_atomic and modify_entity_size capabilities are enabled. Therefore, check that the these capabilities are present before going to umr flow and go through slow path if not. Fixes: `c8d75a980f` ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-8-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
Moni Shoua	008157528a	IB/mlx5: Report and handle ODP support properly ODP depends on the several device capabilities, among them is the ability to send UMR WQEs with that modify atomic and entity size of the MR. Therefore, only if all conditions to send such a UMR WQE are met then driver can report that ODP is supported. Use this check of conditions in all places where driver needs to know about ODP support. Also, implicit ODP support depends on ability of driver to send UMR WQEs for an indirect mkey. Therefore, verify that all conditions to do so are met when reporting support. Fixes: `c8d75a980f` ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:45 -04:00
Moni Shoua	0e6613b41e	IB/mlx5: Consolidate use_umr checks into single function Introduce helper function to unify various use_umr checks. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-6-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Leon Romanovsky	60c78668ae	RDMA/restrack: Rewrite PID namespace check to be reliable task_active_pid_ns() is wrong API to check PID namespace because it posses some restrictions and return PID namespace where the process was allocated. It created mismatches with current namespace, which can be different. Rewrite whole rdma_is_visible_in_pid_ns() logic to provide reliable results without any relation to allocated PID namespace. Fixes: `8be565e65f` ("RDMA/nldev: Factor out the PID namespace check") Fixes: `6a6c306a09` ("RDMA/restrack: Make is_visible_in_pid_ns() as an API") Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Leon Romanovsky	c8b32408b4	RDMA/counters: Properly implement PID checks "Auto" configuration mode is called for visible in that PID namespace and it ensures that all counters and QPs are coexist in the same namespace and belong to same PID. Fixes: `99fa331dc8` ("RDMA/counter: Add "auto" configuration mode support") Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Ido Kalir	948a7287b2	IB/core: Fix NULL pointer dereference when bind QP to counter If QP is not visible to the pid, then we try to decrease its reference count and return from the function before the QP pointer is initialized. This lead to NULL pointer dereference. Fix it by pass directly the res to the rdma_restract_put as arg instead of &qp->res. This fixes below call trace: [ 5845.110329] BUG: kernel NULL pointer dereference, address: 00000000000000dc [ 5845.120482] Oops: 0002 [#1] SMP PTI [ 5845.129119] RIP: 0010:rdma_restrack_put+0x5/0x30 [ib_core] [ 5845.169450] Call Trace: [ 5845.170544] rdma_counter_get_qp+0x5c/0x70 [ib_core] [ 5845.172074] rdma_counter_bind_qpn_alloc+0x6f/0x1a0 [ib_core] [ 5845.173731] nldev_stat_set_doit+0x314/0x330 [ib_core] [ 5845.175279] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [ 5845.176772] ? __kmalloc_node_track_caller+0x20b/0x2b0 [ 5845.178321] rdma_nl_rcv+0xcb/0x120 [ib_core] [ 5845.179753] netlink_unicast+0x179/0x220 [ 5845.181066] netlink_sendmsg+0x2d8/0x3d0 [ 5845.182338] sock_sendmsg+0x30/0x40 [ 5845.183544] __sys_sendto+0xdc/0x160 [ 5845.184832] ? syscall_trace_enter+0x1f8/0x2e0 [ 5845.186209] ? __audit_syscall_exit+0x1d9/0x280 [ 5845.187584] __x64_sys_sendto+0x24/0x30 [ 5845.188867] do_syscall_64+0x48/0x120 [ 5845.190097] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `1bd8e0a9d0` ("RDMA/counter: Allow manual mode configuration support") Signed-off-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Kaike Wan	d9d1f5e7bb	IB/hfi1: Drop stale TID RDMA packets that cause TIDErr In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order. A stale TID RDMA data packet could lead to TidErr if the TID entries have been released by duplicate data packets generated from retries, and subsequently erroneously force the qp into error state in the current implementation. Since the payload has already been dropped by hardware, the packet can be simply dropped and it is no longer necessary to put the qp into error state. Fixes: `9905bf06e8` ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192058.105923.72324.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Kaike Wan	90fdae66e7	IB/hfi1: Add additional checks when handling TID RDMA WRITE DATA packet In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA WRITE DATA packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: `d72fe7d500` ("IB/hfi1: Add a function to receive TID RDMA WRITE DATA packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192051.105923.69979.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Kaike Wan	a8adbf7d0d	IB/hfi1: Add additional checks when handling TID RDMA READ RESP packet In a congested fabric with adaptive routing enabled, traces show that packets could be delivered out of order, which could cause incorrect processing of stale packets. For stale TID RDMA READ RESP packets that cause KDETH EFLAGS errors, this patch adds additional checks before processing the packets. Fixes: `9905bf06e8` ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192045.105923.59813.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Kaike Wan	35d5c8b82e	IB/hfi1: Unsafe PSN checking for TID RDMA READ Resp packet When processing a TID RDMA READ RESP packet that causes KDETH EFLAGS errors, the packet's IB PSN is checked against qp->s_last_psn and qp->s_psn without the protection of qp->s_lock, which is not safe. This patch fixes the issue by acquiring qp->s_lock first. Fixes: `9905bf06e8` ("IB/hfi1: Add functions to receive TID RDMA READ response") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192039.105923.7852.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Kaike Wan	d58c1834bf	IB/hfi1: Drop stale TID RDMA packets In a congested fabric with adaptive routing enabled, traces show that the sender could receive stale TID RDMA NAK packets that contain newer KDETH PSNs and older Verbs PSNs. If not dropped, these packets could cause the incorrect rewinding of the software flows and the incorrect completion of TID RDMA WRITE requests, and eventually leading to memory corruption and kernel crash. The current code drops stale TID RDMA ACK/NAK packets solely based on KDETH PSNs, which may lead to erroneous processing. This patch fixes the issue by also checking the Verbs PSN. Addition checks are added before rewinding the TID RDMA WRITE DATA packets. Fixes: `9e93e967f7` ("IB/hfi1: Add a function to receive TID RDMA ACK packet") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Link: https://lore.kernel.org/r/20190815192033.105923.44192.stgit@awfm-01.aw.intel.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Bernard Metzler	9b44007801	RDMA/siw: Fix potential NULL de-ref In siw_connect() we have an error flow where there is no valid qp pointer. Make sure we don't try to de-ref in that situation. Fixes: `6c52fdc244` ("rdma/siw: connection management") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190819140257.19319-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:44 -04:00
Jason Gunthorpe	27b7fb1ab7	RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB When ODP is enabled with IB_ACCESS_HUGETLB then the required pages should be calculated based on the extent of the MR, which is rounded to the nearest huge page alignment. Fixes: `d2183c6f19` ("RDMA/umem: Move page_shift from ib_umem to ib_odp_umem") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-5-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:44:43 -04:00
Leon Romanovsky	b2299e8381	RDMA: Delete DEBUG code There is no need to keep DEBUG defines for out-of-the tree testing. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190819114547.20704-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 13:27:53 -04:00
Dan Carpenter	a7325af725	RDMA/hns: Fix some white space check_mtu_validate() This line was indented a bit too far. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20190816113907.GA30799@mwanda Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-20 12:57:09 -04:00
David S. Miller	446bf64b61	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Merge conflict of mlx5 resolved using instructions in merge commit `9566e650bf`. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-19 11:54:03 -07:00
Logan Gunthorpe	7f73eac3a7	PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg() Add pci_p2pdma_unmap_sg() to the two places that call pci_p2pdma_map_sg(). This is a prep patch to introduce correct mappings for p2pdma transactions that go through the root complex. Link: https://lore.kernel.org/r/20190730163545.4915-10-logang@deltatee.com Link: https://lore.kernel.org/r/20190812173048.9186-10-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2019-08-16 08:41:26 -05:00
Lang Cheng	77905379e9	RDMA/hns: Remove unuseful member For struct hns_roce_cmdq, toggle is unused and delete it. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-8-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:32:37 -04:00
Xi Wang	bf8c02f961	RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver kasan will report a BUG when run command 'insmod hns_roce_hw_v2.ko', the calltrace is as follows: ================================================================== BUG: KASAN: slab-out-of-bounds in hns_roce_v2_init_eq_table+0x1324/0x1948 [hns_roce_hw_v2] Read of size 8 at addr ffff8020e7a10608 by task insmod/256 CPU: 0 PID: 256 Comm: insmod Tainted: G O 5.2.0-rc4 #1 Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 Call trace: dump_backtrace+0x0/0x1e8 show_stack+0x14/0x20 dump_stack+0xc4/0xfc print_address_description+0x60/0x270 __kasan_report+0x164/0x1b8 kasan_report+0xc/0x18 __asan_load8+0x84/0xa8 hns_roce_v2_init_eq_table+0x1324/0x1948 [hns_roce_hw_v2] hns_roce_init+0xf8/0xfe0 [hns_roce] __hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2] hclge_init_roce_client_instance+0x180/0x310 [hclge] hclge_init_client_instance+0xcc/0x508 [hclge] hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3] hnae3_register_client+0x134/0x1a8 [hnae3] hns_roce_hw_v2_init+0x14/0x10000 [hns_roce_hw_v2] do_one_initcall+0x9c/0x3e0 do_init_module+0xd4/0x2d8 load_module+0x3284/0x3690 __se_sys_init_module+0x274/0x308 __arm64_sys_init_module+0x40/0x50 el0_svc_handler+0xbc/0x210 el0_svc+0x8/0xc Allocated by task 256: __kasan_kmalloc.isra.0+0xd0/0x180 kasan_kmalloc+0xc/0x18 __kmalloc+0x16c/0x328 hns_roce_v2_init_eq_table+0x764/0x1948 [hns_roce_hw_v2] hns_roce_init+0xf8/0xfe0 [hns_roce] __hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2] hclge_init_roce_client_instance+0x180/0x310 [hclge] hclge_init_client_instance+0xcc/0x508 [hclge] hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3] hnae3_register_client+0x134/0x1a8 [hnae3] hns_roce_hw_v2_init+0x14/0x10000 [hns_roce_hw_v2] do_one_initcall+0x9c/0x3e0 do_init_module+0xd4/0x2d8 load_module+0x3284/0x3690 __se_sys_init_module+0x274/0x308 __arm64_sys_init_module+0x40/0x50 el0_svc_handler+0xbc/0x210 el0_svc+0x8/0xc Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8020e7a10600 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 8 bytes inside of 128-byte region [ffff8020e7a10600, ffff8020e7a10680) The buggy address belongs to the page: page:ffff7fe00839e840 refcount:1 mapcount:0 mapping:ffff802340020200 index:0x0 flags: 0x5fffe00000000200(slab) raw: 5fffe00000000200 dead000000000100 dead000000000200 ffff802340020200 raw: 0000000000000000 0000000081000100 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8020e7a10500: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff8020e7a10580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff8020e7a10600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff8020e7a10680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8020e7a10700: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: `a5073d6054` ("RDMA/hns: Add eq support of hip08") Signed-off-by: Xi Wang <wangxi11@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-7-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:32:37 -04:00
Xi Wang	9bba3f0cbf	RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver kasan will report a BUG when run command 'rmmod hns_roce_hw_v2', the calltrace is as follows: ================================================================== BUG: KASAN: slab-out-of-bounds in hns_roce_table_mhop_put+0x584/0x828 [hns_roce] Read of size 8 at addr ffff802185e08300 by task rmmod/270 Call trace: dump_backtrace+0x0/0x1e8 show_stack+0x14/0x20 dump_stack+0xc4/0xfc print_address_description+0x60/0x270 __kasan_report+0x164/0x1b8 kasan_report+0xc/0x18 __asan_load8+0x84/0xa8 hns_roce_table_mhop_put+0x584/0x828 [hns_roce] hns_roce_table_put+0x174/0x1a0 [hns_roce] hns_roce_mr_free+0x124/0x210 [hns_roce] hns_roce_dereg_mr+0x90/0xb8 [hns_roce] ib_dealloc_pd_user+0x60/0xf0 ib_mad_port_close+0x128/0x1d8 ib_mad_remove_device+0x94/0x118 remove_client_context+0xa0/0xe0 disable_device+0xfc/0x1c0 __ib_unregister_device+0x60/0xe0 ib_unregister_device+0x24/0x38 hns_roce_exit+0x3c/0x138 [hns_roce] __hns_roce_hw_v2_uninit_instance.isra.30+0x28/0x50 [hns_roce_hw_v2] hns_roce_hw_v2_uninit_instance+0x44/0x60 [hns_roce_hw_v2] hclge_uninit_client_instance+0x15c/0x238 [hclge] hnae3_uninit_client_instance+0x84/0xa8 [hnae3] hnae3_unregister_client+0x84/0x158 [hnae3] hns_roce_hw_v2_exit+0x14/0x20 [hns_roce_hw_v2] __arm64_sys_delete_module+0x20c/0x308 el0_svc_handler+0xbc/0x210 el0_svc+0x8/0xc Allocated by task 255: __kasan_kmalloc.isra.0+0xd0/0x180 kasan_kmalloc+0xc/0x18 __kmalloc+0x16c/0x328 hns_roce_init_hem_table+0x20c/0x428 [hns_roce] hns_roce_init+0x214/0xfe0 [hns_roce] __hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2] hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2] hclge_init_roce_client_instance+0x180/0x310 [hclge] hclge_init_client_instance+0xcc/0x508 [hclge] hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3] hnae3_register_client+0x134/0x1a8 [hnae3] 0xffff200009c00014 do_one_initcall+0x9c/0x3e0 do_init_module+0xd4/0x2d8 load_module+0x3284/0x3690 __se_sys_init_module+0x274/0x308 __arm64_sys_init_module+0x40/0x50 el0_svc_handler+0xbc/0x210 el0_svc+0x8/0xc Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff802185e06300 which belongs to the cache kmalloc-8k of size 8192 The buggy address is located 0 bytes to the right of 8192-byte region [ffff802185e06300, ffff802185e08300) The buggy address belongs to the page: page:ffff7fe008617800 refcount:1 mapcount:0 mapping:ffff802340020e00 index:0x0 compound_mapcount: 0 flags: 0x5fffe00000010200(slab\|head) raw: 5fffe00000010200 dead000000000100 dead000000000200 ffff802340020e00 raw: 0000000000000000 00000000803e003e 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff802185e08200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff802185e08280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff802185e08300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff802185e08380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff802185e08400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Disabling lock debugging due to kernel taint Fixes: `a25d13cbe8` ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08") Signed-off-by: Xi Wang <wangxi11@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-6-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:32:37 -04:00
Yangyang Li	d7e5ca88d6	RDMA/hns: Modify pi vlaue when cq overflows When exiting "for loop", the actual value of pi will be increased by 1, which is compatible with the next calculation. But when pi is equal to "ci + hr_cq-> ib_cq.cqe", the "break" was called and the pi is actual value, it will lead one cqe still existing, so the "==" should be modify to ">". Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-5-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:32:37 -04:00
Lijun Ou	76827087bb	RDMA/hns: Bugfix for creating qp attached to srq When create a qp and attached to srq, rq will no longer be used and the members of rq will be set zero. As a result, the wrid of rq will not be allocated and used. Fixes: `926a01dc00` ("RDMA/hns: Add QP operations support for hip08 SoC") Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-3-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:32:36 -04:00
Weihang Li	0e1aa6f095	RDMA/hns: Logic optimization of wc_flags We should set IB_WC_WITH_VLAN only when VLAN is enabled. In addition, move setting of IB_WC_WITH_SMAC below setting of wc->smac. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1565343666-73193-2-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:32:36 -04:00
Leon Romanovsky	9dc4cfff11	RDMA/mlx5: Annotate lock dependency in bind/unbind slave port NULL-ing notifier_call is performed under protection of mlx5_ib_multiport_mutex lock. Such protection is not easily spotted and better to be guarded by lockdep annotation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190813102814.22350-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:26:06 -04:00
Yishai Hadas	8293a598fe	IB/mlx5: Expose XRQ legacy commands over the DEVX interface Expose missing XRQ legacy commands over the DEVX interface. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190808084358.29517-5-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:24:32 -04:00
Yishai Hadas	972d7560ee	IB/mlx5: Add legacy events to DEVX list Add two events that were defined in the device specification but were not exposed in the driver list. Post this patch those events can be read over the DEVX events interface once be reported by the firmware. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Edward Srouji <edwards@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190808084358.29517-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:24:17 -04:00
Bernard Metzler	2c8ccb37b0	RDMA/siw: Change CQ flags from 64->32 bits This patch changes the driver/user shared (mmapped) CQ notification flags field from unsigned 64-bits size to unsigned 32-bits size. This enables building siw on 32-bit architectures. This patch changes the siw-abi, but as siw was only just merged in this merge window cycle, there are no released kernels with the prior abi. We are making no attempt to be binary compatible with siw user space libraries prior to the merge of siw into the upstream kernel, only moving forward with upstream kernels and upstream rdma-core provided siw libraries are we guaranteeing compatibility. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:22:06 -04:00
Doug Ledford	749b9eef50	Merge remote-tracking branch 'mlx5-next/mlx5-next' into wip/dl-for-next Merging tip of mlx5-next in order to get changes related to adding XRQ support to the DEVX interface needed prior to the following two patches. Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-13 12:19:19 -04:00
Dan Carpenter	932727c556	RDMA/core: Fix error code in stat_get_doit_qp() We need to set the error codes on these paths. Currently the only possible error code is -EMSGSIZE so that's what the patch uses. Fixes: `83c2c1fcbd` ("RDMA/nldev: Allow get counter mode through RDMA netlink") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190809101311.GA17867@mwanda Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 11:05:05 -04:00
Dan Carpenter	17c19287ec	RDMA/siw: Fix a memory leak in siw_init_cpulist() The error handling code doesn't free siw_cpu_info.tx_valid_cpus[0]. The first iteration through the loop is a no-op so this is sort of an off by one bug. Also Bernard pointed out that we can remove the NULL assignment and simplify the code a bit. Fixes: `bdcf26bf9b` ("rdma/siw: network and RDMA core interface") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190809140904.GB3552@mwanda Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:59:36 -04:00
Yishai Hadas	e9eec6a55c	IB/mlx5: Fix use-after-free error while accessing ev_file pointer Call to uverbs_close_fd() releases file pointer to 'ev_file' and mlx5_ib_dev is going to be inaccessible. Cache pointer prior cleaning resources to solve the KASAN warning below. BUG: KASAN: use-after-free in devx_async_event_close+0x391/0x480 [mlx5_ib] Read of size 8 at addr ffff888301e3cec0 by task devx_direct_tes/4631 CPU: 1 PID: 4631 Comm: devx_direct_tes Tainted: G OE 5.3.0-rc1-for-upstream-dbg-2019-07-26_01-19-56-93 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 Call Trace: dump_stack+0x9a/0xeb print_address_description+0x1e2/0x400 ? devx_async_event_close+0x391/0x480 [mlx5_ib] __kasan_report+0x15c/0x1df ? devx_async_event_close+0x391/0x480 [mlx5_ib] kasan_report+0xe/0x20 devx_async_event_close+0x391/0x480 [mlx5_ib] __fput+0x26a/0x7b0 task_work_run+0x10d/0x180 exit_to_usermode_loop+0x137/0x160 do_syscall_64+0x3c7/0x490 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f5df907d664 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 6a cd 20 00 48 63 ff 85 c0 75 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 44 f3 c3 66 90 48 83 ec 18 48 89 7c 24 08 e8 RSP: 002b:00007ffd353cb958 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 000056017a88c348 RCX: 00007f5df907d664 RDX: 00007f5df969d400 RSI: 00007f5de8f1ec90 RDI: 0000000000000006 RBP: 00007f5df9681dc0 R08: 00007f5de8736410 R09: 000056017a9d2dd0 R10: 000000000000000b R11: 0000000000000246 R12: 00007f5de899d7d0 R13: 00007f5df96c4248 R14: 00007f5de8f1ecb0 R15: 000056017ae41308 Allocated by task 4631: save_stack+0x19/0x80 kasan_kmalloc.constprop.3+0xa0/0xd0 alloc_uobj+0x71/0x230 [ib_uverbs] alloc_begin_fd_uobject+0x2e/0xc0 [ib_uverbs] rdma_alloc_begin_uobject+0x96/0x140 [ib_uverbs] ib_uverbs_run_method+0xdf0/0x1940 [ib_uverbs] ib_uverbs_cmd_verbs+0x57e/0xdb0 [ib_uverbs] ib_uverbs_ioctl+0x177/0x260 [ib_uverbs] do_vfs_ioctl+0x18f/0x1010 ksys_ioctl+0x70/0x80 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x95/0x490 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 4631: save_stack+0x19/0x80 __kasan_slab_free+0x11d/0x160 slab_free_freelist_hook+0x67/0x1a0 kfree+0xb9/0x2a0 uverbs_close_fd+0x118/0x1c0 [ib_uverbs] devx_async_event_close+0x28a/0x480 [mlx5_ib] __fput+0x26a/0x7b0 task_work_run+0x10d/0x180 exit_to_usermode_loop+0x137/0x160 do_syscall_64+0x3c7/0x490 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff888301e3cda8 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 280 bytes inside of 512-byte region [ffff888301e3cda8, ffff888301e3cfa8) The buggy address belongs to the page: page:ffffea000c078e00 refcount:1 mapcount:0 mapping:ffff888352811300 index:0x0 compound_mapcount: 0 flags: 0x2fffff80010200(slab\|head) raw: 002fffff80010200 ffffea000d152608 ffffea000c077808 ffff888352811300 raw: 0000000000000000 0000000000250025 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888301e3cd80: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb ffff888301e3ce00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888301e3ce80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888301e3cf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888301e3cf80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc Disabling lock debugging due to kernel taint Cc: <stable@vger.kernel.org> # 5.2 Fixes: `7597385371` ("IB/mlx5: Enable subscription for device events over DEVX") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Link: https://lore.kernel.org/r/20190808081538.28772-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:46:30 -04:00
Lijun Ou	db50077b95	RDMA/hns: Use the new APIs for printing log Here uses the new APIs instead of some dev print interfaces in some functions. Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-15-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:09 -04:00
Weihang Li	d967e2625f	RDMA/hns: Disable alw_lcl_lpbk of SSU If we enabled alw_lcl_lpbk in promiscuous mode, packet whose source and destination mac address is equal will be handled in both inner loopback and outer loopback. This will halve performance of roce in promiscuous mode. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1565276034-97329-14-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:09 -04:00
Weihang Li	249f2f921f	RDMA/hns: Remove redundant print in hns_roce_v2_ceq_int() There is no need to tell users when eq->cons_index is overflow, we just set it back to zero. Signed-off-by: Weihang Li <liweihang@hisilicon.com> Link: https://lore.kernel.org/r/1565276034-97329-13-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:09 -04:00
Yangyang Li	260c3b3499	RDMA/hns: Refactor hns_roce_v2_set_hem for hip08 In order to reduce the complexity of hns_roce_v2_set_hem, extract the implementation of op as a function. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-12-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Lang Cheng	4b42d05d0b	RDMA/hns: Remove unnecessary kzalloc For hns_roce_v2_query_qp and hns_roce_v2_modify_qp, we can use stack memory to create qp context data. Make the code simpler. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-11-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Lang Cheng	bebdb83f97	RDMA/hns: Refactor irq request code Remove unnecessary if...else..., to make the code look simpler. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-10-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Lang Cheng	e7f40440af	RDMA/hns: Split bool statement and assign statement Assign statement can not be contained in bool statement or function param. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-9-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Lang Cheng	0e20ebf8d4	RDMA/hns: Handling the error return value of hem function Handling the error return value of hns_roce_calc_hem_mhop. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-8-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Lang Cheng	6def7de6d4	RDMA/hns: Update some comments style Here removes some useless comments and adds necessary spaces to another comments. Signed-off-by: Lang Cheng <chenglang@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-7-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Lang Cheng	b5c229dc15	RDMA/hns: Clean up unnecessary initial assignment Here remove some unncessary initialization for some valiables. Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-6-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:08 -04:00
Yixian Liu	2288b3b3b1	RDMA/hns: Remove unnessary init for cmq reg There is no need to init the enable bit of cmq. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-5-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:07 -04:00
Yixian Liu	ece9c205f7	RDMA/hns: Update the prompt message for creating and destroy qp Current prompt message is uncorrect when destroying qp, add qpn information when creating qp. Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-4-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:07 -04:00
Lijun Ou	8ea417ffc2	RDMA/hns: Optimize hns_roce_modify_qp function Here mainly packages some code into some new functions in order to reduce code compelexity. Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-3-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:07 -04:00
Lijun Ou	cc95b23c25	RDMA/hns: Encapsulate some lines for setting sq size in user mode It needs to check the sq size with integrity when configures the relatived parameters of sq. Here moves the relatived code into a special function. Signed-off-by: Lijun Ou <oulijun@huawei.com> Link: https://lore.kernel.org/r/1565276034-97329-2-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:45:07 -04:00
YueHaibing	a3e2d4c7e7	RDMA/hns: remove obsolete Kconfig comment Since commit `a07fc0bb48` ("RDMA/hns: Fix build error") these kconfig comment is obsolete, so just remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20190807032228.6788-1-yuehaibing@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:19:43 -04:00
Kamal Heib	d8d5cfac45	RDMA/{cxgb3, cxgb4, i40iw}: Remove common code Now that we have a common iWARP query port function we can remove the common code from the iWARP drivers. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://lore.kernel.org/r/20190807103138.17219-5-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:19:43 -04:00
Kamal Heib	4929116bdf	RDMA/core: Add common iWARP query port Add support for a common iWARP query port function, the new function includes a common code that is used by the iWARP devices to update the port attributes like max_mtu, active_mtu, state, and phys_state, the function also includes a call for the driver-specific query_port callback to query the device-specific port attributes. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Link: https://lore.kernel.org/r/20190807103138.17219-4-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:19:43 -04:00
Kamal Heib	691f380df2	RDMA/cxgb3: Use ib_device_set_netdev() This change is required to associate the cxgb3 ib_dev with the underlying net_device, so in the upcoming patch we can call ib_device_get_netdev(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Link: https://lore.kernel.org/r/20190807103138.17219-3-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:19:43 -04:00
Kamal Heib	72a7720fca	RDMA: Introduce ib_port_phys_state enum In order to improve readability, add ib_port_phys_state enum to replace the use of magic numbers. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Andrew Boyer <aboyer@tobark.org> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-12 10:18:52 -04:00
Dan Carpenter	e7e6c6320c	IB/mlx5: Check the correct variable in error handling code The code accidentally checks "event_sub" instead of "event_sub->eventfd". Fixes: `7597385371` ("IB/mlx5: Enable subscription for device events over DEVX") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190807123236.GA11452@mwanda Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-07 16:46:41 -04:00
Mark Zhang	d97de8887a	RDMA/counter: Prevent QP counter binding if counters unsupported In case of rdma_counter_init() fails, counter allocation and QP bind should not be allowed. Fixes: `413d334750` ("RDMA/counter: Add set/clear per-port auto mode support") Fixes: `1bd8e0a9d0` ("RDMA/counter: Allow manual mode configuration support") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190807101819.7581-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-08-07 16:09:23 -04:00

... 5 6 7 8 9 ...

10920 Commits