Commit Graph

8301 Commits

Author SHA1 Message Date
Zhu Yanjun
25d0e2db3d IB/mlx5: remove duplicate header file
The header file fs_helpers.h is included twice. So it should be removed.

Fixes: 802c212568 ("IB/mlx5: Add IPsec support for egress and ingress")
CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-16 16:23:02 -06:00
Shamir Rabinovitch
ef95a90ae6 RDMA/ucma: ucma_context reference leak in error path
Validating input parameters should be done before getting the cm_id
otherwise it can leak a cm_id reference.

Fixes: 6a21dfc0d0 ("RDMA/ucma: Limit possible option size")
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-16 09:49:24 -06:00
Linus Torvalds
19fd08b85b Merge candidates for 4.17 merge window
- Fix RDMA uapi headers to actually compile in userspace and be more
   complete
 
 - Three shared with netdev pull requests from Mellanox:
 
    * 7 patches, mostly to net with 1 IB related one at the back). This
      series addresses an IRQ performance issue (patch 1), cleanups related to
      the fix for the IRQ performance problem (patches 2-6), and then extends
      the fragmented completion queue support that already exists in the net
      side of the driver to the ib side of the driver (patch 7).
 
    * Mostly IB, with 5 patches to net that are needed to support the remaining
      10 patches to the IB subsystem. This series extends the current
      'representor' framework when the mlx5 driver is in switchdev mode from
      being a netdev only construct to being a netdev/IB dev construct. The IB
      dev is limited to raw Eth queue pairs only, but by having an IB dev of
      this type attached to the representor for a switchdev port, it enables
      DPDK to work on the switchdev device.
 
    * All net related, but needed as infrastructure for the rdma driver
 
 - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
 
 - SRP performance updates
 
 - IB uverbs write path cleanup patch series from Leon
 
 - Add RDMA_CM support to ib_srpt. This is disabled by default.  Users need to
   set the port for ib_srpt to listen on in configfs in order for it to be
   enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
 
 - TSO and Scatter FCS support in mlx4
 
 - Refactor of modify_qp routine to resolve problems seen while working on new
   code that is forthcoming
 
 - More refactoring and updates of RDMA CM for containers support from Parav
 
 - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory'
   user API features
 
 - Infrastructure updates for the new IOCTL interface, based on increased usage
 
 - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit
   kernel as was originally intended. See the commit messages for
   extensive details
 
 - Syzkaller bugs and code cleanups motivated by them
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJax5Z0AAoJEDht9xV+IJsacCwQAJBIgmLCvVp5fBu2kJcXMMVI
 y3l2YNzAUJvDDKv1r5yTC9ugBXEkDtgzi/W/C2/5es2yUG/QeT/zzQ3YPrtsnN68
 5FkiXQ35Tt7+PBHMr0cacGRmF4M3Td3MeW0X5aJaBKhqlNKwA+aF18pjGWBmpVYx
 URYCwLb5BZBKVh4+1Leebsk4i0/7jSauAqE5M+9notuAUfBCoY1/Eve3DipEIBBp
 EyrEnMDIdujYRsg4KHlxFKKJ1EFGItknLQbNL1+SEa0Oe0SnEl5Bd53Yxfz7ekNP
 oOWQe5csTcs3Yr4Ob0TC+69CzI71zKbz6qPDILTwXmsPFZJ9ipJs4S8D6F7ra8tb
 D5aT1EdRzh/vAORPC9T3DQ3VsHdvhwpUMG7knnKrVT9X/g7E+gSji1BqaQaTr/xs
 i40GepHT7lM/TWEuee/6LRpqdhuOhud7vfaRFwn2JGRX9suqTcvwhkBkPUDGV5XX
 5RkHcWOb/7KvmpG7S1gaRGK5kO208LgmAZi7REaJFoZB74FqSneMR6NHIH07ha41
 Zou7rnxV68CT2bgu27m+72EsprgmBkVDeEzXgKxVI/+PZ1oadUFpgcZ3pRLOPWVx
 rEqjHu65rlA/YPog4iXQaMfSwt/oRD3cVJS/n8EdJKXi4Qt2RDDGdyOmt74w4prM
 QuLEdvJIFmwrND1KDoqn
 =Ku8g
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Doug and I are at a conference next week so if another PR is sent I
  expect it to only be bug fixes. Parav noted yesterday that there are
  some fringe case behavior changes in his work that he would like to
  fix, and I see that Intel has a number of rc looking patches for HFI1
  they posted yesterday.

  Parav is again the biggest contributor by patch count with his ongoing
  work to enable container support in the RDMA stack, followed by Leon
  doing syzkaller inspired cleanups, though most of the actual fixing
  went to RC.

  There is one uncomfortable series here fixing the user ABI to actually
  work as intended in 32 bit mode. There are lots of notes in the commit
  messages, but the basic summary is we don't think there is an actual
  32 bit kernel user of drivers/infiniband for several good reasons.

  However we are seeing people want to use a 32 bit user space with 64
  bit kernel, which didn't completely work today. So in fixing it we
  required a 32 bit rxe user to upgrade their userspace. rxe users are
  still already quite rare and we think a 32 bit one is non-existing.

   - Fix RDMA uapi headers to actually compile in userspace and be more
     complete

   - Three shared with netdev pull requests from Mellanox:

      * 7 patches, mostly to net with 1 IB related one at the back).
        This series addresses an IRQ performance issue (patch 1),
        cleanups related to the fix for the IRQ performance problem
        (patches 2-6), and then extends the fragmented completion queue
        support that already exists in the net side of the driver to the
        ib side of the driver (patch 7).

      * Mostly IB, with 5 patches to net that are needed to support the
        remaining 10 patches to the IB subsystem. This series extends
        the current 'representor' framework when the mlx5 driver is in
        switchdev mode from being a netdev only construct to being a
        netdev/IB dev construct. The IB dev is limited to raw Eth queue
        pairs only, but by having an IB dev of this type attached to the
        representor for a switchdev port, it enables DPDK to work on the
        switchdev device.

      * All net related, but needed as infrastructure for the rdma
        driver

   - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers

   - SRP performance updates

   - IB uverbs write path cleanup patch series from Leon

   - Add RDMA_CM support to ib_srpt. This is disabled by default. Users
     need to set the port for ib_srpt to listen on in configfs in order
     for it to be enabled
     (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)

   - TSO and Scatter FCS support in mlx4

   - Refactor of modify_qp routine to resolve problems seen while
     working on new code that is forthcoming

   - More refactoring and updates of RDMA CM for containers support from
     Parav

   - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
     memory' user API features

   - Infrastructure updates for the new IOCTL interface, based on
     increased usage

   - ABI compatibility bug fixes to fully support 32 bit userspace on 64
     bit kernel as was originally intended. See the commit messages for
     extensive details

   - Syzkaller bugs and code cleanups motivated by them"

* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
  IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
  IB/mlx5: Device memory mr registration support
  net/mlx5: Mkey creation command adjustments
  IB/mlx5: Device memory support in mlx5_ib
  net/mlx5: Query device memory capabilities
  IB/uverbs: Add device memory registration ioctl support
  IB/uverbs: Add alloc/free dm uverbs ioctl support
  IB/uverbs: Add device memory capabilities reporting
  IB/uverbs: Expose device memory capabilities to user
  RDMA/qedr: Fix wmb usage in qedr
  IB/rxe: Removed GID add/del dummy routines
  RDMA/qedr: Zero stack memory before copying to user space
  IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
  IB/mlx5: Add information for querying IPsec capabilities
  IB/mlx5: Add IPsec support for egress and ingress
  {net,IB}/mlx5: Add ipsec helper
  IB/mlx5: Add modify_flow_action_esp verb
  IB/mlx5: Add implementation for create and destroy action_xfrm
  IB/uverbs: Introduce ESP steering match filter
  IB/uverbs: Add modify ESP flow_action
  ...
2018-04-06 17:35:43 -07:00
Mikhail Malygin
efc365e729 IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
On ppc64le arch rxe_add command causes oops in kernel log:

[   92.495140] Oops: Kernel access of bad area, sig: 11 [#1]
[   92.499710] SMP NR_CPUS=2048 NUMA pSeries
[   92.499792] Modules linked in: ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) nf_conntrack_netlink(E) nfnetlink(E) xfrm_user(E) iptable
_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) ip_tables(E) xt_conntrack(E) x_tables(E)
 nf_nat(E) nf_conntrack(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) af_packet(E) rpcrdma(E) ib_isert(E) iscsi_target_mod(E) i
b_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E) ib_srp(E) ib_ipoib(E) rdma_ucm(E) ib_ucm(E) ib_uverbs(E) ib_umad(E) bochs_drm(E) tt
m(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) drm(E) agpgart(E) virtio_rng(E) virtio_console(E) rtc_
generic(E) dm_ec(OEN) ttln_rdma(OEN) rdma_cm(E) configfs(E) iw_cm(E) ib_cm(E) rdma_rxe(E) ip6_udp_tunnel(E) udp_tunnel(E) ib_core(E) ql
a2xxx(E)
[   92.499832]  scsi_transport_fc(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E) ipmi_watchdog(E) ipmi_ssif(E) ipmi_poweroff(E) ipmi_powernv(EX) ipmi_devintf(E) ipmi_msghandler(E) dummy(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_service_time(E) scsi_transport_iscsi(E) sd_mod(E) sr_mod(E) cdrom(E) hid_generic(E) usbhid(E) virtio_blk(E) virtio_scsi(E) virtio_net(E) ibmvscsi(EX) scsi_transport_srp(E) xhci_pci(E) xhci_hcd(E) usbcore(E) usb_common(E) virtio_pci(E) virtio_ring(E) virtio(E) sunrpc(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
[   92.499834] Supported: No, Unsupported modules are loaded
[   92.499839] CPU: 3 PID: 5576 Comm: sh Tainted: G           OE   NX 4.4.120-ttln.17-default #1
[   92.499841] task: c0000000afe8a490 ti: c0000000beba8000 task.ti: c0000000beba8000
[   92.499842] NIP: c00000000008ba3c LR: c000000000027644 CTR: c00000000008ba10
[   92.499844] REGS: c0000000bebab750 TRAP: 0300   Tainted: G           OE   NX  (4.4.120-ttln.17-default)
[   92.499850] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28424428  XER: 20000000
[   92.499871] CFAR: 0000000000002424 DAR: 0000000000000208 DSISR: 40000000 SOFTE: 1
               GPR00: c000000000027644 c0000000bebab9d0 c000000000f09700 0000000000000000
               GPR04: d0000000043d7192 0000000000000002 000000000000001a fffffffffffffffe
               GPR08: 000000000000009c c00000000008ba10 d0000000043e5848 d0000000043d3828
               GPR12: c00000000008ba10 c000000007a02400 0000000010062e38 0000010020388860
               GPR16: 0000000000000000 0000000000000000 00000100203885f0 00000000100f6c98
               GPR20: c0000000b3f1fcc0 c0000000b3f1fc48 c0000000b3f1fbd0 c0000000b3f1fb58
               GPR24: c0000000b3f1fae0 c0000000b3f1fa68 00000000000005dc c0000000b3f1f9f0
               GPR28: d0000000043e5848 c0000000b3f1f900 c0000000b3f1f320 c0000000b3f1f000
[   92.499881] NIP [c00000000008ba3c] dma_get_required_mask_pSeriesLP+0x2c/0x1a0
[   92.499885] LR [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499886] Call Trace:
[   92.499891] [c0000000bebab9d0] [c0000000bebaba30] 0xc0000000bebaba30 (unreliable)
[   92.499894] [c0000000bebaba10] [c000000000027644] dma_get_required_mask+0x44/0xac
[   92.499904] [c0000000bebaba30] [d0000000043cb4b4] rxe_register_device+0xc4/0x430 [rdma_rxe]
[   92.499910] [c0000000bebabab0] [d0000000043c06c8] rxe_add+0x448/0x4e0 [rdma_rxe]
[   92.499915] [c0000000bebabb30] [d0000000043d28dc] rxe_net_add+0x4c/0xf0 [rdma_rxe]
[   92.499921] [c0000000bebabb60] [d0000000043d305c] rxe_param_set_add+0x6c/0x1ac [rdma_rxe]
[   92.499924] [c0000000bebabbf0] [c0000000000e78c0] param_attr_store+0xa0/0x180
[   92.499927] [c0000000bebabc70] [c0000000000e6448] module_attr_store+0x48/0x70
[   92.499932] [c0000000bebabc90] [c000000000391f60] sysfs_kf_write+0x70/0xb0
[   92.499935] [c0000000bebabcb0] [c000000000390f1c] kernfs_fop_write+0x18c/0x1e0
[   92.499939] [c0000000bebabd00] [c0000000002e22ac] __vfs_write+0x4c/0x1d0
[   92.499942] [c0000000bebabd90] [c0000000002e2f94] vfs_write+0xc4/0x200
[   92.499945] [c0000000bebabde0] [c0000000002e488c] SyS_write+0x6c/0x110
[   92.499948] [c0000000bebabe30] [c000000000009384] system_call+0x38/0xe4
[   92.499949] Instruction dump:
[   92.499954] 4e800020 3c4c00e8 3842dcf0 7c0802a6 f8010010 60000000 7c0802a6 fba1ffe8
[   92.499958] fbc1fff0 fbe1fff8 f8010010 f821ffc1 <e9230208> 7c7e1b78 2fa90000 419e0078
[   92.499962] ---[ end trace bed077e15eb420cf ]---

It fails in dma_get_required_mask, that has ppc-specific implementation,
and fail if provided device argument is NULL

Signed-off-by: Mikhail Malygin <mikhail@malygin.me>
Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:50 -06:00
Ariel Levkovich
6c29f57ea4 IB/mlx5: Device memory mr registration support
Adding mlx5_ib driver implementation for reg_dm_mr callback
which allows registering device memory (DM) as an MR for
local and remote access.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
cdbd0d2bae net/mlx5: Mkey creation command adjustments
This change updates the mlx5 interface to create mkey
on the device.

The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.

All the places that use the old access mode format are adjusted as
well.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
24da00164f IB/mlx5: Device memory support in mlx5_ib
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.

The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.

The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 13:04:49 -06:00
Ariel Levkovich
be934cca9e IB/uverbs: Add device memory registration ioctl support
Adding new ioctl method for the MR object - REG_DM_MR.

This command can be used by users to register an allocated
device memory buffer as an MR and receive lkey and rkey
to be used within work requests.

It is added as a new method under the MR object and using a new
ib_device callback - reg_dm_mr.
The command creates a standard ib_mr object which represents the
registered memory.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 11:16:39 -06:00
Ariel Levkovich
bee76d7ab5 IB/uverbs: Add alloc/free dm uverbs ioctl support
This change adds uverbs support for allocation/freeing
of device memory commands.

A new uverbs object is defined of type idr to represent
and track the new resource type allocation per context.

The API requires provider driver to implement 2 new ib_device
callbacks - one for allocation and one for deallocation which
return and accept (respectively) the ib_dm object which represents
the allocated memory on the device.

The support is added via the ioctl command infrastructure
only.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 11:16:39 -06:00
Ariel Levkovich
1d8eeb9f6a IB/uverbs: Add device memory capabilities reporting
This change allows vendors to report device memory capability
max_dm_size - to user via uverbs command.

Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 11:16:39 -06:00
Kalderon, Michal
09c4854fde RDMA/qedr: Fix wmb usage in qedr
This patch comes as a result of Sinan Kaya's work and the decision that
writel() must be a strong enough barrier for DMA.

wmb usages in qedr driver have either been removed where they were there
only to order DMA accesses, and replaced with smp_wmb and comments for the
places that the barrier was there for SMP reasons.

Fixes: 561e5d4896 ("RDMA/qedr: eliminate duplicate barriers on weakly-ordered archs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 11:10:14 -06:00
Parav Pandit
39e00b6cf6 IB/rxe: Removed GID add/del dummy routines
rxe driver's add_gid() and del_gid() callbacks are doing simple
checks which are already done by the ib core before invoking these
callback routines.
Therefore, code is simplified to skip implementing add_gid() and
del_gid() callback functions.
They are only invoked by ib_core if they are implemented.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 10:15:33 -06:00
Jason Gunthorpe
57939021e8 RDMA/qedr: Zero stack memory before copying to user space
The fact this struct was not init'd like all the others was missed when
the padding reserved field was added.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 71e80a4781 ("RDMA/qedr: Fix uABI structure layouts for 32/64 compat")
Acked-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05 10:11:37 -06:00
Matan Barak
2d93fc8569 IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
When a Raw Ethernet QP is created, we actually create a few objects.
One of these objects is a TIR. Currently, a TIR could hash (and spread
the traffic) by IP or port only. Adding a hashing by IPSec SPI to TIR
creation with the required UAPI bit.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Matan Barak
c03faa562d IB/mlx5: Add information for querying IPsec capabilities
Users should be able to query for IPSec support. Adding a few
capabilities bits as part of the driver specific part in
alloc_ucontext:
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_REQ_METADATA
	Payload's header is returned with metadata representing the
	IPSec decryption state.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_RX
	Support ESP_AES_GCM in ingress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_TX
	Support ESP_AES_GCM in egress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_SPI_RSS_ONLY
	Hardware doesn't support matching SPI in flow steering rules
	but just hashing and spreading the traffic accordingly.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:28 -06:00
Aviad Yehezkel
802c212568 IB/mlx5: Add IPsec support for egress and ingress
This commit introduces support for the esp_aes_gcm flow
specification for the Innova device. To that end we add
support for egress steering and some validations that an
IPsec rule is indeed valid.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Matan Barak
349705c193 IB/mlx5: Add modify_flow_action_esp verb
Adding implementation in mlx5 driver to modify action_xfrm object. This
merely call the accel layer. Currently a user can modify only the
ESN parameters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:27 -06:00
Aviad Yehezkel
c6475a0bca IB/mlx5: Add implementation for create and destroy action_xfrm
Adding implementation in mlx5 driver to create and destroy action_xfrm
object. This merely call the accel layer.

A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states
that [s]he expects a metadata header to be added to the payload. This
header represents information regarding the transformation's state.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
56ab0b38b8 IB/uverbs: Introduce ESP steering match filter
Adding a new ESP steering match filter that could match against
spi and seq used in IPSec protocol.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
7d12f8d5a1 IB/uverbs: Add modify ESP flow_action
flow_actions of ESP type could be modified during runtime. This could be
common for example when ESN should be changed. Adding a new
UVERBS_FLOW_ACTION_ESP_MODIFY method for changing ESP parameters of an
existing ESP flow_action.
The new method uses the UVERBS_FLOW_ACTION_ESP_CREATE attributes, but
adds a new IB_FLOW_ACTION_ESP_FLAGS_MOD_ESP_ATTRS which means ESP_ATTRS
should be changed.
In addition, we add a new FLOW_ACTION_ESP_REPLAY_NONE replay type that
could be used when one wants to disable a replay protection over a
specific flow_action.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:26 -06:00
Matan Barak
9b82844197 IB/uverbs: Add action_handle flow steering specification
Binding a flow_action to flow steering rule requires using a new
specification. Therefore, adding such an IB_FLOW_SPEC_ACTION_HANDLE flow
specification.

Flow steering rules could use flow_action(s) and as of that we need to
avoid deleting flow_action(s) as long as they're being used.
Moreover, when the attached rules are deleted, action_handle reference
count should be decremented. Introducing a new mechanism of flow
resources to keep track on the attached action_handle(s). Later on, this
mechanism should be extended to other attached flow steering resources
like flow counters.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:25 -06:00
Matan Barak
2eb9beaee5 IB/uverbs: Add flow_action create and destroy verbs
A verbs application may receive and transmits packets using a data
path pipeline. Sometimes, the first stage in the receive pipeline or
the last stage in the transmit pipeline involves transforming a
packet, either in order to make it easier for later stages to process
it or to prepare it for transmission over the wire. Such transformation
could be stripping/encapsulating the packet (i.e. vxlan),
decrypting/encrypting it (i.e. ipsec), altering headers, doing some
complex FPGA changes, etc.

Some hardware could do such transformations without software data path
intervention at all. The flow steering API supports steering a
packet (either to a QP or dropping it) and some simple packet
immutable actions (i.e. tagging a packet). Complex actions, that may
change the packet, could bloat the flow steering API extensively.
Sometimes the same action should be applied to several flows.
In this case, it's easier to bind several flows to the same action and
modify it than change all matching flows.

Introducing a new flow_action object that abstracts any packet
transformation (out of a standard and well defined set of actions).
This flow_action object could be tied to a flow steering rule via a
new specification.

Currently, we support esp flow_action, which encrypts or decrypts a
packet according to the given parameters. However, we present a
flexible schema that could be used to other transformation actions tied
to flow rules.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:25 -06:00
Matan Barak
766d8551ad IB/uverbs: Refactor kern_spec_to_ib_spec_filter
The current implementation of kern_spec_to_ib_spec_filter, which takes
a uAPI based flow steering specification and creates the respective kernel
API flow steering structure, gets a ib_uverbs_flow_spec structure.
The new flow_action uAPI gets a match mask and filter from user-space
which aren't encoded in the flow steering's ib_uverbs_flow_spec structure.
Exporting the logic out of kern_spec_to_ib_spec_filter to get user-space
blobs rather than ib_uverbs_flow_spec structure.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Boris Pismenny
8510020d29 IB/mlx4: Check for egress flow steering
ConnectX3 doesn't support egress flow steering. Return an EOPNOTSUPP
error when such a flow is being created.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Matan Barak
494c5580aa IB/uverbs: Add enum attribute type to ioctl() interface
Methods sometimes need to get one attribute out of a group of
pre-defined attributes. This is an enum-like behavior. Since
this is a common requirement, we add a new ENUM attribute to the
generic uverbs ioctl() layer. This attribute is embedded in methods,
like any other attributes we currently have. ENUM attributes point to
an array of standard UVERBS_ATTR_PTR_IN. The user-space encodes the
enum's attribute id in the id field and the internal PTR_IN attr id in
the enum_data.elem_id field. This ENUM attribute could be shared by
several attributes and it can get UVERBS_ATTR_SPEC_F_MANDATORY flag,
stating this attribute must be supported by the kernel, like any other
attribute.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:24 -06:00
Matan Barak
8c84660bb4 IB/mlx5: Initialize the parsing tree root without the help of uverbs
In order to have a custom parsing tree, a provider driver needs to
assign its parsing tree to ib_device specs_tree field. Otherwise, the
uverbs client assigns a common default parsing tree for it.
In downstream patches, the mlx5_ib driver gains a custom parsing tree,
which contains both the common objects and a new flags field for the
UVERBS_FLOW_ACTION_ESP_CREATE command.
This patch makes mlx5_ib assign its own tree to specs_root, which
later on will be extended.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04 12:06:23 -06:00
Parav Pandit
414448d249 RDMA: Use ib_gid_attr during GID modification
Now that ib_gid_attr contains device, port and index, simplify the
provider APIs add_gid() and del_gid() to use device, port and index
fields from the ib_gid_attr attributes structure.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:34:16 -06:00
Parav Pandit
3e44e0ee08 IB/providers: Avoid null netdev check for RoCE
Now that IB core GID cache ensures that all RoCE entries have an
associated netdev remove null checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
14169e333e IB/providers: Avoid zero GID check for RoCE
Now that the IB core GID cache ensures that a zero GID doesn't exist in
the GID table remove zero GID checks from the provider drivers for
clarity.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:51 -06:00
Parav Pandit
598ff6bae6 IB/core: Refactor GID modify code for RoCE
Code is refactored to prepare separate functions for RoCE which can do more
complex operations related to reference counting, while still
maintainining code readability. This includes
(a) Simplification to not perform netdevice checks and modifications
for IB link layer.
(b) Do not add RoCE GID entry which has NULL netdevice; instead return
an error.
(c) If GID addition fails at provider level add_gid(), do not add the
entry in the cache and keep the entry marked as INVALID.
(d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they
can be used even for modifying default GIDs. This avoid some code
duplication in modifying default GIDs.
(e) find_gid() routine refers to the data entry flags to qualify a GID
as valid or invalid GID rather than depending on attributes and zeroness
of the GID content.
(f) gid_table_reserve_default() sets the GID default attribute at
beginning while setting up the GID table. There is no need to use
default_gid flag in low level functions such as write_gid(), add_gid(),
del_gid(), as they never need to update the DEFAULT property of the GID
entry while during GID table update.

As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer
searchable as described below.

A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB
spec version 1.3 section 4.1.1, point (6) whose snippet is below.

"The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as
the Reserved GID. It shall never be assigned to any endport. It shall
not be used as a destination address or in a global routing header
(GRH)."

GID table cache now only stores valid GID entries. Before this patch,
Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using
ib_find_cached_gid_by_port() and other similar find routines.

Zero GID is no longer searchable as it shall not to be present in GRH or
path recored entry as described in IB spec version 1.3 section 4.1.1,
point (6), section 12.7.10 and section 12.7.20.

ib_cache_update() is simplified to check link layer once, use unified
locking scheme for all link layers, removed temporary gid table
allocation/free logic.

Additionally,
(a) Expand ib_gid_attr to store port and index so that GID query
routines can get port and index information from the attribute structure.
(b) Expand ib_gid_attr to store device as well so that in future code when
GID reference counting is done, device is used to reach back to the GID
table entry.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:50 -06:00
Parav Pandit
f35faa4ba9 IB/core: Simplify ib_query_gid to always refer to cache
Currently following inconsistencies exist.
1. ib_query_gid() returns GID from the software cache for a RoCE port
and returns GID from the HCA for an IB port.
This is incorrect because software GID cache is maintained regardless
of HCA port type.

2. GID is queries from the HCA via ib_query_gid and updated in the
software cache for IB link layer. Both of them might not be in sync.

ULPs such as SRP initiator, SRP target, IPoIB driver have historically
used ib_query_gid() API to query the GID. However CM used cached version
during CM processing, When software cache was introduced, this
inconsitency remained.

In order to simplify, improve readability and avoid link layer
specific above inconsistencies, this patch brings following changes.

1. ib_query_gid() always refers to the cache layer regardless of link
layer.

2. cache module who reads the GID entry from HCA and builds the cache,
directly invokes the HCA provider verb's query_gid() callback function.

3. ib_query_port() is being called in early stage where GID cache is not
yet build while reading port immutable property. Therefore it needs to
read the default GID from the HCA for IB link layer to publish the
subnet prefix.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:50 -06:00
Parav Pandit
0e1f9b9244 RDMA/providers: Simplify query_gid callback of RoCE providers
ib_query_gid() fetches the GID from the software cache maintained in
ib_core for RoCE ports.

Therefore, simplify the provider drivers for RoCE to treat query_gid()
callback as never called for RoCE, and only require non-RoCE devices to
implement it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:33:47 -06:00
Roland Dreier
8435168d50 RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
Check to make sure that ctx->cm_id->device is set before we use it.
Otherwise userspace can trigger a NULL dereference by doing
RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device.

Cc: <stable@vger.kernel.org>
Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 21:06:14 -06:00
Parav Pandit
ca486a3b33 IB/qedr: Remove GID add/del dummy routines
qedr driver's add_gid() and del_gid() callbacks are doing simple
checks which are already done by the ib core before invoking these
callback routines.

Therefore, code is simplified to skip implementing add_gid() and
del_gid() callback functions.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:46:29 -06:00
Shiraz Saleem
3e64f8d6f5 i40iw: Remove pre-production workaround for resource profile 1
Support for resource profile 1 is currenlty deprecated due to
a pre-production errata. Remove this workaround as its no longer
needed.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:40:39 -06:00
Jason Gunthorpe
41d902cb7c RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp
This structure is pushed down the ex and the non-ex path, so it needs to be
aligned to 8 bytes to go through ex without implicit padding.

Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so
take the approach of just copying the minimum length.

New user space will consistently provide 8 bytes in both cases.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 13:38:40 -06:00
Gustavo A. R. Silva
689a8e3193 IB/ocrdma_hw: Remove redundant checks and goto labels
Check on return values and goto label mbx_err are unnecessary.

Addresses-Coverity-ID: 1271151 ("Identical code for different branches")
Addresses-Coverity-ID: 1268788 ("Identical code for different branches")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 10:43:35 -06:00
Yuval Shaia
12ed56bad9 IB/ipoib: Delete unused struct
This structure is not needed since the introduction of commit
'c42687784b9a ("IB/ipoib: Scatter-Gather support in connected mode")'

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03 10:42:40 -06:00
David S. Miller
c0b458a946 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:

1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
   MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE

2) In 'net-next' params->log_rq_size is renamed to be
   params->log_rq_mtu_frames.

3) In 'net-next' params->hard_mtu is added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-01 19:49:34 -04:00
David S. Miller
d4069fe6fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-03-31

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add raw BPF tracepoint API in order to have a BPF program type that
   can access kernel internal arguments of the tracepoints in their
   raw form similar to kprobes based BPF programs. This infrastructure
   also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which
   returns an anon-inode backed fd for the tracepoint object that allows
   for automatic detach of the BPF program resp. unregistering of the
   tracepoint probe on fd release, from Alexei.

2) Add new BPF cgroup hooks at bind() and connect() entry in order to
   allow BPF programs to reject, inspect or modify user space passed
   struct sockaddr, and as well a hook at post bind time once the port
   has been allocated. They are used in FB's container management engine
   for implementing policy, replacing fragile LD_PRELOAD wrapper
   intercepting bind() and connect() calls that only works in limited
   scenarios like glibc based apps but not for other runtimes in
   containerized applications, from Andrey.

3) BPF_F_INGRESS flag support has been added to sockmap programs for
   their redirect helper call bringing it in line with cls_bpf based
   programs. Support is added for both variants of sockmap programs,
   meaning for tx ULP hooks as well as recv skb hooks, from John.

4) Various improvements on BPF side for the nfp driver, besides others
   this work adds BPF map update and delete helper call support from
   the datapath, JITing of 32 and 64 bit XADD instructions as well as
   offload support of bpf_get_prandom_u32() call. Initial implementation
   of nfp packet cache has been tackled that optimizes memory access
   (see merge commit for further details), from Jakub and Jiong.

5) Removal of struct bpf_verifier_env argument from the print_bpf_insn()
   API has been done in order to prepare to use print_bpf_insn() soon
   out of perf tool directly. This makes the print_bpf_insn() API more
   generic and pushes the env into private data. bpftool is adjusted
   as well with the print_bpf_insn() argument removal, from Jiri.

6) Couple of cleanups and prep work for the upcoming BTF (BPF Type
   Format). The latter will reuse the current BPF verifier log as
   well, thus bpf_verifier_log() is further generalized, from Martin.

7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read
   and write support has been added in similar fashion to existing
   IPv6 IPV6_TCLASS socket option we already have, from Nikita.

8) Fixes in recent sockmap scatterlist API usage, which did not use
   sg_init_table() for initialization thus triggering a BUG_ON() in
   scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and
   uses a small helper sg_init_marker() to properly handle the affected
   cases, from Prashant.

9) Let the BPF core follow IDR code convention and therefore use the
   idr_preload() and idr_preload_end() helpers, which would also help
   idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory
   pressure, from Shaohua.

10) Last but not least, a spelling fix in an error message for the
    BPF cookie UID helper under BPF sample code, from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-31 23:33:04 -04:00
Linus Torvalds
d89b9f5029 Fifth pull request for 4.16-rc
Bug fixes:
 - qedr driver bugfixes causing application hangs, wrong uapi errnos, and a
   race condition
 - 3 syzkaller found bugfixes in the ucma uapi
 
 Regression fixes for things introduced in 4.16:
 - Crash on error introduced in mlx5 UMR flow
 - Crash on module unload/etc introduced by bad interaction of restrack
   and mlx5 patches this cycle
 - Typo in a two line syzkaller bugfix causing a bad regression
 - Coverity report of nonsense code in hns driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJavTPwAAoJEDht9xV+IJsaMtIP/04hM1pWJAxCPtxlXFqlfLnQ
 llGvDlzyGUrFlSbDITmXXS3nVFtk36SM6Eqqa48yi7oZF+2+4JjlaqIUiYXmAOOR
 ocvpDB4QKXgnjAc9mIyJ8SOILhmSDOwwbueaKBClnyPIj5wGvrKlnAdeGDgPeuSU
 Jcmect5penbU4U44m4JtbqSNIRWuoUvrbQ6ioftHV32RnXBRyrP1KxXtM3tVvav8
 TlBgCt6zWhab1u6MGEebJgx97eFwhgc1Bd1mIJv9TPPEplC8kqaNRFrsctsyDUxu
 h674VNE5YyzoLBrUGI4IzvL5f3p8OEa18wslJB5ZyL6qiorj5y4vf+lSiQT8qOSF
 NW+jmsVEA0l0trVkl5r0qhzIV+EVTgSoR4C5wKbxEwMx51PmG/utPqFV+N511In7
 GPqmRL3KuJPBZ0TIepwoH57FwrXdfc/UiF95duLizHojJgMpbnn18pQUBj2Fofch
 Gs9IjipO8AxpYybRoGvBC7fMTrzs5IV3yNj2qxu2mCq0tRQMu1cbOh6y//YZKqjL
 wQFtUSX2rO/rcvABAgpEP7a/9aLEj5m+vsFpEtigteRQRggOH6dAxXYzK8qKFqPK
 4C9+5ybpAJqjjMuFxjd9n6BIYJG8gEhSGIyOaeP6cK016AQj4FN8ZgLwR9nTokQS
 p9DdyVZWFpqAuCWV5ML1
 =0JcN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "It has been fairly silent lately on our -rc front. Big queue of
  patches on the mailing list going to for-next though.

  Bug fixes:
   - qedr driver bugfixes causing application hangs, wrong uapi errnos,
     and a race condition
   - three syzkaller found bugfixes in the ucma uapi

  Regression fixes for things introduced in 4.16:
   - Crash on error introduced in mlx5 UMR flow
   - Crash on module unload/etc introduced by bad interaction of
     restrack and mlx5 patches this cycle
   - Typo in a two line syzkaller bugfix causing a bad regression
   - Coverity report of nonsense code in hns driver"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/ucma: Introduce safer rdma_addr_size() variants
  RDMA/hns: ensure for-loop actually iterates and free's buffers
  RDMA/ucma: Check that device exists prior to accessing it
  RDMA/ucma: Check that device is connected prior to access it
  RDMA/rdma_cm: Fix use after free race with process_one_req
  RDMA/qedr: Fix QP state initialization race
  RDMA/qedr: Fix rc initialization on CNQ allocation failure
  RDMA/qedr: fix QP's ack timeout configuration
  RDMA/ucma: Correct option size check using optlen
  RDMA/restrack: Move restrack_clean to be symmetrical to restrack_init
  IB/mlx5: Don't clean uninitialized UMR resources
2018-03-29 19:23:24 -10:00
Bharat Potnuri
44016b3466 iw_cxgb4: print mapped ports correctly
c4iw_ep_common structure holds the mapped addresses, so while printing
them, use appropriate pointers.

Fixes: bab572f1d ("iw_cxgb4: Guard against null cm_id in dump_ep/qp")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 16:00:59 -06:00
Parav Pandit
218b9e3eb8 RDMA/cma: Move rdma_cm_state to cma_priv.h
rdma_cm_state enum is internal to rdma_cm kernel module.
It is not required to expose state enums to ULP modules.
So lets keep its scope limited to rdma_cm module in cma_priv.h file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:54:21 -06:00
Parav Pandit
fd59015d68 IB/addr: Constify dst_entry pointer
Make dst_entry pointer as const struct dst_entry* to improve code
readablity to make sure that dst structure fields are not modified by
various functions which are using it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:54:20 -06:00
Jason Gunthorpe
6f57c933a4 RDMA: Use u64_to_user_ptr everywhere
This is already used in many places, get the rest of them too, only
to make the code a bit clearer & simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:42:29 -06:00
Leon Romanovsky
5b2cc79de8 RDMA/nldev: Provide netdevice name and index
Export the net device name and index to easily find connection
between IB devices and relevant net devices.

We also updated the comment regarding the devices without FW.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:32:40 -06:00
Zhu Yanjun
99dae69025 IB/rxe: optimize mcast recv process
In mcast recv process, the function skb_clone is used. In fact,
the refcount can be increased to replace cloning a new skb since
the original skb will not be modified before it is freed.

This can make the performance better and save the memory.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:25:22 -06:00
Colin Ian King
a343e3f89e qedr: Fix spelling mistake: "hanlde" -> "handle"
Trivial fix to spelling mistake in DP_ERR message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:19:23 -06:00
Michal Kalderon
50bc60cb15 qed*: Utilize FW 8.33.11.0
This FW contains several fixes and features

RDMA Features
- SRQ support
- XRC support
- Memory window support
- RDMA low latency queue support
- RDMA bonding support

RDMA bug fixes
- RDMA remote invalidate during retransmit fix
- iWARP MPA connect interop issue with RTR fix
- iWARP Legacy DPM support
- Fix MPA reject flow
- iWARP error handling
- RQ WQE validation checks

MISC
- Fix some HSI types endianity
- New Restriction: vlan insertion in core_tx_bd_data can't be set
  for LB packets

ETH
- HW QoS offload support
- Fix vlan, dcb and sriov flow of VF sending a packet with
  inband VLAN tag instead of default VLAN
- Allow GRE version 1 offloads in RX flow
- Allow VXLAN steering

iSCSI / FcoE
- Fix bd availability checking flow
- Support 256th sge proerly in iscsi/fcoe retransmit
- Performance improvement
- Fix handle iSCSI command arrival with AHS and with immediate
- Fix ipv6 traffic class configuration

DEBUG
- Update debug utilities

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:18:02 -04:00
David S. Miller
6e2135ce54 mlx5-updates-2018-03-27 (Misc updates & SQ recovery)
This series contains Misc updates and cleanups for mlx5e rx path
 and SQ recovery feature for tx path.
 
 From Tariq: (RX updates)
     - Disable Striding RQ when PCI devices, striding RQ limits the use
       of CQE compression feature, which is very critical for slow PCI
       devices performance, in this change we will prefer CQE compression
       over Striding RQ only on specific "slow"  PCIe links.
     - RX path cleanups
     - Private flag to enable/disable striding RQ
 
 From Eran: (TX fast recovery)
     - TX timeout logic improvements, fast SQ recovery and TX error reporting
       if a HW error occurs while transmitting on a specific SQ, the driver will
       ignore such error and will wait for TX timeout to occur and reset all
       the rings. Instead, the current series improves the resiliency for such
       HW errors by detecting TX completions with errors, which will report them
       and perform a fast recover for the specific faulty SQ even before a TX
       timeout is detected.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJauuK/AAoJEEg/ir3gV/o+Ro4IAIBcQl4S/6jPttGDOPwElU4b
 Q8MuQbnB9k36Kf6x3GrhBqwtoS3OYBLvfyOeZsghD8uyzklhnqR/psrCt1YhLXq7
 aI5iFbuQucCUwIYBscuZegXBs+PTJYALoil05hOwSkwhEBZR53ZvdRQ9Jg0G0QiY
 sYEYAl41A1DWQOLET5lwleUpVQnVnNvIcXqNHKBKbS4f8rmLpfsZ6YAxJiiIjeDJ
 gLbsaxUyY3p77594okBk6DuFBWZsEAKwumspdSc92OEMrZSYE5ugNL+UgVEE7I8F
 JgK/4OUarw3K7xwhAg//ZhTM1iRRdJv0Sz2GPhhWRzYB0jWACc3ZG2/M/WmxFIg=
 =Uslu
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-03-27 (Misc updates & SQ recovery)

This series contains Misc updates and cleanups for mlx5e rx path
and SQ recovery feature for tx path.

From Tariq: (RX updates)
    - Disable Striding RQ when PCI devices, striding RQ limits the use
      of CQE compression feature, which is very critical for slow PCI
      devices performance, in this change we will prefer CQE compression
      over Striding RQ only on specific "slow"  PCIe links.
    - RX path cleanups
    - Private flag to enable/disable striding RQ

From Eran: (TX fast recovery)
    - TX timeout logic improvements, fast SQ recovery and TX error reporting
      if a HW error occurs while transmitting on a specific SQ, the driver will
      ignore such error and will wait for TX timeout to occur and reset all
      the rings. Instead, the current series improves the resiliency for such
      HW errors by detecting TX completions with errors, which will report them
      and perform a fast recover for the specific faulty SQ even before a TX
      timeout is detected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 14:01:40 -04:00
Kirill Tkhai
f0b07bb151 net: Introduce net_rwsem to protect net_namespace_list
rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
he/she has no a possibility to do that without exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
and this can't be sleepable. Also, sometimes it may be need
really prevent net_namespace_list growth, so for_each_net_rcu()
is not fit there.

This patch introduces new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows not-exclusive iterations over net
namespaces list. It allows to stop using rtnl_lock()
in several places (what is made in next patches) and makes
less the time, we keep rtnl_mutex. Here we just add new lock,
while the explanation of we can remove rtnl_lock() there are
in next patches.

Fine grained locks generally are better, then one big lock,
so let's do that with net_namespace_list, while the situation
allows that.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29 13:47:53 -04:00
Steve Wise
2253fc0caa RDMA/CMA: Add rdma_port_space to UAPI
Since the rdma_port_space enum is being passed between user and kernel for
user cm_id setup, we need it in a UAPI header.  So add it to
rdma_user_cm.h.

This also fixes the cm_id restrack changes which pass up the port space
value via the RDMA_NLDEV_ATTR_RES_PS attribute.

Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28 20:50:45 -06:00
Roland Dreier
84652aefb3 RDMA/ucma: Introduce safer rdma_addr_size() variants
There are several places in the ucma ABI where userspace can pass in a
sockaddr but set the address family to AF_IB.  When that happens,
rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
and the ucma kernel code might end up copying past the end of a buffer
not sized for a struct sockaddr_ib.

Fix this by introducing new variants

    int rdma_addr_size_in6(struct sockaddr_in6 *addr);
    int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);

that are type-safe for the types used in the ucma ABI and return 0 if the
size computed is bigger than the size of the type passed in.  We can use
these new variants to check what size userspace has passed in before
copying any addresses.

Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28 16:13:36 -06:00
Alexei Starovoitov
c105547501 treewide: remove large struct-pass-by-value from tracepoint arguments
- fix trace_hfi1_ctxt_info() to pass large struct by reference instead of by value
- convert 'type array[]' tracepoint arguments into 'type *array',
  since compiler will warn that sizeof('type array[]') == sizeof('type *array')
  and later should be used instead

The CAST_TO_U64 macro in the later patch will enforce that tracepoint
arguments can only be integers, pointers, or less than 8 byte structures.
Larger structures should be passed by reference.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:18 +02:00
Eran Ben Elisha
1acae6b030 mlx5: Move dump error CQE function out of mlx5_ib for code sharing
Move mlx5_ib dump error CQE implementation to mlx5 CQ header file in
order to use it in a downstream patch from mlx5e.

In addition, use print_hex_dump instead of manual dumping of the buffer.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27 17:17:28 -07:00
Eran Ben Elisha
2816077127 mlx5_{ib,core}: Add query SQ state helper function
Move query SQ state function from mlx5_ib to mlx5_core in order to
have it in shared code.

It will be used in a downstream patch from mlx5e.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-27 17:17:28 -07:00
Parav Pandit
190fb9c4d1 IB/core: Refer to RoCE port property to decide building cache
IB core maintains the GID cache entries for the GID table.
This cache table has to be maintained regardless of HCA's
support of GID table.
For IB and iWarp ports, cache is created by querying the HCA.
For RoCE cache is created based on netdev events.

Therefore just refer to the RoCE port property of the {device, port} to
decide whether to build cache by querying HCA or from netdev events.
There is no need to check if HCA support GID table or not.

ib_cache_update() referred to RoCE attribute before validating
port. Though in all current callers port is valid, it is incorrect
to query RoCE port property before validating the port. Therefore,
rdma_protocol_roce() check is done after rdma_is_port_valid() verifies
that port is valid.

Fixes: 115b68aa6e ("IB/ocrdma: Removed GID add/del null routines")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit
22d24f75a1 IB/core: Search GID only for IB link layer
Even though API is only used by IPoIB driver, its incorrect to refer
RoCE GID table property to search for GID.

Look for only IB link layer to search for the GID.

Fixes: dbb12562f7 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit
4ab7cb4bf3 IB/core: Refer to RoCE port property instead of GID table property
ib_find_gid_by_filter() searches GID with filter only for RoCE link
layer regardless of HCA's support for GID table.
Therefore, right way to lookup is compare RoCE port property and not
the GID table property.

Fixes: 99b27e3b5d ("IB/cache: Add ib_find_gid_by_filter cache API")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit
3401857ea3 IB/core: Generate GID change event regardless of RoCE GID table property
Due to following reasons, GID table event is generated regardless of GID
table property.

1. GID table cache is maintained at ib core layer regardless of link layer.
2. GID change event has no relation with IB link layer.
3. GID change event also doesn't depend on whether HCA supports GID table
or not.

Fixes: f3906bd360 ("IB/core: Refactor GID cache's ib_dispatch_event")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit
97c45c2c28 IB/cm: Block processing alternate path handling RoCE Rx cm messages
Due to below reasons, it is better to not support alternate path receive
messages for RoCE in near term.

1. Alternate path for RoCE is not supported at rdmacm layer.
2. It is not supported in uverbs/core layer for RoCE.
3. Alternate path for IPv6 for link local address cannot resolve route
determinstically without a valid incoming interface id whose usecase
make sense only with dual port mode.
4. init_av_from_path while processing LAP messages for IB and RoCE can
lead to adding duplicate entry of AV into the port list, leads to list
corruption.
5. rdma-core userspace a well known userspace implementation has removed
support of libucm which use ucm.ko module, which is the only module that
can trigger alternate path related messages.
6. ucm kernel module is requested to be removed from the IB core in
patch [1].

[1] https://patchwork.kernel.org/patch/10268503/

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Mark Bloch
e945130b52 IB/core: Protect against concurrent access to hardware stats
Currently access to hardware stats buffer isn't protected, this can
result in multiple writes and reads at the same time to the same
memory location. This can lead to providing an incorrect value to
the user. Add a mutex to protect against it.

Fixes: b40f4757da ("IB/core: Make device counter infrastructure dynamic")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 15:07:21 -06:00
Colin Ian King
38759d6175 RDMA/hns: ensure for-loop actually iterates and free's buffers
The current for-loop zeros variable i and only loops once, hence
not all the buffers are free'd.  Fix this by setting i correctly.

Detected by CoverityScan, CID#1463415 ("Operands don't affect result")

Fixes: a5073d6054 ("RDMA/hns: Add eq support of hip08")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 15:06:34 -06:00
Majd Dibbiny
c8d75a980f IB/mlx5: Respect new UMR capabilities
In some firmware configuration, UMR usage from Virtual Functions is restricted.
This information is published to the driver using new capability bits.

Avoid using UMRs in these cases and use the Firmware slow-path flow to create
mkeys and populate them with Virtual to Physical address translation.

Older drivers that do not have this patch, will end up using memory keys that
aren't populated with Virtual to Physical address translation that is done
part of the UMR work.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
Majd Dibbiny
ea8af0d2f2 IB/mlx5: Enable ECN capable bits for UD RoCE v2 QPs
When working with RC QPs, the FW sets the ECN capable bits for all
the RoCE v2 packets. On the other hand, for UD QPs, the driver needs
to set the the ECN capable bits in the Address Handler since the HW
generates each packet according to the Address Handler and not
the QP context.

If ECN is not enabled in NIC or switch, these bits are ignored.

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
Jason Gunthorpe
f2e9bfac13 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
With 32 bit compilation several of the fields become misaligned here.
Fixing this is an ABI break for 32 bit rxe and it is in well used
portions of the rxe ABI.

To handle this we bump the ABI version, as expected. However the user
space driver doesn't handle it properly today, so all existing user
space continues to work.

Updated userspace will start to require the necessary kernel version.

We don't expect there to be any 32 bit users of rxe. Most likely cases,
such as ARM 32 already generally don't work because rxe does not handle
the CPU cache properly on its shared with userspace pages.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:09 -06:00
Jason Gunthorpe
611cb92b08 RDMA/ucma: Fix uABI structure layouts for 32/64 compat
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.

The kernel requires it to be the expected length or longer so 32 bit
builds running on a 64 bit kernel will not work.

Retain full compat by having all kernels accept a struct with or without
the trailing reserved field.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:08 -06:00
Leon Romanovsky
c8d3bcbfc5 RDMA/ucma: Check that device exists prior to accessing it
Ensure that device exists prior to accessing its properties.

Reported-by: <syzbot+71655d44855ac3e76366@syzkaller.appspotmail.com>
Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:10:45 -06:00
Leon Romanovsky
4b658d1bbc RDMA/ucma: Check that device is connected prior to access it
Add missing check that device is connected prior to access it.

[   55.358652] BUG: KASAN: null-ptr-deref in rdma_init_qp_attr+0x4a/0x2c0
[   55.359389] Read of size 8 at addr 00000000000000b0 by task qp/618
[   55.360255]
[   55.360432] CPU: 1 PID: 618 Comm: qp Not tainted 4.16.0-rc1-00071-gcaf61b1b8b88 #91
[   55.361693] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[   55.363264] Call Trace:
[   55.363833]  dump_stack+0x5c/0x77
[   55.364215]  kasan_report+0x163/0x380
[   55.364610]  ? rdma_init_qp_attr+0x4a/0x2c0
[   55.365238]  rdma_init_qp_attr+0x4a/0x2c0
[   55.366410]  ucma_init_qp_attr+0x111/0x200
[   55.366846]  ? ucma_notify+0xf0/0xf0
[   55.367405]  ? _get_random_bytes+0xea/0x1b0
[   55.367846]  ? urandom_read+0x2f0/0x2f0
[   55.368436]  ? kmem_cache_alloc_trace+0xd2/0x1e0
[   55.369104]  ? refcount_inc_not_zero+0x9/0x60
[   55.369583]  ? refcount_inc+0x5/0x30
[   55.370155]  ? rdma_create_id+0x215/0x240
[   55.370937]  ? _copy_to_user+0x4f/0x60
[   55.371620]  ? mem_cgroup_commit_charge+0x1f5/0x290
[   55.372127]  ? _copy_from_user+0x5e/0x90
[   55.372720]  ucma_write+0x174/0x1f0
[   55.373090]  ? ucma_close_id+0x40/0x40
[   55.373805]  ? __lru_cache_add+0xa8/0xd0
[   55.374403]  __vfs_write+0xc4/0x350
[   55.374774]  ? kernel_read+0xa0/0xa0
[   55.375173]  ? fsnotify+0x899/0x8f0
[   55.375544]  ? fsnotify_unmount_inodes+0x170/0x170
[   55.376689]  ? __fsnotify_update_child_dentry_flags+0x30/0x30
[   55.377522]  ? handle_mm_fault+0x174/0x320
[   55.378169]  vfs_write+0xf7/0x280
[   55.378864]  SyS_write+0xa1/0x120
[   55.379270]  ? SyS_read+0x120/0x120
[   55.379643]  ? mm_fault_error+0x180/0x180
[   55.380071]  ? task_work_run+0x7d/0xd0
[   55.380910]  ? __task_pid_nr_ns+0x120/0x140
[   55.381366]  ? SyS_read+0x120/0x120
[   55.381739]  do_syscall_64+0xeb/0x250
[   55.382143]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[   55.382841] RIP: 0033:0x7fc2ef803e99
[   55.383227] RSP: 002b:00007fffcc5f3be8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
[   55.384173] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc2ef803e99
[   55.386145] RDX: 0000000000000057 RSI: 0000000020000080 RDI: 0000000000000003
[   55.388418] RBP: 00007fffcc5f3c00 R08: 0000000000000000 R09: 0000000000000000
[   55.390542] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400480
[   55.392916] R13: 00007fffcc5f3cf0 R14: 0000000000000000 R15: 0000000000000000
[   55.521088] Code: e5 4d 1e ff 48 89 df 44 0f b6 b3 b8 01 00 00 e8 65 50 1e ff 4c 8b 2b 49
8d bd b0 00 00 00 e8 56 50 1e ff 41 0f b6 c6 48 c1 e0 04 <49> 03 85 b0 00 00 00 48 8d 78 08
48 89 04 24 e8 3a 4f 1e ff 48
[   55.525980] RIP: rdma_init_qp_attr+0x52/0x2c0 RSP: ffff8801e2c2f9d8
[   55.532648] CR2: 00000000000000b0
[   55.534396] ---[ end trace 70cee64090251c0b ]---

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Fixes: d541e45500 ("IB/core: Convert ah_attr from OPA to IB when copying to user")
Reported-by: <syzbot+7b62c837c2516f8f38c8@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:10:45 -06:00
Jason Gunthorpe
9137108cc3 RDMA/rdma_cm: Fix use after free race with process_one_req
process_one_req() can race with rdma_addr_cancel():

           CPU0                                 CPU1
           ====                                 ====
 process_one_work()
  debug_work_deactivate(work);
  process_one_req()
                                        rdma_addr_cancel()
	                                  mutex_lock(&lock);
 			    	           set_timeout(&req->work,..);
                                              __queue_work()
				   	       debug_work_activate(work);
	                                  mutex_unlock(&lock);

   mutex_lock(&lock);
[..]
	list_del(&req->list);
   mutex_unlock(&lock);
[..]

   // ODEBUG explodes since the work is still queued.
   kfree(req);

Causing ODEBUG to detect the use after free:

ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165
WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288
kvm: emulating exchange as write
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: ib_addr process_one_req
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x194/0x24d lib/dump_stack.c:53
 panic+0x1e4/0x41c kernel/panic.c:183
 __warn+0x1dc/0x200 kernel/panic.c:547
 report_bug+0x1f4/0x2b0 lib/bug.c:186
 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
 fixup_bug arch/x86/kernel/traps.c:247 [inline]
 do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
 __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
 debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
 kfree+0xc7/0x260 mm/slab.c:3799
 process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
 process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
 worker_thread+0x223/0x1990 kernel/workqueue.c:2247
 kthread+0x33c/0x400 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

Fixes: 5fff41e1f8 ("IB/core: Fix race condition in resolving IP to MAC")
Reported-by: <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 13:19:01 -06:00
Kirill Tkhai
2f635ceeb2 net: Drop pernet_operations::async
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-27 13:18:09 -04:00
Kirill Tkhai
070f2d7e26 net: Drop NETDEV_UNREGISTER_FINAL
Last user is gone after bdf5bd7f21 "rds: tcp: remove
register_netdevice_notifier infrastructure.", so we can
remove this netdevice command. This allows to delete
rtnl_lock() in netdev_run_todo(), which is hot path for
net namespace unregistration.

dev_change_net_namespace() and netdev_wait_allrefs()
have rcu_barrier() before NETDEV_UNREGISTER_FINAL call,
and the source commits say they were introduced to
delemit the call with NETDEV_UNREGISTER, but this patch
leaves them on the places, since they require additional
analysis, whether we need in them for something else.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 11:34:00 -04:00
Kirill Tkhai
3e0c2dbfea infiniband: Replace usnic_ib_netdev_event_to_string() with netdev_cmd_to_name()
This function just calls netdev_cmd_to_name().

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 11:34:00 -04:00
Kirill Tkhai
ede2762d93 net: Make NETDEV_XXX commands enum { }
This patch is preparation to drop NETDEV_UNREGISTER_FINAL.
Since the cmd is used in usnic_ib_netdev_event_to_string()
to get cmd name, after plain removing NETDEV_UNREGISTER_FINAL
from everywhere, we'd have holes in event2str[] in this
function.

Instead of that, let's make NETDEV_XXX commands names
available for everyone, and to define netdev_cmd_to_name()
in the way we won't have to shaffle names after their
numbers are changed.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-26 11:33:26 -04:00
Steve Wise
f215a3d244 iw_cxgb4: Add ib_device->get_netdev support
This is useful to rdma ULPs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 11:12:51 -06:00
Parav Pandit
114cc9c4b1 IB/cma: Resolve route only while receiving CM requests
Currently CM request for RoCE follows following flow.
rdma_create_id()
rdma_resolve_addr()
rdma_resolve_route()
For RC QPs:
rdma_connect()
->cma_connect_ib()
  ->ib_send_cm_req()
    ->cm_init_av_by_path()
      ->ib_init_ah_attr_from_path()
For UD QPs:
rdma_connect()
->cma_resolve_ib_udp()
  ->ib_send_cm_sidr_req()
    ->cm_init_av_by_path()
      ->ib_init_ah_attr_from_path()

In both the flows, route is already resolved before sending CM requests.
Therefore, code is refactored to avoid resolving route second time in
ib_cm layer.
ib_init_ah_attr_from_path() is extended to resolve route when it is not
yet resolved for RoCE link layer. This is achieved by caller setting
route_resolved field in path record whenever it has route already
resolved.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 10:58:05 -06:00
David S. Miller
03fe2debbb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel
adds.  Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.

The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:

====================

    Due to bug fixes found by the syzkaller bot and taken into the for-rc
    branch after development for the 4.17 merge window had already started
    being taken into the for-next branch, there were fairly non-trivial
    merge issues that would need to be resolved between the for-rc branch
    and the for-next branch.  This merge resolves those conflicts and
    provides a unified base upon which ongoing development for 4.17 can
    be based.

    Conflicts:
            drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
            (IB/mlx5: Fix cleanup order on unload) added to for-rc and
            commit b5ca15ad7e (IB/mlx5: Add proper representors support)
            add as part of the devel cycle both needed to modify the
            init/de-init functions used by mlx5.  To support the new
            representors, the new functions added by the cleanup patch
            needed to be made non-static, and the init/de-init list
            added by the representors patch needed to be modified to
            match the init/de-init list changes made by the cleanup
            patch.
    Updates:
            drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
            prototypes added by representors patch to reflect new function
            names as changed by cleanup patch
            drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
            stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 11:31:58 -04:00
Parav Pandit
98f1f4e0ed IB/core: Refer to RoCE port property instead of GID table property
ib_query_gid() in commit [1] refers to RoCE GID table capability of
the HCA using rdma_cap_roce_gid_table().
ib_core maintains the GID table cache regardless of the HCA provider
drivers capability to maintain RoCE GID table.
Therefore, whether to return a GID table entry from the software cache or
from HCA should be done based on whether the port is RoCE or not.

[1] commit 03db3a2d81 ("IB/core: Add RoCE GID table management")

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:49 -06:00
Leon Romanovsky
03286030ac RDMA/restrack: Remove ambiguity in resource track clean logic
The restrack clean routine had simple, but powerful WARN_ON check
to see if all resources are cleared prior to releasing device.

The WARN_ON check performed very well, but lack of information
which device caused to resource leak, the object type and origin
made debug to be fun and challenging at the same time.

The fact that all dumps were the same because restrack_clean() is
called in dealloc() didn't help either.

So let's fix spelling error and convert WARN_ON to be more debug
friendly. The dmesg cut below gives example of how the output
will look output for the case fixed in patch [1]

[  438.421372] restrack: ------------[ cut here ]------------
[  438.423448] restrack: BUG: RESTRACK detected leak of resources on mlx5_2
[  438.425600] restrack: Kernel PD object allocated by mlx5_ib is not freed
[  438.427753] restrack: Kernel CQ object allocated by mlx5_ib is not freed
[  438.429660] restrack: ------------[ cut here ]------------

[1] https://patchwork.kernel.org/patch/10298695/

Cc: Michal Kalderon <Michal.Kalderon@cavium.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:48 -06:00
Yixian Liu
e95955773d RDMA/hns: Fix cq record doorbell enable in kernel
Upon detecting both kernel and user space support record doorbell,
the kernel needs to enable this capability in hardware by db_en,
and it should take place before cq context configuration in
hns_roce_cq_alloc. Currently, db_en is configured after cq alloc
and db_map_user has similar problem.

Reported-by: Xiping Zhang <zhangxiping3@huawei.com>
Fixes: 9b44703d0a ("RDMA/hns: Support cq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:48 -06:00
Jason Gunthorpe
761fc376c9 RDMA/cxgb3: Use structs to describe the uABI instead of opencoding
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.

Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:48 -06:00
Kalderon, Michal
caf61b1b8b RDMA/qedr: Fix QP state initialization race
Once the FW is transitioned to error, FLUSH cqes can be received.
We want the driver to be aware of the fact that QP is already in error.

Without this fix, a user may see false error messages in the dmesg log,
mentioning that a FLUSH cqe was received while QP is not in error state.

Fixes: cecbcddf ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:38:41 -06:00
Kalderon, Michal
b15606f47b RDMA/qedr: Fix rc initialization on CNQ allocation failure
Return code wasn't set properly when CNQ allocation failed.
This only affect error message logging, currently user will
receive an error message that says the qedr driver load failed
with rc '0', instead of ENOMEM

Fixes: ec72fce4 ("qedr: Add support for RoCE HW init")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:38:41 -06:00
Kalderon, Michal
c3594f2230 RDMA/qedr: fix QP's ack timeout configuration
QPs that were configured with ack timeout value lower than 1
msec will not implement re-transmission timeout.
This means that if a packet / ACK were dropped, the QP
will not retransmit this packet.

This can lead to an application hang.

Fixes: cecbcddf6 ("qedr: Add support for QP verbs")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:38:41 -06:00
Chien Tin Tung
5f3e3b85cc RDMA/ucma: Correct option size check using optlen
The option size check is using optval instead of optlen
causing the set option call to fail. Use the correct
field, optlen, for size check.

Fixes: 6a21dfc0d0 ("RDMA/ucma: Limit possible option size")
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:22:22 -06:00
Leon Romanovsky
103140eccb RDMA/restrack: Move restrack_clean to be symmetrical to restrack_init
The fact that resource tracking commit 02d8883f52 ("RDMA/restrack: Add
general infrastructure to track RDMA resources") was added immediately
after commit 16c1975f10 ("IB/mlx5: Create profile infrastructure to add
and remove stages") caused us to miss the fact that PD and CQ are created
after ib_register_device, but released after ib_unregister_device() and
not before as it is expected from normal flow.

Fix introduced in commit 42cea83f95 ("IB/mlx5: Fix cleanup order on
unload") revealed this fact, so this patch is needed to avoid from
restrack warnings

It fixes resource tracking warnings during shutdown.

[   43.473906] CPU: 5 PID: 3016 Comm: modprobe Not tainted 4.16.0-rc5-for-linust-perf-2018-03-19_07-01-58-14 #1
[   43.473907] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[   43.473919] RIP: 0010:rdma_restrack_clean+0x25/0x30 [ib_core]
[   43.473921] RSP: 0018:ffffc9000267be48 EFLAGS: 00010282
[   43.473924] RAX: 0000000000000000 RBX: ffff88033c690070 RCX: 0000000180080006
[   43.473925] RDX: ffff88035ce922e0 RSI: ffffea000cf1a200 RDI: ffff88033c6907c8
[   43.473926] RBP: ffff88033c690070 R08: ffff88033c689000 R09: 0000000180080006
[   43.473927] R10: 000000003c68a001 R11: ffff88033c689000 R12: ffff88033c690000
[   43.473929] R13: ffff88033c69005c R14: 0000000000000000 R15: 0000000000000000
[   43.473932] FS:  00007f5928359740(0000) GS:ffff88036c540000(0000) knlGS:0000000000000000
[   43.473933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.473935] CR2: 00007ffffc760cc8 CR3: 000000035620c000 CR4: 00000000000006e0
[   43.473940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.473941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   43.473942] Call Trace:
[   43.473969]  ib_unregister_device+0xf5/0x190 [ib_core]
[   43.474000]  __mlx5_ib_remove+0x2e/0x40 [mlx5_ib]
[   43.474098]  mlx5_remove_device+0xf5/0x120 [mlx5_core]
[   43.474132]  mlx5_unregister_interface+0x37/0x90 [mlx5_core]
[   43.474142]  mlx5_ib_cleanup+0xc/0x16a [mlx5_ib]
[   43.474152]  SyS_delete_module+0x159/0x260
[   43.474159]  do_syscall_64+0x61/0x110
[   43.474165]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   43.474168] RIP: 0033:0x7f59278466b7
[   43.474170] RSP: 002b:00007ffffc763e38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
[   43.474172] RAX: ffffffffffffffda RBX: 000000000130d590 RCX: 00007f59278466b7
[   43.474173] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000000000130d5f8
[   43.474175] RBP: 0000000000000000 R08: 00007f5927b0b060 R09: 00007f59278b6a40
[   43.474176] R10: 00007ffffc763bc0 R11: 0000000000000202 R12: 0000000000000000
[   43.474177] R13: 0000000000000001 R14: 000000000130d5f8 R15: 0000000000000000
[   43.474179] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 83 c7 28 31 c0
eb 0c 48 83 c0 08 48 3d 00 08 00 00 74 0f 48 8d 14 07 48 8b 12 48 85 d2
74 e8 <0f> 0b c3 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 47 28
[   43.474221] ---[ end trace e89771e2250ffc23 ]---

Fixes: 42cea83f95 ("IB/mlx5: Fix cleanup order on unload")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:22:22 -06:00
Mark Bloch
32927e281c IB/mlx5: Don't clean uninitialized UMR resources
In case we failed to create UMR resources, mark them as invalid so we
won't try to destroy them on the unwind path.

Add the relevant checks to destroy_umrc_res(), this is done for the
unlikely event ib_register_device() or create_umr_res() err out and we
try to destroy invalid objects.

Fixes: 42cea83f95 ("IB/mlx5: Fix cleanup order on unload")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 14:22:22 -06:00
Sinan Kaya
97d82a48d7 IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up CPU observing two barriers back to back before executing the
register write.

Since code already has an explicit barrier call, changing writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 13:51:41 -06:00
Linus Torvalds
303851e14a Fourth pull request for 4.16-rc
- Many bug fixes related to syzkaller from Leon Romanovsky.
   These are still for the mlx driver and ucma interface.
 - Fix a situation with port reuse for iWarp, discovered during scale-up
   testing
 - Bug fixes for the profile and restrack patches accepted during this merge
   window
 - Compile warning cleanups from Arnd, this is apparently the last warning
   to make 32 bit builds quite.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJasZXlAAoJEDht9xV+IJsaOv8QAJ7FwiS/DVadNBD9/Gjksb/V
 co41g9eK4s0qEKXJgMLjSr/aaK7gCG2waeXRSSJm52e1ERyiyceRaoLWqhJ98ZOt
 xs/rAj4aBKmRgQ5j082O3YYVaAHETTIcPYVjlULDNhkqH1xhtm7LmKxTHPxGxLdU
 IMKZeQStY46or3fuOyDmHQ7RbxRbl+7LhL1LEE9+dd7u6fSipgGWv6i3Wrqex/AV
 XC1C1Pabq7Qo+d516mNX2JSjo9jT4kuamLprxQtUJxMeU5UJ+7IlZlTOGgU0q2Zv
 x8W7744zVvifUc1gs3AFRvwhDkTrwnkhFullCyVN86r1jrduG1kxX+N0ksCY79yB
 h9A5r+qV63XPelDJAFQIllkLPh8p3raCfvfQZAMkVqh2Lqn2s1tlukN/6NLAaguW
 YAwbRk0Q1XgXRj4mGW+3vEH7UGMgaIqF2JlnU25hOuoyVSUkgvy88NG9aVx//a5h
 KCdRa/iqTDJthKfnCAu+yYa4k5AKeRkdNKB0GebiPrrdpgJHMBKuPCjLrd4NP9QG
 As1gi9N3gtNoZL7QEyYBL8NIXNpiiY4YANFf7otoZwvFBSzILKWJI74WOg8HWGJT
 jKDQk6WQfYS3Xe3WVy0WOXhsdvYyiCXdag63ErUlzAMffhpp1GZoBEEk+4z+lYL8
 W69w1xQcntpG9N+EBqBT
 =Gal+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Not much exciting here, almost entirely syzkaller fixes.

  This is going to be on ongoing theme for some time, I think. Both
  Google and Mellanox are now running syzkaller on different parts of
  the user API.

  Summary:

   - Many bug fixes related to syzkaller from Leon Romanovsky. These are
     still for the mlx driver and ucma interface.

   - Fix a situation with port reuse for iWarp, discovered during
     scale-up testing

   - Bug fixes for the profile and restrack patches accepted during this
     merge window

   - Compile warning cleanups from Arnd, this is apparently the last
     warning to make 32 bit builds quiet"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/ucma: Ensure that CM_ID exists prior to access it
  RDMA/verbs: Remove restrack entry from XRCD structure
  RDMA/ucma: Fix use-after-free access in ucma_close
  RDMA/ucma: Check AF family prior resolving address
  infiniband: bnxt_re: use BIT_ULL() for 64-bit bit masks
  infiniband: qplib_fp: fix pointer cast
  IB/mlx5: Fix cleanup order on unload
  RDMA/ucma: Don't allow join attempts for unsupported AF family
  RDMA/ucma: Fix access to non-initialized CM_ID object
  RDMA/core: Do not use invalid destination in determining port reuse
  RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory
  IB/mlx5: Fix integer overflows in mlx5_ib_create_srq
  IB/mlx5: Fix out-of-bounds read in create_raw_packet_qp_rq
2018-03-20 17:39:07 -07:00
Leon Romanovsky
e8980d67d6 RDMA/ucma: Ensure that CM_ID exists prior to access it
Prior to access UCMA commands, the context should be initialized
and connected to CM_ID with ucma_create_id(). In case user skips
this step, he can provide non-valid ctx without CM_ID and cause
to multiple NULL dereferences.

Also there are situations where the create_id can be raced with
other user access, ensure that the context is only shared to
other threads once it is fully initialized to avoid the races.

[  109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  109.090315] IP: ucma_connect+0x138/0x1d0
[  109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0
[  109.095384] Oops: 0000 [#1] SMP KASAN PTI
[  109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G    B 4.16.0-rc1-00062-g2975d5de6428 #45
[  109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[  109.105943] RIP: 0010:ucma_connect+0x138/0x1d0
[  109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246
[  109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2
[  109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297
[  109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb
[  109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000
[  109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118
[  109.126221] FS:  00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[  109.129468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0
[  109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  109.142057] Call Trace:
[  109.144160]  ? ucma_listen+0x110/0x110
[  109.146386]  ? wake_up_q+0x59/0x90
[  109.148853]  ? futex_wake+0x10b/0x2a0
[  109.151297]  ? save_stack+0x89/0xb0
[  109.153489]  ? _copy_from_user+0x5e/0x90
[  109.155500]  ucma_write+0x174/0x1f0
[  109.157933]  ? ucma_resolve_route+0xf0/0xf0
[  109.160389]  ? __mod_node_page_state+0x1d/0x80
[  109.162706]  __vfs_write+0xc4/0x350
[  109.164911]  ? kernel_read+0xa0/0xa0
[  109.167121]  ? path_openat+0x1b10/0x1b10
[  109.169355]  ? fsnotify+0x899/0x8f0
[  109.171567]  ? fsnotify_unmount_inodes+0x170/0x170
[  109.174145]  ? __fget+0xa8/0xf0
[  109.177110]  vfs_write+0xf7/0x280
[  109.179532]  SyS_write+0xa1/0x120
[  109.181885]  ? SyS_read+0x120/0x120
[  109.184482]  ? compat_start_thread+0x60/0x60
[  109.187124]  ? SyS_read+0x120/0x120
[  109.189548]  do_syscall_64+0xeb/0x250
[  109.192178]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[  109.194725] RIP: 0033:0x7fabb61ebe99
[  109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99
[  109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004
[  109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000
[  109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0
[  109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0
[  109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f
b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff
31 c0 <66> 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7
[  109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80
[  109.226256] CR2: 0000000000000020

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+36712f50b0552615bf59@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-20 11:07:21 -06:00
Matan Barak
185899ee8d IB/uverbs: Enable ioctl() uAPI by default for new verbs
Enable the ioctl() uAPI for IB by default if the standard write()
uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are
also available under the old write() uAPI are put inside a new
INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
41b2a71fc8 IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file
Currently, all objects are declared in uverbs_std_types. This could lead
to a huge file once we implement all objects, methods and handlers.
Moving each object to its own file to keep the files smaller and more
readable. uverbs_std_types.c will only contain the parsing tree
definition and objects without any methods.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
dfb1395573 IB/uverbs: Expose parsing tree of all common objects to providers
The ioctl() based uverbs is based on merging feature trees. This teaches
the generic parser how to parse methods according to the provider's
support. In order to support merging with the common objects, exporting
the common-object-tree to the provider drivers.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
c66db31113 IB/uverbs: Safely extend existing attributes
Previously, we've used UVERBS_ATTR_SPEC_F_MIN_SZ for extending existing
attributes. The behavior of this flag was the kernel accepts anything
bigger than the minimum size it specified. This is unsafe, since in
order to safely extend an attribute, we need to make sure unknown size
is zeroed. Replacing UVERBS_ATTR_SPEC_F_MIN_SZ with
UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO, which essentially checks that the
unknown size is zero. In addition, attributes are now decorated with
UVERBS_ATTR_TYPE and UVERBS_ATTR_STRUCT, so we can provide the minimum
and known length.

Users of this flag needs to use copy_from_or_zero functions/macros.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
1f07e08fab IB/uverbs: Enable compact representation of uverbs_attr_spec
Downstream patches extend uverbs_attr_spec with new fields.
In order to save space, we move the type and flags fields to
the various attribute flavors contained in the union.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
0ede73bc01 IB/uverbs: Extend uverbs_ioctl header with driver_id
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak
1f7ff9d5d3 IB/uverbs: Move to new headers and make naming consistent
Use macros to make names consistent in ioctl() uAPI:
The ioctl() uAPI works with object-method hierarchy. The method part
also states which handler should be executed when this method is called
from user-space. Therefore, we need to tie method, method's id, method's
handler and the object owning this method together.
Previously, this was done through explicit developer chosen names.
This makes grepping the code harder. Changing the method's name,
method's handler and object's name to be automatically generated based
on the ids.

The headers are split in a way so they be included and used by
user-space. One header strictly contains structures that are used
directly by user-space applications, where another header is used for
internal library (i.e. libibverbs) to form the ioctl() commands.
Other header simply contains the required general command structure.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Leon Romanovsky
ed65a4dc22 RDMA/ucma: Fix use-after-free access in ucma_close
The error in ucma_create_id() left ctx in the list of contexts belong
to ucma file descriptor. The attempt to close this file descriptor causes
to use-after-free accesses while iterating over such list.

Fixes: 7521663857 ("RDMA/cma: Export rdma cm interface to userspace")
Reported-by: <syzbot+dcfd344365a56fbebd0f@syzkaller.appspotmail.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:01:35 -06:00
Bart Van Assche
b470c154c6 IB/srp: Disallow duplicate RDMA/CM connections
According to the SRP standard the INITIATOR and TARGET PORT IDENTIFIER
fields from the login request specify the I_T nexus. Whether or not an
SRP target closes an existing connection for an I_T nexus when a login
request is received depends on the value of the MULTICHANNEL field in
the login request. The SRP initiator derives the value of the
INITIATOR and TARGET PORT IDENTIFIER fields from the .id_ext,
.ioc_guid, .initiator_ext .sgid members of the srp_target_port
structure. This means that the .rdma_cm.dst check must be removed from
srp_conn_unique(). This patch avoids that for target ports that have
multiple addresses, e.g. an IPv4 and an IPv6 address, and if a
connection is established to both target port addresses, that the
initiator logs in alternatingly every 10 seconds to the other target
port address. An SRP target must namely terminate all but one
connections for a given I_T nexus if the MULTICHANNEL field has not
been set in the login request.

Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 13:54:50 -06:00
Bodong Wang
61147f391a IB/mlx5: Packet packing enhancement for RAW QP
Enable RAW QP to be able to configure burst control by modify_qp. By
using burst control with rate limiting, user can achieve best
performance and accuracy. The burst control information is passed by
user through udata.

This patch also reports burst control capability for mlx5 related
hardwares, burst control is only marked as supported when both
packet_pacing_burst_bound and packet_pacing_typical_size are
supported.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:55:13 -06:00