Commit Graph

1619 Commits

Author SHA1 Message Date
Parav Pandit
1c43d5d308 IB/core: Avoid exporting module internal ib_find_gid_by_filter()
ib_find_gid_by_filter() is used only by ib_core, therefore avoid
exporting it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:09 -07:00
Parav Pandit
151ed9d700 IB/core: Refactor to avoid unnecessary check on GID lookup miss
Currently on every gid entry comparison miss found variable is checked;
which is not needed as those two comparison fail already indicate that
GID is not found yet.
So refactor to avoid such check and copy the GID index when found.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:08 -07:00
Parav Pandit
b0dd0d3353 IB/core: Avoid unnecessary type cast
Type cast from void to struct find_gid_index_context is not needed.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:08 -07:00
Parav Pandit
981b5a2384 RDMA/cma: Introduce and use helper functions to init work
Introduce and user helper functions to initialize work for address
resolved and route resolved event that avoid code duplication at few
places.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:07 -07:00
Parav Pandit
c423880537 RDMA/cma: Avoid setting path record type twice
Avoid setting path record type twice for RoCE.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:07 -07:00
Parav Pandit
4367ec7fe2 RDMA/cma: Simplify netdev check
Current code checks for NULL ndev twice where 2nd check is always
invalid given the fact that during route resolving stage, device address
must be bound to netdevice interface.

This patch simplifies such check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:07 -07:00
Parav Pandit
5c181bda77 RDMA/cma: Set default GID type as RoCE when resolving RoCE route
As the function name suggests cma_resolve_iboe_route() resolves RoCE
route. However, its default GID type is IB_GID_TYPE_IB and not
IB_GID_TYPE_ROCE, even though both are mapped to the same enum value.
Change default GID type to IB_GID_TYPE_ROCE.

cma_iboe_set_mgid() is updated to reflect the RoCEv2 GID check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:06 -07:00
Artemy Kovalyov
edf1a84fe3 IB/umem: Fix use of npages/nmap fields
In ib_umem structure npages holds original number of sg entries, while
nmap is number of DMA blocks returned by dma_map_sg.

Fixes: c5d76f130b ('IB/core: Add umem function to read data from user-space')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:06 -07:00
Daniel Jurgens
119bf81793 IB/cm: Add debug prints to ib_cm
Add debug prints to the error paths in the connection manager control
flows, to help debug connection management problems.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:06 -07:00
Matan Barak
8b00914654 IB/core: Fix memory leak in cm_req_handler error flows
In cm_req_handler error flows, sometimes cm_id_priv->timewait_info
isn't free'd.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:05 -07:00
Parav Pandit
7baaa49af3 RDMA/cma: Use correct size when writing netlink stats
The code was using the src size when formatting the dst. They are almost
certainly the same value but it reads wrong.

Fixes: ce117ffac2 ("RDMA/cma: Export AF_IB statistics")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 15:37:05 -07:00
Parav Pandit
df8441c668 IB/core: Avoid exporting module internal function
ib_security_modify_qp and ib_security_pkey_access are core internal
function. So avoid exporting them.
ib_security_pkey_access is used only when secuirty hooks are enabled so
avoid defining it otherwise.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 13:49:43 -07:00
Parav Pandit
56d0a7d9a0 IB/core: Depend on IPv6 stack to resolve link local address for RoCEv2
RoCEv1 does not use the IPv6 stack to resolve the link local DGID since it
uses GID address. It forms the DMAC directly from the DGID.

The code became confused and also tried to use this bypass for RoCEv2
packets, however RoCEv2 always uses a IP address in the GID and must
always use ARP or neighbor discovery to get the DMAC address.

Now that rdma_addr_find_l2_eth_by_grh() supports resolving link local
address to find destination mac address, lets make use of it.
This aligns it to how the rest of the IPv6 stack resolves link local
destination IPv6 address.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 13:49:43 -07:00
Parav Pandit
1060f86534 IB/{core/cm}: Fix generating a return AH for RoCEE
When computing a UD reverse path (return AH) from a WC the code was not
doing a route lookup anchored in a specific netdevice. This caused several
bugs, including broken IPv6 link-local address support in RoCEv2. [1]

This fixes the lookup by determining the GID table entry that the HW
matched to the SGID for the WC and then using the netdevice from that
entry to perform the route and ND lookup for the 'DGID' to build a return
AH.

RoCE GID table management ensures that right upper netdevices of the
physical netdevices are added. Therefore init_ah_from_wc doesn't need to
perform such check.

Now that route lookup is done based on the netdevice of the GID entry,
simplify code to not have ifindex and vlan pointers.  As part of that,
refactor to have netdevice as input parameter.  This is already discussed
at [2].

Finally ib_init_ah_from_wc resolves dmac for unicast GID in similar way as
what ib_resolve_eth_dmac() does. So ib_resolve_eth_dmac is refactored to
split for unicast and non unicast GIDs, so that it can be reused by
ib_init_ah_from_wc.

While we are at refactoring ib_resolve_eth_dmac(), it is further
simplified

(a) to avoid hoplimit as optional parameter, as there is only one
    user who always queries hoplimit.
(b) for empty line.
(c) avoided zero initialization of ret.
(d) removed as exported symbol as only ib core uses it.

For IPv6, this is tested using simple rping test as below.
 rping -sv -a ::0
 rping -c -a fe80::268a:7ff:fe55:4661%ens2f1 -C 1 -v -d

[1] https://www.spinics.net/lists/linux-rdma/msg45690.html
[2] https://www.spinics.net/lists/linux-rdma/msg45710.html

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18 13:49:43 -07:00
Linus Torvalds
f3b5ad89de Second pull request for 4.15-rc
- Fix for SELinux on the umad SMI path. Some old hardware
   does not fill the PKey properly exposing another bug in the newer
   SELinux code.
 - Check the input port as we can exceed array bounds from this
   user supplied value
 - Users are unable to use the hash field support as they want due to
   incorrect checks on the field restrictions, correct that so the
   feature works as intended
 - User triggerable oops in the NETLINK_RDMA handler
 - cxgb4 driver fix for a bad interaction with CQ flushing in iser
   caused by patches in this merge window, and bad CQ flushing during
   normal close.
 - Unbalanced memalloc_noio in ipoib in an error path.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaNGJ9AAoJEDht9xV+IJsawK4P/iVlUR8DReXKVPkxYOQk15bI
 GEKG3t2Ce1GaeFZY7TBmsdHBRTHhxf2osEM57TbBmWv6N/pG83GLresE6xxOhRHz
 3s2hzWElJXpYnM0QttHCNjJvySIzjzLZiaQyhiWFqs0+cPVUM9zQd0G77LwHngRf
 1gO7toTMYk8eZkJt0ClQwHMeH6qR892o+zDUtorX/Ez4Ly4tT3I/RwRpbZ1HHpsA
 uWMYcsge7lRzFbZnC+lDoeqozcv20B7n9UBEcAHJkVSh5JFC+TByRmCAZ/hPzjXb
 Pr2E4gTYT+ULUsPRECtIwupT30xfFdByFYBAl+EQ+fiJvGgBxcgVjdLDQ3Ddlb6n
 ga5UEverYKivizitKowtpMCJ0nVH6R4qLt5vcPwxuoHKQmUtXQFeg/haZPWCiPwr
 B4Ahm371yRx8xo4AITBFX4L4PdtmdAueyrrjz/MxJm5YM2eRy08OONFVlBqXTuqT
 EdbtHFCbXtE3aAIiWGmUA0jbswKN9fkUct/wMwkny8T3h/XPKhBqA/SWN5SX1KC3
 EHAjczAcX+MOS52pyhf07C3Z/oq4gpXSCQQHSat9es8oxst4w0CWcCGqIhyqyu2q
 s5CZG3Ok+OvTmKWRJkaEAJXHRoTB1OjkgEod13xRDQmONW/cabeBKe/BC1zmvctM
 g2eyl4amP8MGaRTldpou
 =oyIl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "More fixes from testing done on the rc kernel, including more SELinux
  testing. Looking forward, lockdep found regression today in ipoib
  which is still being fixed.

  Summary:

   - Fix for SELinux on the umad SMI path. Some old hardware does not
     fill the PKey properly exposing another bug in the newer SELinux
     code.

   - Check the input port as we can exceed array bounds from this user
     supplied value

   - Users are unable to use the hash field support as they want due to
     incorrect checks on the field restrictions, correct that so the
     feature works as intended

   - User triggerable oops in the NETLINK_RDMA handler

   - cxgb4 driver fix for a bad interaction with CQ flushing in iser
     caused by patches in this merge window, and bad CQ flushing during
     normal close.

   - Unbalanced memalloc_noio in ipoib in an error path"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
  iw_cxgb4: only insert drain cqes if wq is flushed
  iw_cxgb4: only clear the ARMED bit if a notification is needed
  RDMA/netlink: Fix general protection fault
  IB/mlx4: Fix RSS hash fields restrictions
  IB/core: Don't enforce PKey security on SMI MADs
  IB/core: Bound check alternate path port number
2017-12-16 13:43:08 -08:00
Geert Uytterhoeven
302d6424e4 RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo()
With gcc-4.1.2:

    drivers/infiniband/core/iwpm_util.c: In function ‘iwpm_send_mapinfo’:
    drivers/infiniband/core/iwpm_util.c:647: warning: ‘ret’ may be used uninitialized in this function

Indeed, if nl_client is not found in any of the scanned has buckets, ret
will be used uninitialized.

Preinitialize ret to -EINVAL to fix this.

Fixes: 30dc5e63d6 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-13 10:55:49 -07:00
Gomonovych, Vasyl
f4cd9d588e IB/core: Use PTR_ERR_OR_ZERO()
Fix ptr_ret.cocci warnings:
drivers/infiniband/core/uverbs_cmd.c:1156:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-11 16:19:43 -07:00
Don Hiatt
c5c4e40e90 IB/CM: Change sgid to IB GID when handling CM request
ULPs do not understand OPA GIDs and will reject CM requests
if the sgid does not match the local_gid. In order to
fix this behavior we convert the OPA GID back to an IB GID.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-11 16:19:40 -07:00
Leon Romanovsky
d0e312fe3d RDMA/netlink: Fix general protection fault
The RDMA netlink core code checks validity of messages by ensuring
that type and operand are in range. It works well for almost all
clients except NLDEV, which has cb_table less than number of operands.

Request to access such operand will trigger the following kernel panic.

This patch updates all places where cb_table is declared for the
consistency, but only NLDEV is actually need it.

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN
Modules linked in:
CPU: 0 PID: 522 Comm: syz-executor6 Not tainted 4.13.0+ #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
task: ffff8800657799c0 task.stack: ffff8800695d000
RIP: 0010:rdma_nl_rcv_msg+0x13a/0x4c0
RSP: 0018:ffff8800695d7838 EFLAGS: 00010207
RAX: dffffc0000000000 RBX: 1ffff1000d2baf0b RCX: 00000000704ff4d7
RDX: 0000000000000000 RSI: ffffffff81ddb03c RDI: 00000003827fa6bc
RBP: ffff8800695d7900 R08: ffffffff82ec0578 R09: 0000000000000000
R10: ffff8800695d7900 R11: 0000000000000001 R12: 000000000000001c
R13: ffff880069d31e00 R14: 00000000ffffffff R15: ffff880069d357c0
FS:  00007fee6acb8700(0000) GS:ffff88006ca00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000201a9000 CR3: 0000000059766000 CR4: 00000000000006b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? rdma_nl_multicast+0x80/0x80
 rdma_nl_rcv+0x36b/0x4d0
 ? ibnl_put_attr+0xc0/0xc0
 netlink_unicast+0x4bd/0x6d0
 ? netlink_sendskb+0x50/0x50
 ? drop_futex_key_refs.isra.4+0x68/0xb0
 netlink_sendmsg+0x9ab/0xbd0
 ? nlmsg_notify+0x140/0x140
 ? wake_up_q+0xa1/0xf0
 ? drop_futex_key_refs.isra.4+0x68/0xb0
 sock_sendmsg+0x88/0xd0
 sock_write_iter+0x228/0x3c0
 ? sock_sendmsg+0xd0/0xd0
 ? do_futex+0x3e5/0xb20
 ? iov_iter_init+0xaf/0x1d0
 __vfs_write+0x46e/0x640
 ? sched_clock_cpu+0x1b/0x190
 ? __vfs_read+0x620/0x620
 ? __fget+0x23a/0x390
 ? rw_verify_area+0xca/0x290
 vfs_write+0x192/0x490
 SyS_write+0xde/0x1c0
 ? SyS_read+0x1c0/0x1c0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fee6a74a219
RSP: 002b:00007fee6acb7d58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000638000 RCX: 00007fee6a74a219
RDX: 0000000000000078 RSI: 0000000020141000 RDI: 0000000000000006
RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000212 R12: ffff8800695d7f98
R13: 0000000020141000 R14: 0000000000000006 R15: 00000000ffffffff
Code: d6 48 b8 00 00 00 00 00 fc ff df 66 41 81 e4 ff 03 44 8d 72 ff 4a 8d 3c b5 c0 a6 7f 82 44 89 b5 4c ff ff ff 48 89 f9 48 c1 e9 03 <0f> b6 0c 01 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
RIP: rdma_nl_rcv_msg+0x13a/0x4c0 RSP: ffff8800695d7838
---[ end trace ba085d123959c8ec ]---
Kernel panic - not syncing: Fatal exception

Cc: syzkaller <syzkaller@googlegroups.com>
Fixes: b4c598a67e ("RDMA/netlink: Implement nldev device dumpit calback")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-12-07 15:28:07 -05:00
Daniel Jurgens
0fbe8f575b IB/core: Don't enforce PKey security on SMI MADs
Per the infiniband spec an SMI MAD can have any PKey. Checking the pkey
on SMI MADs is not necessary, and it seems that some older adapters
using the mthca driver don't follow the convention of using the default
PKey, resulting in false denials, or errors querying the PKey cache.

SMI MAD security is still enforced, only agents allowed to manage the
subnet are able to receive or send SMI MADs.

Reported-by: Chris Blake <chrisrblake93@gmail.com>
Cc: <stable@vger.kernel.org> # v4.12
Fixes: 47a2b338fe ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-12-07 15:28:06 -05:00
Daniel Jurgens
4cae8ff136 IB/core: Bound check alternate path port number
The alternate port number is used as an array index in the IB
security implementation, invalid values can result in a kernel panic.

Cc: <stable@vger.kernel.org> # v4.12
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-12-07 15:28:06 -05:00
Linus Torvalds
e6cdd80a83 Here is the first rc pull request for RDMA. This includes an important core
fix for a regression in iWarp if SELinux is enabled, a fix for a compilation
 regression introduced in this merge window, and one obscure kconfig
 combination that oops's the kernel.
 
 For drivers, we have hns fixes needed to make their devices work on certain
 ARM IOMMU configurations, a stack data leak for hfi1, and various testing
 discovered -rc bug fixes for i40iw.
 
 This cycle we pushed back on the driver maintainers to have better commit
 messages for -rc material.
 
 You may need to pull my latest PGP key from the GPG key servers for this, I am
 not certain if the subkey update will make it to kernel.org's WKD before you
 need it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaJcu6AAoJEDht9xV+IJsa8aoP/Rh3tHT3mqN6v/p9HgUyUNRS
 gyDJ4BHg3A+O1UrnBjFjrAhpX9bqik/96n9t14Er7kVM9gxaDzOaCNd9+ASsKjDf
 fXRusCnKS5RP5CpQ16e6qurkOsBXghsJKTL+zpqGSmDf0yUBQCJUkmRNJNhiaUtW
 YEp92dfZytTK+iEmuXW4fJoIKWK3N5aOkttiK8BFb6XvmsUnWSp1wlBS2FhRzDq9
 PPwfM2EE/x46dFF1/w04M5hVDPO6Bngq0Tvo+EdOlAMwKN3Zmun+fSOLKaxg44Of
 dyN6dsu5tKi200Nbdq6cBkehWL6CukSGdJnepeI+xW+8hve9Eu9O6j6O3pMb/dYn
 /vvqE14KhrR1B3F5LFkJLcxxKRl97S2uPhOY2j3oU4L93s9B4X6geXX2oLVIos1r
 41YPu1/7OQyQffp4eKgsz4eA38TpdG6DoOlFMXgdIboJ8bASuRuyfLISVviMc8dx
 SKQTZTY54FK7uJMRw4rkcOlVUpJ2tyuVZr+Lt8p80IpnySCdJsEgAZkJngCPOKRT
 2h8VdfFwzhdlf3Ni5tZRZdMtE6oMD5BMa0jri7xtyKYa0o3gUqvHDGMTSVlQ1maF
 qXMP2mApTcpdFuFvdbnxIeLzP8zigJVkvIsqeKHGS8gt+dxF/934rTY1NTj3rQN5
 zmClIoiVg7NvHlDzvwg+
 =YaUB
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "Here is the first rc pull request for RDMA. This includes an important
  core fix for a regression in iWarp if SELinux is enabled, a fix for a
  compilation regression introduced in this merge window, and one
  obscure kconfig combination that oops's the kernel.

  For drivers, we have hns fixes needed to make their devices work on
  certain ARM IOMMU configurations, a stack data leak for hfi1, and
  various testing discovered -rc bug fixes for i40iw.

  This cycle we pushed back on the driver maintainers to have better
  commit messages for -rc material"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/core: Only enforce security for InfiniBand
  RDMA/hns: Get rid of page operation after dma_alloc_coherent
  RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent
  RDMA/hns: Fix the issue of IOVA not page continuous in hip08
  IB/core: Init subsys if compiled to vmlinuz-core
  RDMA/cma: Make sure that PSN is not over max allowed
  i40iw: Notify user of established connection after QP in RTS
  i40iw: Move MPA request event for loopback after connect
  i40iw: Correct ARP index mask
  i40iw: Do not free sqbuf when event is I40IW_TIMER_TYPE_CLOSE
  i40iw: Allocate a sdbuf per CQP WQE
  IB: INFINIBAND should depend on HAS_DMA
  IB/hfi1: Initialize bth1 in 16B rc ack builder
2017-12-05 10:10:15 -08:00
Daniel Jurgens
315d160c5a IB/core: Only enforce security for InfiniBand
For now the only LSM security enforcement mechanism available is
specific to InfiniBand. Bypass enforcement for non-IB link types.

This fixes a regression where modify_qp fails for iWARP because
querying the PKEY returns -EINVAL.

Cc: Paul Moore <paul@paul-moore.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: stable@vger.kernel.org
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs")
Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-01 12:21:28 -07:00
Dmitry Monakhov
a9cd1a6737 IB/core: Init subsys if compiled to vmlinuz-core
Once infiniband is compiled as a core component its subsystem must be
enabled before device initialization. Otherwise there is a NULL pointer
dereference during mlx4_core init, calltrace:
->device_add
  if (dev->class) {
     deref  dev->class->p =>NULLPTR

#Config
CONFIG_NET_DEVLINK=y
CONFIG_MAY_USE_DEVLINK=y
CONFIG_MLX4_EN=y

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-01 12:21:26 -07:00
Moni Shoua
23a9cd2ad9 RDMA/cma: Make sure that PSN is not over max allowed
This patch limits the initial value for PSN to 24 bits as
spec requires.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-01 12:21:26 -07:00
Dan Williams
5f1d43de54 IB/core: disable memory registration of filesystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.

Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Al Viro
afc9a42b74 the rest of drivers/*: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-28 11:06:58 -05:00
Linus Torvalds
ad0835a930 Updates for 4.15 kernel merge window
- Add iWARP support to qedr driver
 - Lots of misc fixes across subsystem
 - Multiple update series to hns roce driver
 - Multiple update series to hfi1 driver
 - Updates to vnic driver
 - Add kref to wait struct in cxgb4 driver
 - Updates to i40iw driver
 - Mellanox shared pull request
 - timer_setup changes
 - massive cleanup series from Bart Van Assche
 - Two series of SRP/SRPT changes from Bart Van Assche
 - Core updates from Mellanox
 - i40iw updates
 - IPoIB updates
 - mlx5 updates
 - mlx4 updates
 - hns updates
 - bnxt_re fixes
 - PCI write padding support
 - Sparse/Smatch/warning cleanups/fixes
 - CQ moderation support
 - SRQ support in vmw_pvrdma
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
 eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
 3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
 V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
 b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
 GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
 TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
 E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
 EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
 jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
 Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
 3JMrjjLxtCIvpUyzCvDW
 =A1oL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a fairly plain pull request. Lots of driver updates across the
  stack, a huge number of static analysis cleanups including a close to
  50 patch series from Bart Van Assche, and a number of new features
  inside the stack such as general CQ moderation support.

  Nothing really stands out, but there might be a few conflicts as you
  take things in. In particular, the cleanups touched some of the same
  lines as the new timer_setup changes.

  Everything in this pull request has been through 0day and at least two
  days of linux-next (since Stephen doesn't necessarily flag new
  errors/warnings until day2). A few more items (about 30 patches) from
  Intel and Mellanox showed up on the list on Tuesday. I've excluded
  those from this pull request, and I'm sure some of them qualify as
  fixes suitable to send any time, but I still have to review them
  fully. If they contain mostly fixes and little or no new development,
  then I will probably send them through by the end of the week just to
  get them out of the way.

  There was a break in my acceptance of patches which coincides with the
  computer problems I had, and then when I got things mostly back under
  control I had a backlog of patches to process, which I did mostly last
  Friday and Monday. So there is a larger number of patches processed in
  that timeframe than I was striving for.

  Summary:
   - Add iWARP support to qedr driver
   - Lots of misc fixes across subsystem
   - Multiple update series to hns roce driver
   - Multiple update series to hfi1 driver
   - Updates to vnic driver
   - Add kref to wait struct in cxgb4 driver
   - Updates to i40iw driver
   - Mellanox shared pull request
   - timer_setup changes
   - massive cleanup series from Bart Van Assche
   - Two series of SRP/SRPT changes from Bart Van Assche
   - Core updates from Mellanox
   - i40iw updates
   - IPoIB updates
   - mlx5 updates
   - mlx4 updates
   - hns updates
   - bnxt_re fixes
   - PCI write padding support
   - Sparse/Smatch/warning cleanups/fixes
   - CQ moderation support
   - SRQ support in vmw_pvrdma"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
  RDMA/core: Rename kernel modify_cq to better describe its usage
  IB/mlx5: Add CQ moderation capability to query_device
  IB/mlx4: Add CQ moderation capability to query_device
  IB/uverbs: Add CQ moderation capability to query_device
  IB/mlx5: Exposing modify CQ callback to uverbs layer
  IB/mlx4: Exposing modify CQ callback to uverbs layer
  IB/uverbs: Allow CQ moderation with modify CQ
  iw_cxgb4: atomically flush the qp
  iw_cxgb4: only call the cq comp_handler when the cq is armed
  iw_cxgb4: Fix possible circular dependency locking warning
  RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
  IB/core: Only maintain real QPs in the security lists
  IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
  RDMA/core: Make function rdma_copy_addr return void
  RDMA/vmw_pvrdma: Add shared receive queue support
  RDMA/core: avoid uninitialized variable warning in create_udata
  RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
  RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
  RDMA/bnxt_re: Set QP state in case of response completion errors
  RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
  ...
2017-11-15 14:54:53 -08:00
Linus Torvalds
abc36be236 A couple of configfs cleanups:
- proper use of the bool type (Thomas Meyer)
   - constification of struct config_item_type (Bhumika Goyal)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSTALHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNxfhAAv3cunxiEPEAvs+1xuGd3cZYaxz7qinvIODPxIKoF
 kRWiuy5PUklRMnJ8seOgJ1p1QokX6Sk4cZ8HcctDJVByqODjOq4K5eaKVN1ZqJoz
 BUzO/gOqfs64r9yaFIlKfe8nFA+gpUftSeWyv3lThxAIJ1iSbue7OZ/A10tTOS1m
 RWp9FPepFv+nJMfWqeQU64BsoDQ4kgZ2NcEA+jFxNx5dlmIbLD49tk0lfddvZQXr
 j5WyAH73iugilLtNUGVOqSzHBY4kUvfCKUV7leirCegyMoGhFtA87m6Wzwbo6ZUI
 DwQLzWvuPaGv1P2PpNEHfKiNbfIEp75DRyyyf87DD3lc5ffAxQSm28mGuwcr7Rn5
 Ow/yWL6ERMzCLExoCzEkXYJISy7T5LIzYDgNggKMpeWxysAduF7Onx7KfW1bTuhK
 mHvY7iOXCjEvaIVaF8uMKE6zvuY1vCMRXaJ+kC9jcIE3gwhg+2hmQvrdJ2uAFXY+
 rkeF2Poj/JlblPU4IKWAjiPUbzB7Lv0gkypCB2pD4riaYIN5qCAgF8ULIGQp2hsO
 lYW1EEgp5FBop85oSO/HAGWeH9dFg0WaV7WqNRVv0AGXhKjgy+bVd7iYPpvs7mGw
 z9IqSQDORcG2ETLcFhZgiJpCk/itwqXBD+wgMOjJPP8lL+4kZ8FcuhtY9kc9WlJE
 Tew=
 =+tMO
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:
 "A couple of configfs cleanups:

   - proper use of the bool type (Thomas Meyer)

   - constification of struct config_item_type (Bhumika Goyal)"

* tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs:
  RDMA/cma: make config_item_type const
  stm class: make config_item_type const
  ACPI: configfs: make config_item_type const
  nvmet: make config_item_type const
  usb: gadget: configfs: make config_item_type const
  PCI: endpoint: make config_item_type const
  iio: make function argument and some structures const
  usb: gadget: make config_item_type structures const
  dlm: make config_item_type const
  netconsole: make config_item_type const
  nullb: make config_item_type const
  ocfs2/cluster: make config_item_type const
  target: make config_item_type const
  configfs: make ci_type field, some pointers and function arguments const
  configfs: make config_item_type const
  configfs: Fix bool initialization/comparison
2017-11-14 14:44:04 -08:00
Leon Romanovsky
4190b4e969 RDMA/core: Rename kernel modify_cq to better describe its usage
Current ib_modify_cq() is used to set CQ moderation parameters.

This patch renames ib_modify_cq() to be rdma_set_cq_moderation(),
because the kernel version of RDMA API doesn't need to follow already
exposed to user's API pattern (create_XXX/modify_XXX/query_XXX/destroy_XXX)
and better to have more accurate name which describes the actual usage.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:59:22 -05:00
Yonatan Cohen
18bd907292 IB/uverbs: Add CQ moderation capability to query_device
The query_device function can now obtain the maximum values for
cq_max_count and cq_period, needed for CQ moderation.
cq_max_count is a 16 bits number that determines the number
of CQEs to accumulate before generating an event.
cq_period is a 16 bits number that determines the timeout in micro
seconds from the last event generated, upon which a new event will
be generated even if cq_max_count was not reached.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:59:22 -05:00
Yonatan Cohen
869ddcf8b3 IB/uverbs: Allow CQ moderation with modify CQ
Uverbs support in modify_cq for CQ moderation only.
Gives ability to change cq_max_count and cq_period.
CQ moderation enhance performance by moderating the number
of CQEs needed to create an event instead of application
having to suffer from event per-CQE.
To achieve CQ moderation the application needs to set cq_max_count
and cq_period.
cq_max_count - defines the number of CQEs needed to create an event.
cq_period - defines the timeout (micro seconds) between last
            event and a new one that will occur even if
	    cq_max_count was not satisfied

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:59:22 -05:00
Daniel Jurgens
877add2817 IB/core: Only maintain real QPs in the security lists
When modify QP is called on a shared QP update the security context for
the real QP. When security is subsequently enforced the shared QP
handles will be checked as well.

Without this change shared QP handles get added to the port/pkey lists,
which is a bug, because not all shared QP handles will be checked for
access. Also the shared QP security context wouldn't get removed from
the port/pkey lists causing access to free memory and list corruption
when they are destroyed.

Cc: stable@vger.kernel.org
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:24:17 -05:00
Yuval Shaia
e08ce2e82b RDMA/core: Make function rdma_copy_addr return void
Function returns zero - make it void.

While there make struct net_device const.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:18:33 -05:00
Arnd Bergmann
cb9fd89f91 RDMA/core: avoid uninitialized variable warning in create_udata
As Dan pointed out, the rework I did makes it harder for smatch and other
static checkers to figure out what is going on with the uninitialized
pointers.

By open-coding the call in create_udata(), we make it more readable for
both humans and tools.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 12f727721e ("IB/uverbs: clean up INIT_UDATA_BUF_OR_NULL usage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 16:11:11 -05:00
Don Hiatt
19b57c6c44 IB/core: Convert OPA AH to IB for Extended LIDs only
When deciding to convert an OPA AH to IB we were incorrectly
including the IB multicast range. At this layer, all Extended
LIDs will be larger than IB_LID_PERMISSIVE. Change comparison
accordingly.

Fixes: d541e45500 ("IB/core: Convert ah_attr from OPA to IB when copying to user")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 15:53:57 -05:00
Parav Pandit
2e4c85c6ed IB/core: Avoid unnecessary return value check
Since there is nothing done with non zero return value, such check is
avoided.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13 14:42:04 -05:00
Noa Osherovich
e1d2e88733 IB/core: Add PCI write end padding flags for WQ and QP
There are root complexes that are able to optimize their
performance when incoming data is multiple full cache lines.

PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.

Add a relevant entry to ib_device_cap_flags to report such capability
of an RDMA device.

Add the QP and WQ create flags:
 * A QP/WQ created with a scatter end padding flag will cause
   HW to pad the last upstream write generated by a packet to cache line.

User should consider several factors before activating this feature:
- In case of high CPU memory load (which may cause PCI back pressure in
  turn), if a large percent of the writes are partial cache line, this
  feature should be checked as an optional solution.
- This feature might reduce performance if most packets are between one
  and two cache lines and PCIe throughput has reached its maximum
  capacity. E.g. 65B packet from the network port will lead to 128B
  write on PCIe, which may cause traffic on PCIe to reach high
  throughput.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:50:27 -05:00
Parav Pandit
89548bcafe IB/core: Avoid crash on pkey enforcement failed in received MADs
Below kernel crash is observed when Pkey security enforcement fails on
received MADs. This issue is reported in [1].

ib_free_recv_mad() accesses the rmpp_list, whose initialization is
needed before accessing it.
When security enformcent fails on received MADs, MAD processing avoided
due to security checks failed.

OpenSM[3770]: SM port is down
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: PGD 0
kernel: P4D 0
kernel:
kernel: Oops: 0002 [#1] SMP
kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
kernel: Call Trace:
kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
kernel:  process_one_work+0x1e9/0x410
kernel:  worker_thread+0x4b/0x410
kernel:  kthread+0x109/0x140
kernel:  ? process_one_work+0x410/0x410
kernel:  ? kthread_create_on_node+0x70/0x70
kernel:  ? SyS_exit_group+0x14/0x20
kernel:  ret_from_fork+0x25/0x30
kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
kernel: CR2: 0000000000000008

[1] : https://www.spinics.net/lists/linux-rdma/msg56190.html

Fixes: 47a2b338fe ("IB/core: Enforce security on management datagrams")
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:26:00 -05:00
Leon Romanovsky
fec99ededf RDMA/umem: Avoid partial declaration of non-static function
The RDMA/umem uses generic RB-trees macros to generate various ib_umem
access functions. The generation is performed with INTERVAL_TREE_DEFINE
macro, which allows one of two modes: declare all functions as static or
declare none of the function to be static.

The second mode of operation produces the following sparse errors:
 drivers/infiniband/core/umem_rbtree.c:69:1:
	warning: symbol 'rbt_ib_umem_iter_first' was not declared.
	Should it be static?
 drivers/infiniband/core/umem_rbtree.c:69:1:
	warning: symbol 'rbt_ib_umem_iter_next' was not declared.
	Should it be static?

Code relocation together with declaration of such functions to be
"static" solves the issue.

Because there is no need to have separate file for two functions,
let's consolidate umem_rtree.c and umem_odp.c into one file.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-10 13:02:12 -05:00
Linus Torvalds
ead751507d License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
 makes it harder for compliance tools to determine the correct license.
 
 By default all files without license information are under the default
 license of the kernel, which is GPL version 2.
 
 Update the files which contain no license information with the 'GPL-2.0'
 SPDX license identifier.  The SPDX identifier is a legally binding
 shorthand, which can be used instead of the full boiler plate text.
 
 This patch is based on work done by Thomas Gleixner and Kate Stewart and
 Philippe Ombredanne.
 
 How this work was done:
 
 Patches were generated and checked against linux-4.14-rc6 for a subset of
 the use cases:
  - file had no licensing information it it.
  - file was a */uapi/* one with no licensing information in it,
  - file was a */uapi/* one with existing licensing information,
 
 Further patches will be generated in subsequent months to fix up cases
 where non-standard license headers were used, and references to license
 had to be inferred by heuristics based on keywords.
 
 The analysis to determine which SPDX License Identifier to be applied to
 a file was done in a spreadsheet of side by side results from of the
 output of two independent scanners (ScanCode & Windriver) producing SPDX
 tag:value files created by Philippe Ombredanne.  Philippe prepared the
 base worksheet, and did an initial spot review of a few 1000 files.
 
 The 4.13 kernel was the starting point of the analysis with 60,537 files
 assessed.  Kate Stewart did a file by file comparison of the scanner
 results in the spreadsheet to determine which SPDX license identifier(s)
 to be applied to the file. She confirmed any determination that was not
 immediately clear with lawyers working with the Linux Foundation.
 
 Criteria used to select files for SPDX license identifier tagging was:
  - Files considered eligible had to be source code files.
  - Make and config files were included as candidates if they contained >5
    lines of source
  - File already had some variant of a license header in it (even if <5
    lines).
 
 All documentation files were explicitly excluded.
 
 The following heuristics were used to determine which SPDX license
 identifiers to apply.
 
  - when both scanners couldn't find any license traces, file was
    considered to have no license information in it, and the top level
    COPYING file license applied.
 
    For non */uapi/* files that summary was:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0                                              11139
 
    and resulted in the first patch in this series.
 
    If that file was a */uapi/* path one, it was "GPL-2.0 WITH
    Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                        930
 
    and resulted in the second patch in this series.
 
  - if a file had some form of licensing information in it, and was one
    of the */uapi/* ones, it was denoted with the Linux-syscall-note if
    any GPL family license was found in the file or had no licensing in
    it (per prior point).  Results summary:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|------
    GPL-2.0 WITH Linux-syscall-note                       270
    GPL-2.0+ WITH Linux-syscall-note                      169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
    LGPL-2.1+ WITH Linux-syscall-note                      15
    GPL-1.0+ WITH Linux-syscall-note                       14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
    LGPL-2.0+ WITH Linux-syscall-note                       4
    LGPL-2.1 WITH Linux-syscall-note                        3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
 
    and that resulted in the third patch in this series.
 
  - when the two scanners agreed on the detected license(s), that became
    the concluded license(s).
 
  - when there was disagreement between the two scanners (one detected a
    license but the other didn't, or they both detected different
    licenses) a manual inspection of the file occurred.
 
  - In most cases a manual inspection of the information in the file
    resulted in a clear resolution of the license that should apply (and
    which scanner probably needed to revisit its heuristics).
 
  - When it was not immediately clear, the license identifier was
    confirmed with lawyers working with the Linux Foundation.
 
  - If there was any question as to the appropriate license identifier,
    the file was flagged for further research and to be revisited later
    in time.
 
 In total, over 70 hours of logged manual review was done on the
 spreadsheet to determine the SPDX license identifiers to apply to the
 source files by Kate, Philippe, Thomas and, in some cases, confirmation
 by lawyers working with the Linux Foundation.
 
 Kate also obtained a third independent scan of the 4.13 code base from
 FOSSology, and compared selected files where the other two scanners
 disagreed against that SPDX file, to see if there was new insights.  The
 Windriver scanner is based on an older version of FOSSology in part, so
 they are related.
 
 Thomas did random spot checks in about 500 files from the spreadsheets
 for the uapi headers and agreed with SPDX license identifier in the
 files he inspected. For the non-uapi files Thomas did random spot checks
 in about 15000 files.
 
 In initial set of patches against 4.14-rc6, 3 files were found to have
 copy/paste license identifier errors, and have been fixed to reflect the
 correct identifier.
 
 Additionally Philippe spent 10 hours this week doing a detailed manual
 inspection and review of the 12,461 patched files from the initial patch
 version early this week with:
  - a full scancode scan run, collecting the matched texts, detected
    license ids and scores
  - reviewing anything where there was a license detected (about 500+
    files) to ensure that the applied SPDX license was correct
  - reviewing anything where there was no detection but the patch license
    was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
    SPDX license was correct
 
 This produced a worksheet with 20 files needing minor correction.  This
 worksheet was then exported into 3 different .csv files for the
 different types of files to be modified.
 
 These .csv files were then reviewed by Greg.  Thomas wrote a script to
 parse the csv files and add the proper SPDX tag to the file, in the
 format that the file expected.  This script was further refined by Greg
 based on the output to detect more types of files automatically and to
 distinguish between header and source .c files (which need different
 comment types.)  Finally Greg ran the script using the .csv files to
 generate the patches.
 
 Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
 Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
 Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWfswbQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykvEwCfXU1MuYFQGgMdDmAZXEc+xFXZvqgAoKEcHDNA
 6dVh26uchcEQLN/XqUDt
 =x306
 -----END PGP SIGNATURE-----

Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull initial SPDX identifiers from Greg KH:
 "License cleanup: add SPDX license identifiers to some files

  Many source files in the tree are missing licensing information, which
  makes it harder for compliance tools to determine the correct license.

  By default all files without license information are under the default
  license of the kernel, which is GPL version 2.

  Update the files which contain no license information with the
  'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
  binding shorthand, which can be used instead of the full boiler plate
  text.

  This patch is based on work done by Thomas Gleixner and Kate Stewart
  and Philippe Ombredanne.

  How this work was done:

  Patches were generated and checked against linux-4.14-rc6 for a subset
  of the use cases:

   - file had no licensing information it it.

   - file was a */uapi/* one with no licensing information in it,

   - file was a */uapi/* one with existing licensing information,

  Further patches will be generated in subsequent months to fix up cases
  where non-standard license headers were used, and references to
  license had to be inferred by heuristics based on keywords.

  The analysis to determine which SPDX License Identifier to be applied
  to a file was done in a spreadsheet of side by side results from of
  the output of two independent scanners (ScanCode & Windriver)
  producing SPDX tag:value files created by Philippe Ombredanne.
  Philippe prepared the base worksheet, and did an initial spot review
  of a few 1000 files.

  The 4.13 kernel was the starting point of the analysis with 60,537
  files assessed. Kate Stewart did a file by file comparison of the
  scanner results in the spreadsheet to determine which SPDX license
  identifier(s) to be applied to the file. She confirmed any
  determination that was not immediately clear with lawyers working with
  the Linux Foundation.

  Criteria used to select files for SPDX license identifier tagging was:

   - Files considered eligible had to be source code files.

   - Make and config files were included as candidates if they contained
     >5 lines of source

   - File already had some variant of a license header in it (even if <5
     lines).

  All documentation files were explicitly excluded.

  The following heuristics were used to determine which SPDX license
  identifiers to apply.

   - when both scanners couldn't find any license traces, file was
     considered to have no license information in it, and the top level
     COPYING file license applied.

     For non */uapi/* files that summary was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0                                              11139

     and resulted in the first patch in this series.

     If that file was a */uapi/* path one, it was "GPL-2.0 WITH
     Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
     was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0 WITH Linux-syscall-note                        930

     and resulted in the second patch in this series.

   - if a file had some form of licensing information in it, and was one
     of the */uapi/* ones, it was denoted with the Linux-syscall-note if
     any GPL family license was found in the file or had no licensing in
     it (per prior point). Results summary:

       SPDX license identifier                            # files
       ---------------------------------------------------|------
       GPL-2.0 WITH Linux-syscall-note                       270
       GPL-2.0+ WITH Linux-syscall-note                      169
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
       LGPL-2.1+ WITH Linux-syscall-note                      15
       GPL-1.0+ WITH Linux-syscall-note                       14
       ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
       LGPL-2.0+ WITH Linux-syscall-note                       4
       LGPL-2.1 WITH Linux-syscall-note                        3
       ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
       ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

     and that resulted in the third patch in this series.

   - when the two scanners agreed on the detected license(s), that
     became the concluded license(s).

   - when there was disagreement between the two scanners (one detected
     a license but the other didn't, or they both detected different
     licenses) a manual inspection of the file occurred.

   - In most cases a manual inspection of the information in the file
     resulted in a clear resolution of the license that should apply
     (and which scanner probably needed to revisit its heuristics).

   - When it was not immediately clear, the license identifier was
     confirmed with lawyers working with the Linux Foundation.

   - If there was any question as to the appropriate license identifier,
     the file was flagged for further research and to be revisited later
     in time.

  In total, over 70 hours of logged manual review was done on the
  spreadsheet to determine the SPDX license identifiers to apply to the
  source files by Kate, Philippe, Thomas and, in some cases,
  confirmation by lawyers working with the Linux Foundation.

  Kate also obtained a third independent scan of the 4.13 code base from
  FOSSology, and compared selected files where the other two scanners
  disagreed against that SPDX file, to see if there was new insights.
  The Windriver scanner is based on an older version of FOSSology in
  part, so they are related.

  Thomas did random spot checks in about 500 files from the spreadsheets
  for the uapi headers and agreed with SPDX license identifier in the
  files he inspected. For the non-uapi files Thomas did random spot
  checks in about 15000 files.

  In initial set of patches against 4.14-rc6, 3 files were found to have
  copy/paste license identifier errors, and have been fixed to reflect
  the correct identifier.

  Additionally Philippe spent 10 hours this week doing a detailed manual
  inspection and review of the 12,461 patched files from the initial
  patch version early this week with:

   - a full scancode scan run, collecting the matched texts, detected
     license ids and scores

   - reviewing anything where there was a license detected (about 500+
     files) to ensure that the applied SPDX license was correct

   - reviewing anything where there was no detection but the patch
     license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
     applied SPDX license was correct

  This produced a worksheet with 20 files needing minor correction. This
  worksheet was then exported into 3 different .csv files for the
  different types of files to be modified.

  These .csv files were then reviewed by Greg. Thomas wrote a script to
  parse the csv files and add the proper SPDX tag to the file, in the
  format that the file expected. This script was further refined by Greg
  based on the output to detect more types of files automatically and to
  distinguish between header and source .c files (which need different
  comment types.) Finally Greg ran the script using the .csv files to
  generate the patches.

  Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
  Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
  Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  License cleanup: add SPDX license identifier to uapi header files with a license
  License cleanup: add SPDX license identifier to uapi header files with no license
  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02 10:04:46 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Doug Ledford
5c08681b48 Merge branch 'k.o/for-rc' into k.o/for-next
Pick up the missing netlink oops fix

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-01 15:25:27 -04:00
Leon Romanovsky
287683d027 RDMA/nldev: Enforce device index check for port callback
IB device index is nldev's handler and it should be checked always.

Fixes: c3f66f7b00 ("RDMA/netlink: Implement nldev port doit callback")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
[ Applying directly, since Doug fried his SSD's and is rebuilding  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-31 12:12:55 -07:00
Michael J. Ruhl
b4d91aeb6e RDMA/netlink: OOPs in rdma_nl_rcv_msg() from misinterpreted flag
rdma_nl_rcv_msg() checks to see if it should use the .dump() callback
or the .doit() callback.  The check is done with this check:

if (flags & NLM_F_DUMP) ...

The NLM_F_DUMP flag is two bits (NLM_F_ROOT | NLM_F_MATCH).

When an RDMA_NL_LS message (response) is received, the bit used for
indicating an error is the same bit as NLM_F_ROOT.

NLM_F_ROOT == (0x100) == RDMA_NL_LS_F_ERR.

ibacm sends a response with the RDMA_NL_LS_F_ERR bit set if an error
occurs in the service.  The current code then misinterprets the
NLM_F_DUMP bit and trys to call the .dump() callback.

If the .dump() callback for the specified request is not available
(which is true for the RDMA_NL_LS messages) the following Oops occurs:

[ 4555.960256] BUG: unable to handle kernel NULL pointer dereference at
   (null)
[ 4555.969046] IP:           (null)
[ 4555.972664] PGD 10543f1067 P4D 10543f1067 PUD 1033f93067 PMD 0
[ 4555.979287] Oops: 0010 [#1] SMP
[ 4555.982809] Modules linked in: rpcrdma ib_isert iscsi_target_mod
target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm
ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod
dax sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd
glue_helper cryptd hfi1 rdmavt iTCO_wdt iTCO_vendor_support ib_core mei_me
lpc_ich pcspkr mei ioatdma sg shpchp i2c_i801 mfd_core wmi ipmi_si ipmi_devintf
ipmi_msghandler acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace
sunrpc ip_tables ext4 mbcache jbd2 sd_mod mgag200 drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm igb ahci crc32c_intel ptp libahci
pps_core drm dca libata i2c_algo_bit i2c_core
[ 4556.061190] CPU: 54 PID: 9841 Comm: ibacm Tainted: G          I
4.14.0-rc2+ #6
[ 4556.069667] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS
SE5C610.86B.01.01.0008.021120151325 02/11/2015
[ 4556.081339] task: ffff880855f42d00 task.stack: ffffc900246b4000
[ 4556.087967] RIP: 0010:          (null)
[ 4556.092166] RSP: 0018:ffffc900246b7bc8 EFLAGS: 00010246
[ 4556.098018] RAX: ffffffff81dbe9e0 RBX: ffff881058bb1000 RCX:
0000000000000000
[ 4556.105997] RDX: 0000000000001100 RSI: ffff881058bb1320 RDI:
ffff881056362000
[ 4556.113984] RBP: ffffc900246b7bf8 R08: 0000000000000ec0 R09:
0000000000001100
[ 4556.121971] R10: ffff8810573a5000 R11: 0000000000000000 R12:
ffff881056362000
[ 4556.129957] R13: 0000000000000ec0 R14: ffff881058bb1320 R15:
0000000000000ec0
[ 4556.137945] FS:  00007fe0ba5a38c0(0000) GS:ffff88105f080000(0000)
knlGS:0000000000000000
[ 4556.147000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4556.153433] CR2: 0000000000000000 CR3: 0000001056f5d003 CR4:
00000000001606e0
[ 4556.161419] Call Trace:
[ 4556.164167]  ? netlink_dump+0x12c/0x290
[ 4556.168468]  __netlink_dump_start+0x186/0x1f0
[ 4556.173357]  rdma_nl_rcv_msg+0x193/0x1b0 [ib_core]
[ 4556.178724]  rdma_nl_rcv+0xdc/0x130 [ib_core]
[ 4556.183604]  netlink_unicast+0x181/0x240
[ 4556.187998]  netlink_sendmsg+0x2c2/0x3b0
[ 4556.192392]  sock_sendmsg+0x38/0x50
[ 4556.196299]  SYSC_sendto+0x102/0x190
[ 4556.200308]  ? __audit_syscall_entry+0xaf/0x100
[ 4556.205387]  ? syscall_trace_enter+0x1d0/0x2b0
[ 4556.210366]  ? __audit_syscall_exit+0x209/0x290
[ 4556.215442]  SyS_sendto+0xe/0x10
[ 4556.219060]  do_syscall_64+0x67/0x1b0
[ 4556.223165]  entry_SYSCALL64_slow_path+0x25/0x25
[ 4556.228328] RIP: 0033:0x7fe0b9db2a63
[ 4556.232333] RSP: 002b:00007ffc55edc260 EFLAGS: 00000293 ORIG_RAX:
000000000000002c
[ 4556.240808] RAX: ffffffffffffffda RBX: 0000000000000010 RCX:
00007fe0b9db2a63
[ 4556.248796] RDX: 0000000000000010 RSI: 00007ffc55edc280 RDI:
000000000000000d
[ 4556.256782] RBP: 00007ffc55edc670 R08: 00007ffc55edc270 R09:
000000000000000c
[ 4556.265321] R10: 0000000000000000 R11: 0000000000000293 R12:
00007ffc55edc280
[ 4556.273846] R13: 000000000260b400 R14: 000000000000000d R15:
0000000000000001
[ 4556.282368] Code:  Bad RIP value.
[ 4556.286629] RIP:           (null) RSP: ffffc900246b7bc8
[ 4556.293013] CR2: 0000000000000000
[ 4556.297292] ---[ end trace 8d67abcfd10ec209 ]---
[ 4556.305465] Kernel panic - not syncing: Fatal exception
[ 4556.313786] Kernel Offset: disabled
[ 4556.321563] ---[ end Kernel panic - not syncing: Fatal exception
[ 4556.328960] ------------[ cut here ]------------

Special case RDMA_NL_LS response messages to call the appropriate
callback.

Additionally, make sure that the .dump() callback is not NULL
before calling it.

Fixes: 647c75ac59 ("RDMA/netlink: Convert LS to doit callback")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:54:43 -04:00
Parav Pandit
5a3dc32372 IB/cm: Fix memory corruption in handling CM request
In recent code, two path record entries are alwasy cleared while
allocated could be either one or two path record entries.
This leads to zero out of unallocated memory.

This fix initializes alternative path record only when alternative path
is set.

While we are at it, path record allocation doesn't check for OPA
alternative path, but rest of the code checks for OPA alternative path.
Path record allocation code doesn't check for OPA alternative LID.
This can further lead to memory corruption when only one path record is
allocated, but there is actually alternative OPA path record present in CM
request.

Cc: <stable@vger.kernel.org> # v4.12+
Fixes: 9fdca4da4d ("IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-25 14:37:03 -04:00
Bhumika Goyal
6ace4f6bbc RDMA/cma: make config_item_type const
Make these structures const as they are either passed to the functions
having the argument as const or stored as a reference in the "ci_type"
const field of a config_item structure.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:31 +02:00
Doug Ledford
754137a769 Merge branch 'for-next-early' into for-next
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base.  I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:07:13 -04:00
Parav Pandit
39baf10310 IB/core: Fix use workqueue without WQ_MEM_RECLAIM
The IB/core provides address resolution service and invokes callback
handler when address resolve request completes of requester in worker
thread context.

Such caller might allocate or free memory in callback handler
depending on the completion status to make further progress or to
terminate a connection. Most ULPs resolve route which involves
allocating route entry and path record elements in callback event handler.

It has been noticed that WQ_MEM_RECLAIM flag should not be used for
workers that tend to allocate memory in this [1] thread discussion.

In order to mitigate this situation, WQ_MEM_RECLAIM flag was dropped for
other such WQs in this [2] patch.

Similar problem might arise with address resolution path, though its not
yet noticed. The ib_addr workqueue is not memory reclaim path due to its
nature of invoking callback that might allocate memory or don't free any
memory under memory pressure.

[1] https://www.spinics.net/lists/linux-rdma/msg53239.html
[2] https://www.spinics.net/lists/linux-rdma/msg53416.html

Fixes: f54816261c ("IB/addr: Remove deprecated create_singlethread_workqueue")
Fixes: 5fff41e1f8 ("IB/core: Fix race condition in resolving IP to MAC")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 12:10:36 -04:00
Parav Pandit
79c4d80b43 IB/core: Fix unable to change lifespan entry for hw_counters
This patch fixes the case where 'lifespan' entry of the hw_counters
is not writable. Currently write callback is not exposed for for
the hw_counters sysfs operation. Due to this, modifying lifespan
value results into permission denied error in below example.

echo 10 > /sys/class/infiniband/mlx5_0/ports/1/hw_counters/lifespan
-bash: /sys/class/infiniband/mlx5_0/ports/1/hw_counters/lifespan:
Permission denied

This patch adds the hook to modify any attribute which implements
store() operation.

Fixes: b40f4757da ("IB/core: Make device counter infrastructure dynamic")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 12:10:36 -04:00
Parav Pandit
c0348eb069 IB: Let ib_core resolve destination mac address
Since IB/core resolves the destination mac address for user and kernel
consumers, avoid resolving in multiple provider drivers.

Only ib_core resolves DMAC now, therefore resolve_eth_dmac is removed as
exported symbol.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 12:10:36 -04:00
Parav Pandit
5cda6587fe IB/core: Introduce and use rdma_create_user_ah
Introduce rdma_create_user_ah API which allows passing udata to
provider driver and additionally which resolves DMAC for RoCE.

ib_resolve_eth_dmac() resolves destination mac address for unicast,
multicast, link local ipv4 mapped ipv6 and ipv6 destination gid entry.
This allows all RoCE provider drivers to avoid duplicating such code.

Such change brings consistency where IB core always resolves dmac and pass
it to RoCE provider drivers for user and kernel consumers, with this
ah_attr->roce.dmac is always an input field for provider drivers.

This uniformity avoids exporting ib_resolve_eth_dmac symbol to providers
or other modules. Therefore its removed as exported symbol at later in
the patch series.

Now uverbs and umad both makes use of rdma_create_user_ah API which
fixes the issue where umad has invalid DMAC for address.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 12:10:36 -04:00
Bart Van Assche
69abc735f4 RDMA/uverbs: Make the code in ib_uverbs_cmd_verbs() less confusing
This patch reduces the number of #ifdefs and also avoids that
smatch reports the following:

drivers/infiniband/core/uverbs_ioctl.c:276: ib_uverbs_cmd_verbs() warn: if statement not indented
drivers/infiniband/core/uverbs_ioctl.c:280: ib_uverbs_cmd_verbs() warn: possible memory leak of 'ctx'
drivers/infiniband/core/uverbs_ioctl.c:315: ib_uverbs_cmd_verbs() warn: if statement not indented

References: commit fac9658cab ("IB/core: Add new ioctl interface")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Matan Barak <matanb@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 10:42:02 -04:00
Bart Van Assche
d39dcd6a2d RDMA/iwcm: Remove a set-but-not-used variable
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:05 -04:00
Bart Van Assche
c0b64f58e8 RDMA/cma: Avoid triggering undefined behavior
According to the C standard the behavior of computations with
integer operands is as follows:
* A computation involving unsigned operands can never overflow,
  because a result that cannot be represented by the resulting
  unsigned integer type is reduced modulo the number that is one
  greater than the largest value that can be represented by the
  resulting type.
* The behavior for signed integer underflow and overflow is
  undefined.

Hence only use unsigned integers when checking for integer
overflow.

This patch is what I came up with after having analyzed the
following smatch warnings:

drivers/infiniband/core/cma.c:3448: cma_resolve_ib_udp() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len'
drivers/infiniband/core/cma.c:3505: cma_connect_ib() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len'

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:05 -04:00
Bart Van Assche
401c6ae363 IB/cm: Suppress gcc 7 fall-through complaints
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14 20:47:05 -04:00
Colin Ian King
318a8ab7e8 IB/core: remove redundant check on prot_sg_cnt
prot_sg_cnt cannot be zero as a previous check on ret (from which
prot_sg_cnt is assigned) returns -ENOMEM if is it zero.  Since
it cannot be zero we can simplify the code by removing the non
-zero check on prot_sg_cnt and redundant else statement.

Detected by CoverityScan, COD#1357188 ("Logically dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-10 10:49:45 -04:00
Bart Van Assche
9d18717790 IB/core: Simplify sa_path_set_[sd]lid() calls
Instead of making every caller convert the second argument of
sa_path_set_slid() and sa_path_set_dlid() to big endian format,
make these two functions accept LIDs in CPU endian format.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Don Hiatt <don.hiatt@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-10 10:49:44 -04:00
Don Hiatt
6588e412fe IB/core: Do not warn on lid conversions for OPA
On OPA devices the user_mad recv_handler can receive 32Bit LIDs
(e.g. OPA_PERMISSIVE_LID) and it is okay to lose the upper 16 bits
of the LID as this information is obtained elsewhere. Do not issue
a warning when calling ib_lid_be16() in this case by masking out
the upper 16Bits.

[75667.310846] ------------[ cut here ]------------
[75667.316447] WARNING: CPU: 0 PID: 1718 at ./include/rdma/ib_verbs.h:3799 recv_handler+0x15a/0x170 [ib_umad]
[75667.327640] Modules linked in: ib_ipoib hfi1(E) rdmavt(E) rdma_ucm(E) ib_ucm(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_umad(E) ib_uverbs(E) ib_core(E) libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod dax x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mei_me ipmi_si iTCO_wdt iTCO_vendor_support crypto_simd ipmi_devintf pcspkr mei sg i2c_i801 glue_helper lpc_ich shpchp ioatdma mfd_core wmi ipmi_msghandler cryptd acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ptp ahci libahci pps_core crc32c_intel libata dca i2c_algo_bit i2c_core [last unloaded: ib_core]
[75667.407704] CPU: 0 PID: 1718 Comm: kworker/0:1H Tainted: G        W I E   4.13.0-rc7+ #1
[75667.417310] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015
[75667.429555] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[75667.436360] task: ffff88084a718000 task.stack: ffffc9000a424000
[75667.443549] RIP: 0010:recv_handler+0x15a/0x170 [ib_umad]
[75667.450090] RSP: 0018:ffffc9000a427ce8 EFLAGS: 00010286
[75667.456508] RAX: 00000000ffffffff RBX: ffff88085159ce80 RCX: 0000000000000000
[75667.465094] RDX: ffff88085a47b068 RSI: 0000000000000000 RDI: ffff88085159cf00
[75667.473668] RBP: ffffc9000a427d38 R08: 000000000001efc0 R09: ffff88085159ce80
[75667.482228] R10: ffff88085f007480 R11: ffff88084acf20e8 R12: ffff88085a47b020
[75667.490824] R13: ffff881056842e10 R14: ffff881056840200 R15: ffff88104c8d0800
[75667.499390] FS:  0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000
[75667.509028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75667.516080] CR2: 00007f9e4b3d9000 CR3: 0000000001c09000 CR4: 00000000001406f0
[75667.524664] Call Trace:
[75667.528044]  ? find_mad_agent+0x7c/0x1b0 [ib_core]
[75667.534031]  ? ib_mark_mad_done+0x73/0xa0 [ib_core]
[75667.540142]  ib_mad_recv_done+0x423/0x9b0 [ib_core]
[75667.546215]  __ib_process_cq+0x5d/0xb0 [ib_core]
[75667.552007]  ib_cq_poll_work+0x20/0x60 [ib_core]
[75667.557766]  process_one_work+0x149/0x360
[75667.562844]  worker_thread+0x4d/0x3c0
[75667.567529]  kthread+0x109/0x140
[75667.571713]  ? rescuer_thread+0x380/0x380
[75667.576775]  ? kthread_park+0x60/0x60
[75667.581447]  ret_from_fork+0x25/0x30
[75667.586014] Code: 43 4a 0f b6 45 c6 88 43 4b 48 8b 45 b0 48 89 43 4c 48 8b 45 b8 48 89 43 54 8b 45 c0 0f c8 89 43 5c e9 79 ff ff ff e8 16 4e fa e0 <0f> ff e9 42 ff ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00
[75667.608323] ---[ end trace cf26df27c9597264 ]---

Fixes: 62ede77799 ("Add OPA extended LID support")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-04 15:39:45 -04:00
Shiraz Saleem
04eae42740 RDMA/iwpm: Properly mark end of NL messages
Commit 1a1c116f3d removes nlmsg_len calculation in
ibnl_put_attr causing netlink messages to be rejected due
to incorrect length.

Add nlmsg_end after all attributes are appended to calculate
the nlmsg_len.

Fixes: 1a1c116f3d ("RDMA/netlink: Simplify the put_msg and put_attr")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-29 11:32:42 -04:00
Arnd Bergmann
40a203396c IB/uverbs: clean up INIT_UDATA() macro usage
After changing INIT_UDATA_BUF_OR_NULL() to an inline function,
this does the same change to INIT_UDATA for consistency.
I'm keeping it separate as this part is much larger and
we wouldn't want to backport this to stable kernels if we
ever want to address the gcc warnings by backporting the
first patch.

Again, using an inline function gives us better type
safety here among other issues with macros. I'm using
u64_to_user_ptr() to convert the user pointer to simplify
the logic rather than adding lots of new type casts.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 08:54:19 -04:00
Arnd Bergmann
12f727721e IB/uverbs: clean up INIT_UDATA_BUF_OR_NULL usage
We get a harmless warning about the fact that we use the result of a
multiplication as a condition:

drivers/infiniband/core/uverbs_main.c: In function 'ib_uverbs_write':
drivers/infiniband/core/uverbs_main.c:787:40: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/infiniband/core/uverbs_main.c:787:117: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/infiniband/core/uverbs_main.c:790:50: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
drivers/infiniband/core/uverbs_main.c:790:151: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]

This avoids the problem by using an inline function in place of
the macro.

Fixes: a96e4e2ffe ("IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Link: https://patchwork.kernel.org/patch/9940777/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 08:54:19 -04:00
Colin Ian King
8f63d4b1d5 IB/core: fix spelling mistake: "aceess" -> "access"
Trivial fix to spelling mistake in WARN message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27 08:54:19 -04:00
Parav Pandit
73827a605b IB/core: Fix qp_sec use after free access
When security_ib_alloc_security fails, qp->qp_sec memory is freed.
However ib_destroy_qp still tries to access this memory which result
in kernel crash. So its initialized to NULL to avoid such access.

Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Leon Romanovsky
78b1beb099 IB/core: Fix typo in the name of the tag-matching cap struct
The tag matching functionality is implemented by mlx5 driver
by extending XRQ, however this internal kernel information was
exposed to user space applications with *xrq* name instead of *tm*.

This patch renames *xrq* to *tm* to handle that.

Fixes: 8d50505ada ("IB/uverbs: Expose XRQ capabilities")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25 11:47:23 -04:00
Linus Torvalds
ded8503200 First -rc update for 4.14 kernel
- Smattering of miscellanous fixes
 - A five patch series for i40iw that had a patch (5/5) that was larger
   than I would like, but I took it because it's needed for large scale
   users
 - An 8 patch series for bnxt_re that landed right as I was leaving on
   PTO and so had to wait until now...they are all appropriate fixes for
   -rc IMO
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZxU+WAAoJELgmozMOVy/dQwEP/ja5+3zNbkX69T/ch5Q9koKO
 7O1Onw/ePn9va/hC0IJm910syeyUcnkl+0GJH9JhS/Q/7bd9S97TjdSMjZpOSTjA
 qCkFWOJ2zZPsGVijsiFF+BQa1jPgUc2VRwbuC4sWm19Ma8iLZ86aXKot9prBPoU7
 dEnpwX5LrUIQCcNmWaudXoctiqN3y6oQzIobzGJXXQzlT5VPudIPYKUZMixuLYH2
 XXJ5MtrHlvB+aKIURcHey03q8Vah5HQ6P467249fNBsLoYbycx7aPYhR7NyFDEEX
 IkucBT7FOZUqcklxIXQHRQOTvj8dru91TvsZ6aNVPuS6SvYTf95cSFu7yBBP+DNd
 g3UWpuRXwvJYQosXbpHhGNevq2M3XLZmzEvOBul8j7Fq/4rw6HxFYtA9um/8V4h9
 UxJjjAu59gbkmnrG2cGJCLwnC75BId84cZ4Nc8vfB/mhShE3n8YjRXfb1clS9DB7
 CTNLp7AtFujTdWc4iQ3vMZ9cCILQtKnSXvnETHq65WDnqfaPT7NfwIrFxGHDUa5N
 m94l+Neg3rNrsxcRFxXQ9HzmG2ZTiGK956Nvpxn6/cDD6ZVd6RQBOYjZ4QxVd+lS
 jdkA0gImS88HlupyosILMPjQm+BCqmDjpZx/yWyRRCBe7XP1MgX9S2ySDqFgiy1j
 J9KGzXFIV73DA8nVfNtM
 =iiKF
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:

 - Smattering of miscellanous fixes

 - A five patch series for i40iw that had a patch (5/5) that was larger
   than I would like, but I took it because it's needed for large scale
   users

 - An 8 patch series for bnxt_re that landed right as I was leaving on
   PTO and so had to wait until now...they are all appropriate fixes for
   -rc IMO

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (22 commits)
  bnxt_re: Don't issue cmd to delete GID for QP1 GID entry before the QP is destroyed
  bnxt_re: Fix memory leak in FRMR path
  bnxt_re: Remove RTNL lock dependency in bnxt_re_query_port
  bnxt_re: Fix race between the netdev register and unregister events
  bnxt_re: Free up devices in module_exit path
  bnxt_re: Fix compare and swap atomic operands
  bnxt_re: Stop issuing further cmds to FW once a cmd times out
  bnxt_re: Fix update of qplib_qp.mtu when modified
  i40iw: Add support for port reuse on active side connections
  i40iw: Add missing VLAN priority
  i40iw: Call i40iw_cm_disconn on modify QP to disconnect
  i40iw: Prevent multiple netdev event notifier registrations
  i40iw: Fail open if there are no available MSI-X vectors
  RDMA/vmw_pvrdma: Fix reporting correct opcodes for completion
  IB/bnxt_re: Fix frame stack compilation warning
  IB/mlx5: fix debugfs cleanup
  IB/ocrdma: fix incorrect fall-through on switch statement
  IB/ipoib: Suppress the retry related completion errors
  iw_cxgb4: remove the stid on listen create failure
  iw_cxgb4: drop listen destroy replies if no ep found
  ...
2017-09-23 05:47:04 -10:00
Alex Estrin
e6f9bc34d3 IB/core: Fix for core panic
Build with the latest patches resulted in panic:
11384.486289] BUG: unable to handle kernel NULL pointer dereference at
         (null)
[11384.486293] IP:           (null)
[11384.486295] PGD 0
[11384.486295] P4D 0
[11384.486296]
[11384.486299] Oops: 0010 [#1] SMP
......... snip ......
[11384.486401] CPU: 0 PID: 968 Comm: kworker/0:1H Tainted: G        W  O
    4.13.0-a-stream-20170825 #1
[11384.486402] Hardware name: Intel Corporation S2600WT2R/S2600WT2R,
BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015
[11384.486418] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
[11384.486419] task: ffff880850579680 task.stack: ffffc90007fec000
[11384.486420] RIP: 0010:          (null)
[11384.486420] RSP: 0018:ffffc90007fef970 EFLAGS: 00010206
[11384.486421] RAX: ffff88084cfe8000 RBX: ffff88084dce4000 RCX:
ffffc90007fef978
[11384.486422] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff88084cfe8000
[11384.486422] RBP: ffffc90007fefab0 R08: 0000000000000000 R09:
ffff88084dce4080
[11384.486423] R10: ffffffffa02d7f60 R11: 0000000000000000 R12:
ffff88105af65a00
[11384.486423] R13: ffff88084dce4000 R14: 000000000000c000 R15:
000000000000c000
[11384.486424] FS:  0000000000000000(0000) GS:ffff88085f400000(0000)
knlGS:0000000000000000
[11384.486425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11384.486425] CR2: 0000000000000000 CR3: 0000000001c09000 CR4:
00000000001406f0
[11384.486426] Call Trace:
[11384.486431]  ? is_valid_mcast_lid.isra.21+0xfb/0x110 [ib_core]
[11384.486436]  ib_attach_mcast+0x6f/0xa0 [ib_core]
[11384.486441]  ipoib_mcast_attach+0x81/0x190 [ib_ipoib]
[11384.486443]  ipoib_mcast_join_complete+0x354/0xb40 [ib_ipoib]
[11384.486448]  mcast_work_handler+0x330/0x6c0 [ib_core]
[11384.486452]  join_handler+0x101/0x220 [ib_core]
[11384.486455]  ib_sa_mcmember_rec_callback+0x54/0x80 [ib_core]
[11384.486459]  recv_handler+0x3a/0x60 [ib_core]
[11384.486462]  ib_mad_recv_done+0x423/0x9b0 [ib_core]
[11384.486466]  __ib_process_cq+0x5d/0xb0 [ib_core]
[11384.486469]  ib_cq_poll_work+0x20/0x60 [ib_core]
[11384.486472]  process_one_work+0x149/0x360
[11384.486474]  worker_thread+0x4d/0x3c0
[11384.486487]  kthread+0x109/0x140
[11384.486488]  ? rescuer_thread+0x380/0x380
[11384.486489]  ? kthread_park+0x60/0x60
[11384.486490]  ? kthread_park+0x60/0x60
[11384.486493]  ret_from_fork+0x25/0x30
[11384.486493] Code:  Bad RIP value.
[11384.486493] Code:  Bad RIP value.
[11384.486496] RIP:           (null) RSP: ffffc90007fef970
[11384.486497] CR2: 0000000000000000
[11384.486531] ---[ end trace b1acec6fb4ff6e75 ]---
[11384.532133] Kernel panic - not syncing: Fatal exception
[11384.536541] Kernel Offset: disabled
[11384.969491] ---[ end Kernel panic - not syncing: Fatal exception
[11384.976875] sched: Unexpected reschedule of offline CPU#1!
[11384.983646] ------------[ cut here ]------------

Rdma device driver may not have implemented (*get_link_layer)()
so it can not be called directly. Should use appropriate helper function.

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Fixes: 5236333592 ("IB/core: Fix the validations of a multicast LID in attach or detach operations")
Cc: stable@kernel.org # 4.13
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-22 11:52:09 -04:00
Linus Torvalds
ad9a19d003 More RDMA work and some op-structure constification from Chuck Lever,
and a small cleanup to our xdr encoding.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZst0LAAoJECebzXlCjuG+o30QALbchoIvs7BiDrUxYMfJ2nCa
 7UW69STwX79B3NZTg7RrScFTLPEFW9DMpb/Og7AYTH3/wdgGYQNM1UxGUYe7IxSN
 xemH7BSmQzJ7ryaxouO/jskUw5nvNRXhY0PMxJApjrCs837vTjduIVw9zUa8EDeH
 9toxpTM4k3z/1myj60PuHnuQF9EyLDL6W581loDF04nQB3pVRbAZOh1lUeqMgLUd
 7IF+CDECFcjL7oZSA3wDGpsVySLdZ+GYxloFIDO/d8kHEsZD3OaN2MdfRki8EOSQ
 qibTYO0284VeyNLUOIHjspqbDh0Lr2F7VolMmlM5GF1IuApih0/QYidqsH6/As3U
 JIAK53vgqZfK2qI0ud7dGGFEnT/vlE7pQiXiza36xI8YZu4Xz6uGbM41p38RU8jO
 3fr38xdPqqO7YE6F7ZUHYyrmW81Vi0lFdQkw1DBEipHV8UquuCmdtAeR9xgDsdQ/
 LsMVevM1mF+19krOIGbBnENq1GX78ecfHEYGxlTjf/MeO4JYl+8/x7Ow2e/ZbwSa
 7hpUeCiVuVmy1hqOEtraBl5caAG0hCE8PeGRrdr5dA6ZS9YTm0ANgtxndKabwDh2
 CjXF3gRnQNUGdFGCi/fmvfb89tVNj1tL52pbQqfgOb/VFrrL328vyNNg/1p2VY4Q
 qzmKtxZhi/XBewQjaSQl
 =E3UQ
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "More RDMA work and some op-structure constification from Chuck Lever,
  and a small cleanup to our xdr encoding"

* tag 'nfsd-4.14' of git://linux-nfs.org/~bfields/linux:
  svcrdma: Estimate Send Queue depth properly
  rdma core: Add rdma_rw_mr_payload()
  svcrdma: Limit RQ depth
  svcrdma: Populate tail iovec when receiving
  nfsd: Incoming xdr_bufs may have content in tail buffer
  svcrdma: Clean up svc_rdma_build_read_chunk()
  sunrpc: Const-ify struct sv_serv_ops
  nfsd: Const-ify NFSv4 encoding and decoding ops arrays
  sunrpc: Const-ify instances of struct svc_xprt_ops
  nfsd4: individual encoders no longer see error cases
  nfsd4: skip encoder in trivial error cases
  nfsd4: define ->op_release for compound ops
  nfsd4: opdesc will be useful outside nfs4proc.c
  nfsd4: move some nfsd4 op definitions to xdr4.h
2017-09-09 13:31:49 -07:00
Davidlohr Bueso
f808c13fd3 lib/interval_tree: fast overlap detection
Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available.  While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
  Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:49 -07:00
Linus Torvalds
015a9e66b9 RDMA/netlink: clean up message validity array initializer
The fix in the parent made me look at that function, and react to how
illogical and illegible the array initializer was.

Use named array indexes to make it clearer what is going on, and make
the initializer not depend silently on the exact index numbers.

[ The initializer now also shows an odd inconsistency in the naming:
  note the IWCM vs IWPM..   - Linus ]

Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 10:17:20 -07:00
Leon Romanovsky
8b2c7e7a3c RDAM/netlink: Fix out-of-bound access while checking message validity
The netlink message sent with type == 0, which doesn't have any client
behind it, caused to the overflow in max_num_ops array.

Fix it by declaring zero number of ops for the first client.

Fixes: c9901724a2 ("RDMA/netlink: Remove netlink clients infrastructure")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 10:01:03 -07:00
Chuck Lever
0062818298 rdma core: Add rdma_rw_mr_payload()
The amount of payload per MR depends on device capabilities and
the memory registration mode in use. The new rdma_rw API hides both,
making it difficult for ULPs to determine how large their transport
send queues need to be.

Expose the MR payload information via a new API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-09-05 15:15:30 -04:00
Linus Torvalds
aa9d4648c2 Updates for 4.14 kernel merge window
- Lots of hfi1 driver updates (mixed with a few qib and core updates as
   well)
 - rxe updates
 - various mlx updates
 - Set default roce type to RoCEv2
 - Several larger fixes for bnxt_re that were too big for -rc
 - Several larger fixes for qedr that, likewise, were too big for -rc
 - Misc core changes
 - Make the hns_roce driver compilable on arches other than aarch64 so we
   can more easily debug build issues related to it
 - Add rdma-netlink infrastructure updates
 - Add automatic IRQ affinity infrastructure
 - Add 32bit lid support
 - Lots of misc fixes across the subsystem from random people
 - Autoloading of RDMA netlink modules
 - PCI pool cleanups from Romain Perier
 - mlx5 driver feature additions and fixes
 - Hardware tag matchine feature
 - Fix sleeping in atomic when resolving roce ah
 - Add experimental ioctl interface as posted to linux-api@
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZqBDtAAoJELgmozMOVy/dNlcQAJhYNRGaNUBx0L6+8t2xwUrt
 7ndP6qlMar30DJY9FjTQCzRBw0CRMWkXdJD8rYlyaHy07pjWDKG8LZtxEXu1FLdZ
 oNRvQX6ZJh8Bz7db2SQFBCTF2uWGZZFqWQCrSbQwjj9xxjMDs59u/knmwHVY9dKk
 egjPG4IQBDmcTeNY7h1otG2hXpx7QPIOilQW2EFN5SWAuBAazdF2JKxjjxqhnUfp
 gD2pSdgsm3VSMoo0zpMa6qOP+9GcOu8J97fYFhasRYWCavPdWHyq+XNu9S/eicRd
 xbv+seCYM+9jPb2dsNdjEKll7w3yyWdu7h6tSCMPYv54eN9sDDiO1w2L2ZnESMZa
 JRnSfB+HXru1r4RyHOTPO8peaNhYlR1V4u8bTS5G2dffbHis9BajkWoAR/oSiUcB
 AIjIIDcdJFVGfpF9KIt/pEl+adHNgESibSijzOUYkyw6RNbPqDmdd7YakPHcQhKN
 clE3zQfIsPRLWsToP/nkBE0tUd3tQocRuLy7ote7hXQK+0p7TBz0a6Kkj87MvX33
 8dVbUI+q6WRlEY90l71y0ZdXy/AvkxkFxAc4Y7FQZyJxhEArTaKgfa5fmpRwVxBm
 yi9baoYCspHNRNv6AO4IL86ZCJqmWBuch8CBY1n2X3h8IGfKYEZUAZ+T/mnTTeUq
 A4joXduz94ZD4w23leD1
 =2ntC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a big pull request.

  Of note is that I'm sending you the new ioctl API for the rdma
  subsystem. We put it up on linux-api@, but didn't get much response.
  The API is complex, but it solves two different problems in one go:

   1) The bi-directional nature of the RDMA file write calls, which
      created the security hole we had to handle (and for which the fix
      is now causing problems for systems in production, we were a bit
      over zealous in the fix and the ability to open a device, then
      fork, then create new queue pairs on the device and use them is
      broken).

   2) The bloat caused by different vendors implementing extensions to
      the base verbs API. Each vendor's hardware is slightly different,
      and the hardware might be suitable for one extension but not
      another.

      By the time we add generic extensions for all the different ways
      that the different hardware can offload things, the API becomes
      bloated. Things like our completion structs have started to exceed
      a cache line in size because of all the elements needed to support
      this. That in turn shows up heavily in the performance graphs with
      a noticable drop in performance on 100Gigabit links as our
      completion structs go from occupying one cache line to 1+.

      This API makes things like the completion structs modular in a
      very similar way to netlink so that your structs can only include
      the items needed for the offloads/features you are actually using
      on a given queue pair. In that way we support everything, but only
      use what we need, and our structs stay smaller.

  The ioctl API is better explained by the posting on linux-api@ than I
  can explain it here, so I'll just leave it at that.

  The rest of the pull request is typical stuff.

  Updates for 4.14 kernel merge window

   - Lots of hfi1 driver updates (mixed with a few qib and core updates
     as well)

   - rxe updates

   - various mlx updates

   - Set default roce type to RoCEv2

   - Several larger fixes for bnxt_re that were too big for -rc

   - Several larger fixes for qedr that, likewise, were too big for -rc

   - Misc core changes

   - Make the hns_roce driver compilable on arches other than aarch64 so
     we can more easily debug build issues related to it

   - Add rdma-netlink infrastructure updates

   - Add automatic IRQ affinity infrastructure

   - Add 32bit lid support

   - Lots of misc fixes across the subsystem from random people

   - Autoloading of RDMA netlink modules

   - PCI pool cleanups from Romain Perier

   - mlx5 driver feature additions and fixes

   - Hardware tag matchine feature

   - Fix sleeping in atomic when resolving roce ah

   - Add experimental ioctl interface as posted to linux-api@"

* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
  IB/core: Expose ioctl interface through experimental Kconfig
  IB/core: Assign root to all drivers
  IB/core: Add completion queue (cq) object actions
  IB/core: Add legacy driver's user-data
  IB/core: Export ioctl enum types to user-space
  IB/core: Explicitly destroy an object while keeping uobject
  IB/core: Add macros for declaring methods and attributes
  IB/core: Add uverbs merge trees functionality
  IB/core: Add DEVICE object and root tree structure
  IB/core: Declare an object instead of declaring only type attributes
  IB/core: Add new ioctl interface
  RDMA/vmw_pvrdma: Fix a signedness
  RDMA/vmw_pvrdma: Report network header type in WC
  IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
  IB/cm: Fix sleeping in atomic when RoCE is used
  IB/core: Add support to finalize objects in one transaction
  IB/core: Add a generic way to execute an operation on a uobject
  Documentation: Hardware tag matching
  IB/mlx5: Support IB_SRQT_TM
  net/mlx5: Add XRQ support
  ...
2017-09-03 17:49:17 -07:00
Jérôme Glisse
b1a89257f2 IB/umem: update to new mmu_notifier semantic
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Leon Romanovsky <leonro@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Cc: Artemy Kovalyov <artemyko@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31 16:12:59 -07:00
Matan Barak
8eb19e8e7c IB/core: Expose ioctl interface through experimental Kconfig
Add CONFIG_INFINIBAND_EXP_USER_ACCESS that enables the ioctl
interface. This interface is experimental and is subject to change.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:14 -04:00
Matan Barak
5242711294 IB/core: Assign root to all drivers
In order to use the parsing tree, we need to assign the root
to all drivers. Currently, we just assign the default parsing
tree via ib_uverbs_add_one. The driver could override this by
assigning a parsing tree prior to registering the device.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:14 -04:00
Matan Barak
9ee79fce36 IB/core: Add completion queue (cq) object actions
Adding CQ ioctl actions:
1. create_cq
2. destroy_cq

This requires adding the following:
1. A specification describing the method
	a. Handler
	b. Attributes specification
		Each attribute is one of the following:
		a. PTR_IN - input data
			    Note: This could be encoded inlined for
				  data < 64bit
		b. PTR_OUT - response data
		c. IDR - idr based object
		d. FD - fd based object
                Blobs attributes (clauses a and b) contain their type,
	        while objects specifications (clauses c and d)
                contains the expected object type (for example, the
                given id should be UVERBS_TYPE_PD) and the required
                access (READ, WRITE, NEW or DESTROY). If a NEW is
                required, the new object's id will be assigned to this
                attribute. All attributes could get UA_FLAGS
                attribute. Currently we support stating that an
		attribute is mandatory or that the specification size
                corresponds to a lower bound (and that this attribute
		could be extended).
		We currently add both default attributes and the two
		generic UHW_IN and UHW_OUT driver specific attributes.
2. Handler
   A handler gets a uverbs_attr_bundle. The handler developer uses
   uverbs_attr_get to fetch an attribute of a given id.
   Each of these attribute groups correspond to the specification
   group defined in the action (clauses 1.b and 1.c respectively).
   The indices of these arrays corresponds to the attribute ids
   declared in the specifications (clause 2).

   The handler is quite simple. It assumes the infrastructure fetched
   all objects and locked, created or destroyed them as required by
   the specification. Pointer (or blob) attributes were validated to
   match their required sizes. After the handler finished, the
   infrastructure commits or rollbacks the objects.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:13 -04:00
Matan Barak
d70724f149 IB/core: Add legacy driver's user-data
In this phase, we don't want to change all the drivers to use
flexible driver's specific attributes. Therefore, we add two default
attributes: UHW_IN and UHW_OUT. These attributes are optional in some
methods and they encode the driver specific command data. We add
a function that extract this data and creates the legacy udata over
it.

Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This
turns on the first bit of the namespace, indicating this attribute
belongs to the driver's namespace.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:13 -04:00
Matan Barak
4da70da23e IB/core: Explicitly destroy an object while keeping uobject
When some objects are destroyed, we need to extract their status at
destruction. After object's destruction, this status
(e.g. events_reported) relies in the uobject. In order to have the
latest and correct status, the underlying object should be destroyed,
but we should keep the uobject alive and read this information off the
uobject. We introduce a rdma_explicit_destroy function. This function
destroys the class type object (for example, the IDR class type which
destroys the underlying object as well) and then convert the uobject
to be of a null class type. This uobject will then be destroyed as any
other uobject once uverbs_finalize_object[s] is called.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:11 -04:00
Matan Barak
118620d368 IB/core: Add uverbs merge trees functionality
Different drivers support different features and even subset of the
common uverbs implementation. Currently, this is handled as bitmask
in every driver that represents which kind of methods it supports, but
doesn't go down to attributes granularity. Moreover, drivers might
want to add their specific types, methods and attributes to let
their user-space counter-parts be exposed to some more efficient
abstractions. It means that existence of different features is
validated syntactically via the parsing infrastructure rather than
using a complex in-handler logic.

In order to do that, we allow defining features and abstractions
as parsing trees. These per-feature parsing tree could be merged
to an efficient (perfect-hash based) parsing tree, which is later
used by the parsing infrastructure.

To sum it up, this makes a parse tree unique for a device and
represents only the features this particular device supports.
This is done by having a root specification tree per feature.
Before a device registers itself as an IB device, it merges
all these trees into one parsing tree. This parsing tree
is used to parse all user-space commands.

A future user-space application could read this parse tree. This
tree represents which objects, methods and attributes are
supported by this device.

This is based on the idea of
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:10 -04:00
Matan Barak
09e3ebf8c1 IB/core: Add DEVICE object and root tree structure
This adds the DEVICE object. This object supports creating the context
that all objects are created from. Moreover, it supports executing
methods which are related to the device itself, such as QUERY_DEVICE.
This is a singleton object (per file instance).

All standard objects are put in the root structure. This root will later
on be used in drivers as the source for their whole parsing tree.
Later on, when new features are added, these drivers could mix this root
with other customized objects.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:10 -04:00
Matan Barak
5009010fbf IB/core: Declare an object instead of declaring only type attributes
Switch all uverbs_type_attrs_xxxx with DECLARE_UVERBS_OBJECT
macros. This will be later used in order to embed the object
specific methods in the objects as well.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:09 -04:00
Matan Barak
fac9658cab IB/core: Add new ioctl interface
In this ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.

Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported objects are declared.
These objects are separated to different object namepsaces. Dividing
objects to namespaces is done at initialization by using the higher
bits of the object ids. This initialization can mix objects declared
in different places to one parsing tree using in this ioctl interface.

For each object we list all supported methods. Similarly to objects,
methods are separated to method namespaces too. Namespacing is done
similarly to the objects case. This could be used in order to add
methods to an existing object.

Each method has a specific handler, which could be either a default
handler or a driver specific handler.
Along with the handler, a bunch of attributes are specified as well.
Similarly to objects and method, attributes are namespaced and hashed
by their ids at initialization too. All supported attributes are
subject to automatic fetching and validation. These attributes include
the command, response and the method's related objects' ids.

When these entities (objects, methods and attributes) are used, the
high bits of the entities ids are used in order to calculate the hash
bucket index. Then, these high bits are masked out in order to have a
zero based index. Since we use these high bits for both bucketing and
namespacing, we get a compact representation and O(1) array access.
This is mandatory for efficient dispatching.

Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes could be validated through some attributes, like:
(*) Minimum size / Exact size
(*) Fops for FD
(*) Object type for IDR

If an IDR/fd attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease) for concurrent
actions invocation. The reference counts on the actual kernel objects
shall be handled by the handlers.

 objects
+--------+
|        |
|        |   methods                                                                +--------+
|        |   ns         method      method_spec                           +-----+   |len     |
+--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
| object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
+--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
|        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
|        |  |      |                                    +------------+
|        |  +------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+

[d] = Hash ids to groups using the high order bits

The right types table is also chosen by using the high bits from
the ids. Currently we have either default or driver specific groups.

Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
               struct uverbs_attr_bundle *ctx);

ctx bundles attributes of different namespaces. Each element there
is an array of attributes which corresponds to one namespaces of
attributes. For example, in the usually used case:

 ctx                               core
+----------------------------+     +------------+
| core:                      +---> | valid      |
+----------------------------+     | cmd_attr   |
| driver:                    |     +------------+
|----------------------------+--+  | valid      |
                                |  | cmd_attr   |
                                |  +------------+
                                |  | valid      |
                                |  | obj_attr   |
                                |  +------------+
                                |
                                |  drivers
                                |  +------------+
                                +> | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | obj_attr   |
                                   +------------+

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:09 -04:00
Roland Dreier
79364227e6 IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
For RoCE, ib_init_ah_from_wc() can follow the path

    ib_init_ah_from_wc() ->
      rdma_addr_find_l2_eth_by_grh() ->
        rdma_resolve_ip()

and rdma_resolve_ip() will sleep in kzalloc() and wait_for_completion().

However, developers will not see any warnings if they use ib_init_ah_from_wc()
in an atomic context and test only on IB, because the function doesn't
sleep in that case.

Add a might_sleep() so that lockdep will catch bugs no matter what hardware is
used to test.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:07 -04:00
Roland Dreier
c761611811 IB/cm: Fix sleeping in atomic when RoCE is used
A couple of places in the CM do

    spin_lock_irq(&cm_id_priv->lock);
    ...
    if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))

However when the underlying transport is RoCE, this leads to a sleeping function
being called with the lock held - the callchain is

    cm_alloc_response_msg() ->
      ib_create_ah_from_wc() ->
        ib_init_ah_from_wc() ->
          rdma_addr_find_l2_eth_by_grh() ->
            rdma_resolve_ip()

and rdma_resolve_ip() starts out by doing

    req = kzalloc(sizeof *req, GFP_KERNEL);

not to mention rdma_addr_find_l2_eth_by_grh() doing

    wait_for_completion(&ctx.comp);

to wait for the task that rdma_resolve_ip() queues up.

Fix this by moving the AH creation out of the lock.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-31 08:35:07 -04:00
Matan Barak
f43dbebfa3 IB/core: Add support to finalize objects in one transaction
The new ioctl based infrastructure either commits or rollbacks
all objects of the method as one transaction. In order to do
that, we introduce a notion of dealing with a collection of
objects that are related to a specific method.

This also requires adding a notion of a method and attribute.
A method contains a hash of attributes, where each bucket
contains several attributes. The attributes are hashed according
to their namespace which resides in the four upper bits of the id.

For example, an object could be a CQ, which has an action of CREATE_CQ.
This action has multiple attributes. For example, the CQ's new handle
and the comp_channel. Each layer in this hierarchy - objects, methods
and attributes is split into namespaces. The basic example for that is
one namespace representing the default entities and another one
representing the driver specific entities.

When declaring these methods and attributes, we actually declare
their specifications. When a method is executed, we actually
allocates some space to hold auxiliary information. This auxiliary
information contains meta-data about the required objects, such
as pointers to their type information, pointers to the uobjects
themselves (if exist), etc.
The specification, along with the auxiliary information we allocated
and filled is given to the finalize_objects function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-30 10:30:38 -04:00
Matan Barak
a0aa309c39 IB/core: Add a generic way to execute an operation on a uobject
The ioctl infrastructure treats all user-objects in the same manner.
It gets objects ids from the user-space and by using the object type
and type attributes mentioned in the object specification, it executes
this required method. Passing an object id from the user-space as
an attribute is carried out in three stages. The first is carried out
before the actual handler and the last is carried out afterwards.

The different supported operations are read, write, destroy and create.
In the first stage, the former three actions just fetches the object
from the repository (by using its id) and locks it. The last action
allocates a new uobject. Afterwards, the second stage is carried out
when the handler itself carries out the required modification of the
object. The last stage is carried out after the handler finishes and
commits the result. The former two operations just unlock the object.
Destroy calls the "free object" operation, taking into account the
object's type and releases the uobject as well. Creation just adds the
new uobject to the repository, making the object visible to the
application.

In order to abstract these details from the ioctl infrastructure
layer, we add uverbs_get_uobject_from_context and
uverbs_finalize_object functions which corresponds to the first
and last stages respectively.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-30 10:30:38 -04:00
Artemy Kovalyov
8d50505ada IB/uverbs: Expose XRQ capabilities
Make XRQ capabilities available via ibv_query_device() verb.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:18 -04:00
Artemy Kovalyov
38eb44fac7 IB/uverbs: Add new SRQ type IB_SRQT_TM
Add new SRQ type capable of new tag matching feature.

When SRQ receives a message it will search through the matching list
for the corresponding posted receive buffer. The process of searching
the matching list is called tag matching.

In case the tag matching results in a match, the received message will
be placed in the address specified by the receive buffer. In case no
match was found the message will be placed in a generic buffer until the
corresponding receive buffer will be posted. These messages are called
unexpected and their set is called an unexpected list.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:18 -04:00
Artemy Kovalyov
1a56ff6daa IB/core: Separate CQ handle in SRQ context
Before this change CQ attached to SRQ was part of XRC specific extension.
Moving CQ handle out makes it available to other types extending SRQ
functionality.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-29 08:30:16 -04:00
Doug Ledford
a1139697ad Merge branch 'mellanox' into k.o/for-next
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 20:25:15 -04:00
Selvin Xavier
61e0962d52 IB: Avoid ib_modify_port() failure for RoCE devices
IB CM calls ib_modify_port() irrespective of link layer. If the
failure is returned, the mad agent gets unregistered for those
devices. Recently, modify_port() hook was removed from some of the
low level drivers as it was always returning success. This breaks
rdma connection establishment over those devices.
For ethernet devices, Qkey violation and port capabilities are not
applicable. So returning success for RoCE when modify_port hook is
is not implemented.

Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 17:34:57 -04:00
Leon Romanovsky
82901e3eb8 RDMA/core: Refactor get link layer wrapper
The return values from rdma_node_get_transport() are strict
and IB_LINK_LAYER_UNSPECIFIED is unreachable in this flow.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Leon Romanovsky
cdc596d89e RDMA/core: Delete BUG() from unreachable flow
Remove call to BUG() in case wrong node_type was provided.
This flow is unreachable, because node_types are supplied
from specific enum.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Leon Romanovsky
dcc9881e67 RDMA/(core, ulp): Convert register/unregister event handler to be void
The functions ib_register_event_handler() and
ib_unregister_event_handler() always returned success and they can't fail.

Let's convert those functions to be void, remove redundant checks and
cleanup tons of goto statements.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Parav Pandit
89caa0538e IB/uverbs: Introduce and use helper functions to copy ah attributes
This patch introduces two helper functions to copy ah attributes
from uverbs to internal ib_ah_attr structure and the other way
during modify qp and query qp respectively.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:27:10 -04:00
Leon Romanovsky
5ab2d89b85 IB/cma: Fix erroneous validation of supported default GID type
When rdma_cm is initializing a cma_device it checks if this device
supports the preferred default GID type. This check was done in a wrong way
and therefore sometimes rdma_cm is coming up with default GID type that is
not supported by the device.

Fix that by checking for supported GID type properly.

Fixes: 3c7f67d188 ("IB/cma: Fix default RoCE type setting")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 16:15:32 -04:00
Doug Ledford
732912c738 Merge branch 'k.o/for-4.13-rc' into k.o/for-next
Pick up -rc fixes.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:58:26 -04:00
Noa Osherovich
498ca3c82a IB/core: Avoid accessing non-allocated memory when inferring port type
Commit 44c58487d5 ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
 * During ib_register_device, each port is checked for its type which
   is stored in ib_device's port_immutable array.
 * During uverbs' modify_qp, the type is inferred using the port number
   in ib_uverbs_qp_dest struct (address vector) by accessing the
   relevant port_immutable array and the type is passed on to
   providers.

IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid but port
number is not mentioned as a field in the address vector, so its
value is not validated, which leads to accesses to a non-allocated
memory when inferring the port type.

Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.

Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.

Fixes: 44c58487d5 ('IB/core: Define 'ib' and 'roce' rdma_ah_attr types')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24 15:33:33 -04:00
Jason Gunthorpe
e3bf14bdc1 rdma: Autoload netlink client modules
If a message comes in and we do not have the client in the table, then
try to load the module supplying that client using MODULE_ALIAS to find
it.

This duplicates the scheme seen in other netlink muxes (eg nfnetlink).

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 17:04:22 -04:00
Jason Gunthorpe
1eb5be0ec7 rdma: Allow demand loading of NETLINK_RDMA
Provide a module alias so that if userspace opens a netlink
socket for RDMA the kernel support is loaded automatically.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 17:04:22 -04:00
Don Hiatt
d98bb7f7e6 IB/hfi1: Determine 9B/16B L2 header type based on Address handle
When address handle attributes are initialized, the LIDs are
transformed to be in the 32 bit LID space.
When constructing the header, hfi1 driver will look at the LID
to determine the packet header to be created.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:22:37 -04:00
Amrani, Ram
e093111ddb IB/core: Fix input len in multiple user verbs
Most user verbs pass user data to the kernel with the inclusion of the
ib_uverbs_cmd_hdr structure. This is problematic because the vendor has
no ideas if the verb was called by a legacy verb or an extended verb.
Also, the incosistency between the verbs is confusing.

Fixes: 565197dd8f ("IB/core: Extend ib_uverbs_create_cq")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22 14:02:29 -04:00
Bharat Potnuri
65159c051c RDMA/uverbs: Initialize cq_context appropriately
Initializing cq_context with ev_queue in create_cq(), leads to NULL pointer
dereference in ib_uverbs_comp_handler(), if application doesnot use completion
channel. This patch fixes the cq_context initialization.

Fixes: 1e7710f3f6 ("IB/core: Change completion channel to use the reworked")
Cc: stable@vger.kernel.org # 4.12
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 699a2d5b1b)
2017-08-22 13:55:47 -04:00
Hiatt, Don
62ede77799 Add OPA extended LID support
This patch series primarily increases sizes of variables that hold
lid values from 16 to 32 bits. Additionally, it adds a check in
the IB mad stack to verify a properly formatted MAD when OPA
extended LIDs are used.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:47:37 -04:00
Doug Ledford
b0e32e20e3 Merge branch 'k.o/for-4.13-rc' into k.o/for-next
Merging our (hopefully) final -rc pull branch into our for-next branch
because some of our pending patches won't apply cleanly without having
the -rc patches in our tree.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:12:04 -04:00
Doug Ledford
d3cf4d9915 Merge branch 'misc' into k.o/for-next
Conflicts:
	drivers/infiniband/core/iwcm.c - The rdma_netlink patches in
	HEAD and the iwarp cm workqueue fix (don't use WQ_MEM_RECLAIM,
	we aren't safe for that context) touched the same code.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:10:23 -04:00
Bharat Potnuri
699a2d5b1b RDMA/uverbs: Initialize cq_context appropriately
Initializing cq_context with ev_queue in create_cq(), leads to NULL pointer
dereference in ib_uverbs_comp_handler(), if application doesnot use completion
channel. This patch fixes the cq_context initialization.

Fixes: 1e7710f3f6 ("IB/core: Change completion channel to use the reworked")
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 14:06:02 -04:00
Sagi Grimberg
75215e5bb2 iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM
Its very likely that iwcm work execution will yield memory
allocations (for example cm connection request).

Reported-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 10:46:20 -04:00
Sagi Grimberg
cb93e59777 cm: Don't allocate ib_cm workqueue with WQ_MEM_RECLAIM
create_workqueue always creates the workqueue with WQ_MEM_RECLAIM
and silences a flush dependency warn for WQ_LEGACY. Instead, we
want to keep the warn in case the allocator tries to flush the
cm workqueue because its very likely that cm work execution will
yield memory allocations (for example cm connection requests).

Reported-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 10:46:20 -04:00
Sagi Grimberg
b059e2108d RDMA/core: make ib_device.add method optional
ib_clients can indeed fill .add to NULL, but then they will not see
any device removal notifications. The reason is that that
ib_register_client and ib_register_device checked existence of .add
before adding the creating a corresponding client_data and adding
it to the list. Simple condition reverse fixes the issue.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-18 10:46:20 -04:00
Maor Gottlieb
870201f95f IB/uverbs: Fix NULL pointer dereference during device removal
As part of ib_uverbs_remove_one which might be triggered upon
reset flow, we trigger IB_EVENT_DEVICE_FATAL event to userspace
application.
If device was removed after uverbs fd was opened but before
ib_uverbs_get_context was called, the event file will be accessed
before it was allocated, result in NULL pointer dereference:

[ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null)
...
[ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 72.327123] Call Trace:
[ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs]
[ 72.327216] ? synchronize_srcu_expedited+0x27/0x30
[ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs]
[ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core]
[ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib]
[ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core]
[ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core]
[ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib]
[ 72.327546] SyS_delete_module+0x155/0x230
[ 72.328472] ? exit_to_usermode_loop+0x70/0xa6
[ 72.329370] do_syscall_64+0x54/0xc0
[ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25

Fix it by checking that user context was allocated before
trigger the event.

Fixes: 036b106357 ('IB/uverbs: Enable device removal when there are active user space applications')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 12:53:15 -04:00
Shiraz Saleem
06f8174a97 IB/core: Protect sysfs entry on ib_unregister_device
ib_unregister_device is not protecting removal of sysfs entries.
A call to ib_register_device in that window can result in
duplicate sysfs entry warning. Move mutex_unlock to after
ib_device_unregister_sysfs to protect against sysfs entry creation.

This issue is exposed during driver load/unload stress test.

WARNING: CPU: 5 PID: 4445 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5f/0x70
sysfs: cannot create duplicate filename '/class/infiniband/i40iw0'
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H
BIOS F7 01/17/2014
Workqueue: i40e i40e_service_task [i40e]
Call Trace:
dump_stack+0x67/0x98
__warn+0xcc/0xf0
warn_slowpath_fmt+0x4a/0x50
? kernfs_path_from_node+0x4b/0x60
sysfs_warn_dup+0x5f/0x70
sysfs_do_create_link_sd.isra.2+0xb7/0xc0
sysfs_create_link+0x20/0x40
device_add+0x28c/0x600
ib_device_register_sysfs+0x58/0x170 [ib_core]
ib_register_device+0x325/0x570 [ib_core]
? i40iw_register_rdma_device+0x1f4/0x400 [i40iw]
? kmem_cache_alloc_trace+0x143/0x330
? __raw_spin_lock_init+0x2d/0x50
i40iw_register_rdma_device+0x2dc/0x400 [i40iw]
i40iw_open+0x10a6/0x1950 [i40iw]
? i40iw_open+0xeab/0x1950 [i40iw]
? i40iw_make_cm_node+0x9c0/0x9c0 [i40iw]
i40e_client_subtask+0xa4/0x110 [i40e]
i40e_service_task+0xc2d/0x1320 [i40e]
process_one_work+0x203/0x710
? process_one_work+0x16f/0x710
worker_thread+0x126/0x4a0
? trace_hardirqs_on+0xd/0x10
kthread+0x112/0x150
? process_one_work+0x710/0x710
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x2e/0x40
---[ end trace fd11b69e21ea7653 ]---
Couldn't register device i40iw0 with driver model

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 11:47:55 -04:00
Doug Ledford
d0d62c34fb Merge branch 'rdma-netlink' into k.o/merge-test
Conflicts:
	include/rdma/ib_verbs.h - Modified a function signature adjacent
	to a newly added function signature from a previous merge

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:34:18 -04:00
Doug Ledford
320438301b Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-test
Conflicts:
	drivers/infiniband/hw/mlx5/main.c - Both add new code
	include/rdma/ib_verbs.h - Both add new code

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10 14:31:29 -04:00
Leon Romanovsky
1bb77b8c1d RDMA/netlink: Export node_type
Add ability to get node_type for RDAM netlink users.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:14 +03:00
Leon Romanovsky
5654e49db0 RDMA/netlink: Provide port state and physical link state
Add port state and physical link state to the users of RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:13 +03:00
Leon Romanovsky
34840fea11 RDMA/netlink: Export LID mask control (LMC)
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:13 +03:00
Leon Romanovsky
80a06dd36f RDMA/netink: Export lids and sm_lids
According to the IB specification, the LID and SM_LID
are 16-bit wide, but to support OmniPath users, export
it as 32-bit value from the beginning.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:12 +03:00
Leon Romanovsky
12026fbba6 RDMA/netlink: Advertise IB subnet prefix
Add IB subnet prefix to the port properties exported
by RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:12 +03:00
Leon Romanovsky
1aaff896ca RDMA/netlink: Export node_guid and sys_image_guid
Add Node GUID and system image GUID to the device properties
exported by RDMA netlink, to be used by RDMAtool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:11 +03:00
Leon Romanovsky
8621a7e3c1 RDMA/netlink: Export FW version
Add FW version to the device properties exported
by RDMA netlink, to be used by RDMAtool.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:11 +03:00
Leon Romanovsky
9abb0d1bbd RDMA: Simplify get firmware interface
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.

The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.

The introduction of this define allows us to remove the string size
from get_fw_str function signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:28:10 +03:00
Leon Romanovsky
ac50525374 RDMA/netlink: Expose device and port capability masks
The port capability mask is exposed to user space via sysfs interface,
while device capabilities are available for verbs only.

This patch provides those capabilities through netlink interface.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:10 +03:00
Leon Romanovsky
c3f66f7b00 RDMA/netlink: Implement nldev port doit callback
Provide ability to get specific to device and port information.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:09 +03:00
Leon Romanovsky
7d02f605f0 RDMA/netlink: Add nldev port dumpit implementation
This patch implements the query interface to get all
ports data for the specific device.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:09 +03:00
Leon Romanovsky
e5c9469efc RDMA/netlink: Add nldev device doit implementation
Provide ability to query specific device.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:08 +03:00
Leon Romanovsky
b4c598a67e RDMA/netlink: Implement nldev device dumpit calback
This patch adds the ability to return all available devices
together with their properties.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:08 +03:00
Leon Romanovsky
6c80b41abe RDMA/netlink: Add nldev initialization flows
Add nldev init and exit flows to the RDMA/core.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:07 +03:00
Leon Romanovsky
1a6e7c31d7 RDMA/netlink: Add netlink device definitions to UAPI
Introduce new defines to rdma_netlink.h, so the RDMA configuration tool
will be able to communicate with RDMA subsystem by using the shared defines.

The addition of new client (NLDEV) revealed the fact that we exposed by
mistake the RDMA_NL_I40IW define which is not backed by any RDMA netlink
by now and it won't be exposed in the future too. So this patch reuses
the value and deletes the old defines.

The NLDEV operates with objects. The struct ib_device has two straightforward
objects: device itself and ports of that device.

This brings us to propose the following commands to work on those objects:
 * RDMA_NLDEV_CMD_{GET,SET,NEW,DEL} - works on ib_device itself
 * RDMA_NLDEV_CMD_PORT_{GET,SET,NEW,DEL} - works on ports of specific ib_device

Those commands receive/return the device index (RDMA_NLDEV_ATTR_DEV_INDEX)
and port index (RDMA_NLDEV_ATTR_PORT_INDEX). For device object accesses,
the RDMA_NLDEV_ATTR_PORT_INDEX will return the maximum number of ports
for specific ib_device and for port access the actual port index.

The port index starts from 1 to follow RDMA/core internal semantics and
the sysfs exposed knobs.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:07 +03:00
Leon Romanovsky
8bc67414f2 RDMA/netlink: Update copyright
Add Mellanox to the copyright header.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:57 +03:00
Leon Romanovsky
647c75ac59 RDMA/netlink: Convert LS to doit callback
RDMA_NL_LS protocol is actually does not dump anything,
but sets data and it should be handled by doit callback.

This patch actually converts RDMA_NL_LS to doit callback, while
preserving IWCM and RDMA_CM flows through netlink_dump_start().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:56 +03:00
Leon Romanovsky
c729943a77 RDMA/netlink: Reduce indirection access to cb_table
Introduce intermediate variable to store access to fields
of cb_table.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:56 +03:00
Leon Romanovsky
1830ba21b9 RDMA/netlink: Add and implement doit netlink callback
The .doit callback is used by netlink core to differentiate
between get and set operations. Common convention is to use
that call for command operations like (SET, ADD, e.t.c.) and/or
access without NLF_M_DUMP flag.

This commit adds proper declaration and implementation
to RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:55 +03:00
Leon Romanovsky
ecc82c53f9 RDMA/core: Add and expose static device index
This patch adds static device index in similar fashion to
already available in netdev world (struct net->ifindex).

In downstream patches, the RDMA nelink will use this idx-to-ib_device
conversion, so as part of this commit, we are exposing the translation
function to be visible for IB/core users.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:21:54 +03:00
Leon Romanovsky
8030c8357a RDMA/core: Add iterator over ib_devices
The coming nldev needs to iterate over all IB devices in the system
and in order to not expose the ib_devices list outside the devices.c,
it is necessary to provide function iterator.

Current version is written explicitly for nldev callback to avoid
over-engineering at this stage, but it can be easily extended for
other types.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:54 +03:00
Leon Romanovsky
3250b4dbd8 RDMA/netlink: Rename netlink callback struct
The RDMA netlink client infrastructure was removed and made obsolete.
The old infrastructure defined struct ibnl_client_cbs. Now that all
uses of this have been updated to the new infrastructure, rename the
struct to be compliant with the current stack naming standards:
struct rdma_nl_cbs.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:20:15 +03:00
Leon Romanovsky
ff61c425c1 RDMA/netlink: Simplify and rename ibnl_chk_listeners
Make ibnl_chk_listeners function to be one line by removing
unneeded comparison.

Rename that function to be complaint to other functions in RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:03 +03:00
Leon Romanovsky
4d7f693af0 RDMA/netlink: Rename and remove redundant parameter from ibnl_multicast
The pointer to netlink header was not used in the ibnl_multicast
function, so let's remove it and simplify the function
signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:03 +03:00
Leon Romanovsky
f00e646370 RDMA/netlink: Rename and remove redundant parameter from ibnl_unicast*
Netlink message header is not needed for unicast reply, hence remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:02 +03:00
Leon Romanovsky
1a1c116f3d RDMA/netlink: Simplify the put_msg and put_attr
Reuse standard macros to cancel the netlink message
in case of error.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:01 +03:00
Leon Romanovsky
e3a2b93ddd RDMA/netlink: Add flag to consolidate common handling
Add ability to provide flags to control RDMA netlink callbacks
and convert addr.c and sa_query.c to be first users of such
infrastructure. It allows to move their CAP_NET_ADMIN checks
into netlink core.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:18:45 +03:00
Leon Romanovsky
5d7ee40907 RDMA/iwcm: Remove extra EXPORT_SYMBOLS
The iwcm exports functions which are not used outside of ib_core.
This patch simply removes these EXPORT_SYMBOLS.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10 13:17:43 +03:00
Leon Romanovsky
93fa50760b RDMA/iwcm: Remove useless check of netlink client validity
RDMA netlink implementation guarantees that supplied
client number is in allowed range.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10 13:17:26 +03:00
Leon Romanovsky
3c3e75d5ff RDMA/netlink: Avoid double pass for RDMA netlink messages
The standard netlink_rcv_skb function skips messages without
NLM_F_REQUEST flag in it, while SA netlink client issues them.

In commit bc10ed7d3d ("IB/core: Add rdma netlink helper functions")
the local function was introduced to allow such messages.

This led to double pass for every incoming message.

In this patch, we unify that local implementation and netlink_rcv_skb
functions, so there will be no need for double pass anymore.

As a outcome, this combined function gained more strict check
for NLM_F_REQUEST flag and it is now allowed for SA pathquery
client only.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:15:42 +03:00
Leon Romanovsky
64401b69b2 RDMA/netlink: Remove redundant owner option for netlink callbacks
Owner field is not needed to be set because netlink is part of ib_core
which will be unloaded last after all other modules are unloaded.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:15:41 +03:00
Leon Romanovsky
c9901724a2 RDMA/netlink: Remove netlink clients infrastructure
RDMA netlink has a complicated infrastructure for dynamically
registering and de-registering netlink clients to the NETLINK_RDMA
group. The complicated portion of this code is not widely used because
2 of the 3 current clients are statically compiled together with
netlink.c. The infrastructure, therefore, is deemed overkill.

Refactor the code to eliminate the dynamically added clients. Now all
clients are pre-registered in a client array at compile time, and at run
time they merely check-in with the infrastructure to pass their callback
table for inclusion in the pre-sized client array.

This also allows for future cleanups and removal of unneeded code in the
iwcm* netlink handler.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10 13:13:06 +03:00
Ismail, Mustafa
9047811b77 RDMA/core: Add wait/retry version of ibnl_unicast
Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait,
and modify ibnl_unicast to not wait/retry.  This eliminates
the undesirable wait for future users of ibnl_unicast.

Change Portmapper calls originating from kernel to user-space
to use ibnl_unicast_wait and take advantage of the wait/retry
logic in netlink_unicast.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-08-09 16:08:27 +03:00
Dasaratharaman Chandramouli
ac3a949fb2 IB/CM: Set appropriate slid and dlid when handling CM request
If extended LIDs are being used, a connection request contains
OPA GIDs in them. Extract the lids from the OPA gids and populate
slid/dlid fields in the path records that are created when handling
a connection request.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:40 -04:00
Dasaratharaman Chandramouli
6b3c0e6e6d IB/CM: Create appropriate path records when handling CM request
When handling an incoming conection request, ib_cm creates
either an IB or an OPA path record based on the gid field
in the request.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00