Commit Graph

692161 Commits

Author SHA1 Message Date
Leon Romanovsky
ac50525374 RDMA/netlink: Expose device and port capability masks
The port capability mask is exposed to user space via sysfs interface,
while device capabilities are available for verbs only.

This patch provides those capabilities through netlink interface.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:10 +03:00
Leon Romanovsky
c3f66f7b00 RDMA/netlink: Implement nldev port doit callback
Provide ability to get specific to device and port information.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:09 +03:00
Leon Romanovsky
7d02f605f0 RDMA/netlink: Add nldev port dumpit implementation
This patch implements the query interface to get all
ports data for the specific device.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:09 +03:00
Leon Romanovsky
e5c9469efc RDMA/netlink: Add nldev device doit implementation
Provide ability to query specific device.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:08 +03:00
Leon Romanovsky
b4c598a67e RDMA/netlink: Implement nldev device dumpit calback
This patch adds the ability to return all available devices
together with their properties.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:08 +03:00
Leon Romanovsky
6c80b41abe RDMA/netlink: Add nldev initialization flows
Add nldev init and exit flows to the RDMA/core.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:07 +03:00
Leon Romanovsky
1a6e7c31d7 RDMA/netlink: Add netlink device definitions to UAPI
Introduce new defines to rdma_netlink.h, so the RDMA configuration tool
will be able to communicate with RDMA subsystem by using the shared defines.

The addition of new client (NLDEV) revealed the fact that we exposed by
mistake the RDMA_NL_I40IW define which is not backed by any RDMA netlink
by now and it won't be exposed in the future too. So this patch reuses
the value and deletes the old defines.

The NLDEV operates with objects. The struct ib_device has two straightforward
objects: device itself and ports of that device.

This brings us to propose the following commands to work on those objects:
 * RDMA_NLDEV_CMD_{GET,SET,NEW,DEL} - works on ib_device itself
 * RDMA_NLDEV_CMD_PORT_{GET,SET,NEW,DEL} - works on ports of specific ib_device

Those commands receive/return the device index (RDMA_NLDEV_ATTR_DEV_INDEX)
and port index (RDMA_NLDEV_ATTR_PORT_INDEX). For device object accesses,
the RDMA_NLDEV_ATTR_PORT_INDEX will return the maximum number of ports
for specific ib_device and for port access the actual port index.

The port index starts from 1 to follow RDMA/core internal semantics and
the sysfs exposed knobs.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:28:07 +03:00
Leon Romanovsky
8bc67414f2 RDMA/netlink: Update copyright
Add Mellanox to the copyright header.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:57 +03:00
Leon Romanovsky
647c75ac59 RDMA/netlink: Convert LS to doit callback
RDMA_NL_LS protocol is actually does not dump anything,
but sets data and it should be handled by doit callback.

This patch actually converts RDMA_NL_LS to doit callback, while
preserving IWCM and RDMA_CM flows through netlink_dump_start().

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:56 +03:00
Leon Romanovsky
c729943a77 RDMA/netlink: Reduce indirection access to cb_table
Introduce intermediate variable to store access to fields
of cb_table.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:56 +03:00
Leon Romanovsky
1830ba21b9 RDMA/netlink: Add and implement doit netlink callback
The .doit callback is used by netlink core to differentiate
between get and set operations. Common convention is to use
that call for command operations like (SET, ADD, e.t.c.) and/or
access without NLF_M_DUMP flag.

This commit adds proper declaration and implementation
to RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:55 +03:00
Leon Romanovsky
ecc82c53f9 RDMA/core: Add and expose static device index
This patch adds static device index in similar fashion to
already available in netdev world (struct net->ifindex).

In downstream patches, the RDMA nelink will use this idx-to-ib_device
conversion, so as part of this commit, we are exposing the translation
function to be visible for IB/core users.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:21:54 +03:00
Leon Romanovsky
8030c8357a RDMA/core: Add iterator over ib_devices
The coming nldev needs to iterate over all IB devices in the system
and in order to not expose the ib_devices list outside the devices.c,
it is necessary to provide function iterator.

Current version is written explicitly for nldev callback to avoid
over-engineering at this stage, but it can be easily extended for
other types.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:21:54 +03:00
Leon Romanovsky
3250b4dbd8 RDMA/netlink: Rename netlink callback struct
The RDMA netlink client infrastructure was removed and made obsolete.
The old infrastructure defined struct ibnl_client_cbs. Now that all
uses of this have been updated to the new infrastructure, rename the
struct to be compliant with the current stack naming standards:
struct rdma_nl_cbs.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:20:15 +03:00
Leon Romanovsky
ff61c425c1 RDMA/netlink: Simplify and rename ibnl_chk_listeners
Make ibnl_chk_listeners function to be one line by removing
unneeded comparison.

Rename that function to be complaint to other functions in RDMA netlink.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:03 +03:00
Leon Romanovsky
4d7f693af0 RDMA/netlink: Rename and remove redundant parameter from ibnl_multicast
The pointer to netlink header was not used in the ibnl_multicast
function, so let's remove it and simplify the function
signature.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:03 +03:00
Leon Romanovsky
f00e646370 RDMA/netlink: Rename and remove redundant parameter from ibnl_unicast*
Netlink message header is not needed for unicast reply, hence remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:02 +03:00
Leon Romanovsky
1a1c116f3d RDMA/netlink: Simplify the put_msg and put_attr
Reuse standard macros to cancel the netlink message
in case of error.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:19:01 +03:00
Leon Romanovsky
e3a2b93ddd RDMA/netlink: Add flag to consolidate common handling
Add ability to provide flags to control RDMA netlink callbacks
and convert addr.c and sa_query.c to be first users of such
infrastructure. It allows to move their CAP_NET_ADMIN checks
into netlink core.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
2017-08-10 13:18:45 +03:00
Leon Romanovsky
5d7ee40907 RDMA/iwcm: Remove extra EXPORT_SYMBOLS
The iwcm exports functions which are not used outside of ib_core.
This patch simply removes these EXPORT_SYMBOLS.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10 13:17:43 +03:00
Leon Romanovsky
93fa50760b RDMA/iwcm: Remove useless check of netlink client validity
RDMA netlink implementation guarantees that supplied
client number is in allowed range.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10 13:17:26 +03:00
Leon Romanovsky
3c3e75d5ff RDMA/netlink: Avoid double pass for RDMA netlink messages
The standard netlink_rcv_skb function skips messages without
NLM_F_REQUEST flag in it, while SA netlink client issues them.

In commit bc10ed7d3d ("IB/core: Add rdma netlink helper functions")
the local function was introduced to allow such messages.

This led to double pass for every incoming message.

In this patch, we unify that local implementation and netlink_rcv_skb
functions, so there will be no need for double pass anymore.

As a outcome, this combined function gained more strict check
for NLM_F_REQUEST flag and it is now allowed for SA pathquery
client only.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:15:42 +03:00
Leon Romanovsky
64401b69b2 RDMA/netlink: Remove redundant owner option for netlink callbacks
Owner field is not needed to be set because netlink is part of ib_core
which will be unloaded last after all other modules are unloaded.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-08-10 13:15:41 +03:00
Leon Romanovsky
c9901724a2 RDMA/netlink: Remove netlink clients infrastructure
RDMA netlink has a complicated infrastructure for dynamically
registering and de-registering netlink clients to the NETLINK_RDMA
group. The complicated portion of this code is not widely used because
2 of the 3 current clients are statically compiled together with
netlink.c. The infrastructure, therefore, is deemed overkill.

Refactor the code to eliminate the dynamically added clients. Now all
clients are pre-registered in a client array at compile time, and at run
time they merely check-in with the infrastructure to pass their callback
table for inclusion in the pre-sized client array.

This also allows for future cleanups and removal of unneeded code in the
iwcm* netlink handler.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Chien Tin Tung <chien.tin.tung@intel.com>
2017-08-10 13:13:06 +03:00
Ismail, Mustafa
9047811b77 RDMA/core: Add wait/retry version of ibnl_unicast
Add a wait/retry version of ibnl_unicast, ibnl_unicast_wait,
and modify ibnl_unicast to not wait/retry.  This eliminates
the undesirable wait for future users of ibnl_unicast.

Change Portmapper calls originating from kernel to user-space
to use ibnl_unicast_wait and take advantage of the wait/retry
logic in netlink_unicast.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-08-09 16:08:27 +03:00
Sagi Grimberg
0b36658ca8 nvme-rdma: use intelligent affinity based queue mappings
Use the generic block layer affinity mapping helper. Also,
limit nr_hw_queues to the rdma device number of irq vectors
as we don't really need more.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:58:03 -04:00
Sagi Grimberg
24c5dc6610 block: Add rdma affinity based queue mapping helper
Like pci and virtio, we add a rdma helper for affinity
spreading. This achieves optimal mq affinity assignments
according to the underlying rdma device affinity maps.

Reviewed-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:58:03 -04:00
Sagi Grimberg
40b24403f3 mlx5: support ->get_vector_affinity
Simply refer to the generic affinity mask helper.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:58:03 -04:00
Sagi Grimberg
c66cd353bb RDMA/core: expose affinity mappings per completion vector
This will allow ULPs to intelligently locate threads based
on completion vector cpu affinity mappings. In case the
driver does not expose a get_vector_affinity callout, return
NULL so the caller can maintain a fallback logic.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:55:56 -04:00
Sagi Grimberg
a435393aca mlx5: move affinity hints assignments to generic code
generic api takes care of spreading affinity similar to
what mlx5 open coded (and even handles better asymmetric
configurations). Ask the generic API to spread affinity
for us, and feed him pre_vectors that do not participate
in affinity settings (which is an improvement to what we
had before).

The affinity assignments should match what mlx5 tried to
do earlier but now we do not set affinity to async, cmd
and pages dedicated vectors.

Also, remove mlx5e_get_cpu and introduce mlx5e_get_node
(used for allocation purposes) and mlx5_get_vector_affinity
(for indirection table construction) as they provide the needed
information. Luckily, we have generic helpers to get cpumask
and node given a irq vector. mlx5_get_vector_affinity will
be used by mlx5_ib in a subsequent patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:55:56 -04:00
Sagi Grimberg
a85e5474f4 mlx5e: don't assume anything on the irq affinity mappings of the device
mlx5e currently assumes that irq affinity is really spread first
irq vectors across device home node cpus, with the new generic affinity
mappings this is no longer the case, hence mlxe should not rely on
this anymore.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:53:05 -04:00
Sagi Grimberg
78249c4215 mlx5: convert to generic pci_alloc_irq_vectors
Now that we have a generic code to allocate an array
of irq vectors and even correctly spread their affinity,
correctly handle cpu hotplug events and more, were much
better off using it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:53:05 -04:00
Dasaratharaman Chandramouli
ac3a949fb2 IB/CM: Set appropriate slid and dlid when handling CM request
If extended LIDs are being used, a connection request contains
OPA GIDs in them. Extract the lids from the OPA gids and populate
slid/dlid fields in the path records that are created when handling
a connection request.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:40 -04:00
Dasaratharaman Chandramouli
6b3c0e6e6d IB/CM: Create appropriate path records when handling CM request
When handling an incoming conection request, ib_cm creates
either an IB or an OPA path record based on the gid field
in the request.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Hiatt, Don
e92aa00a51 IB/CM: Add OPA Path record support to CM
Add OPA path record support to the Connection Manager.

Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Hiatt, Don
7db20ecd1d IB/core: Change wc.slid from 16 to 32 bits
slid field in struct ib_wc is increased to 32 bits.
This enables core components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Dasaratharaman Chandramouli
db58540b02 IB/core: Change port_attr.sm_lid from 16 to 32 bits
sm_lid field in struct ib_port_attr is increased to 32 bits. This
enables core components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Dasaratharaman Chandramouli
582faf3150 IB/core: Change port_attr.lid size from 16 to 32 bits
lid field in struct ib_port_attr is increased to 32 bits. This enables core
components to use larger LIDs if needed.
The user ABI is unchanged and return 16 bit values when queried.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:50:25 -04:00
Dasaratharaman Chandramouli
1cb2fc0db7 IB/mad: Change slid in RMPP recv from 16 to 32 bits
MAD RMPP contains slid field which is 16 bits in
length, increase it to 32 bits.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:47:18 -04:00
Dasaratharaman Chandramouli
7e93e2cb83 IB/IPoIB: Increase local_lid to 32 bits
IPoIB contains local_lid field which is 16 bits in
length, increase it to 32 bits.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:47:18 -04:00
Dasaratharaman Chandramouli
4c4736905e IB/srpt: Increase lid and sm_lid to 32 bits
srpt contains lid and sm_lid fields which are 16 bits in
length, increase them to 32 bits.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:47:18 -04:00
Dasaratharaman Chandramouli
d541e45500 IB/core: Convert ah_attr from OPA to IB when copying to user
OPA address handle atttibutes that have 32 bit LIDs would have to
be converted to IB address handle attribute with the LID field
programmed in the GID before copying to user space.

Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-08 14:47:18 -04:00
Doug Ledford
48107c4e59 IPoIB fixes for 4.13
The patchset provides various fixes for IPoIB. It is combination of
 fixes to various issues discovered during verification along with
 static checkers cleanup patches.
 
 Most of the patches are from pre-git era and hence lack of Fixes lines.
 
 There is one exception in this IPoIB group - addition of patch revert:
 Revert "IB/core: Allow QP state transition from reset to error", but
 it followed by proper fix to the annoying print, so I thought it is
 appropriate to include it.
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEkhr/r4Op1/04yqaB5GN7iDZyWKcFAll41ucQHGxlb25Aa2Vy
 bmVsLm9yZwAKCRDkY3uINnJYpwsLD/9wb1vS+4UBh7L7+jEnko1nQhFU4RLRm8Z7
 mdW0y+5OU0wSNFGxkFyR1dd5xoUlDgoiTxqWy4Wwmwsi8s+yOKbhg110BAqq+G9k
 gPNPIx/FXXTHY0UmQo8L2DdwCncpBXmzrIiiiQ+qdmm2Rg4J1QttCpe+8XR249ad
 0jrGhc5MgzJv1dx0mRTm3n/U1Skmx4RNkaYUb8uat08rnX+FpVAqY0fVNJNNeYUY
 nUZ+J5fYRxKhamofrKntciMpATEtx162BNbS+3A0s+W7up8g1dapS4apD2QIRllR
 g2BOWPG1lrYcjn92jGGMbNJ0Wi1rNjeEhxX+2Gl6tYlyYCevOx8MEojpuFrnKjd8
 CtG6rc5aM1yBAS5Sm5Jsxe11g72Reqc7Ciuqr+6g20ypbwADunufPfl+4jAvsEik
 mL1GlpCB3DHI205dnxC26RB9vhgSrCYXutdLPYTcVTBiEM+AuN2mo36Zyk2CAXJI
 x8IqDjHfRyd7HlbXyeSSVN/yckOJHTfNI66JCSTSm+MHCK0gMuUbciasWlxjkPyT
 84adAg35iNwrP5HKbcXVu1I9ly4R7uksHnmj8ANLwmXkXN123vT+Vpg0UrQor27e
 i2m4sV6WqHfLHhXPuz7iYojpp28wvu3RlA2px7KYINWsmqT+bK2ZBove5CFzvS/n
 8jPSbikW7A==
 =UbC8
 -----END PGP SIGNATURE-----

Merge tag 'rdma-rc-2017-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into leon-ipoib

IPoIB fixes for 4.13

The patchset provides various fixes for IPoIB. It is combination of
fixes to various issues discovered during verification along with
static checkers cleanup patches.

Most of the patches are from pre-git era and hence lack of Fixes lines.

There is one exception in this IPoIB group - addition of patch revert:
Revert "IB/core: Allow QP state transition from reset to error", but
it followed by proper fix to the annoying print, so I thought it is
appropriate to include it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-07 13:30:40 -04:00
Dan Carpenter
5db465f235 IB/hns: checking for IS_ERR() instead of NULL
The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers.

Fixes: bfcc681bd0 ("IB/hns: Fix the bug when free mr")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 15:09:56 -04:00
Leon Romanovsky
931b3c1a83 RDMA/mlx5: Fix existence check for extended address vector
The extended address vector is the highest bit in be32 variable,
but it was compared with the lowest. This patch fixes the endianness
of that check and removes already declared define.

Fixes: 17d2f88f92 ("IB/mlx5: Add ODP atomics support")
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 14:24:05 -04:00
Yishai Hadas
efdd6f53b1 IB/uverbs: Fix device cleanup
Uverbs device should be cleaned up only when there is no
potential usage of.

As part of ib_uverbs_remove_one which might be triggered upon reset flow
the device reference count is decreased as expected and leave the final
cleanup to the FDs that were opened.

Current code increases reference count upon opening a new command FD and
decreases it upon closing the file. The event FD is opened internally
and rely on the command FD by taking on it a reference count.

In case that the command FD was closed and just later the event FD we
may ensure that the device resources as of srcu are still alive as they
are still in use.

Fixing the above by moving the reference count decreasing to the place
where the command FD is really freed instead of doing that when it was
just closed.

fixes: 036b106357 ("IB/uverbs: Enable device removal when there are active user space applications")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 14:24:05 -04:00
Leon Romanovsky
f7a6cb7b38 RDMA/uverbs: Prevent leak of reserved field
initialize to zero the response structure to prevent
the leakage of "resp.reserved" field.

drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn:
	check that 'resp.reserved' doesn't leak information

Fixes: 33b9b3ee97 ("IB: Add userspace support for resizing CQs")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 14:24:05 -04:00
Parav Pandit
5fff41e1f8 IB/core: Fix race condition in resolving IP to MAC
Currently while resolving IP address to MAC address single delayed work
is used for resolving multiple such resolve requests. This singled work
is essentially performs two tasks.
(a) any retry needed to resolve and
(b) it executes the callback function for all completed requests

While work is executing callbacks, any new work scheduled on for this
workqueue is lost because workqueue has completed looking at all pending
requests and now looking at callbacks, but work is still under
execution. Any further retry to look at pending requests in
process_req() after executing callbacks would lead to similar race
condition (may be reduce the probably further but doesn't eliminate it).
Retrying to enqueue work that from queue_req() context is not something
rest of the kernel modules have followed.

Therefore fix in this patch utilizes kernel facility to enqueue multiple
work items to a workqueue. This ensures that no such requests
gets lost in synchronization. Request list is still maintained so that
rdma_cancel_addr() can unlink the request and get the completion with
error sooner. Neighbour update event handling continues to be handled in
same way as before.
Additionally process_req() work entry cancels any pending work for a
request that gets completed while processing those requests.

Originally ib_addr was ST workqueue, but it became MT work queue with
patch of [1]. This patch again makes it similar to ST so that
neighbour update events handler work item doesn't race with
other work items.

In one such below trace, (though on 4.5 based kernel) it can be seen
that process_req() never executed the callback, which is likely for an
event that was schedule by queue_req() when previous callback was
getting executed by workqueue.

 [<ffffffff816b0dde>] schedule+0x3e/0x90
 [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210
 [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70
 [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr]
 [<ffffffff816b228f>] wait_for_completion+0x10f/0x170
 [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210
 [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr]
 [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr]
 [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0
 [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core]
 [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm]
 [<ffffffff81015982>] ? __switch_to+0x212/0x5e0
 [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm]
 [<ffffffff810a14c1>] process_one_work+0x151/0x4b0
 [<ffffffff810a1940>] worker_thread+0x120/0x480
 [<ffffffff816b074b>] ? __schedule+0x30b/0x890
 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
 [<ffffffff810a6b1e>] kthread+0xce/0xf0
 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff816b53a2>] ret_from_fork+0x42/0x70
 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
INFO: task kworker/u144:1:156520 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kworker/u144:1  D ffff883ffe1d7600     0 156520      2 0x00000080
Workqueue: ib_addr process_req [ib_addr]
 ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200
 ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4
 ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde

[1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-04 14:24:04 -04:00
Sebastian Sanchez
913cc67159 IB/hfi1: Always perform offline transition
Always initiate an offline transition request
when a link down occurs. The firmware will
use this request to confirm that the driver
has seen the link down message. A host version
is set to indicate this driver behavior to the
firmware.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00
Sebastian Sanchez
626c077c02 IB/hfi1: Prevent link down request double queuing
When link interrupts occur, multiple link down requests
could be queued up when only one is needed. This could get
the hfi1 out of sync with its link partner during LNI.

Only allow one link down request to be queued at any one time.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31 15:18:38 -04:00