Since mlx4_ib supports IP based addressing, a dependency on INET needs
to be added, since mlx4_ib registers itself for net device events.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add the missing unlock before return from function cm_init_qp_rtr_attr()
in the error handling case.
Fixes: dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
IP based addressing introduces the usage of rdma_addr_find_dmac_by_grh()
within ib_core. Since this function is declared in ib_addr, ib_addr
should be a part of the core INFINIBAND modules, rather than
INFINIBAND_ADDR_TRANS.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Existing user space applications provide only IBoE L3 address
attributes to the kernel when they issue a modify QP modify. To work
with them and let such apps (plus kernel consumers which don't use the
RDMA-CM) keep working transparently under the IBoE GID IP addressing
changes, add an Eth L2 address resolution helper.
Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch is similar in spirit to the "IB/mlx4: Use IBoE (RoCE) IP
based GIDs in the port GID table" patch.
Changes to inet4 and inet6 addresses for the host are monitored and if
the address is associated with an ocrdma device then a gid is added or
deleted from the device's gid table. The gid format will be a IPv4 to
IPv6 mapped or the IPv6 address.
Cc: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch is similar in spirit to the "IB/mlx4: Handle Ethernet L2
parameters for IP based GID addressing". It handles the fact that IP
based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.
When building an address handle, instead of parsing the dgid to
get the MAC and VLAN, take them from the address handle attributes.
Cc: Naresh Gottumukkala <bgottumukkala@emulex.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
IP based RoCE gids don't store Ethernet L2 parameters, MAC and VLAN.
Therefore, we need to extract them from the CQE and place them in
struct ib_wc (to be used for cases were they were taken from the gid).
Also, when modifying a QP or building address handle, instead of
parsing the dgid to get the MAC and VLAN, take them from the address
handle attributes.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, the mlx4 driver set IBoE (RoCE) gids to encode related
Ethernet netdevice interface MAC address and possibly VLAN id.
Change this scheme such that gids encode interface IP addresses (both
IP4 and IPv6).
This requires learning the IP addresses which are of use by a
netdevice associated with the HCA port, formatting them to gids and
adding them to the port gid table. Furthermore, events of add and
delete address are caught to maintain the gid table accordingly.
Associated IP addresses may belong to a master of an Ethernet
netdevice on top of that port so this should be considered when
building and maintaining the gid table.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.
Change GIDs to be treated as they encode interface IP address.
Since Ethernet layer 2 address parameters are not longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.
When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
Thus, those attributes were added to the following structures:
* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id
For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.
On the active side, the CM fills. its internal structures from the
path provided by the ULP. We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).
On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add there taking the ETH L2
attributes from the WC.
When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.
ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When we create a new infiniband link with uninfiniband device, e.g. `ip link
add link em1 type ipoib pkey 0x8001`. We will get a NULL pointer dereference
cause other dev like Ethernet don't have struct ib_device.
The code path is:
rtnl_newlink
|-- ipoib_new_child_link
|-- __ipoib_vlan_add
|-- ipoib_set_dev_features
|-- ib_query_device
Fix this bug by make sure the src net is infiniband when create new link.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Some holiday bug fixes for 3.13... There is still one bug I'd like to
get fixed before 3.13-final.
The vlan code erroneously assignes the header ops of the underlying
real device to the VLAN device above it when the real device can
hardware offload VLAN handling. That's completely bogus because
header ops are tied to the device type, so they only expect to see a
'dev' argument compatible with their ops.
The fix is the have the VLAN code use a special set of header ops that
does the pass-thru correctly, by calling the underlying real device's
header ops but _also_ passing in the real device instead of the VLAN
device.
That fix is currently waiting some testing.
Anyways, of note here:
1) Fix bitmap edge case in radiotap, from Johannes Berg.
2) Fix oops on driver unload in rtlwifi, from Larry Finger.
3) Bonding doesn't do locking correctly during speed/duplex/link
changes, from Ding Tianhong.
4) Fix header parsing in GRE code, this bug has been around for a few
releases. From Timo Teräs.
5) SIT tunnel driver MTU check needs to take GSO into account, from
Eric Dumazet.
6) Minor info leak in inet_diag, from Daniel Borkmann.
7) Info leak in YAM hamradio driver, from Salva Peiró.
8) Fix route expiration state handling in ipv6 routing code, from Li
RongQing.
9) DCCP probe module does not check request_module()'s return value,
from Wang Weidong.
10) cpsw driver passes NULL device names to request_irq(), from
Mugunthan V N.
11) Prevent a NULL splat in RDS binding code, from Sasha Levin.
12) Fix 4G overflow test in tg3 driver, from Nithin Sujir.
13) Cure use after free in arc_emac and fec driver's software
timestamp handling, from Eric Dumazet.
14) SIT driver can fail to release the route when
iptunnel_handle_offloads() throws an error. From Li RongQing.
15) Several batman-adv fixes from Simon Wunderlich and Antonio
Quartulli.
16) Fix deadlock during TIPC socket release, from Ying Xue.
17) Fix regression in ROSE protocol recvmsg() msg_name handling, from
Florian Westphal.
18) stmmac PTP support releases wrong spinlock, from Vince Bridgers"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
stmmac: Fix incorrect spinlock release and PTP cap detection.
phy: IRQ cannot be shared
net: rose: restore old recvmsg behavior
xen-netback: fix guest-receive-side array sizes
fec: Do not assume that PHY reset is active low
tipc: fix deadlock during socket release
netfilter: nf_tables: fix wrong datatype in nft_validate_data_load()
batman-adv: fix vlan header access
batman-adv: clean nf state when removing protocol header
batman-adv: fix alignment for batadv_tvlv_tt_change
batman-adv: fix size of batadv_bla_claim_dst
batman-adv: fix size of batadv_icmp_header
batman-adv: fix header alignment by unrolling batadv_header
batman-adv: fix alignment for batadv_coded_packet
netfilter: nf_tables: fix oops when updating table with user chains
netfilter: nf_tables: fix dumping with large number of sets
ipv6: release dst properly in ipip6_tunnel_xmit
netxen: Correct off-by-one errors in bounds checks
net: Add some clarification to skb_tx_timestamp() comment.
arc_emac: fix potential use after free
...
- Additional checks for uverbs to ensure forward compatibility, handle
malformed input better.
- Fix potential use-after-free in iWARP connection manager.
- Make a function static.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSuHFcAAoJEENa44ZhAt0hetAP/2lMjThZ/dK/Q4eyGXr86qQ6
ygjFc26H7SPEqZ2cjmcm0mgObHhVj4+94YiJDcemHpjwfzLhhZywZxjySTTJdKna
ZSzmSb8si0KNdPAoEdldPmtmO04oTI0+mz7kK2sz9mbn033xDVcJ24M8jSUy6/dl
KBmYWCaiE6rabSyvRNwXOIFilj5RwomQFZzJMjg9JGvWY5qpmwKbcvVdZMCSzAwS
VsZ2Z5UEDaghsGtnYjKQzvR1TDGeq6XyWtR4rfjCH19U6rUcInbA1oZFDWcOqE3C
r/o9jOCUszAVefG3iPX5PJ+A3kKCHB2lcDK9owfoIM/cLfV8KWBFkp0WouD5wiGr
8sfENcKI7aMq+Rptsa24FDNkJVCcdG1HhribRqWY/mn4CwI4c3Qe+5s4x+kgHnxU
VngX/Hp54lw+pnrgWWrKCT5tFVuDgdVhHqyYLabtnKrCHIPsb2Eqp8Kaz2JPNLkI
OYWOcS17wzg7nAPoiqbMVfWInGmziqOpWqVZkFwZ+EBTfJIjE7td+Mxn46PzKsrO
gbVYifn+q5cToDGwmK4pn14G1Xh45tqYW/P3KfZNwyGB6+Ui7u18Q5/epp8BHIWy
8bn3jTcJzBIMlOmFxqirrJ2OixaeAWJhZjgAMJIA3So9MHwmhbkJfgo0J+S6jLnI
p1WRy0kTmPC5NNX21YN8
=bol6
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband fixes from Roland Dreier:
"Last batch of InfiniBand/RDMA changes for 3.13 / 2014:
- Additional checks for uverbs to ensure forward compatibility,
handle malformed input better.
- Fix potential use-after-free in iWARP connection manager.
- Make a function static"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/uverbs: Check access to userspace response buffer in extended command
IB/uverbs: Check input length in flow steering uverbs
IB/uverbs: Set error code when fail to consume all flow_spec items
IB/uverbs: Check reserved fields in create_flow
IB/uverbs: Check comp_mask in destroy_flow
IB/uverbs: Check reserved field in extended command header
IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()
IB/core: const'ify inbuf in struct ib_udata
RDMA/iwcm: Don't touch cm_id after deref in rem_ref
RDMA/cxgb4: Make _c4iw_write_mem_dma() static
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on original work by Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...)
to ensure the whole buffer is in userspace memory before using the
pointer in uverbs functions. If the buffer or a subset of it is not
valid, returns -EFAULT to the caller.
This will also catch invalid buffer before the final call to
copy_to_user() which happen late in most uverb functions.
Just like the check in read(2) syscall, it's a sanity check to detect
invalid parameters provided by userspace. This particular check was added
in vfs_read() by Linus Torvalds for v2.6.12 with following commit message:
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502
Make read/write always do the full "access_ok()" tests.
The actual user copy will do them too, but only for the
range that ends up being actually copied. That hides
bugs when the range has been clamped by file size or other
issues.
Note: there's no need to check input buffer since vfs_write() already does
access_ok(VERIFY_READ, ...) as part of write() syscall.
Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Since ib_copy_from_udata() doesn't check yet the available input data
length before accessing userspace memory, an explicit check of this
length is required to prevent:
- reading past the user provided buffer,
- underflow when subtracting the expected command size from the input
length.
This will ensure the newly added flow steering uverbs don't try to
process truncated commands.
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If the flow_spec items parsed count does not match the number of items
declared in the flow_attr command, or if not all bytes are used for
flow_spec items (eg. trailing garbage), a log message is reported and
the function leave through the error path. Unfortunately the error
code is currently not set.
This patch set error code to -EINVAL in such cases, so that the error
is reported to userspace instead of silently fail.
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As noted by Daniel Vetter in its article "Botching up ioctls"[1]
"Check *all* unused fields and flags and all the padding for whether
it's 0, and reject the ioctl if that's not the case. Otherwise
your nice plan for future extensions is going right down the
gutters since someone *will* submit an ioctl struct with random
stack garbage in the yet unused parts. Which then bakes in the ABI
that those fields can never be used for anything else but garbage."
It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.
The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc91 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.
[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
As noted by Daniel Vetter in its article "Botching up ioctls"[1]
"Check *all* unused fields and flags and all the padding for whether
it's 0, and reject the ioctl if that's not the case. Otherwise
your nice plan for future extensions is going right down the
gutters since someone *will* submit an ioctl struct with random
stack garbage in the yet unused parts. Which then bakes in the ABI
that those fields can never be used for anything else but garbage."
It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.
The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc91 ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.
[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Trying to have a ternary operator to choose between NULL (or 0) and the
real pointer value in invocations leads to an impossible choice between
a sparse error about a literal 0 used as a NULL pointer, and a gcc
warning about "pointer/integer type mismatch in conditional expression."
Rather than clutter the source with more casts, move the ternary
operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it
easier to use and simplifies its callers.
Reported-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().
This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Userspace input buffer is not modified by kernel, so it can be 'const'.
This is also a prerequisite to remove the implicit cast
from INIT_UDATA().
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
rem_ref() calls iwcm_deref_id(), which will wake up any blockers on
cm_id_priv->destroy_comp if the refcnt hits 0. That will unblock
someone in iw_destroy_cm_id() which will free the cmid. If that
happens before rem_ref() calls test_bit(IWCM_F_CALLBACK_DESTROY,
&cm_id_priv->flags), then the test_bit() will touch freed memory.
The fix is to read the bit first, then deref. We should never be in
iw_destroy_cm_id() with IWCM_F_CALLBACK_DESTROY set, and there is a
BUG_ON() to make sure of that.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch marks the function _c4iw_write_mem_dma() as static
because it is not used outside this file, which fixes the warning:
drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull SCSI target updates from Nicholas Bellinger:
"Things have been quiet this round with mostly bugfixes, percpu
conversions, and other minor iscsi-target conformance testing changes.
The highlights include:
- Add demo_mode_discovery attribute for iscsi-target (Thomas)
- Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
- Add send completion interrupt coalescing for ib_isert
- Convert target-core to use percpu-refcounting for se_lun
- Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
- tcm_loop updates (Hannes)
- target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)
v3.14 is currently shaping to be a busy development cycle in target
land, with initial support for T10 Referrals and T10 DIF currently on
the roadmap"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
iscsi-target: chap auth shouldn't match username with trailing garbage
iscsi-target: fix extract_param to handle buffer length corner case
iscsi-target: Expose default_erl as TPG attribute
target_core_configfs: split up ALUA supported states
target_core_alua: Make supported states configurable
target_core_alua: Store supported ALUA states
target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
target_core_alua: spellcheck
target core: rename (ex,im)plict -> (ex,im)plicit
percpu-refcount: Add percpu-refcount.o to obj-y
iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
iscsi-target: Convert iscsi_session statistics to atomic_long_t
target: Convert se_device statistics to atomic_long_t
target: Fix delayed Task Aborted Status (TAS) handling bug
iscsi-target: Reject unsupported multi PDU text command sequence
ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
target: Core does not need blkdev.h
target: Pass through I/O topology for block backstores
iser-target: Avoid using FRMR for single dma entry requests
...
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
VDvT9DEy6zjSMCLpMcdo
=xxC+
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma updates from Roland Dreier:
- Re-enable flow steering verbs with new improved userspace ABI
- Fixes for slow connection due to GID lookup scalability
- IPoIB fixes
- Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
- Further improvements to SRP error handling
- Add new transport type for Cisco usNIC
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
IB/core: Re-enable create_flow/destroy_flow uverbs
IB/core: extended command: an improved infrastructure for uverbs commands
IB/core: Remove ib_uverbs_flow_spec structure from userspace
IB/core: Use a common header for uverbs flow_specs
IB/core: Make uverbs flow structure use names like verbs ones
IB/core: Rename 'flow' structs to match other uverbs structs
IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
IB/ucma: Convert use of typedef ctl_table to struct ctl_table
IB/cm: Convert to using idr_alloc_cyclic()
IB/mlx5: Fix page shift in create CQ for userspace
IB/mlx4: Fix device max capabilities check
IB/mlx5: Fix list_del of empty list
IB/mlx5: Remove dead code
IB/core: Encorce MR access rights rules on kernel consumers
IB/mlx4: Fix endless loop in resize CQ
RDMA/cma: Remove unused argument and minor dead code
RDMA/ucma: Discard events for IDs not yet claimed by user space
IB/core: Add Cisco usNIC rdma node and transport types
RDMA/nes: Remove self-assignment from nes_query_qp()
IB/srp: Report receive errors correctly
...
This commit reverts commit 7afbddfae9 ("IB/core: Temporarily disable
create_flow/destroy_flow uverbs"). Since the uverbs extensions
functionality was experimental for v3.12, this patch re-enables the
support for them and flow-steering for v3.13.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 400dbc9658 ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.
According to the commit 400dbc9658, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.
But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].
So the following patch is an attempt to a revised extensible command
infrastructure.
This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.
Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.
Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).
So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).
The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.
Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.
The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.
Additionally the extended header ensure 64bits alignment, while making
all sizes multiple of 8 bytes, extending the maximum buffer size:
legacy extended
Maximum command buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes)
Maximum response buffer: 256KBytes 1024KBytes (512KBytes + 512KBytes)
For the purpose of doing proper buffer size accounting, the headers
size are no more taken in account in "in_words".
One of the odds of the current extensible infrastructure, reading
twice the "legacy" command header, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command: memory is read once and
information are not duplicated: it's making clear that's an extended
command scheme and not a different command scheme.
The proposed scheme will format input (command) and output (response)
buffers this way:
- command:
legacy header +
extended header +
command data (core + hw):
+----------------------------------------+
| flags | 00 00 | command |
| in_words | out_words |
+----------------------------------------+
| response |
| response |
| provider_in_words | provider_out_words |
| padding |
+----------------------------------------+
| |
. <uverbs input> .
. (in_words * 8) .
| |
+----------------------------------------+
| |
. <provider input> .
. (provider_in_words * 8) .
| |
+----------------------------------------+
- response, if present:
+----------------------------------------+
| |
. <uverbs output space> .
. (out_words * 8) .
| |
+----------------------------------------+
| |
. <provider output space> .
. (provider_out_words * 8) .
| |
+----------------------------------------+
The overall design is to ensure that the extensible infrastructure is
itself extensible while begin more reliable with more input and bound
checking.
Note:
The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility). This was suggested by Roland Dreier in a previous
review[2]. But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.
[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
[3]:
http://marc.info/?i=525C1149.6000701@mellanox.com
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
[ Convert "ret ? ret : 0" to the equivalent "ret". - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
The structure holding any types of flow_spec is of no use to
userspace. It would be wrong for userspace to do:
struct ib_uverbs_flow_spec flow_spec;
flow_spec.type = IB_FLOW_SPEC_TCP;
flow_spec.size = sizeof(flow_spec);
Instead, userspace should use the dedicated flow_spec structure for
- Ethernet : struct ib_uverbs_flow_spec_eth,
- IPv4 : struct ib_uverbs_flow_spec_ipv4,
- TCP/UDP : struct ib_uverbs_flow_spec_tcp_udp.
In other words, struct ib_uverbs_flow_spec is a "virtual" data
structure that can only be use by the kernel as an alias to the other.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds "flow" prefix to most of data structure added as part
of commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow through
uverbs") to keep those names in sync with the data structures added in
commit 319a441d13 ("IB/core: Add receive flow steering support").
It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 436f2ad05a ("IB/core: Export ib_create/destroy_flow through
uverbs") added public data structures to support receive flow
steering. The new structs are not following the 'uverbs' pattern:
they're lacking the common prefix 'ib_uverbs'.
This patch replaces ib_kern prefix by ib_uverbs.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch fixes the following issues:
1. Unneeded checks were removed
2. Removed the fixed size out of flow_attr.size, thus simplifying the checks.
3. Remove a 32bit hole on 64bit systems with strict alignment in
struct ib_kern_flow_att by adding a reserved field.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This typedef is unnecessary and should just be removed.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 3e6628c4b3 ("idr: introduce idr_alloc_cyclic()") adds a new
idr_alloc_cyclic() routine and converts several of these users to it.
This is just a missed one - add it.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...
When creating a CQ, we must use mlx5 adapter page shift.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Move the check on max supported CQEs after the final number of entries is
evaluated.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The value of the local variable index is never used in reg_mr_callback().
Signed-off-by: Eli Cohen <eli@mellanox.com>
[ Remove now-unused variable delta too. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Enforce the rule that when requesting remote write or atomic permissions, local
write must be indicated as well. See IB spec 11.2.8.2.
Spotted by: Hagay Abramovsky <hagaya@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When calling get_sw_cqe() we need pass the consumer_index and not the
masked value. Failure to do so will cause incorrect result of
get_sw_cqe() possibly leading to endless loop.
This problem was reported and analyzed by Michael Rice from HP.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* lookup_one_len() really wants i_mutex held on directory.
* leaks galore - just mount ipathfs, then
cd /sys/bus/pci/drivers/qib_ib; echo *:*:*.* >unbind
on a box with that card present and try to umount ipathfs...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This patch avoids a duplicate iscsit_increment_maxcmdsn() call for
ISER_IB_RDMA_WRITE within isert_map_rdma() + isert_reg_rdma_frwr(),
which will already be occuring once during isert_put_datain() ->
iscsit_build_rsp_pdu() operation.
It also removes the local conn->stat_sn assignment + increment,
and changes the third parameter to iscsit_build_rsp_pdu() to
signal this should be done by iscsi_target_mode code.
Tested-by: Moussa Ba <moussaba@micron.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>