This patch makes following fixes in QP flush logic:
- correctly flushes unsignaled WRs followed by a signaled WR
- supports for flushing a CQ bound to multiple QPs
- resets cidx_flush if a active queue starts getting HW CQEs again
- marks WQ in error when we leave RTS. This was only being done for
user queues, but we need it for kernel queues too so that
post_send/post_recv will start returning the appropriate error
synchronously
- eats unsignaled read resp CQEs. HW always inserts CQEs so we must
silently discard them if the read work request was unsignaled.
- handles QP flushes with pending SW CQEs. The flush and out of order
completion logic has a bug where if out of order completions are
flushed but not yet polled by the consumer and the qp is then
flushed then we end up inserting duplicate completions.
- c4iw_flush_sq() should only flush wrs that have not already been
flushed. Since we already track where in the SQ we've flushed via
sq.cidx_flush, just start at that point and flush any remaining.
This bug only caused a problem in the presence of unsignaled work
requests.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
[ Fixed sparse warning due to htonl/ntohl confusion. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Move QP to TERMINATE instead to allow the peer to get the TERM
message. This bug wasn't detectable until newer FW that moves
connections out of RDMA mode as soon as an error is detected.
QP can exit RTS before the last AE arrives. This was introduced by
changes in the FW to kick connections out of RDMA mode as soon as an
error is detected. A side effect of this is that the driver can move
the QP out of RTS before the AE causing the connection to get kicked
out of RDMA mode is processed. Fix for this is to always post async
errors even if the QP is out of RTS.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add new cpl messages, cpl_act_open_req6 and cpl_t5_act_open_req6, for
initiating active open connections.
Use LLD api cxgb4_create_server and cxgb4_create_server6 for
initiating passive open connections. Similarly use cxgb4_remove_server
to remove the passive open connections in place of listen_stop.
Add support for iWARP over VLAN device and enable IPv6 support on VLAN device.
Make use of import_ep in c4iw_reconnect.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
[ Fix build when IPv6 is disabled and make sure iw_cxgb4 is not built-in
when ipv6 is a module. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Modify the type of local_addr and remote_addr fields in struct
iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold
IPv6 and IPv4 addresses uniformly.
Change the references of local_addr and remote_addr in cxgb4, cxgb3,
nes and amso drivers to match this. However to be able to actully run
traffic over IPv6, low-level drivers have to add code to support this.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
[ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set.
- Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
"uresp.ma_sync_key" doesn't get set on this path so we leak 8 bytes of data.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- XRC transport fixes
- Fix DHCP on IPoIB
- mlx4 preparations for flow steering
- iSER fixes
- miscellaneous other fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRisChAAoJEENa44ZhAt0hHLoP/iX5MxtHJ3X1u5KcARX/7nci
CCH/VnD172re/KavCCg7zkbZQpS4jHCtW/CzLUCSPqBGOaj78HFTUkB3ragvUW7m
ndERwplF8DP/i0x7Kk7Wau2a4RdlH0lqwucOjqTyQnbIdxknkz6w3Jcsb9Ic2lzx
up0T0HaHtTxdVF6lXOB5QOIpUGg3l0Yu4euX2BA61WDoZj+VIYhgeWZewq0iV0D1
rLtarJ+Or7mdwu2rNcDHgrD0lhF4SCBd3rx4lbc4F68Cr8JUz0Xe7liPLNskeLhW
f3NEm3gmkYp9YI1otGsA0X/CyV6wnRk4mT8JMlOb2WNzeq2V13Z54/9ZyF5/gFD2
JgzkQB9Ibf7EmTwXWd6+0+FA40Q6dNvnRnhddRM255dvDVw7nxUr2UzYH4Re/Z9K
rNFjkvix2YUwEmoPjitWocz2kj2reDMqjtiVDmdGy1YbtnicH5GtkQsWkoPg8ON9
m5jORUdzydTD+yBJwTiFP1EuFoG3TdfoZ7zHMJwWy/u8i308xD6WPGms9MTdjh8j
7gjz2TCKr+vpuVRh/p6esCPPOTSsSeWDeowy7Sgpdf3qoqAImXsWXVrl2kXLhtyl
1VIgHU3ztm7oqwmy0gQ/zVCo4CLLdsif2zmEIDpxJPnWaSq+D9LJdyLTcfdBjhQ8
9SjUafe4msT1pIjNb7ND
=1ojD
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- XRC transport fixes
- Fix DHCP on IPoIB
- mlx4 preparations for flow steering
- iSER fixes
- miscellaneous other fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
IB/iser: Add support for iser CM REQ additional info
IB/iser: Return error to upper layers on EAGAIN registration failures
IB/iser: Move informational messages from error to info level
IB/iser: Add module version
mlx4_core: Expose a few helpers to fill DMFS HW strucutures
mlx4_core: Directly expose fields of DMFS HW rule control segment
mlx4_core: Change a few DMFS fields names to match firmare spec
mlx4: Match DMFS promiscuous field names to firmware spec
mlx4_core: Move DMFS HW structs to common header file
IB/mlx4: Set link type for RAW PACKET QPs in the QP context
IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
RDMA/iwcm: Don't touch cmid after dropping reference
IB/qib: Correct qib_verbs_register_sysfs() error handling
IB/ipath: Correct ipath_verbs_register_sysfs() error handling
RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
SRPT: Fix odd use of WARN_ON()
IPoIB: Fix ipoib_hard_header() return value
RDMA: Rename random32() to prandom_u32()
RDMA/cxgb3: Fix uninitialized variable
...
Use preferable function name which implies using a pseudo-random number
generator.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use more preferable function name which implies using a pseudo-random
number generator.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
include/net/ipip.h
The changes made to ipip.h in 'net' were already included
in 'net-next' before that header was moved to another location.
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix for TX lockup in IPoIB
- QLogic -> Intel update for qib driver
- Small static checker fix for qib
- Fix error path return value in cxgb4
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJRUH4BAAoJEENa44ZhAt0hP7gP/1SAVyeL8H/QNG6FDki7WUJC
qoW4bnezuNb35/H9NSPUQ/XA0QY2XKdJamkSnDJgE5WVjWZS80hKUBcdUi3S2GBl
Zbbs7H0Z2c/fiDvULrxRwqkZ/MlW4a6NnyKRHfSswKY+bAi+uybGoYGKmczXV4gC
LlscYagb4WYVTNVlD5JG9sxo5ChueAx51IbzuOWq/94iyKtocFB87ZyK2FhbQaua
n6af7lijAcTk02ad4JY2+ZZOOjGgOxP5+iigExzkXuS28dgNvNtN8FrrzLbiDHHy
aZ7miLSdPawO790nvXzn1NGL/nZiUmRI46d7qv6K/X38IX7GUFtMNEDUFCCZknwg
4VtNXzFExQy4YLCMpZ6di6rxMNVQ/EGj7QLN8jfD/pBEfiH/BMISdITL2KbgzcUv
5TUQGeQFzzyY+2HwUCqb0RCskbhsIUwLDpEnhnyHI2sZKbdRrpI07ZpL3iAhTcqK
5RDyXHNaVudVaKzaxq96vuw2OhH5fvACp3SdpaoCgZ54XuqoDoVW2q54vlj7anQI
E0/f+EDyBf8Nu5YydQri76HE+BqG+PzCqU5Rq9866itLEG2DMsmAj9KJJUcDTiCn
ANUIHoOpriGBAm1NP5tau/C0isvTw+lD/pBh26sdum1FwhoeqVf55PMvhxMG4Q/N
aC3g+RMj3BPkdhewWpY+
=SOee
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/rdma fixes from Roland Dreier:
"Small batch of InfiniBand/RDMA fixes for 3.9:
- Fix for TX lockup in IPoIB
- QLogic -> Intel update for qib driver
- Small static checker fix for qib
- Fix error path return value in cxgb4"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: change QLogic to Intel
IB/ipath: Silence a static checker warning
IPoIB: Fix send lockup due to missed TX completion
RDMA/cxgb4: Fix error return code in create_qp()
Fix to return a negative error code from the error handling case
instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
TCPCT uses option-number 253, reserved for experimental use and should
not be used in production environments.
Further, TCPCT does not fully implement RFC 6013.
As a nice side-effect, removing TCPCT increases TCP's performance for
very short flows:
Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
for files of 1KB size.
before this patch:
average (among 7 runs) of 20845.5 Requests/Second
after:
average (among 7 runs) of 21403.6 Requests/Second
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
When neighbour table is full, dst_neigh_lookup/dst_neigh_lookup_skb will return
-ENOBUFS which is absolutely non zero, while all the code in kernel which use
above functions assume failure only on zero return which will cause panic. (for
example: : https://bugzilla.kernel.org/show_bug.cgi?id=54731).
This patch corrects above error with smallest changes to kernel source code and
also correct two return value check missing bugs in drivers/infiniband/hw/cxgb4/cm.c
Tested on my x86_64 SMP machine
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
T5 adapter does not support onchip queue memory. Present logic fails to
allocate QP for T5 and returns an error. Also, if module parameter ocqp_support
is zero then we are unable to allocate QP which should not be the case. Ideally
if ocqp_support parameter is 0 or onchip queue support is disable then host QP
should be allocated before returning an error.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always bump the tcam_full stat. Also, bump wr reply timeout to 30 seconds.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It enables direct DMA by HW to memory region PBL arrays and fast register PBL
arrays from host memory, vs the T4 way of passing these arrays in the WR itself.
The result is lower latency for memory registration, and larger PBL array
support for fast register operations.
This patch also updates ULP_TX_MEM_WRITE command fields for T5. Ordering bit of
ULP_TX_MEM_WRITE is at bit position 22 in T5 and at 23 in T4.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both DB Flow-Control and DB Coalescing are disabled by default on T5
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for Chelsio T5 adapter.
Enables T5's Write Combining feature.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch enhances the IB core support for Memory Windows (MWs).
MWs allow an application to have better/flexible control over remote
access to memory.
Two types of MWs are supported, with the second type having two flavors:
Type 1 - associated with PD only
Type 2A - associated with QPN only
Type 2B - associated with PD and QPN
Applications can allocate a MW once, and then repeatedly bind the MW
to different ranges in MRs that are associated to the same PD. Type 1
windows are bound through a verb, while type 2 windows are bound by
posting a work request.
The 32-bit memory key is composed of a 24-bit index and an 8-bit
key. The key is changed with each bind, thus allowing more control
over the peer's use of the memory key.
The changes introduced are the following:
* add memory window type enum and a corresponding parameter to ib_alloc_mw.
* type 2 memory window bind work request support.
* create a struct that contains the common part of the bind verb struct
ibv_mw_bind and the bind work request into a single struct.
* add the ib_inc_rkey helper function to advance the tag part of an rkey.
Consumer interface details:
* new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and
IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support
for these features.
Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or
IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B
memory windows. It can set neither to indicate it doesn't support
type 2 windows at all.
* modify existing provides and consumers code to the new param of
ib_alloc_mw and the ib_mw_bind_info structure
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix compile warning about cast to pointer from integer of different size.
Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Work requests are passed between the host and the firmware with a
"cookie". This cookie is swapped to big-endian when passed to the
firmware and back to host endianness on return. This swapping seems
to be implemented incorrectly. Moreover, the byte swapping triggers
GCC warnings on 32 bit:
drivers/infiniband/hw/cxgb4/cm.c: In function ‘passive_ofld_conn_reply’:
drivers/infiniband/hw/cxgb4/cm.c:2803:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/infiniband/hw/cxgb4/cm.c: In function ‘send_fw_pass_open_req’:
drivers/infiniband/hw/cxgb4/cm.c:2941:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
[...]
But byte swapping isn't needed as the firmware doesn't actually touch
the cookie. Dropping byte swapping makes the warnings go away too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fixe the following types of sparse warnings
- cast to pointer from integer of different size
- cast from pointer to integer of different size
- incorrect type in assignment (different base types)
- incorrect type in argument 1 (different base types)
- cast from restricted __be64
- cast from restricted __be32
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
CPL_ABORT_REQ_RSS can come before TCP connection is established. In
such case peer_abort was trying to remove the hwtid, which was not
inserted. To avoid this we insert the hwtid when we are sure that we
are surely going to send passive accept request.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Don't wakeup threads blocked in rdma_init/rdma_fini if we are on
MPAv2, and want to retry connection with MPAv1.
Stop ep-timer on getting MPA version mismatch, before doing the
abort_connection - in process_mpa_request.
Take care to stop ep-timer in error paths for process_mpa_request.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Only reconnect if the endpoint wasn't freed.
peer_abort() should only attempt to reconnect if the endpoint wasn't
freed. Also remove hwtid from the debugfs idr.
Add missing check for peer2peer in MPAv2 code
Use correct mpa version on reject.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The endpoint timeout logic had a race that could cause an endpoint
object to be freed while it was still on the timedout list. This
can happen if the timer is stopped after it had fired, but before
the timedout thread processed the endpoint timeout.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
With newer firmware, we can get streaming data due to connection
errors before the driver moves the QP out of RTS.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Log AEs even if the QP isn't in RTS. It is useful information.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver is currently releasing the last ref on the QP too early.
This can cause bus errors due to HW still fetching WRs from the HW
queue. The fix is to keep a qp ref until we release the HW TID.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
With later firmware, the chances of getting streaming mode data after
we exit RTS is likely, so we don't need to warn for it. The only real
case where we don't expect it is when the QP is in RTS.
Move QP to ERROR when streaming mode data received.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
If a FINI operation fails, then we need to ABORT instead of CLOSE.
Also, if we ABORT due to unexpected STREAMING data, then wake up
anybody blocked in FINI...
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This error means the RDMA connection was knocked out of RDMA mode,
probably due to an error on the connection.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Retries active opens for INUSE errors.
Logs any active ofld_connect_wr error replies.
Sends ofld_connect_wr on same ctrlq. It needs to go on the same control txq as
regular CPL active/passive messages.
Retries on active open replies with EADDRINUSE.
Uses active open fw wr only if active filter region is set.
Adds stat for ofld_connect_wr failures.
This patch also adds debugfs file to show endpoints.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
It establishes passive open connection through firmware work request. Passive
open connection will go through this path as now instead of listening server we
create a server filter which will redirect the incoming SYN packet to the
offload queue. After this driver tries to establish the connection using
firmware work request.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
It enables establishing active open connection using fw_ofld_connection work
request when cpl_act_open_rpl says TCAM full error which may be because
of LE hash collision. Current support is only for IPv4 active open connections.
Sets ntuple bits in active open requests. For T4 firmware greater than 1.4.10.0
ntuple bits are required to be set.
Adds nocong and enable_ecn module parameter options.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
[ Move all FW return values to t4fw_api.h. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use WARN rather than printk followed by WARN_ON(1), for conciseness.
A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression list es;
@@
-printk(
+WARN(1,
es);
-WARN_ON(1);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In the error path of registering memory when there's a failure to
allocate a chunk from the memory pool, we try to free the same chunk
we just failed to allocate, which will BUG().
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
/Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
fgN5n987wru91ewdX4gW
=PAcy
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
- A couple of SRP initiator fixes
- Batch of nes hardware driver fixes
- Fix for long-standing use-after-free crash in IPoIB
- Other miscellaneous fixes"
This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.
That said, I suspect the sequence in question should probably use
"mod_delayed_work()". I just did the minimal "don't use deprecated
functions" fixup, though.
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
IB/qib: Fix local access validation for user MRs
mlx4_core: Disable SENSE_PORT for multifunction devices
mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
IB/srp: Avoid having aborted requests hang
IB/srp: Fix use-after-free in srp_reset_req()
IB/qib: Add a qib driver version
RDMA/nes: Fix compilation error when nes_debug is enabled
RDMA/nes: Print hardware resource type
RDMA/nes: Fix for crash when TX checksum offload is off
RDMA/nes: Cosmetic changes
RDMA/nes: Fix for incorrect MSS when TSO is on
RDMA/nes: Fix incorrect resolving of the loopback MAC address
mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
mlx4_core: Trivial cleanups to driver log messages
mlx4_core: Trivial readability fix: "0X30" -> "0x30"
IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
mlx4: Paravirtualize Node Guids for slaves
mlx4: Activate SR-IOV mode for IB
...
The variable ret is assigned return values in a couple of places, but
its value is never returned. This patch makes use of the ret variable
so that the caller get correct error codes returned.
The following changes are also introduced:
- The alloc_oc_sq function can return -ENOSYS or -ENOMEM so we want to
get the return value from it.
- Change the label names to improve readability.
Signed-off-by: Emil Goode <emilgoode@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
spatch with a semantic match is used to found this.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
cxgb4 driver has duplicate definitions of registers which will be removed. This
patch updates the RDMA/cxgb4 driver accordingly.
Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Reviewed-by: Sivakumar Subramani <sivasu@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sparse correctly warns that if mpa->private_data_size is __be16, then
doing += on it is wrong, even if we do += htons(<something>) -- on a
little endian system, carries will go the wrong way. Fix this up by
doing the addition in native byte order.
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When using rping -c -a 0.0.0.0 with iw_cxgb4, the system crashes when
rdma_connect() is called. ip_dev_find() will return NULL, but pdev is
accessed anyway.
Checking that pdev is NULL and returning -ENODEV prevents the system
from crashing.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This allows querying the QP state before flushing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Using kfifos for ID management was limiting the number of QPs and
preventing NP384 MPI jobs. So replace it with a simple bitmap
allocator.
Remove IDs from the IDR tables before deallocating them. This bug was
causing the BUG_ON() in insert_handle() to fire because the ID was
getting reused before being removed from the IDR table.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This allows dumping thousands of QPs. Log active open failures of
interest.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Add module option db_fc_threshold which is the count of active QPs
that trigger automatic db flow control mode. Automatically transition
to/from flow control mode when the active qp count crosses
db_fc_theshold.
Add more db debugfs stats
On DB DROP event from the LLD, recover all the iwarp queues.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Use GFP_ATOMIC in _insert_handle() if ints are disabled.
Don't panic if we get an abort with no endpoint found. Just log a
warning.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Get FULL/EMPTY/DROP events from LLD. On FULL event, disable normal
user mode DB rings.
Add modify_qp semantics to allow user processes to call into the
kernel to ring doobells without overflowing.
Add DB Full/Empty/Drop stats.
Mark queues when created indicating the doorbell state.
If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.
Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can
know if the driver supports the kernel mode db ringing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Log a warning and drop the abort message. Otherwise we will do a
bogus wake_up() and crash.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This fixes a race where an ingress abort fails to wake up the thread
blocked in rdma_init() causing the app to hang.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Function import_ep() is incorrectly using ep->dst instead of the dst
ptr passed in. This causes a crash when accepting new rdma connections
becase ep->dst is not initialized yet.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
stands out; by patch count lots of fixes to the mlx4 driver plus some
cleanups and fixes to the core and other drivers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
3etEZZRwROAyYKcwN01p
=DbLx
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
"Nothing big really stands out; by patch count lots of fixes to the
mlx4 driver plus some cleanups and fixes to the core and other
drivers."
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
mlx4_core: Scale size of MTT table with system RAM
mlx4_core: Allow dynamic MTU configuration for IB ports
IB/mlx4: Fix info returned when querying IBoE ports
IB/mlx4: Fix possible missed completion event
mlx4_core: Report thermal error events
mlx4_core: Fix one more static exported function
IB: Change CQE "csum_ok" field to a bit flag
RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
RDMA/cxgb3: Don't pass irq flags to flush_qp()
mlx4_core: Get rid of redundant ext_port_cap flags
RDMA/ucma: Fix AB-BA deadlock
IB/ehca: Fix ilog2() compile failure
IB: Use central enum for speed instead of hard-coded values
IB/iser: Post initial receive buffers before sending the final login request
IB/iser: Free IB connection resources in the proper place
IB/srp: Consolidate repetitive sysfs code
IB/srp: Use pr_fmt() and pr_err()/pr_warn()
IB/core: Fix SDR rates in sysfs
mlx4: Enforce device max FMR maps in FMR alloc
IB/mlx4: Set bad_wr for invalid send opcode
...
The kernel IB stack uses one enumeration for IB speed, which wasn't
explicitly specified in the verbs header file. Add that enum, and use
it all over the code.
The IB speed/width notation is also used by iWARP and IBoE HW drivers,
which use the convention of rate = speed * width to advertise their
port link rate.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Don't worry about p2p_type if peer2peer itself is not requested in the
first place.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Now we must provide the IP destination address, and a reference has
to be dropped when we're done with the entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three pieces of code do the same thing, create a l2t entry and then
import this information into the c4iw_ep object.
Create a helper function and call it from these 3 locations instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.
Many thanks to Marc Aurele who provided a nice bug report and feedback.
Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix logic so that we don't retry with MPAv1 once we have done that
already. Otherwise, we end up retrying with MPAv1 even when its not
needed on getting peer aborts - and this could lead to kernel panic.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Fix another place in the code where logic dealing with the t4_cqe was
using the wrong QID. This fixes the counting logic so that it tests
against the SQ QID instead of the RQ QID when counting RCQES.
Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
mlx4_core: Deprecate log_num_vlan module param
IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
IB/mlx4: Enable 4K mtu for IBoE
RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
RDMA/cxgb4: Serialize calls to CQ's comp_handler
RDMA/cxgb3: Serialize calls to CQ's comp_handler
IB/qib: Fix issue with link states and QSFP cables
IB/mlx4: Configure extended active speeds
mlx4_core: Add extended port capabilities support
IB/qib: Hold links until tuning data is available
IB/qib: Clean up checkpatch issue
IB/qib: Remove s_lock around header validation
IB/qib: Precompute timeout jiffies to optimize latency
IB/qib: Use RCU for qpn lookup
IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
IB/qib: Decode path MTU optimization
IB/qib: Optimize RC/UC code by IB operation
IPoIB: Use the right function to do DMA unmap pages
RDMA/cxgb4: Use correct QID in insert_recv_cqe()
RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
...
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
QPs need to be moved to error before telling the firwmare to shutdown
the queue. Otherwise, the application can submit WRs that will never
get fetched by the hardware and never flushed by the driver.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swsie@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Commit 01e7da6ba5 ("RDMA/cxgb4: Make sure flush CQ entries are
collected on connection close") introduced a potential problem where a
CQ's comp_handler can get called simultaneously from different places
in the iw_cxgb4 driver. This does not comply with
Documentation/infiniband/core_locking.txt, which states that at a
given point of time, there should be only one callback per CQ should
be active.
This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>.
Based on discussion between Parav Pandit and Steve Wise, this patch
fixes the above problem by serializing the calls to a CQ's
comp_handler using a spin_lock.
Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When creating flushed receive CQEs, set the QPID field in the t4_cqe
to the SQ QID and not the RQ QID. Otherwise the poll code will not
find the correct QP context.
Signed-off by: Jonathan Lallinger <jonathan@ogc.us>
Signed-off by: Steve Wise <swise@ogc.us>
Signed-off-by: Roland Dreier <roland@purestorage.com>
At the time when a peer closes the connection, iw_cxgb4 will not send
a cq event if ibqp.uobject exists. In that case, its possible for a
user application to get blocked in ibv_get_cq_event().
To resolve this, call the cq's comp_handler to unblock any read from
ibv_get_cq_event(). This will trigger userspace to poll the cq and
collect flush status completions for any pending work requests.
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch adds support for Enhanced RDMA Connection Establishment
(draft-ietf-storm-mpa-peer-connect-06), aka MPAv2. Details of draft
can be obtained from:
<http://www.ietf.org/id/draft-ietf-storm-mpa-peer-connect-06.txt>
The patch updates the following functions for initiator perspective:
- send_mpa_request
- process_mpa_reply
- post_terminate for TERM error codes
- destroy_qp for TERM related change
- adds layer/etype/ecode to c4iw_qp_attrs for sending with TERM
- peer_abort for retrying connection attempt with MPA_v1 message
- added c4iw_reconnect function
The patch updates the following functions for responder perspective:
- process_mpa_request
- send_mpa_reply
- c4iw_accept_cr
- passes ird/ord to upper layers
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The iw_cxgb4 module crashes at init time if the T4 card does not
support RDMA. So clean up the init logic to correctly deal with
non-RDMA cards.
- If any RDMA resources are not available, then fail the initialization
logging an info message.
- Clean up properly on initialization failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Moves the drivers for the Chelsio chipsets into
drivers/net/ethernet/chelsio/ and the necessary Kconfig and Makefile
changes.
CC: Divy Le Ray <divy@chelsio.com>
CC: Dimitris Michailidis <dm@chelsio.com>
CC: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
IB/qib: Defer HCA error events to tasklet
mlx4_core: Bump the driver version to 1.0
RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
IB/mlx4: Support PMA counters for IBoE
IB/mlx4: Use flow counters on IBoE ports
IB/pma: Add include file for IBA performance counters definitions
mlx4_core: Add network flow counters
mlx4_core: Fix location of counter index in QP context struct
mlx4_core: Read extended capabilities into the flags field
mlx4_core: Extend capability flags to 64 bits
IB/mlx4: Generate GID change events in IBoE code
IB/core: Add GID change event
RDMA/cma: Don't allow IPoIB port space for IBoE
RDMA: Allow for NULL .modify_device() and .modify_port() methods
IB/qib: Update active link width
IB/qib: Fix potential deadlock with link down interrupt
IB/qib: Add sysfs interface to read free contexts
IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
IB/qib: Remove double define
IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
...
Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited().
Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
These methods don't make sense for iWARP devices, so rather than
forcing them to implement stubs, just return -ENOSYS in the core if
the hardware driver doesn't set .modify_device and/or .modify_port.
Signed-off-by: Roland Dreier <roland@purestorage.com>
- fix a race where the driver could end up sending a close_con_req
after an abort_rpl. In c4iw_ep_disconnect(), send abort or close
request with the ep mutex held.
- fix a hang where driver fails to wake up when a connection is reset
during a normal close. Wake up any waiters in the interrupt path,
and correctly cleanup after rdma_fini() failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Remove left-over code from T3 that limited MR sizes to 32b.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Memory allocated for user CQs gets rounded up to the next page
boundary. And after rounding, we recalculate the resulting IQ depth
and we need to make sure we don't exceed the HW limits.
This bug can result a much smaller CQ allocated than was expected if
the HW size field is exceeded, resulting in CQ overflow failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/cma: Save PID of ID's owner
RDMA/cma: Add support for netlink statistics export
RDMA/cma: Pass QP type into rdma_create_id()
RDMA: Update exported headers list
RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
RDMA/nes: Add a check for strict_strtoul()
RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
RDMA/cxgb4: Use completion objects for event blocking
IB/srp: Fix integer -> pointer cast warnings
IB: Add devnode methods to cm_class and umad_class
IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
IB/uverbs: Add devnode method to set path/mode
RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
RDMA: Add netlink infrastructure
RDMA: Add error handling to ib_core_init()
There exists a race condition when using wait_queue_head_t objects
that are declared on the stack. This was being done in a few places
where we are sending work requests to the FW and awaiting replies, but
we don't have an endpoint structure with an embedded c4iw_wr_wait
struct. So the code was allocating it locally on the stack. Bad
design. The race is:
1) thread on cpuX declares the wait_queue_head_t on the stack, then
posts a firmware WR with that wait object ptr as the cookie to be
returned in the WR reply. This thread will proceed to block in
wait_event_timeout() but before it does:
2) An interrupt runs on cpuY with the WR reply. fw6_msg() handles
this and calls c4iw_wake_up(). c4iw_wake_up() sets the condition
variable in the c4iw_wr_wait object to TRUE and will call
wake_up(), but before it calls wake_up():
3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
wait_event_timeout(). The wait_event_timeout() macro checks the
condition variable and returns immediately since it is TRUE. So
this thread never blocks/sleeps. The function then returns
effectively deallocating the c4iw_wr_wait object that was on the
stack.
4) So at this point cpuY has a pointer to the c4iw_wr_wait object
that is no longer valid. Further its pointing to a stack frame
that might now be in use by some other context/thread. So cpuY
continues execution and calls wake_up() on a ptr to a wait object
that as been effectively deallocated.
This race, when it hits, can cause a crash in wake_up(), which I've
seen under heavy stress. It can also corrupt the referenced stack
which can cause any number of failures.
The fix:
Use struct completion, which supports on-stack declarations.
Completions use a spinlock around setting the condition to true and
the wake up so that steps 2 and 4 above are atomic and step 3 can
never happen in-between.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
macvlan: fix panic if lowerdev in a bond
tg3: Add braces around 5906 workaround.
tg3: Fix NETIF_F_LOOPBACK error
macvlan: remove one synchronize_rcu() call
networking: NET_CLS_ROUTE4 depends on INET
irda: Fix error propagation in ircomm_lmp_connect_response()
irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
be2net: Kill set but unused variable 'req' in lancer_fw_download()
irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
isdn: capi: Use pr_debug() instead of ifdefs.
tg3: Update version to 3.119
tg3: Apply rx_discards fix to 5719/5720
...
Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
as per Davem.
Manual merge of arch/powerpc/kernel/smp.c and add missing scheduler_ipi()
call to arch/powerpc/platforms/cell/interrupt.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A few more EEH fixes:
c4iw_wait_for_reply(): detect fatal EEH condition on timeout and
return an error.
The iw_cxgb4 driver was only calling ib_deregister_device() on an EEH
event followed by a ib_register_device() when the device was
reinitialized. However, the RDMA core doesn't allow multiple
iterations of register/deregister by the provider. See
drivers/infiniband/core/sysfs.c: ib_device_unregister_sysfs() where
the kobject ref is held until the device is deallocated in
ib_deallocate_device(). Calling deregister adds this kobj reference,
and then a subsequent register call will generate a WARN_ON() from the
kobject subsystem because the kobject is being initialized but is
already initialized with the ref held.
So the provider must deregister and dealloc when resetting for an EEH
event, then alloc/register to re-initialize. To do this, we cannot
use the device ptr as our ULD handle since it will change with each
reallocation. This commit adds a ULD context struct which is used as
the ULD handle, and then contains the device pointer and other state
needed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The driver was never really waiting for RDMA_WR/FINI completions
because the condition variable used to determine if the completion
happened was never reset, and this condition variable is reused for
both connection setup and teardown. This causes various driver
crashes under heavy loads due to releasing resources too early.
The fix is to use atomic bits to correctly reset the condition
immediately after the completion is detected.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>