linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-15 15:41:58 +00:00

Author	SHA1	Message	Date
Tejun Heo	e8c8d1bc06	idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c MAX_IDR_MASK is another weirdness in the idr interface. As idr covers whole positive integer range, it's defined as 0x7fffffff or INT_MAX. Its usage in idr_find(), idr_replace() and idr_remove() is bizarre. They basically mask off the sign bit and operate on the rest, so if the caller, by accident, passes in a negative number, the sign bit will be masked off and the remaining part will be used as if that was the input, which is worse than crashing. The constant is visible in idr.h and there are several users in the kernel. * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter() Basically used to test if adap->nr is a negative number which isn't -1 and returns -EINVAL if so. idr_alloc() already has negative @start checking (w/ WARN_ON_ONCE), so this can go away. * drivers/infiniband/core/cm.c:cm_alloc_id() drivers/infiniband/hw/mlx4/cm.c:id_map_alloc() Used to wrap cyclic @start. Can be replaced with max(next, 0). Note that this type of cyclic allocation using idr is buggy. These are prone to spurious -ENOSPC failure after the first wraparound. * fs/super.c:get_anon_bdev() The ID allocated from ida is masked off before being tested whether it's inside valid range. ida allocated ID can never be a negative number and the masking is unnecessary. Update idr_() functions to fail with -EINVAL when negative @id is specified and update other MAX_IDR_MASK users as described above. This leaves MAX_IDR_MASK without any user, remove it and relocate other MAX_IDR_ constants to lib/idr.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Wolfram Sang <wolfram@the-dreams.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:20 -08:00
Tejun Heo	80f22b4430	IB/qib: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Marciniszyn <infinipath@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:17 -08:00
Tejun Heo	cffcd59f15	IB/ocrdma: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:17 -08:00
Tejun Heo	6a9200603d	IB/mlx4: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:17 -08:00
Tejun Heo	5c213f8641	IB/ipath: convert to idr_alloc() Convert to the much saner new idr interface. [yongjun_wei@trendmicro.com.cn: use GFP_NOWAIT under spin lock] Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:17 -08:00
Tejun Heo	cbbbce1de2	IB/ehca: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:16 -08:00
Tejun Heo	e8d4dd606b	IB/cxgb4: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:16 -08:00
Tejun Heo	6fa780095f	IB/cxgb3: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:16 -08:00
Tejun Heo	ac1d68296b	IB/amso1100: convert to idr_alloc() Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Cc: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:16 -08:00
Tejun Heo	3b069c5d85	IB/core: convert to idr_alloc() Convert to the much saner new idr interface. v2: Mike triggered WARN_ON() in idr_preload() because send_mad(), which may be used from non-process context, was calling idr_preload() unconditionally. Preload iff @gfp_mask has __GFP_WAIT. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-27 19:10:16 -08:00
Linus Torvalds	d895cb1af1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle and ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...	2013-02-26 20:16:07 -08:00
Linus Torvalds	70a3a06d01	Main batch of InfiniBand/RDMA changes for 3.9: - SRP error handling fixes from Bart Van Assche - Implementation of memory windows for mlx4 from Shani Michaeli - Lots of cxgb4 HW driver fixes from Vipul Pandya - Make iSER work for virtual functions, other fixes from Or Gerlitz - Fix for bug in qib HW driver from Mike Marciniszyn - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJRLPNoAAoJEENa44ZhAt0h0bMP/1xqlgDR3DGdoUoV4yTd6O9G Uccdb7og5o5tedVo+8xZz01y4at99P8FWW4hJ4os6k8n6yKoBVwo7qjN+BOR+JG5 q8+O+ynUSIg4tGrb5sXcMnKXAXbw/vkftMWYNA41cbrM24DTYzB/2SLpvhwbFoTT tdQc2tgz5QaDqzWbagyCR4+k/IgO+Llrz/RvIdtz4dsTnTDogN7QCoSffX8n/Lpb DxtyXK4sdl3DAtd3CsIdsB/TSMb3RkHLCoSvmrWlLnqMdsbRxVnCVfBm4BOghW3J Y2K3joRoCjjIZSRNs/i0FMFkT/jbCXg1oXg9ek/a6YFNcgyk7z8iGyXrRY7fOnno 8U2SfxJ69YpVYeJr+DSjaeHcmjsaYU7NN7JPxzvPKcJOIsxQJ/euJDXAXau3lEQY o9/p4JsGty0WHi1NanyygvghvBAoP1C5/59Sl4bHH5gckPyJT1kinPSCTT76YXGS WkSHg2mDhiJHy7Pnuy85iZldPoy2/5z09/I4aGMeL+8kUZbD4iFqzXIJU0HTsAim EONoRXDhIcN5DNVSVH1ig6nJ2a7Vhov4Z0r/vB8P4KhslBcqFwf2leC0eCoe5mNt SzcKhqosZDXoL8AwzpntzGIOid8pWmHbUx/PgIcoVXPjtl0h2ULNIFoYYyMZ3cyU AyN2tSiUZVddTV1/aKGL =RAQw -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband update from Roland Dreier: "Main batch of InfiniBand/RDMA changes for 3.9: - SRP error handling fixes from Bart Van Assche - Implementation of memory windows for mlx4 from Shani Michaeli - Lots of cxgb4 HW driver fixes from Vipul Pandya - Make iSER work for virtual functions, other fixes from Or Gerlitz - Fix for bug in qib HW driver from Mike Marciniszyn - IPoIB fixes from me, Itai Garbi, Shlomo Pongratz, Yan Burman - Various cleanups and warning fixes from Julia Lawall, Paul Bolle, Wei Yongjun" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (41 commits) IB/mlx4: Advertise MW support IB/mlx4: Support memory window binding mlx4: Implement memory windows allocation and deallocation mlx4_core: Enable memory windows in {INIT, QUERY}_HCA mlx4_core: Disable memory windows for virtual functions IPoIB: Free ipoib neigh on path record failure so path rec queries are retried IB/srp: Fail I/O requests if the transport is offline IB/srp: Avoid endless SCSI error handling loop IB/srp: Avoid sending a task management function needlessly IB/srp: Track connection state properly IB/mlx4: Remove redundant NULL check before kfree IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool IB/iser: Enable iser when FMRs are not supported IB/iser: Avoid error prints on EAGAIN registration failures IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML IB/uverbs: Implement memory windows support in uverbs IB/core: Add "type 2" memory windows support mlx4_core: Propagate MR deregistration failures to caller mlx4_core: Rename MPT-related functions to have mpt_ prefix ...	2013-02-26 11:41:08 -08:00
Roland Dreier	ef4e359d9b	Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next	2013-02-26 09:17:56 -08:00
Shani Michaeli	b425388dc1	IB/mlx4: Advertise MW support Indicate memory windows support through device capabilities, kernel verb entries and the relevant uverbs command mask entries. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 10:44:32 -08:00
Shani Michaeli	6ff63e1940	IB/mlx4: Support memory window binding * Implement memory windows binding in mlx4_ib_post_send. * Implement mlx4_ib_bind_mw by deferring to mlx4_ib_post_send. * Rename MLX4_WQE_FMR_PERM_* flags to MLX4_WQE_FMR_AND_BIND_PERM_*, indicating that they are used both for fast registration work requests, and for memory window bind work requests. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 10:44:32 -08:00
Shani Michaeli	804d6a89a5	mlx4: Implement memory windows allocation and deallocation Implement MW allocation and deallocation in mlx4_core and mlx4_ib. Pass down the enable bind flag when registering memory regions. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 10:44:32 -08:00
Roland Dreier	f72dd56690	IPoIB: Free ipoib neigh on path record failure so path rec queries are retried If IPoIB fails to look up a path record (eg if it tries during an SM failover when one SM is dead but the new one hasn't taken over yet), the driver ends up with a neighbour structure but no address handle (AH). There's no mechanism to recover from this: any further packets sent to this destination will be silently dumped in ipoib_start_xmit(). Fix this by freeing the neighbour structures when a path rec query fails, so that the next packet queued to be sent will trigger a new path record query. Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:42:15 -08:00
Bart Van Assche	2ce19e72f4	IB/srp: Fail I/O requests if the transport is offline If an SRP target is no longer reachable and srp_reset_host() fails to reconnect then ib_srp will invoke scsi_remove_host(). That function will invoke __scsi_remove_device() for each LUN. And that last function will change the device state from SDEV_TRANSPORT_OFFLINE into SDEV_CANCEL. Certain user space software, e.g. older versions of multipathd, continue queueing I/O to SCSI devices that are in the SDEV_CANCEL state. If these I/O requests are submitted as SG_IO that means that the REQ_PREEMPT flag will be set and hence that these requests will be passed to srp_queuecommand(). These requests will time out. If new requests are queued fast enough from user space these active requests will prevent __scsi_remove_device() to finish. Avoid this by failing I/O requests in the SDEV_CANCEL state if the transport is offline. Introduce a new variable to keep track of the transport state instead of failing requests if (!target->connected \|\| target->qp_in_error), so that the SCSI error handler has a chance to retry commands after a transport layer failure occurred. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:31:14 -08:00
Bart Van Assche	c7c4e7ff80	IB/srp: Avoid endless SCSI error handling loop If a SCSI command times out it is passed to the SCSI error handler. The SCSI error handler will try to abort the commands that timed out. If aborting fails, a device reset will be attempted. If the device reset also fails a host reset will be attempted. If the host reset also fails the whole procedure will be repeated. srp_abort() and srp_reset_device() fail for a QP in the error state. srp_reset_host() fails after host removal has started. Hence if the SCSI error handler gets invoked after host removal has started and with the QP in the error state an endless loop will be triggered. Modify the SCSI error handling functions in ib_srp as follows: - Abort SCSI commands properly even if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even after host removal has already started or if reconnecting fails. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dave@thedillows.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:31:14 -08:00
Bart Van Assche	3780d1f088	IB/srp: Avoid sending a task management function needlessly Do not send a task management function if sending will fail anyway because either there is no RDMA/RC connection or the QP is in the error state. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dave@thedillows.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:31:14 -08:00
Bart Van Assche	e1b2f13aba	IB/srp: Track connection state properly Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dave@thedillows.org> Cc: <stable@vger.kernel.org> # 3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:31:14 -08:00
Syam Sidhardhan	c89d127128	IB/mlx4: Remove redundant NULL check before kfree kfree on NULL pointer is a no-op. Signed-off-by: Syam Sidhardhan <s.syam@samsung.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:24:32 -08:00
Paul Bolle	57d88cffc8	IB/mlx4: Fix compiler warning about uninitialized 'vlan' variable Building qp.o triggers this gcc warning: drivers/infiniband/hw/mlx4/qp.c: In function ‘mlx4_ib_post_send’: drivers/infiniband/hw/mlx4/qp.c:1862:62: warning: ‘vlan’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/infiniband/hw/mlx4/qp.c:1752:6: note: ‘vlan’ was declared here Looking at the code it is clear 'vlan' is only set and used if 'is_eth' is non-zero. But by initializing 'vlan' to 0xffff, on gcc (Ubuntu 4.7.2-22ubuntu1) 4.7.2 on x86-64 at least, we fix the warning, and the compiler was already setting 'vlan' to 0 in the generated code, so there's no real downside. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> [ Get rid of unnecessary move of 'is_vlan' initialization. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:17:13 -08:00
Roland Dreier	a29bec1241	IB/mlx4: Convert is_xxx variables in build_mlx_header() to bool Matches the way they're used, and actually lets at least x86-64 generate better code: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-38 (-38) function old new delta mlx4_ib_post_send 4416 4378 -38 Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-25 09:02:03 -08:00
Al Viro	496ad9aa8e	new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-22 23:31:31 -05:00
Or Gerlitz	5525d210fd	IB/iser: Enable iser when FMRs are not supported Reuse the "SG unaligned for FMR" driver flow to make the initiator functional when running over driver instance which doesn't support FMRs, such as a mlx4 virtual function. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-22 00:22:30 -08:00
Or Gerlitz	819a087316	IB/iser: Avoid error prints on EAGAIN registration failures Under IO/CPU stress its possible that the FMR pool might not have a free FMR mapping element for iSER to use because of incomplete background unmapping processing. In that case we get -EAGAIN and the IO is pushed back to the SCSI layer which soon retries it. No need to be so verbose about that. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-22 00:22:29 -08:00
Or Gerlitz	b96e4abaae	IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128 Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-22 00:22:29 -08:00
Linus Torvalds	9afa3195b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Assorted tiny fixes queued in trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits) DocBook: update EXPORT_SYMBOL entry to point at export.h Documentation: update top level 00-INDEX file with new additions ARM: at91/ide: remove unsused at91-ide Kconfig entry percpu_counter.h: comment code for better readability x86, efi: fix comment typo in head_32.S IB: cxgb3: delay freeing mem untill entirely done with it net: mvneta: remove unneeded version.h include time: x86: report_lost_ticks doesn't exist any more pcmcia: avoid static analysis complaint about use-after-free fs/jfs: Fix typo in comment : 'how may' -> 'how many' of: add missing documentation for of_platform_populate() btrfs: remove unnecessary cur_trans set before goto loop in join_transaction sound: soc: Fix typo in sound/codecs treewide: Fix typo in various drivers btrfs: fix comment typos Update ibmvscsi module name in Kconfig. powerpc: fix typo (utilties -> utilities) of: fix spelling mistake in comment h8300: Fix home page URL in h8300/README xtensa: Fix home page URL in Kconfig ...	2013-02-21 17:40:58 -08:00
Shani Michaeli	6b52a12bc3	IB/uverbs: Implement memory windows support in uverbs The existing user/kernel uverbs API has IB_USER_VERBS_CMD_ALLOC/DEALLOC_MW. Implement these calls, along with destroying user memory windows during process cleanup. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:59:09 -08:00
Shani Michaeli	7083e42ee2	IB/core: Add "type 2" memory windows support This patch enhances the IB core support for Memory Windows (MWs). MWs allow an application to have better/flexible control over remote access to memory. Two types of MWs are supported, with the second type having two flavors: Type 1 - associated with PD only Type 2A - associated with QPN only Type 2B - associated with PD and QPN Applications can allocate a MW once, and then repeatedly bind the MW to different ranges in MRs that are associated to the same PD. Type 1 windows are bound through a verb, while type 2 windows are bound by posting a work request. The 32-bit memory key is composed of a 24-bit index and an 8-bit key. The key is changed with each bind, thus allowing more control over the peer's use of the memory key. The changes introduced are the following: * add memory window type enum and a corresponding parameter to ib_alloc_mw. * type 2 memory window bind work request support. * create a struct that contains the common part of the bind verb struct ibv_mw_bind and the bind work request into a single struct. * add the ib_inc_rkey helper function to advance the tag part of an rkey. Consumer interface details: * new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support for these features. Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B memory windows. It can set neither to indicate it doesn't support type 2 windows at all. * modify existing provides and consumers code to the new param of ib_alloc_mw and the ib_mw_bind_info structure Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:51:45 -08:00
Shani Michaeli	6108372070	mlx4_core: Propagate MR deregistration failures to caller MR deregistration fails when memory windows are bound to the MR. Handle such failures by propagating them to the caller ULP. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:38:43 -08:00
Shani Michaeli	aee38fadd2	IB/mlx4_ib: Remove local invalidate segment unused fields Remove unused fields from the local invalidate WQE segment structure. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shani Michaeli <shanim@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-21 11:36:29 -08:00
Itai Garbi	5a2815f03c	IPoIB: Don't attempt to release resources on error flow If the ipoib client info isn't found on the _remove_one callback, we must not attempt to scan the returned null list. Found by Coverity. Signed-off-by: Itai Garbi <igarbi@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-19 08:21:36 -08:00
Yan Burman	4b48680b55	IPoIB: Add version and firmware info to ethtool reporting Implement version info as well as report firmware version and bus info of the underlying IB HW device. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-19 08:21:36 -08:00
Shlomo Pongratz	9d1ad66e3e	IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets The hash function introduced in commit `b63b70d877` ("IPoIB: Use a private hash table for path lookup in xmit path") was designd to use the 3 octets of the IPoIB HW address that holds the remote QPN. However, this currently isn't the case on little-endian machines, because the the code there uses the flags part (octet[0]) and not the last octet of the QPN (octet[3]). Fix this. The fix caused a checkpatch warning on line over 80 characters, to solve that changed the name of the temp variable that holds the daddr. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-19 08:21:35 -08:00
Julia Lawall	6950a235b8	IB/mlx4: Adjust duplicate test Delete successive tests to the same location. The code tested the result of a previous allocation, that itself was already tested. It is changed to test the result of the most recent allocation. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @s exists@ local idexpression y; expression x,e; @@ if ( \(x == NULL\\|IS_ERR(x)\\|y != 0\) ) { ... when forall return ...; } ... when != \(y = e\\|y += e\\|y -= e\\|y \|= e\\|y &= e\\|y++\\|y--\\|&y\) when != \(XT_GETPAGE(...,y)\\|WMI_CMD_BUF(...)\) if ( \(x == NULL\\|IS_ERR(x)\\|y != 0\) ) { ... when forall return ...; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-15 15:23:26 -08:00
Dan Carpenter	cab66d1273	IB/mlx4: Fix bug unwinding on error in mlx4_ib_init_sriov() We have to decrement "i" before calling mlx4_ib_free_demux_ctx() or we free something that wasn't allocated. That's fine for free_pv_object() but it would lead to a NULL dereference calling mlx4_ib_free_demux_ctx(). The null dereference is because ->tun is NULL when we check: if (!ctx->tun[i]) Also we didn't free ->sriov.demux[0] so it was a small leak. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-15 15:22:26 -08:00
Wei Yongjun	8ab10f7537	RDMA/amso1100: Use module_pci_driver() to simplify the code Use the module_pci_driver() macro to make the code simpler by eliminating module_init and module_exit calls. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Steve WIse <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-15 15:16:10 -08:00
Stefan Hasko	b23523d858	RDMA/cxgb4: Fix cast warning Fix compile warning about cast to pointer from integer of different size. Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-15 15:13:43 -08:00
Mike Marciniszyn	bcc9b67a5b	IB/qib: Fix QP locate/remove race remove_qp() can execute concurrently with a qib_lookup_qpn() on another CPU, which in of itself, is ok, given the RCU locking. The issue is that remove_qp() NULLs out the qp->next field so that a qib_lookup_qpn() might fail to find a qp if it occurs after the one that is being deleted. This is a momentary issue and subsequent qib_lookup_qpn() calls would find the qp's since the search restarts from the bucket head. At scale, the issue might causes dropped packets and unnecessary retransmissions. The fix just deletes the qp->next NULL assignment to prevent the remove_qp() from hiding qp's from qib_lookup_qpn(). Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 17:04:18 -08:00
Paul Bolle	710a31102b	RDMA/cxgb4: "cookie" can stay in host endianness Work requests are passed between the host and the firmware with a "cookie". This cookie is swapped to big-endian when passed to the firmware and back to host endianness on return. This swapping seems to be implemented incorrectly. Moreover, the byte swapping triggers GCC warnings on 32 bit: drivers/infiniband/hw/cxgb4/cm.c: In function ‘passive_ofld_conn_reply’: drivers/infiniband/hw/cxgb4/cm.c:2803:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] drivers/infiniband/hw/cxgb4/cm.c: In function ‘send_fw_pass_open_req’: drivers/infiniband/hw/cxgb4/cm.c:2941:16: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] [...] But byte swapping isn't needed as the firmware doesn't actually touch the cookie. Dropping byte swapping makes the warnings go away too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:55:05 -08:00
Vipul Pandya	ef5d6355ed	RDMA/cxgb4: Address sparse warnings Fixe the following types of sparse warnings - cast to pointer from integer of different size - cast from pointer to integer of different size - incorrect type in assignment (different base types) - incorrect type in argument 1 (different base types) - cast from restricted __be64 - cast from restricted __be32 Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:58 -08:00
Vipul Pandya	b3de6cfebc	RDMA/cxgb4: Insert hwtid in pass_accept_req instead in pass_establish CPL_ABORT_REQ_RSS can come before TCP connection is established. In such case peer_abort was trying to remove the hwtid, which was not inserted. To avoid this we insert the hwtid when we are sure that we are surely going to send passive accept request. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:58 -08:00
Vipul Pandya	7c0a33d611	RDMA/cxgb4: Don't wakeup threads for MPAv2 Don't wakeup threads blocked in rdma_init/rdma_fini if we are on MPAv2, and want to retry connection with MPAv1. Stop ep-timer on getting MPA version mismatch, before doing the abort_connection - in process_mpa_request. Take care to stop ep-timer in error paths for process_mpa_request. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:57 -08:00
Vipul Pandya	fe7e0a4dd0	RDMA/cxgb4: Don't reconnect on abort for mpa_rev 1 Only reconnect if the endpoint wasn't freed. peer_abort() should only attempt to reconnect if the endpoint wasn't freed. Also remove hwtid from the debugfs idr. Add missing check for peer2peer in MPAv2 code Use correct mpa version on reject. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:57 -08:00
Vipul Pandya	1ec779cc29	RDMA/cxgb4: Fix endpoint timeout race condition The endpoint timeout logic had a race that could cause an endpoint object to be freed while it was still on the timedout list. This can happen if the timer is stopped after it had fired, but before the timedout thread processed the endpoint timeout. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:57 -08:00
Vipul Pandya	e8e5b9278b	RDMA/cxgb4: Only log rx_data warnings if cpl status is non-zero With newer firmware, we can get streaming data due to connection errors before the driver moves the QP out of RTS. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:56 -08:00
Vipul Pandya	04236df2a5	RDMA/cxgb4: Always log async errors Log AEs even if the QP isn't in RTS. It is useful information. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:56 -08:00
Vipul Pandya	325abead6c	RDMA/cxgb4: Keep QP referenced until TID released The driver is currently releasing the last ref on the QP too early. This can cause bus errors due to HW still fetching WRs from the HW queue. The fix is to keep a qp ref until we release the HW TID. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:56 -08:00
Vipul Pandya	1557967bf9	RDMA/cxgb4: Display streaming mode error only if detected in RTS With later firmware, the chances of getting streaming mode data after we exit RTS is likely, so we don't need to warn for it. The only real case where we don't expect it is when the QP is in RTS. Move QP to ERROR when streaming mode data received. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:55 -08:00
Vipul Pandya	91e9c07195	RDMA/cxgb4: Abort connections when moving to ERROR state If a FINI operation fails, then we need to ABORT instead of CLOSE. Also, if we ABORT due to unexpected STREAMING data, then wake up anybody blocked in FINI... Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:55 -08:00
Vipul Pandya	55abf8df0a	RDMA/cxgb4: Abort connections that receive unexpected streaming mode data This error means the RDMA connection was knocked out of RDMA mode, probably due to an error on the connection. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-14 15:51:55 -08:00
David S. Miller	fd5023111c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Synchronize with 'net' in order to sort out some l2tp, wireless, and ipv6 GRE fixes that will be built on top of in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-08 18:02:14 -05:00
Roland Dreier	cbdba97a0f	Merge branches 'ipoib', 'mlx4' and 'qib' into for-next	2013-02-05 09:45:25 -08:00
Mike Marciniszyn	d359f35430	IB/qib: Fix for broken sparse warning fix Commit `1fb9fed6d4` ("IB/qib: Fix QP RCU sparse warning") broke QP hash list deletion in qp_remove() badly. This patch restores the former for loop behavior, while still fixing the sparse warnings. Cc: <stable@vger.kernel.org> Reviewed-by: Gary Leshner <gary.s.leshner@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-05 09:43:09 -08:00
Shlomo Pongratz	7e5a90c25f	IPoIB: Fix crash due to skb double destruct After commit `b13912bbb4` ("IPoIB: Call skb_dst_drop() once skb is enqueued for sending"), using connected mode and running multithreaded iperf for long time, ie iperf -c <IP> -P 16 -t 3600 results in a crash. After the above-mentioned patch, the driver is calling skb_orphan() and skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send() (also in ipoib_ib.c::ipoib_send()) The problem with this is, as is written in a comment in both routines, "it's entirely possible that the completion handler will run before we execute anything after the post_send()." This leads to running the skb cleanup routines simultaneously in two different contexts. The solution is to always perform the skb_orphan() and skb_dst_drop() before queueing the send work request. If an error occurs, then it will be no different than the regular case where dev_free_skb_any() in the completion path, which is assumed to be after these two routines. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2013-02-05 09:35:06 -08:00
Jesper Juhl	fe194f19da	IB: cxgb3: delay freeing mem untill entirely done with it Sure, it's just the pointer value we use, but the coverity checker complains about a use-after-free bug and it really does seem cleaner to delay freeing until we are entirely done with the memory. So, rearrange the code to move the kfree() later untill we are completely done. Trivial and harmless, but nice IMHO. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-01-29 10:52:20 +01:00
David S. Miller	4b87f92259	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: Documentation/networking/ip-sysctl.txt drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c Both conflicts were simply overlapping context. A build fix for qlcnic is in here too, simply removing the added devinit annotations which no longer exist. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-15 15:05:59 -05:00
Jiri Pirko	aaeb6cdfa5	remove init of dev->perm_addr in drivers perm_addr is initialized correctly in register_netdevice() so to init it in drivers is no longer needed. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-08 18:00:48 -08:00
Jiri Pirko	7826d43f2d	ethtool: fix drvinfo strings set in drivers Use strlcpy where possible to ensure the string is \0 terminated. Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN and custom defines. Use snprintf instead of sprint. Remove unnecessary inits of ->fw_version Remove unnecessary inits of drvinfo struct. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-06 21:06:31 -08:00
Jiri Pirko	7f6e7101df	nes: remove usage of dev->master Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-04 13:31:50 -08:00
Greg Kroah-Hartman	1e6d9abea7	Drivers: infinband: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Tom Tucker <tom@opengridcomputing.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Cc: Mike Marciniszyn <infinipath@intel.com> Cc: Faisal Latif <faisal.latif@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:15 -08:00
Linus Torvalds	184e251661	Second batch of InfiniBand/RDMA changes for 3.8: - cxgb4 changes to fix lookup engine hash collisions - mlx4 changes to make flow steering usable - fix to IPoIB to avoid pinning dst reference for too long -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJQ1NdhAAoJEENa44ZhAt0hjb0P/i0tL4Ux+PvqG/Phh2gaZaQR evoi3bw6tCYFzWEJPfur2AJ63svKjyrSrfpvgEZjhthDdYORjIv2Dw2Je1qkOSrf 5tJtSp3r9D0y35SE/DxNnzlgua9heBqphPlOGpjcKdN83KP7XIyVG2SGyJiEeLza owefPx/48jZr8hsw7LB2DlZmNUbVWK00o+pXa/VsUQX/dlIU5hyihAzBjtkwJyT2 xtiyu9oqXGuv1JW/a16ooPGDaETDLJ1G50NndadUZYWFWj36VrAwW6hOAK3oOikf Qa4z3gJVzpSdaC1kiuxERj7GxlRpVUJY0IgHEoMTVrexOz1IsFEP8KEfGLkAYwzB jjuXh+Z2+QU5OOO3un0nINRGxKZUSD8Scoa222GBwGnWuCCq68APx2UGTkVkhWon FyjhF13WJRbElg2oXzI1cg9lJNv2pf10hXhiy2qdO6tDElVXVQk+KRiDdcNtxS0H FYYh3og/DjFwp18j+FVLA9r5AiPuVV5DjNnlwBNjTTMc9RwlOrX/6oCK2kZN24rZ l+Nr0gv+h6MAjTPBPYLUP2bsY6wYt5n566mfLea/lir9YeI+Q3PL2sDdGZ7C6xHH S4pRW95leP4pEFpHqYg8z3QKPswYqzokgbUSHxg2TVYrV01RC75axXCh0Q90q3RE oLP8GDYod2okxhAU2AyN =lDXt -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull more infiniband changes from Roland Dreier: "Second batch of InfiniBand/RDMA changes for 3.8: - cxgb4 changes to fix lookup engine hash collisions - mlx4 changes to make flow steering usable - fix to IPoIB to avoid pinning dst reference for too long" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cxgb4: Fix bug for active and passive LE hash collision path RDMA/cxgb4: Fix LE hash collision bug for passive open connection RDMA/cxgb4: Fix LE hash collision bug for active open connection mlx4_core: Allow choosing flow steering mode mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV mlx4_core: Fix error flow in the flow steering wrapper mlx4_core: Add QPN enforcement for flow steering rules set by VFs cxgb4: Add LE hash collision bug fix path in LLD driver cxgb4: Add T4 filter support IPoIB: Call skb_dst_drop() once skb is enqueued for sending	2012-12-21 16:40:26 -08:00
Roland Dreier	d72623b665	Merge branches 'cxgb4', 'ipoib' and 'mlx4' into for-next	2012-12-19 23:03:43 -08:00
Vipul Pandya	793dad94e7	RDMA/cxgb4: Fix bug for active and passive LE hash collision path Retries active opens for INUSE errors. Logs any active ofld_connect_wr error replies. Sends ofld_connect_wr on same ctrlq. It needs to go on the same control txq as regular CPL active/passive messages. Retries on active open replies with EADDRINUSE. Uses active open fw wr only if active filter region is set. Adds stat for ofld_connect_wr failures. This patch also adds debugfs file to show endpoints. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-19 23:03:12 -08:00
Vipul Pandya	1cab775c3e	RDMA/cxgb4: Fix LE hash collision bug for passive open connection It establishes passive open connection through firmware work request. Passive open connection will go through this path as now instead of listening server we create a server filter which will redirect the incoming SYN packet to the offload queue. After this driver tries to establish the connection using firmware work request. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-19 23:03:11 -08:00
Vipul Pandya	5be78ee924	RDMA/cxgb4: Fix LE hash collision bug for active open connection It enables establishing active open connection using fw_ofld_connection work request when cpl_act_open_rpl says TCAM full error which may be because of LE hash collision. Current support is only for IPv4 active open connections. Sets ntuple bits in active open requests. For T4 firmware greater than 1.4.10.0 ntuple bits are required to be set. Adds nocong and enable_ecn module parameter options. Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Move all FW return values to t4fw_api.h. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-19 23:02:43 -08:00
Roland Dreier	b13912bbb4	IPoIB: Call skb_dst_drop() once skb is enqueued for sending Currently, IPoIB delays collecting send completions for TX packets in order to batch work more efficiently. It does skb_orphan() right after queuing the packets so that destructors run early, to avoid problems like holding socket send buffers for too long (since we might not collect a send completion until a long time after the packet is actually sent). However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks at skb_dst() to update the PMTU when it gets a too-long packet. This means that the packets sitting in the TX ring with uncollected send completions are holding a reference on the dst. We've seen this lead to pathological behavior with respect to route and neighbour GC. The easy fix for this is to call skb_dst_drop() when we call skb_orphan(). Also, give packets sent via connected mode (CM) the same skb_orphan() / skb_dst_drop() treatment that packets sent via datagram mode get. Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-19 09:16:43 -08:00
Linus Torvalds	16e024f30c	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc update from Benjamin Herrenschmidt: "The main highlight is probably some base POWER8 support. There's more to come such as transactional memory support but that will wait for the next one. Overall it's pretty quiet, or rather I've been pretty poor at picking things up from patchwork and reviewing them this time around and Kumar no better on the FSL side it seems..." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (73 commits) powerpc+of: Rename and fix OF reconfig notifier error inject module powerpc: mpc5200: Add a3m071 board support powerpc/512x: don't compile any platform DIU code if the DIU is not enabled powerpc/mpc52xx: use module_platform_driver macro powerpc+of: Export of_reconfig_notifier_[register,unregister] powerpc/dma/raidengine: add raidengine device powerpc/iommu/fsl: Add PAMU bypass enable register to ccsr_guts struct powerpc/mpc85xx: Change spin table to cached memory powerpc/fsl-pci: Add PCI controller ATMU PM support powerpc/86xx: fsl_pcibios_fixup_bus requires CONFIG_PCI drivers/virt: the Freescale hypervisor driver doesn't need to check MSR[GS] powerpc/85xx: p1022ds: Use NULL instead of 0 for pointers powerpc: Disable relocation on exceptions when kexecing powerpc: Enable relocation on during exceptions at boot powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function powerpc: Add wrappers to enable/disable relocation on exceptions powerpc: Add set_mode hcall powerpc: Setup relocation on exceptions for bare metal systems powerpc: Move initial mfspr LPCR out of __init_LPCR powerpc: Add relocation on exception vector handlers ...	2012-12-18 09:58:09 -08:00
Linus Torvalds	5bd665f28d	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull target updates from Nicholas Bellinger: "It has been a very busy development cycle this time around in target land, with the highlights including: - Kill struct se_subsystem_dev, in favor of direct se_device usage (hch) - Simplify reservations code by combining SPC-3 + SCSI-2 support for virtual backends only (hch) - Simplify ALUA code for virtual only backends, and remove left over abstractions (hch) - Pass sense_reason_t as return value for I/O submission path (hch) - Refactor MODE_SENSE emulation to allow for easier addition of new mode pages. (roland) - Add emulation of MODE_SELECT (roland) - Fix bug in handling of ExpStatSN wrap-around (steve) - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve) - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab) - Convert ib_srpt to use modern target_submit_cmd caller + drop legacy ioctx->kref usage (nab) - Convert ib_srpt to use modern target_submit_tmr caller (nab) - Add link_magic for fabric allow_link destination target_items for symlinks within target_core_fabric_configfs.c code (nab) - Allocate pointers in instead of full structs for config_group->default_groups (sebastian) - Fix 32-bit highmem breakage for FILEIO (sebastian) All told, hch was able to shave off another ~1K LOC by killing the se_subsystem_dev abstraction, along with a number of PR + ALUA simplifications. Also, a nice patch by Roland is the refactoring of MODE_SENSE handling, along with the addition of initial MODE_SELECT emulation support for virtual backends. Sebastian found a long-standing issue wrt to allocation of full config_group instead of pointers for config_group->default_group[] setup in a number of areas, which ends up saving memory with big configurations. He also managed to fix another long-standing BUG wrt to broken 32-bit highmem support within the FILEIO backend driver. Thank you again to everyone who contributed this round!" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits) target/iscsi_target: Add NodeACL tags for initiator group support target/tcm_fc: fix the lockdep warning due to inconsistent lock state sbp-target: fix error path in sbp_make_tpg() sbp-target: use simple assignment in tgt_agent_rw_agent_state() iscsi-target: use kstrdup() for iscsi_param target/file: merge fd_do_readv() and fd_do_writev() target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping target: Add link_magic for fabric allow_link destination target_items ib_srpt: Convert TMR path to target_submit_tmr ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref target: Make spc_get_write_same_sectors return sector_t target/configfs: use kmalloc() instead of kzalloc() for default groups target/configfs: allocate only 6 slots for dev_cg->default_groups target/configfs: allocate pointers instead of full struct for default_groups target: update error handling for sbc_setup_write_same() iscsit: use GFP_ATOMIC under spin lock iscsi_target: Remove redundant null check before kfree target/iblock: Forward declare bio helpers target: Clean up flow in transport_check_aborted_status() target: Clean up logic in transport_put_cmd() ...	2012-12-15 14:25:10 -08:00
Roland Dreier	01e0336598	Merge branch 'nes' into for-next	2012-12-08 20:45:48 -08:00
Tatyana Nikolova	7d9c199a55	RDMA/nes: Fix for crash when registering zero length MR for CQ Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-08 00:31:03 -08:00
Tatyana Nikolova	7bfcfa51c3	RDMA/nes: Fix for terminate timer crash The terminate timer needs to be initialized just once. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-08 00:31:02 -08:00
Tatyana Nikolova	00ad255d17	RDMA/nes: Fix for BUG_ON due to adding already-pending timer To avoid nes tcp_timer crash for SMP architectures, add_timer is replaced with mod_timer. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-12-08 00:31:02 -08:00
Roland Dreier	fb57e1dbbd	Merge branch 'srp' into for-next	2012-11-30 17:41:04 -08:00
Bart Van Assche	dc1bdbd9b8	IB/srp: Allow SRP disconnect through sysfs Make it possible to disconnect the IB RC connection used by the SRP protocol to communicate with a target. Have the SRP transport layer create a sysfs "delete" attribute for initiator drivers that support this functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:33 -08:00
Vu Pham	55d93898a1	IB/srp: send disconnect request without waiting for CM timewait exit Now that SRP recreates the CM ID, QP, and CQ for each connection, there is no need to wait for the timewait state to complete. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:32 -08:00
Ishai Rabinovitz	73aa89ed9e	IB/srp: destroy and recreate QP and CQs when reconnecting HW QP FATAL errors persist over a reset operation, but we can recover from that by recreating the QP and associated CQs for each connection. Creating a new QP/CQ also completely forecloses any possibility of getting stale completions or packets on the new connection. Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> [ updated to current code from OFED, cleaned up commit message ] Signed-off-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:32 -08:00
Bart Van Assche	ef6c49d87c	IB/srp: Eliminate state SRP_TARGET_DEAD Only queue removal work after having changed the target state into SRP_TARGET_REMOVED and not if that state was already equal to SRP_TARGET_REMOVED. That allows us to remove the state SRP_TARGET_DEAD. Add a call to srp_disconnect_target() in srp_remove_target() -- due to previous changes it is now safe to invoke that function even if the IB connection has already been disconnected. This change allows us to replace the target removal code in srp_remove_one() by an (indirect) call to srp_remove_target(). Rename srp_target_port.work into srp_target_port.remove_work to reflect its usage. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:31 -08:00
Bart Van Assche	ee12d6a80c	IB/srp: Introduce the helper function srp_remove_target() Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:31 -08:00
Bart Van Assche	294c875a65	IB/srp: Suppress superfluous error messages Keep track of the connection state. Only report QP errors while connected. Only invoke ib_send_cm_dreq() when connected so that invoking srp_disconnect_target() after having received a DREQ does not cause an error message to be printed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:31 -08:00
Bart Van Assche	4f0af69799	IB/srp: Process all error completions If the RDMA RC connection is closed, tell the SCSI mid-layer to terminate all pending commands instead of only the first. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:31 -08:00
Bart Van Assche	948d1e889e	IB/srp: Introduce srp_handle_qp_err() Introduce the function srp_handle_qp_err(), change the type of qp_in_error from int into bool and move the initialization of that variable from srp_reconnect_target() to srp_connect_target(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:30 -08:00
Bart Van Assche	224db15743	IB/srp: Simplify SCSI error handling Since scsi_remove_host() has been modified so that SCSI error handling functions will no longer be invoked after scsi_remove_host() returns, the test at the start of srp_send_tsk_mgmt() is now superfluous. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:30 -08:00
Bart Van Assche	f371823120	IB/srp: Keep processing commands during host removal Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()). Make sure that these commands have a chance to reach the SCSI device. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:30 -08:00
Bart Van Assche	09be70a238	IB/srp: Eliminate state SRP_TARGET_CONNECTING Block the SCSI host while reconnecting instead of representing the reconnection activity as a distinct SRP target state. This allows us to eliminate the target state SRP_TARGET_CONNECTING. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:29 -08:00
Bart Van Assche	c9b03c1ae5	IB/srp: Increase block layer timeout Increase the block layer timeout for disks so that it is above the InfiniBand transport layer timeout. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-30 17:40:29 -08:00
Roland Dreier	78c90247af	Merge branches 'cma' and 'mlx4' into for-next	2012-11-29 12:16:43 -08:00
shefty	63f05be2c0	RDMA/cm: Change return value from find_gid_port() Problem reported by Dan Carpenter <dan.carpenter@oracle.com>: The patch `3c86aa70bf`: "RDMA/cm: Add RDMA CM support for IBoE devices" from Oct 13, 2010, leads to the following warning: net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create() error: passing non neg 1 to ERR_PTR This bug would result in a NULL dereference. svc_rdma_create() is supposed to return ERR_PTRs or valid pointers, but instead it returns ERR_PTRs, valid pointers and 1. The call tree is: svc_rdma_create() => rdma_bind_addr() => cma_acquire_dev() => find_gid_port() rdma_bind_addr() should return a valid errno. Fix this by having find_gid_port() also return a valid errno. If we can't find the specified GID on a given port, return -EADDRNOTAVAIL, rather than -EAGAIN, to better indicate the error. We also drop using the special return value of '1' and instead pass through the error returned by the underlying verbs call. On such errors, rather than aborting the search, we simply continue to check the next device/port. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-29 12:16:29 -08:00
Jack Morgenstein	ceb7decb36	IB/mlx4: Fix spinlock order to avoid lockdep warnings lockdep warns about taking a hard-irq-unsafe lock (sriov->id_map_lock) inside a hard-irq-safe lock (sriov->going_down_lock). Since id_map_lock is never taken in the interrupt context, we can simply reverse the order of taking the two spinlocks, thus avoiding the warning and the depencency. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-29 12:14:45 -08:00
Nicholas Bellinger	3e4f574857	ib_srpt: Convert TMR path to target_submit_tmr This patch converts the TMR path in srpt_handle_tsk_mgmt() to use target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage. v2: Drop ununused res in target_submit_tmr (Fengguang.Wu) Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Roland Dreier <roland@kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-11-28 11:20:28 -08:00
Nicholas Bellinger	9474b04313	ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref This patch converts the main srpt_handle_cmd() I/O path to use modern target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage. This includes dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt response callbacks. It also updates srpt_abort_cmd() to call target_put_sess_cmd() for completion of aborted commands, and adds target_wait_for_sess_cmds() into srpt_release_channel_work() to allow outstanding I/O to complete during session shutdown. Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req. Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Roland Dreier <roland@kernel.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-11-28 01:10:59 -08:00
Roland Dreier	d0951d2133	Merge branches 'cxgb4', 'misc', 'mlx4', 'nes' and 'uapi' into for-next	2012-11-26 11:08:41 -08:00
Julia Lawall	5107c2a3d1	RDMA/cxgb3: use WARN Use WARN rather than printk followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression list es; @@ -printk( +WARN(1, es); -WARN_ON(1); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-26 11:08:16 -08:00
Julia Lawall	76f267b7ad	RDMA/cxgb4: use WARN Use WARN rather than printk followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression list es; @@ -printk( +WARN(1, es); -WARN_ON(1); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-26 11:07:18 -08:00
Or Gerlitz	08ff32352d	mlx4: 64-byte CQE/EQE support ConnectX-3 devices can use either 64- or 32-byte completion queue entries (CQEs) and event queue entries (EQEs). Using 64-byte EQEs/CQEs performs better because each entry is aligned to a complete cacheline. This patch queries the HCA's capabilities, and if it supports 64-byte CQEs and EQES the driver will configure the HW to work in 64-byte mode. The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ. Since this mode is global, userspace (libmlx4) must be updated to work with the configured CQE size, and guests using SR-IOV virtual functions need to know both EQE and CQE size. In case one of the 64-byte CQE/EQE capabilities is activated, the patch makes sure that older guest drivers that use the QUERY_DEV_FUNC command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that they need an update to be able to work with the PPF. This is done by changing the returned pf_context_behaviour not to be zero any more. In case none of these capabilities is activated that value remains zero and older guest drivers can run OK. The SRIOV related flow is as follows 1. the PPF does the detection of the new capabilities using QUERY_DEV_CAP command. 2. the PPF activates the new capabilities using INIT_HCA. 3. the VF detects if the PPF activated the capabilities using QUERY_HCA, and if this is the case activates them for itself too. Note that the VF detects that it must be aware to the new PF behaviour using QUERY_FUNC_CAP. Steps 1 and 2 apply also for native mode. User space notification is done through a new field introduced in struct mlx4_ib_ucontext which holds device capabilities for which user space must take action. This changes the binary interface so the ABI towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only when needed i.e. only when the driver does use 64-byte CQEs or future device capabilities which must be in sync by user space. This practice allows to work with unmodified libmlx4 on older devices (e.g A0, B0) which don't support 64-byte CQEs. In order to keep existing systems functional when they update to a newer kernel that contains these changes in VF and userspace ABI, a module parameter enable_64b_cqe_eqe must be set to enable 64-byte mode; the default is currently false. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-26 10:19:17 -08:00
Benjamin Herrenschmidt	2a859ab07b	Merge branch 'merge' into next Merge my own merge branch to get various fixes from there and upstream, especially the hvc console tty refcouting fixes which which testing is quite a bit harder...	2012-11-26 09:23:57 +11:00
Alan Cox	c9795bd708	RDMA/amsol1100: Fix missing break Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:52:55 -08:00
Alan Cox	5390f86796	IB/ipath: Remove unreachable code Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:50:51 -08:00
Julia Lawall	079abea6a3	RDMA/nes: Use WARN() Use WARN() rather than printk() followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression list es; @@ -printk( +WARN(1, es); -WARN_ON(1); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> [ Remove extra KERN_ERR from WARN() format. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:49:15 -08:00
Tatyana Nikolova	cecdcd5f24	RDMA/nes: Fix for incorrect multicast address in the perfect filter table Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:49:14 -08:00
Tatyana Nikolova	fc8d7547b1	RDMA/nes: Fix for sending fpdus in order to hardware Locking fix to prevent race conditions. Fpdus (per qp) need to be forwarded to hardware in the order of their sequence numbers. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Donald Wood <Donald.E.Wood@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:49:14 -08:00
Tatyana Nikolova	bff3976bef	RDMA/nes: Fix for unlinking skbs from empty list Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:49:13 -08:00
Tatyana Nikolova	3127e4ea54	RDMA/nes: Fix incorrect address of IP header Fix for incorrect ip header address when forwarding fpdus to hardware. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-11-22 00:49:13 -08:00
Ian Munsie	cca55d9ddf	powerpc: Move get_longbusy_msecs into hvcall.h and remove duplicate function I am going to use this in the next patch, better to have this code in one place rather than three. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-11-15 15:08:07 +11:00
Christoph Hellwig	de103c93af	target: pass sense_reason as a return value Pass the sense reason as an explicit return value from the I/O submission path instead of storing it in struct se_cmd and using negative return values. This cleans up a lot of the code pathes, and with the sparse annotations for the new sense_reason_t type allows for much better error checking. (nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use sense_reason_t with Roland's MODE SELECT changes) Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-11-06 20:55:46 -08:00
Roland Dreier	1e3474d1de	Merge branches 'cxgb4' and 'mlx4' into for-next	2012-10-23 09:03:49 -07:00
Thadeu Lima de Souza Cascardo	32c631f9f2	RDMA/cxgb4: Don't free chunk that we have failed to allocate In the error path of registering memory when there's a failure to allocate a chunk from the memory pool, we try to free the same chunk we just failed to allocate, which will BUG(). Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-22 11:05:00 -07:00
Eli Cohen	bef83ed92c	IB/mlx4: Synchronize cleanup of MCGs in MCG paravirtualization A client re-register event invokes cleanup of all MCGs. This is required to protect against misbehaved guests leading to corruption of join/leave database. However, since cleaning up the MCGs is a heavy operation, it is pushed to a work queue for further processing. Client re-register is also propagated to ULPs (e.g IPoIB). However, since the cleanup is performed in a workqueue, the ULP could leave and re-join groups before the cleanup occurs. In this case, when the cleanup takes place, it prunes the (newly-joined) MCGs and the ULP is left without actual MCGs while believing it joined them. Fix this by setting the flushing flag before invoking the cleanup task and clearing it after flushing is complete. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-18 10:29:02 -07:00
Jack Morgenstein	2c75d2ccb6	IB/mlx4: Fix QP1 P_Key processing in the Primary Physical Function (PPF) In the MAD paravirtualization code, one of the checks performed when forwarding QP1 (GSI) packets from wire to slave was a P_Key check: the P_Key received in the MAD must be present in the guest's paravirtualized P_Key table, and at least one of the (packet P_Key, guest P_Key) must be a full-membership P_Key. However, if everyone involved has only limited membership in the default P_Key, then packets sent by full-member remote hosts arrive at the PPF but are not passed on to the VFs with the current P_Key1 check. Fix this as follows: 1. Don't care if P_Key received over wire is full or not. If it successfully passed HW checks on the real QP1, then simply pass it to guest regardless of whether the guest has full or limited membership in its P_Key table. 2. If the guest (including paravirtualized master) has both full and limited P_Key forms in its table, preferentially pass the paravirtualized P_Key index of the full P_Key form in the tunnel header. 3. In the multicast join flow (mlx4/mcg.c), use the index for the default P_Key (wherever it is located) in replies generated from within the mcg module (previously, P_Key index 0 was used in all cases). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-18 10:29:02 -07:00
Doug Ledford	8a095030f7	IB/mlx4: Fix build error on platforms where UL is not 64 bits Line 110 uses UL as a compiler cast for the 0x constant, but it's not large enough to hold a 64-bit value on a 32-bit arch. Signed-off-by: Doug Ledford <dledford@redhat.com> [ Use "-1" instead of "FFFFFFFFFFFFFFFFULL". - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-18 10:29:01 -07:00
Linus Torvalds	a188e7e93a	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi target updates from Nicholas Bellinger: "Things have been calm for the most part with no new fabric drivers in flight for v3.7 (we're up to eight now !), so this update is primarily focused on addressing a few long-standing items within target-core and iscsi-target fabric code. The highlights include: - target: Simplify fabric sense data length handling (roland) - qla2xxx: Fix endianness of task management response code (roland) - target: fix truncation of mode data, support zero allocation length (paolo) - target: Properly support zero-length commands in normal processing path (paolo) - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT PDU (ronnie + nab) - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode (ronnie + nab) - target/file: Re-enable optional fd_buffered_io=1 operation (nab + hch) - iscsi-target: Add MaxXmitDataSegmenthLength forr target -> initiator MDRSL declaration (nab) - target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough (nab + hch) - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch + nab) - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab + hch) The last series for adding a new target_submit_cmd_map_sgls() fabric caller (as requested by hch) that accepts pre-allocated SGL memory (using existing logic), along with converting tcm_loop + tcm_vhost has only been in -next for the last days, but has gotten enough review +testing and is clear enough a mechanical change that I think it's reasonable to merge for -rc1 code. Thanks again to everyone who contributed this round! Extra special thanks to Roland (PureStorage) for tracking down the qla2xxx target TMR response code endian issue, and to Paolo (Redhat) for resolving the long standing zero-length CDB issues within target-core between virtual and pSCSI backends." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits) iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values iscsit: proper endianess conversions iscsit: use the itt_t abstract type iscsit: add missing endianess conversion in iscsit_check_inaddr_any iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp iscsit: mark various functions static target/iscsi: precedence bug in iscsit_set_dataout_sequence_values() target/usb-gadget: strlen() doesn't count the terminator target/usb-gadget: remove duplicate initialization tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls target: Add control CDB READ payload zero work-around tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength iscsi-target: Add MaxXmitDataSegmentLength connection recovery check iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength iscsi-target: Enable MaxXmitDataSegmentLength operation in login path iscsi-target: Add base MaxXmitDataSegmentLength code target/file: Re-enable optional fd_buffered_io=1 operation ...	2012-10-10 19:52:19 +09:00
David S. Miller	8dd9117cc7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux Pulled mainline in order to get the UAPI infrastructure already merged before I pull in David Howells's UAPI trees for networking. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-09 13:14:32 -04:00
Konstantin Khlebnikov	314e51b985	mm: kill vma flag VM_RESERVED and mm->reserved_vm counter A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA, currently it lost original meaning but still has some effects: \| effect \| alternative flags -+------------------------+--------------------------------------------- 1\| account as reserved_vm \| VM_IO 2\| skip in core dump \| VM_IO, VM_DONTDUMP 3\| do not merge or expand \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP 4\| do not mlock \| VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP This patch removes reserved_vm counter from mm_struct. Seems like nobody cares about it, it does not exported into userspace directly, it only reduces total_vm showed in proc. Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND \| VM_DONTDUMP. remap_pfn_range() and io_remap_pfn_range() set VM_IO\|VM_DONTEXPAND\|VM_DONTDUMP. remap_vmalloc_range() set VM_DONTEXPAND \| VM_DONTDUMP. [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Eric Paris <eparis@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Venkatesh Pallipadi <venki@google.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:19 +09:00
Linus Torvalds	ca4da6948b	Second batch of changes for the 3.7 merge window: - Late-breaking fix for IPoIB on mlx4 SR-IOV VFs. - Fix for IPoIB build breakage with CONFIG_INFINIBAND_IPOIB_CM=n (new netlink config changes are to blame). - Make sure retry count values are in range in RDMA CM. - A few nes hardware driver fixes and cleanups. - Have iSER initiator use >1 interrupt vectors if available. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQbkIkAAoJEENa44ZhAt0hrLEP/3jNpwR+6/rbVxcPkmVVITsn yUz9S3V7loJqQqlHt7ktphgcoUpar+xs2C8DbCyVupyee4K3zd7Lw+wOVO2OJ16Y +gE8PZB6mSYOVaI0JW26Qqe2gcQ5DLZ4ic6E8gEiuyjNo3ig3MZl17ZffTI+5lXi igR+Qnsy2bm18JI6aumSeKYa17f7EmOK63jaIp0jc95vrtWke3jRRs6+bmmDLV7q 4ZxCP7ckBTRFVkisXlWZ95MmUbg/lhWosZGdJwjRYs+LW3tzgY18OQZgbw6FJKpJ 0OouAO51UMKcizfg7ikwUIw6/apaCFXXv/pH3rIeq18qiJeKcDsIXseY7sFRboHR fIPpfhHENokcxtlOReaw7hnlyMuwrLiYbF6TjPN9t8dWTNbwe+4W0K9G60fu1wmk qjaUTBCCuDF4KtKqCxP01cZzXyjJrworw5p+Nc1wxds+JeyKRCpVRrfYep8vOgft z6KJjQGDF16MvXmhF1dD9VBpmaMCOA4NDhfa0IcgHPTOLc/7Nbe5NTD95InlzM3Q /C0CPj+4J4kzEMnTBHVJvv1NeD3G1RQD80bce6ViTbgOUg9pmeGHkLWe420h7xT4 u/LBnwM1dcqY7BDOnn3ycYoJgr5OCQZOpLcNWjTXv4t51mBOYtM6AJ6IBIvkOmSz QjEyiDEEvT2K7c/rhYOz =9Jv4 -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband changes from Roland Dreier: "Second batch of changes for the 3.7 merge window: - Late-breaking fix for IPoIB on mlx4 SR-IOV VFs. - Fix for IPoIB build breakage with CONFIG_INFINIBAND_IPOIB_CM=n (new netlink config changes are to blame). - Make sure retry count values are in range in RDMA CM. - A few nes hardware driver fixes and cleanups. - Have iSER initiator use >1 interrupt vectors if available." * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cma: Check that retry count values are in range IB/iser: Add more RX CQs to scale out processing of SCSI responses RDMA/nes: Bump the version number of nes driver RDMA/nes: Remove unused module parameter "send_first" RDMA/nes: Remove unnecessary if-else statement RDMA/nes: Add missing break to switch. mlx4_core: Adjust flow steering attach wrapper so that IB works on SR-IOV VFs IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n	2012-10-07 17:19:49 +09:00
Gao feng	809d5fc9bf	infiniband: pass rdma_cm module to netlink_dump_start set netlink_dump_control.module to avoid panic. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-07 00:30:56 -04:00
Linus Torvalds	5f3d2f2e1a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Some highlights in addition to the usual batch of fixes: - 64TB address space support for 64-bit processes by Aneesh Kumar - Gavin Shan did a major cleanup & re-organization of our EEH support code (IBM fancy PCI error handling & recovery infrastructure) which paves the way for supporting different platform backends, along with some rework of the PCIe code for the PowerNV platform in order to remove home made resource allocations and instead use the generic code (which is possible after some small improvements to it done by Gavin). - Uprobes support by Ananth N Mavinakayanahalli - A pile of embedded updates from Freescale folks, including new SoC and board supports, more KVM stuff including preparing for 64-bit BookE KVM support, ePAPR 1.1 updates, etc..." Fixup trivial conflicts in drivers/scsi/ipr.c * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits) powerpc/iommu: Fix multiple issues with IOMMU pools code powerpc: Fix VMX fix for memcpy case driver/mtd:IFC NAND:Initialise internal SRAM before any write powerpc/fsl-pci: use 'Header Type' to identify PCIE mode powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get powerpc: Remove tlb batching hack for nighthawk powerpc: Set paca->data_offset = 0 for boot cpu powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled powerpc/mpc85xx: Update interrupt handling for IFC controller powerpc/85xx: Enable USB support in p1023rds_defconfig powerpc/smp: Do not disable IPI interrupts during suspend powerpc/eeh: Fix crash on converting OF node to edev powerpc/eeh: Lock module while handling EEH event powerpc/kprobe: Don't emulate store when kprobe stwu r1 powerpc/kprobe: Complete kprobe and migrate exception frame powerpc/kprobe: Introduce a new thread flag powerpc: Remove unused __get_user64() and __put_user64() powerpc/eeh: Global mutex to protect PE tree powerpc/eeh: Remove EEH PE for normal PCI hotplug ...	2012-10-06 03:16:12 +09:00
Fengguang Wu	125c4c706b	idr: rename MAX_LEVEL to MAX_IDR_LEVEL To avoid name conflicts: drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined While at it, also make the other names more consistent and add parentheses. [akpm@linux-foundation.org: repair fallout] [sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change] Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at> Cc: walter harms <wharms@bfs.de> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:04:56 +09:00
Roland Dreier	56040f07b2	Merge branches 'cma', 'ipoib', 'iser', 'mlx4' and 'nes' into for-next	2012-10-04 19:12:22 -07:00
Sean Hefty	4ede178a5e	RDMA/cma: Check that retry count values are in range The retry_count and rnr_retry_count connection parameters are both 3-bit values. Check that the values are in range and reduce if they're not. This fixes a problem reported by Doug Ledford <dledford@redhat.com> that resulted in the userspace rping test (part of the librdmacm samples) failing to run over Intel IB HCAs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ Use min_t() to avoid warnings about type mismatch. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-04 19:11:54 -07:00
Alex Tabachnik	5a33a66949	IB/iser: Add more RX CQs to scale out processing of SCSI responses RX/TX CQs will now be selected from a per HCA pool. For the RX flow this has the effect of using different interrupt vectors when using low level drivers (such as mlx4) that map the "vector" param provided by the ULP on CQ creation to a dedicated IRQ/MSI-X vector. This allows the RX flow processing of IO responses to be distributed across multiple CPUs. QPs (--> iSER sessions) are assigned to CQs in round robin order using the CQ with the minimum number of sessions attached to it. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-03 21:26:49 -07:00
Tatyana Nikolova	cf9fd75cbc	RDMA/nes: Bump the version number of nes driver Bumpthe version of the nes driver to reflect recent fixes. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-03 14:27:59 -07:00
Tatyana Nikolova	a378e3a3fb	RDMA/nes: Remove unused module parameter "send_first" Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-03 14:27:59 -07:00
Tatyana Nikolova	a67b078cf6	RDMA/nes: Remove unnecessary if-else statement Remove unnecessary if-else statement -- we do the same thing either way. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-03 14:27:58 -07:00
Tatyana Nikolova	d5fb476a10	RDMA/nes: Add missing break to switch. Static code analyzer cppcheck points out a missing break. Reported-by: David Binderman <dcb314@hotmail.com> Addresses: <https://bugzilla.kernel.org/show_bug.cgi?id=47671> Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-03 14:27:58 -07:00
Roland Dreier	71d9c5f9e6	IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n With the new netlink support in commit `862096a8bb` ("IB/ipoib: Add more rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even if connected mode isn't built. Move the function from ipoib_cm.c to ipoib_main.c (and make a few CM-related macros available unconditonally). This fixes the build error drivers/built-in.o: In function 'ipoib_changelink': ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode' ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode' when CONFIG_INFINIBAND_IPOIB_CM isn't set. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-02 21:33:41 -07:00
Linus Torvalds	aab174f0df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...	2012-10-02 20:25:04 -07:00
Linus Torvalds	7a9a2970b5	First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2 kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV 0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr /Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5 YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3 Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p 60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL fgN5n987wru91ewdX4gW =PAcy -----END PGP SIGNATURE----- Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes" This merge also removes a new use of __cancel_delayed_work(), and replaces it with the regular cancel_delayed_work() that is now irq-safe thanks to the workqueue updates. That said, I suspect the sequence in question should probably use "mod_delayed_work()". I just did the minimal "don't use deprecated functions" fixup, though. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits) IB/qib: Fix local access validation for user MRs mlx4_core: Disable SENSE_PORT for multifunction devices mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs mlx4_core: Stash PCI ID driver_data in mlx4_priv structure IB/srp: Avoid having aborted requests hang IB/srp: Fix use-after-free in srp_reset_req() IB/qib: Add a qib driver version RDMA/nes: Fix compilation error when nes_debug is enabled RDMA/nes: Print hardware resource type RDMA/nes: Fix for crash when TX checksum offload is off RDMA/nes: Cosmetic changes RDMA/nes: Fix for incorrect MSS when TSO is on RDMA/nes: Fix incorrect resolving of the loopback MAC address mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem mlx4_core: Trivial cleanups to driver log messages mlx4_core: Trivial readability fix: "0X30" -> "0x30" IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations mlx4: Paravirtualize Node Guids for slaves mlx4: Activate SR-IOV mode for IB ...	2012-10-02 17:20:40 -07:00
Linus Torvalds	aecdc33e11	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David Miller: 1) GRE now works over ipv6, from Dmitry Kozlov. 2) Make SCTP more network namespace aware, from Eric Biederman. 3) TEAM driver now works with non-ethernet devices, from Jiri Pirko. 4) Make openvswitch network namespace aware, from Pravin B Shelar. 5) IPV6 NAT implementation, from Patrick McHardy. 6) Server side support for TCP Fast Open, from Jerry Chu and others. 7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel Borkmann. 8) Increate the loopback default MTU to 64K, from Eric Dumazet. 9) Use a per-task rather than per-socket page fragment allocator for outgoing networking traffic. This benefits processes that have very many mostly idle sockets, which is quite common. From Eric Dumazet. 10) Use up to 32K for page fragment allocations, with fallbacks to smaller sizes when higher order page allocations fail. Benefits are a) less segments for driver to process b) less calls to page allocator c) less waste of space. From Eric Dumazet. 11) Allow GRO to be used on GRE tunnels, from Eric Dumazet. 12) VXLAN device driver, one way to handle VLAN issues such as the limitation of 4096 VLAN IDs yet still have some level of isolation. From Stephen Hemminger. 13) As usual there is a large boatload of driver changes, with the scale perhaps tilted towards the wireless side this time around. Fix up various fairly trivial conflicts, mostly caused by the user namespace changes. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits) hyperv: Add buffer for extended info after the RNDIS response message. hyperv: Report actual status in receive completion packet hyperv: Remove extra allocated space for recv_pkt_list elements hyperv: Fix page buffer handling in rndis_filter_send_request() hyperv: Fix the missing return value in rndis_filter_set_packet_filter() hyperv: Fix the max_xfer_size in RNDIS initialization vxlan: put UDP socket in correct namespace vxlan: Depend on CONFIG_INET sfc: Fix the reported priorities of different filter types sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP sfc: Fix loopback self-test with separate_tx_channels=1 sfc: Fix MCDI structure field lookup sfc: Add parentheses around use of bitfield macro arguments sfc: Fix null function pointer in efx_sriov_channel_type vxlan: virtual extensible lan igmp: export symbol ip_mc_leave_group netlink: add attributes to fdb interface tg3: unconditionally select HWMON support when tg3 is enabled. Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT" gre: fix sparse warning ...	2012-10-02 13:38:27 -07:00
Linus Torvalds	437589a74b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...	2012-10-02 11:11:09 -07:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Roland Dreier	d172f5a4ab	Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' and 'srp' into for-linus	2012-10-02 07:43:59 -07:00
Or Gerlitz	862096a8bb	IB/ipoib: Add more rtnl_link_ops callbacks Add the rtnl_link_ops changelink and fill_info callbacks, through which the admin can now set/get the driver mode, etc policies. Maintain the proprietary sysfs entries only for legacy childs. For child devices, set dev->iflink to point to the parent device ifindex, such that user space tools can now correctly show the uplink relation as done for vlan, macvlan, etc devices. Pointed out by Patrick McHardy <kaber@trash.net> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-01 17:12:22 -04:00
Linus Torvalds	fdb2f9c2eb	PCI changes for the 3.7 merge window: Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/ RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB 5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b 2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW 4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z SxkILwY/MFpuCFteKE1t =haBu -----END PGP SIGNATURE----- Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi) - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu) - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu) - Use standard list ops for acpi_pci_drivers (Jiang Liu) Device hotplug - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu) - Remove fakephp driver (Bjorn Helgaas) - Fix VGA ref count in hotplug remove path (Yinghai Lu) - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu) - Implement resume regardless of pciehp_force param (Oliver Neukum) - Make pci_fixup_irqs() work after init (Thierry Reding) Miscellaneous - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang) - Factor out PCI Express Capability accessors (Jiang Liu) - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan) - Make pci_error_handlers const (Stephen Hemminger) - Cleanup drivers/pci/remove.c (Bjorn Helgaas) - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas) - Use standard list ops for bus->devices (Bjorn Helgaas) - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang) - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)" * tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits) PCI: acpiphp: Handle PCIe ports without native hotplug capability PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots PCI/ACPI: Protect acpi_pci_roots list with mutex PCI/ACPI: Use acpi_pci_root info rather than looking it up again PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface PCI/ACPI: Protect acpi_pci_drivers list with mutex PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges PCI/ACPI: Use normal list for struct acpi_pci_driver PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots PCI: Fix default vga ref_count ia64/PCI: Clear host bridge aperture struct resource x86/PCI: Clear host bridge aperture struct resource PCI: Stop all children first, before removing all children Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()" PCI: Provide a default pcibios_update_irq() PCI: Discard __init annotations for pci_fixup_irqs() and related functions PCI: Use correct type when freeing bus resource list PCI: Check P2P bridge for invalid secondary/subordinate range PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot() ...	2012-10-01 12:05:36 -07:00
Mike Marciniszyn	c00aaa1a02	IB/qib: Fix local access validation for user MRs Commit `8aac4cc3a9` ("IB/qib: RCU locking for MR validation") introduced a bug that broke user post sends. The proper validation of the MR was lost in the patch. This patch corrects that validation. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-10-01 09:59:26 -07:00
Bart Van Assche	d853667091	IB/srp: Avoid having aborted requests hang We need to call scsi_done() for commands after we abort them. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:36:48 -07:00
Bart Van Assche	9b796d06d5	IB/srp: Fix use-after-free in srp_reset_req() srp_free_req() uses the scsi_cmnd structure contents to unmap buffers, so we must invoke srp_free_req() before we release ownership of that structure. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:36:47 -07:00
Dean Luick	e20d583818	IB/qib: Add a qib driver version Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:36:29 -07:00
Tatyana Nikolova	bca1935ccd	RDMA/nes: Fix compilation error when nes_debug is enabled Removing old variables caused a compile error from nes_debug(). Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:34:55 -07:00
Tatyana Nikolova	818216442b	RDMA/nes: Print hardware resource type Hardware resource types are added and when a resource isn't available, its type is printed. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:34:55 -07:00
Tatyana Nikolova	fc4ba7291b	RDMA/nes: Fix for crash when TX checksum offload is off When TX checksum offload is disabled for an iWarp connection, skb->ip_summed needs to be set to CHECKSUM_NONE. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:34:54 -07:00
Tatyana Nikolova	48a9956362	RDMA/nes: Cosmetic changes - Remove unnecessary statement "if (1)" - Refactor a statement (wqe_misc \|= NES_NIC_SQ_WQE_COMPLETION) out of if/else statement, because it is independant of the flow. - Define netdev->features in one line for clarity. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:34:53 -07:00
Tatyana Nikolova	6ad1be814b	RDMA/nes: Fix for incorrect MSS when TSO is on In TSO handling code, skb_shared_info() is used to get the MSS instead of the bool function skb_is_gso() (which always returns 1). Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:34:52 -07:00
Tatyana Nikolova	ef3d0c4a5e	RDMA/nes: Fix incorrect resolving of the loopback MAC address Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:34:52 -07:00
Jack Morgenstein	3806d08cf6	IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes When we have VFs and PFs on same host, the VFs are activated within the mlx4_core module before the mlx4_ib kernel module is loaded. When the mlx4_ib module initializes the PF (master), it now creates MAD paravirtualization contexts for any VFs that already active. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:44 -07:00
Jack Morgenstein	47605df953	mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations Previously, the structure of a guest's proxy QPs followed the structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1, qp1 port 2, ...). The guest then did offset calculations on the sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP(). This is now changed so that the guest does no offset calculations regarding proxy or tunnel QPs to use. This change frees the PPF from needing to adhere to a specific order in allocating proxy and tunnel QPs. Now QUERY_FUNC_CAP provides each port individually with its proxy qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are used directly where required (with no offset calculations). To accomplish this change, several fields were added to the phys_caps structure for use by the PPF and by non-SR-IOV mode: base_sqpn -- in non-sriov mode, this was formerly sqp_start. base_proxy_sqpn -- the first physical proxy qp number -- used by PPF base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF. The current code in the PPF still adheres to the previous layout of sqps, proxy-sqps and tunnel-sqps. However, the PPF can change this layout without affecting VF or (paravirtualized) PF code. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:43 -07:00
Jack Morgenstein	afa8fd1db9	mlx4: Paravirtualize Node Guids for slaves This is necessary in order to support > 1 VF/PF in a VM for software that uses the node guid as a discriminator, such as librdmacm. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:43 -07:00
Jack Morgenstein	026149cbaa	mlx4: Activate SR-IOV mode for IB Remove the error returns for IB ports from mlx4_ib_add, mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper. Currently, SRIOV is supported only for devices for which the link layer is IB on all ports; RoCE support will be added later. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:42 -07:00
Jack Morgenstein	992e8e6e87	IB/mlx4: Miscellaneous adjustments for SR-IOV IB support 1. Allow only master to change node description. 2. Prevent AH leakage in send mads. 3. Take device part number from PCI structure, so that guests see the VF part number (and not the PF part number). 4. Place the device revision ID into caps structure at startup. 5. SET_PORT in update_gids_task needs to go through wrapper on master. 6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work queue on the master, since it propagates events to slaves using GEN_EQE. 7. Do not support FMR on slaves. 8. Add spinlock to slave_event(), since it is called both in interrupt context and in process context (due to 6 above, and also if smp_snoop is used). This fix was found and implemented by Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:41 -07:00
Jack Morgenstein	c1e7e46612	IB/mlx4: Add iov directory in sysfs under the ib device This directory is added only for the master -- slaves do not have it. The sysfs iov directory is used to manage and examine the port P_Key and guid paravirtualization. Under iov/ports, the administrator may examine the gid and P_Key tables as they are present in the device (and as are seen in the "network view" presented to the SM). Under the iov/<pci slot number> directories, the admin may map the index numbers in the physical tables (as under iov/ports) to the paravirtualized index numbers that guests see. For example, if the administrator, for port 1 on guest 2 maps physical pkey index 10 to virtual index 1, then that guest, whenever it uses its pkey index 1, will actually be using the real pkey index 10. Based on patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:39 -07:00
Jack Morgenstein	2a4fae148c	IB/mlx4: Propagate P_Key and guid change port management events to slaves P_Key change and guid change events are not of interest to all slaves, but only to those slaves which "see" the table slots whose contents have change. For example, if the guid at port 1, index 5 has changed in the PPF, we wish to propagate the gid-change event only to the function which has that guid index mapped to its port/guid table (in this case it is slave #5). Other functions should not get the event, since the event does not affect them. Similarly with P_Keys -- P_Key change events are forwarded only to slaves which have that P_Key index mapped to their virtual P_Key table. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:38 -07:00
Jack Morgenstein	a0c64a17ab	mlx4: Add alias_guid mechanism For IB ports, we paravirtualize the GUID at index 0 on slaves. The GUID at index 0 seen by a slave is the actual GUID occupying the GUID table at the slave-id index. The driver, by default, requests at startup time that subnet manager populate its entire guid table with GUIDs. These guids are then mapped (paravirtualized) to the slaves, and appear for each slave as its GUID at index 0. Until each slave has such a guid, its port status is DOWN. The guid table is cached to support special QP paravirtualization, and event propagation to slaves on guid change (we test to see if the guid really changed before propagating an event to the slave). To support this caching, add capability to __mlx4_ib_query_gid() to obtain the network view (i.e., physical view) gid at index X, not just the host (paravirtualized) view. Based on a patch from Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:37 -07:00
Amir Vadai	3cf69cc8db	IB/mlx4: Add CM paravirtualization In CM para-virtualization: 1. Incoming requests are steered to the correct vHCA according to the embedded GID. 2. Communication IDs on outgoing requests are replaced by a globally unique ID, generated by the PPF, since there is no synchronization of ID generation between guests (and so these IDs are not guaranteed to be globally unique). The guest's comm ID is stored, and is returned to the response MAD when it arrives. Signed-off-by: Amir Vadai <amirv@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:36 -07:00
Oren Duer	b9c5d6a643	IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV MCG paravirtualization support includes: - Creating multicast groups by VFs, and keeping accounting of them - Leaving multicast groups by VFs - Updating SM only with real changes in the overall picture of MCGs status - Creation of MGID=0 groups (let SM choose MGID) Note that the MCG module maintains its own internal MCG object reference counts. The reason for this is that the IB core is used to track only the multicast groups joins generated by the PF it runs over. The PF IB core layer is unaware of slaves, so it cannot be used to keep track of MCG joins they generate. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:35 -07:00
Jack Morgenstein	0a9a01884d	mlx4: MAD_IFC paravirtualization The MAD_IFC firmware command fulfills two functions. First, it is used in the QP0/QP1 MAD-handling flow to obtain information from the FW (for answering queries), and for setting variables in the HCA (MAD SET packets). For this, MAD_IFC should provide the FW (physical) view of the data. This is the view that OpenSM needs. We call this the "network view". In the second case, MAD_IFC is used by various verbs to obtain data regarding the local HCA (e.g., ib_query_device()). We call this the "host view". This data needs to be paravirtualized. MAD_IFC therefore needs a wrapper function, and also needs another flag indicating whether it should provide the network view (when it is called by ib_process_mad in special-qp packet handling), or the host view (when it is called while implementing a verb). There are currently 2 flag parameters in mlx4_MAD_IFC already: ignore_bkey and ignore_mkey. These two parameters are replaced by a single "mad_ifc_flags" parameter, with different bits set for each flag. A third flag is added: "network-view/host-view". Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:34 -07:00
Jack Morgenstein	37bfc7c1e8	IB/mlx4: SR-IOV multiplex and demultiplex MADs Special QPs are paravirtualized. vHCAs are not given direct access to QP0/1. Rather, these QPs are operated by a special context hosted by the PF, which mediates access to/from vHCAs. This is done by opening a "tunnel" per vHCA port per QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the PF-context and a "Proxy QP" in the vHCA. All vHCA MAD traffic must pass through the corresponding tunnel. vHCA QPs cannot be assigned to VL15 and are denied of the well-known QKey. Outgoing messages are "de-multiplexed" (i.e., directed to the wire via the real special QP). Incoming messages are "multiplexed" (i.e. steered by the PPF to the correct VF or to the PF) QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual) QP0s, but they never receive any SMPs and all SMPs sent are discarded. QP1 traffic is allowed for all vHCAs, but special care is required to bridge the gap between the host and network views. Specifically: - Transaction IDs are mapped to guarantee uniqueness among vHCAs - CM para-virtualization o Incoming requests are steered to the correct vHCA according to the embedded GID o Local communication IDs are mapped to ensure uniqueness among vHCAs (see the patch that adds CM paravirtualization.) - Multicast para-virtualization o The PF context aggregates membership state from all vHCAs o The SA is contacted only when the aggregate membership changes o If the aggregate does not change, the PF context will provide the requesting vHCA with the proper response. (see the patch that adds multicast group paravirtualization) Incoming MADs are steered according to: - the DGID If a GRH is present - the mapped transaction ID for response MADs - the embedded GID in CM requests - the remote communication ID in other CM messages Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:34 -07:00
Jack Morgenstein	54679e1482	mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop This requires: 1. Replacing the paravirtualized P_Key index (inserted by the guest) with the real P_Key index. 2. For UD QPs, placing the guest's true source GID index in the address path structure mgid field, and setting the ud_force_mgid bit so that the mgid is taken from the QP context and not from the WQE when posting sends. 3. For UC and RC QPs, placing the guest's true source GID index in the address path structure mgid field. 4. For tunnel and proxy QPs, setting the Q_Key value reserved for that proxy/tunnel pair. Since not all the above adjustments occur in all the QP transitions, the QP transitions require separate wrapper functions. Secondly, initialize the P_Key virtualization table to its default values: Master virtualized table is 1-1 with the real P_Key table, guest virtualized table has P_Key index 0 mapped to the real P_Key index 0, and all the other P_Key indices mapped to the reserved (invalid) P_Key at index 127. Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache. and generating events on the master only if a P_Key actually changed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:33 -07:00
Jack Morgenstein	fc06573dfa	IB/mlx4: Initialize SR-IOV IB support for slaves in master context Allocate SR-IOV paravirtualization resources and MAD demuxing contexts on the master. This has two parts. The first part is to initialize the structures to contain the contexts. This is done at master startup time in mlx4_ib_init_sriov(). The second part is to actually create the tunneling resources required on the master to support a slave. This is performed the master detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event generated when a slave initializes its comm channel). For the master, there is no such startup event, so it creates its own tunneling resources when it starts up. In addition, the master also creates the real special QPs. The ib_core layer on the master causes creation of proxy special QPs, since the master is also paravirtualized at the ib_core layer. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:32 -07:00
Jack Morgenstein	1ffeb2eb8b	IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support 1. Introduce the basic SR-IOV parvirtualization context objects for multiplexing and demultiplexing MADs. 2. Introduce support for the new proxy and tunnel QP types. This patch introduces the objects required by the master for managing QP paravirtualization for guests. struct mlx4_ib_sriov is created by the master only. It is a container for the following: 1. All the info required by the PPF to multiplex and de-multiplex MADs (including those from the PF). (struct mlx4_ib_demux_ctx demux) 2. All the info required to manage alias GUIDs (i.e., the GUID at index 0 that each guest perceives. In fact, this is not the GUID which is actually at index 0, but is, in fact, the GUID which is at index[<VF number>] in the physical table. 3. structures which are used to manage CM paravirtualization 4. structures for managing the real special QPs when running in SR-IOV mode. The real SQPs are controlled by the PPF in this case. All SQPs created and controlled by the ib core layer are proxy SQP. struct mlx4_ib_demux_ctx contains the information per port needed to manage paravirtualization: 1. All multicast paravirt info 2. All tunnel-qp paravirt info for the port. 3. GUID-table and GUID-prefix for the port 4. work queues. struct mlx4_ib_demux_pv_ctx contains all the info for managing the paravirtualized QPs for one slave/port. struct mlx4_ib_demux_pv_qp contains the info need to run an individual QP (either tunnel qp or real SQP). Note: We made use of the 2 most significant bits in enum mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h). We need these bits in the low-level driver for internal purposes. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:30 -07:00
Jack Morgenstein	73aaa7418f	IB/core: Add ib_find_exact_cached_pkey() When P_Key tables potentially contain both full and partial membership copies for the same P_Key, we need a function to find the index for an exact (16-bit) P_Key. This is necessary when the master forwards QP1 MADs sent by guests. If the guest has sent the MAD with a limited membership P_Key, we need to to forward the MAD using the same limited membership P_Key. Since the master may have both the limited and the full member P_Keys in its table, we must make sure to retrieve the limited membership P_Key in this case. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:30 -07:00
Jack Morgenstein	ff7166c447	IB/core: Handle table with full and partial membership for the same P_Key Extend the cached and non-cached P_Key table lookups to handle limited and full membership of the same P_Key to co-exist in the P_Key table. This is necessary for SR-IOV, to allow for some guests would to have the full membership P_Key in their virtual P_Key table, while other guests on the same physical HCA would have the limited one. To support this, we need both the limited and full membership P_Keys to be present in the master's (hypervisor physical port) P_Key table. The algorithm for handling P_Key tables which contain both the limited and the full membership versions of the same P_Key works as follows: When scanning the P_Key table for a 15-bit P_Key: A. If there is a full member version of that P_Key anywhere in the table, return its index (even if a limited-member version of the P_Key exists earlier in the table). B. If the full member version is not in the table, but the limited-member version is in the table, return the index of the limited P_Key. Signed-off-by: Liran Liss <liranl@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:29 -07:00
Dotan Barak	46db567deb	IB/mlx4: Fill in sq_sig_type in query QP Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:33:08 -07:00
Patrick McHardy	bea1e22df4	IPoIB: Fix use-after-free of multicast object Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz) Commit `c8c2afe360` ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue, and hence the workqueue can't be flushed from the context of ipoib_stop(). In the current code, ipoib_stop() (which doesn't flush the workqueue) calls ipoib_mcast_dev_flush(), which goes and deletes all the multicast entries. This takes place without any synchronization with a possible running instance of ipoib_mcast_join_task() for the same ipoib device, leading to a crash due to NULL pointer dereference. Fix this by making sure that the workqueue is flushed before ipoib_mcast_dev_flush() is called. To make that possible, we move the RTNL-lock wrapped code to ipoib_mcast_join_finish(). Signed-off-by: Patrick McHardy <kaber@trash.net> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:32:33 -07:00
Emil Goode	c079c28714	RDMA/cxgb4: Fix error handling in create_qp() The variable ret is assigned return values in a couple of places, but its value is never returned. This patch makes use of the ret variable so that the caller get correct error codes returned. The following changes are also introduced: - The alloc_oc_sq function can return -ENOSYS or -ENOMEM so we want to get the return value from it. - Change the label names to improve readability. Signed-off-by: Emil Goode <emilgoode@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:32:14 -07:00
Dotan Barak	2a22fb8c69	RDMA/cma: Use consistent component mask for IPoIB port space multicast joins CMA multicast joins for the IPoIB port space need to use the same component mask used by the ipoib driver. Otherwise, it's possible for the CMA to create a group to which a join made by ipoib will fail, or vise-versa. Some of the component mask fields set by ipoib weren't set by the CMA, fix that. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:31:47 -07:00
Dotan Barak	d8c9166961	IB/core: Remove unused variables in ucm/ucma Remove unused wait objects from ucm/ucma events flow. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-30 20:31:46 -07:00
David S. Miller	6a06e5e1bb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-28 14:40:49 -04:00
Al Viro	2903ff019b	switch simple cases of fget_light to fdget Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 22:20:08 -04:00
Al Viro	88b428d6e1	switch infinibarf users of fget() to fget_light() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-26 21:10:10 -04:00
Paul E. McKenney	593d1006cd	Merge remote-tracking branch 'tip/core/rcu' into next.2012.09.25b Resolved conflict in kernel/sched/core.c using Peter Zijlstra's approach from https://lkml.org/lkml/2012/9/5/585.	2012-09-25 10:03:56 -07:00
Paul E. McKenney	5217192b85	Merge remote-tracking branch 'tip/smp/hotplug' into next.2012.09.25b The conflicts between kernel/rcutree.h and kernel/rcutree_plugin.h were due to adjacent insertions and deletions, which were resolved by simply accepting the changes on both branches.	2012-09-25 10:01:45 -07:00
Eric W. Biederman	d03ca5820d	userns: Convert ipathfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GID Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-09-21 03:13:19 -07:00
Or Gerlitz	9baa0b0364	IB/ipoib: Add rtnl_link_ops support Add rtnl_link_ops to IPoIB, with the first usage being child device create/delete through them. Childs devices are now either legacy ones, created/deleted through the ipoib sysfs entries, or RTNL ones. Adding support for RTNL childs involved refactoring of ipoib_vlan_add which is now used by both the sysfs and the link_ops code. Also, added ndo_uninit entry to support calling unregister_netdevice_queue from the rtnl dellink entry. This required removal of calls to ipoib_dev_cleanup from the driver in flows which use unregister_netdevice, since the networking core will invoke ipoib_uninit which does exactly that. Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-20 16:49:17 -04:00
Roland Dreier	9c58b7ddd7	target: Simplify fabric sense data length handling Every fabric driver has to supply a se_tfo->set_fabric_sense_len() method, just so iSCSI can return an offset of 2. However, every fabric driver is already allocating a sense buffer and passing it into the target core, either via transport_init_se_cmd() or target_submit_cmd(). So instead of having iSCSI pass the start of its sense buffer into the core and then later tell the core to skip the first 2 bytes, it seems easier for iSCSI just to do the offset of 2 when it passes the sense buffer into the core. Then we can drop the se_tfo->set_fabric_sense_len() everywhere, and just add a couple of lines of code to iSCSI to set the sense data length to the beginning of the buffer right before it sends it over the network. (nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops + change transport_get_sense_buffer to follow v3.6-rc6 code w/o ->set_fabric_sense_len usage) Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-09-17 17:12:58 -07:00
Roland Dreier	2ed772b7b9	target: Remove unused target_core_fabric_ops.get_fabric_sense_len method There are no callers of se_tfo->get_fabric_sense_len(), so we should stop having every fabric driver implement it. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-09-17 16:15:47 -07:00
Benjamin Herrenschmidt	eda485f06d	Merge remote-tracking branch 'pci/pci/gavin-window-alignment' into next Merge Gavin patches from the PCI tree as subsequent powerpc patches are going to depend on them Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-17 16:07:43 +10:00
Roland Dreier	2116fe4e8b	Merge branches 'cxgb4', 'ipoib', 'mlx4', 'ocrdma' and 'qib' into for-next	2012-09-14 10:42:52 -07:00
Mike Marciniszyn	4c3550057b	IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum Commit `3236b2d469` ("IB/qib: MADs with misset M_Keys should return failure") introduced a return code assignment that unfortunately introduced an unconditional exit for the routine due to missing braces. This patch adds the braces to correct the original patch. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-14 10:42:32 -07:00
Parav Pandit	ae3bca90e9	RDMA/ocrdma: Fix CQE expansion of unsignaled WQE Fix CQE expansion of unsignaled WQE -- don't expand the CQE when the WQE index of the completed CQE matches with last pending WQE (tail) in the queue. Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-14 10:40:58 -07:00
Bjorn Helgaas	9a5d5bd848	Merge commit 'v3.6-rc5' into pci/gavin-window-alignment * commit 'v3.6-rc5': (1098 commits) Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec\|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync powerpc: Update DSCR on all CPUs when writing sysfs dscr_default powerpc/powernv: Always go into nap mode when CPU is offline powerpc: Give hypervisor decrementer interrupts their own handler powerpc/vphn: Fix arch_update_cpu_topology() return value ARM: gemini: fix the gemini build ... Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/rapidio/devices/tsi721.c	2012-09-13 15:54:57 -06:00
Bjorn Helgaas	78890b5989	Merge commit 'v3.6-rc5' into next * commit 'v3.6-rc5': (1098 commits) Linux 3.6-rc5 HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured Remove user-triggerable BUG from mpol_to_str xen/pciback: Fix proper FLR steps. uml: fix compile error in deliver_alarm() dj: memory scribble in logi_dj Fix order of arguments to compat_put_time[spec\|val] xen: Use correct masking in xen_swiotlb_alloc_coherent. xen: fix logical error in tlb flushing xen/p2m: Fix one-off error in checking the P2M tree directory. powerpc: Don't use __put_user() in patch_instruction powerpc: Make sure IPI handlers see data written by IPI senders powerpc: Restore correct DSCR in context switch powerpc: Fix DSCR inheritance in copy_thread() powerpc: Keep thread.dscr and thread.dscr_inherit in sync powerpc: Update DSCR on all CPUs when writing sysfs dscr_default powerpc/powernv: Always go into nap mode when CPU is offline powerpc: Give hypervisor decrementer interrupts their own handler powerpc/vphn: Fix arch_update_cpu_topology() return value ARM: gemini: fix the gemini build ... Conflicts: drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c drivers/rapidio/devices/tsi721.c	2012-09-13 08:41:01 -06:00
Bjorn Helgaas	1959ec5f82	Merge branch 'pci/stephen-const' into next * pci/stephen-const: make drivers with pci error handlers const scsi: make pci error handlers const netdev: make pci_error_handlers const PCI: Make pci_error_handlers const	2012-09-12 13:54:10 -06:00
Shlomo Pongratz	b5120a6e11	IPoIB: Fix AB-BA deadlock when deleting neighbours Lockdep points out a circular locking dependency betwwen the ipoib device priv spinlock (priv->lock) and the neighbour table rwlock (ntbl->rwlock). In the normal path, ie neigbour garbage collection task, the neigh table rwlock is taken first and then if the neighbour needs to be deleted, priv->lock is taken. However in some error paths, such as in ipoib_cm_handle_tx_wc(), priv->lock is taken first and then ipoib_neigh_free routine is called which in turn takes the neighbour table ntbl->rwlock. The solution is to get rid the neigh table rwlock completely and use only priv->lock. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-12 09:21:45 -07:00
Shlomo Pongratz	66172c0993	IPoIB: Fix memory leak in the neigh table deletion flow If the neighbours hash table is empty when unloading the module, then ipoib_flush_neighs(), the cleanup routine, isn't called and the memory used for the hash table itself leaked. To fix this, ipoib_flush_neighs() is allways called, and another completion object is added to signal when the table is freed. Once invoked, ipoib_flush_neighs() flushes all the neighbours (if there are any), calls the the hash table RCU free routine, which now signals completion of the deletion process, and waits for the last neighbour to be freed. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-12 09:05:03 -07:00
Pablo Neira Ayuso	9f00d9776b	netlink: hide struct module parameter in netlink_kernel_create This patch defines netlink_kernel_create as a wrapper function of __netlink_kernel_create to hide the struct module *me parameter (which seems to be THIS_MODULE in all existing netlink subsystems). Suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-08 18:46:30 -04:00
Wei Yongjun	92dd6c3d4d	RDMA/cxgb4: Move dereference below NULL test spatch with a semantic match is used to found this. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-09-07 16:19:03 -07:00
Stephen Hemminger	1d3520357d	make drivers with pci error handlers const Covers the rest of the uses of pci error handler. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-09-07 16:35:00 -06:00
Benjamin Herrenschmidt	fff34b3412	Merge branch 'merge' into next Brings in various bug fixes from 3.6-rcX	2012-09-07 09:48:59 +10:00
Vipul Pandya	e5619c120d	RDMA/cxgb4: Update RDMA/cxgb4 due to macro definition removal in cxgb4 driver cxgb4 driver has duplicate definitions of registers which will be removed. This patch updates the RDMA/cxgb4 driver accordingly. Signed-off-by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Reviewed-by: Sivakumar Subramani <sivasu@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-05 17:41:40 -04:00
Michael Ellerman	b4b8d1e48e	IB/ehca: Remove uses of virt_to_abs() and abs_to_virt() abs_to_virt() simply calls __va() and we'd like to get rid of it, so replace all abs_to_virt() uses with __va(). __va() returns void *, so when assigning to a pointer there's no need to cast. Similarly virt_to_abs() just calls __pa(). Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:18:45 +10:00
Michael Ellerman	aa97c32c1e	IB/ehca: Don't use phys_to_abs(), it's a nop Since commit `f5339277` phys_to_abs() is 100% a nop, so just remove it. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2012-09-05 15:18:43 +10:00
Jiang Liu	0921caf326	IB/qib: Use PCI Express Capability accessors Use PCI Express Capability access functions to simplify qib driver. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>	2012-08-23 10:11:15 -06:00
Jiang Liu	3c55569b08	IB/mthca: Use PCI Express Capability accessors Use PCI Express Capability access functions to simplify mthca driver. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Roland Dreier <roland@purestorage.com>	2012-08-23 10:11:15 -06:00
Tejun Heo	136b5721d7	workqueue: deprecate __cancel_delayed_work() Now that cancel_delayed_work() can be safely called from IRQ handlers, there's no reason to use __cancel_delayed_work(). Use cancel_delayed_work() instead of __cancel_delayed_work() and mark the latter deprecated. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Roland Dreier <roland@kernel.org> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>	2012-08-21 13:18:24 -07:00
Tejun Heo	e7c2f96744	workqueue: use mod_delayed_work() instead of __cancel + queue Now that mod_delayed_work() is safe to call from IRQ handlers, __cancel_delayed_work() followed by queue_delayed_work() can be replaced with mod_delayed_work(). Most conversions are straight-forward except for the following. * net/core/link_watch.c: linkwatch_schedule_work() was doing a quite elaborate dancing around its delayed_work. Collapse it such that linkwatch_work is queued for immediate execution if LW_URGENT and existing timer is kept otherwise. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>	2012-08-21 13:18:24 -07:00
Paul E. McKenney	bff4a39479	infiniband: ehca: Fix compiler warnings Fix comp_task() to return void to match smp_hotplug_thread's thread_fn member, and adjust indirection on the ->cpu_comp_threads per-CPU member of the ehca_comp_pool structure. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>	2012-08-20 08:11:04 -07:00
Paul E. McKenney	c0e12e51b0	infiniband: ehca: Fix while->do-while conversion typo This commit just adds a needed semicolon. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>	2012-08-20 08:00:23 -07:00
Roland Dreier	c0369b296e	Merge branches 'cma', 'ipoib', 'misc', 'mlx4', 'ocrdma', 'qib' and 'srp' into for-next	2012-08-16 09:38:39 -07:00
Kleber Sacilotto de Souza	a0675a386a	IB/mlx4: Check iboe netdev pointer before dereferencing it Unlike other parts of the mlx4_ib code, the function build_mlx_header() doesn't check if the iboe netdev of the given port is valid before dereferencing it, which can cause a crash if the ethernet interface has already been taken down. Fix this by checking for a valid netdev pointer before using it to get the port MAC address. Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-16 09:38:19 -07:00
Bart Van Assche	220329916c	IB/srp: Fix a race condition Avoid a crash caused by the scmnd->scsi_done(scmnd) call in srp_process_rsp() being invoked with scsi_done == NULL. This can happen if a reply is received during or after a command abort. Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au> Reference: http://marc.info/?l=linux-rdma&m=134314367801595 Cc: <stable@vger.kernel.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-15 12:00:48 -07:00
Julia Lawall	51fa3ca37e	IB/qib: Fix error return code in qib_init_7322_variables() Convert a 0 error return code to a negative one, as returned elsewhere in the function. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e,e1,e2,e3,e4,x; @@ ( if (\(ret != 0\\|ret < 0\) \|\| ...) { ... return ...; } \| ret = 0 ) ... when != ret = e1 x = \(kmalloc\\|kzalloc\\|kcalloc\\|devm_kzalloc\\|ioremap\\|ioremap_nocache\\|devm_ioremap\\|devm_ioremap_nocache\)(...); ... when != x = e2 when != ret = e3 if (x == NULL \|\| ...) { ... when != ret = e4 * return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-15 11:58:21 -07:00
Masanari Iida	142ad5db2b	IB: Fix typos in infiniband drivers Correct spelling typos in comments in drivers/infiniband. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-15 11:56:19 -07:00
Shlomo Pongratz	6c723a68c6	IB/ipoib: Fix RCU pointer dereference of wrong object Commit `b63b70d877` ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_neigh_free() (which is called from a few errors flows in the driver), rcu_dereference() is invoked with the wrong pointer object, which results in a crash. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-14 15:21:44 -07:00
Shlomo Pongratz	fa16ebed31	IB/ipoib: Add missing locking when CM object is deleted Commit `b63b70d877` ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_cm_destroy_tx() a CM object is moved between lists without any supported locking. Under a stress test, this eventually leads to list corruption and a crash. Previously when this routine was called, callers were taking the device priv lock. Currently this function is called from the RCU callback associated with neighbour deletion. Fix the race by taking the same lock we used to before. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-14 15:21:44 -07:00
Tejun Heo	41f63c5359	workqueue: use mod_delayed_work() instead of cancel + queue Convert delayed_work users doing cancel_delayed_work() followed by queue_delayed_work() to mod_delayed_work(). Most conversions are straight-forward. Ones worth mentioning are, * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always use mod_delayed_work() and cancel loop in edac_mc_reset_delay_period() is dropped. * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether watchdog is active or not. @fan_watchdog_active and related code dropped. * drivers/power/charger-manager.c: Seemingly a lot of delayed_work_pending() abuse going on here. [delayed_]work_pending() are unsynchronized and racy when used like this. I converted one instance in fullbatt_handler(). Please conver the rest so that it invokes workqueue APIs for the intended target state rather than trying to game work item pending state transitions. e.g. if timer should be modified - call mod_delayed_work(), canceled - call cancel_delayed_work[_sync](). * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling() simplified. Note that round_jiffies() calls in this function are meaningless. round_jiffies() work on absolute jiffies not delta delay used by delayed_work. v2: Tomi pointed out that __cancel_delayed_work() users can't be safely converted to mod_delayed_work(). They could be calling it from irq context and if that happens while delayed_work_timer_fn() is running, it could deadlock. __cancel_delayed_work() users are dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Doug Thompson <dougthompson@xmission.com> Cc: David Airlie <airlied@linux.ie> Cc: Roland Dreier <roland@kernel.org> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Johannes Berg <johannes@sipsolutions.net>	2012-08-13 16:27:37 -07:00
Tatyana Nikolova	418edaaba9	RDMA/ucma.c: Fix for events with wrong context on iWARP It is possible for asynchronous RDMA_CM_EVENT_ESTABLISHED events to be generated with ctx->uid == 0, because ucma_set_event_context() copies ctx->uid to the event structure outside of ctx->file->mut. This leads to a crash in the userspace library, since it gets a bogus event. Fix this by taking the mutex a bit earlier in ucma_event_handler. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Sean Hefty <Sean.Hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-13 13:08:35 -07:00
Thomas Gleixner	81942621bd	infiniband: Ehca: Use hotplug thread infrastructure Get rid of the hotplug notifiers and use the generic hotplug thread infrastructure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20120716103948.775527032@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-08-13 17:01:08 +02:00
Roland Dreier	d549f55f2e	RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs If CONFIG_VLAN_8021Q is not set, then vlan_dev_real_dev() just goes BUG(), so we shouldn't call it unless we're actually dealing with a VLAN netdev. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-10 16:52:13 -07:00
Jack Morgenstein	df7fba6647	IB/mlx4: Fix possible deadlock on sm_lock spinlock The sm_lock spinlock is taken in the process context by mlx4_ib_modify_device, and in the interrupt context by update_sm_ah, so we need to take that spinlock with irqsave, and release it with irqrestore. Lockdeps reports this as follows: [ INFO: inconsistent lock state ] 3.5.0+ #20 Not tainted inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib] {HARDIRQ-ON-W} state was registered at: [<ffffffff810b84a0>] mark_irqflags+0x120/0x190 [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0 [<ffffffff810b9f51>] lock_acquire+0xb1/0x150 [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50 [<ffffffffa028d563>] mlx4_ib_modify_device+0x63/0x240 [mlx4_ib] [<ffffffffa026d1fc>] ib_modify_device+0x1c/0x20 [ib_core] [<ffffffffa026c353>] set_node_desc+0x83/0xc0 [ib_core] [<ffffffff8136a150>] dev_attr_store+0x20/0x30 [<ffffffff81201fd6>] sysfs_write_file+0xe6/0x170 [<ffffffff8118da38>] vfs_write+0xc8/0x190 [<ffffffff8118dc01>] sys_write+0x51/0x90 [<ffffffff8155b869>] system_call_fastpath+0x16/0x1b ... * DEADLOCK * 1 lock held by swapper/0/0: stack backtrace: Pid: 0, comm: swapper/0 Not tainted 3.5.0+ #20 Call Trace: <IRQ> [<ffffffff810b7bea>] print_usage_bug+0x18a/0x190 [<ffffffff810b7370>] ? print_irq_inversion_bug+0x210/0x210 [<ffffffff810b7fb2>] mark_lock_irq+0xf2/0x280 [<ffffffff810b8290>] mark_lock+0x150/0x240 [<ffffffff810b84ef>] mark_irqflags+0x16f/0x190 [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0 [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffff810b9f51>] lock_acquire+0xb1/0x150 [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50 [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffffa026b2fa>] ? ib_create_ah+0x1a/0x40 [ib_core] [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib] [<ffffffff810c27c3>] ? is_module_address+0x23/0x30 [<ffffffffa028b05b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib] [<ffffffffa028c177>] mlx4_ib_event+0x117/0x160 [mlx4_ib] [<ffffffff81552501>] ? _raw_spin_lock_irqsave+0x61/0x70 [<ffffffffa022718c>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core] [<ffffffffa0221b40>] mlx4_eq_int+0x500/0x950 [mlx4_core] Reported by: Or Gerlitz <ogerlitz@mellanox.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-08-10 13:02:24 -07:00
Roland Dreier	1da9b6b43e	Merge branches 'cma', 'ipoib', 'ocrdma' and 'qib' into for-next	2012-07-30 07:47:27 -07:00
Shlomo Pongratz	b63b70d877	IPoIB: Use a private hash table for path lookup in xmit path Dave Miller <davem@davemloft.net> provided a detailed description of why the way IPoIB is using neighbours for its own ipoib_neigh struct is buggy: Any time an ipoib_neigh is changed, a sequence like the following is made: spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that * path->ah will always hold one more reference, * so ipoib_put_ah() will never do more than * decrement the ref count. / if (neigh->ah) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, n, dev); This doesn't work, because you're leaving a stale pointer to the freed up ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails with all the locking done to protect _changes_ to ipoib_neigh(n), and with the code in ipoib_neigh_free() that NULLs out the pointer. The core issue is that read side calls to to_ipoib_neigh(n) are not being synchronized at all, they are performed without any locking. So whether we hold the lock or not when making changes to ipoib_neigh(n) you still can have threads see references to freed up ipoib_neigh objects. cpu 1 cpu 2 n = ipoib_neigh() ipoib_neigh() = NULL kfree(n) n->foo == OOPS [..] Perhaps the ipoib code can have a private path database it manages entirely itself, which holds all the necessary information and is looked up by some generic key which is available easily at transmit time and does not involve generic neighbour entries. See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b> for the full discussion. This patch aims to solve the race conditions found in the IPoIB driver. The patch removes the connection between the core networking neighbour structure and the ipoib_neigh structure. In addition to avoiding the race described above, it allows us to handle SKBs carrying IP packets that don't have any associated neighbour. We add an ipoib_neigh hash table with N buckets where the key is the destination hardware address. The ipoib_neigh is fetched from the hash table and instead of the stashed location in the neighbour structure. The hash table uses both RCU and reference counting to guarantee that no ipoib_neigh instance is ever deleted while in use. Fetching the ipoib_neigh structure instance from the hash also makes the special code in ipoib_start_xmit that handles remote and local bonding failover redundant. Aged ipoib_neigh instances are deleted by a garbage collection task that runs every M seconds and deletes every ipoib_neigh instance that was idle for at least 2*M seconds. The deletion is safe since the ipoib_neigh instances are protected using RCU and reference count mechanisms. The number of buckets (N) and frequency of running the GC thread (M), are taken from the exported arb_tbl. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-30 07:46:50 -07:00
Mike Marciniszyn	5d7fe4efbf	IB/qib: Fix size of cc_supported_table_entries Commit `36a8f01cd2` ("IB/qib: Add congestion control agent implementation") tries to store the value 1984 in a u8, which leads to truncation. Fix this by making the member big enough. This bug was detected by a smatch warning. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-29 20:26:10 -07:00
Roland Dreier	0764c76ecb	RDMA/ucma: Convert open-coded equivalent to memdup_user() Suggested by scripts/coccinelle/api/memdup_user.cocci. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-27 13:27:45 -07:00
Roland Dreier	9e8fa040cb	RDMA/ocrdma: Fix check of GSI CQs It looks like one check was accidentally duplicated, and the other 3 checks were left out. This was detected by scripts/coccinelle/tests/doubletest.cocci: drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:895:6-54: duplicated argument to && or \|\| Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-27 13:14:44 -07:00
Fengguang Wu	4e28904528	RDMA/cma: Use PTR_RET rather than if (IS_ERR(...)) + PTR_ERR Suggested by scripts/coccinelle/api/ptr_ret.cocci. Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-27 13:05:18 -07:00
Linus Torvalds	5dedb9f3bd	InfiniBand/RDMA changes for the 3.6 merge window: - Updates to the qib low-level driver - First chunk of changes for SR-IOV support for mlx4 IB - RDMA CM support for IPv6-only binding - Other misc cleanups and fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO I7h9iJ1OMKFZ8bzmHsWb =f09j -----END PGP SIGNATURE----- Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Updates to the qib low-level driver - First chunk of changes for SR-IOV support for mlx4 IB - RDMA CM support for IPv6-only binding - Other misc cleanups and fixes Fix up some add-add conflicts in include/linux/mlx4/device.h and drivers/net/ethernet/mellanox/mlx4/main.c * tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits) IB/qib: checkpatch fixes IB/qib: Add congestion control agent implementation IB/qib: Reduce sdma_lock contention IB/qib: Fix an incorrect log message IB/qib: Fix QP RCU sparse warnings mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them mlx4_core: Allow guests to have IB ports mlx4_core: Implement mechanism for reserved Q_Keys net/mlx4_core: Free ICM table in case of error IB/cm: Destroy idr as part of the module init error flow mlx4_core: Remove double function declarations IB/mlx4: Fill the masked_atomic_cap attribute in query device IB/mthca: Fill in sq_sig_type in query QP IB/mthca: Warning about event for non-existent QPs should show event type IB/qib: Fix sparse RCU warnings in qib_keys.c net/mlx4_core: Initialize IB port capabilities for all slaves mlx4: Use port management change event instead of smp_snoop IB/qib: RCU locking for MR validation IB/qib: Avoid returning EBUSY from MR deregister IB/qib: Fix UC MR refs for immediate operations ...	2012-07-24 13:56:26 -07:00
Linus Torvalds	3c4cfadef6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking changes from David S Miller: 1) Remove the ipv4 routing cache. Now lookups go directly into the FIB trie and use prebuilt routes cached there. No more garbage collection, no more rDOS attacks on the routing cache. Instead we now get predictable and consistent performance, no matter what the pattern of traffic we service. This has been almost 2 years in the making. Special thanks to Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who have helped along the way. I'm sure that with a change of this magnitude there will be some kind of fallout, but such things ought the be simple to fix at this point. Luckily I'm not European so I'll be around all of August to fix things :-) The major stages of this work here are each fronted by a forced merge commit whose commit message contains a top-level description of the motivations and implementation issues. 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on input. 3) TCP SYN/ACK performance tweaks from Eric Dumazet. 4) Add namespace support for netfilter L4 conntrack helpers, from Gao Feng. 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from Yuval Mintz. 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet. 7) Support for connection tracker helpers in userspace, from Pablo Neira Ayuso. 8) Allow userspace driven TX load balancing functions in TEAM driver, from Jiri Pirko. 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with embedded gotos. 10) TCP Small Queues, essentially minimize the amount of TCP data queued up in the packet scheduler layer. Whereas the existing BQL (Byte Queue Limits) limits the pkt_sched --> netdevice queuing levels, this controls the TCP --> pkt_sched queueing levels. From Eric Dumazet. 11) Reduce the number of get_page/put_page ops done on SKB fragments, from Alexander Duyck. 12) Implement protection against blind resets in TCP (RFC 5961), from Eric Dumazet. 13) Support the client side of TCP Fast Open, basically the ability to send data in the SYN exchange, from Yuchung Cheng. Basically, the sender queues up data with a sendmsg() call using MSG_FASTOPEN, then they do the connect() which emits the queued up fastopen data. 14) Avoid all the problems we get into in TCP when timers or PMTU events hit a locked socket. The TCP Small Queues changes added a tcp_release_cb() that allows us to queue work up to the release_sock() caller, and that's what we use here too. From Eric Dumazet. 15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits) genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP r8169: revert "add byte queue limit support". ipv4: Change rt->rt_iif encoding. net: Make skb->skb_iif always track skb->dev ipv4: Prepare for change of rt->rt_iif encoding. ipv4: Remove all RTCF_DIRECTSRC handliing. ipv4: Really ignore ICMP address requests/replies. decnet: Don't set RTCF_DIRECTSRC. net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse. ipv4: Remove redundant assignment rds: set correct msg_namelen openvswitch: potential NULL deref in sample() tcp: dont drop MTU reduction indications bnx2x: Add new 57840 device IDs tcp: avoid oops in tcp_metrics and reset tcpm_stamp niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value niu: Fix to check for dma mapping errors. net: Fix references to out-of-scope variables in put_cmsg_compat() net: ethernet: davinci_emac: add pm_runtime support net: ethernet: davinci_emac: Remove unnecessary #include ...	2012-07-24 10:01:50 -07:00
Roland Dreier	089117e1ad	Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus	2012-07-22 23:26:17 -07:00
Linus Torvalds	cb47c1831f	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull target updates from Nicholas Bellinger: "There have been lots of work in a number of areas this past round. The highlights include: - Break out target_core_cdb.c emulation into SPC/SBC ops (hch) - Add a parse_cdb method to target backend drivers (hch) - Move sync_cache + write_same + unmap into spc_ops (hch) - Use target_execute_cmd for WRITEs in iscsi_target + srpt (hch) - Offload WRITE I/O backend submission in tcm_qla2xxx + tcm_fc (hch + nab) - Refactor core_update_device_list_for_node() into enable/disable funcs (agrover) - Replace the TCM processing thread with a TMR work queue (hch) - Fix regression in transport_add_device_to_core_hba from TMR conversion (DanC) - Remove racy, now-redundant check of sess_tearing_down with qla2xxx (roland) - Add range checking, fix reading of data len + possible underflow in UNMAP (roland) - Allow for target_submit_cmd() returning errors + convert fabrics (roland + nab) - Drop bogus struct file usage for iSCSI/SCTP (viro)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (54 commits) iscsi-target: Drop bogus struct file usage for iSCSI/SCTP target: NULL dereference on error path target: Allow for target_submit_cmd() returning errors target: Check number of unmap descriptors against our limit target: Fix possible integer underflow in UNMAP emulation target: Fix reading of data length fields for UNMAP commands target: Add range checking to UNMAP emulation target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE target: Make unnecessarily global se_dev_align_max_sectors() static target: Remove se_session.sess_wait_list qla2xxx: Remove racy, now-redundant check of sess_tearing_down target: Check sess_tearing_down in target_get_sess_cmd() sbp-target: Consolidate duplicated error path code in sbp_handle_command() target: Un-export target_get_sess_cmd() qla2xxx: Get rid of redundant qla_tgt_sess.tearing_down target: Make core_disable_device_list_for_node use pre-refactoring lock ordering target: refactor core_update_device_list_for_node() target: Eliminate else using boolean logic target: Misc retval cleanups target: Remove hba param from core_dev_add_lun ...	2012-07-22 13:31:57 -07:00
Mike Marciniszyn	7fac33014f	IB/qib: checkpatch fixes Elminate some simple_strto* usage. checkpatch also noted pr_ conversations, which have been done as recommended. The pr_fmt() define is used to shorten line length. Other multi-line string warnings are also elmininated. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-19 11:20:04 -07:00
Mike Marciniszyn	36a8f01cd2	IB/qib: Add congestion control agent implementation Add a congestion control agent in the driver that handles gets and sets from the congestion control manager in the fabric for the Performance Scale Messaging (PSM) library. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-19 11:20:04 -07:00
Mike Marciniszyn	551ace124d	IB/qib: Reduce sdma_lock contention Profiling has shown that sdma_lock is proving a bottleneck for performance. The situations include: - RDMA reads when krcvqs > 1 - post sends from multiple threads For RDMA read the current global qib_wq mechanism runs on all CPUs and contends for the sdma_lock when multiple RMDA read requests are fielded on differenct CPUs. For post sends, the direct call to qib_do_send() from multiple threads causes the contention. Since the sdma mechanism is per port, this fix converts the existing workqueue to a per port single thread workqueue to reduce the lock contention in the RDMA read case, and for any other case where the QP is scheduled via the workqueue mechanism from more than 1 CPU. For the post send case, This patch modifies the post send code to test for a non empty sdma engine. If the sdma is not idle the (now single thread) workqueue will be used to trigger the send engine instead of the direct call to qib_do_send(). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-19 11:19:58 -07:00
Betty Dall	f3331f88a4	IB/qib: Fix an incorrect log message There is a cut-and-paste typo in the function qib_pci_slot_reset() where it prints that the "link_reset" function is called rather than the "slot_reset" function. This makes the message misleading. Signed-off-by: Betty Dall <betty.dall@hp.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-19 11:19:08 -07:00
Amir Vadai	d9236c3f10	{NET,IB}/mlx4: Add rmap support to mlx4_assign_eq Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap. If supplied, the assigned IRQ is tracked using rmap infrastructure. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-19 08:34:37 -07:00
Mike Marciniszyn	1fb9fed6d4	IB/qib: Fix QP RCU sparse warnings Commit `af061a644a` ("IB/qib: Use RCU for qpn lookup") introduced sparse warnings. This patch corrects those issues. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-17 10:18:37 -07:00
David S. Miller	6700c2709c	net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-17 03:29:28 -07:00
Christoph Hellwig	e672a47fd9	srpt: use target_execute_cmd for WRITEs in srpt_handle_rdma_comp srpt_handle_rdma_comp is called from kthread context and thus can execute target_execute_cmd directly. srpt_abort_cmd sets the CMD_T_LUN_STOP flag directly, and thus the abuse of transport_generic_handle_data can be replaced with an opencoded variant of that code path. I'm still not happy about a fabric driver poking into target core internals like this, but let's defer the bigger architecture changes for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2012-07-16 17:35:18 -07:00
Jack Morgenstein	6634961c14	mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them To allow easy paravirtualization of P_Key and GID table sizes, keep paravirtualized sizes in mlx4_dev->caps, but save the actual physical sizes from FW in struct: mlx4_dev->phys_cap. In addition, in SR-IOV mode, do the following: 1. Reduce reported P_Key table size by 1. This is done to reserve the highest P_Key index for internal use, for declaring an invalid P_Key in P_Key paravirtualization. We require a P_Key index which always contain an invalid P_Key value for this purpose (i.e., one which cannot be modified by the subnet manager). The way to do this is to reduce the P_Key table size reported to the subnet manager by 1, so that it will not attempt to access the P_Key at index #127. 2. Paravirtualize the GID table size to 1. Thus, each guest sees only a single GID (at its paravirtualized index 0). In addition, since we are paravirtualizing the GID table size to 1, we add paravirtualization of the master GID event here (i.e., we do not do ib_dispatch_event() for the GUID change event on the master, since its (only) GUID never changes). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 11:52:23 -07:00
Dotan Barak	87d4abda83	IB/cm: Destroy idr as part of the module init error flow Clean the idr as part of the error flow since it is a resource too. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 09:22:58 -07:00
Dotan Barak	47e956b2a6	IB/mlx4: Fill the masked_atomic_cap attribute in query device When the user queries for device capabilities, fill in the masked_atomic_cap attribute with the real support level of atomic capabilities instead of using a hard coded value. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 09:22:57 -07:00
Dotan Barak	16551d4501	IB/mthca: Fill in sq_sig_type in query QP The query QP code was didn't fill that attribute, do that. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 09:22:57 -07:00
Dotan Barak	9bbeb6663e	IB/mthca: Warning about event for non-existent QPs should show event type Events received for non-existent QPs should generate a warning that includes the event type that was received. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-11 09:22:57 -07:00
David S. Miller	04c9f416e3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/batman-adv/bridge_loop_avoidance.c net/batman-adv/bridge_loop_avoidance.h net/batman-adv/soft-interface.c net/mac80211/mlme.c With merge help from Antonio Quartulli (batman-adv) and Stephen Rothwell (drivers/net/usb/qmi_wwan.c). The net/mac80211/mlme.c conflict seemed easy enough, accounting for a conversion to some new tracing macros. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 23:56:33 -07:00
Eric Dumazet	b28ba72665	IPoIB: fix skb truesize underestimatiom Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in skb_try_coalesce() This warning tracks drivers that incorrectly set skb->truesize IPoIB indeed allocates a full page to store a fragment, but only accounts in skb->truesize the used part of the page (frame length) This patch fixes skb truesize underestimation, and also fixes a performance issue, because RX skbs have not enough tailroom to allow IP and TCP stacks to pull their header in skb linear part without an expensive call to pskb_expand_head() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Shlomo Pongartz <shlomop@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-10 23:33:12 -07:00
Mike Marciniszyn	7e23017704	IB/qib: Fix sparse RCU warnings in qib_keys.c Commit `8aac4cc3a9` ("IB/qib: RCU locking for MR validation") introduced new sparse warnings in qib_keys.c. Acked-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-10 10:01:56 -07:00
Jack Morgenstein	00f5ce99dc	mlx4: Use port management change event instead of smp_snoop The port management change event can replace smp_snoop. If the capability bit for this event is set in dev-caps, the event is used (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event mask in the MAP_EQ fw command). In this case, when the driver passes incoming SMP PORT_INFO SET mads to the FW, the FW generates port management change events to signal any changes to the driver. If the FW generates these events, smp_snoop shouldn't be invoked in ib_process_mad(), or duplicate events will occur (once from the FW-generated event, and once from smp_snoop). In the case where the FW does not generate port management change events smp_snoop needs to be invoked to create these events. The flow in smp_snoop has been modified to make use of the same procedures as in the fw-generated-event event case to generate the port management events (LID change, Client-rereg, Pkey change, and/or GID change). Port management change event handling required changing the mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument (last argument) had to be changed to unsigned long in order to accomodate passing the EQE pointer. We also needed to move the definition of struct mlx4_eqe from net/mlx4.h to file device.h -- to make it available to the IB driver, to handle port management change events. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-10 09:47:10 -07:00
Mike Marciniszyn	8aac4cc3a9	IB/qib: RCU locking for MR validation Profiling indicates that MR validation locking is expensive. The MR table is largely read-only and is a suitable candidate for RCU locking. The patch uses RCU locking during validation to eliminate one lock/unlock during that validation. Reviewed-by: Mike Heinz <michael.william.heinz@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:19 -07:00
Mike Marciniszyn	6a82649f21	IB/qib: Avoid returning EBUSY from MR deregister A timing issue can occur where qib_mr_dereg can return -EBUSY if the MR use count is not zero. This can occur if the MR is de-registered while RDMA read response packets are being progressed from the SDMA ring. The suspicion is that the peer sent an RDMA read request, which has already been copied across to the peer. The peer sees the completion of his request and then communicates to the responder that the MR is not needed any longer. The responder tries to de-register the MR, catching some responses remaining in the SDMA ring holding the MR use count. The code now uses a get/put paradigm to track MR use counts and coordinates with the MR de-registration process using a completion when the count has reached zero. A timeout on the delay is in place to catch other EBUSY issues. The reference count protocol is as follows: - The return to the user counts as 1 - A reference from the lk_table or the qib_ibdev counts as 1. - Transient I/O operations increase/decrease as necessary A lot of code duplication has been folded into the new routines init_qib_mregion() and deinit_qib_mregion(). Additionally, explicit initialization of fields to zero is now handled by kzalloc(). Also, duplicated code 'while.*num_sge' that decrements reference counts have been consolidated in qib_put_ss(). Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:19 -07:00
Mike Marciniszyn	354dff1bd8	IB/qib: Fix UC MR refs for immediate operations An MR reference leak exists when handling UC RDMA writes with immediate data because we manipulate the reference counts as if the operation had been a send. This patch moves the last_imm label so that the RDMA write operations with immediate data converge at the cq building code. The copy/mr deref code is now done correctly prior to the branch to last_imm. Reviewed-by: Edward Mascarenhas <edward.mascarenhas@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:19 -07:00
Jack Morgenstein	3045f09203	IB/core: Move CM_xxx_ATTR_ID macros from cm_msgs.h to ib_cm.h These macros will be reused by the mlx4 SRIOV-IB CM paravirtualization code, and there is no reason to have them declared both in the IB core in the mlx4 IB driver. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:06 -07:00
Erez Shitrit	aeab97ed15	IB/sa: Add GuidInfoRecord query support This query is needed for SRIOV alias GUID support. The query is implemented per the IB Spec definition in section 15.2.5.18 (GuidInfoRecord). Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:06 -07:00
Jack Morgenstein	b1d8eb5a21	IB/mlx4: Add debug prints Define pr_fmt and add some pr_debug prints. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:05:06 -07:00
Roland Dreier	d90f9b3591	IB: Use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) \|\| defined(CONFIG_IPV6_MODULE) Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:04:32 -07:00
Roland Dreier	f747c34af4	RDMA/cxgb4: Fix endianness of addition to mpa->private_data_size sparse correctly warns that if mpa->private_data_size is __be16, then doing += on it is wrong, even if we do += htons(<something>) -- on a little endian system, carries will go the wrong way. Fix this up by doing the addition in native byte order. Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:02:33 -07:00
Sean Hefty	68602120e4	RDMA/cma: Allow user to restrict listens to bound address family Provide an option for the user to specify that listens should only accept connections where the incoming address family matches that of the locally bound address. This is used to support the equivalent of IPV6_V6ONLY socket option, which allows an app to only accept connection requests directed to IPv6 addresses. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:02:24 -07:00
Sean Hefty	406b6a25f8	RDMA/cma: Listen on specific address family The rdma_cm maps IPv4 and IPv6 addresses to the same service ID. This prevents apps from listening only for IPv4 or IPv6 addresses. It also results in an app binding to an IPv4 address receiving connection requests for an IPv6 address. Change this to match socket behavior: restrict listens on IPv4 addresses to only IPv4 addresses, and if a listen is on an IPv6 address, allow it to receive either IPv4 or IPv6 addresses, based on its address family binding. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:02:24 -07:00
Sean Hefty	5b0ec991c0	RDMA/cma: Bind to a specific address family The RDMA CM uses a single port space for all associated (tcp, udp, etc.) port bindings, regardless of the address family that the user binds to. The result is that if a user binds to AF_INET, but does not specify an IP address, the bind will occur for AF_INET6. This causes an attempt to bind to the same port using AF_INET6 to fail, and connection requests to AF_INET6 will match with the AF_INET listener. Align the behavior with sockets and restrict the bind to AF_INET only. If a user binds to AF_INET6, we bind the port to AF_INET6 and AF_INET depending on the value of bindv6only. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-08 18:02:23 -07:00
Hadar Hen Zion	0ff1fb654b	{NET, IB}/mlx4: Add device managed flow steering firmware API The driver is modified to support three operation modes. If supported by firmware use the device managed flow steering API, that which we call device managed steering mode. Else, if the firmware supports the B0 steering mode use it, and finally, if none of the above, use the A0 steering mode. When the steering mode is device managed, the code is modified such that L2 based rules set by the mlx4_en driver for Ethernet unicast and multicast, and the IB stack multicast attach calls done through the mlx4_ib driver are all routed to use the device managed API. When attaching rule using device managed flow steering API, the firmware returns a 64 bit registration id, which is to be provided during detach. Currently the firmware is always programmed during HCA initialization to use standard L2 hashing. Future work should be done to allow configuring the flow-steering hash function with common, non proprietary means. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-07 16:23:05 -07:00
Roland Dreier	d1e09ebf42	RDMA/ocrdma: Fix assignment of max_srq_sge in device query We want to set attr->max_srq_sge to dev->attr.max_srq_sge, not to itself. This was detected by Coverity (CID 709210). Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-07-07 15:13:47 -07:00
David S. Miller	700db99d01	ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock. Otherwise local_bh_enable() complains. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 21:08:05 -07:00
David S. Miller	534cb283ef	cxgb3: Convert t3_l2t_get() over to dst_neigh_lookup(). This means passing in a suitable destination address. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 02:29:40 -07:00
David S. Miller	178709bbfe	ipoib: Convert over to dev_lookup_neigh_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-05 01:09:36 -07:00
Pablo Neira Ayuso	a31f2d17b3	netlink: add netlink_kernel_cfg parameter to netlink_kernel_create This patch adds the following structure: struct netlink_kernel_cfg { unsigned int groups; void (input)(struct sk_buff skb); struct mutex *cb_mutex; }; That can be passed to netlink_kernel_create to set optional configurations for netlink kernel sockets. I've populated this structure by looking for NULL and zero parameters at the existing code. The remaining parameters that always need to be set are still left in the original interface. That includes optional parameters for the netlink socket creation. This allows easy extensibility of this interface in the future. This patch also adapts all callers to use this new interface. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-29 16:46:02 -07:00
David S. Miller	b26d344c6b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/caif/caif_hsi.c drivers/net/usb/qmi_wwan.c The qmi_wwan merge was trivial. The caif_hsi.c, on the other hand, was not. It's a conflict between `1c385f1fdf` ("caif-hsi: Replace platform device with ops structure.") in the net-next tree and commit `39abbaef19` ("caif-hsi: Postpone init of HIS until open()") in the net tree. I did my best with that one and will ask Sjur to check it out. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-28 17:37:00 -07:00
David S. Miller	e05273341c	infiniband: netlink: Move away from NLMSG_NEW(). And use nlmsg_data() while we're here too. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-26 21:54:14 -07:00
Roland Dreier	2e51fd3c13	Merge branches 'cma' and 'ocrdma' into for-linus	2012-06-24 04:59:59 -07:00
Sean Hefty	4dd81e8956	RDMA/cma: QP type check on received REQs should be AND not OR Change \|\| check to the intended && when checking the QP type in a received connection request against the listening endpoint. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-19 20:04:04 -07:00
Dan Carpenter	7b33dc2b05	RDMA/ocrdma: Fix off by one in ocrdma_query_gid() The dev->sgid_tbl[] array is allocated in ocrdma_alloc_resources(). It has OCRDMA_MAX_SGID elements so the test here is off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-14 13:30:41 -07:00
Parav Pandit	a3698a9b91	RDMA/ocrdma: Fixed RQ error CQE polling Fix RQ/SRQ error CQE polling. Return error CQE to consumer for error case which was not returned previously. Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-11 09:38:36 -07:00
Mahesh Vardhamanaiah	634c5796a5	RDMA/ocrdma: Correct queue SGE calculation Fix max sge calculation for sq, rq, srq for all hardware types. Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-11 09:38:35 -07:00
Mahesh Vardhamanaiah	07bb54244e	RDMA/ocrdma: Correct reported max queue sizes Fix code to read the max wqe and max rqe values from mailbox response. Signed-off-by: Mahesh Vardhamanaiah <mahesh.vardhamanaiah@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-11 09:38:34 -07:00
Parav Pandit	6ab6827ee9	RDMA/ocrdma: Fixed GID table for vlan and events 1. Fix reporting GID table addition events. 2. Enable vlan based GID entries only when VLAN is enabled at compile time (test CONFIG_VLAN_8021Q / CONFIG_VLAN_8021Q_MODULE). Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-11 09:38:32 -07:00
Roland Dreier	20952cdd8e	Merge branches 'cxgb4', 'mlx4' and 'ocrdma' into for-linus	2012-06-06 10:08:11 -07:00
Sagi Grimberg	fc2d004419	IB/mlx4: Fix max_wqe capacity reported from query device 1. Limit the max number of WQEs per QP reported when querying the device, so that ib_create_qp() will not fail for a QP size that the device claimed to support due to additional headroom WQEs being allocated. 2. Limit qp resources accepted for ib_create_qp() to the limits reported in ib_query_device(). In kernel space, make sure that the limits returned to the caller following qp creation also lie within the reported device limits. For userspace, report as before, and do adjustment in libmlx4 (so as not to break ABI). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sagi Grimberg <sagig@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-06 10:08:03 -07:00
Shlomo Pongratz	3aac6ff16a	IB/mlx4: Fix EQ deallocation in legacy mode Commit `e605b743f3` ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs") didn't handle correctly the case where there aren't enough MSI-X vectors to increase the number of EQs, so only the legacy EQs are allocated. This results in an attempt to memset() to zero the EQ table which was never allocated and a kernel crash. Fix this by checking in the teardown flow if the table of EQs was ever allocated. Also remove some unneeded setting to zero of the EQ related fields in struct mlx4_ib_dev. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-03 23:02:16 -07:00
Thadeu Lima de Souza Cascardo	71b43fd573	RDMA/cxgb4: Fix crash when peer address is 0.0.0.0 When using rping -c -a 0.0.0.0 with iw_cxgb4, the system crashes when rdma_connect() is called. ip_dev_find() will return NULL, but pdev is accessed anyway. Checking that pdev is NULL and returning -ENODEV prevents the system from crashing. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-06-03 22:59:15 -07:00
Devendra Naga	7ad5e449b9	RDMA/ocrdma: Remove unnecessary version.h includes "make versioncheck" shows: drivers/infiniband/hw/ocrdma/ocrdma_main.c: 29 linux/version.h not needed. drivers/infiniband/hw/ocrdma/ocrdma_verbs.h: 31 linux/version.h not needed. Signed-off-by: Devendra Naga <devendra.aaru@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-29 12:53:11 -07:00
Parav Pandit	804eaf29ba	RDMA/ocrdma: Fix signaled event for SRQ_LIMIT_REACHED Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-29 12:52:00 -07:00
Parav Pandit	cd4fedf9cf	RDMA/ocrdma: Correct queue free count math Correct queue free count math for SQ, RQ for all hardware type. Update user-kernel ABI interface. Signed-off-by: Parav Pandit <parav.pandit@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-29 12:49:36 -07:00
Linus Torvalds	c23ddf7857	InfiniBand/RDMA changes for the 3.5 merge window: - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters - Add generic and mlx4 support for "raw" QPs: allow suitably privileged applications to send and receive arbitrary packets directly to/from the hardware - Add "doorbell drop" handling to the cxgb4 driver - A fairly large batch of qib hardware driver changes - A few fixes for lockdep-detected issues - A few other miscellaneous fixes and cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+ zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X syKfN0VjiDRtJ+pxayi3 =yWfr -----END PGP SIGNATURE----- Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters - Add generic and mlx4 support for "raw" QPs: allow suitably privileged applications to send and receive arbitrary packets directly to/from the hardware - Add "doorbell drop" handling to the cxgb4 driver - A fairly large batch of qib hardware driver changes - A few fixes for lockdep-detected issues - A few other miscellaneous fixes and cleanups Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h. * tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits) RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree IB/mlx4: Fix mlx4_ib_add() error flow IB/core: Fix IB_SA_COMP_MASK macro IB/iser: Fix error flow in iser ep connection establishment IB/mlx4: Increase the number of vectors (EQs) available for ULPs RDMA/cxgb4: Add query_qp support RDMA/cxgb4: Remove kfifo usage RDMA/cxgb4: Use vmalloc() for debugfs QP dump RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch() RDMA/cxgb4: Add DB Overflow Avoidance RDMA/cxgb4: Add debugfs RDMA memory stats cxgb4: DB Drop Recovery for RDMA and LLD queues cxgb4: Common platform specific changes for DB Drop Recovery cxgb4: Detect DB FULL events and notify RDMA ULD RDMA/cxgb4: Drop peer_abort when no endpoint found RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr() mlx4_core: Change bitmap allocator to work in round-robin fashion RDMA/nes: Don't call event handler if pointer is NULL RDMA/nes: Fix for the ORD value of the connecting peer ...	2012-05-21 17:54:55 -07:00
Linus Torvalds	c9bfa7d75b	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull scsi-target changes from Nicholas Bellinger: "There has been lots of work in existing code in a number of areas this past cycle. The major highlights have been: * Removal of transport_do_task_sg_chain() from core + fabrics (Roland) * target-core: Removal of se_task abstraction from target-core and enforce hw_max_sectors for pSCSI backends (hch) * Re-factoring of iscsi-target tx immediate/response queues (agrover) * Conversion of iscsi-target back to using target core memory allocation logic (agrover) We've had one last minute iscsi-target patch go into for-next to address a nasty regression bug related to the target core allocation logic conversion from agrover that is not included in friday's linux-next build, but has been included in this series. On the new fabric module code front for-3.5, here is a brief status update for the three currently in flight this round: * usb-gadget target driver: Sebastian Siewior's driver for supporting usb-gadget target mode operation. This will be going out as a separate PULL request from target-pending/usb-target-merge with subsystem maintainer ACKs. There is one minor target-core patch in this series required to function. * sbp ieee-1394/firewire target driver: Chris Boot's driver for supportting the Serial Block Protocol (SBP) across IEEE-1394 Firewire hardware. This will be going out as a separate PULL request from target-pending/sbp-target-merge with two additional drivers/firewire/ patches w/ subsystem maintainer ACKs. * qla2xxx LLD target mode infrastructure changes + tcm_qla2xxx: The Qlogic >= 24xx series HW target mode LLD infrastructure patch-set and tcm_qla2xxx fabric driver. Support for FC target mode using qla2xxx LLD code has been officially submitted by Qlogic to James below, and is currently outstanding but not yet merged into scsi.git/for-next.. [PATCH 00/22] qla2xxx: Updates for scsi "misc" branch http://www.spinics.net/lists/linux-scsi/msg59350.html Note there are zero direct dependencies upon this for-next series for the qla2xxx LLD target + tcm_qla2xxx patches submitted above, and over the last days the target mode team has been tracking down an tcm_qla2xxx specific active I/O shutdown bug that appears to now be almost squashed for 3.5-rc-fixes." * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (47 commits) iscsi-target: Fix iov_count calculation bug in iscsit_allocate_iovecs iscsi-target: remove dead code in iscsi_check_valuelist_for_support target: Handle ATA_16 passthrough for pSCSI backend devices target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute target: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits target: add support for the WRITE_VERIFY command target: make target_put_session void target: cleanup transport_execute_tasks() target: Remove max_sectors device attribute for modern se_task less code target: lock => unlock typo in transport_lun_wait_for_tasks target: Enforce hw_max_sectors for SCF_SCSI_DATA_SG_IO_CDB target: remove the t_se_count field in struct se_cmd target: remove the t_task_cdbs_ex_left field in struct se_cmd target: remove the t_task_cdbs_left field in struct se_cmd target: remove struct se_task target: move the state and execute lists to the command target: simplify command to task linkage target: always allocate a single task target: replace ->execute_task with ->execute_cmd target: remove the task_sectors field in struct se_task ...	2012-05-21 17:37:09 -07:00
Roland Dreier	cc169165c8	Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', 'ocrdma', 'qib' and 'raw-qp' into for-linus	2012-05-21 09:00:47 -07:00
Vipul Pandya	e572568fbc	RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-21 09:00:34 -07:00
Jack Morgenstein	035b1032b5	IB/mlx4: Fix mlx4_ib_add() error flow We need to use a different loop index for mlx4_counter_alloc() and for device_create_file() iterations: the mlx4_counter_alloc() loop index is used in the error flow to free counters. If the same loop index is used for device_create_file() and, say, the device_create_file() loop fails on the first iteration, the allocated counters will not be freed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 17:45:01 -07:00
Or Gerlitz	7d9c0de4ab	IB/iser: Fix error flow in iser ep connection establishment The current error flow code was releasing the IB connection object and calling iscsi_destroy_endpoint() directly without going through the reference counting mechanism introduced in commit `39ff05d` ("IB/iser: Enhance disconnection logic for multi-pathing"). This resulted in a double free of the iscsi endpoint object, which causes a kernel NULL pointer dereference. Fix that by plugging into the IB conn reference counting correctly. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 17:05:31 -07:00
Shlomo Pongratz	e605b743f3	IB/mlx4: Increase the number of vectors (EQs) available for ULPs Enable IB ULPs to use a larger portion of the device EQs (which map to IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs to be divided among the device ports. In this scheme, for each IB port, the number of allocated EQs follows the number of cores, subject to other system constraints, such as number available MSI-X vectors. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 17:04:00 -07:00
Vipul Pandya	67bbc05512	RDMA/cxgb4: Add query_qp support This allows querying the QP state before flushing. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:37 -07:00
Vipul Pandya	ec3eead217	RDMA/cxgb4: Remove kfifo usage Using kfifos for ID management was limiting the number of QPs and preventing NP384 MPI jobs. So replace it with a simple bitmap allocator. Remove IDs from the IDR tables before deallocating them. This bug was causing the BUG_ON() in insert_handle() to fire because the ID was getting reused before being removed from the IDR table. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:36 -07:00
Vipul Pandya	d716a2a014	RDMA/cxgb4: Use vmalloc() for debugfs QP dump This allows dumping thousands of QPs. Log active open failures of interest. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:35 -07:00
Vipul Pandya	422eea0a8c	RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues Add module option db_fc_threshold which is the count of active QPs that trigger automatic db flow control mode. Automatically transition to/from flow control mode when the active qp count crosses db_fc_theshold. Add more db debugfs stats On DB DROP event from the LLD, recover all the iwarp queues. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:33 -07:00
Vipul Pandya	4984037bef	RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch() Use GFP_ATOMIC in _insert_handle() if ints are disabled. Don't panic if we get an abort with no endpoint found. Just log a warning. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:32 -07:00
Vipul Pandya	2c97478106	RDMA/cxgb4: Add DB Overflow Avoidance Get FULL/EMPTY/DROP events from LLD. On FULL event, disable normal user mode DB rings. Add modify_qp semantics to allow user processes to call into the kernel to ring doobells without overflowing. Add DB Full/Empty/Drop stats. Mark queues when created indicating the doorbell state. If we're in the middle of db overflow avoidance, then newly created queues should start out in this mode. Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can know if the driver supports the kernel mode db ringing. Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:31 -07:00
Vipul Pandya	8d81ef34b2	RDMA/cxgb4: Add debugfs RDMA memory stats Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-18 13:22:29 -07:00
Steve Wise	14b9222808	RDMA/cxgb4: Drop peer_abort when no endpoint found Log a warning and drop the abort message. Otherwise we will do a bogus wake_up() and crash. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-15 09:46:09 -07:00
Steve Wise	0f1dcfae6b	RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr() This fixes a race where an ingress abort fails to wake up the thread blocked in rdma_init() causing the app to hang. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-15 09:46:04 -07:00
Tatyana Nikolova	784d135f96	RDMA/nes: Don't call event handler if pointer is NULL Don't call the ibqp event_handler pointer in the case it wasn't initialized. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Donald Wood <Donald.E.Wood@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:48:07 -07:00
Tatyana Nikolova	d3e5132814	RDMA/nes: Fix for the ORD value of the connecting peer Set ORD value of the connecting peer to be at least one in order to accommodate an RDMA READ Request message. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Signed-off-by: Donald Wood <Donald.E.Wood@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:47:18 -07:00
Mike Marciniszyn	1c94283ddb	IB/qib: Add cache line awareness to qib_qp and qib_devdata structures This patch reorganizes the QP and devdata files to be more cache line aware. qib_qp fields in particular are split into read-mostly, send, and receive fields. qib_devdata fields are split into read-mostly and read/write fields Testing has show that bidirectional tests improve by as much as 100% with this patch. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:43:34 -07:00
Jim Foraker	3236b2d469	IB/qib: MADs with misset M_Keys should return failure If a MAD is sent directly to the local HCA rather than placed on a QP and the MAD fails M_Key checks, there is no means to generate a timeout for the client, which may hang. Instead we report IB_MAD_RESULT_FAILURE, which operates the same for on-the-wire packets, but will generate a send failure back to the client. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:42:44 -07:00
Jim Foraker	6199c8961e	IB/qib: Fix M_Key lease timeout handling If a port has an M_Key lease set, the M_Key protect bits set to 1, and a SubnSet arrives with an invalid M_Key, an M_Key mismatch trap is generated, the lease timer begins as expected, and eventually the M_Key protect bits will be set back to 0 as per the spec. However, if any other SMP with an invalid M_Key arrives, the lease timer is expired and the M_Key protect bits remain in force. This is not according to to spec. In particular, C14-17 says that a lease timer that is underway is not affected by protection level checks (ie, at protection level 1, a SubnGet with a bad M_Key may be successful, but does not stop the timer), and C14-19 says that the timer shall stop when a valid M_Key has been received. C14-19 is the only compliance statement that specifies a stopping condition for the timer. This behavior is magnified if the port's Master SM LID attribute points at itself. In that case, the M_Key mismatch trap is sufficient to expire the timer, and the mkey lease attribute is rendered useless. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Jim Foraker <foraker1@llnl.gov> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:42:42 -07:00
Mitko Haralanov	f665acb3cb	IB/qib: Fix QLE734X link cycling The SERDES was using the incorrect Frequency Loop Bandwidth setting causing the link to cycle through the Physical link negotiation state machine. Fixing the Frequency Loop Bandwidth setting in the SERDES helps the link come up faster and more reliably. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:39:26 -07:00
Mitko Haralanov	6ceaadee34	IB/qib: Display correct value for number of contexts A "fix" for a bug with the number of contexts on a single-port board caused the calculation to be off by one, which causes problems with the upper layers. The same problem exists for number of free contexts, which is also fixed here. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:39:04 -07:00
Todd Rimmer	4ccf28a26c	IB/qib: Correct ordering of reregister vs. port active events When a port first goes active with SMA Set(PortInfo) and reregister bit set, the driver sends up the reregister event followed by a port active event. The problem is that in response to reregister event most apps try to issue a SA query of some sort, but that fails because port is not active. The qib driver needs to a trivial change to correct this behavior. This issue has been there for a while; however the recent serdes work has probably made the delay between the reregister event and the active event larger and hence opened the race far enough so that its being seen more often. The patch also changes the clientrereg local to a u8 and saves off the rereg bit into it. The code following the nested subn_get_portinfo() now restores that bit per o14-12.2.1 with a logical OR from that copy. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:38:28 -07:00
Mike Marciniszyn	bb77a07723	IB/qib: Optimize pio ack buffer allocation This patch optimizes pio buffer allocation in the kernel. For qib, kernel pio buffers are used for sending acks. The code to allocate the buffer would always start at 0 until it found a buffer. This means that an average of 64 comparisions were done on each allocate, since the busy bit won't be cleared until the bits are refreshed when buffers are exhausted. This patch adds two new fields in the devdata struct, last_pio and min_kernel_pio. last_pio is the last buffer that was allocated. min_kernel_pio is the lowest potential available buffer. min_kernel_pio is modifed as contexts are allocated and deallocted. Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:37:03 -07:00
Mike Marciniszyn	cca195a168	IB/qib: Add prefetch for eager buffers Add a prefetch call when a packet has been stored. The nature of the prefetch is correctly determined by the alternatives mechanism. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-14 12:36:18 -07:00
Yishai Hadas	c4870eb874	IB/core: Fix mismatch between locked and pinned pages Commit `bc3e53f682` ("mm: distinguish between mlocked and pinned pages") introduced a separate counter for pinned pages and used it in the IB stack. However, in ib_umem_get() the pinned counter is incremented, but ib_umem_release() wrongly decrements the locked counter. Fix this. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-11 11:38:22 -07:00
Shlomo Pongratz	987c8f8fc9	IB/mlx4: Replace printk(KERN_yyy...) with pr_yyy(...) Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> [ Replace one more printk_once() with pr_info_once(). - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:59:36 -07:00
Oren Duer	c0c1d3d761	IB/mlx4: Put priority bits in WQE of IBoE MLX QP Otherwise CM packets going over MLX QP1 get fixed scheduling priority 0. We want CM packets to get the same scheduling priority, and therefore map to the same SQ (Schedule Queue) and eventually TC (Traffic Class), as the application requested for the actual QP used for the connection. Signed-off-by: Oren Duer <oren@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:48:09 -07:00
Or Gerlitz	3987a2d319	IB/mlx4: Add raw packet QP support Implement raw packet QPs for Ethernet ports using the MLX transport (as done by the mlx4_en Ethernet netdevice driver). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2012-05-08 11:18:09 -07:00

... 4 5 6 7 8 ...

3535 Commits