linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-27 22:51:35 +00:00

Author	SHA1	Message	Date
Haggai Eran	b4cfe447d4	IB/mlx5: Implement on demand paging by adding support for MMU notifiers * Implement the relevant invalidation functions (zap MTTs as needed) * Implement interlocking (and rollback in the page fault handlers) for cases of a racing notifier and fault. * With this patch we can now enable the capability bits for supporting RC send/receive/RDMA read/RDMA write, and UD send. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:04 -08:00
Haggai Eran	eab668a6d0	IB/mlx5: Add support for RDMA read/write responder page faults Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:03 -08:00
Haggai Eran	7bdf65d411	IB/mlx5: Handle page faults This patch implement a page fault handler (leaving the pages pinned as of time being). The page fault handler handles initiator and responder page faults for UD/RC transports, for send/receive operations, as well as RDMA read/write initiator support. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:03 -08:00
Haggai Eran	6aec21f6a8	IB/mlx5: Page faults handling infrastructure * Refactor MR registration and cleanup, and fix reg_pages accounting. * Create a work queue to handle page fault events in a kthread context. * Register a fault handler to get events from the core for each QP. The registered fault handler is empty in this patch, and only a later patch implements it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:03 -08:00
Haggai Eran	832a6b06ab	IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation The new function allows updating the page tables of a memory region after it was created. This can be used to handle page faults and page invalidations. Since mlx5_ib_update_mtt will need to work from within page invalidation, so it must not block on memory allocation. It employs an atomic memory allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails. In order to reuse code from mlx5_ib_populate_pas, the patch splits this function and add the needed parameters. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:02 -08:00
Haggai Eran	cc149f751b	IB/mlx5: Changes in memory region creation to support on-demand paging This patch wraps together several changes needed for on-demand paging support in the mlx5_ib_populate_pas function, and when registering memory regions. * Instead of accepting a UMR bit telling the function to enable all access flags, the function now accepts the access flags themselves. * For on-demand paging memory regions, fill the memory tables from the correct list, and enable/disable the access flags per-page according to whether the page is present. * A new bit is set to enable writing of access flags when using the firmware create_mkey command. * Disable contig pages when on-demand paging is enabled. In addition the patch changes the UMR code to use PTR_ALIGN instead of our own macro. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:02 -08:00
Haggai Eran	8cdd312cfe	IB/mlx5: Implement the ODP capability query verb The patch adds infrastructure to query ODP capabilities in the mlx5 driver. The code will read the capabilities from the device, and enable only those capabilities that both the driver and the device supports. At this point ODP is not supported, so no capability is copied from the device, but the patch exposes the global ODP device capability bit. Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:19:02 -08:00
Haggai Eran	882214e2b1	IB/core: Implement support for MMU notifiers regarding on demand paging regions * Add an interval tree implementation for ODP umems. Create an interval tree for each ucontext (including a count of the number of ODP MRs in this context, semaphore, etc.), and register ODP umems in the interval tree. * Add MMU notifiers handling functions, using the interval tree to notify only the relevant umems and underlying MRs. * Register to receive MMU notifier events from the MM subsystem upon ODP MR registration (and unregister accordingly). * Add a completion object to synchronize the destruction of ODP umems. * Add mechanism to abort page faults when there's a concurrent invalidation. The way we synchronize between concurrent invalidations and page faults is by keeping a counter of currently running invalidations, and a sequence number that is incremented whenever an invalidation is caught. The page fault code checks the counter and also verifies that the sequence number hasn't progressed before it updates the umem's page tables. This is similar to what the kvm module does. In order to prevent the case where we register a umem in the middle of an ongoing notifier, we also keep a per ucontext counter of the total number of active mmu notifiers. We only enable new umems when all the running notifiers complete. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yuval Dagan <yuvalda@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:36 -08:00
Shachar Raindel	8ada2c1c0c	IB/core: Add support for on demand paging regions * Extend the umem struct to keep the ODP related data. * Allocate and initialize the ODP related information in the umem (page_list, dma_list) and freeing as needed in the end of the run. * Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed. * Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them). * Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:36 -08:00
Sagi Grimberg	860f10a799	IB/core: Add flags for on demand paging support * Add a configuration option for enable on-demand paging support in the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a later patch, this configuration option will select the MMU_NOTIFIER configuration option to enable mmu notifiers. * Add a flag for on demand paging (ODP) support in the IB device capabilities. * Add a flag to request ODP MR in the access flags to reg_mr. * Fail registrations done with the ODP flag when the low-level driver doesn't support this. * Change the conditions in which an MR will be writable to explicitly specify the access flags. This is to avoid making an MR writable just because it is an ODP MR. * Add a ODP capabilities to the extended query device verb. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Eli Cohen	5a77abf9a9	IB/core: Add support for extended query device caps Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	c1395a2a8c	IB/mlx5: Add function to read WQE from user-space Add a helper function mlx5_ib_read_user_wqe to read information from user-space owned work queues. The function will be used in a later patch by the page-fault handling code in mlx5_ib. Signed-off-by: Haggai Eran <haggaie@mellanox.com> [ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	c5d76f130b	IB/core: Add umem function to read data from user-space In some drivers there's a need to read data from a user space area that was pinned using ib_umem when running from a different process context. The ib_umem_copy_from function allows reading data from the physical pages pinned in the ib_umem struct. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	406f9e5fa9	IB/core: Replace ib_umem's offset field with a full address In order to allow umems that do not pin memory, we need the umem to keep track of its region's address. This makes the offset field redundant, and so this patch removes it. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	968e78dd96	IB/mlx5: Enhance UMR support to allow partial page table update The current UMR interface doesn't allow partial updates to a memory region's page tables. This patch changes the interface to allow that. It also changes the way the UMR operation validates the memory region's state. When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR operation to fail if the MKEY is in the free state. When it is unchecked the operation will check that it isn't in the free state. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Haggai Eran	21af2c3ebf	IB/mlx5: Remove per-MR pas and dma pointers Since UMR code now uses its own context struct on the stack, the pas and dma pointers for the UMR operation that remained in the mlx5_ib_mr struct are not necessary. This patch removes them. Fixes: `a74d24168d` ("IB/mlx5: Refactor UMR to have its own context struct") Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-12-15 18:13:35 -08:00
Linus Torvalds	70e71ca0af	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...	2014-12-11 14:27:06 -08:00
Matan Barak	d57febe1a4	net/mlx4: Add A0 hybrid steering A0 hybrid steering is a form of high performance flow steering. By using this mode, mlx4 cards use a fast limited table based steering, in order to enable fast steering of unicast packets to a QP. In order to implement A0 hybrid steering we allocate resources from different zones: (1) General range (2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region. When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP, we try hard to allocate the QP from range (2). Otherwise, we try hard not to allocate from this range. However, when the system is pushed to its limits and one needs every resource, the allocator uses every region it can. Meaning, when we run out of raw-eth qps, the allocator allocates from the general range (and the special-A0 area is no longer active). If we run out of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that is also exhausted, the allocator will allocate from the general range (and the A0 region is no longer active). Note that if a raw-eth qp is allocated from the general range, it attempts to allocate the range such that bits 6 and 7 (blueflame bits) in the QP number are not set. When the feature is used in SRIOV, the VF has to notify the PF what kind of QP attributes it needs. In order to do that, along with the "Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According to the combination of these bits, the PF tries to allocate a suitable QP. In order to maintain backward compatibility (with older PFs), the PF notifies which QP attributes it supports via QUERY_FUNC_CAP command. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-11 14:47:35 -05:00
Eugenia Emantayev	ddae0349fd	net/mlx4: Change QP allocation scheme When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset. The current Ethernet driver code reserves a Tx QP range with 256b alignment. This is wrong because if there are more than 64 Tx QPs in use, QPNs >= base + 65 will have bits 6/7 set. This problem is not specific for the Ethernet driver, any entity that tries to reserve more than 64 BF-enabled QPs should fail. Also, using ranges is not necessary here and is wasteful. The new mechanism introduced here will support reservation for "Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs (when hypervisors support WC in VMs). The flow we use is: 1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation, and request "BF enabled QPs" if BF is supported for the function 2. In the ALLOC_RES FW command, change param1 to: a. param1[23:0] - number of QPs b. param1[31-24] - flags controlling QPs reservation Bit 31 refers to Eth blueflame supported QPs. Those QPs must have bits 6 and 7 unset in order to be used in Ethernet. Bits 24-30 of the flags are currently reserved. When a function tries to allocate a QP, it states the required attributes for this QP. Those attributes are considered "best-effort". If an attribute, such as Ethernet BF enabled QP, is a must-have attribute, the function has to check that attribute is supported before trying to do the allocation. In a lower layer of the code, mlx4_qp_reserve_range masks out the bits which are unsupported. If SRIOV is used, the PF validates those attributes and masks out unsupported attributes as well. In order to notify VFs which attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's mailbox is filled by the PF, which notifies which QP allocation attributes it supports. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-11 14:47:35 -05:00
Matan Barak	3dca0f42c7	net/mlx4_core: Use tasklet for user-space CQ completion events Previously, we've fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx4_en's and IPoIB napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in ISR is generally considered wrong, it could even lead to a hard lockup of the system. Since when a lot of completion events are generated by the hardware, the loop over those events could be so long, that we'll get into a hard lockup by the system watchdog. In order to avoid that, add a new way of invoking completion events callbacks. In the interrupt itself, we add the CQs which receive completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-11 14:47:34 -05:00
Eli Cohen	d14e71103b	mlx5: Fix error flow in add_keys If mlx5_core_create_mkey fails, decrease the pending counter to undo the previous increment. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:45:56 -05:00
Eli Cohen	6a4f139aae	mlx5: Fix sparse warnings 1. Add required __acquire/__release statements to balance spinlock usage. 2. Change the index parameter of begin_wqe() to be unsigned to match supplied argument type. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-08 20:45:56 -05:00
James Bottomley	096cbc35ea	Merge remote-tracking branch 'scsi-queue/drivers-for-3.19' into for-linus Conflicts: drivers/scsi/scsi_debug.c Agreed and tested resolution to a merge problem between a fix in scsi_debug and a driver update Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-12-08 07:42:25 -08:00
James Bottomley	dc843ef00e	Merge remote-tracking branch 'scsi-queue/core-for-3.19' into for-linus	2014-12-08 07:40:20 -08:00
Christoph Hellwig	db5ed4dfd5	scsi: drop reason argument from ->change_queue_depth Drop the now unused reason argument from the ->change_queue_depth method. Also add a return value to scsi_adjust_queue_depth, and rename it to scsi_change_queue_depth now that it can be used as the default ->change_queue_depth implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-11-24 14:45:27 +01:00
Christoph Hellwig	1e6f241604	scsi: don't allow setting of queue_depth bigger than can_queue We won't ever queue more commands than the host allows. Instead of letting drivers either reject or ignore this case handle it in common code. Note that various driver use internal constant or variables that are assigned to both shost->can_queue and checked in ->change_queue_depth - I did remove those checks as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-11-24 14:45:26 +01:00
Christoph Hellwig	c40ecc12cf	scsi: avoid ->change_queue_depth indirection for queue full tracking All drivers use the implementation for ramping the queue up and down, so instead of overloading the change_queue_depth method call the implementation diretly if the driver opts into it by setting the track_queue_depth flag in the host template. Note that a few drivers validated the new queue depth in their change_queue_depth method, but as we never go over the queue depth set during slave_configure or the sysfs file this isn't nessecary and can safely be removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>	2014-11-24 14:45:12 +01:00
Hariprasad Shenai	b2e1a3f091	RDMA/cxgb4/cxgb4vf/csiostor: Cleanup macros/register defines related to PCIE, RSS and FW This patch cleanups all PCIE, RSS & FW related macros/register defines that are defined in t4fw_api.h and the affected files. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-22 16:57:47 -05:00
Hariprasad Shenai	5167865aaa	RDMA/cxgb4/csiostor: Cleansup FW related macros/register defines for PF/VF and LDST This patch cleanups PF/VF and LDST related macros/register defines that are defined in t4fw_api.h and the affected files. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-22 16:57:47 -05:00
Hariprasad Shenai	77a80e23cc	RDMA/cxgb4: Cleanup Filter related macros/register defines This patch cleanups all filter related macros/register defines that are defined in t4fw_api.h and the affected files. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-22 16:57:46 -05:00
David S. Miller	1459143386	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ieee802154/fakehard.c A bug fix went into 'net' for ieee802154/fakehard.c, which is removed in 'net-next'. Add build fix into the merge from Stephen Rothwell in openvswitch, the logging macros take a new initial 'log' argument, a new call was added in 'net' so when we merge that in here we have to explicitly add the new 'log' arg to it else the build fails. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 22:28:24 -05:00
Linus Torvalds	a46171d010	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "Here are the target-pending fixes queued for v3.18-rc6. The highlights include: - target-core OOPs fix with tcm_qla2xxx + vxworks FC initiators + zero length SCSI commands having a transfer direction set. (Roland + Craig Watson) - vhost-scsi OOPs fix to explicitly prevent WWPN endpoint configfs group removal while qemu still has an active reference. (Paolo + nab) - ib_srpt fix for RDMA hardware with lower srp_sq_size limits. (Bart) - two ib_isert work-arounds for running on ocrdma hardware (Or + Sagi + Chris) - iscsi-target discovery portal typo + SPC-3 PR Preempt SA key matching fix (Steve)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: IB/isert: Adjust CQ size to HW limits target: return CONFLICT only when SA key unmatched iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly ib_isert: Add max_send_sge=2 minimum for control PDU responses srp-target: Retry when QP creation fails with ENOMEM iscsi-target: return the correct port in SendTargets vhost-scsi: Take configfs group dependency during VHOST_SCSI_SET_ENDPOINT target: Don't call TFO->write_pending if data_length == 0	2014-11-21 16:28:45 -08:00
Al Viro	479163f460	mlx5: don't duplicate kvfree() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-21 14:58:18 -05:00
Chris Moore	b1a5ad006b	IB/isert: Adjust CQ size to HW limits isert has an issue of trying to create a CQ with more CQEs than are supported by the hardware, that currently results in failures during isert_device creation during first session login. This is the isert version of the patch that Minh Tran submitted for iser, and is simple a workaround required to function with existing ocrdma hardware. Signed-off-by: Chris Moore <chris.moore@emulex.com> Reviewied-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-11-19 22:28:42 -08:00
Matan Barak	7ae0e400cd	net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs Previously, the driver queried the firmware in order to get the number of supported EQs. Under SRIOV, since this was done before the driver notified the firmware how many VFs it actually needs, the firmware had to take into account a worst case scenario and always allocated four EQs per VF, where one was used for events while the others were used for completions. Now, when the firmware supports the asymmetric allocation scheme, denoted by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we can get more EQs and MSI-X vectors per function. Moreover, when running in the new firmware/driver mode, the limitation that the number of EQs should be a power of two is lifted. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 15:16:21 -05:00
Anish Bhatt	d7990b0c34	cxgb4i/cxgb4 : Refactor macros to conform to uniform standards Refactored all macros used in cxgb4i as part of previously started cxgb4 macro names cleanup. Makes them more uniform and avoids namespace collision. Minor changes in other drivers where required as some of these macros are used by multiple drivers, affected drivers are iw_cxgb4, cxgb4(vf) & csiostor Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-13 14:36:22 -05:00
Bart Van Assche	7dad6b2e44	IB/srp: Fix a race condition triggered by destroying a queue pair At least LID reassignment can trigger a race condition in the SRP initiator driver, namely the receive completion handler trying to post a request on a QP during or after QP destruction and before the CQ's have been destroyed. Avoid this race by modifying a QP into the error state and by waiting until all receive completions have been processed before destroying a QP. Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 12:05:24 +01:00
Bart Van Assche	d92c0da71a	IB/srp: Add multichannel support Improve performance by using multiple RDMA/RC channels per SCSI host for communication with an SRP target. About the implementation: - Introduce a loop over all channels in the code that uses target->ch. - Set the SRP_MULTICHAN_MULTI flag during login for the creation of the second and subsequent channels. - RDMA completion vectors are chosen such that RDMA completion interrupts are handled by the CPU socket that submitted the I/O request. As one can see in this patch it has been assumed if a system contains n CPU sockets and m RDMA completion vectors have been assigned to an RDMA HCA that IRQ affinity has been configured such that completion vectors [im/n..(i+1)m/n) are bound to CPU socket i with 0 <= i < n. - Modify srp_free_ch_ib() and srp_free_req_data() such that it becomes safe to invoke these functions after the corresponding allocation function failed. - Add a ch_count sysfs attribute per target port. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 12:05:24 +01:00
Bart Van Assche	77f2c1a40e	IB/srp: Use block layer tags Since the block layer already contains functionality to assign a tag to each request, use that functionality instead of reimplementing that functionality in the SRP initiator driver. This change makes the free_reqs list superfluous. Hence remove that list. [hch: updated to use .use_blk_tags instead scsi_activate_tcq] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 12:05:23 +01:00
Bart Van Assche	509c07bc18	IB/srp: Separate target and channel variables Changes in this patch: - Move channel variables into a new structure (struct srp_rdma_ch). - Add an srp_target_port pointer, 'lock' and 'comp_vector' members in struct srp_rdma_ch. - Add code to initialize these three new member variables. - Many boring "target->" into "ch->" changes. - The cm_id and completion handler context pointers are now of type srp_rdma_ch * instead of srp_target_port . - Three kzalloc(a b, f) calls have been changed into kcalloc(a, b, f) to avoid that this patch would trigger a checkpatch warning. - Two casts from u64 into unsigned long long have been left out because these are superfluous. Since considerable time u64 is defined as unsigned long long for all architectures supported by the Linux kernel. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:32:03 +01:00
Bart Van Assche	747fe000ef	IB/srp: Introduce two new srp_target_port member variables Introduce the srp_target_port member variables 'sgid' and 'pkey'. Change the type of 'orig_dgid' from __be16[8] into union ib_gid. This patch does not change any functionality but makes the "Separate target and channel variables" patch easier to verify. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:32:03 +01:00
Bart Van Assche	34aa654ecb	IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning If a cable is pulled during LUN scanning it can happen that the SRP rport and the SCSI host have been created but no LUNs have been added to the SCSI host. Since multipathd only sends SCSI commands to a SCSI target if one or more SCSI devices are present and since there is no keepalive mechanism for IB queue pairs this means that after a LUN scan failed and after a reconnect has succeeded no data will be sent over the QP and hence that a subsequent cable pull will not be detected. Avoid this by not creating an rport or SCSI host if a cable is pulled during a SCSI LUN scan. Note: so far the above behavior has only been observed with the kernel module parameter ch_count set to a value >= 2. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:32:02 +01:00
Bart Van Assche	205619f2f8	IB/srp: Remove stale connection retry mechanism Attempting to connect three times may be insufficient after an initiator system tries to relogin, especially if the relogin attempt occurs before the SRP target service ID has been registered. Since the srp_daemon retries a failed login attempt anyway, remove the stale connection retry mechanism. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:32:02 +01:00
Bart Van Assche	394c595ee8	IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib() The patch that adds multichannel support into the SRP initiator driver introduces an additional call to srp_free_ch_ib(). This patch helps to keep that later patch simple. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-11-12 11:32:01 +01:00
Christoph Hellwig	c8b09f6fb6	scsi: don't set tagging state from scsi_adjust_queue_depth Remove the tagged argument from scsi_adjust_queue_depth, and just let it handle the queue depth. For most drivers those two are fairly separate, given that most modern drivers don't care about the SCSI "tagged" status of a command at all, and many old drivers allow queuing of multiple untagged commands in the driver. Instead we start out with the ->simple_tags flag set before calling ->slave_configure, which is how all drivers actually looking at ->simple_tags except for one worke anyway. The one other case looks broken, but I've kept the behavior as-is for now. Except for that we only change ->simple_tags from the ->change_queue_type, and when rejecting a tag message in a single driver, so keeping this churn out of scsi_adjust_queue_depth is a clear win. Now that the usage of scsi_adjust_queue_depth is more obvious we can also remove all the trivial instances in ->slave_alloc or ->slave_configure that just set it to the cmd_per_lun default. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>	2014-11-12 11:19:43 +01:00
Christoph Hellwig	a62182f338	scsi: provide a generic change_queue_type method Most drivers use exactly the same implementation, so provide it as a library function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-11-12 11:19:39 +01:00
Hariprasad Shenai	e2ac962895	cxgb4: Cleanup macros so they follow the same style and look consistent, part 2 Various patches have ended up changing the style of the symbolic macros/register defines to different style. As a result, the current kernel.org files are a mix of different macro styles. Since this macro/register defines is used by different drivers a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess and we want to make them clean and consistent. This patch cleans up a part of it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-10 12:57:10 -05:00
Sagi Grimberg	3b726ae2de	iser-target: Handle DEVICE_REMOVAL event on network portal listener correctly In this case the cm_id->context is the isert_np, and the cm_id->qp is NULL, so use that to distinct the cases. Since we don't expect any other events on this cm_id we can just return -1 for explicit termination of the cm_id by the cma layer. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-11-02 22:04:48 -08:00
Or Gerlitz	f57915cfa5	ib_isert: Add max_send_sge=2 minimum for control PDU responses This patch adds a max_send_sge=2 minimum in isert_conn_setup_qp() to ensure outgoing control PDU responses with tx_desc->num_sge=2 are able to function correctly. This addresses a bug with RDMA hardware using dev_attr.max_sge=3, that in the original code with the ConnectX-2 work-around would result in isert_conn->max_sge=1 being negotiated. Originally reported by Chris with ocrdma driver. Reported-by: Chris Moore <Chris.Moore@emulex.com> Tested-by: Chris Moore <Chris.Moore@emulex.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>	2014-11-02 22:04:29 -08:00
Linus Torvalds	89453379aa	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: "A bit has accumulated, but it's been a week or so since my last batch of post-merge-window fixes, so... 1) Missing module license in netfilter reject module, from Pablo. Lots of people ran into this. 2) Off by one in mac80211 baserate calculation, from Karl Beldan. 3) Fix incorrect return value from ax88179_178a driver's set_mac_addr op, which broke use of it with bonding. From Ian Morgan. 4) Checking of skb_gso_segment()'s return value was not all encompassing, it can return an SKB pointer, a pointer error, or NULL. Fix from Florian Westphal. This is crummy, and longer term will be fixed to just return error pointers or a real SKB. 6) Encapsulation offloads not being handled by skb_gso_transport_seglen(). From Florian Westphal. 7) Fix deadlock in TIPC stack, from Ying Xue. 8) Fix performance regression from using rhashtable for netlink sockets. The problem was the synchronize_net() invoked for every socket destroy. From Thomas Graf. 9) Fix bug in eBPF verifier, and remove the strong dependency of BPF on NET. From Alexei Starovoitov. 10) In qdisc_create(), use the correct interface to allocate ->cpu_bstats, otherwise the u64_stats_sync member isn't initialized properly. From Sabrina Dubroca. 11) Off by one in ip_set_nfnl_get_byindex(), from Dan Carpenter. 12) nf_tables_newchain() was erroneously expecting error pointers from netdev_alloc_pcpu_stats(). It only returna a valid pointer or NULL. From Sabrina Dubroca. 13) Fix use-after-free in _decode_session6(), from Li RongQing. 14) When we set the TX flow hash on a socket, we mistakenly do so before we've nailed down the final source port. Move the setting deeper to fix this. From Sathya Perla. 15) NAPI budget accounting in amd-xgbe driver was counting descriptors instead of full packets, fix from Thomas Lendacky. 16) Fix total_data_buflen calculation in hyperv driver, from Haiyang Zhang. 17) Fix bcma driver build with OF_ADDRESS disabled, from Hauke Mehrtens. 18) Fix mis-use of per-cpu memory in TCP md5 code. The problem is that something that ends up being vmalloc memory can't be passed to the crypto hash routines via scatter-gather lists. From Eric Dumazet. 19) Fix regression in promiscuous mode enabling in cdc-ether, from Olivier Blin. 20) Bucket eviction and frag entry killing can race with eachother, causing an unlink of the object from the wrong list. Fix from Nikolay Aleksandrov. 21) Missing initialization of spinlock in cxgb4 driver, from Anish Bhatt. 22) Do not cache ipv4 routing failures, otherwise if the sysctl for forwarding is subsequently enabled this won't be seen. From Nicolas Cavallari" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (131 commits) drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode drivers: net: cpsw: Fix broken loop condition in switch mode net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0 stmmac: pci: set default of the filter bins net: smc91x: Fix gpios for device tree based booting mpls: Allow mpls_gso to be built as module mpls: Fix mpls_gso handler. r8152: stop submitting intr for -EPROTO netfilter: nft_reject_bridge: restrict reject to prerouting and input netfilter: nft_reject_bridge: don't use IP stack to reject traffic netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functions netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functions netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routing drivers/net: macvtap and tun depend on INET drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets drivers/net: Disable UFO through virtio net: skb_fclone_busy() needs to detect orphaned skb gre: Use inner mac length when computing tunnel length mlx4: Avoid leaking steering rules on flow creation error flow net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN ...	2014-10-31 15:04:58 -07:00

1 2 3 4 5 ...

4191 Commits