Commit Graph

2666 Commits

Author SHA1 Message Date
Linus Torvalds
c6b1de1b64 Merge branch 'debugfs_automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull debugfs patches from Al Viro:
 "debugfs patches, mostly to make it possible for something like tracefs
  to be transparently automounted on given directory in debugfs.

  New primitive in there is debugfs_create_automount(name, parent, func,
  arg), which creates a directory and makes its ->d_automount() return
  func(arg).  Another missing primitive was debugfs_create_file_size() -
  open-coded in quite a few places.  Dave's patch adds it and converts
  the open-code instances to calling it"

* 'debugfs_automount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  debugfs: Provide a file creation function that also takes an initial size
  new primitive: debugfs_create_automount()
  debugfs: split end_creating() into success and failure cases
  debugfs: take mode-dependent parts of debugfs_get_inode() into callers
  fold debugfs_mknod() into callers
  fold debugfs_create() into caller
  fold debugfs_mkdir() into caller
  debugfs_mknod(): get rid useless arguments
  fold debugfs_link() into caller
  debugfs: kill __create_file()
  debugfs: split the beginning and the end of __create_file() off
  debugfs_{mkdir,create,link}(): get rid of redundant argument
2015-02-17 15:18:19 -08:00
David Howells
e59b4e9187 debugfs: Provide a file creation function that also takes an initial size
Provide a file creation function that also takes an initial size so that the
caller doesn't have to set i_size, thus meaning that we don't have to call
deal with ->d_inode in the callers.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-17 12:21:51 -05:00
Linus Torvalds
c5ce28df0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
      wrong, and this pull actually adds an extra commit on top of the
      branch I'm pulling to fix that up, so that the pre-merge state is
      ok.   - Linus ]

 2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation.  From Alexander Duyck.

 3) Remove sock_iocb altogether, from CHristoph Hellwig.

 4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

 7) Add xmit_more support to r8169, e1000, and e1000e drivers.  From
    Florian Westphal.

 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

 9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.

11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms.  From Neal Cardwell.

13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

14) Support xmit_more in be2net, from Sathya Perla.

15) Group Policy extensions for vxlan, from Thomas Graf.

16) Remove Checksum Offload support for vxlan, from Tom Herbert.

17) Like ipv4, support lockless transmit over ipv6 UDP sockets.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
  crypto: fix af_alg_make_sg() conversion to iov_iter
  ipv4: Namespecify TCP PMTU mechanism
  i40e: Fix for stats init function call in Rx setup
  tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
  openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
  ipv6: Make __ipv6_select_ident static
  ipv6: Fix fragment id assignment on LE arches.
  bridge: Fix inability to add non-vlan fdb entry
  net: Mellanox: Delete unnecessary checks before the function call "vunmap"
  cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
  ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
  net: dsa: Remove redundant phy_attach()
  IB/mlx4: Reset flow support for IB kernel ULPs
  IB/mlx4: Always use the correct port for mirrored multicast attachments
  net/bonding: Fix potential bad memory access during bonding events
  tipc: remove tipc_snprintf
  tipc: nl compat add noop and remove legacy nl framework
  tipc: convert legacy nl stats show to nl compat
  tipc: convert legacy nl net id get to nl compat
  tipc: convert legacy nl net id set to nl compat
  ...
2015-02-10 20:01:30 -08:00
Yishai Hadas
35f05dabf9 IB/mlx4: Reset flow support for IB kernel ULPs
The driver exposes interfaces that directly relate to HW state. Upon fatal
error, consumers of these interfaces (ULPs) that rely on completion of
all their posted work-request could hang, thereby introducing dependencies
in shutdown order.  To prevent this from happening, we manage the
relevant resources (CQs, QPs) that are used by the device. Upon a fatal error,
we now generate simulated completions for outstanding WQEs that were not
completed at the time the HW was reset.

It includes invoking the completion event handler for all involved CQs so that
the ULPs will poll those CQs. When polled we return simulated CQEs with
IB_WC_WR_FLUSH_ERR return code enabling ULPs to clean up their resources and
not wait forever for completions upon receiving remove_one.

The above change requires an extra check in the data path to make sure that when
device is in error state, the simulated CQEs will be returned and no further
WQEs will be posted.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:03:53 -08:00
Moni Shoua
824c25c1ab IB/mlx4: Always use the correct port for mirrored multicast attachments
When attaching a QP to a multicast address in bonded mode, there was an
assumption that the port of the QP must be #1. This assumption isn't the
case under the flow which enables maximal usage of the physical ports.

Fix it by always checking the port of the original flow and create the
mirrored flow on the other port.

Fixes: c6215745b6 ('IB/mlx4: Load balance ports in port aggregation mode')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:03:53 -08:00
Yann Droneaud
43c6116573 Revert "IB/core: Add support for extended query device caps"
While commit 7e36ef8205 ("IB/core: Temporarily disable
ex_query_device uverb") is correct as it makes the extended
QUERY_DEVICE uverb (which came as part of commit 5a77abf9a9
("IB/core: Add support for extended query device caps") and commit
860f10a799 ("IB/core: Add flags for on demand paging support")) not
available to userspace, it doesn't address the initial issue regarding
ib_copy_to_udata() [1][2].

Additionally, further discussions around this new uverb seems to
conclude it would require a different data structure than the one
currently described in <rdma/ib_user_verbs.h> [3].

Both of these issues require a revert of the changes, so this patch
partially reverts commit 8cdd312cfe ("IB/mlx5: Implement the ODP
capability query verb") and commit 860f10a799 ("IB/core: Add flags
for on demand paging support") and fully reverts commit 5a77abf9a9
("IB/core: Add support for extended query device caps").

[1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps"
    http://mid.gmane.org/1418733236.2779.26.camel@opteya.com

[2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb"
    http://mid.gmane.org/1423067503.3030.83.camel@opteya.com

[3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask"
    http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com

Cc: Eli Cohen <eli@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2015-02-06 00:54:33 -08:00
Moni Shoua
c6215745b6 IB/mlx4: Load balance ports in port aggregation mode
When the mlx4 IB (RoCE) device works in link aggregation mode, it
exposes a single port to upper layers. Therefore, applications always
set '1' in port_num attribute when modifying a QP or creating an address handle.

To make sure that a node uses all available ports the mlx4 driver will
override the port_num attribute with a round robin policy.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:25 -08:00
Moni Shoua
146d6e1983 IB/mlx4: Create mirror flows in port aggregation mode
In port aggregation mode flows for port #1 (the only port) should be mirrored
on port #2. This is because packets can arrive from either physical ports.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:25 -08:00
Moni Shoua
a575009030 IB/mlx4: Add port aggregation support
Register the interface with the mlx4 core driver with port aggregation support
and check for port aggregation mode when the 'add' function is called.

In this mode, only one physical port is reported to the upper layer
(RoCE/IB core stack and ULPs).

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:25 -08:00
Moni Shoua
2f48485d1c IB/mlx4: Reuse mlx4_mac_to_u64()
This function is implemented twice... get rid of one copy.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:25 -08:00
David S. Miller
95f873f2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 16:59:56 -08:00
Yishai Hadas
872bf2fb69 net/mlx4_core: Maintain a persistent memory for mlx4 device
Maintain a persistent memory that should survive reset flow/PCI error.
This comes as a preparation for coming series to support above flows.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:13 -08:00
Hariprasad Shenai
cf7fe64aee iw_cxgb4: Cleanup register defines/MACROS defined in t4fw_ri_api.h
Cleanup all the MACROS that are defined in t4fw_ri_api.h and affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:07:02 -05:00
Hariprasad Shenai
a56c66e808 iw_cxgb4: Cleanup register defines/MACROS defined in t4.h
Cleanup all the MACROS defined in t4.h and the affected files

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:07:01 -05:00
Or Gerlitz
5eff6dadb9 net/mlx4: Don't disable vxlan offloads under DMFS-A0 optimized steering
Except for VXLAN steering rules, all offloads should work as they were
under plain DMFS mode. Fix that by enabling all the offloads under
DMFS-A0 mode, except for VXLAN steering rules.

Fixes: d57febe1a4 "net/mlx4: Add A0 hybrid steering"
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:35:30 -05:00
Jiri Pirko
df8a39defa net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Arnd Bergmann
7835bfb526 infiniband: mlx5: avoid a compile-time warning
The return type of find_first_bit() is architecture specific,
on ARM it is 'unsigned int', while the asm-generic code used
on x86 and a lot of other architectures returns 'unsigned long'.

When building the mlx5 driver on ARM, we get a warning about
this:

infiniband/hw/mlx5/mem.c: In function 'mlx5_ib_cont_pages':
infiniband/hw/mlx5/mem.c:84:143: warning: comparison of distinct pointer types lacks a cast
     m = min(m, find_first_bit(&tmp, sizeof(tmp)));

This patch changes the driver to use min_t to make it behave
the same way on all architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:08:21 -05:00
Hariprasad Shenai
bdc590b99f iw_cxgb4/cxgb4/cxgb4vf/cxgb4i/csiostor: Cleanup register defines/macros related to all other cpl messages
This patch cleanups all other macros/register define related to
CPL messages that are defined in t4_msg.h and the affected files

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:19:34 -05:00
Hariprasad Shenai
6c53e938a8 iw_cxgb4/cxgb4/cxgb4i: Cleanup register defines/MACROS related to CM CPL messages
This patch cleanups all macros/register define related to connection management
CPL messages that are defined in t4_msg.h and the affected files

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-12 16:19:34 -05:00
Hariprasad Shenai
f612b815d7 RDMA/cxgb4/cxgb4vf/csiostor: Cleanup SGE register defines
This patch cleanups all SGE related macros/register defines that are
defined in t4_regs.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 16:34:47 -05:00
Roland Dreier
a7cfef21e3 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next 2014-12-15 18:19:20 -08:00
Haggai Eran
b4cfe447d4 IB/mlx5: Implement on demand paging by adding support for MMU notifiers
* Implement the relevant invalidation functions (zap MTTs as needed)
* Implement interlocking (and rollback in the page fault handlers) for
  cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC
  send/receive/RDMA read/RDMA write, and UD send.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:04 -08:00
Haggai Eran
eab668a6d0 IB/mlx5: Add support for RDMA read/write responder page faults
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran
7bdf65d411 IB/mlx5: Handle page faults
This patch implement a page fault handler (leaving the pages pinned as
of time being).  The page fault handler handles initiator and responder
page faults for UD/RC transports, for send/receive operations, as well
as RDMA read/write initiator support.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran
6aec21f6a8 IB/mlx5: Page faults handling infrastructure
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.

The registered fault handler is empty in this patch, and only a later
patch implements it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran
832a6b06ab IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.

Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.

In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran
cc149f751b IB/mlx5: Changes in memory region creation to support on-demand paging
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.

* Instead of accepting a UMR bit telling the function to enable all
  access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
  correct list, and enable/disable the access flags per-page according
  to whether the page is present.
* A new bit is set to enable writing of access flags when using the
  firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.

In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran
8cdd312cfe IB/mlx5: Implement the ODP capability query verb
The patch adds infrastructure to query ODP capabilities in the mlx5
driver. The code will read the capabilities from the device, and
enable only those capabilities that both the driver and the device
supports.  At this point ODP is not supported, so no capability is
copied from the device, but the patch exposes the global ODP device
capability bit.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran
c1395a2a8c IB/mlx5: Add function to read WQE from user-space
Add a helper function mlx5_ib_read_user_wqe to read information from
user-space owned work queues.  The function will be used in a later
patch by the page-fault handling code in mlx5_ib.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>

[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran
406f9e5fa9 IB/core: Replace ib_umem's offset field with a full address
In order to allow umems that do not pin memory, we need the umem to
keep track of its region's address.

This makes the offset field redundant, and so this patch removes it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran
968e78dd96 IB/mlx5: Enhance UMR support to allow partial page table update
The current UMR interface doesn't allow partial updates to a memory
region's page tables. This patch changes the interface to allow that.

It also changes the way the UMR operation validates the memory
region's state.  When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR
operation to fail if the MKEY is in the free state. When it is
unchecked the operation will check that it isn't in the free state.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran
21af2c3ebf IB/mlx5: Remove per-MR pas and dma pointers
Since UMR code now uses its own context struct on the stack, the pas
and dma pointers for the UMR operation that remained in the mlx5_ib_mr
struct are not necessary.  This patch removes them.

Fixes: a74d24168d ("IB/mlx5: Refactor UMR to have its own context struct")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Devesh Sharma
e5f0508d43 RDMA/ocrdma: Always resolve destination mac from GRH for UD QPs
For user applications that use UD QPs, always resolve destination MAC
from the GRH.  This is to avoid failure due to any garbage value in
the attr->dmac.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:09 -08:00
Mitesh Ahuja
95bf0093a9 RDMA/ocrdma: Fix ocrdma_query_qp() to report q_key value for UD QPs
Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:09 -08:00
Jack Morgenstein
9f35e8995b IB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mr
This error was detected by sparse static checker:

    drivers/infiniband/hw/mlx4/mr.c:226:21: warning: symbol 'err' shadows an earlier one
    drivers/infiniband/hw/mlx4/mr.c:197:13: originally declared here

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:12:29 -08:00
Hariprasad S
e6b11163d4 RDMA/cxgb4: Handle NET_XMIT return codes
cxgb4_create_server() and cxgb4_create_server6() return NET_XMIT_*
values or a negative errno. iw_cxgb4 need to handle this correctly.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Steve Wise
5b34180883 RDMA/cxgb4: Wake up waiters after flushing the qp
When transitioning into ERROR state, the QP was getting flushed after
waking up any waiters.  This can cause applications to miss flushed work
requests which can stall an NFS mount.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Hariprasad Shenai
2550a88d95 RDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices
T4/T5 hardware can't handle MRs >= 8GB due to a hardware bug.  So limit
registrations to < 8GB for thse devices.

Based on original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Hariprasad Shenai
10be6b48fd RDMA/cxgb4: Fix locking issue in process_mpa_request
Fix the following lockdep report:

    =============================================
    [ INFO: possible recursive locking detected ]
    3.17.0+ #3 Tainted: G            E
    ---------------------------------------------
    kworker/u64:3/299 is trying to acquire lock:
     (&epc->mutex){+.+.+.}, at: [<ffffffffa074e07a>]
    process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]

    but task is already holding lock:
     (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0 [iw_cxgb4]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&epc->mutex);
      lock(&epc->mutex);

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    3 locks held by kworker/u64:3/299:
     #0:  ("%s""iw_cxgb4"){.+.+.+}, at: [<ffffffff8106f14d>]
    process_one_work+0x13d/0x4d0
     #1:  (skb_work){+.+.+.}, at: [<ffffffff8106f14d>] process_one_work+0x13d/0x4d0
     #2:  (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0
    [iw_cxgb4]

    stack backtrace:
    CPU: 2 PID: 299 Comm: kworker/u64:3 Tainted: G            E  3.17.0+ #3
    Hardware name: Dell Inc. PowerEdge T110/0X744K, BIOS 1.2.1 01/28/2010
    Workqueue: iw_cxgb4 process_work [iw_cxgb4]
     ffff8800b91593d0 ffff8800b8a2f9f8 ffffffff815df107 0000000000000001
     ffff8800b9158750 ffff8800b8a2fa28 ffffffff8109f0e2 ffff8800bb768a00
     ffff8800b91593d0 ffff8800b9158750 0000000000000000 ffff8800b8a2fa88
    Call Trace:
     [<ffffffff815df107>] dump_stack+0x49/0x62
     [<ffffffff8109f0e2>] print_deadlock_bug+0xf2/0x100
     [<ffffffff810a0f04>] validate_chain+0x454/0x700
     [<ffffffff810a1574>] __lock_acquire+0x3c4/0x580
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff810a17cc>] lock_acquire+0x9c/0x110
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff815e111b>] mutex_lock_nested+0x4b/0x360
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff810c181a>] ? del_timer_sync+0xaa/0xd0
     [<ffffffff810c1770>] ? try_to_del_timer_sync+0x70/0x70
     [<ffffffffa074e07a>] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffffa074a3ec>] ? update_rx_credits+0xec/0x140 [iw_cxgb4]
     [<ffffffffa074e381>] rx_data+0xd1/0x1f0 [iw_cxgb4]
     [<ffffffff8109ff23>] ? mark_held_locks+0x73/0xa0
     [<ffffffff815e4b90>] ? _raw_spin_unlock_irqrestore+0x40/0x70
     [<ffffffff810a020d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
     [<ffffffff810a02dd>] ? trace_hardirqs_on+0xd/0x10
     [<ffffffffa074c931>] process_work+0x51/0x80 [iw_cxgb4]
     [<ffffffff8106f1c8>] process_one_work+0x1b8/0x4d0
     [<ffffffff8106f14d>] ? process_one_work+0x13d/0x4d0
     [<ffffffff8106f600>] worker_thread+0x120/0x3c0
     [<ffffffff8106f4e0>] ? process_one_work+0x4d0/0x4d0
     [<ffffffff81074a0e>] kthread+0xde/0x100
     [<ffffffff815e4b40>] ? _raw_spin_unlock_irq+0x30/0x40
     [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70
     [<ffffffff815e512c>] ret_from_fork+0x7c/0xb0
     [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70

Based on original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Pramod Kumar
123bc2a27a RDMA/cxgb4: Configure 0B MRs to match HW implementation
0B MRs need some tweaks to work correctly with HW. When writing the
TPTE, if the MR length is zero we now:

1) turn off all permissions
2) set the length to -1

While functionality/capabilities of the MR are the same with these
changes, it resolves a dapltest 0B RDMA Read test failure.  Based on
original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Pramod Kumar <pramod@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:46 -08:00
Pramod Kumar
63a71ba617 RDMA/cxgb4: Increase epd buff size for debug interface
IPv6 address string lengths require increasing the buffer size for
debugfs handlers.

Signed-off-by: Pramod Kumar <pramod@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:10:45 -08:00
Matan Barak
d57febe1a4 net/mlx4: Add A0 hybrid steering
A0 hybrid steering is a form of high performance flow steering.
By using this mode, mlx4 cards use a fast limited table based steering,
in order to enable fast steering of unicast packets to a QP.

In order to implement A0 hybrid steering we allocate resources
from different zones:
(1) General range
(2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.

When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
we try hard to allocate the QP from range (2). Otherwise, we try hard not
to allocate from this  range. However, when the system is pushed to its
limits and one needs every resource, the allocator uses every region it can.

Meaning, when we run out of raw-eth qps, the allocator allocates from the
general range (and the special-A0 area is no longer active). If we run out
of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
is also exhausted, the allocator will allocate from the general range
(and the A0 region is no longer active).

Note that if a raw-eth qp is allocated from the general range, it attempts
to allocate the range such that bits 6 and 7 (blueflame bits) in the
QP number are not set.

When the feature is used in SRIOV, the VF has to notify the PF what
kind of QP attributes it needs. In order to do that, along with the
"Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
to the combination of these bits, the PF tries to allocate a suitable QP.

In order to maintain backward compatibility (with older PFs), the PF
notifies which QP attributes it supports via QUERY_FUNC_CAP command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Eugenia Emantayev
ddae0349fd net/mlx4: Change QP allocation scheme
When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 65 will have bits 6/7 set.

This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
   and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0]  - number of QPs
b. param1[31-24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
3dca0f42c7 net/mlx4_core: Use tasklet for user-space CQ completion events
Previously, we've fired all our completion callbacks straight from our ISR.

Some of those callbacks were lightweight (for example, mlx4_en's and
IPoIB napi callbacks), but some of them did more work (for example,
the user-space RDMA stack uverbs' completion handler). Besides that,
doing more than the minimal work in ISR is generally considered wrong,
it could even lead to a hard lockup of the system. Since when a lot
of completion events are generated by the hardware, the loop over those
events could be so long, that we'll get into a hard lockup by the system
watchdog.

In order to avoid that, add a new way of invoking completion events
callbacks. In the interrupt itself, we add the CQs which receive completion
event to a per-EQ list and schedule a tasklet. In the tasklet context
we loop over all the CQs in the list and invoke the user callback.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Eli Cohen
d14e71103b mlx5: Fix error flow in add_keys
If mlx5_core_create_mkey fails, decrease the pending counter to undo the
previous increment.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:56 -05:00
Eli Cohen
6a4f139aae mlx5: Fix sparse warnings
1. Add required __acquire/__release statements to balance spinlock usage.
2. Change the index parameter of begin_wqe() to be unsigned to match supplied
argument type.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:56 -05:00
Hariprasad Shenai
b2e1a3f091 RDMA/cxgb4/cxgb4vf/csiostor: Cleanup macros/register defines related to PCIE, RSS and FW
This patch cleanups all PCIE, RSS & FW related macros/register defines that are
defined in t4fw_api.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-22 16:57:47 -05:00
Hariprasad Shenai
5167865aaa RDMA/cxgb4/csiostor: Cleansup FW related macros/register defines for PF/VF and LDST
This patch cleanups PF/VF and LDST related macros/register defines that are
defined in t4fw_api.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-22 16:57:47 -05:00
Hariprasad Shenai
77a80e23cc RDMA/cxgb4: Cleanup Filter related macros/register defines
This patch cleanups all filter related macros/register defines that are defined
in t4fw_api.h and the affected files.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-22 16:57:46 -05:00
Al Viro
479163f460 mlx5: don't duplicate kvfree()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:58:18 -05:00