Commit Graph

575565 Commits

Author SHA1 Message Date
Doug Ledford
520a07bff6 Merge branches 'i40iw', 'sriov' and 'hfi1' into k.o/for-4.6 2016-03-21 17:32:23 -04:00
Eli Cohen
68996a6e76 IB/ipoib: Allow mcast packets from other VFs
With SRIOV enabled, two VFs on the same HCA which have the same port LID
and may have the same QP number. To enable receiving multicasts from
such VFs, further qualify the check: ignore the receive only if, in
addition, the packet source gid equals the receiving VF's source gid.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
eff901d30e IB/mlx5: Implement callbacks for manipulating VFs
Implement the IB defined callbacks used to manipulate the policy for the
link state, set GUIDs or get statistics information. This functionality
is added into a new file that will be used to add any SRIOV related
functionality to the mlx5 IB layer.

The following callbacks have been added:

mlx5_ib_get_vf_config
mlx5_ib_set_vf_link_state
mlx5_ib_get_vf_stats
mlx5_ib_set_vf_guid

In addition, publish whether this device is based on a virtual function.

In mlx5 supported devices, virtual functions are implemented as vHCAs.
vHCAs have their own QP number space so it is possible that two vHCAs
will use a QP with the same number at the same time.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
1f324bff9b net/mlx5_core: Implement modify HCA vport command
Implement the modify HCA vport commands used to modify the parameters of
virtual HCA's ports.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
2a4826fe74 net/mlx5_core: Add VF param when querying vport counter
Add a vf parameter to mlx5_core_query_vport_counter so we can call it to
query counters of virtual functions. Also update current users of the
API.

PFs may call mlx5_core_query_vport_counter with other_vport set to
indicate that they are querying a virtual function. The virtual
function to be queried is given by the vf parameter. Virtual function
numbering is zero based so the first VF is 0 and so on. When a PF
queries its own function, the other_vport parameter is cleared.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
9c3c5f8e1f IB/ipoib: Add ndo operations for configuring VFs
Add ndo operations to the network driver that enables configuring the
following operations:

ipoib_set_vf_link_state - configure the VF link policy
ipoib_get_vf_config - get link state configuration
ipoib_set_vf_guid - set a VF port or node GUID
ipoib_get_vf_stats - get statistics of a VF

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
50174a7f2c IB/core: Add interfaces to control VF attributes
Following the practice exercised for network devices which allow the PF
net device to configure attributes of its virtual functions, we
introduce the following functions to be used by IPoIB which is the
network driver implementation for IB devices.

ib_set_vf_link_state - set the policy for a VF link. More below.
ib_get_vf_config - read configuration information of a VF
ib_get_vf_stats - read VF statistics
ib_set_vf_guid - set the node or port GUID of a VF

Also add an indication in the device cap flags that indicates that this
IB devices is based on a virtual function.

A VF shares the physical port with the PF and other VFs. When setting
the link state we have three options:

1. Auto - in this mode, the virtual port follows the state of the
   physical port and becomes active only if the physical port's state is
   active. In all other cases it remains in a Down state.
2. Down - sets the state of the virtual port to Down
3. Up - causes the virtual port to transition into Initialize state if
   it was not already in this state. A virtualization aware subnet manager
   can then bring the state of the port into the Active state.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 17:13:14 -04:00
Eli Cohen
a0c1b2a350 IB/core: Support accessing SA in virtualized environment
Per the ongoing standardisation process, when virtual HCAs are present
in a network, traffic is routed based on a destination GID. In order to
access the SA we use the well known SA GID.

We also add a GRH required boolean field to the port attributes which is
used to report to the verbs consumer whether this port is connected to a
virtual network. We use this field to realize whether we need to create
an address vector with GRH to access the subnet administrator. We clear
the port attributes struct before calling the hardware driver to make
sure the default remains that GRH is not required.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Eli Cohen
fad61ad4e7 IB/core: Add subnet prefix to port info
The subnet prefix is a part of the port_info MAD returned and should be
available at the ib_port_attr struct. We define it here and provide a
default implementation in case the hardware driver does not provide one.
The subnet prefix is required when creating the address vector to access
the SA in networks where GRH must be used.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Eli Cohen
d603c809ef IB/mlx5: Fix decision on using MAD_IFC
Fix the condition that dictates when MAD_IFC should be used. According
to firmware specifications, MAD_IFC commands must be used only if the
ib_virt capability is off.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Eli Cohen
cc8e27cc97 net/core: Add support for configuring VF GUIDs
Add two new NLAs to support configuration of Infiniband node or port
GUIDs. New applications can choose to use this interface to configure
GUIDs with iproute2 with commands such as:

ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

A new ndo, ndo_sef_vf_guid is introduced to notify the net device of the
request to change the GUID.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:34:06 -04:00
Leon Romanovsky
fb532d6a79 IB/{core, ulp} Support above 32 possible device capability flags
The old bitwise device_cap_flags variable was limited to u32 which
has all bits already defined. In order to overcome it, we converted
device_cap_flags variable to be u64 type.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:59 -04:00
Leon Romanovsky
2953f42513 IB/core: Replace setting the zero values in ib_uverbs_ex_query_device
The setting to zero during variable initialization eliminates
the need to explicitly set to zero variables and structures.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:36 -04:00
Sagi Grimberg
3f0393a575 net/mlx5_core: Introduce offload arithmetic hardware capabilities
Define the necessary hardware structures for the offload
arithmetic capabilities and read/cache them on driver load.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:32:35 -04:00
Leon Romanovsky
b06e7de8a9 net/mlx5_core: Refactor device capability function
Device capability function was called similar in all places.
It was called twice for every queried parameter, while the
difference between calls was in HCA capability mode only.

The change proposed unify these calls into one function.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:29:07 -04:00
Leon Romanovsky
91d9ed8443 net/mlx5_core: Fix caching ATOMIC endian mode capability
Add caching of maximum device capability of ATOMIC endian mode.

Fixes: f91e6d8941 ('net/mlx5_core: Add setting ATOMIC endian mode')
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:29:07 -04:00
Dan Carpenter
5658600e7f ib_srpt: fix a WARN_ON() message
The first argument of WARN_ON() is a condition, so it means the warning
message here will just be the name without the ->qp_num information.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:06:27 -04:00
Tatyana Nikolova
34abf9ed73 i40iw: Replace the obsolete crypto hash interface with shash
This patch replaces the obsolete crypto hash interface with shash
and resolves a build failure after merge of the rdma tree
which is caused by the removal of crypto hash interface

Removing CRYPTO_ALG_ASYNC from crypto_alloc_shash(),
because it is by definition sync only

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 16:02:24 -04:00
Mitko Haralanov
5511d78107 IB/hfi1: Add SDMA cache eviction algorithm
This commit adds a cache eviction algorithm for the SDMA
user buffer cache.

Besides the interval RB tree used for node lookup, the cache
nodes are also arranged in a doubly-linked list. When a node is
used, it is put at the beginning of the list. Less frequently
used nodes naturally move to the tail of the list.

When the cache limit is reached, the eviction code starts
traversing the linked list in reverse, freeing buffers until
enough space has been freed to fit the new user buffer. This
guarantees that only the least used cache nodes will be removed
from the cache.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:25 -04:00
Mitko Haralanov
a7922f7ddf IB/hfi1: Switch to using the pin query function
Use the new function to query whether the expected receive
user buffer can be pinned successfully. This requires that
a new variable be added to the hfi1_filedata structure used
to hold the number of pages pinned by the expected receive
code.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:25 -04:00
Mitko Haralanov
bd3a8947de IB/hfi1: Specify mm when releasing pages
This change adds a pointer to the process mm_struct when
calling hfi1_release_user_pages().

Previously, the function used the mm_struct of the current
process to adjust the number of pinned pages. However, is some
cases, namely when unpinning pages due to a MMU notifier call,
we want to drop into that code block as it will cause a deadlock
(the MMU notifiers take the process' mmap_sem prior to calling
the callbacks).

By allowing to caller to specify the pointer to the mm_struct,
the caller has finer control over that part of hfi1_release_user_pages().

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:25 -04:00
Mitko Haralanov
2c97ce4f3c IB/hfi1: Add pin query function
System administrators can use the locked memory
ulimit setting to set the maximum amount of memory
a user can lock/pin. However, this setting alone is not
enough to guarantee good operation of the hfi1 driver
due to the fact that the setting does not have fine
enough granularity to account for the limit being used
by multiple user processes and caches.

Therefore, a better limiting algorithm is needed. This
is where the new hfi1_can_pin_pages() function and the
cache_size module parameter come in.

The function works by looking at the ulimit and cache_size
value to compute a cache size. The algorithm examines the
ulimit value and, if it is not "unlimited", computes a
per-cache limit based on the number of configured user
contexts.

After that, the lower of the two - cache_size and computed
per-cache limit - is used.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:24 -04:00
Mitko Haralanov
5cd3a88d7f IB/hfi1: Implement SDMA-side buffer caching
Add support for caching of user buffers used for SDMA
transfers. This change improves performance by
avoiding repeatedly pinning the pages of buffers, which
are being re-used by the application.

While the cost of the pinning operation has been made
heavier by adding the extra code to search the cache tree,
re-allocate pages arrays, and future cache evictions,
that cost will be amortized against the savings when the
same buffer is re-used. It is also worth noting that in
most cases, the cost of pinning should be much lower due
to the buffer already being in the cache.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:24 -04:00
Mitko Haralanov
a489876010 IB/hfi1: Adjust last address values for intervals
Last address values for intervals in the interval RB tree
nodes should be non-inclusive in order to avoid confusing
ranges.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:23 -04:00
Mitko Haralanov
0f310a00e0 IB/hfi1: Add filter callback
This commit adds a filter callback, which can be used to filter
out interval RB nodes matching a certain interval down to a
single one.

This is needed for the upcoming SDMA-side caching where buffers
will need to be filtered by their virtual address.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:23 -04:00
Mitko Haralanov
b8718e2e2e IB/hfi1: Remove compare callback
Interval RB trees provide their own searching function,
which also takes care of determining the path through
the tree that should be taken.

This make the compare callback unnecessary.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:23 -04:00
Mitko Haralanov
353b71c7c0 IB/hfi1: Add MMU tracing
Add a new tracepoint type for the MMU functions and calls
to that tracepoint to allow tracing of MMU functionality.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:22 -04:00
Mitko Haralanov
df5a00f81d IB/hfi1: Use interval RB trees
The interval RB trees can handle RB nodes which
hold ranged information. This is exactly the usage
for the buffer cache implemented in the expected
receive code path.

Convert the MMU/RB functions to use the interval RB
tree API. This will help with future users of the
caching API, as well.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:22 -04:00
Mitko Haralanov
909e2cd004 IB/hfi1: Notify remove MMU/RB callback of calling context
Tell the remove MMU/RB callback if it's being called as
part of a memory invalidation or not. This can be important
in preventing a deadlock if the remove callback attempts to
take the map_sem semaphore because the kernel's MMU
invalidation functions have already taken it.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:22 -04:00
Mitko Haralanov
368f2b59d0 IB/hfi1: Remove the use of add/remove RB function pointers
The usage of function pointers for RB node insertion
and removal in the expected receive code path was
meant to be a small performance optimization. However,
maintaining it, especially with the new MMU API, would
become more troublesome as the API is extended.

Since the performance optimization is minor, remove the
function pointers and replace with direct calls.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:21 -04:00
Mitko Haralanov
eef9c896a9 IB/hfi1: Allow remove MMU callbacks to free nodes
In order to allow the remove MMU callbacks to free the
RB nodes, it is necessary to prevent any references to
the nodes after the remove callback has been called.

Therefore, remove the node from the tree prior to calling
the callback. In other words, the MMU/RB API now guarantees
that all RB node operations it performs will be done prior
to calling the remove callback and that the RB node will
not be touched afterwards.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:21 -04:00
Mitko Haralanov
4b00d9490f IB/hfi1: Prevent NULL pointer dereference
Prevent a potential NULL pointer dereference (found
by code inspection) when unregistering an MMU handler.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:20 -04:00
Mitko Haralanov
c81e1f6452 IB/hfi1: Allow MMU function execution in IRQ context
Future users of the MMU/RB functions might be searching or
manipulating the MMU RB trees in interrupt context. Therefore,
the MMU/RB functions need to be able to run in interrupt
context. This requires that we use the IRQ-aware API for
spin locks.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:55:20 -04:00
Mitko Haralanov
06e0ffa693 IB/hfi1: Re-factor MMU notification code
The MMU notification code added to the
expected receive side has been re-factored and
split into it's own file. This was done in
order to make the code more general and, therefore,
usable by other parts of the driver.

The caching behavior remains the same. However,
the handling of the RB tree (insertion, deletions,
and searching) as well as the MMU invalidation
processing is now handled by functions in the
mmu_rb.[ch] files.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-21 15:54:59 -04:00
Alex Estrin
000a830efd IB/rdmavt: Post receive for QP in ERR state
Accordingly IB Spec post WR to receive queue must
complete with error if QP is in Error state.
Please refer to C10-42, C10-97.2.1

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:22 -04:00
Mike Marciniszyn
d0e859c328 IB/hfi1: Enable adaptive pio by default
Set the piothreshold to the agreed upon default of 256B.

Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:21 -04:00
Mike Marciniszyn
47177f1bac IB/hfi1: Fix adaptive pio packet corruption
The adaptive pio heuristic missed a case that causes a corrupted
packet on the wire.

The case is if SDMA egress had been chosen for a pio-able packet and
then encountered a ring space wait, the packet is queued.   The sge
cursor had been incremented as part of the packet build out for SDMA.

After the send engine restart, the heuristic might now chose pio based
on the sdma count being zero and start the mmio copy using the already
incremented sge cursor.

Fix this by forcing SDMA egress when the SDMA descriptor has already
been built.

Additionally, the code to wait for a QPs pio count to zero when
switching to SDMA was missing.  Add it.

There is also an issue with UD QPs, in that the different SLs can pick
a different egress send context.  For now, just insure the UD/GSI
always go through SDMA.

Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:21 -04:00
Mike Marciniszyn
cef504c5c0 IB/hfi1: Fix panic in adaptive pio
The following panic occurs while running ib_send_bw -a with
adaptive pio turned on:

[ 8551.143596] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 8551.152986] IP: [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.160926] PGD 80db21067 PUD 80bb45067 PMD 0
[ 8551.166431] Oops: 0000 [#1] SMP
[ 8551.276725] task: ffff880816bf15c0 ti: ffff880812ac0000 task.ti: ffff880812ac0000
[ 8551.285705] RIP: 0010:[<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.296462] RSP: 0018:ffff880812ac3b58  EFLAGS: 00010282
[ 8551.303029] RAX: 000000000000002d RBX: 0000000000000000 RCX: 0000000000000800
[ 8551.311633] RDX: ffff880812ac3c08 RSI: 0000000000000000 RDI: ffff8800b6665e40
[ 8551.320228] RBP: ffff880812ac3ba0 R08: 0000000000001000 R09: ffffffffa09039a0
[ 8551.328820] R10: ffff880817a0c000 R11: 0000000000000000 R12: ffff8800b6665e40
[ 8551.337406] R13: ffff880817a0c000 R14: ffff8800b6665800 R15: ffff8800b6665e40
[ 8551.355640] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8551.362674] CR2: 0000000000000000 CR3: 000000080abe8000 CR4: 00000000001406e0
[ 8551.371262] Stack:
[ 8551.374119]  ffff880812ac3bf0 ffff88080cf54010 ffff880800000800 ffff880812ac3c08
[ 8551.383036]  ffff8800b6665800 ffff8800b6665e40 0000000000000202 ffffffffa08e7b80
[ 8551.391941]  00000001007de431 ffff880812ac3bc8 ffffffffa0904645 ffff8800b6665800
[ 8551.400859] Call Trace:
[ 8551.404214]  [<ffffffffa08e7b80>] ? hfi1_del_timers_sync+0x30/0x30 [hfi1]
[ 8551.412417]  [<ffffffffa0904645>] hfi1_verbs_send+0x215/0x330 [hfi1]
[ 8551.420154]  [<ffffffffa08ec126>] hfi1_do_send+0x166/0x350 [hfi1]
[ 8551.427618]  [<ffffffffa055a533>] rvt_post_send+0x533/0x6a0 [rdmavt]
[ 8551.435367]  [<ffffffffa050760f>] ib_uverbs_post_send+0x30f/0x530 [ib_uverbs]
[ 8551.443999]  [<ffffffffa0501367>] ib_uverbs_write+0x117/0x380 [ib_uverbs]
[ 8551.452269]  [<ffffffff815810ab>] ? sock_recvmsg+0x3b/0x50
[ 8551.459071]  [<ffffffff81581152>] ? sock_read_iter+0x92/0xe0
[ 8551.466068]  [<ffffffff81212857>] __vfs_write+0x37/0x100
[ 8551.472692]  [<ffffffff81213532>] ? rw_verify_area+0x52/0xd0
[ 8551.479682]  [<ffffffff81213782>] vfs_write+0xa2/0x1a0
[ 8551.486089]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[ 8551.493891]  [<ffffffff812146c5>] SyS_write+0x55/0xc0
[ 8551.500220]  [<ffffffff816ae0ee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 8551.531284] RIP  [<ffffffffa0902a94>] pio_wait.isra.21+0x34/0x190 [hfi1]
[ 8551.539508]  RSP <ffff880812ac3b58>
[ 8551.544110] CR2: 0000000000000000

The priv s_sendcontext pointer was not setup properly.  Fix with this
patch by using the s_sendcontext and eliminating its send engine use.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:21 -04:00
Mike Marciniszyn
60df29581f IB/hfi1: Fix PIO wakeup timing hole
There is a timing hole if there had been greater than
PIO_WAIT_BATCH_SIZE waiters.  This code will dispatch the first
batch but leave the others in the queue.   If the restarted waiters
don't in turn wait on a buffer, there is a hang.

Fix by forcing a return when the QP queue is non-empty.

Reviewed-by: Vennila Megavannan <vennila.megavannan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:20 -04:00
Mike Marciniszyn
5326dfbf00 IB/hfi1: Fix ordering of trace for accuracy
The postitioning of the sdma ibhdr trace was
causing an extra trace message when the tx send
returned -EBUSY.

Move the trace to just before the return
and handle negative return values to avoid
any trace.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:20 -04:00
Mike Marciniszyn
1db78eeebe IB/hfi1: Add unique trace point for pio and sdma send
This allows for separately enabling pio and sdma
tracepoints to cut the volume of trace information.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Mike Marciniszyn
ef6d8c4ec8 IB/hfi1: Fix issues with qp_stats print
The changes are to aid in coorelating trace information
with QPs between the trace and qp_stats information

Such changes include adds a space after QP and clarifying that the second
QP is actually the remote QP.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Mike Marciniszyn
ef086c0d5d IB/hfi1: Report pid in qp_stats to aid debug
Tracking user/QP ownership is needed to debug issues with
user ULPs like OpenMPI.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:19 -04:00
Easwar Hariharan
2243472e9d IB/hfi1: Improve LED beaconing
The current LED beaconing code is unclear and uses the timer handler to
turn off the timer. This patch simplifies the code by removing the
special semantics of timeon = timeoff = 0 being interpreted as a request
to turn off the beaconing.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:18 -04:00
Kaike Wan
831464ce4b IB/hfi1: Don't call cond_resched in atomic mode when sending packets
This patch fixed the problem where the driver might reschedule in atomic
mode when sending packets. This is due to the fact that the call to
cond_resched() in hfi1_do_send() might occur in atomic mode and a check is
required to avoid the warning message:
    "kernel: BUG: scheduling while atomic: swapper/2/0/0x10000100."

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:18 -04:00
Dean Luick
528ee9fbf0 IB/hfi1: Add adaptive cacheless verbs copy
The kernel memcpy is faster than a cacheless copy.  However,
if too much of the L3 cache is overwritten by one-time copies
then overall bandwidth suffers.  Implement an adaptive scheme
where full page copies are tracked and if the number of unique
entries are larger than a threshold, verbs will use a cacheless
copy.  Tracked entries are gradually cleaned, allowing memcpy to
resume once the larger copies have stopped.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:17 -04:00
Jubin John
8fefef125e IB/hfi1: Handle host handshake timeout
Host handshake timeout can occur during the verify capability
state. This is a LNI related failure and should be
handled in the same way as other LNI failures.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:17 -04:00
Dean Luick
c9c8ea3d47 IB/hfi1: Add ASIC flag view/clear
Different OSes using parts of the same hardware may leave
cross-device flags set.  Export a debugfs file to view and
clear these flags if needed.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:17 -04:00
Dean Luick
ae993e7fba IB/hfi1: Hold i2c resource across debugfs open/close
External i2c firmware updates are done in multiple steps and
cannot have other things done in between.  For debugfs files,
acquire the resource on open and release it on close.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:16 -04:00
Dean Luick
b0506f4c56 IB/hfi1: Reduce hardware mutex timeout
The hardware mutex is now held only long enough to set
or clear flags.  Reduce the timeout to something more
reasonable.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-03-17 15:55:16 -04:00