656 Commits

Author SHA1 Message Date
Christoph Hellwig
29c0964873 nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrl
No need to have two functions doing the same thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:04 +01:00
James Smart
71c691fd06 nvme-fc: avoid workqueue flush stalls
There's no need to wait for the entire nvme_wq, which is now shared,
to flush. Flush only the delete_work item.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sgi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-11-01 16:28:03 +01:00
James Smart
ecad0d2cb8 nvme-fc: remove NVME_FC_MAX_SEGMENTS
The define is an arbitrary limit to the io size on the initiator,
capping the io to 1MB-4KB.

Remove the define from the transport. I/O size will solely be limited
by the LLDD sg limits.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:35 +03:00
James Smart
56d5f4f108 nvme-fc: add support for duplicate_connect option
Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same host port
and target port for the same host (hostnqn, hostid) to the same
subsystem. Fails the connection request if there is an existing controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:32 +03:00
James Smart
36e835f243 nvme-rdma: add support for duplicate_connect option
Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same target
address (traddr), target port (trsvcid), and if specified, host
address (host_traddr). Fails the connection request if there is
an existing controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:32 +03:00
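A minimal userspace sketch of the address comparison described above. It assumes a simplified options struct; struct conn_opts and rdma_addr_matches() are hypothetical names, not the driver's code, and the host/subsystem identity check is left out.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of the connect options. */
struct conn_opts {
    const char *traddr;      /* target address */
    const char *trsvcid;     /* target port (service id) */
    const char *host_traddr; /* optional host address, NULL if unset */
};

/* True when the new request 'req' targets the same addresses as the
 * existing controller 'ex'. host_traddr is only compared when the new
 * request specifies it. */
static bool rdma_addr_matches(const struct conn_opts *ex,
                              const struct conn_opts *req)
{
    if (strcmp(ex->traddr, req->traddr) != 0)
        return false;
    if (strcmp(ex->trsvcid, req->trsvcid) != 0)
        return false;
    if (req->host_traddr) {
        if (!ex->host_traddr ||
            strcmp(ex->host_traddr, req->host_traddr) != 0)
            return false;
    }
    return true;
}
```

Making host_traddr optional means a request without it matches any existing controller on the same target address and port.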
James Smart
991231dc48 nvme: add helper to compare options to controller
Adds a helper function that compares the host and subsystem
specified in a connect options list against a controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:28 +03:00
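A sketch of such a comparison, assuming the identity consists of the subsystem NQN, host NQN and a 16-byte host ID. struct fabric_identity and identity_matches() are hypothetical; the real helper works on the driver's own controller and options structures.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the host/subsystem identity carried by
 * both a controller and a connect request. */
struct fabric_identity {
    const char *subsysnqn;     /* target subsystem NQN */
    const char *hostnqn;       /* host NQN */
    unsigned char hostid[16];  /* host identifier (UUID bytes) */
};

/* True when the request names the same host and the same subsystem
 * as the existing controller. */
static bool identity_matches(const struct fabric_identity *ctrl,
                             const struct fabric_identity *opts)
{
    return strcmp(ctrl->subsysnqn, opts->subsysnqn) == 0 &&
           strcmp(ctrl->hostnqn, opts->hostnqn) == 0 &&
           memcmp(ctrl->hostid, opts->hostid, sizeof(ctrl->hostid)) == 0;
}
```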
James Smart
3b33876207 nvme: add duplicate_connect option
Add the "duplicate_connect" boolean option (presence means true).
Default is false.

When false, the transport should validate whether a new controller request
is targeted for the same host transport addressing and target transport
addressing as an existing controller. If so, the new controller request
should be rejected.

When true, the caller is explicitly requesting a duplicate controller
connection to be made and the new request should be attempted.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-27 09:25:20 +03:00
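The fabrics code has its own option parser; the hypothetical stand-alone parser below only illustrates the semantics described above: the bare token means true, absence means false, and a matching request is rejected unless the option is present. has_duplicate_connect() and should_reject() are illustrative names.

```c
#include <stdbool.h>
#include <string.h>

/* True if the comma-separated option string contains the bare token
 * "duplicate_connect"; absence means false (the default). */
static bool has_duplicate_connect(const char *opts)
{
    static const char key[] = "duplicate_connect";
    const size_t klen = sizeof(key) - 1;
    const char *p = opts;

    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == klen && strncmp(p, key, klen) == 0)
            return true;
        p = end ? end + 1 : NULL;
    }
    return false;
}

/* Reject a request that matches an existing controller unless the
 * caller explicitly asked for a duplicate connection. */
static bool should_reject(const char *opts, bool matches_existing)
{
    return matches_existing && !has_duplicate_connect(opts);
}
```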
Christoph Hellwig
999ada2871 nvme: check for a live controller in nvme_dev_open
This is a much more sensible check than just the admin queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@rimbeg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-27 09:05:22 +03:00
Christoph Hellwig
a6a5149b10 nvme: get rid of nvme_ctrl_list
Use the core chrdev code to set up the link between the character device
and the nvme controller.  This allows us to get rid of the global list
of all controllers, and also ensures that we have both a reference to
the controller and the transport module before the open method of the
character device is called.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sgi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
2017-10-27 09:04:53 +03:00
Christoph Hellwig
d22524a478 nvme: switch controller refcounting to use struct device
Instead of allocating a separate struct device for the character device
handle, embed it into struct nvme_ctrl and use it for the main controller
refcounting.  This removes double refcounting and gets us an automatic
reference for the character device operations.  We keep ctrl->device as a
pointer for now to avoid changing printks all over, but in the future we
could look into message printing helpers that take a controller structure
similar to what other subsystems do.

Note that the delete_ctrl operation now always holds a reference when it is
entered (either through sysfs due to this change, or because every open
file on the /dev/nvme-fabrics node has a reference), so we don't need the
unless_zero variant there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
2017-10-27 09:04:07 +03:00
Christoph Hellwig
c6424a90da nvme: simplify nvme_open
Now that we are protected against lookup vs free races for the namespace
by using kref_get_unless_zero we don't need the hack of NULLing out the
disk private data during removal.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-27 09:03:31 +03:00
Christoph Hellwig
2dd4122854 nvme: use kref_get_unless_zero in nvme_find_get_ns
For kref_get_unless_zero to protect against lookup vs free races we need
to use it in all places where we aren't guaranteed to already hold a
reference.  There is no such guarantee in nvme_find_get_ns, so switch to
kref_get_unless_zero in this function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-27 09:02:58 +03:00
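To show why lookups need the unless_zero variant, here is a userspace model of the semantics using C11 atomics. It is not the kernel's struct kref; struct refcount_model, get_unless_zero() and put_ref() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of a kref-style counter. */
struct refcount_model {
    atomic_uint count;
};

/* Take a reference only if the object is still live (count > 0).
 * Once the count has reached zero the object is being freed, so a
 * lookup must fail instead of resurrecting it. */
static bool get_unless_zero(struct refcount_model *r)
{
    unsigned int old = atomic_load(&r->count);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool put_ref(struct refcount_model *r)
{
    return atomic_fetch_sub(&r->count, 1) == 1;
}
```

A plain increment would move a dying object from 0 back to 1; the compare-and-swap loop refuses that transition.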
Nitzan Carmi
e62a538da2 nvme-rdma: Add debug message when reaches timeout
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-23 16:28:42 +02:00
Max Gurtovoy
f87c89ad93 nvme-rdma: align nvme_rdma_device structure
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-23 16:28:05 +02:00
James Smart
134aedc9c1 nvme-fc: correct io timeout behavior
The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous, so it could
allow the blk request to be restarted while the associated io was
still aborting. Reserved commands, those used for association
creation, never timed out, so they could hang forever.

To correct:
If an io times out while a remoteport is not connected, just
restart the io timer. The lack of connectivity will simultaneously
be resetting the controller, so the reset path will abort and terminate
the io.

If an io times out while it is marked for transport abort, just
reset the io timer. The abort process is underway and will complete
the io.

Otherwise, if an io times out, abort the io. If the abort is
unsuccessful (unlikely), give up and return not handled.

If the abort is successful, the abort process is underway and will
terminate the io, so rather than waiting synchronously, just restart
the io timer.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:17:05 +02:00
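The decision sequence above can be sketched as a small function. The enum values are modeled on the block layer's "reset timer" and "not handled" outcomes but are hypothetical, as are struct io_state and the stub abort hooks.

```c
#include <stdbool.h>

enum tmo_action { TMO_RESET_TIMER, TMO_NOT_HANDLED };

struct io_state {
    bool rport_connected;  /* remote port currently connected */
    bool transport_abort;  /* io already marked for transport abort */
};

/* 'issue_abort' stands in for sending the abort; true on success. */
static enum tmo_action fc_timeout_decision(const struct io_state *io,
                                           bool (*issue_abort)(void))
{
    /* No connectivity: the controller reset will terminate the io. */
    if (!io->rport_connected)
        return TMO_RESET_TIMER;
    /* Abort already underway: it will complete the io. */
    if (io->transport_abort)
        return TMO_RESET_TIMER;
    /* Otherwise abort the io; if that fails (unlikely), give up. */
    if (!issue_abort())
        return TMO_NOT_HANDLED;
    /* Abort in progress: don't wait synchronously, restart the timer. */
    return TMO_RESET_TIMER;
}

/* Stub abort hooks for the example; they count invocations. */
static int abort_calls;
static bool stub_abort_ok(void)   { abort_calls++; return true; }
static bool stub_abort_fail(void) { abort_calls++; return false; }
```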
James Smart
0a02e39fd1 nvme-fc: correct io termination handling
The io completion handling for ios that are failing due to
a transport error or association termination had issues, causing
io failures (DNR set, so retries didn't kick in) or long stalls.

Change the io completion handler for the following items:

When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.

In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completed association and
typically start a new reconnect attempt (another create association
later).

Rearranged code flow to remove a largely unneeded local variable.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:16:59 +02:00
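A sketch of the status rules above. The constant values are assumptions for illustration: generic status 0x07 ("Command Abort Requested") for the aborted case and 0x4000 for the Do Not Retry flag. The function and enum names are hypothetical, and the model ignores how the driver packs the status word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative status values (assumed, see lead-in). */
#define SC_SUCCESS 0x0000u
#define SC_ABORTED 0x0007u
#define SC_DNR     0x4000u

enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING,
                  CTRL_RECONNECTING, CTRL_DELETING };

static uint16_t fc_complete_status(uint16_t status, bool transport_aborted,
                                   bool queue_dying, enum ctrl_state state)
{
    /* Transport abort or association termination: report aborted,
     * without DNR, so a retry can follow association recreate. */
    if (transport_aborted)
        status = SC_ABORTED;
    /* Failed ios get DNR when the controller is being deleted or the
     * io was part of association creation (NEW or RECONNECTING). */
    if (status != SC_SUCCESS &&
        (queue_dying || state == CTRL_NEW || state == CTRL_RECONNECTING))
        status |= SC_DNR;
    return status;
}
```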
Chaitanya Kulkarni
a7a7cbe353 nvme-pci: add SGL support
This adds SGL support for NVMe PCIe driver, based on an earlier patch
from Rajiv Shanmugam Madeswaran <smrajiv15 at gmail.com>. This patch
refactors the original code and adds new module parameter sgl_threshold
to determine whether to use SGL or PRP for IOs.

The usage of SGLs is controlled by the sgl_threshold module parameter,
which enables SGLs conditionally when the average request segment
size (avg_seg_size) is greater than or equal to sgl_threshold. In the
original patch, the decision to use SGLs depended only on the IO size;
the new approach considers not only the IO size but also the number
of physical segments present in the IO.

We calculate avg_seg_size based on request payload bytes and number
of physical segments present in the request.

For example:

1. blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 8K:
   avg_seg_size = 4K; use SGL if avg_seg_size >= sgl_threshold.

2. blk_rq_nr_phys_segments = 2, blk_rq_payload_bytes = 64K:
   avg_seg_size = 32K; use SGL if avg_seg_size >= sgl_threshold.

3. blk_rq_nr_phys_segments = 16, blk_rq_payload_bytes = 64K:
   avg_seg_size = 4K; use SGL if avg_seg_size >= sgl_threshold.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-20 12:16:58 +02:00
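The threshold test in the examples can be sketched as follows. Rounding the average up, treating a threshold of 0 as "SGLs disabled", and assuming the controller's SGL support was already checked are all assumptions of this sketch; use_sgls() is a hypothetical name.

```c
#include <stdbool.h>

/* Round-up integer division. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

/* Use SGLs when the average segment size reaches sgl_threshold. */
static bool use_sgls(unsigned int payload_bytes,
                     unsigned int nr_phys_segments,
                     unsigned int sgl_threshold)
{
    unsigned int avg_seg_size;

    if (sgl_threshold == 0 || nr_phys_segments == 0)
        return false;
    avg_seg_size = div_round_up(payload_bytes, nr_phys_segments);
    return avg_seg_size >= sgl_threshold;
}
```

With a 32K threshold, only example 2 (avg_seg_size = 32K) selects SGLs; examples 1 and 3 (4K) stay on PRPs.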
Christoph Hellwig
9843f685ae nvme: use ida_simple_{get,remove} for the controller instance
Switch to the ida_simple_* helpers instead of opencoding them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-10-20 12:16:57 +02:00
Israel Rukshin
5a22e2bf44 nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when nr_hw_queues is set to 1, the admin tag set uses the mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart  <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:54 +02:00
Israel Rukshin
94f29d4f78 nvme-rdma: Add BLK_MQ_F_NO_SCHED flag to admin tag set
Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when nr_hw_queues is set to 1, the admin tag set uses the mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:58:53 +02:00
Minwoo Im
16772ae6d9 nvme-pci: fix typos in comments
Fix comment typos in adapter_alloc_cq() and adapter_alloc_sq():
'the the' duplications are replaced with 'that the'.

Signed-off-by: Minwoo Im <dn3108@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 10:56:07 +02:00
Sagi Grimberg
0ad0bfa298 nvme-rdma: stop controller reset if the controller is deleting
If the controller is being deleted (because the user decided to delete it),
there is no point in continuing the reset sequence.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:29:39 +02:00
Sagi Grimberg
5013e98b5e nvme-rdma: change queue flag semantics DELETING -> ALLOCATED
Instead of marking that we are deleting, mark that we are allocated and
check that instead. This makes the logic symmetrical to the connected
flag check.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:28:55 +02:00
Sagi Grimberg
60a5188633 nvme-rdma: Don't local invalidate if the queue is not live
There is no chance for the local invalidate to succeed if the queue-pair
is in an error state. Most likely the target will do a remote
invalidation of our MR, so not a big loss on the test_bit.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:54 +02:00
Sagi Grimberg
5e1fe61d41 nvme-rdma: teardown admin/io queues once on error recovery
Relying on the queue state while tearing down on every reconnect
attempt is not a good design. We should do it once in err_work
and simply try to establish the queues for each reconnect attempt.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:53 +02:00
Sagi Grimberg
0fc176dfda nvme-rdma: Check that reinit_request got a proper mr
Warn if req->mr is NULL as it should never happen.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:52 +02:00
Sagi Grimberg
0c5b43b9c1 nvme-rdma: move assignment to declaration
No need for the extra line for trivial assignments.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:52 +02:00
Sagi Grimberg
d8bfceebc4 nvme-rdma: fix wrong logging message
The failure is not necessarily in address resolution.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:51 +02:00
Sagi Grimberg
60070c78ef nvme-rdma: pass tagset to directly nvme_rdma_free_tagset
Pass the tagset directly instead of a flag selecting admin or io.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:50 +02:00
Sagi Grimberg
31b8446079 nvme: introduce nvme_reinit_tagset
Move blk_mq_reinit_tagset from blk-mq to the nvme core,
as nvme is its only user. Current transports that use
it (rdma, fc) simply implement the .reinit_request op.

This patch does not change any functionality.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-18 19:27:48 +02:00
Christoph Hellwig
761f2e1ed8 nvme: simplify compat_ioctl handling
We can just use our normal ioctl handler for the compat case and remove
the boilerplate code for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
2017-10-16 14:54:10 +02:00
James Smart
469d0ef06d nvme-fc: move remote port get/put/free location
Move nvme_fc_rport_get/put and the rport free routine higher in the file
to avoid adding prototypes to resolve references in upcoming code additions.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-05 10:02:47 +02:00
James Smart
5f5685569a nvme-fc: create fc class and transport device
Add a new fc class and a device node for udev events under it.  I
expect the fc class will eventually be the location where FC SCSI and
FC NVME merge. Therefore names are kept somewhat generic.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:48:23 +02:00
James Smart
eaefd5abf6 nvme-fc: add uevent for auto-connect
To support auto-connecting to FC-NVME devices upon their dynamic
appearance, add a uevent that can kick off connection scripts.
The uevent is posted against the fc_udev device.

The patch set was tested with the following rule to kick off an nvme-cli
connect-all for the FC initiator and FC target ports. This is just an
example for testing and is not intended for real-life use.

ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
        ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
        RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"

I will post proposed udev/systemd scripts for possible kernel support.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:48:20 +02:00
Sagi Grimberg
d1f1071f81 nvme-fabrics: request transport module
Help userspace to make sure transport module is loaded.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:43:58 +02:00
Marc Olson
8ae4e4477d nvme: update timeout module parameter type
The underlying blk_mq_tag_set and request timeout parameters support an
unsigned int. Extend the size of the nvme module parameters for io and
admin command timeouts to match.

Signed-off-by: Marc Olson <marcolso@amazon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-04 09:43:09 +02:00
Sagi Grimberg
e4d753d7e5 nvme-rdma: don't fully stop the controller in error recovery
Calling nvme_stop_ctrl on an already failed controller will wait for the
scan work to complete (which only happens once the identify command times
out, after 60 seconds). This is unnecessary when we already know that the
controller has failed.

Reported-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
Sagi Grimberg
0a960afd60 nvme-rdma: give up reconnect if state change fails
If we failed to transition to state LIVE after a successful reconnect,
then controller deletion already started. In this case there is no
point moving forward with reconnect.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
Sagi Grimberg
1a40d97288 nvme-core: Use nvme_wq to queue async events and fw activation
async_event_work might race as it is executed from two different
workqueues at the moment.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 12:42:11 -06:00
Guilherme G. Piccoli
8edd11c9ad nvme-fabrics: Allow 0 as KATO value
Currently, the driver code allows the user to set 0 as KATO
(Keep Alive TimeOut), but this value is not respected.
This patch enforces the expected behavior.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
James Smart
0951338d96 nvme: allow timed-out ios to retry
Currently nvme_req_needs_retry() applies several checks to see if
a retry is allowed. One of those is whether the current time has exceeded
the start time of the io plus the timeout length. If an io times out,
this check means a retry is never allowed for the io, which means
applications see the io failure.

Remove this check and allow the io to time out, as it does on other
protocols, and retries to be made.

On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will timeout, which causes the transport to escalate into creating
a new association, but the io that timed out, due to this retry logic, has
already failed back to the application and things are hosed.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
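A userspace model of the retry decision after this change, loosely following the checks the message describes. struct req_model and req_needs_retry() are hypothetical, and the DNR value 0x4000 is an assumption for illustration.

```c
#include <stdbool.h>

#define SC_DNR 0x4000u  /* assumed Do Not Retry flag value */

struct req_model {
    bool noretry;          /* request flagged "fail fast" */
    unsigned int status;   /* NVMe status, may include SC_DNR */
    unsigned int retries;  /* retries already made */
    /* No start-time field: the removed check compared (now - start)
     * against the timeout, which is always true for an io that has
     * just timed out, so such an io could never be retried. */
};

static bool req_needs_retry(const struct req_model *r,
                            unsigned int max_retries)
{
    if (r->noretry)
        return false;
    if (r->status & SC_DNR)
        return false;
    if (r->retries >= max_retries)
        return false;
    return true;
}
```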
James Smart
cd48282cc7 nvme: stop aer posting if controller state not live
If an nvme async_event command completes, in most cases, a new
async event is posted. However, if the controller enters a
resetting or reconnecting state, there is nothing to block the
scheduled work element from posting the async event again. Nor are
there calls from the transport to stop async events when an
association dies.

In the case of FC, where the association is torn down, the aer must
be aborted on the FC link and completes through the normal job
completion path. Thus the terminated async event ends up being
rescheduled even though the controller isn't in a valid state for
the aer, and the reposting gets the transport into a partially torn
down data structure.

It's possible to hit the scenario on rdma, although much less likely,
due to an aer completing right as the association is terminated; and
as the association teardown reclaims the blk requests via
nvme_cancel_request(), it's immediate, not a link-related action
like on FC.

Fix by putting controller state checks in both the async event
completion routine where it schedules the async event and in the
async event work routine before it calls into the transport. It's
effectively a "stop_async_events()" behavior.  The transport, when
it creates a new association with the subsystem will transition
the state back to live and is already restarting the async event
posting.

Signed-off-by: James Smart <james.smart@broadcom.com>
[hch: remove taking a lock over reading the controller state]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
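A userspace sketch of the two state checks described above: one in the completion path before rescheduling, and one in the work routine before calling into the transport. struct ctrl_model, aer_completed() and aer_work() are hypothetical names.

```c
#include <stdbool.h>

enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RESETTING,
                  CTRL_RECONNECTING, CTRL_DELETING };

struct ctrl_model {
    enum ctrl_state state;
    int aers_posted;          /* async events handed to the transport */
    bool aer_work_scheduled;
};

/* Completion path: only reschedule the async event work while live. */
static void aer_completed(struct ctrl_model *c)
{
    if (c->state == CTRL_LIVE)
        c->aer_work_scheduled = true;
}

/* Work routine: re-check the state before calling into the transport,
 * because it may have changed after the work was scheduled. */
static void aer_work(struct ctrl_model *c)
{
    c->aer_work_scheduled = false;
    if (c->state != CTRL_LIVE)
        return;
    c->aers_posted++;
}
```

The second check matters because the association can be torn down between scheduling the work and running it.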
Keith Busch
d087747384 nvme-pci: Print invalid SGL only once
The WARN_ONCE macro returns true if the condition is true, not if the
warn was raised, so we're printing the scatter list every time it's
invalid. This is excessive and makes debugging harder, so this patch
prints it just once.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
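A userspace model of the pitfall: a WARN_ONCE-like helper warns at most once but returns the condition every time, so using its return value to gate a dump repeats the dump. The fixed variant gives the dump its own once-guard. All names here are hypothetical; this is not the kernel macro.

```c
#include <stdbool.h>
#include <stdio.h>

/* Emits the warning at most once, but returns 'cond' on every call. */
static bool warn_once_model(bool cond, bool *warned)
{
    if (cond && !*warned) {
        *warned = true;
        fprintf(stderr, "warning: invalid SGL\n");
    }
    return cond;
}

/* Buggy pattern: dumps the list on every invalid request. */
static void check_buggy(bool invalid, bool *warned, int *dumps)
{
    if (warn_once_model(invalid, warned))
        (*dumps)++;
}

/* Fixed pattern: the dump is guarded by its own once-flag. */
static void check_fixed(bool invalid, bool *warned, bool *dumped,
                        int *dumps)
{
    warn_once_model(invalid, warned);
    if (invalid && !*dumped) {
        *dumped = true;
        (*dumps)++;
    }
}
```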
Keith Busch
161b8be2bd nvme-pci: initialize queue memory before interrupts
A spurious interrupt before the nvme driver has initialized the completion
queue may inadvertently cause the driver to believe it has a completion
to process. This may result in a NULL dereference since the nvmeq's tags
are not set at this point.

The patch initializes the host's CQ memory so that a spurious interrupt
isn't mistaken for a real completion.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
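Why zeroed CQ memory makes a spurious interrupt harmless: the host treats an entry as new only when its phase tag (bit 0 of the status field) equals the phase it expects, which starts at 1. This minimal model uses hypothetical names and a 16-entry queue.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Only the status word matters here; bit 0 is the phase tag. */
struct cqe_model {
    uint16_t status;
};

struct cq_model {
    struct cqe_model entries[16];
    unsigned int head;
    uint8_t phase;  /* phase expected for the next new entry */
};

static void cq_init(struct cq_model *cq)
{
    /* Zeroing leaves every phase tag at 0 while the host expects 1,
     * so nothing looks like a completion until the controller posts. */
    memset(cq->entries, 0, sizeof(cq->entries));
    cq->head = 0;
    cq->phase = 1;
}

static bool cqe_pending(const struct cq_model *cq)
{
    return (cq->entries[cq->head].status & 1) == cq->phase;
}
```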
James Smart
d9d34c0b23 nvme-fc: use transport-specific sgl format
Sync with NVM Express spec change and FC-NVME 1.18.

FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.

Removed the WARN_ONs on the PRP fields. They are unneeded. They were
meant to catch values from the upper layer that weren't set right, and
for the most part were fine. However, async events reuse the same
structure, and when one was issued a second time, the SGL overlay had
already converted those fields to the Transport SGL values, so the
WARN_ONs fired erroneously.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
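How the descriptor type and subtype combine into the SGL identifier byte: the type occupies bits 7:4 and the subtype bits 3:0. The macro names are hypothetical. Type 0x5 for Transport SGL Data Block is taken from the NVMe SGL descriptor type table as we understand it, and 0xA is the transport-specific subtype named above.

```c
#include <stdint.h>

#define SGL_TYPE_DATA_BLOCK      0x0u
#define SGL_TYPE_TRANSPORT_DATA  0x5u  /* Transport SGL Data Block */
#define SGL_SUBTYPE_ADDRESS      0x0u
#define SGL_SUBTYPE_TRANSPORT_A  0xAu  /* transport-specific value */

/* Type in bits 7:4, subtype in bits 3:0. */
static uint8_t sgl_identifier(uint8_t type, uint8_t subtype)
{
    return (uint8_t)(((type & 0xFu) << 4) | (subtype & 0xFu));
}
```

For FC this gives an identifier byte of 0x5A.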
James Smart
56b7103a06 nvme-fc: remove use of FC-specific error codes
The FC-NVME transport used the FC-specific error codes in cases where
it had to fabricate an error to go back up stack. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25 08:56:05 -06:00
Christoph Hellwig
044a9df1a7 nvme-pci: implement the HMB entry number and size limitations
Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size
and Host Memory Maximum Descriptors Entries fields that were added in
TP 4002 HMB Enhancements.  These allow the controller to advertise
limits for the maximum number of segments in the host memory buffer, as
well as a minimum usable per-segment size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2017-09-11 12:29:40 -04:00
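A sketch of how the two limits could constrain an allocation plan. It assumes the minimum entry size is reported in 4 KiB units, that a value of 0 in either field means "no limit", and that the chunk size never drops below a page. struct hmb_limits and both helpers are hypothetical.

```c
#include <stdint.h>

struct hmb_limits {
    uint32_t hmminds;  /* min descriptor entry size, 4 KiB units */
    uint16_t hmmaxd;   /* max number of descriptor entries */
};

/* Smallest chunk size worth allocating. */
static uint64_t hmb_min_chunk(const struct hmb_limits *l,
                              uint64_t page_size)
{
    uint64_t min = (uint64_t)l->hmminds * 4096;

    return min > page_size ? min : page_size;
}

/* Cap the number of descriptor entries the host may use. */
static uint32_t hmb_max_entries(const struct hmb_limits *l,
                                uint32_t host_max)
{
    if (l->hmmaxd && l->hmmaxd < host_max)
        return l->hmmaxd;
    return host_max;
}
```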
Christoph Hellwig
9620cfba97 nvme-pci: propagate (some) errors from host memory buffer setup
We want to catch command execution errors when resetting the device, so
propagate errors from the Set Features when setting up the host memory
buffer.  We keep ignoring memory allocation failures, as the spec
clearly says that the controller must work without a host memory buffer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
2017-09-11 12:29:39 -04:00
Akinobu Mita
30f92d62e5 nvme-pci: use appropriate initial chunk size for HMB allocation
The initial chunk size for host memory buffer allocation is currently
PAGE_SIZE << MAX_ORDER.  An allocation of order MAX_ORDER usually fails
without CONFIG_DMA_CMA, so the HMB allocation is generally retried with
chunk size PAGE_SIZE << (MAX_ORDER - 1). This causes no problem as long
as the retry allocation works correctly.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
[hch: rebased]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
2017-09-11 12:29:38 -04:00
Christoph Hellwig
92dc689563 nvme-pci: fix host memory buffer allocation fallback
nvme_alloc_host_mem currently contains two loops that are intertwined,
and the outer retry loop turns out to be broken.  Fix this by untangling
the two.

Based on a report and an initial patch from Akinobu Mita.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Akinobu Mita <akinobu.mita@gmail.com>
Tested-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
2017-09-11 12:29:37 -04:00
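A sketch of the untangled shape: only the outer loop walks the chunk size down by halving, and each attempt (including any per-chunk inner loop) is hidden behind an allocation hook. hmb_alloc_fallback(), hmb_alloc_fn and the simulated allocator are hypothetical. The simulated allocator fails every chunk larger than 2 MiB.

```c
#include <stdint.h>

/* Hook: tries to allocate 'size' bytes as chunks of 'chunk' bytes and
 * returns how many bytes it actually obtained. */
typedef uint64_t (*hmb_alloc_fn)(uint64_t size, uint64_t chunk);

/* Returns the chunk size that produced at least 'min_total' bytes,
 * or 0 if no chunk size down to 'min_chunk' worked. */
static uint64_t hmb_alloc_fallback(uint64_t preferred, uint64_t min_total,
                                   uint64_t initial_chunk,
                                   uint64_t min_chunk, hmb_alloc_fn alloc)
{
    if (min_chunk == 0)  /* would loop forever once chunk reaches 0 */
        return 0;
    for (uint64_t chunk = initial_chunk; chunk >= min_chunk; chunk /= 2) {
        if (alloc(preferred, chunk) >= min_total)
            return chunk;
    }
    return 0;
}

/* Simulated allocator for the example. */
static uint64_t sim_alloc(uint64_t size, uint64_t chunk)
{
    return chunk <= (2u << 20) ? size : 0;
}
```

Starting at 8 MiB, the loop fails at 8 MiB and 4 MiB and succeeds at 2 MiB; with a 4 MiB floor it gives up.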