Only read and write commands need DIF remapping. Everything else uses
a passthrough integrity payload.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Keep the metadata code in a separate helper instead of making the
main function more complicated.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The mutex protects against the list of transports changing while a
controller is being created, but using a plain old mutex means that it
also serializes controller creation. This unnecessarily slows down
creating multiple controllers - for example for the RDMA transport,
creating a controller involves establishing one connection for every IO
queue, which involves even more network/software round trips, so the
delay can become significant.
The simplest way to fix this is to change the mutex to an rwsem and only
hold it for writing when the list is being mutated. Since we can take
the rwsem for reading while creating a controller, we can create multiple
controllers in parallel.
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Instead validate that these identifiers do not change, as that is
prohibited by the specification.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
The function is used in two places, and the shared code for those will
diverge later in this series.
Instead factor out a new helper to get the ids for a namespace, simplify
the calling conventions for nvme_identify_ns and just open code the
sequence.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
And move the flags for the flags field near that field while touching
this area.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
If an NVMe controller reports RTD3 Entry Latency larger than
shutdown_timeout, up to a maximum of 60 seconds, use that value to set
the shutdown timer. Otherwise fall back to the module parameter which
defaults to 5 seconds.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
[hch: removed do_div, made transition time local scope]
Signed-off-by: Christoph Hellwig <hch@lst.de>
The value of iod->first_dma ends up as prp2 in NVMe commands. In case
there is not enough data to cross a page boundary, iod->first_dma is
never initialized and contains random data.
Comply with the NVMe specification and fill in 0 in that case.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch slightly improves performance (mainly for small block sizes).
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This changes the earlier patch "nvmet: don't report 0-bytes
in serial number" to use the memcpy_and_pad() helper introduced
in a previous patch.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The existing nvmet_fc sg list handling has 2 faults:
a) the request between LLDD and transport has too large of an sg
list (256 elements), which is normally 256k (64 elements).
b) sglist handling doesn't optimize on the fact that each element
is a page.
This patch removes the static sg list in the request and uses the
dynamic list already present in the nvmet_fc transport. It also
simplies the handling of the sg list on multiple sequences to
take advantage of the per-page divisions.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If the LLDD resets or detaches from an fc port, the LLDD will
deregister all remoteports seen by the fc port and deregister the
localport associated with the fc port. The teardown of the localport
structure will be held off due to reference counting until all the
remoteports are removed (and they are held off until all
controllers/associations to terminated). Currently, if the fc port
is reinit/reattached and registered again as a localport it is
treated as an independent entity from the prior localport and all
prior remoteports and controllers cannot be revived. They are
created as new and separate entities.
This patch changes the localport registration to look at the known
localports that are waiting to be torndown. If they are the same port
based on wwn's, the local port is transitioned out of the teardown
state. This allows the remote ports and controller connections to
be reestablished and resumed as long as the localport can also be
reregistered within the timeout windows.
The patch adds a new routine nvme_fc_attach_to_unreg_lport() with
the functionality and moves the lport get/put routines to avoid
forward references.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This helps users to quickly spot the reason of why connection fails
if the hostid is not compliant with the uuid format.
Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
To make the nvme_rdma_configure_admin_queue generic in preparation of
moving it to common code.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
No need to queue an extra work to indirect controller removal, just call the
ctrl remove routine.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This should pair with nvme_rdma_stop_queue. While this is not a complete
inverse, it still pairs up pretty well because in fabrics we don't have a
disconnect capsule (yet) but we simply teardown the transport association.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Give it a name symmetric to nvme_rdma_free_queue. Also pass in the ctrl
sqsize+1 and not the opts queue_size. And suppress a superflous
failure message.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
If we move the queues from LIVE state, we might as well stop them (drain
for rdma). Do it after we stop the request queues to prevent a stray
request sneaking in .queue_rq after we stop the queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Mimic the pci driver as a controller disable might be more lightweight
than a shutdown.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We always pair tagset allocation with rdma device reference and it shares
some code, centralize it with an argument if its an admin or IO tagset.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We will call it from other places so avoid having to forward declare it.
Also move it next to nvme_rdma_destroy_admin_queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
NVME_RDMA_MAX_SEGMENT_SIZE is not used anywhere, zap it.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
nvmf target shall return NVME_SC_CONNECT_INVALID_HOST instead of
the gereal code INVALID_PARAM when the given host nqn is not allowed
to connect. Refer to the 2.2.1 section of the NVMe over Fabrics Spec.
Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
NVME's Timestamp feature allows controllers to be aware of the epoch
time in milliseconds. This patch adds the set features hook for various
transports through the identify path, so that resets and resumes can
update the controller as necessary.
Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
[hch: rebased on top of nvme-4.13 error handling changes,
changed nvme_configure_timestamp to return the status]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Define the constant "0xffffffff" (used as nsid for all namespaces)
as NVME_NSID_ALL.
Signed-off-by: Arnav Dawn <a.dawn@samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
This patch adds support for handling Fw activation without reset
On completion of FW-activation-starting AER, all queues are
paused till CSTS.PP is cleared or timed out (exceeds max time for
fw activtion MTFA). If device fails to clear CSTS.PP within MTFA,
driver issues reset controller.
Signed-off-by: Arnav Dawn <a.dawn@samsung.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).
For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.
Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fixes: 920d13a884 ("nvme-pci: factor out the cqe reading mechanics from __nvme_process_cq")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Only print the specified options that are not recognized, instead
of the whole list of options.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
There were 2 statics introduced that were bogus. Removed the static
designations.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use list_for_each_entry_safe to prevent list handling from referencing
next pointers directly after list_del's
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The merged version of my patch "nvmet: don't report 0-bytes in serial
number" fails to remove two lines which should have been replaced,
so that the space-padded strings are overwritten again with 0-bytes.
Fix it.
Fixes: 42de82a8b5 nvmet: don't report 0-bytes in serial number
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The numd field of directive receive command takes number of dwords to
transfer. This fix has the correct calculation for numd.
Signed-off-by: Kwan (Hingkwan) Huen-SSI <kwan.huen@samsung.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We need to return an error if a timeout occurs on any NVMe command during
initialization. Without this, the nvme reset work will be stuck. A timeout
will have a negative error code, meaning we need to stop initializing
the controller. All postitive returns mean the controller is still usable.
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196325
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: Martin Peres <martin.peres@intel.com>
[jth consolidated cleanup path ]
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Currently we create the sysfs entry even if we fail mapping
it. In that case, the unmapping will not remove the sysfs created
file. There is no good reason to create a sysfs entry for a non
working CMB and show his characteristics.
Fixes: f63572dff ("nvme: unmap CMB and remove sysfs file in reset path")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
At queue creation, the transport allocates a local job struct
(struct nvmet_fc_fcp_iod) for each possible element of the queue.
When a new CMD is received from the wire, a jobs struct is allocated
from the queue and then used for the duration of the command.
The job struct contains buffer space for the wire command iu. Thus,
upon allocation of the job struct, the cmd iu buffer is copied to
the job struct and the LLDD may immediately free/reuse the CMD IU
buffer passed in the call.
However, in some circumstances, due to the packetized nature of FC
and the api of the FC LLDD which may issue a hw command to send the
wire response, but the LLDD may not get the hw completion for the
command and upcall the nvmet_fc layer before a new command may be
asynchronously received on the wire. In other words, its possible
for the initiator to get the response from the wire, thus believe a
command slot free, and send a new command iu. The new command iu
may be received by the LLDD and passed to the transport before the
LLDD had serviced the hw completion and made the teardown calls for
the original job struct. As such, there is no available job struct
available for the new io. E.g. it appears like the host sent more
queue elements than the queue size. It didn't based on it's
understanding.
Rather than treat this as a hard connection failure queue the new
request until the job struct does free up. As the buffer isn't
copied as there's no job struct, a special return value must be
returned to the LLDD to signify to hold off on recycling the cmd
iu buffer. And later, when a job struct is allocated and the
buffer copied, a new LLDD callback is introduced to notify the
LLDD and allow it to recycle it's command iu buffer.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Some broken controllers (such as earlier Linux targets) pad model or
serial fields with 0-bytes rather than spaces. The NVMe spec disallows
0 bytes in "ASCII" fields. Thus strip trailing 0-bytes, too. Also make
sure that we get no underflow for pathological input.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>