Commit Graph

2307 Commits

Author SHA1 Message Date
Christoph Hellwig
a5d737f100 nvme: factor out a nvme_ns_ioctl helper
Factor out a helper for the namespace based ioctls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2021-04-15 08:12:54 +02:00
Christoph Hellwig
d7790d3739 nvme: pass a user pointer to nvme_nvm_ioctl
Pass the proper user pointer instead of the not all that useful integer
representation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
2021-04-15 08:12:54 +02:00
Christoph Hellwig
9953ab0c5a nvme: cleanup setting the disk name
Return false from nvme_set_disk_name and let the caller set the
non-multipath name instead of duplicating the naming information in two
places.  Also remove the pointless local variables for the disk name
and flags and the not needed ctrl argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
2021-04-15 08:12:54 +02:00
Minwoo Im
3089738868 nvme: add a nvme_ns_head_multipath helper
Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add
a new nvme_ns_head_multipath that uses it to check if a ns_head has
a multipath device associated with it.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: added the IS_ENABLED, converted a few existing users]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2021-04-15 08:12:54 +02:00
Niklas Cassel
95d54bd1a4 nvme: remove single trailing whitespace
There is a single trailing whitespace in core.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:54 +02:00
Niklas Cassel
e234f1f8bb nvme-multipath: remove single trailing whitespace
There is a single trailing whitespace in multipath.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:54 +02:00
Niklas Cassel
53dc180e7c nvme-pci: remove single trailing whitespace
There is a single trailing whitespace in pci.c.
Since this is just a single whitespace, the chances of this affecting
backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:54 +02:00
Niklas Cassel
e51183be1f nvme-pci: don't simple map sgl when sgls are disabled
According to the module parameter description for sgl_threshold,
a value of 0 means that SGLs are disabled.

If SGLs are disabled, we should respect that, even for the case
where the request is made up of a single physical segment.

Fixes: 297910571f ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:53 +02:00
Colin Ian King
ccc1003b5b nvmet: fix a spelling mistake "nubmer" -> "number"
There is a spelling mistake in a pr_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:53 +02:00
Amit Engel
0d8ddeea11 nvmet-fc: simplify nvmet_fc_alloc_hostport
Once a host is already created, avoid allocate additional hostports that
will be thrown away. add an helper function to handle host search.

Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Amit Engel <amit.engel@dell.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:53 +02:00
Elad Grupi
bdaf132791 nvmet-tcp: fix a segmentation fault during io parsing error
In case there is an io that contains inline data and it goes to
parsing error flow, command response will free command and iov
before clearing the data on the socket buffer.
This will delay the command response until receive flow is completed.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-15 08:12:50 +02:00
Chaitanya Kulkarni
327e1d2957 lightnvm: use kobj_to_dev()
This fixs coccicheck warning:

drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for
kobj_to_dev()

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-04-13 09:16:12 -06:00
Bart Van Assche
8609c63fce nvme: fix handling of large MDTS values
Instead of triggering an integer overflow and undefined behavior if MDTS is
large, set max_hw_sectors to UINT_MAX.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: rebased to account for the new nvme_mps_to_sectors helper]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:39 +02:00
Keith Busch
5befc7c26e nvme: implement non-mdts command limits
Commands that access LBA contents without a data transfer between the
host historically have not had a spec defined upper limit. The driver
set the queue constraints for such commands to the max data transfer
size just to be safe, but this artificial constraint frequently limits
devices below their capabilities.

The NVMe Workgroup ratified TP4040 defines how a controller may
advertise their non-MDTS limits. Use these if provided and default to
the current constraints if not. Since the Dataset Management command
limits are defined in logical blocks, but without a namespace to tell us
the logical block size, the code defaults to the safe 512b size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:39 +02:00
Niklas Cassel
c881a23fb6 nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev
When a passthru command targets a specific namespace, the ns parameter to
nvme_user_cmd()/nvme_user_cmd64() is set. However, there is currently no
validation that the nsid specified in the passthru command targets the
namespace/nsid represented by the block device that the ioctl was
performed on.

Add a check that validates that the nsid in the passthru command matches
that of the supplied namespace.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Javier González <javier@javigon.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:38 +02:00
Hannes Reinecke
dd8f7fa908 nvme: retrigger ANA log update if group descriptor isn't found
If ANA is enabled but no ANA group descriptor is found when creating
a new namespace the ANA log is most likely out of date, so trigger
a re-read. The namespace will be tagged with the NS_ANA_PENDING flag
to exclude it from path selection until the ANA log has been re-read.

Fixes: 32acab3181 ("nvme: implement multipath access to nvme subsystems")
Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-06 08:34:38 +02:00
Daniel Wagner
09fbed6363 nvme: export fast_io_fail_tmo to sysfs
Commit 8c4dfea97f ("nvme-fabrics: reject I/O to offline device")
introduced fast_io_fail_tmo but didn't export the value to sysfs. The
value can be set during the 'nvme connect'. Export the timeout value
to user space via sysfs to allow runtime configuration.

Cc: Victor Gladkov <Victor.Gladkov@kioxia.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhaani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:29 +02:00
Daniel Wagner
25a64e4e7e nvme: remove superfluous else in nvme_ctrl_loss_tmo_store
If there is an error we will leave the function early. So there
is no need for an else. Remove it.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:29 +02:00
Daniel Wagner
bff4bcf3cf nvme: use sysfs_emit instead of sprintf
sysfs_emit is the recommended API to use for formatting strings to be
returned to user space. It is equivalent to scnprintf and aware of the
PAGE_SIZE buffer size.

Suggested-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:29 +02:00
Max Gurtovoy
8df1bff57c nvme-fc: check sgl supported by target
SGLs support is mandatory for NVMe/FC, make sure that the target is
aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:29 +02:00
Max Gurtovoy
73ffcefcfc nvme-tcp: check sgl supported by target
SGLs support is mandatory for NVMe/tcp, make sure that the target is
aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:28 +02:00
Wunderlich, Mark
d8e7b462f5 nvmet-tcp: enable optional queue idle period tracking
Add 'idle_poll_period_usecs' option used by io_work() to support
network devices enabled with advanced interrupt moderation
supporting a relaxed interrupt model. It was discovered that
such a NIC used on the target was unable to support initiator
connection establishment, caused by the existing io_work()
flow that immediately exits after a loop with no activity and
does not re-queue itself.

With this new option a queue is assigned a period of time
that no activity must occur in order to become 'idle'.  Until
the queue is idle the work item is requeued.

The new module option is defined as changeable making it
flexible for testing purposes.

The pre-existing legacy behavior is preserved when no module option
for idle_poll_period_usecs is specified.

Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:28 +02:00
Sagi Grimberg
b5332a9f3f nvmet-tcp: fix incorrect locking in state_change sk callback
We are not changing anything in the TCP connection state so
we should not take a write_lock but rather a read lock.

This caused a deadlock when running nvmet-tcp and nvme-tcp
on the same system, where state_change callbacks on the
host and on the controller side have causal relationship
and made lockdep report on this with blktests:

================================
WARNING: inconsistent lock state
5.12.0-rc3 #1 Tainted: G          I
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-R} usage.
nvme/1324 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888363151000 (clock-AF_INET){++-?}-{2:2}, at: nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
{IN-SOFTIRQ-W} state was registered at:
  __lock_acquire+0x79b/0x18d0
  lock_acquire+0x1ca/0x480
  _raw_write_lock_bh+0x39/0x80
  nvmet_tcp_state_change+0x21/0x170 [nvmet_tcp]
  tcp_fin+0x2a8/0x780
  tcp_data_queue+0xf94/0x1f20
  tcp_rcv_established+0x6ba/0x1f00
  tcp_v4_do_rcv+0x502/0x760
  tcp_v4_rcv+0x257e/0x3430
  ip_protocol_deliver_rcu+0x69/0x6a0
  ip_local_deliver_finish+0x1e2/0x2f0
  ip_local_deliver+0x1a2/0x420
  ip_rcv+0x4fb/0x6b0
  __netif_receive_skb_one_core+0x162/0x1b0
  process_backlog+0x1ff/0x770
  __napi_poll.constprop.0+0xa9/0x5c0
  net_rx_action+0x7b3/0xb30
  __do_softirq+0x1f0/0x940
  do_softirq+0xa1/0xd0
  __local_bh_enable_ip+0xd8/0x100
  ip_finish_output2+0x6b7/0x18a0
  __ip_queue_xmit+0x706/0x1aa0
  __tcp_transmit_skb+0x2068/0x2e20
  tcp_write_xmit+0xc9e/0x2bb0
  __tcp_push_pending_frames+0x92/0x310
  inet_shutdown+0x158/0x300
  __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
  nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
  nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
  nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
  nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
  kernfs_fop_write_iter+0x2c7/0x460
  new_sync_write+0x36c/0x610
  vfs_write+0x5c0/0x870
  ksys_write+0xf9/0x1d0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 10687
hardirqs last  enabled at (10687): [<ffffffff9ec376bd>] _raw_spin_unlock_irqrestore+0x2d/0x40
hardirqs last disabled at (10686): [<ffffffff9ec374d8>] _raw_spin_lock_irqsave+0x68/0x90
softirqs last  enabled at (10684): [<ffffffff9f000608>] __do_softirq+0x608/0x940
softirqs last disabled at (10649): [<ffffffff9cdedd31>] do_softirq+0xa1/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(clock-AF_INET);
  <Interrupt>
    lock(clock-AF_INET);

 *** DEADLOCK ***

5 locks held by nvme/1324:
 #0: ffff8884a01fe470 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
 #1: ffff8886e435c090 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
 #2: ffff888104d90c38 (kn->active#255){++++}-{0:0}, at: kernfs_remove_self+0x22d/0x330
 #3: ffff8884634538d0 (&queue->queue_lock){+.+.}-{3:3}, at: nvme_tcp_stop_queue+0x52/0xb0 [nvme_tcp]
 #4: ffff888363150d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x59/0x300

stack backtrace:
CPU: 26 PID: 1324 Comm: nvme Tainted: G          I       5.12.0-rc3 #1
Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
Call Trace:
 dump_stack+0x93/0xc2
 mark_lock_irq.cold+0x2c/0xb3
 ? verify_lock_unused+0x390/0x390
 ? stack_trace_consume_entry+0x160/0x160
 ? lock_downgrade+0x100/0x100
 ? save_trace+0x88/0x5e0
 ? _raw_spin_unlock_irqrestore+0x2d/0x40
 mark_lock+0x530/0x1470
 ? mark_lock_irq+0x1d10/0x1d10
 ? enqueue_timer+0x660/0x660
 mark_usage+0x215/0x2a0
 __lock_acquire+0x79b/0x18d0
 ? tcp_schedule_loss_probe.part.0+0x38c/0x520
 lock_acquire+0x1ca/0x480
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? rcu_read_unlock+0x40/0x40
 ? tcp_mtu_probe+0x1ae0/0x1ae0
 ? kmalloc_reserve+0xa0/0xa0
 ? sysfs_file_ops+0x170/0x170
 _raw_read_lock+0x3d/0xa0
 ? nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 nvme_tcp_state_change+0x21/0x150 [nvme_tcp]
 ? sysfs_file_ops+0x170/0x170
 inet_shutdown+0x189/0x300
 __nvme_tcp_stop_queue+0x36/0x270 [nvme_tcp]
 nvme_tcp_stop_queue+0x87/0xb0 [nvme_tcp]
 nvme_tcp_teardown_admin_queue+0x69/0xe0 [nvme_tcp]
 nvme_do_delete_ctrl+0x100/0x10c [nvme_core]
 nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
 kernfs_fop_write_iter+0x2c7/0x460
 new_sync_write+0x36c/0x610
 ? new_sync_read+0x600/0x600
 ? lock_acquire+0x1ca/0x480
 ? rcu_read_unlock+0x40/0x40
 ? lock_is_held_type+0x9a/0x110
 vfs_write+0x5c0/0x870
 ksys_write+0xf9/0x1d0
 ? __ia32_sys_read+0xa0/0xa0
 ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
 ? syscall_enter_from_user_mode+0x27/0x70
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:28 +02:00
Sagi Grimberg
8b73b45d54 nvme-tcp: block BH in sk state_change sk callback
The TCP stack can run from process context for a long time
so we should disable BH here.

Fixes: 3f2304f8c6 ("nvme-tcp: add NVMe over TCP host driver")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:28 +02:00
Hou Pu
79695dcd9a nvmet: return proper error code from discovery ctrl
Return NVME_SC_INVALID_FIELD from discovery controller like normal
controller when executing identify or get log page command.

Signed-off-by: Hou Pu <houpu.main@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:28 +02:00
Keith Busch
ed4a854b06 nvme: warn of unhandled effects only once
We don't need to repeatedly spam the kernel logs with the same warning
about unhandled passthrough IO effects. Just one warning is sufficient
to observe this condition occurs.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:28 +02:00
Keith Busch
f4b9e6c90c nvme: use driver pdu command for passthrough
All nvme transport drivers preallocate an nvme command for each request.
Assume to use that command for nvme_setup_cmd() instead of requiring
drivers pass a pointer to it. All nvme drivers must initialize the
generic nvme_request 'cmd' to point to the transport's preallocated
nvme_command.

The generic nvme_request cmd pointer had previously been used only as a
temporary copy for passthrough commands. Since it now points to the
command that gets dispatched, passthrough commands must directly set it
up prior to executing the request.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Keith Busch
af7fae857e nvme-pci: allocate nvme_command within driver pdu
Except for pci, all the nvme transport drivers allocate a command within
the driver's pdu. Align pci with everyone else by allocating the nvme
command within pci's pdu and replace the .queue_rq() stack variable with
this.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Noam Gottlieb
48b4c010c8 nvmet: do not allow model_number exceed 40 bytes
According to the NVM specifications, the model number size should be
40 bytes (bytes 63:24 of the Identify Controller data structure).
Therefore, any attempt to store a value into model_number which
exceeds 40 bytes should return an error.

Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Chaitanya Kulkarni
de5878048e nvmet: remove unnecessary ctrl parameter
The function nvmet_ctrl_find_get() accepts out pointer to nvmet_ctrl
structure. This function returns the same error value from two places
that is :- NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR.

Move this to the caller so we can change the return type to nvmet_ctrl.

Now that we can changed the return type, instead of taking out pointer
to the nvmet_ctrl structure remove that function parameter and return
the valid nvmet_ctrl pointer on success and NULL on failure.

Also, add and rename the goto labels for more readability with comments.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Chaitanya Kulkarni
b53d47418d nvmet-fc: update function documentation
Add minimum description of the hosthandle parameter for
nvmet_fc_rcv_ls_req() so that we can get rid of the following warning.

drivers/nvme//target/fc.c:2009: warning: Function parameter or member 'hosthandle' not described in 'nvmet_fc_rcv_ls_req

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Chaitanya Kulkarni
2afc4866c4 nvme-fc: fix the function documentation comment
The nvme_fc_rcv_ls_req() function has first argument as pointer to
remoteport named portprt, but in the documentation comment that is name
is used as remoteport. Fix that to get rid if the compilation warning.

drivers/nvme//host/fc.c:1724: warning: Function parameter or member 'portptr' not described in 'nvme_fc_rcv_ls_req'
drivers/nvme//host/fc.c:1724: warning: Excess function parameter 'remoteport' description in 'nvme_fc_rcv_ls_req'

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by:  James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Chaitanya Kulkarni
f1c772d581 nvme: add new line after variable declatation
Add a new line in functions nvme_pr_preempt(), nvme_pr_clear(), and
nvme_pr_release() after variable declaration which follows the rest of
the code in the nvme/host/core.c.

No functional change(s) in this patch.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:26 +02:00
Chaitanya Kulkarni
c03fd85de2 nvme: don't check nvme_req flags for new req
nvme_clear_request() has a check for flag REQ_DONTPREP and it is called
from nvme_init_request() and nvme_setuo_cmd().

The function nvme_init_request() is called from nvme_alloc_request()
and nvme_alloc_request_qid(). From these two callers new request is
allocated everytime. For newly allocated request RQF_DONTPREP is never
set. Since after getting a tag, block layer sets the req->rq_flags == 0
and never sets the REQ_DONTPREP when returning the request :-

nvme_alloc_request()
	blk_mq_alloc_request()
		blk_mq_rq_ctx_init()
			rq->rq_flags = 0 <----

nvme_alloc_request_qid()
	blk_mq_alloc_request_hctx()
		blk_mq_rq_ctx_init()
			rq->rq_flags = 0 <----

The block layer does set req->rq_flags but REQ_DONTPREP is not one of
them and that is set by the driver.

That means we can unconditinally set the REQ_DONTPREP value to the
rq->rq_flags when nvme_init_request()->nvme_clear_request() is called
from above two callers.

Move the check for REQ_DONTPREP from nvme_clear_nvme_request() into
nvme_setup_cmd().

This is needed since nvme_alloc_request() now gets called from fast
path when NVMeOF target is configured with passthru backend to avoid
unnecessary checks in the fast path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:26 +02:00
Chaitanya Kulkarni
7a36604668 nvme: mark nvme_setup_passsthru() inline
Since nvmet_setup_passthru() function falls in fast path when called
from the NVMeOF passthru backend, make it inline.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:26 +02:00
Chaitanya Kulkarni
44ef5611c2 nvme: split init identify into helper
The function nvme_init_ctrl_finish() (formerly nvme_init_identify()) has
grown over the period of time about ~200 lines given the size of nvme id
ctrl data structure.

Move the nvme_id_ctrl data structure related initilzation into helper
nvme_init_identify() and call it from nvme_init_ctrl_finish().

When we move the code into nvme_init_identify() change the local
variable i from int to unsigned int and remove the duplicate kfree()
after nvme_mpath_init() and jump to the label out_free if
nvme_mpath_ini() fails.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:26 +02:00
Chaitanya Kulkarni
f21c4769d0 nvme: rename nvme_init_identify()
This is a prep patch so that we can move the identify data structure
related code initialization from nvme_init_identify() into a helper.

Rename the function nvmet_init_identify() to nvmet_init_ctrl_finish().

Next patch will move the nvme_id_ctrl related initialization from newly
renamed function nvme_init_ctrl_finish() into the nvme_init_identify()
helper.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:26 +02:00
Kanchan Joshi
18479ddb7f nvme: reduce checks for zero command effects
For passthrough I/O commands, effects are usually to be zero.
nvme_passthrough_end() does three checks in futility for this case.
Bail out of function-call/checks.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:26 +02:00
Kanchan Joshi
2bd643079e nvme: use NVME_CTRL_CMIC_ANA macro
Use the proper macro instead of hard-coded value.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Chaitanya Kulkarni
75b5f9edb5 nvmet: replace white spaces with tabs
Instead of the using the whitespaces use tab spacing in the
nvmet_execute_identify_ns().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Chaitanya Kulkarni
7798df6fcf nvmet: remove an unnecessary function parameter to nvmet_check_ctrl_status
In nvmet_check_ctrl_status() cmd can be derived from nvmet_req. Remove
the local variable cmd in the nvmet_check_ctrl_status() and function
parameter cmd for nvmet_check_ctrl_status(). Derive the cmd value from
req parameter in the nvmet_check_ctrl_status().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Chaitanya Kulkarni
a56f14c26d nvmet: update error log page in nvmet_alloc_ctrl()
Instead of updating the error log page in the caller of the
nvmet_alloc_ctrt() update the error log page in the nvmet_alloc_ctrl().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Chaitanya Kulkarni
76affbe6d6 nvmet: remove a duplicate status assignment in nvmet_alloc_ctrl
In the function nvmet_alloc_ctrl() we assign status value before we
call nvmet_fine_get_subsys() to:

	status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;

After we successfully find the subsystem we again set the status value
to:

	status = NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR;

Remove the duplicate status assignment value.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Chaitanya Kulkarni
05fae499a9 nvme-pci: cleanup nvme_irq()
Get rid of a local variable that is not needed and just return the
status directly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Chaitanya Kulkarni
e9c78c2335 nvme-pci: remove the barriers in nvme_irq()
The barriers were added to the nvme_irq() in commit 3a7afd8ee4
("nvme-pci: remove the CQ lock for interrupt driven queues") to prevent
compiler from doing memory optimization for the variabes that were
protected previously by spinlock in nvme_irq() at completion queue
processing and with queue head check condition.

The variable nvmeq->last_cq_head from those checks was removed in the
commit f6c4d97b0d ("nvme/pci: Remove last_cq_head") that was not
allwing poll queues from mistakenly triggering the spurious interrupt
detection.

Remove the barriers which were protecting the updates to the variables.

Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:24 +02:00
Christoph Hellwig
f4f9fc29e5 nvme: fix the nsid value to print in nvme_validate_or_alloc_ns
ns can be NULL at this point, and my move of the check from
the original patch by Chaitanya broke this.

Fixes: 0ec84df495 ("nvme-core: check ctrl css before setting up zns")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-12 13:17:45 -07:00
Jens Axboe
d4b64fd702 nvme fixes for 5.12:
- one more quirk (Dmitry Monakhov)
  - fix max_zone_append_sectors initialization (Chaitanya Kulkarni)
  - nvme-fc reset/create race fix (James Smart)
  - fix status code on aborts/resets (Hannes Reinecke)
  - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni)
  - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmBLMlwLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNVLA//aay9+Jq/BTwXG69249PshsvGpw5EVpaNp5YIwxyM
 ehC7AW5vm+1wg3b8X1O+sRhJ+PSKs3hbzUxoVWsGwj4RxJB0OPRnthjtcaGz2Ly3
 5q/IGlA6Qxk0cHLRqTg+l3n4d1ctPpt+wS2IcTJpgh8Vmqj+/H3+jD26UjhvLySr
 I9zgtl27STC8nwQwWYOXvbLql3eRzilZNd4qydnZIlcj/+CW3MGehiFlaVKsUQP1
 FJwx8LVHNU3V6n21eT99Xa9Lu/vWRjPQEyEBLV9iZ/dD2JVS1Q4TI/hwtTjFLZvi
 CVw1uu8HfSlYE/lA4wclPfljdZLUt2chjVG5S/lqJJmmoy7J6xf4b0hNI2yLGjft
 yKZB+vi+nKKoj3OLiy+LpopXSb23BMDlis1EqL/rXtU1JXsonDcsTAdscACvTHTL
 2aVbyHQ2hvNOs6L7GkHGEwu7nNE31tG7+DOXnjHwfq3j2mTnHu2AniAF9llG1NuC
 nbIQZQxRDdakknJsZU8YWG3rkOFUDLs9z2V3O5bs8KlTgYTkCut0Vtgdx2vZ2iVF
 jTmbEuFyK6pcWgFEFk6IP4LawJPNpozo8NQZGMUOp3ym8CXm/NHuxj1R3w8LcAH6
 OvmQWo6AcUNyaXNhVNPADqmnY2ZDjYC6/KWoMrg/YESCYXqfpMvXfArVbyYoU9yD
 61U=
 =LkdO
 -----END PGP SIGNATURE-----

Merge tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme into block-5.12

Pull NVMe fixes from Christoph:

"nvme fixes for 5.12:

 - one more quirk (Dmitry Monakhov)
 - fix max_zone_append_sectors initialization (Chaitanya Kulkarni)
 - nvme-fc reset/create race fix (James Smart)
 - fix status code on aborts/resets (Hannes Reinecke)
 - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni)
 - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong)"

* tag 'nvme-5.12-2021-03-12' of git://git.infradead.org/nvme:
  nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
  nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
  nvme-core: check ctrl css before setting up zns
  nvme-fc: fix racing controller reset and create association
  nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted
  nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()
  nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request()
  nvme: simplify error logic in nvme_validate_ns()
  nvme: set max_zone_append_sectors nvme_revalidate_zones
2021-03-12 07:21:15 -07:00
Christoph Hellwig
a8affc03a9 block: rename BIO_MAX_PAGES to BIO_MAX_VECS
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed.  Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11 07:47:48 -07:00
Dmitry Monakhov
abbb5f5929 nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a
This adds a quirk for Samsung PM1725a drive which fixes timeouts and
I/O errors due to the fact that the controller does not properly
handle the Write Zeroes command, dmesg log:

nvme nvme0: I/O 528 QID 10 timeout, aborting
nvme nvme0: I/O 529 QID 10 timeout, aborting
nvme nvme0: I/O 530 QID 10 timeout, aborting
nvme nvme0: I/O 531 QID 10 timeout, aborting
nvme nvme0: I/O 532 QID 10 timeout, aborting
nvme nvme0: I/O 533 QID 10 timeout, aborting
nvme nvme0: I/O 534 QID 10 timeout, aborting
nvme nvme0: I/O 535 QID 10 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 528 QID 10 timeout, reset controller
nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
nvme nvme0: Device not ready; aborting reset, CSTS=0x3
nvme nvme0: Device not ready; aborting reset, CSTS=0x3
nvme nvme0: Removing after probe failure status: -19
nvme0n1: detected capacity change from 6251233968 to 0
blk_update_request: I/O error, dev nvme0n1, sector 32776 op 0x1:(WRITE) flags 0x3000 phys_seg 6 prio class 0
blk_update_request: I/O error, dev nvme0n1, sector 113319936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 1, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113319680 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 2, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113319424 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 3, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113319168 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 4, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113318912 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 5, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113318656 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
Buffer I/O error on dev nvme0n1p2, logical block 6, lost async page write
blk_update_request: I/O error, dev nvme0n1, sector 113318400 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
blk_update_request: I/O error, dev nvme0n1, sector 113318144 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
blk_update_request: I/O error, dev nvme0n1, sector 113317888 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0

Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:54 +01:00
Lv Yunlong
abec6561fc nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done
In nvmet_rdma_write_data_done, rsp is recoverd by wc->wr_cqe and freed by
nvmet_rdma_release_rsp(). But after that, pr_info() used the freed
chunk's member object and could leak the freed chunk address with
wc->wr_cqe by computing the offset.

Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-11 11:48:35 +01:00