Commit Graph

996245 Commits

Author SHA1 Message Date
James Smart
8e9a3250dc scsi: lpfc: Fix use after free in lpfc_els_free_iocb
There are several code paths where the following sequence occurs:

 - An ndlp pointer is assigned to an iocb via a nlp_get()

 - An attempt is made to issue the iocb, but it fails

 - The failure case does a put on the ndlp then calls lpfc_els_free_iocb()

The put may free the ndlp structure, but the els_free_iocb may reference
the now-stale ndlp pointer and cause a crash.

Fix by ensuring that the lpfc_els_free_iocb() occurs before the
lpfc_nlp_put().

While fixing, refactor the code to better ensure this calling sequence.

Link: https://lore.kernel.org/r/20210301171821.3427-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
8dd1c125f7 scsi: lpfc: Fix null pointer dereference in lpfc_prep_els_iocb()
It is possible to call lpfc_issue_els_plogi() passing a did for which no
matching ndlp is found. A call is then made to lpfc_prep_els_iocb() with a
null pointer to a lpfc_nodelist structure resulting in a null pointer
dereference.

Fix by returning an error status if no valid ndlp is found. Fix up comments
regarding ndlp reference counting.

Link: https://lore.kernel.org/r/20210301171821.3427-10-jsmart2021@gmail.com
Fixes: 4430f7fd09 ("scsi: lpfc: Rework locations of ndlp reference taking")
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
ae960d78ec scsi: lpfc: Fix unnecessary null check in lpfc_release_scsi_buf
lpfc_fcp_io_cmd_wqe_cmpl() is intended to mirror
lpfc_nvme_io_cmd_wqe_cmpl() for sli4 fcp completions. When the routine was
added, lpfc_fcp_io_cmd_wqe_cmpl() included a null pointer check for
phba. However, phba is definitely valid, being dereferenced by the calling
routine and used later in the routine itself.

Remove the unnecessary null check.

Link: https://lore.kernel.org/r/20210301171821.3427-9-jsmart2021@gmail.com
Fixes: 96e209be6e ("scsi: lpfc: Convert SCSI I/O completions to SLI-3 and SLI-4 handlers")
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
bd4f510042 scsi: lpfc: Fix pt2pt connection does not recover after LOGO
On a pt2pt setup, between 2 initiators, if one side issues a a LOGO, there
is no relogin attempt. The FC specs are grey in this area on which port
(higher wwn or not) is to re-login.

As there is no spec guidance, unconditionally re-PLOGI after the logout to
ensure a login is re-established.

Link: https://lore.kernel.org/r/20210301171821.3427-8-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
6b6eaf8a53 scsi: lpfc: Fix lpfc_els_retry() possible null pointer dereference
Driver crashed in lpfc_debugfs_disc_trc() due to null ndlp pointer.  In
some calling cases, the ndlp is null and the did is looked up.

Fix by using the local did variable that is set appropriately based on ndlp
value.

Link: https://lore.kernel.org/r/20210301171821.3427-7-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
618e2ee146 scsi: lpfc: Fix FLOGI failure due to accessing a freed node
After an initial successful FLOGI into the switch, if a subsequent FLOGI
fails the driver crashed accessing a node struct. On FLOGI error, the flogi
completion logic triggers the final dereference on the node structure
without checking if it is registered with a backend. The devloss logic is
triggered after node is freed leading to the access of freed node.

Fix by adjusting the error path to not take the final dereferece if there
is an outstanding transport registration. Let the transport devloss call
remove the final reference.

Link: https://lore.kernel.org/r/20210301171821.3427-6-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
2693f5deed scsi: lpfc: Fix stale node accesses on stale RRQ request
Whenever an RRQ needs to be triggered, the DID from the node structure and
node pointer are stored in the RRQ data structure and the RRQ is scheduled
for later transmission. However, at the point in time that the timer
triggers, there's no validation on the node pointer. Reference counters may
have freed the structure. Additionally the DID in the node may no longer be
valid.

Fix by not tracking the node pointer in the RRQ, only the DID. At the time
of the timer expiration, look up the node with the did and if present, send
the RRQ. If no node exists, no need to send the RRQ.

Link: https://lore.kernel.org/r/20210301171821.3427-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
68a6a66c51 scsi: lpfc: Fix reftag generation sizing errors
An LBA is 8 bytes. The driver generates a reftag from the LBA but the
reftag is 4 bytes. Thus scsi_get_lba() could return a value that exceeds
our reftag size.

Fix by converting all the code to calling the common routine
t10_pi_ref_tag() which returns a u32, thus ensuring a consistent 4byte
value.  Also correct a few code lines that access LBA directly and ensure
64-bit data types are used.

Link: https://lore.kernel.org/r/20210301171821.3427-4-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:04 -05:00
James Smart
58c36e80ee scsi: lpfc: Fix vport indices in lpfc_find_vport_by_vpid()
Calls to lpfc_find_vport_by_vpid() for the highest indexed vport fails with
error, "2936 Could not find Vport mapped to vpi XXX".  Our vport indices in
the loop and if-clauses were off by one.

Correct the vpid range used for vpi lookup to include the highest possible
vpid.

Link: https://lore.kernel.org/r/20210301171821.3427-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
James Smart
9302154c07 scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe
The wqe_dbde field indicates whether a Data BDE is present in Words 0:2 and
should therefore should be clear in the abts request wqe. By setting the
bit we can be misleading fw into error cases.

Clear the wqe_dbde field.

Link: https://lore.kernel.org/r/20210301171821.3427-2-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Douglas Gilbert
771f712ba5 scsi: scsi_debug: Fix cmd duration calculation
In some cases, sdebug_defer::cmpl_ts (completion timestamp) wasn't being
properly set when REQ_HIPRI was given. Fix that and improve code to only
call ktime_get_boottime_ns() for commands with REQ_HIPRI set as cmpl_ts is
only used in that case.

Link: https://lore.kernel.org/r/20210304014107.307625-1-dgilbert@interlog.com
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Kashyap Desai
4309ea74b0 scsi: core: Set shost as hctx driver_data
hctx->driver_data is not set for SCSI currently. Set hctx->driver_data =
shost.

Link: https://lore.kernel.org/r/20210215074048.19424-6-kashyap.desai@broadcom.com
Suggested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Douglas Gilbert
4a0c6f432d scsi: scsi_debug: Add new defer type for mq_poll
Add a new sdeb_defer_type enumeration: SDEB_DEFER_POLL for requests that
have REQ_HIPRI set in cmd_flags field. It is expected that these requests
will be polled via the mq_poll entry point which is driven by calls to
blk_poll() in the block layer. Therefore timer events are not 'wired up' in
the normal fashion.

There are still cases with short delays (e.g. < 10 microseconds) where by
the time the command response processing occurs, the delay is already
exceeded in which case the code calls scsi_done() directly. In such cases
there is no window for mq_poll() to be called.

Add 'mq_polls' counter that increments on each scsi_done() called via the
mq_poll entry point. Can be used to show (with 'cat
/proc/scsi/scsi_debug/<host_id>') that blk_poll() is causing completions
rather than some other mechanism.

Link: https://lore.kernel.org/r/20210215074048.19424-5-kashyap.desai@broadcom.com
Tested-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Kashyap Desai
c4b57d89ba scsi: scsi_debug: mq_poll support
Add support of the mq_poll interface to scsi_debug.  This feature
requires shared host tag support in kernel and driver.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Link: https://lore.kernel.org/r/20210215074048.19424-4-kashyap.desai@broadcom.com
Cc: dgilbert@interlog.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Kashyap Desai
9e4bec5b2a scsi: megaraid_sas: mq_poll support
Implement mq_poll interface support in megaraid_sas. This feature
requires shared host tag support in kernel and driver.

The driver can work in non-IRQ mode which means there will not be any MSI-x
vector associated for poll_queues. The MegaRAID hardware has a single
submission queue and multiple reply queues. However, using the shared host
tagset support will enable the driver to simulate multiple hardware queues.

Change driver to allocate some extra reply queues which will be marked as
poll_queues. These poll_queues will not have associated MSI-x vectors. All
I/O completions on these queues will be done through the IOPOLL interface.

megaraid_sas with 8 poll_queues and using the io_uring hiprio=1 setting can
reach 3.2M IOPS with zero interrupts generated by the hardware.

The IOPOLL feature can be enabled using module parameter poll_queues.

Link: https://lore.kernel.org/r/20210215074048.19424-3-kashyap.desai@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: chandrakanth.patil@broadcom.com
Cc: linux-block@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Kashyap Desai
af1830956d scsi: core: Add mq_poll support to SCSI layer
Currently IOPOLL support is only available in block layer. This patch
adds mq_poll support to the SCSI layer.

Link: https://lore.kernel.org/r/20210215074048.19424-2-kashyap.desai@broadcom.com
Cc: sumit.saxena@broadcom.com
Cc: chandrakanth.patil@broadcom.com
Cc: linux-block@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Mike Christie
39ae3edda3 scsi: target: core: Make completion affinity configurable
It may not always be best to complete the IO on same CPU as it was
submitted on. This commit allows userspace to configure it.

This has been useful for vhost-scsi where we have a single thread for
submissions and completions. If we force the completion on the submission
CPU we may be adding conflicts with what the user has setup in the lower
levels with settings like the block layer rq_affinity or the driver's IRQ
or softirq (the network's rps_cpus value) settings.

We may also want to set it up where the vhost thread runs on CPU N and does
its submissions/completions there, and then have LIO do its completion
booking on CPU M, but can't configure the lower levels due to issues like
using dm-multipath with lots of paths (the path selector can throw commands
all over the system because it's only taking into account latency/throughput
at its level).

The new setting is in:

    /sys/kernel/config/target/$fabric/$target/param/cmd_completion_affinity

Writing:

    -1 -> Gives the current default behavior of completing on the
          submission CPU.

    -2 -> Completes the cmd on the CPU the lower layers sent it to us from.

   > 0 -> Completes on the CPU userspace has specified.

Link: https://lore.kernel.org/r/20210227170006.5077-26-michael.christie@oracle.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
Mike Christie
3d75948b83 scsi: target: core: Flush submission work during TMR processing
If a cmd is on the submission workqueue then the TMR code will miss it, and
end up returning task not found or success for LUN resets. The fabric
driver might then tell the initiator that the running cmds have been
handled when they are about to run.

This adds a flush when we are processing TMRs to make sure queued cmds do
not run after returning the TMR response.

Link: https://lore.kernel.org/r/20210227170006.5077-25-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
6888da8179 scsi: target: tcmu: Add backend plug/unplug callouts
This patch adds plug/unplug callouts for tcmu, so we can avoid the number
of times we switch to userspace. Using this driver with tcm_loop is a
common config, and dependng on the nr_hw_queues (nr_hw_queues=1 performs
much better) and fio jobs (lower num jobs around 4) this patch can increase
IOPS by only around 5-10% because we hit other issues like the big per tcmu
device mutex.

Link: https://lore.kernel.org/r/20210227170006.5077-24-michael.christie@oracle.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
415ccd9811 scsi: target: iblock: Add backend plug/unplug callouts
This patch adds plug/unplug callouts for iblock. For an initiator driver
like iSCSI which wants to pass multiple cmds to its xmit thread instead of
one cmd at a time, this increases IOPS by around 10% with vhost-scsi
(combined with the last patches we can see a total 40-50% increase). For
driver combos like tcm_loop and faster drivers like the iSER initiator, we
can still see IOPS increase by 20-30% when tcm_loop's nr_hw_queues setting
is also increased.

Link: https://lore.kernel.org/r/20210227170006.5077-23-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
302990ac3b scsi: target: core: Fix backend plugging
target_core_iblock is plugging and unplugging on every command and this is
causing perf issues for drivers that prefer batched cmds. With recent
patches we can now take multiple cmds from a fabric driver queue and then
pass them down the backend drivers in a batch. This patch adds this support
by adding 2 callouts to the backend for plugging and unplugging the
device. Subsequent commits will add support for iblock and tcmu device
plugging.

Link: https://lore.kernel.org/r/20210227170006.5077-22-michael.christie@oracle.com
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
802ec4f672 scsi: target: core: Cleanup cmd flag bits
We have a couple holes in the cmd flags definitions. This cleans up the
definitions to fix that and make it easier to read.

Link: https://lore.kernel.org/r/20210227170006.5077-21-michael.christie@oracle.com
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
1130b499b4 scsi: target: tcm_loop: Use LIO wq cmd submission helper
Convert loop to use the LIO wq cmd submission helper.

Link: https://lore.kernel.org/r/20210227170006.5077-20-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
e0eb5d38b7 scsi: target: tcm_loop: Use block cmd allocator for se_cmds
Make tcm_loop use the block layer cmd allocator for se_cmds instead of
using the tcm_loop_cmd_cache. In the future when we can use the host tags
for internal requests like TMFs we can completely kill the
tcm_loop_cmd_cache.

Link: https://lore.kernel.org/r/20210227170006.5077-19-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
6ec29cb8ad scsi: target: vhost-scsi: Use LIO wq cmd submission helper
Convert vhost-scsi to use the LIO wq cmd submission helper.

Link: https://lore.kernel.org/r/20210227170006.5077-18-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
eb44ce8c8c scsi: target: core: Add workqueue based cmd submission
loop and vhost/scsi do their target cmd submission from driver
workqueues. This allows them to avoid an issue where the backend may block
waiting for resources like tags/requests, mem/locks, etc and that ends up
blocking their entire submission path and for the case of vhost-scsi both
the submission and completion path.

This patch adds a helper drivers can use to submit from a LIO workqueue.
This code will then be extended in the next patches to fix the plugging of
backend devices.

We are only converting vhost/loop initially, but the workqueue based
submission will work for other drivers and have similar benefits where the
main target loops will not end up blocking one some backend resource.

Link: https://lore.kernel.org/r/20210227170006.5077-17-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
0869419947 scsi: target: core: Add gfp_t arg to target_cmd_init_cdb()
tcm_loop could be used like a normal block device, so we can't use
GFP_KERNEL and should use GFP_NOIO. This adds a gfp_t arg to
target_cmd_init_cdb() and converts the users. For every driver but loop
GFP_KERNEL is kept.

This will also be useful in subsequent patches where loop needs to do
target_submit_prep() from interrupt context to get a ref to the se_device,
and so it will need to use GFP_ATOMIC.

Link: https://lore.kernel.org/r/20210227170006.5077-16-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:02 -05:00
Mike Christie
0fa50a8b12 scsi: target: core: Remove target_submit_cmd_map_sgls()
Convert target_submit_cmd() to do its own calls and then remove
target_submit_cmd_map_sgls() since no one uses it.

Link: https://lore.kernel.org/r/20210227170006.5077-15-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Bodo Stroesser <bostroesser@gmail.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
47edc84f33 scsi: target: tcm_fc: Convert to new submission API
target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session().

tcm_fc uses target_stop_session() to sync session shutdown with LIO core,
so we use target_init_cmd(), target_submit_prep(), target_submit(), because
target_init_cmd() will now detect the target_stop_session() call and return
an error.

Link: https://lore.kernel.org/r/20210227170006.5077-14-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
1f48b065da scsi: target: xen-scsiback: Convert to new submission API
target_submit_cmd_map_sgls() is being removed, so convert xen to the new
submission API. This has it use target_init_cmd(), target_submit_prep(), or
target_submit() because we need to have LIO core map sgls which is now done
in target_submit_prep(). target_init_cmd() will never fail for xen because
it does its own sync during session shutdown, so we can remove that code.

Note: xen never calls target_stop_session() so target_submit_cmd_map_sgls()
never failed (in the new API target_init_cmd() handles
target_stop_session() being called when cmds are being submitted). If it
were to have used target_stop_session() and got an error, we would have hit
a refcount bug like xen and usb, because it does:

    if (rc < 0) {
            transport_send_check_condition_and_sense(se_cmd,
                            TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0);
            transport_generic_free_cmd(se_cmd, 0);
    }

transport_send_check_condition_and_sense() calls queue_status which calls
scsiback_cmd_done->target_put_sess_cmd. We do an extra
transport_generic_free_cmd() call above which would have dropped the
refcount to -1 and the refcount code would spit out errors.

Link: https://lore.kernel.org/r/20210227170006.5077-13-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
eb929804db scsi: target: vhost-scsi: Convert to new submission API
target_submit_cmd_map_sgls() is being removed, so convert vhost-scsi to the
new submission API. This has it use target_init_cmd(),
target_submit_prep(), target_submit() because we need to have LIO core map
sgls which is now done in target_submit_prep(), and in the next patches we
will do the target_submit step from the LIO workqueue.

Note: vhost-scsi never calls target_stop_session() so
target_submit_cmd_map_sgls() never failed (in the new API target_init_cmd()
handles target_stop_session() being called when cmds are being
submitted). If it were to have used target_stop_session() and got an error,
we would have hit a refcount bug like xen and usb, because it does:

if (rc < 0) {
	transport_send_check_condition_and_sense(se_cmd,
			TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE, 0);
	transport_generic_free_cmd(se_cmd, 0);
}

transport_send_check_condition_and_sense() calls queue_status which does
transport_generic_free_cmd(), and then we do an extra
transport_generic_free_cmd() call above which would have dropped the
refcount to -1 and the refcount code would spit out errors.

Link: https://lore.kernel.org/r/20210227170006.5077-12-michael.christie@oracle.com
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
12340930a3 scsi: target: usb: gadget: Convert to new submission API
target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session(). It will never return
a failure, so we can remove that code from the driver.

Note: Before these patches target_submit_cmd() would never return an error
for usb since it does not use target_stop_session(). If it did then we
would have hit a refcount error here:

    transport_send_check_condition_and_sense(se_cmd,
                             TCM_UNSUPPORTED_SCSI_OPCODE, 1);
    transport_generic_free_cmd(&cmd->se_cmd, 0);

transport_send_check_condition_and_sense() calls queue_status and the
driver can sometimes do transport_generic_free_cmd() from there via
uasp_status_data_cmpl(). In that case, the above
transport_generic_free_cmd() would then hit a refcount error.

So that other use of the above error path in the driver is also probably
wrong, but someone with the hardware needs to fix that.

Link: https://lore.kernel.org/r/20210227170006.5077-11-michael.christie@oracle.com
Cc: Felipe Balbi <balbi@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
c7e086b8d7 scsi: target: sbp_target: Convert to new submission API
target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session(). It will never return
a failure, so we can remove that code from the driver.

Link: https://lore.kernel.org/r/20210227170006.5077-10-michael.christie@oracle.com
Cc: Chris Boot <bootc@bootc.net>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
17ae18a6ef scsi: target: tcm_loop: Convert to new submission API
target_submit_cmd_map_sgls() is being removed, so convert loop to
the new submission API.

Even though loop does its own shutdown sync, this has loop use
target_init_cmd()/target_submit_prep()/target_submit() since it needed to
map sgls and in the next patches it will use the API to use LIO's
workqueue.

Link: https://lore.kernel.org/r/20210227170006.5077-9-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
919ba0ad7d scsi: target: qla2xxx: Convert to new submission API
target_submit_cmd() is now only for simple drivers that do their
own sync during shutdown and do not use target_stop_session().

tcm_qla2xxx uses target_stop_session() to sync session shutdown with LIO
core, so we use target_init_cmd()/target_submit_prep()/target_submit(),
because target_init_cmd() will detect the target_stop_session() call and
return an error.

Link: https://lore.kernel.org/r/20210227170006.5077-8-michael.christie@oracle.com
Cc: Nilesh Javali <njavali@marvell.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
7d676851de scsi: target: ibmvscsi_tgt: Convert to new submission API
target_submit_cmd() is now only for simple drivers that do their own sync
during shutdown and do not use target_stop_session(). It will never return
a failure, so we can remove that code from the driver.

Link: https://lore.kernel.org/r/20210227170006.5077-7-michael.christie@oracle.com
Cc: Michael Cyr <mikecyr@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:01 -05:00
Mike Christie
50ab9c47f5 scsi: target: srpt: Convert to new submission API
target_submit_cmd_map_sgls() is being removed, so convert srpt to the new
submission API.

srpt uses target_stop_session() to sync session shutdown with LIO core, so
we use target_init_cmd()/target_submit_prep()/target_submit(), because
target_init_cmd() will detect the target_stop_session() call and return an
error.

Link: https://lore.kernel.org/r/20210227170006.5077-6-michael.christie@oracle.com
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Mike Christie
750a1d93f9 scsi: target: core: Break up target_submit_cmd_map_sgls()
This breaks up target_submit_cmd_map_sgls() into 3 helpers:

 - target_init_cmd(): Do the basic general setup and get a refcount to the
   session to make sure the caller can execute the cmd.

 - target_submit_prep(): Do the mapping, cdb processing and get a ref to
   the LUN.

 - target_submit(): Pass the cmd to LIO core for execution.

The above functions must be used by drivers that either:

 1. Rely on LIO for session shutdown synchronization by calling
    target_stop_session().

 2. Need to map sgls.

When the next patches are applied then simple drivers that do not need the
extra functionality above can use target_submit_cmd() and not worry about
failures being returned and how to handle them, since many drivers were
getting this wrong and would have hit refcount bugs.

Also, by breaking target_submit_cmd_map_sgls() up into these 3 helper
functions, we can allow the later patches to do the init/prep from
interrupt context and then do the submission from a workqueue.

Link: https://lore.kernel.org/r/20210227170006.5077-5-michael.christie@oracle.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Michael Cyr <mikecyr@linux.ibm.com>
Cc: Chris Boot <bootc@bootc.net>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Mike Christie
a78b713618 scsi: target: core: Rename transport_init_se_cmd()
Rename transport_init_se_cmd() to __target_init_cmd() to reflect that it is
more of an internal function that drivers should normally not use and
because we are going to add a new init function in the next patches.

Link: https://lore.kernel.org/r/20210227170006.5077-4-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Mike Christie
cb222a013d scsi: target: core: Drop kref_get_unless_zero() in target_get_sess_cmd()
The kref_get_unless_zero() use in target_get_sess_cmd() was added in:

    commit 1b4c59b7a1 ("target: fix potential race window in
    target_sess_cmd_list_waiting()")'

but it does not seem to do anything.

The original patch might have thought we could have added the cmd to the
sess_wait_list and then target_wait_for_sess_cmds could do a put before
target_get_sess_cmd did its get. That wouldn't happen because we do the get
first then grab the sess lock and put it on the list.

It is also not needed now, because the sess_cmd_list does not exist anymore
and we instead wait on the session cmd_count.

The other problem with the commit is that several
target_submit_cmd_map_sgls()/target_submit_cmd() callers do not handle the
error case properly if it were to ever happen. These drivers think they
have their normal refcount on the cmd and in many cases do a
transport_generic_free_cmd() plus target_put_sess_cmd() so they would have
fired off the refcount WARN/BUGs.

This patch just changes the kref_get_unless_zero() to kref_get().

Link: https://lore.kernel.org/r/20210227170006.5077-3-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Mike Christie
a9294d8674 scsi: target: core: Move t_task_cdb initialization
Prepare to split target_submit_cmd_map_sgls() so the initialization and
submission part can be called at different times. If the init part fails we
can reference the t_task_cdb early in some of the logging and tracing
code. Move it to transport_init_se_cmd() so we don't hit NULL pointer
crashes.

Link: https://lore.kernel.org/r/20210227170006.5077-2-michael.christie@oracle.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Ming Lei
020b0f0a31 scsi: core: Replace sdev->device_busy with sbitmap
SCSI currently uses an atomic variable to track queue depth for each
attached device. The queue depth depends on many factors such as transport
type and device implementation. In addition, the SCSI device queue depth is
not a static entity but changes over time as a result of congestion
management.

While blk-mq currently tracks queue depth for each hctx, it can't easily be
changed to accommodate the SCSI per-device requirement.

The current approach of using an atomic variable doesn't scale well when
there are lots of CPU cores and the disk is very fast. IOPS can be
substantially impacted by the atomic in the hot path.

Replace the atomic variable sdev->device_busy with an sbitmap for tracking
the SCSI device queue depth.

It has been observed that IOPS is improved ~30% by this patchset in the
following test:

1) test machine(32 logical CPU cores)
	Thread(s) per core:  2
	Core(s) per socket:  8
	Socket(s):           2
	NUMA node(s):        2
	Model name:          Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

2) setup scsi_debug:
modprobe scsi_debug virtual_gb=128 max_luns=1 submit_queues=32 delay=0 max_queue=256

3) fio script:
fio --rw=randread --size=128G --direct=1 --ioengine=libaio --iodepth=2048 \
	--numjobs=32 --bs=4k --group_reporting=1 --group_reporting=1 --runtime=60 \
	--loops=10000 --name=job1 --filename=/dev/sdN

[mkp: fix device_busy reference in mpt3sas]

Link: https://lore.kernel.org/r/20210122023317.687987-14-ming.lei@redhat.com
Link: https://lore.kernel.org/linux-block/20200119071432.18558-6-ming.lei@redhat.com/
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Ming Lei
ca44532139 scsi: core: Make sure sdev->queue_depth is <= max(shost->can_queue, 1024)
Limit SCSI device's queue depth to max(host->can_queue, 1024) in
scsi_change_queue_depth(). 1024 is big enough for saturating current fast
SCSI LUN(SSD or RAID volume on multiple SSDs). Also single hardware queue
depth is usually enough for saturating single LUN because per-core
performance is often considered in storage design.

This patch is needed for replacing sdev->device_busy with sbitmap which has
to be pre-allocated with reasonable max depth.

Link: https://lore.kernel.org/r/20210122023317.687987-13-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Ming Lei
8278807abd scsi: core: Add scsi_device_busy() wrapper
Add scsi_device_busy() helper to prepare drivers for tracking device queue
depth via sbitmap_queue.

Link: https://lore.kernel.org/r/20210122023317.687987-12-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Kashyap Desai
6cb9b15238 scsi: megaraid_sas: Replace sdev_busy with local counter
Use local tracking of per-sdev outstanding command since sdev_busy in SCSI
mid layer is improved for performance reason using sbitmap (earlier it was
atomic variable).

Link: https://lore.kernel.org/r/20210122023317.687987-11-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Ming Lei
9ebb4d70dc scsi: core: Put hot fields of scsi_host_template in one cacheline
The following three fields of scsi_host_template are referenced in the SCSI
I/O submission hot path. Put them together in one cacheline:

 - cmd_size

 - queuecommand

 - commit_rqs

Link: https://lore.kernel.org/r/20210122023317.687987-10-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:00 -05:00
Ming Lei
2a5a24aa83 scsi: blk-mq: Return budget token from .get_budget callback
SCSI uses a global atomic variable to track queue depth for each
LUN/request queue.

This doesn't scale well when there are lots of CPU cores and the disk is
very fast. It has been observed that IOPS is affected a lot by tracking
queue depth via sdev->device_busy in the I/O path.

Return budget token from .get_budget callback. The budget token can be
passed to driver so that we can replace the atomic variable with
sbitmap_queue and alleviate the scaling problems that way.

Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei
d022d18c04 scsi: blk-mq: Add callbacks for storing & retrieving budget token
Since SCSI is the only driver which requires dispatch budget move the token
from struct request to struct scsi_cmnd.

Link: https://lore.kernel.org/r/20210122023317.687987-8-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei
2d13b1ea9f scsi: sbitmap: Add sbitmap_calculate_shift() helper
Move code for calculating default shift into a public helper which can be
used by SCSI.

Link: https://lore.kernel.org/r/20210122023317.687987-7-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei
cbb9950b41 scsi: sbitmap: Export sbitmap_weight
SCSI's .device_busy will be converted to sbitmap and sbitmap_weight is
needed. Export the helper.

The only existing user of sbitmap_weight() uses it to find out how many
bits are set and not cleared. Align sbitmap_weight() meaning with this
usage model.

Link: https://lore.kernel.org/r/20210122023317.687987-6-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00