linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-30 08:01:59 +00:00

Author	SHA1	Message	Date
Xiang Chen	b4cc094922	scsi: hisi_sas: Use autosuspend for the host controller The controller may frequently enter and exit suspend for each I/O which we need to deal with. This is inefficient and may cause too much suspend and resume activity for the controller. To avoid this, use a default 5s autosuspend for the controller to stop frequently suspending and resuming. This value may still be modified via sysfs interfaces. Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:31 -05:00
Xiang Chen	307d9f49cc	scsi: libsas: Keep host active while processing events Processing events such as PORTE_BROADCAST_RCVD may cause dependency issues for runtime power management support. Such a problem would be that handling a PORTE_BROADCAST_RCVD event requires that the host is resumed to send SMP commands. However, in resuming the host, the phyup events generated from re-enabling the phys are processed in the same workqueue as the original PORTE_BROADCAST_RCVD event. As such, the host will never finish resuming (as it waits for the phyup event processing), and then the PORTE_BROADCAST_RCVD event can't be processed as the SMP commands are blocked, and so we have a deadlock. Solve this problem by ensuring that libsas keeps the host active until completely finished phy or port events, such as PORTE_BYTES_DMAED. As such, we don't have to worry about resuming the host for processing individual SMP commands in this example. Link: https://lore.kernel.org/r/1639999298-244569-15-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:31 -05:00
Xiang Chen	ae9b69e85e	scsi: hisi_sas: Keep controller active between ISR of phyup and the event being processed It is possible that controller may become suspended between processing a phyup interrupt and the event being processed by libsas. As such, we can't ensure the controller is active when processing the phyup event - this may cause the phyup event to be lost or other issues. To avoid any possible issues, add pm_runtime_get_noresume() in phyup interrupt handler and pm_runtime_put_sync() in the work handler exit to ensure that we stay always active. Since we only want to call pm_runtime_get_noresume() for v3 hw, signal this will a new event, HISI_PHYE_PHY_UP_PM. Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:31 -05:00
Xiang Chen	bf19aea460	scsi: libsas: Defer works of new phys during suspend During the processing of event PORT_BYTES_DMAED, the driver queues work DISCE_DISCOVER_DOMAIN and then flushes workqueue ha->disco_q. If a new phyup event occurs during resuming the controller, the work PORTE_BYTES_DMAED of new phy occurs before suspended phy's. The work DISCE_DISCOVER_DOMAIN of new phy requires an active SAS controller (it needs to resume SAS controller by function scsi_sysfs_add_sdev() and some other functions such as function add_device_link()). However, the activation of the SAS controller requires completion of work PORTE_BYTES_DMAED of suspended phys while it is blocked by new phy's work on ha->event_q. So there is a deadlock and it is released only after resume timeout. To solve the issue, defer works of new phys during suspend and queue those defer works after SAS controller becomes active. Link: https://lore.kernel.org/r/1639999298-244569-13-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:30 -05:00
Xiang Chen	1bc35475c6	scsi: libsas: Refactor sas_queue_deferred_work() In the second part of function __sas_drain_work(), deferred work is queued. This functionality is required other places so factor it out into the function sas_queue_deferred_work(). Link: https://lore.kernel.org/r/1639999298-244569-12-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:30 -05:00
Xiang Chen	4ea775abbb	scsi: libsas: Add flag SAS_HA_RESUMING Add a flag SAS_HA_RESUMING and use it to indicate the state of resuming the host controller. Link: https://lore.kernel.org/r/1639999298-244569-11-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:30 -05:00
Xiang Chen	0da7ca4c4f	scsi: libsas: Resume host while sending SMP I/Os When sending SMP I/Os to the host we need to ensure that the host is not suspended and can process the commands. This is a better approach than replying on the host to resume itself to handle such commands. Use pm_runtime_get_sync() and pm_runtime_put_sync() calls for the host when executing SMP I/Os. Link: https://lore.kernel.org/r/1639999298-244569-10-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:30 -05:00
Xiang Chen	97f4100939	scsi: hisi_sas: Add more logs for runtime suspend/resume Add some logs at the beginning and end of suspend/resume. Link: https://lore.kernel.org/r/1639999298-244569-9-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:30 -05:00
Xiang Chen	e31e18128e	scsi: libsas: Insert PORTE_BROADCAST_RCVD event for resuming host If a new disk is inserted through an expander when the host was suspended, it will not necessarily be detected as the topology is not re-scanned during resume. To detect possible changes in topology during suspension, insert a PORTE_BROADCAST_RCVD event per port when resuming to trigger a revalidation. Link: https://lore.kernel.org/r/1639999298-244569-8-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:30 -05:00
Xiang Chen	133b688b2d	scsi: mvsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list phy_list_lock is not held when using asd_sas_port->phy_list in the mvsas driver. Add spin_lock/unlock in those places. Link: https://lore.kernel.org/r/1639999298-244569-7-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
Xiang Chen	29e2bac874	scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list Most places that use asd_sas_port->phy_list are protected by spinlock asd_sas_port->phy_list_lock, however there are still some places which miss grabbing the lock. Add it in function hisi_sas_refresh_port_id() when accessing asd_sas_port->phy_list. This carries a risk that list mutates while at the same time dropping the lock in function hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list to avoid this risk. Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
Xiang Chen	42159d3c8d	scsi: libsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list Most places that use asd_sas_port->phy_list in libsas are protected by spinlock asd_sas_port->phy_list_lock. However, there are still a few places which miss the lock. Add it in those places. Link: https://lore.kernel.org/r/1639999298-244569-5-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
Alan Stern	6e1fcab00a	scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume() John Garry reported a deadlock that occurs when trying to access a runtime-suspended SATA device. For obscure reasons, the rescan procedure causes the link to be hard-reset, which disconnects the device. The rescan tries to carry out a runtime resume when accessing the device. scsi_rescan_device() holds the SCSI device lock and won't release it until it can put commands onto the device's block queue. This can't happen until the queue is successfully runtime-resumed or the device is unregistered. But the runtime resume fails because the device is disconnected, and __scsi_remove_device() can't do the unregistration because it can't get the device lock. The best way to resolve this deadlock appears to be to allow the block queue to start running again even after an unsuccessful runtime resume. The idea is that the driver or the SCSI error handler will need to be able to use the queue to resolve the runtime resume failure. This patch removes the err argument to blk_post_runtime_resume() and makes the routine act as though the resume was successful always. This fixes the deadlock. Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com Fixes: `e27829dc92` ("scsi: serialize ->rescan against ->remove") Reported-and-tested-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
John Garry	6cc7390877	scsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend" This reverts commit `b14a37e011`. In that commit, we had to filter out phy-up events during suspend, as it work cause a deadlock between processing the phyup event and the resume HA function try to drain the HA event workqueue to complete the resume process. Now that we no longer try to drain the HA event queue during the HA resume processor, the deadlock would not occur, so remove the special handling for it. Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
John Garry	fbefe22811	scsi: libsas: Don't always drain event workqueue for HA resume For the hisi_sas driver, if a directly attached disk is removed during suspend, a hang will occur in the resume process: The background is that in commit `16fd4a7c59` ("scsi: hisi_sas: Add device link between SCSI devices and hisi_hba"), it is ensured that the HBA device cannot be runtime suspended when any SCSI device associated is active. Other drivers which use libsas don't worry about this as none support runtime suspend. The mentioned hang occurs when an disk is removed during suspend. In the removal process - from PHYE_RESUME_TIMEOUT event processing - we call into scsi_remove_device(), which is being processed in the HA event workqueue. Here we wait for all suppliers of the SCSI device to resume, which includes the HBA device (from the above commit). However the HBA device cannot resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from calling sas_resume_ha() -> sas_drain_work()). This is the deadlock. There does not appear to be any need for the sas_drain_work() to be called at all in sas_resume_ha() as it is not syncing against anything, so allow LLDDs to avoid this by providing a variant of sas_resume_ha() which does "sync", i.e. doesn't drain the event workqueue. Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
John Garry	4be6181fea	scsi: libsas: Decode SAM status and host byte codes Value 0 is used for SAM status and libsas exec_status bytes codes in sas_end_task() - use defined macros instead. In addition, change to proper enum types. Also replace SAM_STAT_CHECK_CONDITION with SAS_SAM_STAT_CHECK_CONDITION, the former being a proper member of enum exec_status. Link: https://lore.kernel.org/r/1639579061-179473-9-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:58 -05:00
Qi Liu	37310bad7f	scsi: hisi_sas: Fix phyup timeout on FPGA The OOB interrupt and phyup interrupt handlers may run out-of-order in high CPU usage scenarios. Since the hisi_sas_phy.timer is added in hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order execution will cause hisi_sas_phy.timer timeout to trigger. To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure that the timer won't be added after phyup handler completes. Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:58 -05:00
Qi Liu	16775db613	scsi: hisi_sas: Prevent parallel FLR and controller reset If we issue a controller reset command during executing a FLR a hung task may be found: Call trace: __switch_to+0x158/0x1cc __schedule+0x2e8/0x85c schedule+0x7c/0x110 schedule_timeout+0x190/0x1cc __down+0x7c/0xd4 down+0x5c/0x7c hisi_sas_task_exec+0x510/0x680 [hisi_sas_main] hisi_sas_queue_command+0x24/0x30 [hisi_sas_main] smp_execute_task_sg+0xf4/0x23c [libsas] sas_smp_phy_control+0x110/0x1e0 [libsas] transport_sas_phy_reset+0xc8/0x190 [libsas] phy_reset_work+0x2c/0x40 [libsas] process_one_work+0x1dc/0x48c worker_thread+0x15c/0x464 kthread+0x160/0x170 ret_from_fork+0x10/0x18 This is a race condition which occurs when the FLR completes first. Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem . Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:58 -05:00
Qi Liu	20c634932a	scsi: hisi_sas: Prevent parallel controller reset and control phy command A user may issue a control phy command from sysfs at any time, even if the controller is resetting. If a phy is disabled by hardreset/linkreset command before calling get_phys_state() in the reset path, the saved phy state may be incorrect. To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure that the controller reset may not run at the same time as when the phy control function is running. Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00
John Garry	dc313f6b12	scsi: hisi_sas: Factor out task prep and delivery code The task prep code is the same between the normal path (in hisi_sas_task_prep()) and the internal abort path, so factor is out into a common function. Link: https://lore.kernel.org/r/1639579061-179473-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00
John Garry	08c61b5d90	scsi: hisi_sas: Pass abort structure for internal abort To help factor out code in future, it's useful to know if we're executing an internal abort, so pass a pointer to the structure. The idea is that a NULL pointer means not an internal abort. Link: https://lore.kernel.org/r/1639579061-179473-4-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00
John Garry	934385a4fd	scsi: hisi_sas: Make internal abort have no task proto For an internal abort, the task does not have a protocol, so set to none. This will make it easier to differentiate internal abort tasks in future. Link: https://lore.kernel.org/r/1639579061-179473-3-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00
John Garry	0e4620856b	scsi: hisi_sas: Start delivery hisi_sas_task_exec() directly Currently we start delivery of commands to the DQ after returning from hisi_sas_task_exec() with success. Let's just start delivery directly in that function without having to check if some local variable is set. Link: https://lore.kernel.org/r/1639579061-179473-2-git-send-email-john.garry@huawei.com Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00
Christoph Hellwig	efac162a4e	scsi: efct: Don't pass GFP_DMA to dma_alloc_coherent() dma_alloc_coherent() ignores the zone specifiers so this is pointless and confusing. Link: https://lore.kernel.org/r/20211214163605.416288-1-hch@lst.de Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:52:29 -05:00
Bean Huo	99c66a8868	scsi: ufs: core: Fix deadlock issue in ufshcd_wait_for_doorbell_clr() Call shost_for_each_device() with holding host->host_lock will cause a deadlock situation, which will cause the system to stall (the log as follow). Fix this issue by using __shost_for_each_device() in ufshcd_pending_cmds(). stalls on CPUs/tasks: all trace: __switch_to+0x120/0x170 0xffff800011643998 ask dump for CPU 5: ask:kworker/u16:2 state:R running task stack: 0 pid: 80 ppid: 2 flags:0x0000000a orkqueue: events_unbound async_run_entry_fn all trace: __switch_to+0x120/0x170 0x0 ask dump for CPU 6: ask:kworker/u16:6 state:R running task stack: 0 pid: 164 ppid: 2 flags:0x0000000a orkqueue: events_unbound async_run_entry_fn all trace: __switch_to+0x120/0x170 0xffff54e7c4429f80 ask dump for CPU 7: ask:kworker/u16:4 state:R running task stack: 0 pid: 153 ppid: 2 flags:0x0000000a orkqueue: events_unbound async_run_entry_fn all trace: __switch_to+0x120/0x170 blk_mq_run_hw_queue+0x34/0x110 blk_mq_sched_insert_request+0xb0/0x120 blk_execute_rq_nowait+0x68/0x88 blk_execute_rq+0x4c/0xd8 __scsi_execute+0xec/0x1d0 scsi_vpd_inquiry+0x84/0xf0 scsi_get_vpd_buf+0x34/0xb8 scsi_attach_vpd+0x34/0x140 scsi_probe_and_add_lun+0xa6c/0xab8 __scsi_scan_target+0x438/0x4f8 scsi_scan_channel+0x6c/0xa8 scsi_scan_host_selected+0xf0/0x150 do_scsi_scan_host+0x88/0x90 scsi_scan_host+0x1b4/0x1d0 ufshcd_async_scan+0x248/0x310 async_run_entry_fn+0x30/0x178 process_one_work+0x1e8/0x368 worker_thread+0x40/0x478 kthread+0x174/0x180 ret_from_fork+0x10/0x20 Link: https://lore.kernel.org/r/20211214120537.531628-1-huobean@gmail.com Fixes: `8d077ede48` ("scsi: ufs: Optimize the command queueing code") Reported-by: YongQin Liu <yongqin.liu@linaro.org> Reported-by: Amit Pundir <amit.pundir@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Co-developed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:51:13 -05:00
Hannes Reinecke	baea0e833f	scsi: qla2xxx: Synchronize rport dev_loss_tmo setting Currently, the dev_loss_tmo setting is only ever used for SCSI devices. This patch reshuffles initialisation such that the SCSI remote ports are registered before the NVMe ones, allowing the dev_loss_tmo setting to be synchronized between SCSI and NVMe. Link: https://lore.kernel.org/r/20211214111139.52503-1-dwagner@suse.de Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:48:33 -05:00
Martin K. Petersen	87f77d37d3	Merge branch '5.16/scsi-fixes' into 5.17/scsi-staging Pull in the 5.16 fixes branch to resolve a conflict in the UFS driver core. Conflicts: drivers/scsi/ufs/ufshcd.c Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:38:19 -05:00
Christophe JAILLET	8c2d045515	scsi: hpsa: Remove an unused variable in hpsa_update_scsi_devices() 'lunzerobits' is unused. Remove it. This a left over of commit `2d62a33e05` ("hpsa: eliminate fake lun0 enclosures") Link: https://lore.kernel.org/r/9f80ea569867b5f7ae1e0f99d656e5a8bacad34e.1639084205.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-13 23:34:01 -05:00
Kees Cook	c167dd0b2a	scsi: lpfc: Use struct_group to isolate cast to larger object When building under -Warray-bounds, a warning is generated when casting a u32 into MAILBOX_t (which is larger). This warning is conservative, but it's not an unreasonable change to make to improve future robustness. Use a tagged struct_group that can refer to either the specific fields or the first u32 separately, silencing this warning: drivers/scsi/lpfc/lpfc_sli.c: In function 'lpfc_reset_barrier': drivers/scsi/lpfc/lpfc_sli.c:4787:29: error: array subscript 'MAILBOX_t[0]' is partly outside array bounds of 'volatile uint32_t[1]' {aka 'volatile unsigned int[1]'} [-Werror=array-bounds] 4787 \| ((MAILBOX_t *)&mbox)->mbxCommand = MBX_KILL_BOARD; \| ^~ drivers/scsi/lpfc/lpfc_sli.c:4752:27: note: while referencing 'mbox' 4752 \| volatile uint32_t mbox; \| ^~~~ There is no change to the resulting executable instruction code. Link: https://lore.kernel.org/r/20211203223351.107323-1-keescook@chromium.org Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-13 23:31:40 -05:00
Kees Cook	532adda9f4	scsi: lpfc: Use struct_group() to initialize struct lpfc_cgn_info In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset(), avoid intentionally writing across neighboring fields. Add struct_group() to mark "stat" region of struct lpfc_cgn_info that should be initialized to zero, and refactor the "data" region memset() to wipe everything up to the cgn_stats region. Link: https://lore.kernel.org/r/20211208195957.1603092-1-keescook@chromium.org Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-13 23:30:30 -05:00
Roman Bolshakov	69002c8ce9	scsi: qla2xxx: Format log strings only if needed Commit `598a90f200` ("scsi: qla2xxx: add ring buffer for tracing debug logs") introduced unconditional log string formatting to ql_dbg() even if ql_dbg_log event is disabled. It harms performance because some strings are formatted in fastpath and/or interrupt context. Link: https://lore.kernel.org/r/20211112145446.51210-1-r.bolshakov@yadro.com Fixes: `598a90f200` ("scsi: qla2xxx: add ring buffer for tracing debug logs") Cc: Rajan Shanmugavelu <rajan.shanmugavelu@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:38:17 -05:00
James Smart	4437503bfb	scsi: lpfc: Update lpfc version to 14.0.0.4 Update lpfc version to 14.0.0.4. Link: https://lore.kernel.org/r/20211204002644.116455-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:37 -05:00
James Smart	6014a2468f	scsi: lpfc: Add additional debugfs support for CMF Dump raw CMF parameter information in debugfs cgn_buffer. Link: https://lore.kernel.org/r/20211204002644.116455-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:37 -05:00
James Smart	05116ef9c4	scsi: lpfc: Cap CMF read bytes to MBPI Ensure read bytes data does not go over MBPI for CMF timer intervals that are purposely shortened. Link: https://lore.kernel.org/r/20211204002644.116455-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:37 -05:00
James Smart	a6269f8370	scsi: lpfc: Adjust CMF total bytes and rxmonitor Calculate any extra bytes needed to account for timer accuracy. If we are less than LPFC_CMF_INTERVAL, then calculate the adjustment needed for total to reflect a full LPFC_CMF_INTERVAL. Add additional info to rxmonitor, and adjust some log formatting. Link: https://lore.kernel.org/r/20211204002644.116455-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:37 -05:00
James Smart	7dd2e2a923	scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
James Smart	8ed190a919	scsi: lpfc: Fix NPIV port deletion crash The driver is calling schedule_timeout after the DA_ID nameserver request and LOGO commands are issued to the fabric by the initiator virtual endport. These fixed delay functions are causing long delays in the driver's worker thread when processing discovery I/Os in a serialized fashion, which is then triggering mailbox timeout errors artificially. To fix this, don't wait on the DA_ID request to complete and call wait_event_timeout to allow the vport delete thread to make progress on an event driven basis rather than fixing the wait time. Link: https://lore.kernel.org/r/20211204002644.116455-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
James Smart	7576d48c64	scsi: lpfc: Fix lpfc_force_rscn ndlp kref imbalance Issuing lpfc_force_rscn twice results in an ndlp kref use-after-free call trace. A prior patch reworked the get/put handling by ensuring nlp_get was done before WQE submission and a put was done in the completion path. Unfortunately, the issue_els_rscn path had a piece of legacy code that did a nlp_put, causing an imbalance on the ref counts. Fixed by removing the unnecessary legacy code snippet. Link: https://lore.kernel.org/r/20211204002644.116455-4-jsmart2021@gmail.com Fixes: `4430f7fd09` ("scsi: lpfc: Rework locations of ndlp reference taking") Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
James Smart	2e81b1a374	scsi: lpfc: Change return code on I/Os received during link bounce During heavy I/O testing with issue_lip to bounce the link, occasionally I/O is terminated with status 3 result 9, which means the RPI is suspended. The I/O is completed and this type of error will result in immediate retry by the SCSI layer. The retry count expires and the I/O fails and returns error to the application. To avoid these quick retry/retries exhausted scenarios change the return code given to the midlayer to DID_REQUEUE rather than DID_ERROR. This gets them retried, and eventually succeed when the link recovers. Link: https://lore.kernel.org/r/20211204002644.116455-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
James Smart	f0d3919697	scsi: lpfc: Fix leaked lpfc_dmabuf mbox allocations with NPIV During rmmod testing, messages appeared indicating lpfc_mbuf_pool entries were still busy. This situation was only seen doing rmmod after at least 1 vport (NPIV) instance was created and destroyed. The number of messages scaled with the number of vports created. When a vport is created, it can receive a PLOGI from another initiator Nport. When this happens, the driver prepares to ack the PLOGI and prepares an RPI for registration (via mbx cmd) which includes an mbuf allocation. During the unsolicited PLOGI processing and after the RPI preparation, the driver recognizes it is one of the vport instances and decides to reject the PLOGI. During the LS_RJT preparation for the PLOGI, the mailbox struct allocated for RPI registration is freed, but the mbuf that was also allocated is not released. Fix by freeing the mbuf with the mailbox struct in the LS_RJT path. As part of the code review to figure the issue out a couple of other areas where found that also would not have released the mbuf. Those are cleaned up as well. Link: https://lore.kernel.org/r/20211204002644.116455-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
Bart Van Assche	eaab9b5730	scsi: ufs: Implement polling support The time spent in io_schedule() and also the interrupt latency are significant when submitting direct I/O to a UFS device. Hence this patch that implements polling support. User space software can enable polling by passing the RWF_HIPRI flag to the preadv2() system call or the IORING_SETUP_IOPOLL flag to the io_uring interface. Although the block layer supports to partition the tag space for interrupt-based completions (HCTX_TYPE_DEFAULT) purposes and polling (HCTX_TYPE_POLL), the choice has been made to use the same hardware queue for both hctx types because partitioning the tag space would negatively affect performance. On my test setup this patch increases IOPS from 2736 to 22000 (8x) for the following test: for hipri in 0 1; do fio --ioengine=io_uring --iodepth=1 --rw=randread \ --runtime=60 --time_based=1 --direct=1 --name=qd1 \ --filename=/dev/block/sda --ioscheduler=none --gtod_reduce=1 \ --norandommap --hipri=$hipri done Link: https://lore.kernel.org/r/20211203231950.193369-18-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	8d077ede48	scsi: ufs: Optimize the command queueing code Remove the clock scaling lock from ufshcd_queuecommand() since it is a performance bottleneck. Instead check the SCSI device budget bitmaps in the code that waits for ongoing ufshcd_queuecommand() calls. A bit is set in sdev->budget_map just before scsi_queue_rq() is called and a bit is cleared from that bitmap if scsi_queue_rq() does not submit the request or after the request has finished. See also the blk_mq_{get,put}_dispatch_budget() calls in the block layer. There is no risk for a livelock since the block layer delays queue reruns if queueing a request fails because the SCSI host has been blocked. Link: https://lore.kernel.org/r/20211203231950.193369-17-bvanassche@acm.org Cc: Asutosh Das (asd) <asutoshd@codeaurora.org> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	5675c381ea	scsi: ufs: Stop using the clock scaling lock in the error handler Instead of locking and unlocking the clock scaling lock, surround the command queueing code with an RCU reader lock and call synchronize_rcu(). This patch prepares for removal of the clock scaling lock. Link: https://lore.kernel.org/r/20211203231950.193369-16-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	3489c34bd0	scsi: ufs: Fix a kernel crash during shutdown Fix the following kernel crash: Unable to handle kernel paging request at virtual address ffffffc91e735000 Call trace: __queue_work+0x26c/0x624 queue_work_on+0x6c/0xf0 ufshcd_hold+0x12c/0x210 __ufshcd_wl_suspend+0xc0/0x400 ufshcd_wl_shutdown+0xb8/0xcc device_shutdown+0x184/0x224 kernel_restart+0x4c/0x124 __arm64_sys_reboot+0x194/0x264 el0_svc_common+0xc8/0x1d4 do_el0_svc+0x30/0x8c el0_svc+0x20/0x30 el0_sync_handler+0x84/0xe4 el0_sync+0x1bc/0x1c0 Fix this crash by ungating the clock before destroying the work queue on which clock gating work is queued. Link: https://lore.kernel.org/r/20211203231950.193369-15-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	1fbaa02dfd	scsi: ufs: Improve SCSI abort handling further Release resources when aborting a command. Make sure that aborted commands are completed once by clearing the corresponding tag bit from hba->outstanding_reqs. This patch is an improved version of commit `3ff1f6b6ba` ("scsi: ufs: core: Improve SCSI abort handling"). Link: https://lore.kernel.org/r/20211203231950.193369-14-bvanassche@acm.org Fixes: `7a3e97b0dc` ("[SCSI] ufshcd: UFS Host controller driver") Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	6f8dafdee6	scsi: ufs: Introduce ufshcd_release_scsi_cmd() The only functional change in this patch is that scsi_done() is now called after ufshcd_release() and ufshcd_clk_scaling_update_busy() instead of before. The next patch in this series will introduce a call to ufshcd_release_scsi_cmd() in the abort handler. Link: https://lore.kernel.org/r/20211203231950.193369-13-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	3eb9dcc027	scsi: ufs: Remove the 'update_scaling' local variable This patch does not change any functionality but makes the next patch in this series easier to read. Link: https://lore.kernel.org/r/20211203231950.193369-12-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:34 -05:00
Bart Van Assche	511a083b8b	scsi: ufs: Remove hba->cmd_queue The previous patch removed all code that uses hba->cmd_queue. Hence also remove hba->cmd_queue itself. Link: https://lore.kernel.org/r/20211203231950.193369-11-bvanassche@acm.org Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:33 -05:00
Bart Van Assche	945c3cca05	scsi: ufs: Fix a deadlock in the error handler The following deadlock has been observed on a test setup: - All tags allocated - The SCSI error handler calls ufshcd_eh_host_reset_handler() - ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() - ufshcd_err_handler() locks up as follows: Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace: __switch_to+0x298/0x5d8 __schedule+0x6cc/0xa94 schedule+0x12c/0x298 blk_mq_get_tag+0x210/0x480 __blk_mq_alloc_request+0x1c8/0x284 blk_get_request+0x74/0x134 ufshcd_exec_dev_cmd+0x68/0x640 ufshcd_verify_dev_init+0x68/0x35c ufshcd_probe_hba+0x12c/0x1cb8 ufshcd_host_reset_and_restore+0x88/0x254 ufshcd_reset_and_restore+0xd0/0x354 ufshcd_err_handler+0x408/0xc58 process_one_work+0x24c/0x66c worker_thread+0x3e8/0xa4c kthread+0x150/0x1b4 ret_from_fork+0x10/0x30 Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request. Link: https://lore.kernel.org/r/20211203231950.193369-10-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:33 -05:00
Bart Van Assche	fc21da8a84	scsi: ufs: Rework ufshcd_change_queue_depth() Prepare for making sdev->host->can_queue less than hba->nutrs. This patch does not change any functionality. Link: https://lore.kernel.org/r/20211203231950.193369-9-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:30:33 -05:00

1 2 3 4 5 ...

1058165 Commits