Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr,
mpt3sas, target); the biggest change (from my biased viewpoint) being
that the mpi3mr now attached to the SAS transport class, making it the
first fusion type device to do so. Beyond the usual bug fixing and
security class reworks, there aren't a huge number of core changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCY0B74yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishW2NAP9CPp2R
7NRmSyxcyVYvtCNUW3WxXh65Gn+KgArmg8XucgEAhUBX1fSjOzpERWEU+UaXitbE
Rb+FbjxSc5YxR+nJ/Qc=
=0Wlp
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr,
mpt3sas, target). The biggest change (from my biased viewpoint) being
that the mpi3mr now attached to the SAS transport class, making it the
first fusion type device to do so.
Beyond the usual bug fixing and security class reworks, there aren't a
huge number of core changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits)
scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()
scsi: mpi3mr: Remove unnecessary cast
scsi: stex: Properly zero out the passthrough command structure
scsi: mpi3mr: Update driver version to 8.2.0.3.0
scsi: mpi3mr: Fix scheduling while atomic type bug
scsi: mpi3mr: Scan the devices during resume time
scsi: mpi3mr: Free enclosure objects during driver unload
scsi: mpi3mr: Handle 0xF003 Fault Code
scsi: mpi3mr: Graceful handling of surprise removal of PCIe HBA
scsi: mpi3mr: Schedule IRQ kthreads only on non-RT kernels
scsi: mpi3mr: Support new power management framework
scsi: mpi3mr: Update mpi3 header files
scsi: mpt3sas: Revert "scsi: mpt3sas: Fix ioc->base_readl() use"
scsi: mpt3sas: Revert "scsi: mpt3sas: Fix writel() use"
scsi: wd33c93: Remove dead code related to the long-gone config WD33C93_PIO
scsi: core: Add I/O timeout count for SCSI device
scsi: qedf: Populate sysfs attributes for vport
scsi: pm8001: Replace one-element array with flexible-array member
scsi: 3w-xxxx: Replace one-element array with flexible-array member
scsi: hptiop: Replace one-element array with flexible-array member in struct hpt_iop_request_ioctl_command()
...
Now that libsas and the SCSI core code limits the default sectors from
commit 4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors
according to DMA optimal limit") and commit 608128d391 ("scsi: sd: allow
max_sectors be capped at DMA optimal size limit"), there is no need for
the hack to limit the max HW sectors.
Link: https://lore.kernel.org/r/1662378529-101489-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since blk_mq_map_queues() and the .map_queues() callbacks always return 0,
change their return type into void. Most callers ignore the returned value
anyway.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org
[axboe: fold in fix from Bart]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the I/O completion response frame returned by the target device has been
written to the host memory and the err bit in the status field of the
received fis is 1, ts->stat should set to SAS_PROTO_RESPONSE, and this will
let EH analyze and further determine cause of failure.
Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently SMP tasks are DMA unmapped only when cq of SMP I/O is returned
normally. If the cq of SMP I/O is returned with exception actually SMP TAS
is never unmapped. Relocate DMA unmap of SMP task to fix the issue.
Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is duplicated code between slave_configure_v3_hw() and
hisi_sas_slave_configure(), so call common function
hisi_sas_slave_configure() from slave_configure_v3_hw().
Link: https://lore.kernel.org/r/1657823002-139010-2-git-send-email-john.garry@huawei.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we fail to notify the phy up event then undo the RPM resume, as the phy
up notify event handling pairs with that RPM resume.
Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huawei.com
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the common libsas internal abort functionality.
In addition, this driver has special handling for internal abort timeouts -
specifically whether to reset the controller in that instance, so extend
the API for that.
Timeout is now increased to 20 * Hz from 6 * Hz.
We also retry for failure now, but this should not make a difference.
Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In case of SSP underflow allow the response frame IU to be examined for
setting the response stat value rather than always setting
SAS_DATA_UNDERRUN.
This will mean that we call sas_ssp_task_response() in those scenarios and
may send sense data to upper layer.
Such a condition would be for bad blocks were we just reporting an
underflow error to upper layer, but now the sense data will tell
immediately that the media is faulty.
Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a file operation for "cnt" file under bist directory, so users can only
read "cnt" or clear "cnt" to zero, but cannot randomly modify.
Link: https://lore.kernel.org/r/1645703489-87194-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver probe fails to request the channel IRQ or fatal IRQ, the
driver will free the IRQ vectors before freeing the IRQs in free_irq(),
and this will cause a kernel BUG like this:
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:369!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Call trace:
free_msi_irqs+0x118/0x13c
pci_disable_msi+0xfc/0x120
pci_free_irq_vectors+0x24/0x3c
hisi_sas_v3_probe+0x360/0x9d0 [hisi_sas_v3_hw]
local_pci_probe+0x44/0xb0
work_for_cpu_fn+0x20/0x34
process_one_work+0x1d0/0x340
worker_thread+0x2e0/0x460
kthread+0x180/0x190
ret_from_fork+0x10/0x20
---[ end trace b88990335b610c11 ]---
So we use devm_add_action() to control the order in which we free the
vectors.
Link: https://lore.kernel.org/r/1645703489-87194-4-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the permission of parameter prot_mask is 0x0, which means that
the member does not appear in sysfs. Change it as other module parameters
to 0444 for world-readable.
[mkp: s/v3/v2/]
Link: https://lore.kernel.org/r/1645703489-87194-2-git-send-email-john.garry@huawei.com
Fixes: d6a9000b81 ("scsi: hisi_sas: Add support for DIF feature for v2 hw")
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some of the LLDDs which use libsas have their own definition of a struct
to hold TMF info, so add a common struct for libsas.
Also add an interim force phy id field for hisi_sas driver, which will be
removed once the STP "TMF" code is factored out.
Even though some LLDDs (pm8001) use a u32 for the tag, u16 will be adequate,
as that named driver only uses tags in range [0, 1024).
Link: https://lore.kernel.org/r/1645112566-115804-8-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The controller may frequently enter and exit suspend for each I/O which we
need to deal with. This is inefficient and may cause too much suspend and
resume activity for the controller. To avoid this, use a default 5s
autosuspend for the controller to stop frequently suspending and
resuming. This value may still be modified via sysfs interfaces.
Link: https://lore.kernel.org/r/1639999298-244569-16-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is possible that controller may become suspended between processing a
phyup interrupt and the event being processed by libsas. As such, we can't
ensure the controller is active when processing the phyup event - this may
cause the phyup event to be lost or other issues. To avoid any possible
issues, add pm_runtime_get_noresume() in phyup interrupt handler and
pm_runtime_put_sync() in the work handler exit to ensure that we stay
always active. Since we only want to call pm_runtime_get_noresume() for v3
hw, signal this will a new event, HISI_PHYE_PHY_UP_PM.
Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:
The background is that in commit 16fd4a7c59 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA device
cannot be runtime suspended when any SCSI device associated is active.
Other drivers which use libsas don't worry about this as none support
runtime suspend.
The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which includes
the HBA device (from the above commit). However the HBA device cannot
resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed (from
calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.
There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.
Link: https://lore.kernel.org/r/1639999298-244569-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The OOB interrupt and phyup interrupt handlers may run out-of-order in high
CPU usage scenarios. Since the hisi_sas_phy.timer is added in
hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order
execution will cause hisi_sas_phy.timer timeout to trigger.
To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure
that the timer won't be added after phyup handler completes.
Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we issue a controller reset command during executing a FLR a hung task
may be found:
Call trace:
__switch_to+0x158/0x1cc
__schedule+0x2e8/0x85c
schedule+0x7c/0x110
schedule_timeout+0x190/0x1cc
__down+0x7c/0xd4
down+0x5c/0x7c
hisi_sas_task_exec+0x510/0x680 [hisi_sas_main]
hisi_sas_queue_command+0x24/0x30 [hisi_sas_main]
smp_execute_task_sg+0xf4/0x23c [libsas]
sas_smp_phy_control+0x110/0x1e0 [libsas]
transport_sas_phy_reset+0xc8/0x190 [libsas]
phy_reset_work+0x2c/0x40 [libsas]
process_one_work+0x1dc/0x48c
worker_thread+0x15c/0x464
kthread+0x160/0x170
ret_from_fork+0x10/0x18
This is a race condition which occurs when the FLR completes first.
Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as
HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so
now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem .
Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (ufs, smartpqi, lpfc,
target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
fixes. Notable core changes are the removal of scsi->tag which caused
some churn in obsolete drivers and a sweep through all drivers to call
scsi_done() directly instead of scsi->done() which removes a pointer
indirection from the hot path and a move to register core sysfs files
earlier, which means they're available to KOBJ_ADD processing, which
necessitates switching all drivers to using attribute groups.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYYUfBCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbUJAQDZt4oc
vUx9JpyrdHxxTCuOzVFd8W1oJn0k5ltCBuz4yAD8DNbGhGm93raMSJ3FOOlzLEbP
RG8vBdpxMudlvxAPi/A=
=BSFz
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This consists of the usual driver updates (ufs, smartpqi, lpfc,
target, megaraid_sas, hisi_sas, qla2xxx) and minor updates and bug
fixes.
Notable core changes are the removal of scsi->tag which caused some
churn in obsolete drivers and a sweep through all drivers to call
scsi_done() directly instead of scsi->done() which removes a pointer
indirection from the hot path and a move to register core sysfs files
earlier, which means they're available to KOBJ_ADD processing, which
necessitates switching all drivers to using attribute groups"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: lpfc: Update lpfc version to 14.0.0.3
scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss
scsi: lpfc: Fix link down processing to address NULL pointer dereference
scsi: lpfc: Allow PLOGI retry if previous PLOGI was aborted
scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine
scsi: lpfc: Correct sysfs reporting of loop support after SFP status change
scsi: lpfc: Wait for successful restart of SLI3 adapter during host sg_reset
scsi: lpfc: Revert LOG_TRACE_EVENT back to LOG_INIT prior to driver_resource_setup()
scsi: ufs: ufshcd-pltfrm: Fix memory leak due to probe defer
scsi: ufs: mediatek: Avoid sched_clock() misuse
scsi: mpt3sas: Make mpt3sas_dev_attrs static
scsi: scsi_transport_sas: Add 22.5 Gbps link rate definitions
scsi: target: core: Stop using bdevname()
scsi: aha1542: Use memcpy_{from,to}_bvec()
scsi: sr: Add error handling support for add_disk()
scsi: sd: Add error handling support for add_disk()
scsi: target: Perform ALUA group changes in one step
scsi: target: Replace lun_tg_pt_gp_lock with rcu in I/O path
scsi: target: Fix alua_tg_pt_gps_count tracking
scsi: target: Fix ordered tag handling
...
Drop various include not actually used in blkdev.h itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210920123328.1399408-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
struct device supports attribute groups directly but does not support
struct device_attribute directly. Hence switch to attribute groups.
Link: https://lore.kernel.org/r/20211012233558.4066756-21-bvanassche@acm.org
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When issuing a hardreset/linkreset/phy_set_linkrate from sysfs, the phy
will be disabled and re-enabled for the directly attached scenario.
It takes some time for the phy to come back up after re-enabling the phy.
If the controller becomes suspended while waiting for the phy to come back,
the phy up may be lost (along with the disk).
To solve this problem, wait for the phy up to occur with a timeout. Indeed
this is already done in hisi_sas_debug_I_T_nexus_reset() for local phys, so
just relocate the functionality to hisi_sas_control_phy().
Since the HA workqueue is drained when suspending the controller, and the
phy control function is called from the same workqueue, we can guarantee
that the controller will not be suspended during this period.
Link: https://lore.kernel.org/r/1634041588-74824-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Perform driver-specific SCSI device initialization in the designated SCSI
midlayer callback instead of relying on the libsas "device found" callback.
The SCSI midlayer .slave_alloc interface is called prior to sending any I/O
to the device.
Link: https://lore.kernel.org/r/1634041588-74824-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The hisi_hba debugfs_dump_index member should increased after a dump
insertion completed, and not before it has started, so fix the code to do
so.
Link: https://lore.kernel.org/r/1629799260-120116-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some usage of del_timer() in the driver is potentially unsafe.
When running the sas_task->slow_task timer in
hisi_sas_exec_internal_tmf_task(), execution may be blocked in function
hisi_sas_task_exec(); so it is possible that the timer is running when the
callback to disable the timer is running. This could be dangerous, as we
immediately release resources which the timer callback uses after disabling
the timer. The same situation may be found at other sites, such as
_hisi_sas_internal_task_abort().
Change calls to del_timer() to del_timer_sync() as necessary, to ensure any
timer has finished when disabling.
Also remove calls to timer_pending() prior to del_timer() as it is not
necessary.
Link: https://lore.kernel.org/r/1629799260-120116-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
HISI_SAS_RESET_BIT means that the controller is being reset, and so the
name is a bit vague. Rename it to HISI_SAS_RESETTING_BIT.
Link: https://lore.kernel.org/r/1629799260-120116-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use managed PCI functions such as pcim_enable_device() and
pcim_iomap_regions() to simplify exception handling code.
Link: https://lore.kernel.org/r/1629799260-120116-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prepare for removal of the request pointer by using scsi_cmd_to_rq()
instead. This patch does not change any functionality.
Link: https://lore.kernel.org/r/20210809230355.8186-22-bvanassche@acm.org
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is a set of minor fixes and clean ups in the core and various
drivers. The only core change in behaviour is the I/O retry for
spinup notify, but that shouldn't impact anything other than the
failing case.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYOqWWyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYBtAQCpqVdl
Axi1SpD6/UuKOgRmboWscoKD8FLHwvLDMRyCRQEAnLu3XdB9HcQrwZOkTG14vrfB
q2XB5cP4XAITxFLN1qo=
=9AO9
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is a set of minor fixes and clean ups in the core and various
drivers.
The only core change in behaviour is the I/O retry for spinup notify,
but that shouldn't impact anything other than the failing case"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (23 commits)
scsi: virtio_scsi: Add validation for residual bytes from response
scsi: ipr: System crashes when seeing type 20 error
scsi: core: Retry I/O for Notify (Enable Spinup) Required error
scsi: mpi3mr: Fix warnings reported by smatch
scsi: qedf: Add check to synchronize abort and flush
scsi: MAINTAINERS: Add mpi3mr driver maintainers
scsi: libfc: Fix array index out of bound exception
scsi: mvsas: Use DEVICE_ATTR_RO()/RW() macro
scsi: megaraid_mbox: Use DEVICE_ATTR_ADMIN_RO() macro
scsi: qedf: Use DEVICE_ATTR_RO() macro
scsi: qedi: Use DEVICE_ATTR_RO() macro
scsi: message: mptfc: Switch from pci_ to dma_ API
scsi: be2iscsi: Fix some missing space in some messages
scsi: be2iscsi: Fix an error handling path in beiscsi_dev_probe()
scsi: ufs: Fix build warning without CONFIG_PM
scsi: bnx2fc: Remove meaningless bnx2fc_abts_cleanup() return value assignment
scsi: qla2xxx: Add heartbeat check
scsi: virtio_scsi: Do not overwrite SCSI status
scsi: libsas: Add LUN number check in .slave_alloc callback
scsi: core: Inline scsi_mq_alloc_queue()
...
This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers. The major core change is a rework
to drop the status byte handling macros and the old bit shifted
definitions and the rest of the updates are minor fixes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYN7I6iYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXpRAQCkngYZ
35yQrqOxgOk2pfrysE95tHrV1MfJm2U49NFTwAEAuZutEvBUTfBF+sbcJ06r6q7i
H0hkJN/Io7enFs5v3WA=
=zwIa
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (ufs, ibmvfc,
megaraid_sas, lpfc, elx, mpi3mr, qedi, iscsi, storvsc, mpt3sas) with
elx and mpi3mr being new drivers.
The major core change is a rework to drop the status byte handling
macros and the old bit shifted definitions and the rest of the updates
are minor fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (287 commits)
scsi: aha1740: Avoid over-read of sense buffer
scsi: arcmsr: Avoid over-read of sense buffer
scsi: ips: Avoid over-read of sense buffer
scsi: ufs: ufs-mediatek: Add missing of_node_put() in ufs_mtk_probe()
scsi: elx: libefc: Fix IRQ restore in efc_domain_dispatch_frame()
scsi: elx: libefc: Fix less than zero comparison of a unsigned int
scsi: elx: efct: Fix pointer error checking in debugfs init
scsi: elx: efct: Fix is_originator return code type
scsi: elx: efct: Fix link error for _bad_cmpxchg
scsi: elx: efct: Eliminate unnecessary boolean check in efct_hw_command_cancel()
scsi: elx: efct: Do not use id uninitialized in efct_lio_setup_session()
scsi: elx: efct: Fix error handling in efct_hw_init()
scsi: elx: efct: Remove redundant initialization of variable lun
scsi: elx: efct: Fix spelling mistake "Unexected" -> "Unexpected"
scsi: lpfc: Fix build error in lpfc_scsi.c
scsi: target: iscsi: Remove redundant continue statement
scsi: qla4xxx: Remove redundant continue statement
scsi: ppa: Switch to use module_parport_driver()
scsi: imm: Switch to use module_parport_driver()
scsi: mpt3sas: Fix error return value in _scsih_expander_add()
...
Offlining a SATA device connected to a hisi SAS controller and then
scanning the host will result in detecting 255 non-existent devices:
# lsscsi
[2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda
[2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb
[2:0:2:0] disk SEAGATE ST600MM0006 B001 /dev/sdc
# echo "offline" > /sys/block/sdb/device/state
# echo "- - -" > /sys/class/scsi_host/host2/scan
# lsscsi
[2:0:0:0] disk ATA Samsung SSD 860 2B6Q /dev/sda
[2:0:1:0] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdb
[2:0:1:1] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdh
...
[2:0:1:255] disk ATA WDC WD2003FYYS-3 1D01 /dev/sdjb
After a REPORT LUN command issued to the offline device fails, the SCSI
midlayer tries to do a sequential scan of all devices whose LUN number is
not 0. However, SATA does not support LUN numbers at all.
Introduce a generic sas_slave_alloc() handler which will return -ENXIO for
SATA devices if the requested LUN number is larger than 0 and make libsas
drivers use this function as their .slave_alloc callback.
Link: https://lore.kernel.org/r/20210622034037.1467088-1-yuyufen@huawei.com
Reported-by: Wu Bo <wubo40@huawei.com>
Suggested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch prepares for converting SAM status codes into an enum. Without
this patch converting SAM status codes into an enumeration type would
trigger complaints about enum type mismatches for the SAS code.
Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
irqs allocated with devm_request_irq() should not be freed using
free_irq(). Doing so causes a dangling pointer and a subsequent double
free.
Link: https://lore.kernel.org/r/20210519130519.2661938-1-yangyingliang@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a channel interrupt occurs without any status bit set, the handler will
return directly. However, if such redundant interrupts are received, it's
better to check what happen, so add logs for this.
Link: https://lore.kernel.org/r/1617709711-195853-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The debugfs snapshot should be executed before the reset occurs to ensure
that the register contents are saved properly.
As such, it is incorrect to queue the debugfs dump when running a reset as
the reset will occur prior to the snapshot work item is handler.
Therefore, directly snapshot registers in the reset work handler.
Link: https://lore.kernel.org/r/1617709711-195853-5-git-send-email-john.garry@huawei.com
Signed-off-by: Jianqin Xie <xiejianqin@hisilicon.com>
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Function sas_unregister_ha() needs to be called to roll back if
hisi_hba->hw->hw_init() fails in function hisi_sas_probe() or
hisi_sas_v3_probe(). Make that change.
Link: https://lore.kernel.org/r/1617709711-195853-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To help debugging efforts, print the device SAS address for v3 hw erroneous
completion log.
Here is an example print:
hisi_sas_v3_hw 0000:b4:02.0: erroneous completion iptt=2193 task=000000002b0c13f8 dev id=17 addr=570fd45f9d17b001
Link: https://lore.kernel.org/r/1617709711-195853-3-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The controller provides trace FIFO DFX tool to assist link fault debugging
and link optimization. This tool can be helpful when debugging link faults
without SAS analyzers. Each PHY has an independent trace FIFO interface.
The user can configure the trace FIFO tool of one PHY by using the
following six interfaces:
signal_sel: select signal group applies to different scenarios.
0x0: linkrate negotiation
0x1: Host 12G TX train
0x2: Disk 12G TX train
0x3: SAS PHY CTRL DFX 0
0x4: SAS PHY CTRL DFX 1
0x5: SAS PCS DFX
other: linkrate negotiation
dump_mask: The masked hardware status bit will not be updated.
dump_mode: determines how to dump data after trigger signal is generated.
0x0: dump forever
0x1: dump 32 data after trigger signal is generated
0x2: no more dump after trigger signal is generated
trigger_mode: determines the trigger mode, level or edge.
0x0: dump when trigger signal changed
0x1: dump when trigger signal's level equal to trigger_level
0x2: dump when trigger signal's level different from trigger_level
trigger_level: determines the trigger level.
trigger_msk: mask trigger signal
The user can get 32-byte values from hardware by reading the rd_data.
These values consitute the status record of the hardware at different time
points.
Link: https://lore.kernel.org/r/1611659068-131975-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the controller reset occurs at the same time as driver removal, it may
be possible that the interrupts have been released prior to the host
softreset, and calling pci_irq_vector() there causes a WARN:
WARNING: CPU: 37 PID: 1542 /pci/msi.c:1275 pci_irq_vector+0xc0/0xd0
Call trace:
pci_irq_vector+0xc0/0xd0
disable_host_v3_hw+0x58/0x5b0 [hisi_sas_v3_hw]
soft_reset_v3_hw+0x40/0xc0 [hisi_sas_v3_hw]
hisi_sas_controller_reset+0x150/0x260 [hisi_sas_main]
hisi_sas_rst_work_handler+0x3c/0x58 [hisi_sas_main]
To fix, flush the driver workqueue prior to releasing the interrupts to
ensure any resets have been completed.
Link: https://lore.kernel.org/r/1611659068-131975-5-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
libsas event notifiers required an extension where gfp_t flags must be
explicitly passed. For bisectability, a temporary _gfp() variant of such
functions were added. All call sites then got converted use the _gfp()
variants and explicitly pass GFP context. Having no callers left, the
original libsas notifiers were then modified to accept gfp_t flags by
default.
Switch back to the original libas API, while still passing GFP context.
The libsas _gfp() variants will be removed afterwards.
Link: https://lore.kernel.org/r/20210118100955.1761652-14-a.darwish@linutronix.de
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the new libsas event notifiers API, which requires callers to
explicitly pass the gfp_t memory allocation flags.
Below are the context analysis for modified functions:
=> hisi_sas_bytes_dmaed():
Since it is invoked from both process and atomic contexts, let its callers
pass the gfp_t flags:
* hisi_sas_main.c:
------------------
hisi_sas_phyup_work(): workqueue context
-> hisi_sas_bytes_dmaed(..., GFP_KERNEL)
hisi_sas_controller_reset_done(): has an msleep()
-> hisi_sas_rescan_topology()
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_KERNEL)
hisi_sas_debug_I_T_nexus_reset(): calls wait_for_completion_timeout()
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_KERNEL)
* hisi_sas_v1_hw.c:
-------------------
int_abnormal_v1_hw(): irq handler
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)
* hisi_sas_v[23]_hw.c:
----------------------
int_phy_updown_v[23]_hw(): irq handler
-> phy_down_v[23]_hw()
-> hisi_sas_phy_down()
-> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)
=> int_bcast_v1_hw() and phy_bcast_v3_hw():
Both are invoked exclusively from irq handlers. Pass GFP_ATOMIC.
Link: https://lore.kernel.org/r/20210118100955.1761652-12-a.darwish@linutronix.de
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>