In scsi_mq_setup_tags(), cmd_size is calculated based on zero size for the
scatter-gather list in case the low level driver uses SG_NONE in its host
template.
cmd_size is passed on to the block layer for calculation of the request
size, and we've seen NULL pointer dereference errors from the block layer
in drivers where SG_NONE is used and a mq IO scheduler is active,
apparently as a consequence of this (see commit 68ab2d76e4 ("scsi:
cxlflash: Set sg_tablesize to 1 instead of SG_NONE"), and a recent patch by
Finn Thain converting the three m68k NFR5380 drivers to avoid setting
SG_NONE).
Try to avoid these errors by accounting for at least one sg list entry when
calculating cmd_size, regardless of whether the low level driver set a zero
sg_tablesize.
Tested on 030 m68k with the atari_scsi driver - setting sg_tablesize to
SG_NONE no longer results in a crash when loading this driver.
CC: Finn Thain <fthain@telegraphics.com.au>
Link: https://lore.kernel.org/r/1572922150-4358-1-git-send-email-schmitzmic@gmail.com
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
According to SBC-2 a TRANSFER LENGTH field of zero means that 256 logical
blocks must be transferred. Make the SCSI tracing code follow SBC-2.
Fixes: bf81623542 ("[SCSI] add scsi trace core functions and put trace points")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20191105215553.185018-1-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update lpfc version to 12.6.0.1
Link: https://lore.kernel.org/r/20191105005708.7399-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some adapters support the ability to hold multiple adapter dumps on the
adapter flash. Some adapters default to enabling this feature while others
default to single-dump.
Make support uniform by enabling dual dump by default.
Link: https://lore.kernel.org/r/20191105005708.7399-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current driver attempts to allocate an interrupt vector per cpu using
the systems managed IRQ allocator (flag PCI_IRQ_AFFINITY). The system IRQ
allocator will either provide the per-cpu vector, or return fewer
vectors. When fewer vectors, they are evenly spread between the numa nodes
on the system. When run on an AMD architecture, if interrupts occur to a
cpu that is not in the same numa node as the adapter generating the
interrupt, there are extreme costs and overheads in performance. Thus, if
1:1 vector allocation is used, or the "balanced" vectors in the other numa
nodes, performance can be hit significantly.
A much more performant model is to allocate interrupts only on the cpus
that are in the numa node where the adapter resides. I/O completion is
still performed by the cpu where the I/O was generated. Unfortunately,
there is no flag to request the managed IRQ subsystem allocate vectors only
for the CPUs in the numa node as the adapter.
On AMD architecture, revert the irq allocation to the normal style
(non-managed) and then use irq_set_affinity_hint() to set the cpu
affinity and disable user-space rebalancing.
Tie the support into CPU offline/online. If the cpu being offlined owns a
vector, the vector is re-affinitized to one of the other CPUs on the same
numa node. If there are no more CPUs on the numa node, the vector has all
affinity removed and lets the system determine where it's serviced.
Similarly, when the cpu that owned a vector comes online, the vector is
reaffinitized to the cpu.
Link: https://lore.kernel.org/r/20191105005708.7399-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The recent affinitization didn't address cpu offlining/onlining. If an
interrupt vector is shared and the low order cpu owning the vector is
offlined, as interrupts are managed, the vector is taken offline. This
causes the other CPUs sharing the vector will hang as they can't get io
completions.
Correct by registering callbacks with the system for Offline/Online
events. When a cpu is taken offline, its eq, which is tied to an interrupt
vector is found. If the cpu is the "owner" of the vector and if the
eq/vector is shared by other CPUs, the eq is placed into a polled mode.
Additionally, code paths that perform io submission on the "sharing CPUs"
will check the eq state and poll for completion after submission of new io
to a wq that uses the eq.
Similarly, when a cpu comes back online and owns an offlined vector, the eq
is taken out of polled mode and rearmed to start driving interrupts for eq.
Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current message on FAWWN events is rather cryptic.
Expand the message to clarify its meaning.
Link: https://lore.kernel.org/r/20191105005708.7399-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prior to the last FC-NVME-2 draft, SLER and CONF were independent. SLER
now requires CONF to be set.
Revise the NVME PRLI checking to look for both inorder to enable SLER.
Link: https://lore.kernel.org/r/20191105005708.7399-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The recently posted patch had a typo that incorrectly tested the receiving
function.
Fix the typo (change == to !=)
Fixes: 95bfc6d8ad ("scsi: lpfc: Make FW logging dynamically configurable")
Link: https://lore.kernel.org/r/20191105005708.7399-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During heavy RCN activity and log_verbose = 0 we see these messages:
2754 PRLI failure DID:521245 Status:x9/xb2c00, data: x0
0231 RSCN timeout Data: x0 x3
0230 Unexpected timeout, hba link state x5
This is due to delayed RSCN activity.
Correct by avoiding the timeout thus the messages by restarting the
discovery timeout whenever an rscn is received.
Filter PRLI responses such that severity depends on whether expected for
the configuration or not. For example, PRLI errors on a fabric will be
informational (they are expected), but Point-to-Point errors are not
necessarily expected so they are raised to an error level.
Link: https://lore.kernel.org/r/20191105005708.7399-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When reading sysfs nvme_info file while a remote port leaves and comes
back, a NULL pointer is encountered. The issue is due to ndlp list
corruption as the the nvme_info_show does not use the same lock as the rest
of the code.
Correct by removing the rcu_xxx_lock calls and replace by the host_lock and
phba->hbaLock spinlocks that are used by the rest of the driver. Given
we're called from sysfs, we are safe to use _irq rather than _irqsave.
Link: https://lore.kernel.org/r/20191105005708.7399-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver today is reading service parameters from the firmware and then
overwriting the firmware-provided values with values of its own. There are
some switch features that require preliminary FLOGI's that are
switch-specific and done prior to the actual fabric FLOGI for traffic. The
fw will perform those FLOGIs and will revise the service parameters for the
features configured. As the driver later overwrites those values with its
own values, it misconfigures things like BBSCN use by doing so.
Correct by eliminating the driver-overwrite of firmware values. The driver
correctly re-reads the service parameters after each link up to obtain the
latest values from firmware.
Link: https://lore.kernel.org/r/20191105005708.7399-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver receives a login that is later then LOGO'd by the remote port
(aka ndlp), the driver, upon the completion of the LOGO ACC transmission,
will logout the node and unregister the rpi that is being used for the
node. As part of the unreg, the node's rpi value is replaced by the
LPFC_RPI_ALLOC_ERROR value. If the port is subsequently offlined, the
offline walks the nodes and ensures they are logged out, which possibly
entails unreg'ing their rpi values. This path does not validate the node's
rpi value, thus doesn't detect that it has been unreg'd already. The
replaced rpi value is then used when accessing the rpi bitmask array which
tracks active rpi values. As the LPFC_RPI_ALLOC_ERROR value is not a valid
index for the bitmask, it may fault the system.
Revise the rpi release code to detect when the rpi value is the replaced
RPI_ALLOC_ERROR value and ignore further release steps.
Link: https://lore.kernel.org/r/20191105005708.7399-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
simply not needed there - neither sg_new_read() nor sg_new_write() need
it.
Link: https://lore.kernel.org/r/20191017193925.25539-8-viro@ZenIV.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Just use plain copy_from_user() and get_user(). Note that while a
buf-derived pointer gets stored into ->dxferp, all places that actually use
the resulting value feed it either to import_iovec() or to
import_single_range(), and both will do validation.
Link: https://lore.kernel.org/r/20191017193925.25539-7-viro@ZenIV.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use copy_..._user() instead, both in sg_read() and in sg_read_oxfer(). And
don't open-code memdup_user()...
Link: https://lore.kernel.org/r/20191017193925.25539-6-viro@ZenIV.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
... just use copy_from_user(). We copy only SZ_SG_IO_HDR bytes, so that
would, strictly speaking, loosen the check. However, for call chains via
->write() the caller has actually checked the entire range and SG_IO passes
exactly SZ_SG_IO_HDR for count. So no visible behaviour changes happen if
we check only what we really need for copyin.
Link: https://lore.kernel.org/r/20191017193925.25539-5-viro@ZenIV.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We don't need to allocate a temporary buffer and read the entire structure
in it, only to fetch a single field and free what we'd allocated. Just use
get_user() and be done with it...
Link: https://lore.kernel.org/r/20191017193925.25539-4-viro@ZenIV.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
First of all, __put_user() can fail with access_ok() succeeding. And
access_ok() + __copy_to_user() is spelled copy_to_user()...
__put_user() *can* fail with access_ok() succeeding...
Link: https://lore.kernel.org/r/20191017193925.25539-1-viro@ZenIV.linux.org.uk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The memory chunk io_req is released by mempool_free. Accessing
io_req->start_time will result in a use after free bug. The variable
start_time is a backup of the timestamp. So, use start_time here to
avoid use after free.
Link: https://lore.kernel.org/r/1572881182-37664-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix two issues with commit f5187b7d1a ("scsi: qla2xxx: Optimize NPIV
tear down process"): a missing negation in a wait_event_timeout()
condition, and a missing loop end condition.
Fixes: f5187b7d1a ("scsi: qla2xxx: Optimize NPIV tear down process")
Link: https://lore.kernel.org/r/20191105145550.10268-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The ILLEGAL REQUEST/INVALID FIELD IN CDB error generated by an attempt to
reset a conventional zone does not apply to the reset write pointer command
with the ALL bit set, that is, to REQ_OP_ZONE_RESET_ALL requests. Fix
sd_zbc_complete() to be quiet only in the case of REQ_OP_ZONE_RESET,
excluding REQ_OP_ZONE_RESET_ALL.
Since REQ_OP_ZONE_RESET is the only request handled by sd_zbc_complete(),
also simplify the code using a simple if statement.
[mkp: applied by hand]
Fixes: d81e9d4943 ("scsi: implement REQ_OP_ZONE_RESET_ALL")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191027140549.26272-4-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
and one core change in sd that fixes an I/O failure on DIF type 3
devices.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXbzO+iYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYOpAP9/BCSY
2TAFlli2rVQe+ZNjhHcE4Gj92HNPO7ZgvDQvWgD9F184tjG+1pntYGFutoso7Ak6
QimtBw4AuYg9eDKJDKU=
=bQRX
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Nine changes, eight in drivers [ufs, target, lpfc x 2, qla2xxx x 4]
and one core change in sd that fixes an I/O failure on DIF type 3
devices"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: stop timer in shutdown path
scsi: sd: define variable dif as unsigned int instead of bool
scsi: target: cxgbit: Fix cxgbit_fw4_ack()
scsi: qla2xxx: Fix partial flash write of MBI
scsi: qla2xxx: Initialized mailbox to prevent driver load failure
scsi: lpfc: Honor module parameter lpfc_use_adisc
scsi: ufs-bsg: Wake the device before sending raw upiu commands
scsi: lpfc: Check queue pointer before use
scsi: qla2xxx: fixup incorrect usage of host_byte
This patch fixes an unintended sign extension on left shifts. From Colin
King: "Shifting a u8 left will cause the value to be promoted to an
integer. If the top bit of the u8 is set then the following conversion to
an u64 will sign extend the value causing the upper 32 bits to be set in
the result."
Fix this by using get_unaligned_be*() instead.
Fixes: bf81623542 ("[SCSI] add scsi trace core functions and put trace points")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20191101211447.187151-1-bvanassche@acm.org
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Only csio_hw_free() calling csio_dfs_destroy() and it is not checking
return value. So remove the return from csio_dfs_destroy().
Link: https://lore.kernel.org/r/20191028194234.GA27848@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
debugfs_remove_recursive() has taken the null pointer into account. Remove
the null check before debugfs_remove_recursive().
Link: https://lore.kernel.org/r/20191026195625.GA22455@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace assignment of 0 to pointer with NULL assignment.
Link: https://lore.kernel.org/r/20191025135010.GA6191@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It isn't necessary to check the host depth in scsi_queue_rq() any more
since it has been respected by blk-mq before calling scsi_queue_rq() via
getting driver tag.
Lots of LUNs may attach to same host and per-host IOPS may reach millions,
so we should avoid expensive atomic operations on the host-wide counter in
the IO path.
This patch implements scsi_host_busy() via blk_mq_tagset_busy_iter() with
one scsi command state for reading the count of busy IOs for scsi_mq.
It is observed that IOPS is increased by 15% in IO test on scsi_debug (32
LUNs, 32 submit queues, 1024 can_queue, libaio/dio) in a dual-socket
system.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20191025065855.6309-1-ming.lei@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Declare all variables that hold dev_cmd_type values as an enum instead of
as an int.
Cc: Yaniv Gardi <ygardi@codeaurora.org>
Cc: Subhash Jadavani <subhashj@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20191029230710.211926-3-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the following three kernel-doc warnings:
drivers/scsi/ufs/ufs_bsg.c:165: warning: Function parameter or member 'hba' not described in 'ufs_bsg_remove'
drivers/scsi/ufs/ufshcd.c:5789: warning: Function parameter or member 'cmd_type' not described in 'ufshcd_issue_devman_upiu_cmd'
drivers/scsi/ufs/ufshcd.c:5789: warning: Excess function parameter 'msgcode' description in 'ufshcd_issue_devman_upiu_cmd'
Cc: Yaniv Gardi <ygardi@codeaurora.org>
Cc: Subhash Jadavani <subhashj@codeaurora.org>
Cc: Stanley Chu <stanley.chu@mediatek.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20191029230710.211926-2-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is no need to call ufshcd_def_desc_sizes() in ufshcd_init(), since
descriptor lengths will be checked and initialized later in
ufshcd_init_desc_sizes().
Fixes: a4b0e8a4e92b1b(scsi: ufs: Factor out ufshcd_read_desc_param)
Link: https://lore.kernel.org/r/BN7PR08MB5684A3ACE214C3D4792CE729DB610@BN7PR08MB5684.namprd08.prod.outlook.com
Signed-off-by: Bean Huo <beanhuo@micron.com>
Acked-by: Avri Altman <avri.altman.wdc.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull RCU and LKMM changes from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Dynamic tick (nohz) updates, perhaps most notably changes to
force the tick on when needed due to lengthy in-kernel execution
on CPUs on which RCU is waiting.
- Replace rcu_swap_protected() with rcu_prepace_pointer().
- Torture-test updates.
- Linux-kernel memory consistency model updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit replaces the use of rcu_swap_protected() with the more
intuitively appealing rcu_replace_pointer() as a step towards removing
rcu_swap_protected().
Link: https://lore.kernel.org/lkml/CAHk-=wiAsJLw1egFEE=Z7-GGtM6wcvtyytXZA1+BHqta4gg6Hw@mail.gmail.com/
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
[ paulmck: From rcu_replace() to rcu_replace_pointer() per Ingo Molnar. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Fix sparse warning:
drivers/scsi/lpfc/lpfc_debugfs.c:2083:1: warning:
symbol 'lpfc_debugfs_ras_log_data' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20191028132556.16272-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In shutdown/reboot paths, the timer is not stopped:
qla2x00_shutdown
pci_device_shutdown
device_shutdown
kernel_restart_prepare
kernel_restart
sys_reboot
This causes lockups (on powerpc) when firmware config space access calls
are interrupted by smp_send_stop later in reboot.
Fixes: e30d175648 ("[SCSI] qla2xxx: Addition of shutdown callback handler.")
Link: https://lore.kernel.org/r/20191024063804.14538-1-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
mempool_destroy has taken null pointer check into account. Remove the
redundant check.
Link: https://lore.kernel.org/r/20191026194712.GA22249@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
convert MAGIC_NUMER_xxx to MAGIC_NUMBER_xxx
Link: https://lore.kernel.org/r/20191025184342.6623-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_debufs.c was missing include of vmalloc.h when compiled on PPC.
Add missing header.
Link: https://lore.kernel.org/r/20191025182530.26653-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nine changes, eight to drivers (qla2xxx, hpsa, lpfc, alua, ch,
53c710[x2], target) and one core change that tries to close a race
between sysfs delete and module removal.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXbN1gSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWUzAP4tB9Z+
X5zfnMLmeAtSCnVwIgFX3/GVSFfzEmi+3VxfBQEA3nfs5AAJCPsaTk9z+jLtAKPk
6uYoHwsyTHal19Ojt9g=
=IOPn
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Nine changes, eight to drivers (qla2xxx, hpsa, lpfc, alua, ch,
53c710[x2], target) and one core change that tries to close a race
between sysfs delete and module removal"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: remove left-over BUILD_NVME defines
scsi: core: try to get module before removing device
scsi: hpsa: add missing hunks in reset-patch
scsi: target: core: Do not overwrite CDB byte 1
scsi: ch: Make it possible to open a ch device multiple times again
scsi: fix kconfig dependency warning related to 53C700_LE_ON_BE
scsi: sni_53c710: fix compilation error
scsi: scsi_dh_alua: handle RTPG sense code correctly during state transitions
scsi: qla2xxx: fix a potential NULL pointer dereference
The number of phy down reflects the quality of the link between SAS
controller and disk. In order to allow the user to confirm the link quality
of the system, we record the number of phy down for each phy.
The user can check the current phy down count by reading the debugfs file
corresponding to the specific phy, or clear the phy down count by writing 0
to the debugfs file.
Link: https://lore.kernel.org/r/1571926105-74636-19-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although if the debugfs initialization fails, we will delete the debugfs
folder of hisi_sas, but we did not consider the scenario where debugfs was
successfully initialized, but the probe failed for other reasons. We found
out that hisi_sas folder is still remain after the probe failed.
When probe fail, we should delete debugfs folder to avoid the above issue.
Link: https://lore.kernel.org/r/1571926105-74636-18-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We use the module parameter debugfs_dump_count to manage the upper limit of
the memory block for multiple dumps.
Link: https://lore.kernel.org/r/1571926105-74636-17-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We add multiple dumps for debugfs, but only allocate memory this time and
only dump #0.
Link: https://lore.kernel.org/r/1571926105-74636-15-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address for
ITCT cache at debugfs. This structure is bound to the corresponding debugfs
file, it can help callback function of debugfs file to get what it needs.
Link: https://lore.kernel.org/r/1571926105-74636-14-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address for IOST
cache at debugfs. This structure is bound to the corresponding debugfs
file, it can help callback function of debugfs file to get what it needs.
Link: https://lore.kernel.org/r/1571926105-74636-13-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address for ITCT
at debugfs. This structure is bound to the corresponding debugfs file, it
can help callback function of debugfs file to get what it needs.
Link: https://lore.kernel.org/r/1571926105-74636-12-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address for IOST
at debugfs. This structure is bound to the corresponding debugfs file, it
can help callback function of debugfs file to get what it needs.
Link: https://lore.kernel.org/r/1571926105-74636-11-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address and phy
pointer for port at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.
Link: https://lore.kernel.org/r/1571926105-74636-10-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address and
hisi_hba pointer for REGS at debugfs. This structure is bound to the
corresponding debugfs file, it can help callback function of debugfs file
to get what it need.
Link: https://lore.kernel.org/r/1571926105-74636-9-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address and DQ
pointer for DQ at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.
Link: https://lore.kernel.org/r/1571926105-74636-8-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create a file structure which was used to save the memory address and CQ
pointer for CQ at debugfs. This structure is bound to the corresponding
debugfs file, it can help callback function of debugfs file to get what it
need.
Link: https://lore.kernel.org/r/1571926105-74636-7-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It's useful to know when the dump occurred, so add a timestamp file for
this.
Link: https://lore.kernel.org/r/1571926105-74636-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For IOs from upper layer, preemption may be disabled as it may be called by
function __blk_mq_delay_run_hw_queue which will call get_cpu() (it disables
preemption). So if flags HISI_SAS_REJECT_CMD_BIT is set in function
hisi_sas_task_exec(), it may disable preempt twice after down() and up()
which will cause following call trace:
BUG: scheduling while atomic: fio/60373/0x00000002
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x24/0x30
dump_stack+0xa0/0xc4
__schedule_bug+0x68/0x88
__schedule+0x4b8/0x548
schedule+0x40/0xd0
schedule_timeout+0x200/0x378
__down+0x78/0xc8
down+0x54/0x70
hisi_sas_task_exec.isra.10+0x598/0x8d8 [hisi_sas_main]
hisi_sas_queue_command+0x28/0x38 [hisi_sas_main]
sas_queuecommand+0x168/0x1b0 [libsas]
scsi_queue_rq+0x2ac/0x980
blk_mq_dispatch_rq_list+0xb0/0x550
blk_mq_do_dispatch_sched+0x6c/0x110
blk_mq_sched_dispatch_requests+0x114/0x1d8
__blk_mq_run_hw_queue+0xb8/0x130
__blk_mq_delay_run_hw_queue+0x1c0/0x220
blk_mq_run_hw_queue+0xb0/0x128
blk_mq_sched_insert_requests+0xdc/0x208
blk_mq_flush_plug_list+0x1b4/0x3a0
blk_flush_plug_list+0xdc/0x110
blk_finish_plug+0x3c/0x50
blkdev_direct_IO+0x404/0x550
generic_file_read_iter+0x9c/0x848
blkdev_read_iter+0x50/0x78
aio_read+0xc8/0x170
io_submit_one+0x1fc/0x8d8
__arm64_sys_io_submit+0xdc/0x280
el0_svc_common.constprop.0+0xe0/0x1e0
el0_svc_handler+0x34/0x90
el0_svc+0x10/0x14
...
To solve the issue, check preemptible() to avoid disabling preempt multiple
when flag HISI_SAS_REJECT_CMD_BIT is set.
Link: https://lore.kernel.org/r/1571926105-74636-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When injecting 2bit ecc errors, it will cause confusion inside SAS
controller which needs host reset to recover it. If a device is gone at the
same times inject 2bit ecc errors, we may not receive the ITCT interrupt so
it will wait for completion in clear_itct_v3_hw() all the time. And host
reset will also not occur because it can't require hisi_hba->sem, so the
system will be suspended.
To solve the issue, use wait_for_completion_timeout() instead of
wait_for_completion(), and also don't mark the gone device as
SAS_PHY_UNUSED when device gone.
Link: https://lore.kernel.org/r/1571926105-74636-4-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If set the BIST init value after enabling BIST, there may be still some few
error bits. According to the process, need to set the BIST init value
before enabling BIST.
Fixes: 97b151e758 ("scsi: hisi_sas: Add BIST support for phy loopback")
Link: https://lore.kernel.org/r/1571926105-74636-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Due to a merge error, we attempt to create 2x debugfs dump folders, which
fails:
[ 861.101914] debugfs: Directory 'dump' with parent '0000:74:02.0'
already present!
This breaks the dump function.
To fix, remove the superfluous attempt to create the folder.
Fixes: 7ec7082c57 ("scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation")
Link: https://lore.kernel.org/r/1571926105-74636-2-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix misspellings of "disonnect", "reconnect", "connection", "connected",
and "disconnection".
Link: https://lore.kernel.org/r/20191024152633.30404-1-geert+renesas@glider.be
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
From an interrupt handler path memory may be allocated using
GFP_KERNEL, replace it with GFP_ATOMIC.
_base_interrupt->_scsih_io_done->_scsih_smart_predicted_fault
Link: https://lore.kernel.org/r/20191024152835.6177-1-thenzl@redhat.com
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This error path is missing an unlock.
Fixes: 26780d9e12 ("[SCSI] esas2r: ATTO Technology ExpressSAS 6G SAS/SATA RAID Adapter Driver")
Link: https://lore.kernel.org/r/20191022102324.GA27540@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/cxgbi/cxgb4i/cxgb4i.c:2076:15:
warning: variable ppmax set but not used [-Wunused-but-set-variable]
drivers/target/iscsi/cxgbit/cxgbit_ddp.c:300:15:
warning: variable ppmax set but not used [-Wunused-but-set-variable]
It is not used since commit a248384e64 ("cxgb4/libcxgb/cxgb4i/cxgbit:
enable eDRAM page pods for iSCSI")
Link: https://lore.kernel.org/r/20191021142042.30964-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
These are called with IRQs disabled from csio_mgmt_tmo_handler() so we
can't call spin_unlock_irq() or it will enable IRQs prematurely.
Fixes: a3667aaed5 ("[SCSI] csiostor: Chelsio FCoE offload driver")
Link: https://lore.kernel.org/r/20191019085913.GA14245@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace assignment of 0 to pointer with NULL assignment.
Link: https://lore.kernel.org/r/20191024030857.GA12097@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace assignment of 0 to pointer with NULL assignment.
Link: https://lore.kernel.org/r/20191024025726.GA31421@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update lpfc version to 12.6.0.0
Link: https://lore.kernel.org/r/20191018211832.7917-17-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When debugging a recent discovery customer problem it was very hard to tell
what was happening with the existing discovery log messages. To fully debug
the issue additional log messages were necessary.
Add or extend log messages so that sufficient information is present for
debugging.
Link: https://lore.kernel.org/r/20191018211832.7917-16-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the past, the lpe32000 models, based their main support being for 32G,
and as FC-AL is not supported in the FC standards past 8G, did not support
FC-AL operation.
This patch adds private-loop FC-AL support for the LPE32000 adapters
when a link is 8G or below. To avoid conditions where link rate may
change, which would cause non-connectivity to the AL device, FC-AL
mode must become a persistent setting and the link kept at a speed
supporting FC-AL.
The patch:
- Adds a pls attribute indicating whether the adapter properly supports
FC-AL.
- Adds support for the adapter to indicate that topology should be fixed
and the topology types to be configured.
- Adds a pt attribute to report the persistent topology if present.
Link: https://lore.kernel.org/r/20191018211832.7917-15-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add decode support for adapter Async Events which report FA-WWN
configuration errors.
Link: https://lore.kernel.org/r/20191018211832.7917-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add two new macros to aid in message logging:
Both macros print a message if the corresponding lpfc verbosity setting is
set or the kernel log level is WARNING or more critical.
One macro is for use with a phba structure, the other with a vport
structure.
[mkp: typo]
Link: https://lore.kernel.org/r/20191018211832.7917-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, the FW logging facility is a load/boot time parameter which
requires the driver to be unloaded/reloaded or the system rebooted in order
to change its configuration.
Convert the logging facility to allow dynamic enablement and configuration.
Specifically:
- Convert the feature so that it can be enabled dynamically via an
attribute. Additionally, the size of the buffer can be configured
dynamically.
- Add locks around states that now may be changing.
- Tie the feature into debugfs so that the logs can be read at any time.
Link: https://lore.kernel.org/r/20191018211832.7917-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The existing "auto eq delay" mechanism was sometimes skipping over an EQ,
not ramping the coalescing down under light load fast enough, and in other
cases never kicked in as cpu sharing by multiple vectors didn't quite add
up right.
Tweak the interrupt mechanism such that:
- Add a flag to the EQ to force checking for colaescing values when being
serviced in the interrupt handler. The flag will be set by any CQ bound
to the EQ whenever the number of CQ elements process in a single scan
meets or exceeds the hardware queue notify level. E.g. there's a
significant number of completions happening.
- In the heartbeat work item that checks coalescing:
- Replace the structure that was counting the number of EQs that
interrupted on a single cpu with a new structure that looks at the EQ
to see whether EQ currently has a coalescing value (thus it should be
re-evaluate) or was marked by the new flag indicating heavy
completions.
- When a cpu, which may be servicing multiple vectors, had at least 1 EQ
that should be checked, a new coalescing delay is calculated based on
the number of interrupts that occurred on the cpu.
- The new coalescing value is then applied to the EQs that had
interrupted on the cpu.
Link: https://lore.kernel.org/r/20191018211832.7917-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Lower IOps performance with write operations. Perf tool shows lock
contention in dma_pool_alloc and dma_pool_free related to the
txrdy_payload_pool.
The allocations are for dma buffers for XFER_RDY's, which actually are not
needed for the FCP_TRECEIVE command as the command contents are used by the
adapter to generate the IU.
Remove the allocations and the associated buffer pool. Rather than leaving
NULLs in buffer pointer locations, set command and sgl to indicate skipped
SGLE indexes.
Link: https://lore.kernel.org/r/20191018211832.7917-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Slightly rework some error check code paths for better streamlining.
Added compiler unlikely hints to allow slightly better optimization of the
fast-path.
Removed a few pointer checks that were obviously already valid.
Link: https://lore.kernel.org/r/20191018211832.7917-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Log message conditional upon vport being NULL dereferences vport to
determine log verbose setting.
Changed to use lpfc_print_log which uses phba to determine the active log
verbose setting.
Fixes: 43bfea1bff ("scsi: lpfc: Fix coverity errors on NULL pointer checks")
Link: https://lore.kernel.org/r/20191018211832.7917-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In lpfc_abort_handler, the lock acquire order is hbalock (irqsave),
buf_lock (irq) and ring_lock (irq). The issue is that in two places the
locks are released out of order - the buf_lock and the hbalock - resulting
in the cpu preemption/lock flags getting restored out of order and
deadlocking the cpu.
Fix the unlock order by fully releasing the hbalocks as well.
CC: Zhangguanghui <zhang.guanghui@h3c.com>
Link: https://lore.kernel.org/r/20191018211832.7917-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In cases where I/O may be aborted, such as driver unload or link bounces,
the system will crash based on a bad ndlp pointer.
Example:
RIP: 0010:lpfc_sli4_abts_err_handler+0x15/0x140 [lpfc]
...
lpfc_sli4_io_xri_aborted+0x20d/0x270 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.54+0x84/0x170 [lpfc]
lpfc_sli4_fp_handle_cqe+0xc2/0x480 [lpfc]
__lpfc_sli4_process_cq+0xc6/0x230 [lpfc]
__lpfc_sli4_hba_process_cq+0x29/0xc0 [lpfc]
process_one_work+0x14c/0x390
Crash was caused by a bad ndlp address passed to I/O indicated by the XRI
aborted CQE. The address was not NULL so the routine deferenced the ndlp
ptr. The bad ndlp also caused the lpfc_sli4_io_xri_aborted to call an
erroneous io handler. Root cause for the bad ndlp was an lpfc_ncmd that
was aborted, put on the abort_io list, completed, taken off the abort_io
list, sent to lpfc_release_nvme_buf where it was put back on the abort_io
list because the lpfc_ncmd->flags setting LPFC_SBUF_XBUSY was not cleared
on the final completion.
Rework the exchange busy handling to ensure the flags are properly set for
both scsi and nvme.
Fixes: c490850a09 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Cc: <stable@vger.kernel.org> # v5.1+
Link: https://lore.kernel.org/r/20191018211832.7917-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When operating in private loop mode, PLOGI exchanges are racing and the
driver tries to abort it's PLOGI. But the PLOGI abort ends up terminating
the login with the other end causing the other end to abort its PLOGI as
well. Discovery never fully completes.
Fix by disabling the PLOGI abort when private loop and letting the state
machine play out.
Link: https://lore.kernel.org/r/20191018211832.7917-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix lockdep error in __lpfc_sli_ringtx_put(): The hbalock is valid for
sli3, but not for sli4. Change lockdep to look at ring lock if sli4.
Also update comment in __lpfc_sli_issue_iocb_s4() to reflect proper
lock. Note: lockdep check is already correct.
Link: https://lore.kernel.org/r/20191018211832.7917-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the adapter FW is administratively set to RO mode, a FW update
triggered by the driver's sysfs attribute will fail. Currently, the
driver's logging mechanism does not properly parse the adapter return codes
and print a meaningful message. This oversight prevents quick diagnosis in
the field.
Parse the adapter return codes for Write_Object and write an appropriate
message to the system console.
[mkp: typo]
Link: https://lore.kernel.org/r/20191018211832.7917-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, lpfc_nvmet_mrq is always scaled back to the min(lpfc_nvmet_mrq,
lpfc_irq_chann). There's no reason to reduce it to the number of interrupt
vectors. Rather, it should be scaled down based on the number of hardware
queues for the system (if lower than max of 16).
Change scaling to use hardware queue count rather than interrupt vector
count.
Link: https://lore.kernel.org/r/20191018211832.7917-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable dif in function sd_setup_read_write_cmnd() is the return value of
function scsi_host_dif_capable() which returns dif capability of disks. If
define it as bool, even for the disks which support DIF3, the function
still return dif=1, which causes IO error. So define variable dif as
unsigned int instead of bool.
Fixes: e249e42d27 ("scsi: sd: Clean up sd_setup_read_write_cmnd()")
Link: https://lore.kernel.org/r/1571725628-132736-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The sed_ioctl() function is written to be compatible between
32-bit and 64-bit processes, however compat mode is only
wired up for nvme, not for sd.
Add the missing call to sed_ioctl() in sd_compat_ioctl().
Fixes: d80210f25f ("sd: add support for TCG OPAL self encrypting disks")
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
SG_GET_REQUEST_TABLE is now the last ioctl command that needs a conversion
handler. This is only used in a single file, so the implementation should
be there.
I'm trying to simplify it in the process, to get rid of
the compat_alloc_user_space() and extra copy, by adding a
put_compat_request_table() function instead, which copies the data in
the right format to user space.
Cc: linux-scsi@vger.kernel.org
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There are two code locations that implement the SG_IO ioctl: the old
sg.c driver, and the generic scsi_ioctl helper that is in turn used by
multiple drivers.
To eradicate the old compat_ioctl conversion handler for the SG_IO
command, I implement a readable pair of put_sg_io_hdr() /get_sg_io_hdr()
helper functions that can be used for both compat and native mode,
and then I call this from both drivers.
For the iovec handling, there is already a compat_import_iovec() function
that can simply be called in place of import_iovec().
To avoid having to pass the compat/native state through multiple
indirections, I mark the SG_IO command itself as compatible in
fs/compat_ioctl.c and use in_compat_syscall() to figure out where
we are called from.
As a side-effect of this, the sg.c driver now also accepts the 32-bit
sg_io_hdr format in compat mode using the read/write interface, not
just ioctl. This should improve compatiblity with old 32-bit binaries,
but it would break if any application intentionally passes the 64-bit
data structure in compat mode here.
Steffen Maier helped debug an issue in an earlier version of this patch.
Cc: Steffen Maier <maier@linux.ibm.com>
Cc: linux-scsi@vger.kernel.org
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
MTIOCPOS and MTIOCGET are incompatible between 32-bit and 64-bit user
space, and traditionally have been translated in fs/compat_ioctl.c.
To get rid of that translation handler, move a corresponding
implementation into each of the four drivers implementing those commands.
The interesting part of that is now in a new linux/mtio.h header that
wraps the existing uapi/linux/mtio.h header and provides an abstraction
to let drivers handle both cases easily. Using an in_compat_syscall()
check, the caller does not have to keep track of whether this was
called through .unlocked_ioctl() or .compat_ioctl().
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Kai Mäkisara" <Kai.Makisara@kolumbus.fi>
Cc: linux-scsi@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
A handful of drivers all have a trivial wrapper around their ioctl
handler, but don't call the compat_ptr() conversion function at the
moment. In practice this does not matter, since none of them are used
on the s390 architecture and for all other architectures, compat_ptr()
does not do anything, but using the new compat_ptr_ioctl()
helper makes it more correct in theory, and simplifies the code.
I checked that all ioctl handlers in these files are compatible
and take either pointer arguments or no argument.
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.
One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.
I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/cxlflash/main.c:47:22: warning:
variable ioarcb set but not used [-Wunused-but-set-variable]
It is never used, so can be removed.
Link: https://lore.kernel.org/r/20191021141957.18828-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Matthew R. Ochs <mrochs@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For new adapters with multiple flash regions to write to, current code
allows FW & Boot regions to be written, while other regions are blocked via
sysfs. The fix is to block all flash read/write through sysfs interface.
Fixes: e81d1bcbde ("scsi: qla2xxx: Further limit FLASH region write access from SysFS")
Cc: stable@vger.kernel.org # 5.2
Link: https://lore.kernel.org/r/20191022193643.7076-3-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Girish Basrur <gbasrur@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes issue with Gen7 adapter in a blade environment where one
of the ports will not be detected by driver. Firmware expects mailbox 11 to
be set or cleared by driver for newer ISP.
Following message is seen in the log file:
[ 18.810892] qla2xxx [0000:d8:00.0]-1820:1: **** Failed=102 mb[0]=4005 mb[1]=37 mb[2]=20 mb[3]=8
[ 18.819596] cmd=2 ****
[mkp: typos]
Link: https://lore.kernel.org/r/20191022193643.7076-2-hmadhani@marvell.com
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The initial lpfc_desc_set_adisc implementation in commit
dea3101e0a ("lpfc: add Emulex FC driver version 8.0.28") enabled ADISC if
cfg_use_adisc && RSCN_MODE && FCP_2_DEVICE
In commit 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of
SLI-3") this changed to
(cfg_use_adisc && RSC_MODE) || FCP_2_DEVICE
and later in commit ffc954936b ("[SCSI] lpfc 8.3.13: FC Discovery Fixes
and enhancements.") to
(cfg_use_adisc && RSC_MODE) || (FCP_2_DEVICE && FCP_TARGET)
A customer reports that after a devloss, an ADISC failure is logged. It
turns out the ADISC flag is set even the user explicitly set lpfc_use_adisc
= 0.
[Sat Dec 22 22:55:58 2018] lpfc 0000:82:00.0: 2:(0):0203 Devloss timeout on WWPN 50:01:43:80:12:8e:40:20 NPort x05df00 Data: x82000000 x8 xa
[Sat Dec 22 23:08:20 2018] lpfc 0000:82:00.0: 2:(0):2755 ADISC failure DID:05DF00 Status:x9/x70000
[mkp: fixed Hannes' email]
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: James Smart <james.smart@broadcom.com>
Link: https://lore.kernel.org/r/20191022072112.132268-1-dwagner@suse.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After IOP reset completion, AIF request command is not issued to the
controller. Driver schedules a worker thread to issue a AIF request command
after IOP reset completion.
[mkp: fix zeroday warning]
Link: https://lore.kernel.org/r/1571120524-6037-7-git-send-email-balsundar.p@microsemi.com
Acked-by: Balsundar P < Balsundar.P@microchip.com>
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently driver waits for the command IOCTL from the firmware and if the
firmware enters nonresponsive state, the driver doesn't respond till the
firmware is responsive again.
Check that firmware is alive, otherwise return -EBUSY.
[mkp: clarified commit desc]
Link: https://lore.kernel.org/r/1571120524-6037-6-git-send-email-balsundar.p@microsemi.com
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Before issuing IOP reset, INTX mode is selected. This is triggering MSGU
lockup and ended in basecode assert. Use DROP_IO command when IOP reset is
sent in preparation for interrupt mode switch.
Link: https://lore.kernel.org/r/1571120524-6037-4-git-send-email-balsundar.p@microsemi.com
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The problem is the driver detects FastResponse bit set and saves it to
Fib's flags to not check IO response status, but it never clears it for
next IO. Hence the next IO will pick up FastResponse bit to not check
the IO response status and fail to report any type IO error to kernel
Link: https://lore.kernel.org/r/1571120524-6037-3-git-send-email-balsundar.p@microsemi.com
Signed-off-by: Balsundar P <balsundar.p@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi async probe process is calling blk_pm_runtime_init for each lun,
and then those request queues are monitored by the block layer pm
engine (blk-pm.c). This is however, not the case for scsi-passthrough
queues, created by bsg_setup_queue().
So the ufs-bsg driver might send various commands, disregarding the pm
status of the device. This is wrong, regardless if its request queue is
pm-aware or not.
Fixes: df032bf27a (scsi: ufs: Add a bsg endpoint that supports UPIUs)
Link: https://lore.kernel.org/r/1570696267-8487-1-git-send-email-avri.altman@wdc.com
Reported-by: Yuliy Izrailov <yuliy.izrailov@wdc.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The queue pointer might not be valid. The rest of the code checks the
pointer before accessing it. lpfc_sli4_process_missed_mbox_completions is
the only place where the check is missing.
Fixes: 657add4e5e ("scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors")
Cc: James Smart <jsmart2021@gmail.com>
Link: https://lore.kernel.org/r/20191018162111.8798-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
DRIVER_ERROR is a a driver byte setting, not a host byte. The qla2xxx
driver should rather return DID_ERROR here to be in line with the other
drivers.
Link: https://lore.kernel.org/r/20191018140458.108278-1-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZOpW2gUwxXeCmhkh7ulgGnXF3j0FAl2PeYQACgkQ7ulgGnXF
3j3Zow/7BD/9Vai5zqDOFXSFnR5cFfcLoL2aYu13B5GjIYKZlhUW8ePC0jo0p0sV
TSsIrOxv3RaeDLC5ISi+njsSJMspW5qGv8jrZb7xBn1zE2gcJ9YeVhb+tboW2rrr
R03i7HInSQhdyKFMQS05IonRi8LphmTYKy3p8LifiiPoy4TsGcpw2tjQKicp0GxZ
gWMOcMnx4sfUiivo0tys6UwUIACVqKOysXn4HGs8COFF4cdBHXJVkddZ5ZUO6hP+
JInRdiKqDwycZyE6X/6Mj1B7tbmLVGH5mOX2Mx6dwUQkBIpsgJGGIxHRd9sUKv3r
ltfZADn7CJGcTgwEFF1Fnn61pYXgx/m5M9ECop+w89CLNjWBVsRW1rLjIqiDIY6a
pCmZQE/P95iHxLtAn6s1IkZoiXzU44tGQZaS3/8uGFxizx/ktWosokUdzC3PJee3
1eHIXOGwJlu9dTSTR7YDid4s3pXHovlMlu1OTNp1ap8jHX7L5D2AM6xxlLJaXPhN
zOJz6vcP5ZdVWqNq55jsB0dXDa76hrN2SUkpcwgJYKeU7qnRuGRq/jzjetSuRLvI
jyaLY0VyKxHWk0/YgmU2gfW/sBYccJg6ONCPJ80R3KuO8VoMxUPZZiWWTb6bwVXn
Mj2M6dWRIySVI1D8MR7sahGfeCqk7LkCE88DdTDoyaBHasbUSAk=
=LmPS
-----END PGP SIGNATURE-----
Merge tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Pull scsi fixes from Martin Petersen:
"These two commits were in a separate postmerge branch due to a
dependency on changes merged for 5.4 in the block tree.
They fix two issues in the intersection of the request cleanup changes
from block (b7e9e1fb7a) and the request batching changes
(8930a6c207) that were made to SCSI during the 5.4 cycle"
* tag 'mkp-scsi-postmerge' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: core: fix dh and multipathing for SCSI hosts without request batching
scsi: core: fix missing .cleanup_rq for SCSI hosts without request batching
As said in commit f2c2cbcc35 ("powerpc: Use pr_warn instead of
pr_warning"), removing pr_warning so all logging messages use a
consistent <prefix>_warn style. Let's do it.
Link: http://lkml.kernel.org/r/20191018031850.48498-21-wangkefeng.wang@huawei.com
To: linux-kernel@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
The BUILD_NVME define never got defined anywhere, causing NVMe commands to
be treated as SCSI commands when freeing the buffers. This was causing a
stuck discovery and a horrible crash in lpfc_set_rrq_active() later on.
Link: https://lore.kernel.org/r/20191017150019.75769-1-hare@suse.de
Fixes: c00f62e6c5 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have a test case like block/001 in blktests, which will create a scsi
device by loading scsi_debug module and then try to delete the device by
sysfs interface. At the same time, it may remove the scsi_debug module.
And getting a invalid paging request BUG_ON as following:
[ 34.625854] BUG: unable to handle page fault for address: ffffffffa0016bb8
[ 34.629189] Oops: 0000 [#1] SMP PTI
[ 34.629618] CPU: 1 PID: 450 Comm: bash Tainted: G W 5.4.0-rc3+ #473
[ 34.632524] RIP: 0010:scsi_proc_hostdir_rm+0x5/0xa0
[ 34.643555] CR2: ffffffffa0016bb8 CR3: 000000012cd88000 CR4: 00000000000006e0
[ 34.644545] Call Trace:
[ 34.644907] scsi_host_dev_release+0x6b/0x1f0
[ 34.645511] device_release+0x74/0x110
[ 34.646046] kobject_put+0x116/0x390
[ 34.646559] put_device+0x17/0x30
[ 34.647041] scsi_target_dev_release+0x2b/0x40
[ 34.647652] device_release+0x74/0x110
[ 34.648186] kobject_put+0x116/0x390
[ 34.648691] put_device+0x17/0x30
[ 34.649157] scsi_device_dev_release_usercontext+0x2e8/0x360
[ 34.649953] execute_in_process_context+0x29/0x80
[ 34.650603] scsi_device_dev_release+0x20/0x30
[ 34.651221] device_release+0x74/0x110
[ 34.651732] kobject_put+0x116/0x390
[ 34.652230] sysfs_unbreak_active_protection+0x3f/0x50
[ 34.652935] sdev_store_delete.cold.4+0x71/0x8f
[ 34.653579] dev_attr_store+0x1b/0x40
[ 34.654103] sysfs_kf_write+0x3d/0x60
[ 34.654603] kernfs_fop_write+0x174/0x250
[ 34.655165] __vfs_write+0x1f/0x60
[ 34.655639] vfs_write+0xc7/0x280
[ 34.656117] ksys_write+0x6d/0x140
[ 34.656591] __x64_sys_write+0x1e/0x30
[ 34.657114] do_syscall_64+0xb1/0x400
[ 34.657627] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 34.658335] RIP: 0033:0x7f156f337130
During deleting scsi target, the scsi_debug module have been removed. Then,
sdebug_driver_template belonged to the module cannot be accessd, resulting
in scsi_proc_hostdir_rm() BUG_ON.
To fix the bug, we add scsi_device_get() in sdev_store_delete() to try to
increase refcount of module, avoiding the module been removed.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191015130556.18061-1-yuyufen@huawei.com
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct returning from reset before outstanding commands are completed
for the device.
Link: https://lore.kernel.org/r/157107623870.17997.11208813089704833029.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
(qla2xxx) and two in the core. The last two are mostly about removing
incorrect messages from the kernel log: the resid message is
definitely wrong and the sync cache on protected drive problem is
arguably wrong.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXaYZYCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishVuDAP9HBhGv
dQ3FPA7gZ33rmsb8M1Q1NJ0GJuvFj2muh9CFYwD6AoJtVLivVZR75gojLLMqKpuf
6EwRTaUZwYAoWeILNuA=
=+iTy
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS
(qla2xxx) and two in the core.
The last two are mostly about removing incorrect messages from the
kernel log: the resid message is definitely wrong and the sync cache
on protected drive problem is arguably wrong"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: MAINTAINERS: Update qla2xxx driver
scsi: zfcp: fix reaction on bit error threshold notification
scsi: core: save/restore command resid for error handling
scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry()
scsi: sd: Ignore a failure to sync cache due to lack of authorization
Code that iterates over all standard PCI BARs typically uses
PCI_STD_RESOURCE_END. However, that requires the unusual test
"i <= PCI_STD_RESOURCE_END" rather than something the typical
"i < PCI_STD_NUM_BARS".
Add a definition for PCI_STD_NUM_BARS and change loops to use the more
idiomatic C style to help avoid fencepost errors.
Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com
Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/
Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/
Clearing ch->device in ch_release() is wrong because that pointer must
remain valid until ch_remove() is called. This patch fixes the following
crash the second time a ch device is opened:
BUG: kernel NULL pointer dereference, address: 0000000000000790
RIP: 0010:scsi_device_get+0x5/0x60
Call Trace:
ch_open+0x4c/0xa0 [ch]
chrdev_open+0xa2/0x1c0
do_dentry_open+0x13a/0x380
path_openat+0x591/0x1470
do_filp_open+0x91/0x100
do_sys_open+0x184/0x220
do_syscall_64+0x5f/0x1a0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 085e56766f ("scsi: ch: add refcounting")
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20191009173536.247889-1-bvanassche@acm.org
Reported-by: Rob Turk <robtu@rtist.nl>
Suggested-by: Rob Turk <robtu@rtist.nl>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When building a kernel with SCSI_SNI_53C710 enabled, Kconfig warns:
WARNING: unmet direct dependencies detected for 53C700_LE_ON_BE
Depends on [n]: SCSI_LOWLEVEL [=y] && SCSI [=y] && SCSI_LASI700 [=n]
Selected by [y]:
- SCSI_SNI_53C710 [=y] && SCSI_LOWLEVEL [=y] && SNI_RM [=y] && SCSI [=y]
Add the missing depends SCSI_SNI_53C710 to 53C700_LE_ON_BE to fix it.
Link: https://lore.kernel.org/r/20191009151128.32411-1-tbogendoerfer@suse.de
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/megaraid/megaraid_sas_fp.c: In function MR_GetSpanBlock:
drivers/scsi/megaraid/megaraid_sas_fp.c:400:16: warning: variable debugBlk set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fp.c: In function mr_spanset_get_phy_params:
drivers/scsi/megaraid/megaraid_sas_fp.c:713:25: warning: variable fusion set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_fp.c: In function MR_GetPhyParams:
drivers/scsi/megaraid/megaraid_sas_fp.c:815:25: warning: variable fusion set but not used [-Wunused-but-set-variable]
'debugBlk' is introduced by commit 9c915a8c99 ("[SCSI] megaraid_sas:
Add 9565/9285 specific code"), but never used, so remove it
'fusion' is not used since commit c365178f31 ("scsi: megaraid_sas:
use adapter_type for all gen controllers")
Link: https://lore.kernel.org/r/1570605824-89133-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some arrays are not capable of returning RTPG data during state
transitioning, but rather return an 'LUN not accessible, asymmetric access
state transition' sense code. In these cases we can set the state to
'transitioning' directly and don't need to evaluate the RTPG data (which we
won't have anyway).
Link: https://lore.kernel.org/r/20191007135701.32389-1-hare@suse.de
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
alloc_workqueue is not checked for errors and as a result a potential
NULL dereference could occur.
Link: https://lore.kernel.org/r/1568824618-4366-1-git-send-email-allen.pais@oracle.com
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, MSI-X vectors name appears in /proc/interrupts is "megasas"
which is same for all the vectors. This patch provides a unique name for
all megaraid_sas controllers and their associated MSI-X interrupts.
Link: https://lore.kernel.org/r/20191007051828.12294-1-chandrakanth.patil@broadcom.com
Suggested-by: Konstantin Shalygin <k0ste@k0ste.ru>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Formatting changes, no functional changes.
Link: https://lore.kernel.org/r/157048753005.11757.2228541207280057256.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Removed some unused manifest constants.
Link: https://lore.kernel.org/r/157048752420.11757.3464951542864727227.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Obtain the unique IDs from the RLL and RPL instead of VPD page 83h.
Link: https://lore.kernel.org/r/157048751833.11757.11996314786914610803.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/157048751247.11757.1727592925624138646.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/157048750649.11757.7811056360633694725.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add support for a timeout on LUN resets.
Link: https://lore.kernel.org/r/157048750055.11757.9689400788261610618.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add timeout field in RAID IU.
Link: https://lore.kernel.org/r/157048749461.11757.10013040278241807855.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: koshyaji <ajish.koshy@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use sas_phy_delete rather than sas_phy_free which, according to
comments, should not be called for PHYs that have been set up
successfully.
Link: https://lore.kernel.org/r/157048748876.11757.17773443136670011786.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/157048748297.11757.3872221216800537383.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This line is indented too far so it's a bit confusing.
Link: https://lore.kernel.org/r/20191004100615.GA823@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warnings:
drivers/scsi/lpfc/lpfc_nportdisc.c:290:1: warning: symbol 'lpfc_defer_pt2pt_acc' was not declared. Should it be static?
Link: https://lore.kernel.org/r/1570183477-137273-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For MPI heartbeat stop Async Event, this patch would capture MPI FW dump
and chip reset. FW will tell which function to capture FW dump for.
Link: https://lore.kernel.org/r/20190912180918.6436-13-hmadhani@marvell.com
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add mailbox timeout checkout for ISP 27xx/28xx during FW dump procedure.
Without the timeout check, hardware lock can be held for long period. This
patch would shorten the dump procedure if a timeout condition is
encountered.
Link: https://lore.kernel.org/r/20190912180918.6436-12-hmadhani@marvell.com
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload, the remove flag will be set for all
scsi_qla_host/NPIV. This allows each NPIV to see the flag instead of
reaching for base_vha to search for it.
Link: https://lore.kernel.org/r/20190912180918.6436-11-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add error handling logic to ELS Passthrough relating to NVME devices.
Current code does not parse error code to take proper recovery action,
instead it re-logins with the same login parameters that encountered the
error. Ex: nport handle collision.
Link: https://lore.kernel.org/r/20190912180918.6436-10-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some storage arrays advertise FCP LUNs and NVMe namespaces behind the same
WWN. The driver now offers a user option by way of NVRAM parameter to
allow users to choose, on a per port basis, the kind of FC-4 type they
would like to prioritize for login.
Link: https://lore.kernel.org/r/20190912180918.6436-9-hmadhani@marvell.com
Signed-off-by: Michael Hernandez <mhernandez@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Twelve patches mostly small but obvious fixes or cosmetic but small
updates.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXZgfWiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaVOAQDnuANx
QGEuQ1dZPALeZPOlEOsJzzpHPd3O+mQauIE96wD9FMypt/UKF9+fvlp4mCP+ya66
0fz1kmTQIcAADdYaNYM=
=aQi7
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Twelve patches mostly small but obvious fixes or cosmetic but small
updates"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix Nport ID display value
scsi: qla2xxx: Fix N2N link up fail
scsi: qla2xxx: Fix N2N link reset
scsi: qla2xxx: Optimize NPIV tear down process
scsi: qla2xxx: Fix stale mem access on driver unload
scsi: qla2xxx: Fix unbound sleep in fcport delete path.
scsi: qla2xxx: Silence fwdump template message
scsi: hisi_sas: Make three functions static
scsi: megaraid: disable device when probe failed after enabled device
scsi: storvsc: setup 1:1 mapping between hardware queue and CPU queue
scsi: qedf: Remove always false 'tmp_prio < 0' statement
scsi: ufs: skip shutdown if hba is not powered
scsi: bnx2fc: Handle scope bits when array returns BUSY or TSF
When a non-passthrough command is terminated with CHECK CONDITION, request
sense is executed by hijacking the command descriptor. Since
scsi_eh_prep_cmnd() and scsi_eh_restore_cmnd() do not save/restore the
original command resid, the value returned on failure of the original
command is lost and replaced with the value set by the execution of the
request sense command. This value may in many instances be unaligned to the
device sector size, causing sd_done() to print a warning message about the
incorrect unaligned resid before the command is retried.
Fix this problem by saving the original command residual in struct
scsi_eh_save using scsi_eh_prep_cmnd() and restoring it in
scsi_eh_restore_cmnd(). In addition, to make sure that the request sense
command is executed with a correctly initialized command structure, also
reset the residual to 0 in scsi_eh_prep_cmnd() after saving the original
command value in struct scsi_eh_save.
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191001074839.1994-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warning:
drivers/scsi/bfa/bfad.c:1491:1: warning:
symbol 'restart_bfa' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20190930094327.46836-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 88263208dd ("scsi: qla2xxx: Complain if sp->done() is not called
from the completion path") introduced the WARN_ON_ONCE in
qla2x00_status_cont_entry(). The assumption was that there is only one
status continuations element. According to the firmware documentation it is
possible that multiple status continuations are emitted by the firmware.
Fixes: 88263208dd ("scsi: qla2xxx: Complain if sp->done() is not called from the completion path")
Link: https://lore.kernel.org/r/20190927073031.62296-1-dwagner@suse.de
Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I've got a report about a UAS drive enclosure reporting back Sense: Logical
unit access not authorized if the drive it holds is password protected.
While the drive is obviously unusable in that state as a mass storage
device, it still exists as a sd device and when the system is asked to
perform a suspend of the drive, it will be sent a SYNCHRONIZE CACHE. If
that fails due to password protection, the error must be ignored.
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20190903101840.16483-1-oneukum@suse.com
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since 'commit fc8d0590d9 ("libcxgbi: Add ipv6 api to driver")' was
introduced, there is no call to csk_print_port() and csk_print_ip() is
made.
Hence kernel build with clang complains below message:
drivers/scsi/cxgbi/libcxgbi.c:2287:19: warning: unused function 'csk_print_port' [-Wunused-function]
static inline int csk_print_port(struct cxgbi_sock *csk, char *buf)
^
drivers/scsi/cxgbi/libcxgbi.c:2298:19: warning: unused function 'csk_print_ip' [-Wunused-function]
static inline int csk_print_ip(struct cxgbi_sock *csk, char *buf)
^
Remove csk_print_port() and csk_print_ip() to stop warning.
Link: https://lore.kernel.org/r/20190924093716.GA78230@LGEARND20B15
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add sysfs attributes for the ATA information page and Supported VPD Pages
page.
Link: https://lore.kernel.org/r/20190926162216.56591-1-ryanattard@ryanattard.info
Signed-off-by: Ryan Attard <ryanattard@ryanattard.info>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are some statements that are indented too deeply, remove the
extraneous tabs and rejoin split lines.
Link: https://lore.kernel.org/r/20190927095840.26377-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Couple of users had requested to print the SCSI command age along with
command failure errors. This is a small change, but allows users to get
more important information about the command that was failed, it would help
the users in debugging the command failures:
Link: https://lore.kernel.org/r/20190926052501.GA8352@machine1
Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add qedf_get_host_port_id() to the transport template.
The fc_transport_template initializes the port_id member to the default
value of -1. The new getter ensures that the sysfs entry shows the current
value and not the default one, e.g by using 'lsscsi -H -t'
Link: https://lore.kernel.org/r/20190924072906.23737-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rework from previous work by:
Sujit Reddy Thumma <sthumma@codeaurora.org>
Override auto suspend tunables for UFS device LUNs during initialization so
as to efficiently manage background operations and the power consumption.
Link: https://lore.kernel.org/r/1568649411-5127-3-git-send-email-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rework from previous work by:
Sujit Reddy Thumma <sthumma@codeaurora.org>
Until now the scsi mid-layer forbids runtime suspend till userspace enables
it. This is mainly to quarantine some disks with broken runtime power
management or have high latencies executing suspend resume callbacks. If
the userspace doesn't enable the runtime suspend the underlying hardware
will be always on even when it is not doing any useful work and thus
wasting power.
Some low-level drivers for the controllers can efficiently use runtime
power management to reduce power consumption and improve battery life.
Allow runtime suspend parameters override within the LLD itself instead of
waiting for userspace to control the power management.
Link: https://lore.kernel.org/r/1568649411-5127-2-git-send-email-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simplify this function implementation by using a known function.
Generated by: scripts/coccinelle/api/ptr_ret.cocci
[mkp: applied by hand]
Link: https://lore.kernel.org/r/9e667f19-434e-ed30-78cb-9ddc6323c51e@web.de
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the array setup_attrs on the stack but instead make it
static const. Makes the object code smaller by 180 bytes.
Before:
text data bss dec hex filename
2140 224 0 2364 93c drivers/scsi/ufs/ufshcd-dwc.o
After:
text data bss dec hex filename
1863 320 0 2183 887 drivers/scsi/ufs/ufshcd-dwc.o
(gcc version 9.2.1, amd64)
Link: https://lore.kernel.org/r/20190906170104.10450-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the array 'options' on the stack but instead make it static
const. Makes the object code smaller by 143 bytes.
Before:
text data bss dec hex filename
94483 11272 1184 106939 1a1bb drivers/scsi/ips.o
After:
text data bss dec hex filename
94244 11368 1184 106796 1a12c drivers/scsi/ips.o
(gcc version 9.2.1, amd64)
Link: https://lore.kernel.org/r/20190906164522.5644-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the array dev_cmd_err on the stack but instead make it
static const. Makes the object code smaller by 80 bytes.
Before:
text data bss dec hex filename
21461 1564 0 23025 59f1 drivers/scsi/fnic/vnic_dev.o
After:
text data bss dec hex filename
21318 1628 0 22946 59a2 drivers/scsi/fnic/vnic_dev.o
(gcc version 9.2.1, amd64)
Link: https://lore.kernel.org/r/20190906163945.3889-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The variable rc is being initialized with a value that is never read and is
being re-assigned a little later on. The assignment is redundant and hence
can be removed.
Link: https://lore.kernel.org/r/20190905135017.23772-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pointer host is being initialized with a value that is never read and
is being re-assigned a little later on. The assignment is redundant and
hence can be removed.
Link: https://lore.kernel.org/r/20190905134229.21194-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/smartpqi/smartpqi_init.c: In function 'pqi_driver_version_show':
drivers/scsi/smartpqi/smartpqi_init.c:6164:24: warning:
variable 'ctrl_info' set but not used [-Wunused-but-set-variable]
commit 6d90615f13 ("scsi: smartpqi: add sysfs entries") added it but
it was never used. Also remove variable 'shost'.
[mkp: commit desc]
Link: https://lore.kernel.org/r/20190831130348.20552-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a statement that is indented one level too deeply, remove the tab,
re-join broken line and remove some empty lines.
Link: https://lore.kernel.org/r/20190831073903.7834-1-colin.king@canonical.com
Addresses-Coverity: ("Indentation does not match nesting")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Load driver with module parameter "max_msix_vectors". Value provided in
module parameter is not used by mpt3sas driver. Driver loads with max
controller supported MSI-X value.
In _base_alloc_irq_vectors use reply_queue_count which is determined using
user provided msix value insted of ioc->msix_vector_count which tells max
supported msix value of the controller.
Link: https://lore.kernel.org/r/1568379890-18347-13-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If any faulty application issues an NVMe Encapsulated commands to HBA which
doesn't support NVMe protocol then driver should return the command as
invalid with the following message.
"HBA doesn't support NVMe. Rejecting NVMe Encapsulated request."
Otherwise below page fault kernel panic will be observed while building the
PRPs as there is no PRP pools allocated for the HBA which doesn't support
NVMe drives.
RIP: 0010:_base_build_nvme_prp+0x3b/0xf0 [mpt3sas]
Call Trace:
_ctl_do_mpt_command+0x931/0x1120 [mpt3sas]
_ctl_ioctl_main.isra.11+0xa28/0x11e0 [mpt3sas]
? prepare_to_wait+0xb0/0xb0
? tty_ldisc_deref+0x16/0x20
_ctl_ioctl+0x1a/0x20 [mpt3sas]
do_vfs_ioctl+0xaa/0x620
? vfs_read+0x117/0x140
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x1a/0x20
do_syscall_64+0x60/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
[mkp: tweaked error string]
Link: https://lore.kernel.org/r/1568379890-18347-12-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The firmware image layout has been changed for Aero controllers. All
compatible HBAs have to get Firmware Package version from Component Image
Header layout.
The Signature field in FW header is set to 0xEB000042 for products
compatible with Component Image Header.
For compatible controllers, driver fetches firmware package version from
ApplicationSpecific field of Component Image Header.
Link: https://lore.kernel.org/r/1568379890-18347-11-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added a new status flag named MPT3_DIAG_BUFFER_IS_APP_OWNED and it will set
whenever application registers the diag buffer & it will be cleared when
application unregisters the buffer.
When this flag is enabled, and if application issues diag buffer register
command without releasing the buffer, then register command will be failed
with -EINVAL status by saying that this buffer is already registered by the
application.
When user issues a trace buffer register command through sysfs parameter,
and if trace buffer is in released stated but not yet unregistered by the
application which was owning it, then driver will unregister the buffer by
itself and freshly register the 1MB sized trace buffer with the HBA
firmware.
Link: https://lore.kernel.org/r/1568379890-18347-9-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The diag buffer which is allocated during driver load time or through sysfs
parameter is marked as driver allocated diag buffer.
MPT3_DIAG_BUFFER_IS_DRIVER_ALLOCATED bit will be set for this buffer.
This buffer won't be de-allocated even when application issues unregister
command, driver just clears the registered status bit. Same buffer will be
reused while re-registering the same diag buffer type by any application.
While re-registering the same diag buffer type application has to register
with the same size that the buffer was allocated during driver load
time. This buffer size can be read by the application by issuing diag
'query' command.
This always makes sure that the memory is available for applications for
collecting the firmware logs. Only thing is that this won't allow the
application to re-register the diag buffer with different size, but the
buffer size which is allocated during driver load time will be enough for
most of the cases for collecting the firmware logs.
Link: https://lore.kernel.org/r/1568379890-18347-8-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clear MPT3_DIAG_BUFFER_IS_RELEASED bit once diag buffer is re-registered
after reading the buffer, else driver won't release the buffer and return
the 'diag release' command with -EINVAL status saying that buffer is
already released.
Link: https://lore.kernel.org/r/1568379890-18347-7-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Application A has registered a diag buffer and looking for particular event
to happen to release & read the trace buffer. Meanwhile application B has
unregistered the diag buffer and now Application A can't get the required
diag buffer. So proper diag buffer ownership is missing.
Each application has to maintain its own Unique ID. Now driver has to save
the Application's UniqueID for each diag buffer type when diag buffer is
registered. And driver has to allow 'release', 'read' & 'unregister' diag
commands only if application's UniqueID matches with saved UniqueID for the
corresponding diag buffer type.
When diag buffer is registered by the driver, then the UniqueID saved by
the driver is "BRCM" (i.e. 0x4252434D) for SAS3 and above generations HBA
devices. For SAS2 HBAs, driver keeps the legacy UniqueID 0x07075900 for
maintaining compatibility with the legacy SAS2 application and this
improvement won't be applicable for SAS2 HBA devices.
Any application can own the buffer registered by the driver by sending
diag register request to driver with same buffer type and size
(Application can get the buffer size by sending 'query' command). Then
driver changes the ownership of the buffer by saving application's
UniqueID for that corresponding buffer type.
Also, application can re-register the diag buffer with same size without
un-registering it, but diag buffer should be released before re-registering
it. By allowing this, driver no need to deallocate and allocate a new
buffer for re-register command, same buffer can be re-used.
Link: https://lore.kernel.org/r/1568379890-18347-6-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Memory leak can happen when diag buffer is released but not unregistered
(where buffer is deallocated) by the user. During module unload time driver
is not deallocating the buffer if the buffer is in released state.
Deallocate the diag buffer during module unload time without any diag
buffer status checks.
Link: https://lore.kernel.org/r/1568379890-18347-5-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When user issues diag register command from application with required size,
and if driver unable to allocate the memory, then it will fail the register
command. While failing the register command, driver is not currently
clearing MPT3_CMD_PENDING bit in ctl_cmds.status variable which was set
before trying to allocate the memory. As this bit is set, subsequent
register command will be failed with BUSY status even when user wants to
register the trace buffer will less memory.
Clear MPT3_CMD_PENDING bit in ctl_cmds.status before returning the diag
register command with no memory status.
Link: https://lore.kernel.org/r/1568379890-18347-4-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Display message before releasing the diag buffer so that user knows which
event caused the release of diag buffer.
Releasing of diag buffer means HBA firmware stops posting the firmware logs
on the registered diag buffer.
Link: https://lore.kernel.org/r/1568379890-18347-3-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently if user wishes to enable the host trace buffer during driver load
time, then user has to load the driver with module parameter
'diag_buffer_enable' set to one.
Alternatively now the user can enable host trace buffer by enabling the
following fields in manufacturing page11 in NVDATA (nvdata xml is used
while building HBA firmware image):
* HostTraceBufferMaxSizeKB - Maximum trace buffer size in KB that host can
allocate,
* HostTraceBufferMinSizeKB - Minimum trace buffer size in KB atleast host
should allocate,
* HostTraceBufferDecrementSizeKB - size by which host can reduce from
buffer size and retry the buffer allocation
when buffer allocation failed with previous
calculated buffer size.
The driver will register the trace buffer automatically without any module
parameter during boot time when above fields are enabled in manufacturing
page11 in HBA firmware.
Driver follows the following algorithm for enabling the host trace buffer
during driver load time:
* If user has loaded the driver with module parameter 'diag_buffer_enable'
set to one, then driver allocates 2MB buffer and registers this buffer
with HBA firmware for capturing the firmware trace logs.
* Else driver reads manufacture page11 data and checks whether
HostTraceBufferMaxSizeKB filed is zero or not?
- If HostTraceBufferMaxSizeKB is non-zero then driver tries to allocate
HostTraceBufferMaxSizeKB size of memory. If the buffer allocation is
successful, then it will register this buffer with HBA firmware, else
in a loop the driver will try again by reducing the current buffer size
with HostTraceBufferDecrementSizeKB size until memory allocation is
successful or buffer size falls below HostTraceBufferMinSizeKB. If the
memory allocation is successful, then the buffer will be registered
with the firmware. Else, if the buffer size falls below the
HostTraceBufferMinSizeKB, then driver won't register trace buffer with
HBA firmware.
- If HostTraceBufferMaxSizeKB is zero, then driver won't register trace
buffer with HBA firmware.
Link: https://lore.kernel.org/r/1568379890-18347-2-git-send-email-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Local variable fcp_txcmplq_cnt is initialized to 0 and then displayed in
lpfc driver message 0387.
Presumed residual (or unused) code from previous commit.
Removed fcp_txcmplq_cnt.
Link: https://lore.kernel.org/r/20190922035906.10977-20-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
T10 PI support on SLI-4-based FCoE adapters is not supported. A prior
commit in the 12.4.0.0 stream added device recognition that would prevent
T10 PI enablement. However, it didn't contain a complete device list. Thus
some SLI-4 FCoE adapters still had T10 PI enabled.
Fix by expanding the device list that identifies FCoE devices.
Link: https://lore.kernel.org/r/20190922035906.10977-19-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch updates ACQE handling for:
- an EEPROM failure error reported by the adapter.
- ensures that all data for any ACQE, recognized or not, is logged.
- Given that all data is now logged unconditionally, the default case
(unrecognized) data can be reduced.
Link: https://lore.kernel.org/r/20190922035906.10977-18-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool
before any associated sgl or cmd and rsp buffers are returned via their
respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf
structure is reused quickly, there may be a race condition between release
of old and association of new resources.
Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp
buffer lists before releasing the lpfc_io_buf structure for re-use.
Fixes: d79c9e9d4b ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and
spin_unlock_irq() and may unwittingly raising irq when it shouldn't. Hard
deadlocks were seen around lpfc_scsi_prep_cmnd().
Fix by converting the locks to irqsave/irqrestore.
Fixes: d79c9e9d4b ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While reviewing the CT behavior, issues with spinlock_irq were seen. The
driver should be using spinlock_irqsave/irqrestore in the els flush
routine.
Changed to spinlock_irqsave/irqrestore.
Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After study, it was determined there was a double free of a CT iocb during
execution of lpfc_offline_prep and lpfc_offline. The prep routine issued
an abort for some CT iocbs, but the aborts did not complete fast enough for
a subsequent routine that waits for completion. Thus the driver proceeded
to lpfc_offline, which releases any pending iocbs. Unfortunately, the
completions for the aborts were then received which re-released the ct
iocbs.
Turns out the issue for why the aborts didn't complete fast enough was not
their time on the wire/in the adapter. It was the lpfc_work_done routine,
which requires the adapter state to be UP before it calls
lpfc_sli_handle_slow_ring_event() to process the completions. The issue is
the prep routine takes the link down as part of it's processing.
To fix, the following was performed:
- Prevent the offline routine from releasing iocbs that have had aborts
issued on them. Defer to the abort completions. Also means the driver
fully waits for the completions. Given this change, the recognition of
"driver-generated" status which then releases the iocb is no longer
valid. As such, the change made in the commit 296012285c is reverted.
As recognition of "driver-generated" status is no longer valid, this
patch reverts the changes made in
commit 296012285c ("scsi: lpfc: Fix leak of ELS completions on adapter reset")
- Modify lpfc_work_done to allow slow path completions so that the abort
completions aren't ignored.
- Updated the fdmi path to recognize a CT request that fails due to the
port being unusable. This stops FDMI retries. FDMI will be restarted on
next link up.
Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Scenarios were seen where a host hung when the system booted or the host
was very slow in booting. The link would not come up and no luns were
visible to the host.
After investigation, this was found to be due to the introduction of a new
ACQE that adapter may generate to report a adapter hw warning. The ACQE was
delivered to the driver very early in adapter initialization, when the
driver did not expect command completion. As part of handling this
unexpected interrupt the an EQEs are consumed and discarded and the EQ
rearmed. The issue is the CQ that cause the EQE and thus the interrupt was
not processed and the CQ was left unarmed. Meaning it would no longer
generate a new interrupt condition. Subsequent mailbox commands used to
initialize the adapter use the same CQ, and as there was no completion
interrupt generated, the driver never saw the mailbox commands complete and
it would wait long command timeouts.
Fix by having the early flush routine also process the related CQ and rearm
the CQ.
Link: https://lore.kernel.org/r/20190922035906.10977-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Coverity flagged several scenarios where checking of null pointer values
wasn't consistent.
Fix the code to that be consistent on checking.
Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the port, running as a nvme target, receives an ABTS, it submits
commands to the adapter to Abort i/o outstanding in the adapter. The Abort
command formatting routine left a command field set to zero, which
instructs the adapter to generate an ABTS on the wire as part of cleaning
up the I/O. This is common operation for an initiator, but not for a
target.
Fix the driver to check whether an ABTS had been received for the I/O, and
if so, change the Abort command formatting so that the ABTS generation is
disabled (IA=1). No need to ABTS it when the other side already has.
Also refactored the code such that there is a single routine being used for
nvme or nvmet ABORT requests, and IA is an argument.
Link: https://lore.kernel.org/r/20190922035906.10977-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An issue was seen discovering all SCSI Luns when a target device undergoes
link bounce.
The driver currently does not qualify the FC4 support on the target.
Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is
that the target will reject the PRLI if it is not supported. If a PRLI
times out, the driver will retry. The driver will not proceed with the
device until both SCSI and NVMe PRLIs are resolved. In the failure case,
the device is FCP only and does not respond to the NVMe PRLI, thus
initiating the wait/retry loop in the driver. During that time, a RSCN is
received (device bounced) causing the driver to issue a GID_FT. The GID_FT
response comes back before the PRLI mess is resolved and it prematurely
cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE
state. Discovery with the target never completes or resets.
Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes,
thereby restarting the discovery process for the node.
Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Faults are seen with RIP of lpfc_scsi_cmd_iocb_cmpl(). The failure is when
lpfc_update_status is being called as part of the completion. After
debugging, it was seen the issue was the shost pointer that the driver
derived from the scsi cmd. The crash showed the cmd->device pointer being
bogus, which is likely as the scsi devices were offlined prior. The bogus
device pointer caused subsequent pointers derived from the location,
specifically the vport, to be bogus.
Fix by adjusting the calling sequence to pass in the vport rather than
having to derive it from the cmd structure.
Link: https://lore.kernel.org/r/20190922035906.10977-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Symptoms were seen of the driver not having valid data for mailbox
commands. After debugging, the following sequence was found:
The driver maintains a port-wide pointer of the mailbox command that is
currently in execution. Once finished, the port-wide pointer is cleared
(done in lpfc_sli4_mq_release()). The next mailbox command issued will set
the next pointer and so on.
The mailbox response data is only copied if there is a valid port-wide
pointer.
In the failing case, it was seen that a new mailbox command was being
attempted in parallel with the completion. The parallel path was seeing
the mailbox no long in use (flag check under lock) and thus set the port
pointer. The completion path had cleared the active flag under lock, but
had not touched the port pointer. The port pointer is cleared after the
lock is released. In this case, the completion path cleared the just-set
value by the parallel path.
Fix by making the calls that clear mbox state/port pointer while under
lock. Also slightly cleaned up the error path.
Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When target-side fault injections are made, the driver isn't reconnecting
to the remote port. The driver is logging "2753" error messages which
state:
"PLOGI failure DID:1B2400 Status:x3/xf0240008"
The failures status is indicating a Illegal field error, which points to
the Temporary RPI field being used for the ELS. This error typically means
the driver used an RPI that was already registered (shouldn't be registered
if using it in this context).
Study has found that if the driver were in discovery attempts and
encountered an error, it wouldn't flag the temporary rpi in error. Yet the
rpi was released for reallocation in these error paths and another ELS
could allocate the rpi. In the failure situation a retry was done on an ELS
that had encountered an error, and as the rpi wasn't marked in error, the
ELS reused the rpi it originally allocated. But that rpi had been allocated
by a different ELS issued after the original error and before the retry
attempt. The different ELS had succeeded and the RPI was registered.
Fix by marking the rpi state for the node to be in error, aka as needing
reallocation, upon an error in the els processing. Error state marking is
always done prior to release back to the internal rpi free list, which the
driver wasn't doing in cases prior.
Also enhanced some of the logging to help in the next case of problem
troubleshooting.
Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A prior use-after-free mailbox fix solved it's problem by null'ing a ndlp
pointer. However, further testing has shown that this change causes a
later state change to occasionally be skipped, which results in a reference
count never being decremented thus the rpi is never released, which causes
a vport delete to never succeed.
Revise the fix in the prior patch to no longer null the ndlp. Instead the
RELEASE_RPI flag is set which will drive the release of the rpi.
Given the new code was added at a deep indentation level, refactor the code
block using a new routine that avoids the indentation issues.
Fixes: 9b16406864 ("scsi: lpfc: Fix use-after-free mailbox cmd completion")
Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The nvme-fc transport may call to abort an io on controller reset. If the
driver is out of resources to issue an abort command, it just gives up and
does nothing. The transport expects the lldd to always be able to terminate
an io it has issued. At that point, the controller hangs waiting for
aborted ios to be returned. Note: flaged by "6136" and "6176" error
messages.
Root issue was the adapter mis-allocated the number resources it allocated
for command entries for the adapter.
Convert the driver to allocate command resources based on the number of
xris supported by the FC port - 1 resource for the original command and 1
resource for the abort request.
Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Coverity flagged missing status check on register read that flags a
poisoned data return value.
Add checking of register read status.
Link: https://lore.kernel.org/r/20190922035906.10977-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use of spin_lock_irq may re-enable interrupts prematurely.
Convert to spin_lock. Note: code is under the phba->hba_lock which has been
locked with irqsave.
Link: https://lore.kernel.org/r/20190922035906.10977-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After exchanging PLOGI on an SLI-3 adapter, the PRLI exchange failed. Link
trace showed the port was assigned a non-zero n_port_id, but didn't use the
address on the PRLI. The assigned address is set on the port by the
CONFIG_LINK mailbox command. The driver responded to the PRLI before the
mailbox command completed. Thus the PRLI response used the old n_port_id.
Defer the PRLI response until CONFIG_LINK completes.
Link: https://lore.kernel.org/r/20190922035906.10977-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl2J8xQQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpujgD/94s9GGKN8JShxCpT0YNuWyyFF5gNlaimQU
RSGAwnv2YUgEGNSUOPpcaj5FAYhTfYzbqoHlE+jytA2U5KXTOhc5Z85QV+TY4HPs
I03xczYuYD/uX0QuF00zU2+6eV3lETELPiBARbfEQdHfm72iwurweHzlh4dfhbxW
P7UA/cKixXWF2CH9wg5347Ll93nD24f2pi8BUyLJi/xpdlaRrN11Ii8AzNlRmq52
VRxURuogl98W89F6EV2VhPGFgUEYHY2Ot7II2OqqV+jmjHDQW9y5hximzINOqkxs
bQwo5J+WrDSPoqwl8+db2k7QQjAl1XKDAHmCwz+7J/BoOgZj8/M1FMBwzita+5x+
UqxEYe7k+2G3w2zuhBrq03BypU8pwqFep/QI0cCCPaHs4J5QnkVOScEqd6iV/C3T
FPvMvqDf7MrElghj4Qa2IZlh/CgqmLG5NUEz8E40cXkdiP+E+eK9ZY2Uwx2XhBrm
7Gl+SpG5DxWqqJeRNVWjFwM4p5L+01NtwDbTjZ1rsf+mCW5cNsy/L9B4UpPz4HxW
coAs0y/Ce+ZhCopIXZ4jLDBoTG9yoVg8EcyfaHKD2Zz0mUFxa2xm+LvXKeT49qqx
xuodpKD3fiuM7h9Xgv+cDsmn8Rr8gSeXEGV7qzpudmkxbp6IVg/yG5hC/dM921GR
EVrRtUIwdw==
=aAPP
-----END PGP SIGNATURE-----
Merge tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"Some later additions that weren't quite done for the first pull
request, and also a few fixes that have arrived since.
This contains:
- Kill silly pktcdvd warning on attempting to register a non-scsi
passthrough device (me)
- Use symbolic constants for the block t10 protection types, and
switch to handling it in core rather than in the drivers (Max)
- libahci platform missing node put fix (Nishka)
- Small series of fixes for BFQ (Paolo)
- Fix possible nbd crash (Xiubo)"
* tag 'for-5.4/post-2019-09-24' of git://git.kernel.dk/linux-block:
block: drop device references in bsg_queue_rq()
block: t10-pi: fix -Wswitch warning
pktcdvd: remove warning on attempting to register non-passthrough dev
ata: libahci_platform: Add of_node_put() before loop exit
nbd: fix possible page fault for nbd disk
nbd: rename the runtime flags as NBD_RT_ prefixed
block, bfq: push up injection only after setting service time
block, bfq: increase update frequency of inject limit
block, bfq: reduce upper bound for inject limit to max_rq_in_driver+1
block, bfq: update inject limit only after injection occurred
block: centralize PI remapping logic to the block layer
block: use symbolic constants for t10_pi type
For N2N, the NPort ID is assigned by driver in the PLOGI ELS. According to
FW Spec the byte order for SID is not the same as DID.
Link: https://lore.kernel.org/r/20190912180918.6436-8-hmadhani@marvell.com
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Tested-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During link up/bounce, qla driver would do command flush as part of
cleanup. In this case, the flush can intefere with FW state. This patch
allows FW to be in control of link up.
Link: https://lore.kernel.org/r/20190912180918.6436-7-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix stalled link recovery for N2N with FC-NVMe connection.
Link: https://lore.kernel.org/r/20190912180918.6436-6-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the case of NPIV port is being torn down, this patch will set a flag to
indicate VPORT_DELETE. This would prevent relogin to be triggered.
Link: https://lore.kernel.org/r/20190912180918.6436-5-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On driver unload, 'remove_one' thread was allowed to advance, while session
cleanup still lag behind. This patch ensures session deletion will finish
before remove_one can advance.
Link: https://lore.kernel.org/r/20190912180918.6436-4-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are instances, though rare, where a LOGO request cannot be sent out
and the thread in free session done can wait indefinitely. Fix this by
putting an upper bound to sleep.
Link: https://lore.kernel.org/r/20190912180918.6436-3-hmadhani@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Print if fwdt template is present or not, only when
ql2xextended_error_logging is enabled.
Link: https://lore.kernel.org/r/20190912180918.6436-2-hmadhani@marvell.com
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warnings:
drivers/scsi/hisi_sas/hisi_sas_main.c:3686:6:
warning: symbol 'hisi_sas_debugfs_release' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3708:5:
warning: symbol 'hisi_sas_debugfs_alloc' was not declared. Should it be static?
drivers/scsi/hisi_sas/hisi_sas_main.c:3799:6:
warning: symbol 'hisi_sas_debugfs_bist_init' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20190923054035.19036-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
storvsc doesn't use a dedicated hardware queue for a given CPU queue. When
issuing I/O, it selects returning CPU (hardware queue) dynamically based on
vmbus channel usage across all channels.
This patch advertises num_present_cpus() as number of hardware queues. This
will have upper layer setup 1:1 mapping between hardware queue and CPU
queue and avoid unnecessary locking when issuing I/O.
Link: https://lore.kernel.org/r/1567790660-48142-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since tmp_prio is declared as u8, the following statement is always false.
tmp_prio < 0
So remove 'always false' statement.
Link: https://lore.kernel.org/r/20190919075548.GA112801@LGEARND20B15
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In some cases, hba may go through shutdown flow without successful
initialization and then make system hang.
For example, if ufshcd_change_power_mode() gets error and leads to
ufshcd_hba_exit() to release resources of the host, future shutdown flow
may hang the system since the host register will be accessed in unpowered
state.
To solve this issue, simply add checking to skip shutdown for above kind of
situation.
Link: https://lore.kernel.org/r/1568780438-28753-1-git-send-email-stanley.chu@mediatek.com
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The qla2xxx driver had this issue as well when the newer array firmware
returned the retry_delay_timer in the fcp_rsp. The bnx2fc is not handling
the masking of the scope bits either so the retry_delay_timestamp value
lands up being a large value added to the timer timestamp delaying I/O for
up to 27 Minutes. This patch adds similar code to handle this to the
bnx2fc driver to avoid the huge delay.
Link: https://lore.kernel.org/r/1568210202-12794-1-git-send-email-loberman@redhat.com
Signed-off-by: Laurence Oberman <loberman@redhat.com>
Reported-by: David Jeffery <djeffery@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This was missing from scsi_device_from_queue() due to the introduction of
another new scsi_mq_ops_no_commit of linux-next commit 8930a6c207 ("scsi:
core: add support for request batching") from Martin's scsi/5.4/scsi-queue
or James' scsi/misc.
Only devicehandler code seems to call scsi_device_from_queue():
*** drivers/scsi/scsi_dh.c:
scsi_dh_activate[255] sdev = scsi_device_from_queue(q);
scsi_dh_set_params[302] sdev = scsi_device_from_queue(q);
scsi_dh_attach[325] sdev = scsi_device_from_queue(q);
scsi_dh_attached_handler_name[363] sdev = scsi_device_from_queue(q);
Fixes multipath tools follow-on errors:
$ multipath -v6
...
libdevmapper: ioctl/libdm-iface.c(1887): device-mapper: reload ioctl on mpatha failed: No such device
...
mpatha: failed to load map, error 19
...
showing also as kernel messages:
device-mapper: table: 252:0: multipath: error attaching hardware handler
device-mapper: ioctl: error adding target to table
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This was missing from scsi_mq_ops_no_commit of linux-next commit
8930a6c207 ("scsi: core: add support for request batching") from Martin's
scsi/5.4/scsi-queue or James' scsi/misc.
See also linux-next commit b7e9e1fb7a ("scsi: implement .cleanup_rq
callback") from block/for-next.
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi,
lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates.
The only core change this time around is the addition of request
batching for virtio. Since batching requires an additional flag to
use, it should be invisible to the rest of the drivers.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXYQE/yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXs9AP4usPY5
OpMlF6OiKFNeJrCdhCScVghf9uHbc7UA6cP+EgD/bCtRgcDe1ZjOTYWdeTwvwWqA
ltWYonnv6Lg3b1f9yqI=
=jRC/
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: qla2xxx, ufs, smartpqi,
lpfc, hisi_sas, qedf, mpt3sas; plus a whole load of minor updates. The
only core change this time around is the addition of request batching
for virtio. Since batching requires an additional flag to use, it
should be invisible to the rest of the drivers"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (264 commits)
scsi: hisi_sas: Fix the conflict between device gone and host reset
scsi: hisi_sas: Add BIST support for phy loopback
scsi: hisi_sas: Add hisi_sas_debugfs_alloc() to centralise allocation
scsi: hisi_sas: Remove some unused function arguments
scsi: hisi_sas: Remove redundant work declaration
scsi: hisi_sas: Remove hisi_sas_hw.slot_complete
scsi: hisi_sas: Assign NCQ tag for all NCQ commands
scsi: hisi_sas: Update all the registers after suspend and resume
scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device
scsi: hisi_sas: Remove sleep after issue phy reset if sas_smp_phy_control() fails
scsi: hisi_sas: Directly return when running I_T_nexus reset if phy disabled
scsi: hisi_sas: Use true/false as input parameter of sas_phy_reset()
scsi: hisi_sas: add debugfs auto-trigger for internal abort time out
scsi: virtio_scsi: unplug LUNs when events missed
scsi: scsi_dh_rdac: zero cdb in send_mode_select()
scsi: fcoe: fix null-ptr-deref Read in fc_release_transport
scsi: ufs-hisi: use devm_platform_ioremap_resource() to simplify code
scsi: ufshcd: use devm_platform_ioremap_resource() to simplify code
scsi: hisi_sas: use devm_platform_ioremap_resource() to simplify code
scsi: ufs: Use kmemdup in ufshcd_read_string_desc()
...
Pull networking updates from David Miller:
1) Support IPV6 RA Captive Portal Identifier, from Maciej Żenczykowski.
2) Use bio_vec in the networking instead of custom skb_frag_t, from
Matthew Wilcox.
3) Make use of xmit_more in r8169 driver, from Heiner Kallweit.
4) Add devmap_hash to xdp, from Toke Høiland-Jørgensen.
5) Support all variants of 5750X bnxt_en chips, from Michael Chan.
6) More RTNL avoidance work in the core and mlx5 driver, from Vlad
Buslov.
7) Add TCP syn cookies bpf helper, from Petar Penkov.
8) Add 'nettest' to selftests and use it, from David Ahern.
9) Add extack support to drop_monitor, add packet alert mode and
support for HW drops, from Ido Schimmel.
10) Add VLAN offload to stmmac, from Jose Abreu.
11) Lots of devm_platform_ioremap_resource() conversions, from
YueHaibing.
12) Add IONIC driver, from Shannon Nelson.
13) Several kTLS cleanups, from Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1930 commits)
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
mlxsw: spectrum: Register CPU port with devlink
mlxsw: spectrum_buffers: Prevent changing CPU port's configuration
net: ena: fix incorrect update of intr_delay_resolution
net: ena: fix retrieval of nonadaptive interrupt moderation intervals
net: ena: fix update of interrupt moderation register
net: ena: remove all old adaptive rx interrupt moderation code from ena_com
net: ena: remove ena_restore_ethtool_params() and relevant fields
net: ena: remove old adaptive interrupt moderation code from ena_netdev
net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
net: ena: enable the interrupt_moderation in driver_supported_features
net: ena: reimplement set/get_coalesce()
net: ena: switch to dim algorithm for rx adaptive interrupt moderation
net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
ethtool: implement Energy Detect Powerdown support via phy-tunable
xen-netfront: do not assume sk_buff_head list is empty in error handling
s390/ctcm: Delete unnecessary checks before the macro call “dev_kfree_skb”
net: ena: don't wake up tx queue when down
drop_monitor: Better sanitize notified packets
...
Currently t10_pi_prepare/t10_pi_complete functions are called during the
NVMe and SCSi layers command preparetion/completion, but their actual
place should be the block layer since T10-PI is a general data integrity
feature that is used by block storage protocols. Introduce .prepare_fn
and .complete_fn callbacks within the integrity profile that each type
can implement according to its needs.
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Fixed to not call queue integrity functions if BLK_DEV_INTEGRITY
isn't defined in the config.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQUwxxKyE5l/npt8ARiEGxRG/Sl2wUCXYAIeQAKCRBiEGxRG/Sl
2/SzAQDEnoNxzV/R5kWFd+2kmFeY3cll0d99KMrWJ8om+kje6QD/cXxZHzFm+T1L
UPF66k76oOODV7cyndjXnTnRXbeCRAM=
=Szby
-----END PGP SIGNATURE-----
Merge tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
"In this cycle we've finally managed to contribute the patch set
sorting out LED naming issues. Besides that there are many changes
scattered among various LED class drivers and triggers.
LED naming related improvements:
- add new 'function' and 'color' fwnode properties and deprecate
'label' property which has been frequently abused for conveying
vendor specific names that have been available in sysfs anyway
- introduce a set of standard LED_FUNCTION* definitions
- introduce a set of standard LED_COLOR_ID* definitions
- add a new {devm_}led_classdev_register_ext() API with the
capability of automatic LED name composition basing on the
properties available in the passed fwnode; the function is
backwards compatible in a sense that it uses 'label' data, if
present in the fwnode, for creating LED name
- add tools/leds/get_led_device_info.sh script for retrieving LED
vendor, product and bus names, if applicable; it also performs
basic validation of an LED name
- update following drivers and their DT bindings to use the new LED
registration API:
- leds-an30259a, leds-gpio, leds-as3645a, leds-aat1290, leds-cr0014114,
leds-lm3601x, leds-lm3692x, leds-lp8860, leds-lt3593, leds-sc27xx-blt
Other LED class improvements:
- replace {devm_}led_classdev_register() macros with inlines
- allow to call led_classdev_unregister() unconditionally
- switch to use fwnode instead of be stuck with OF one
LED triggers improvements:
- led-triggers:
- fix dereferencing of null pointer
- fix a memory leak bug
- ledtrig-gpio:
- GPIO 0 is valid
Drop superseeded apu2/3 support from leds-apu since for apu2+ a newer,
more complete driver exists, based on a generic driver for the AMD
SOCs gpio-controller, supporting LEDs as well other devices:
- drop profile field from priv data
- drop iosize field from priv data
- drop enum_apu_led_platform_types
- drop superseeded apu2/3 led support
- add pr_fmt prefix for better log output
- fix error message on probing failure
Other misc fixes and improvements to existing LED class drivers:
- leds-ns2, leds-max77650:
- add of_node_put() before return
- leds-pwm, leds-is31fl32xx:
- use struct_size() helper
- leds-lm3697, leds-lm36274, leds-lm3532:
- switch to use fwnode_property_count_uXX()
- leds-lm3532:
- fix brightness control for i2c mode
- change the define for the fs current register
- fixes for the driver for stability
- add full scale current configuration
- dt: Add property for full scale current.
- avoid potentially unpaired regulator calls
- move static keyword to the front of declarations
- fix optional led-max-microamp prop error handling
- leds-max77650:
- add of_node_put() before return
- add MODULE_ALIAS()
- Switch to fwnode property API
- leds-as3645a:
- fix misuse of strlcpy
- leds-netxbig:
- add of_node_put() in netxbig_leds_get_of_pdata()
- remove legacy board-file support
- leds-is31fl319x:
- simplify getting the adapter of a client
- leds-ti-lmu-common:
- fix coccinelle issue
- move static keyword to the front of declaration
- leds-syscon:
- use resource managed variant of device register
- leds-ktd2692:
- fix a typo in the name of a constant
- leds-lp5562:
- allow firmware files up to the maximum length
- leds-an30259a:
- fix typo
- leds-pca953x:
- include the right header"
* tag 'leds-for-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: (72 commits)
leds: lm3532: Fix optional led-max-microamp prop error handling
led: triggers: Fix dereferencing of null pointer
leds: ti-lmu-common: Move static keyword to the front of declaration
leds: lm3532: Move static keyword to the front of declarations
leds: trigger: gpio: GPIO 0 is valid
leds: pwm: Use struct_size() helper
leds: is31fl32xx: Use struct_size() helper
leds: ti-lmu-common: Fix coccinelle issue in TI LMU
leds: lm3532: Avoid potentially unpaired regulator calls
leds: syscon: Use resource managed variant of device register
leds: Replace {devm_}led_classdev_register() macros with inlines
leds: Allow to call led_classdev_unregister() unconditionally
leds: lm3532: Add full scale current configuration
dt: lm3532: Add property for full scale current.
leds: lm3532: Fixes for the driver for stability
leds: lm3532: Change the define for the fs current register
leds: lm3532: Fix brightness control for i2c mode
leds: Switch to use fwnode instead of be stuck with OF one
leds: max77650: Switch to fwnode property API
led: triggers: Fix a memory leak bug
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1/no0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmo9EACFXMbdNmEEUMyRSdOkVLlr7ZlTyQi1tLpB
YESDPxdBfybzpi0qa8JSaysGIfvSkSjmSAqBqrWPmASOSOL6CK4bbA4fTYbgPplk
XeHUdgGiG34oCQUn8Xil5reYaTm7I6LQWnWTpVa5fIhAyUYaGJL+987ykoGmpQmB
Dvf3YSc+8H0RTp9PCMVd6UCGPkZbVlLImGad3PF5ULvTEaE4RCXC2aiAgh0p1l5A
J2CkRZ+/mio3zN2O4YN7VdPGfr1Wo1iZ834xbIGLegv1miHXagFk7jwTcC7zIt5t
oSnJnqIg3iCe7SpWt4Bkzw/zy/2UqaspifbCMgw8vychlViVRUHFO5h85Yboo7kQ
OMLEQPcwjm6dTHv5h1iXF9LW1O7NoiYmmgvApU9uOo1HUrl1X7PZ3JEfUsVHxkOO
T4D5igf0Krsl1eAbiwEUQzy7vFZ8PlRHqrHgK+fkyotzHu1BJR7OQkYygEfGFOB/
EfMxplGDpmibYGuWCwDX2bPAmLV3SPUQENReHrfPJRDt5TD1UkFpVGv/PLLhbr0p
cLYI78DKpDSigBpVMmwq5nTYpnex33eyDTTA8C0sakcsdzdmU5qv30y3wm4nTiep
f6gZo6IMXwRg/rCgVVrd9SKQAr/8wEzVlsDW3qyi2pVT8sHIgm0tFv7paihXGdDV
xsKgmTrQQQ==
=Qt+h
-----END PGP SIGNATURE-----
Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
- Two NVMe pull requests:
- ana log parse fix from Anton
- nvme quirks support for Apple devices from Ben
- fix missing bio completion tracing for multipath stack devices
from Hannes and Mikhail
- IP TOS settings for nvme rdma and tcp transports from Israel
- rq_dma_dir cleanups from Israel
- tracing for Get LBA Status command from Minwoo
- Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
- Some consolidation between the fabrics transports for handling
the CAP register
- reset race with ns scanning fix for fabrics (move fabrics
commands to a dedicated request queue with a different lifetime
from the admin request queue)."
- controller reset and namespace scan races fixes
- nvme discovery log change uevent support
- naming improvements from Keith
- multiple discovery controllers reject fix from James
- some regular cleanups from various people
- Series fixing (and re-fixing) null_blk debug printing and nr_devices
checks (André)
- A few pull requests from Song, with fixes from Andy, Guoqing,
Guilherme, Neil, Nigel, and Yufen.
- REQ_OP_ZONE_RESET_ALL support (Chaitanya)
- Bio merge handling unification (Christoph)
- Pick default elevator correctly for devices with special needs
(Damien)
- Block stats fixes (Hou)
- Timeout and support devices nbd fixes (Mike)
- Series fixing races around elevator switching and device add/remove
(Ming)
- sed-opal cleanups (Revanth)
- Per device weight support for BFQ (Fam)
- Support for blk-iocost, a new model that can properly account cost of
IO workloads. (Tejun)
- blk-cgroup writeback fixes (Tejun)
- paride queue init fixes (zhengbin)
- blk_set_runtime_active() cleanup (Stanley)
- Block segment mapping optimizations (Bart)
- lightnvm fixes (Hans/Minwoo/YueHaibing)
- Various little fixes and cleanups
* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
null_blk: format pr_* logs with pr_fmt
null_blk: match the type of parameter nr_devices
null_blk: do not fail the module load with zero devices
block: also check RQF_STATS in blk_mq_need_time_stamp()
block: make rq sector size accessible for block stats
bfq: Fix bfq linkage error
raid5: use bio_end_sector in r5_next_bio
raid5: remove STRIPE_OPS_REQ_PENDING
md: add feature flag MD_FEATURE_RAID0_LAYOUT
md/raid0: avoid RAID0 data corruption due to layout confusion.
raid5: don't set STRIPE_HANDLE to stripe which is in batch list
raid5: don't increment read_errors on EILSEQ return
nvmet: fix a wrong error status returned in error log page
nvme: send discovery log page change events to userspace
nvme: add uevent variables for controller devices
nvme: enable aen regardless of the presence of I/O queues
nvme-fabrics: allow discovery subsystems accept a kato
nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
nvme: Remove redundant assignment of cq vector
nvme: Assign subsys instance from first ctrl
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdf64MAAoJEKurIx+X31iBB20P/07o93sBT92SiA2/ety9sLqV
BGJmEdw7gyb9WVbUip6s71FIEKZw4foCGkqDiX+lr5Fw2A9tiK7LmFgTLi4LLwg+
syhYZ1y5/mwBI4FLlJudKjQdFZjr/n7DNlz4H67woE2kK+FyRsOKEaFUhuR8+0rC
mKJBKtIGnoIOPG06PT1k5qfdpzlreCFoWdIhjO55LfDgZnnDiMaX5h0vcBQ9xgCp
xGV0n/f7+qn4pzB4hGvNV209Sdgv2V4t77bHNvyXlJrM5Hqzafo5MzFgEJv+fRqJ
2RnkWVhwctfbid/2ggf2aAsYnMK3GigEaOCsYW2oWJESVUQhxIi3ndF/Jt9fraZv
ZouD7G/s64P5lUQuCT9JnKGzJrSgxvkd37049AZ4pFVc2MzLC6o6dyyP8pu5ARe8
T0shFik3+gsml2US/vSUzxvrg1saRQjl9E/AJ0RTZ8oyP4FNnFmkJf38qj3a0L0k
ILFYscM5q7WPggoDA/m6F96tLGhdK/sKjDzrADjEh2dIvn4woqoEJSDn+rXuP+Gm
UOj1v8mILZCqvOAmc9IkGCkPUlbrmNV/1FYh5+GWudtillEaD82vjSqm+jnVbfXD
REvHlR/kxCSj1gg/+nk+NFdZCkW3xETOcTZohhDkR7du2mHjTwBMZ2YRPrqoX4c8
VZA57Mrqm5Uk5601qYRl
=L5e+
-----END PGP SIGNATURE-----
Merge tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
"The big change here is removal of support for SGI Altix"
* tag 'please-pull-ia64_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: (33 commits)
genirq: remove the is_affinity_mask_valid hook
ia64: remove CONFIG_SWIOTLB ifdefs
ia64: remove support for machvecs
ia64: move the screen_info setup to common code
ia64: move the ROOT_DEV setup to common code
ia64: rework iommu probing
ia64: remove the unused sn_coherency_id symbol
ia64: remove the SGI UV simulator support
ia64: remove the zx1 swiotlb machvec
ia64: remove CONFIG_ACPI ifdefs
ia64: remove CONFIG_PCI ifdefs
ia64: remove the hpsim platform
ia64: remove now unused machvec indirections
ia64: remove support for the SGI SN2 platform
drivers: remove the SGI SN2 IOC4 base support
drivers: remove the SGI SN2 IOC3 base support
qla2xxx: remove SGI SN2 support
qla1280: remove SGI SN2 support
misc/sgi-xp: remove SGI SN2 support
char/mspec: remove SGI SN2 support
...
Currently blk_set_runtime_active() is checking if q->dev is null by
itself, thus remove the same checking in its user: scsi_dev_type_resume().
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When device gone, it will check whether it is during reset, if not, it will
send internal task abort. Before internal task abort returned, reset
begins, and it will check whether SAS_PHY_UNUSED is set, if not, it will
call hisi_sas_init_device(), but at that time domain_device may already be
freed or part of it is freed, so it may referenece null pointer in
hisi_sas_init_device(). It may occur as follows:
thread0 thread1
hisi_sas_dev_gone()
check whether in RESET(no)
internal task abort
reset prep
soft_reset
... (part of reset_done)
internal task abort failed
release resource anyway
clear_itct
device->lldd_dev=NULL
hisi_sas_reset_init_all_device
check sas_dev->dev_type is SAS_PHY_UNUSED and
!device
set dev_type SAS_PHY_UNUSED
sas_free_device
hisi_sas_init_device
...
Semaphore hisi_hba.sema is used to sync the processes of device gone and
host reset.
To solve the issue, expand the scope that semaphore protects and let them
never occur together.
And also some places will check whether domain_device is NULL to judge
whether the device is gone. So when device gone, need to clear
sas_dev->sas_device.
Link: https://lore.kernel.org/r/1567774537-20003-14-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add BIST (built in self test) support for phy loopback.
Through the new debugfs interface, the user can configure loopback
mode/linkrate/phy id/code mode before enabling it. And also user can
enable/disable BIST function.
Link: https://lore.kernel.org/r/1567774537-20003-13-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We extract the code of memory allocate and construct an new function for
it. We think it's convenient for subsequent optimization.
Link: https://lore.kernel.org/r/1567774537-20003-12-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some function arguments are unused, so remove them.
Also move the timeout print in for wait_cmds_complete_timeout_vX_hw()
callsites into that same function.
Link: https://lore.kernel.org/r/1567774537-20003-11-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the NCQ tag is only assigned for FPDMA READ and FPDMA WRITE
commands, and for other NCQ commands (such as FPDMA SEND), their NCQ tags
are set in the delivery command to 0.
So for all the NCQ commands, we also need to assign normal NCQ tag for
them, so drop the command type check in hisi_sas_get_ncq_tag() [drop
hisi_sas_get_ncq_tag() altogether actually], and always use the ATA command
NCQ tag when appropriate.
Link: https://lore.kernel.org/r/1567774537-20003-8-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After suspend and resume, the HW registers will be set back to their
initial value. We use init_reg_v3_hw() to set some registers, but some
registers are set via firmware in ACPI "_RST" method, so add reset handler
before init_reg_v3_hw().
Link: https://lore.kernel.org/r/1567774537-20003-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When init device for SAS disks, it will send TMF IO to clear disks. At that
time TMF IO is broken by some operations such as injecting controller reset
from HW RAs event, the TMF IO will be timeout, and at last device will be
gone. Print is as followed:
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] found
...
hisi_sas_v3_hw 0000:74:02.0: controller resetting...
hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: controller reset complete
hisi_sas_v3_hw 0000:74:02.0: abort tmf: TMF task timeout and not done
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] is gone
sas: driver on host 0000:74:02.0 cannot handle device 5000c500a75a860d,
error:5
To improve the reliability, retry TMF IO max of 3 times for SAS disks which
is the same as softreset does.
Link: https://lore.kernel.org/r/1567774537-20003-6-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
At expander environment, we delay after issue phy reset to wait for
hardware to handle phy reset. But if sas_smp_phy_control() fails, the
delay is unnecessary so remove it.
Link: https://lore.kernel.org/r/1567774537-20003-5-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
At hisi_sas_debug_I_T_nexus_reset(), we call sas_phy_reset() to reset a
phy. But if the phy is disabled, sas_phy_reset() will directly return
-ENODEV without issue a phy reset request.
If so, We can directly return -ENODEV to libsas before issue a phy
reset.
Link: https://lore.kernel.org/r/1567774537-20003-4-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When calling sas_phy_reset(), we need to specify whether the reset type
is hard reset or link reset - use true/false for clarity.
Link: https://lore.kernel.org/r/1567774537-20003-3-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The event handler calls scsi_scan_host() when events are missed, which will
hotplug new LUNs. However, this function won't remove any unplugged LUNs.
The result is that hotunplug doesn't work properly when the number of
unplugged LUNs exceeds the event queue size (currently 8).
Scan existing LUNs when events are missed to check if they are still
present. If not, remove them.
Link: https://lore.kernel.org/r/20190905181903.29756-1-mlupfer@ddn.com
Signed-off-by: Matt Lupfer <mlupfer@ddn.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
cdb in send_mode_select() is not zeroed and is only partially filled in
rdac_failover_get(), which leads to some random data getting to the
device. Users have reported storage responding to such commands with
INVALID FIELD IN CDB. Code before commit 3278255741 was not affected, as
it called blk_rq_set_block_pc().
Fix this by zeroing out the cdb first.
Identified & fix proposed by HPE.
Fixes: 3278255741 ("scsi_dh_rdac: switch to scsi_execute_req_flags()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20190904155205.1666-1-martin.wilck@suse.com
Signed-off-by: Martin Wilck <mwilck@suse.com>
Acked-by: Ales Novak <alnovak@suse.cz>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In fcoe_if_init, if fc_attach_transport(&fcoe_vport_fc_functions)
fails, need to free the previously memory and return fail, otherwise
will trigger null-ptr-deref Read in fc_release_transport.
fcoe_exit
fcoe_if_exit
fc_release_transport(fcoe_vport_scsi_transport)
Link: https://lore.kernel.org/r/1566279789-58207-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Link: https://lore.kernel.org/r/20190904130457.24744-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Link: https://lore.kernel.org/r/20190904130348.24772-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.
Link: https://lore.kernel.org/r/20190904130256.24704-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use kmemdup rather than duplicating its implementation
Link: https://lore.kernel.org/r/20190831124424.18642-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The UFS_RESET pin on Qualcomm SoCs are controlled by TLMM and exposed
through the GPIO framework. Acquire the device-reset GPIO and use this to
implement the device_reset vops, to allow resetting the attached memory.
Based on downstream support implemented by Subhash Jadavani
<subhashj@codeaurora.org>.
Link: https://lore.kernel.org/r/20190828191756.24312-3-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS memory devices needs their reset line toggled in order to get them
into a good state for initialization. Provide a new vops to allow the
platform driver to implement this operation.
Link: https://lore.kernel.org/r/20190828191756.24312-2-bjorn.andersson@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A recent patch unconditionally marks the hba as in error as part of
resetting the adapter. The driver flow that called the adapter reset was a
recovery path, which expects the adapter to not be in an error state in
order to finish the recovery. Given the new error state being set, the
recovery fails and the adapter is left in limbo.
Revise the adapter reset routine so that it will only mark the adapter in
error if it was unable to reset the adapter.
Fixes: 8c24a4f643 ("scsi: lpfc: Fix crash due to port reset racing vs adapter error handling")
Link: https://lore.kernel.org/r/20190903215441.10490-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Convert the remaining %pf users to %ps to prepare for the removal of the
old %pf conversion specifier support.
Fixes: 3235066449 ("scsi: lpfc: Migrate to %px and %pf in kernel print calls")
Link: https://lore.kernel.org/r/20190904160423.3865-1-sakari.ailus@linux.intel.com
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On fast cable pull, where driver is unable to detect device has disappeared
and came back based on switch info, qla2xxx would not re-login while remote
port has already invalidated the session. This causes IO timeout. This
patch would relogin to remote device for RSCN affected port.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-6-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Login session was stucked on cable pull. When FW is in the middle PRLI
PENDING + driver is in Initiator mode, driver fails to check back with FW to
see if the PRLI has completed. This patch would re-check with FW again to
make sure PRLI would complete before pushing forward with relogin.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-5-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
HINT_MBX_INT_PENDING is not guaranteed to be cleared by firmware. Remove
check that prevent driver load with ISP82XX.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-4-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use adapter specific callback to read flash instead of ISP adapter
specific.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Link: https://lore.kernel.org/r/20190830222402.23688-3-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch updates log message which indicates number of vectors used by
the driver instead of displaying failure to get maximum requested
vectors. Driver will always request maximum vectors during
initialization. In the event driver is not able to get maximum requested
vectors, it will adjust the allocated vectors. This is normal and does not
imply failure in driver.
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Link: https://lore.kernel.org/r/20190830222402.23688-2-hmadhani@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For commands completing with a resid not aligned on the device logical
sector size, also print the command CDB in addition to the current message
to help debug hardware generating such incorrect command completion
information.
Link: https://lore.kernel.org/r/20190828053511.14818-1-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pci_alloc_irq_vectors() returns number of vectors allocated. Fix the check
for error condition.
Fixes: cca678dfba ("scsi: fnic: switch to pci_alloc_irq_vectors")
Link: https://lore.kernel.org/r/20190827211340.1095-1-gvaradar@cisco.com
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Acked-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Just a single lpfc fix adjusting the number of available queues for
high CPU count systems.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXXK/9iYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishUTCAP9C9a9W
sUBdDpe1bedPFJBBqT3540rucXGlSINXpm20RAEA7C9BkrHk7wFpCmieZscdDG2v
T5o0P6RYDEShcm91HLk=
=lrs3
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"Just a single lpfc fix adjusting the number of available queues for
high CPU count systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Raise config max for lpfc_fcp_mq_threshold variable
Using the helper blk_queue_required_elevator_features(), set the
elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request
queue of SCSI ZBC disks.
This feature requirement can always be satisfied as the mq-deadline
elevator is always selected for in-kernel compilation when
CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Port speed printing was added by commit d948e6383e ("scsi: fnic: Add port
speed stat to fnic debug stats"). As currently configured, this will cause
the port speed to be printed to syslog every 2 seconds. To prevent log
spamming, only print the vnic port speed at driver initialization and if
the speed changes. Also clean up a small typo in fnic_trace.c.
Fixes: d948e6383e ("scsi: fnic: Add port speed stat to fnic debug stats")
Signed-off-by: John Pittman <jpittman@redhat.com>
Reviewed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_process_unsol_compl:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:636:30: warning: variable task set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_process_ofld_cmpl:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:1125:21: warning: variable port set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_hwi.c: In function bnx2fc_init_seq_cleanup_task:
drivers/scsi/bnx2fc/bnx2fc_hwi.c:1468:30: warning: variable orig_task set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_initiate_seq_cleanup:
drivers/scsi/bnx2fc/bnx2fc_io.c:932:19: warning: variable lport set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_initiate_cleanup:
drivers/scsi/bnx2fc/bnx2fc_io.c:1001:19: warning: variable lport set but not used [-Wunused-but-set-variable]
drivers/scsi/bnx2fc/bnx2fc_io.c: In function bnx2fc_process_scsi_cmd_compl:
drivers/scsi/bnx2fc/bnx2fc_io.c:1882:20: warning: variable host set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c: In function bnx2fc_rcv:
drivers/scsi/bnx2fc/bnx2fc_fcoe.c:431:26: warning: variable fh set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 8.42.3.0.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a race b/w fipvlan request and response path:
=====
qedf_fcoe_process_vlan_resp:113]:2: VLAN response, vid=0xffd.
qedf_initiate_fipvlan_req:165]:2: vlan = 0x6ffd already set.
qedf_set_vlan_id:139]:2: Setting vlan_id=0ffd prio=3.
======
The request thread sees that vlan is already set and fails to call
ctrl_link_up.
Fix:
- While setting vlan_id use local variable and before setting vlan_id.
- Call fcoe_ctlr_link_up in next iteration of fipvlan request.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The list of rports might become stale so we should rather traverse the
discovery list when trying relogin.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prevent race where we're removing the module and we get link update
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Problem Statement:
- Driver has fc_id of 0xcc0200
- Driver gets link down (due to test) and calls fcoe_ctlr_link_down().
- At this point, the fc_id of the initiator port is zeroed out.
- Driver gets a link up 14 seconds later.
- Driver performs FIP VLAN request, gets a response from the switch.
- No change in VLAN is detected.
- Driver then notifies libfcoe via fcoe_ctlr_link_up().
- Libfcoe then issues a multicast discovery solicitation as expected.
- Cisco FCF responds to that correctly.
- Libfcoe at this point starts a 3 sec count-down to allow any other FCFs
to be discovered. However, at this point, it has been 20 seconds since
the last FKA from the driver (which would have been sent prior to
backlink toggle), which causes the CVL to be issued from Cisco CVL from
the switch is dropped by the driver as the vx_port identification
descriptor is present and has value of 0xcc0200, which does not match
the driver's value of 0. Libfcoe completes the 3 sec count down and
proceeds to issue FLOGI as per protocol. Switch rejects FLogi request.
All subsequent FLOGI requests from libfc are rejected by the switch
(possibly because it is now expecting a new solicitation). This
situation will continue until the next link toggle.
Solution:
The Vx_port descriptor in the CVL has three fields:
MAC address
Fabric ID
Port Name
Today, the code checks for both #1 and #2 above. In the case where we went
through a link down, both these will be zero until FLOGI succeeds.
We should change our code to check if any one of these 3 is valid and if
so, handle the CVL (basically switching from AND to OR). The port name
field is definitely expected to be valid always.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Log s_id, d_id, type and command to the log message.
[mkp: fixed warning]
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current code doeesn't support 20Gbps speed for current and supported
speed. Add support for it.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver was wrongly interpreting the supported cap value returned by qed.
Solution: Use QED define macros instead of OS defined for interpreting
supporting speeds.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver was attempting to print cdb[0], which is not set for resets coming
from SCSI ioctls. Check for cmd_len before accessing cmnd.
Crash info:
[84790.864747] BUG: unable to handle kernel NULL pointer dereference at (null)
[84790.864783] IP: qedf_initiate_tmf+0x7a/0x6e0 [qedf]
[84790.865204] Call Trace:
[84790.865246] scsi_try_target_reset+0x2b/0x90 [scsi_mod]
[84790.865266] scsi_ioctl_reset+0x20f/0x2a0 [scsi_mod]
[84790.865284] scsi_ioctl+0x131/0x3a0 [scsi_mod]
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- On some setups fipvlan can be retried for long duration and the
connection to switch was not there so it was not getting any reply.
- During unload this thread was hanging.
Problem Resolution:
Check if unload is in progress, then quit from fipvlan thread.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Print messages during exiting condition to help debugging.
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add:
PM8222 VID_9005, DID_028F, SVID_1BD4 and SDID_004F
3101E-4i (1G, no GB) VID_9005, DID_028F, SVID_9005 and SDID_0808
3102E-8i (2G, no GB) VID_9005, DID_028F, SVID_9005 and SDID_0809
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return -EINPROGRESS when a rescan worker is queued.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When each ld is deleted, a rescan event is triggered in the driver. These
can stack up waiting on mutex_lock.
Change to mutex_try_lock and schedule a rescan for later.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return identify physical device "Phys_Bay_in_Box" as bay_identifier.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
- serial number
- model
- vendor
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Expose physical devices before logical devices.
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Gilbert Wu <gilbert.wu@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The 12.4.0.0 patch that merged WQ/CQ pairs into single per-cpu pair
contained a bug: a local variable was set to the queue pair by index. This
should have allowed the local variable to be natively used. Instead, the
code reused the index relative to the local variable, obtaining a random
pointer value that when used eventually faulted the system
Convert offending code to use local variable.
Fixes: c00f62e6c5 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Raise the config max for lpfc_fcp_mq_threshold variable to 256.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Capturing and downloading dif command data and dif data was done a dozen
years ago and no longer being used. Also creates a potential security hole.
Remove the debugfs buffer for dif debugging.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com>
CC: Hannes Reinecke <hare@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Per Dan Carpenter:
The patch d79c9e9d4b: "scsi: lpfc: Support dynamic unbounded SGL lists on
G7 hardware." from Aug 14, 2019, leads to the following static checker
warning:
drivers/scsi/lpfc/lpfc_init.c:4107 lpfc_new_io_buf()
error: not allocating enough data 784 vs 768
There was no need to compare sizes nor to allocate size based on a define.
Change allocation to use actual structure length
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/ufs/ufs-qcom.c: In function ufs_qcom_pwr_change_notify:
drivers/scsi/ufs/ufs-qcom.c:808:6: warning: variable val set but not used [-Wunused-but-set-variable]
Fixes: 1e1e465c6d ("scsi/ufs: qcom: Remove ufs_qcom_phy_*() calls from host")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: Avri Altman <Avri.Altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a spelling mistake in a ql_log message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c: In function cq_interrupt_v1_hw:
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c:1542:6: warning: variable irq_value set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch provides a module parameter and sysfs interface to select
whether the queue depth for each device should be based on the
protocol-specific value set by the driver (the default) or the maximum
supported by the controller (can_queue).
Although we have a sysfs interface per sdev to change the queue depth
of individual scsi devices, this implementation provides a single
sysfs entry per shost to switch between the controller max and the
driver default.
[mkp: tweaked commit desc]
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
According to the firmware documentation a status type 0 IOCB can be
followed by one or more status continuation type 0 IOCBs. Hence do not
complain if the completion function is not called from inside the status
type 0 IOCB handler.
WARNING: CPU: 10 PID: 425 at drivers/scsi/qla2xxx/qla_isr.c:2784
qla2x00_status_entry.isra.7+0x484/0x17b0 [qla2xxx]
CPU: 10 PID: 425 Comm: kworker/10:1 Tainted: G E 5.3.0-rc4-next-20190813-autotest-autotest #1
Workqueue: qla2xxx_wq qla25xx_free_rsp_que [qla2xxx]
Call Trace:
qla2x00_status_entry.isra.7+0x1484/0x17b0 [qla2xxx] (unreliable)
qla24xx_process_response_queue+0x7d8/0xbd0 [qla2xxx]
qla25xx_free_rsp_que+0x1a0/0x220 [qla2xxx]
process_one_work+0x25c/0x520
worker_thread+0x8c/0x5e0
kthread+0x154/0x1a0
ret_from_kernel_thread+0x5c/0x7c
Cc: Himanshu Madhani <hmadhani@marvell.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently bits in hba->outstanding_tasks are cleared only after their
corresponding task management commands are successfully done by
__ufshcd_issue_tm_cmd().
If timeout happens in a task management command, its corresponding bit in
hba->outstanding_tasks will not be cleared until next task management
command with the same tag used successfully finishes.
This is wrong and can lead to some issues, like power issue. For example,
ufshcd_release() and ufshcd_gate_work() will do nothing if
hba->outstanding_tasks is not zero even if both UFS host and devices are
actually idle.
Solution is referred from error handling of device commands: bits in
hba->outstanding_tasks shall be cleared regardless of their execution
results.
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Chun-Hung Wu <chun-hung.wu@mediatek.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pointer fh is being assigned a return value from the call to
skb_transport_header however this value is never read and fh is being
re-assigned immediately afterwards with a new value. Since there are
side-effects from calling skb_transport_header the call is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some UFS devices have issues if LCC is enabled. So we are setting
PA_LOCAL_TX_LCC_Enable to 0 before link startup which will make sure that
both host and device TX LCC are disabled once link startup is completed.
Signed-off-by: Anil Varughese <aniljoy@cadence.com>
Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable error is being initialized with a value that is never read and
error is being re-assigned a little later on. The assignment is redundant
and hence can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move ASPM definitions and function prototypes from include/linux/pci-aspm.h
to include/linux/pci.h so users only need to include <linux/pci.h>:
PCIE_LINK_STATE_L0S
PCIE_LINK_STATE_L1
PCIE_LINK_STATE_CLKPM
pci_disable_link_state()
pci_disable_link_state_locked()
pcie_no_aspm()
No functional changes intended.
Link: https://lore.kernel.org/r/20190827095620.11213-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXWEBrCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishds+AQCyQlgV
TzSFQ1zvbAb3SNFdNsCzzb8Aq2vJC+RojF2VFgD/cJfE2fix9E7Nk8PCGwH1sgnf
m5Glsvv8BEmmtoikrb8=
=Ti6g
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, three for edge conditions which don't occur very often.
The lpfc fix mitigates memory exhaustion for some high CPU systems"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ
scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
scsi: target: tcmu: avoid use-after-free after command timeout
scsi: qla2xxx: Fix gnl.l memory leak on adapter init failure
Hi Linus,
Please, pull the following patches that mark switch cases where we are
expecting to fall through.
- Fix fall-through warnings on arm and mips for multiple
configurations.
Thanks
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAl1clmEACgkQRwW0y0cG
2zGqbg/9HPC3Cf3oYq4o0/kV+cfS0ir6iJCz1mspFfbBloaS/EU7A2CF35bDz7k3
XUzl/ci82EQCnuJv/X6ddayUF1S/vFWLnQXRznz07kJspUnNpu7JKgsZr2qsHaRe
CfCj62J/Kuhnke8EUjuWEuga6YXYsYlcevgg/tpVXsTmxrpq2A15tWyut7WEe4JQ
kWPELwYbPsDvTj2siZrgMRBx4gVzQKQVo5TpZiuADeJu9RuFT/64PI9TDQGE7c+X
fFq4ijd1YPj/E+WI7k5VdUbXYiPIIXmkJ4VAPcu5VWmUS7y7bTeye0Jc3uYAxI1r
7rykYhNzniGn3SZL+wq8rHchL3dTLBYhd34HhTlb5xdGFwmbzKgHBqdlGpH8HOo+
CLu8kPYdmnzYCth4md0ENwgBVkj0tweyZuMzCys1qR6RFhOipxWLNGEvIXWZ0Sp8
uNyXnPdCrZTmlwubwY4FOOLsGKW06GnD64cfmEYoCMcmT2j7clbjasWYM4PXQvbt
0dVtt8k4M5LJBLh8qTX7RMZHDQYMiiYiMnLLAXf4wB0VUTqgNuLc4k0PpX3kBYtO
4b0lU/LQH+8811BMNVBHK55StQ8DjM0C2yfQWx610eoohjV70JTyxOWoqeHFL5hq
DIFdLDOgvJCqtyYgJDjmCmH9x6lgfvmxAKq66h9Z7vt25KLUizQ=
=fQZm
-----END PGP SIGNATURE-----
Merge tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull more fallthrough fixes from Gustavo A. R. Silva:
"Fix fall-through warnings on arm and mips for multiple configurations"
* tag 'Wimplicit-fallthrough-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
video: fbdev: acornfb: Mark expected switch fall-through
scsi: libsas: sas_discover: Mark expected switch fall-through
MIPS: Octeon: Mark expected switch fall-through
power: supply: ab8500_charger: Mark expected switch fall-through
watchdog: wdt285: Mark expected switch fall-through
mtd: sa1100: Mark expected switch fall-through
drm/sun4i: tcon: Mark expected switch fall-through
drm/sun4i: sun6i_mipi_dsi: Mark expected switch fall-through
ARM: riscpc: Mark expected switch fall-through
dmaengine: fsldma: Mark expected switch fall-through
Mark switch cases where we are expecting to fall through.
Fix the following warning (Building: mtx1_defconfig mips):
drivers/scsi/libsas/sas_discover.c: In function ‘sas_discover_domain’:
./include/linux/printk.h:309:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/scsi/libsas/sas_discover.c:459:3: note: in expansion of macro ‘pr_notice’
pr_notice("ATA device seen but CONFIG_SCSI_SAS_ATA=N so cannot attach\n");
^~~~~~~~~
drivers/scsi/libsas/sas_discover.c:462:2: note: here
default:
^~~~~~~
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Update lpfc version to 12.4.0.0
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, each hardware queue, typically allocated per-cpu, consists of a
WQ/CQ pair per protocol. Meaning if both SCSI and NVMe are supported 2
WQ/CQ pairs will exist for the hardware queue. Separate queues are
unnecessary. The current implementation wastes memory backing the 2nd set
of queues, and the use of double the SLI-4 WQ/CQ's means less hardware
queues can be supported which means there may not always be enough to have
a pair per cpu. If there is only 1 pair per cpu, more cpu's may get their
own WQ/CQ.
Rework the implementation to use a single WQ/CQ pair by both protocols.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FC-NVMe-2 added support for sequence level error recovery in the FC-NVME
protocol. This allows for the detection of errors and lost frames and
immediate retransmission of data to avoid exchange termination, which
escalates into NVMeoFC connection and association failures. A significant
RAS improvement.
The driver is modified to indicate support for SLER in the NVMe PRLI is
issues and to check for support in the PRLI response. When both sides
support it, the driver will set a bit in the WQE to enable the recovery
behavior on the exchange. The adapter will take care of all detection and
retransmission.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Typical SLI-4 hardware supports up to 2 4KB pages to be registered per XRI
to contain the exchanges Scatter/Gather List. This caps the number of SGL
elements that can be in the SGL. There are not extensions to extend the
list out of the 2 pages.
The G7 hardware adds a SGE type that allows the SGL to be vectored to a
different scatter/gather list segment. And that segment can contain a SGE
to go to another segment and so on. The initial segment must still be
pre-registered for the XRI, but it can be a much smaller amount (256Bytes)
as it can now be dynamically grown. This much smaller allocation can
handle the SG list for most normal I/O, and the dynamic aspect allows it to
support many MB's if needed.
The implementation creates a pool which contains "segments" and which is
initially sized to hold the initial small segment per xri. If an I/O
requires additional segments, they are allocated from the pool. If the
pool has no more segments, the pool is grown based on what is now
needed. After the I/O completes, the additional segments are returned to
the pool for use by other I/Os. Once allocated, the additional segments are
not released under the assumption of "if needed once, it will be needed
again". Pools are kept on a per-hardware queue basis, which is typically
1:1 per cpu, but may be shared by multiple cpus.
The switch to the smaller initial allocation significantly reduces the
memory footprint of the driver (which only grows if large ios are
issued). Based on the several K of XRIs for the adapter, the 8KB->256B
reduction can conserve 32MBs or more.
It has been observed with per-cpu resource pools that allocating a resource
on CPU A, may be put back on CPU B. While the get routines are distributed
evenly, only a limited subset of CPUs may be handling the put routines.
This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because
all the resources are being put on a limited subset of CPUs.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added code to support driver loopback with MDS Diagnostics. This style of
diagnostics passes frames from the fabric to the driver who then echo them
back out the link. SEND_FRAME WQEs are used to transmit the frames. Added
the SOF and EOF field location definitions for use by SEND_FRAME.
Also ensure that enable_mds_diags is a RW parameter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To aid better hardware detection when there are issues, report the first
and second level hardware revisions from the READ_REV command. Add the
elements to the existing hardware id string.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to see real addresses, convert %p with %px for kernel addresses
and replace %p with %pf for functions.
While converting, standardize on "x%px" throughout (not %px or 0x%px).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While performing code review, several relatively simple optimizations can
be done in the fast path.
Add these optimizations (unlikely designators).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Running on Coverity produced the following errors:
- coding style (indentation)
- memset size mismatch errors
note: comment cases where it is purposely a mismatch
Fix the errors.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
modinfo for lpfc_nvme_enable_fb is incorrect. FirstBurst on lpfc target is
not fully supported.
Update the attribute description
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is allowing the user to change lpfc_enable_bg while loading the
driver against a FCoE adapter. This is not supported.
No check is made for the adapter type when applying the blockguard
enablement value.
Fix by verifying the adapter type before setting the enablement flag.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
GetTrunkInfo is displaying an incorrect link speed when the link is a trunk
and the link has gone down. The driver is not clearing the logical speed
as part of the link down transition.
Fix by setting the logical speed to UNKNOWN SPEED when the link goes down.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Max Frame Size value is shown as 34816 in fdmishow from Switch.
The driver uses bbRcvSize in common service param which is obtained from
the READ_SPARM mailbox command. The bbRcvSize field which is displayed is a
three nibble field but the driver is printing a full four nibbles.
Fix by masking off the upper nibble.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The scsi transport fc bsg interface does not expect the bsg_job_done()
callback to be done if the bsg request call returns failure. Several of the
HST_VENDOR cases in the driver unconditionally call bsg_job_done()
regardless of the returning value.
Fix the code to only call bsg_job_done() if the call to lpfc_bsg_request()
will return success.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When forcing the use of MSI (vs MSI-X) the driver is crashing in
pci_irq_get_affinity.
The driver was not using the new pci_alloc_irq_vectors interface in the MSI
path.
Fix by using pci_alloc_irq_vectors() with PCI_RQ_MSI in the MSI path.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is currently reporting a non-zero nvme sg_seg_cnt value of 256
when nvme is disabled. It should be zero.
Fix by ensuring the value is cleared.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an unsolicited ABTS was received, the driver looks up the exchange it
references. It it does various searches looking for the exchange
context. When one is eventually matched and it is associated with an XRI
context, the driver sends an ABORT WQE to terminate the exchange. Current
code looks at whether the transport had taken action on the XRI yet or not
(no action if set to LPFC_NVMET_STE_RCV; action if non-LPFC_NVMET_STE_RCV).
Based on action or not one of two (sol vs unsol) issue abort routines are
called. The unsol version cheats and transmits a sequence containing an
ABTS with no interaction with the adapter. The sol version issues an Abort
WQE and lets the adapter manage whether the ABTS is sent to not.
The issue is the unsol version is sending ABTS unconditionally for the
exchange that received the ABTS. It's unnecessary.
Remove the conditional and just call the adapter command-based routine to
let the adapter manage the ABTS.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As part of firmware download, the adapter is reset. On the adapter the
reset causes the function to stop and all outstanding io is terminated
(without responses). The reset path then starts teardown of the adapter,
starting with deregistration of the remote ports with the nvme-fc
transport. The local port is then deregistered and the driver waits for
local port deregistration. This never finishes.
The remote port deregistrations terminated the nvme controllers, causing
them to send aborts for all the outstanding io. The aborts were serviced in
the driver, but stalled due to its state. The nvme layer then stops to
reclaim it's outstanding io before continuing. The io must be returned
before the reset on the controller is deemed complete and the controller
delete performed. The remote port deregistration won't complete until all
the controllers are terminated. And the local port deregistration won't
complete until all controllers and remote ports are terminated. Thus things
hang.
The issue is the reset which stopped the adapter also stopped all the
responses that would drive i/o completions, and the aborts were also
stopped that stopped i/o completions. The driver, when resetting the
adapter like this, needs to be generating the completions as part of the
adapter reset so that I/O complete (in error), and any aborts are not
queued.
Fix by adding flush routines whenever the adapter port has been reset or
discovered in error. The flush routines will generate the completions for
the scsi and nvme outstanding io. The abort ios, if waiting, will be caught
and flushed as well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This issue is specific to SLI-3 adapters, specifically when DIF is used.
Once seen, this message floods the logs:
9064 BLKGRD: lpfc_scsi_prep_dma_buf_s3: Too many sg segments from
dma_map_sg
The driver, upon detecting an error such as too many elements in an sglist,
misrepresents the error by treating it as a temporary resource issue by
returning MLQUEUE_HOST_BUSY. In these cases, no retry will fix it and it
should have been a hard error. The repeated retry was causing the spamming
of the log.
As for the initial reason of why an I/O encountered this issue at all is
not clear as parameters set by the driver should have avoided this. The
dm multipath maintainer has been notified of the issue.
Fix by changing the return code for the dma mapping routines to indicate
cases that are not retryable and return DID_ERROR on those cases.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the adapter encounters a condition which causes the adapter to fail
(driver must detect the failure) simultaneously to a request to the driver
to reset the adapter (such as a host_reset), the reset path will be racing
with the asynchronously-detect adapter failure path. In the failing
situation, one path has started to tear down the adapter data structures
(io_wq's) while the other path has initiated a repeat of the teardown and
is in the lpfc_sli_flush_xxx_rings path and attempting to access the
just-freed data structures.
Fix by the following:
- In cases where an adapter failure is detected, rather than explicitly
calling offline_eratt() to start the teardown, change the adapter state
and let the later calls of posted work to the slowpath thread invoke the
adapter recovery. In essence, this means all requests to reset are
serialized on the slowpath thread.
- Clean up the routine that restarts the adapter. If there is a failure
from brdreset, don't immediately error and leave things in a partial
state. Instead, ensure the adapter state is set and finish the teardown
of structures before returning.
- If in the scsi host reset handler and the board fails to reset and
restart (which can be due to parallel reset/recovery paths), instead of
hard failing and explicitly calling offline_eratt() (which gets into the
redundant path), just fail out and let the asynchronous path resolve the
adapter state.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During cable pull testing a deadlock was seen between lpfc_nlp_counters()
vs lpfc_mbox_process_link_up() vs lpfc_work_list_done(). They are all
waiting on the shost->host_lock.
Issue is all of these cases raise irq when taking out the lock but use
spin_unlock_irq() when unlocking. The unlock path is will unconditionally
re-enable interrupts in cases where irq state should be preserved. The
re-enablement allowed the other paths to execute which then causes the
deadlock.
Fix by converting the lock/unlock to irqsave/irqrestore.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a test with high nvme remote port counts connected via a multi-hop FC
switch config where switches were systematically reset (e.g. fabric
partitioning and re-establishment), the nvme remote ports would switch
addresses based on the switch reconfiguration events. The driver would get
into a situation where the nvme port changed address, PLOGI and PRLI would
succeed nvme transport registration occurred, but subsequent LS requests by
the nvme subsystem failed due to a bad ndlp state and connectivity to the
device failed.
The driver hit a race condition on multiple devices that address swapped
simultaneously. In cases where the driver notices the remote port structure
came back as the same value as previously (meaning a nvme_rport structure
was re-enabled and did not go through devloss_tmo/connect_tmo_failures on
all controllers) the driver would unconditionally exit assuming the ndlp
information was correct. But, if the ndlp's had been swapped, the ndlp had
stale port state information, which when used by the LS request commands,
would fail the commands.
Fix by checking whether a node swap had occurred, and only exit if no ndlp
swap had occurred.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In situations where zoning is not being used, thus NVMe initiators see
other NVMe initiators as well as NVMe targets, a link bounce on an
initiator will cause the NVMe initiators to spew "6169" State Error
messages.
The driver is not qualifying whether the remote port is a NVMe targer or
not before calling the lpfc_nvme_rescan_port(), which validates the role
and prints the message if its only an NVMe initiator.
Fix by the following:
- Before calling lpfc_nvme_rescan_port() ensure that the node is a NVMe
storage target or a NVMe discovery controller.
- Clean up implementation of lpfc_nvme_rescan_port. remoteport pointer
will always be NULL if a NVMe initiator only. But, grabbing of
remoteport pointer should be done under lock to coincide with the
registering of the remote port with the fc transport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On an SLI-3 adapter which does not support NVMe, but with the driver global
attribute to enable nvme on any adapter if it does support NVMe
(e.g. module parameter lpfc_enable_fc4_type=3), the SGL and total SGE
values are being munged by the protocol enablement when it shouldn't be.
Correct by changing the location of where the NVME sgl information is being
applied, which will avoid any SLI-3-based adapter.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If admin changes the devloss_tmo on an rport via the fc_remote_port rport
dev_loss_tmo attribute, the value is on set on scsi stack. The change is
not propagated to NVMe.
The set routine in the lldd lacks the call to
nvme_fc_set_remoteport_devloss() to set the value.
Fix by adding the call to the lldd set routine.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In tests with remote ports contantly logging out/logging coupled with
occassional local link bounce, if a remote port is disocnnected for longer
than devloss_tmo and then subsequently reconnected, eventually the test
will fail to login with the remote port and remote port connectivity is
lost.
When devloss_tmo expires, the driver does not free the node struct until
the port or npiv instances is being deleted. The node is left allocated but
the state set to UNUSED. If the node was in the process of logging in when
the local link drop occurred, meaning the RPI was allocated for the node in
order to send the ELS, but not yet registered which comes after successful
login, the node is moved to the NPR state, and if devloss expires, to
UNUSED state. If the remote port comes back, the node associated with it
is restarted and this path happens to allocate a new RPI and overwrites the
prior RPI value. In the cases where the port was logged in and loggs out,
the path did release the RPI but did not set the node rpi value. In the
cases where the remote port never finished logging in, the path never did
the call to release the rpi. In this latter case, when the node is
subsequently restore, the new rpi allocation overwrites the rpi that was
not released, and the rpi is now leaked. Eventually the port will run out
of RPI resources to log into new remote ports.
Fix by following changes:
- When an rpi is released, do so under locks and ensure the node rpi value
is set to a non-allocated value (LPFC_RPI_ALLOC_ERROR). Note:
refactored to a small service routine to avoid indentation issues.
- When re-enabling a node, check the rpi value to determine if a new
allocation is necessary. If already set, use the prior rpi.
Enhanced logging to help in the future.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a remote port is removed and remains removed for devloss_tmo, if an RSCN
is subsequently received indicating the presence of the remte port, the
driver does not login to and rediscovery the remote port.
Currently, in order to for a port to be rediscovered post an RSCN, the node
state must be NPR to reflect not logged in. When devloss expires, the node
state is marked UNUSED. When an RSCN occurs, the nodes referenced by the
RSCN will have a NPR_2B_DISC flag set, but the re-login will only be
attempted if the node is in NPR_NODE state. Thus the node is skipped over.
Fix by recognizing the NPR_2B_DISC and UNUSED and transition the node back
to NPR state to allow the re-login to take place.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an admin updates lpfc's devloss_tmo sysfs attribute, the kernel will
oops.
Coding of a loop allowed a new value (rport) to be set/checked for null
followed by an older value (remoteport) checked for null to allow progress
where the new value, even though null, will be referenced.
Rework the logic to validate and prevent any reference to the null ptr.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It's possible for the driver to initiate an FLOGI and before it completes,
another link down/up transition occurs requiring a new FLOGI. Currently,
nothing is done to abort/noop the older FLOGI request to the adapter, so if
this transition occurs and the FLOGI completion is received after the link
down/up transition, the driver may erroneously act on the older FLOGI. In
most cases, the adapter properly terminates/fails the FLOGI, but there is a
timing condition where the FLOGI may complete on the wire prior to the
transition, but the response may not be seen/processed by the driver before
the driver sees the link transition.
Fix by having the link down handler in the driver run through any
outstanding ELS's and change the completion handler of the ELS so that it
will be no-op'd and released.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When tearing down the adapter for a reset, online/offline, or driver
unload, the queue free routine would hit a GPF oops. This only occurs on
conditions where the number of hardware queues created is fewer than the
number of cpus in the system. In this condition cpus share a hardware
queue. And of course, it's the 2nd cpu that shares a hardware that
attempted to free it a second time and hit the oops.
Fix by reworking the cpu to hardware queue mapping such that:
Assignment of hardware queues to cpus occur in two passes:
first pass: is first time assignment of a hardware queue to a cpu.
This will set the LPFC_CPU_FIRST_IRQ flag for the cpu.
second pass: for cpus that did not get a hardware queue they will
be assigned one from a primary cpu (one set in first pass).
Deletion of hardware queues is driven by cpu itteration, and queues
will only be deleted if the LPFC_CPU_FIRST_IRQ flag is set.
Also contains a few small cleanup fixes and a little better logging.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The adapter reset path (lpfc_sli_hba_down) is taking/releasing a lock with
irq. But, the path is already under the hbalock which raised irq so it's
unnecessary.
Convert to simple lock/unlock.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_nvme_register_port hit a null prev_ndlp pointer in a test with lots of
target ports swapping addresses. The oldport value was stale, thus it's
ndlp (prev_ndlp set to it) was used.
Fix by moving oldrport pointer checks, and if used prev_ndlp pointer
assignment, to be done while the lock is held.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is inadvertently trying to issue an INIT_VPI mailbox command on
an SLI-3 driver. The command is specific to SLI-4. When the call is made to
send the command, if on an SLI-3 adapter, an array pointer is NULL and the
driver will oops.
Fix by restricting the command to SLI-4 adapters only.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a target issues an ADISC to the port and the target is a NVME target,
the driver is inadvertantly invalidating the login and marking the remote
port as logged out. Communication with the target is lost.
Revise the ADISC check so that FCP or NVME targets will be marked valid at
the end of ADISC processing. Enhance logging to recognize condition
better.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some remote ports may be slow in registering their GID_FT protocol
information with the fabric. If the remote port is an initiator, it may
send PLOGI to the port before the GID_FT logic is complete. Meaning, after
accepting the PLOGI, when the driver may see no response to the GID_FT that
is issued after the login to determine the protocols supported so that
proper PRLI's may be transmit. If the driver has no fc4 information, it
currently stops and the remote port is not discovered.
Fix by issuing a LOGO when there is no GID_FT information. The LOGO
completion handling will attempt to re-login if the nport_id is still
present.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In cases of remote-port-side cable pull/replug, there happens to be a
target that upon replug will send the port a PLOGI, a PRLI, and a LOGO.
When this sequence is received by the driver, the PLOGI accepted and a
GFT_ID is issued to find the protocol support for the remote port. While
the GFT_ID is outstanding, a LOGO is received. The driver logs the remote
port out and unregisters the RPI and schedules a new PLOGI transmission.
However, the GFT_ID was not terminated. When it completed, the driver
attempted to transition the remote port to PRLI transmission, which cancels
the PLOGI scheduling. The PRLI transmit attempt is rejected by the adapter
as the remote port is not logged in. No retry is attempted as it's expected
the logout is noted and the supposedly scheduled PLOGI should address the
state. As there is no PLOGI, the remote port does not get re-discovered.
Fix by aborting the outstanding GFT_ID if the related remote port is logged
out.
Ensure a PRLI transmit attempt only occurs if the remote port is logging
in. This avoids the incorrect attempt while logged out.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the adapter is reset while there are outstanding ELS's, subsequent
reinitialization of the adapter will fail as it has not recovered all of
the io contexts relative to the ELS's.
If an ELS timed out or otherwise failed and an the ELS was attempted to be
aborted (which changes the ELS completion context), in causes where the
driver generates completions for the outstanding IO as the adapter would
not due to being reset, the driver released only the ELS context and failed
to release the abort context. When the adapter went to reinit, as it had
not received all of the contexts, it failed to reinit.
Fix by having the ELS completion handler identify the driver-generated
completion status and release the abort context.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Unusually high IO latency can be observed with little IO in progress. The
latency may remain high regardless of amount of IO and can only be cleared
by forcing lpfc_fcp_imax values to non-zero and then back to zero.
The driver's eq_delay mechanism that scales the interrupt coalescing based
on io completion load failed to reduce or turn off coalescing when load
decreased. Specifically, if no io completed on a cpu within an eq_delay
polling window, the eq delay processing was skipped and no change was made
to the coalescing values. This left the coalescing values set when they
were no longer applicable.
Fix by always clearing the percpu counters for each time period and always
run the eq_delay calculations if an eq has a non-zero coalescing value.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a timer routine uses workqueues, it could fire before the workqueue is
allocated.
Fix by allocating the workqueue before the timer routines are setup
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After seeing some interoperability issues with ADISC, it was determined the
ELS definitions in lpfc were using types that allowed the compiler to add
pad to the structure, causing the structure to no longer be per spec. The
offending structures are ADISC, FAN, and RNID.
This patch implements the simple fix of eliminating the pad by forcing the
compiler to pack the structure. Care was taken to ensure field accesses
won't be by operations that would hit a bad field alignment.
The better solution would be to convert to the uapi fc header definitions,
but the number of changes required to do is rather intrusive so this course
of action was deferred.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When connected to a high number of remote ports, the driver is encountering
PLOGI errors. The errors are due to adapter detected failures indicating
illegal field values.
Turns out the driver was prematurely clearing an RPI bitmask before waiting
for an UNREG_RPI mailbox completion. This allowed the RPI to be reused
before it was actually available.
Fix by clearing RPI bitmask only after UNREG_RPI mailbox completion.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi-mq operation inherently performs pre-allocation of resources for
blk-mq request queues. Even though the kdump environment reduces the
configuration to a single CPU, thus 1 hardware queue, which helps
significantly, the resources are still rather large due to the per request
allocations. blk-mq pre-allocations can be over 4KB per request. With
adapter can_queue values in the 4k or 8k range, this can easily be 32MBs
before any other driver memory is factored in. Driver SGL DMA buffer
allocation can be up to 8KB per request as well adding an additional
64MB. Totals are well over 100MB for a single shost. Given kdump memory
auto-sizing utilities don't accommodate this amount of memory well, it's
very possible for kdump to fail due to lack of memory.
Fix by having the driver recognize that it is booting within a kdump
context and reduce the number of requests it will support to a more
reasonable value.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As dma_pool_destroy and mempool_destroy functions has NULL check. We may
not need NULL check before calling them.
Fix below warnings reported by coccicheck
./drivers/scsi/lpfc/lpfc_mem.c:252:2-18: WARNING: NULL check before some
freeing functions is not needed.
./drivers/scsi/lpfc/lpfc_mem.c:255:2-18: WARNING: NULL check before some
freeing functions is not needed.
./drivers/scsi/lpfc/lpfc_mem.c:258:2-18: WARNING: NULL check before some
freeing functions is not needed.
./drivers/scsi/lpfc/lpfc_mem.c:261:2-18: WARNING: NULL check before some
freeing functions is not needed.
./drivers/scsi/lpfc/lpfc_mem.c:265:2-18: WARNING: NULL check before some
freeing functions is not needed.
./drivers/scsi/lpfc/lpfc_mem.c:269:2-17: WARNING: NULL check before some
freeing functions is not needed.
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Static structure ufs_hba_qcom_vops, of type ufs_hba_variant_ops, is used
only once, when it is passed as the second argument to function
ufshcd_pltfrm_init(). In the definition of ufshcd_pltfrm_init(), its second
parameter (corresponding to ufs_hba_qcom_vops) is declared as
constant. Hence declare ufs_hba_qcom_vops itself constant as well to
protect it from unintended modification. Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Reviewed-by: Vivek Gautam <vivek.gautam@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When SCSI-MQ is enabled, the SCSI-MQ layers will do pre-allocation of MQ
resources based on shost values set by the driver. In newer cases of the
driver, which attempts to set nr_hw_queues to the cpu count, the
multipliers become excessive, with a single shost having SCSI-MQ
pre-allocation reaching into the multiple GBytes range. NPIV, which
creates additional shosts, only multiply this overhead. On lower-memory
systems, this can exhaust system memory very quickly, resulting in a system
crash or failures in the driver or elsewhere due to low memory conditions.
After testing several scenarios, the situation can be mitigated by limiting
the value set in shost->nr_hw_queues to 4. Although the shost values were
changed, the driver still had per-cpu hardware queues of its own that
allowed parallelization per-cpu. Testing revealed that even with the
smallish number for nr_hw_queues for SCSI-MQ, performance levels remained
near maximum with the within-driver affiinitization.
A module parameter was created to allow the value set for the nr_hw_queues
to be tunable.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As spin_unlock_irq will enable interrupts.
Function lpfc_findnode_rpi is called from
lpfc_sli_abts_err_handler (./drivers/scsi/lpfc/lpfc_sli.c)
<- lpfc_sli_async_event_handler
<- lpfc_sli_process_unsol_iocb
<- lpfc_sli_handle_fast_ring_event
<- lpfc_sli_fp_intr_handler
<- lpfc_sli_intr_handler
and lpfc_sli_intr_handler is an interrupt handler.
Interrupts are enabled in interrupt handler. Use
spin_lock_irqsave/spin_unlock_irqrestore instead of spin_(un)lock_irq in
IRQ context to avoid this.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the redundant initialization code.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable ret is initialized to a value that is never read and it is
re-assigned later and immediately returns. Clean up the code by removing
rc and just returning 0.
[mkp: typo]
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>