Rmmod on SLI-4 adapters is sometimes hitting a bad ptr dereference in
lpfc_els_free_iocb().
A prior patch refactored the lpfc_sli_abort_iocb() routine. One of the
changes was to convert from building/sending an abort within the routine to
using a common routine. The reworked routine passes, without modification,
the pring ptr to the new common routine. The older routine had logic to
check SLI-3 vs SLI-4 and adapt the pring ptr if necessary as callers were
passing SLI-3 pointers even when not on an SLI-4 adapter. The new routine
is missing this check and adapt, so the SLI-3 ring pointers are being used
in SLI-4 paths.
Fix by cleaning up the calling routines. In review, there is no need to
pass the ring ptr argument to abort_iocb at all. The routine can look at
the adapter type itself and reference the proper ring.
Link: https://lore.kernel.org/r/20210412013127.2387-2-jsmart2021@gmail.com
Fixes: db7531d2b3 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c:9654: warning: expecting prototype for lpfc_sli_iocb2wqe(). Prototype was for lpfc_sli4_iocb2wqe() instead
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'phba' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'ring_number' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'piocb' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'flag' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:14189: warning: expecting prototype for lpfc_sli4_sp_process_cq(). Prototype was for __lpfc_sli4_sp_process_cq() instead
drivers/scsi/lpfc/lpfc_sli.c:14754: warning: expecting prototype for lpfc_sli4_hba_process_cq(). Prototype was for lpfc_sli4_dly_hba_process_cq() instead
drivers/scsi/lpfc/lpfc_sli.c:17230: warning: expecting prototype for lpfc_sli4_free_xri(). Prototype was for __lpfc_sli4_free_xri() instead
drivers/scsi/lpfc/lpfc_sli.c:18950: warning: expecting prototype for lpfc_sli4_free_rpi(). Prototype was for __lpfc_sli4_free_rpi() instead
Link: https://lore.kernel.org/r/20210303144631.3175331-18-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-scsi@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets,
update the copyright for 2021.
Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is causing a crash in __lpfc_sli_release_iocbq_s4() when it
dereferences the els_wq which is NULL.
Validate the pring for the els_wq before dereferencing. Reorg the code to
move the pring assignment closer to where it is actually used.
Link: https://lore.kernel.org/r/20210301171821.3427-18-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When connected in pt2pt mode, there is a scenario where the remote port
significantly delays sending a response to our FLOGI, but acts on the FLOGI
it sent us and proceeds to PLOGI/PRLI. The FLOGI ends up timing out and
kicks off recovery logic. End result is a lot of unnecessary state changes
and lots of discovery messages being logged.
Fix by terminating the FLOGI and noop'ing its completion if we have already
accepted the remote ports FLOGI and are now processing PLOGI.
Link: https://lore.kernel.org/r/20210301171821.3427-13-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Whenever an RRQ needs to be triggered, the DID from the node structure and
node pointer are stored in the RRQ data structure and the RRQ is scheduled
for later transmission. However, at the point in time that the timer
triggers, there's no validation on the node pointer. Reference counters may
have freed the structure. Additionally the DID in the node may no longer be
valid.
Fix by not tracking the node pointer in the RRQ, only the DID. At the time
of the timer expiration, look up the node with the did and if present, send
the RRQ. If no node exists, no need to send the RRQ.
Link: https://lore.kernel.org/r/20210301171821.3427-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing recent discovery node rework, several items were seen that
could be done better with respect to the new trace event logic.
1) in the following msg:
kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
If cnt is zero in the 1st message, there is no reason to display the
1st message, which is just giving start/end positioning.
Fix by not displaying message if cnt is 0.
2) If the driver is loaded with module log verbosity off, and later a
single NPIV host instance verbosity is enabled via sysfs, it enables
messages on all instances. This is due to the trace log verbosity checks
(lpfc_dmp_dbg) looking at the phba only. It should look at the phba and
the vport.
Fix by enabling a check on both phba and vport.
3) in the following messages:
2904 Firmware Dump Image Present on Adapter
2887 Reset Needed: Attempting Port Recovery...
These messages are not necessary for the trace event log, which is
primarily for discovery.
Fix by changing log level on these 2 messages to LOG_SLI.
Link: https://lore.kernel.org/r/20210104180240.46824-15-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.
Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware. If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.
The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.
Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When lpfc is running in NVMET mode and supports the NVME-1 addendum
changes, a LIP on a bound NVME Initiator or lipping the lpfc NVMET's link
resulted in an Oops in lpfc_nvmet_host_release.
The fix requires lpfc NVMET to maintain an additional reference on any node
structure that acts as the hosthandle for the NVMET transport. This
reference get is a one-time addition, is taken prior to the upcall of an
unsolicited LS_REQ, and is released when the NVMET transport releases the
hosthandle during the host_release downcall.
Link: https://lore.kernel.org/r/20210104180240.46824-13-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a mailbox command times out, the SLI port is deemed in error and the
port is reset. The HBA cleanup is not returning I/Os to the NVMe layer
before the port is unregistered. This is due to the HBA being marked
offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
rather than an general adapter reset routine. The mailbox timeout handler
mailbox handler only cleaned up SCSI I/Os.
Fix by reworking the mailbox handler to:
- After handling the mailbox error, detect the board is already in
failure (may be due to another error), and leave cleanup to the
other handler.
- If the mailbox command timeout is initial detector of the port error,
continue with the board cleanup and marking the adapter offline
(!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
reset adapter routine that is subsequently invoked, will clean up the
I/Os.
- Have the reset adapter routine flush all NVMe and SCSI I/Os if the
adapter has been marked failed (!SLI_ACTIVE).
- Rework the NVMe I/O terminate routine to take a status code to fail the
I/O with and update so that cleaned up I/O calls the wqe completion
routine. Currently it is bypassing the wqe cleanup and calling the NVMe
I/O completion directly. The wqe completion routine will take care of
data structure and node cleanup then call the NVMe I/O completion
handler.
Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A very long time ago, there was a feature: auto sli mode. It gave the user
the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in,
or even force SLI2 mode if configured. Because of the convoluted logic,
the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should
have been called only once. Additionally, the driver no longer supports
SLI-2, so only SLI-3 mode should be allowed.
The following changes were made:
- Force module parameter to SLI3 only.
- Rip out redundant CONFIG_PORT mbox commands.
- Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.
- Added changes for offline to online behavior
Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove vport variable that is assigned but not used in
lpfc_sli4_abts_err_handler().
Link: https://lore.kernel.org/r/20201119203407.121913-1-james.smart@broadcom.com
Fixes: e7dab164a9 ("scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch reworks the abort interfaces such that SLI-3 retains the
iocb-based formatting and completions and SLI-4 now uses native WQEs and
completion routines.
The following changes are made:
- The code is refactored from a confusing 2 routine sequence of
xx_abort_iotag_issue(), which creates/formats and abort cmd, and
xx_issue_abort_tag(), which then issues and handles the completion of
the abort cmd - into a single interface of xx_issue_abort_iotag(). The
new interface will determine whether SLI-3 or SLI-4 and then call the
appropriate handler. A completion handler can now be specified to
address the differences in completion handling. Note: original code is
all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path,
and NVMe natively using wqes.
- The SLI-3 side is refactored:
The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined
with the logic of lpfc_sli_abort_iotag_issue() as well as the
iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to
create the new single SLI-3 abort routine that formats and issues the
iocb.
- The SLI-4 side is refactored and added to:
The native WQE abort code in NVMe is moved to the new SLI-4
issue_abort_iotag() routine. Items in SCSI that set fields not set by
NVMe is migrated into the new routine. Thus the routine supports NVMe
and SCSI initiators. The nvmet block (target) formats the abort slightly
different (like the old NVMe initiator) thus it has its own prep routine
stolen from NVMe initiator and it retains the current code it has for
issuing the WQE (does not use the commonized routine the initiators
do). SLI-4 completion handlers were also added.
- lpfc_abort_handler now becomes a wrapper that determines whether
SLI-3 or SLI-4 and calls the proper abort handler.
Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current driver implementation uses SLI-4 WQE to iocb conversion before
calling the cmpl callback function.
Rework the FCP I/O completion path to utilize the SLI-4 WQE.
This patch converts the SCSI I/O completion paths from the iocb-centric
interfaces to the routines are native for whether I/Os are iocb-based
(SLI-3) or WQE-based (SLI-4).
Most existing routines were iocb-based, so this creates a lot of SLI-4
specific routines to provide the functionality.
Link: https://lore.kernel.org/r/20201115192646.12977-15-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch converts the SCSI I/O path from the iocb-centric interfaces to
the common I/O submission path which supports native SLI-4 WQEs.
A wrapper routine is put in place to distinguish SLI-3 from SLI. If SLI-3,
the same iocb-centric paths are used, perhaps with refactored code that is
explicitly for SLI-3. For SLI-4, any iocb-related formatting is replaced
by wqe-based formatting, although much of that is addressed by the common
wqe templates in the SLI-4 path.
Link: https://lore.kernel.org/r/20201115192646.12977-14-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To set up common use by the SCSI and NVMe I/O paths, create a new routine
that issues FCP I/O commands which can be used by either protocol. The new
routine addresses SLI-3 vs SLI-4 differences within its implementation.
Replace the (SLI-3 centric) iocb routine in the SCSI path with this new
WQE-centric common routine.
Link: https://lore.kernel.org/r/20201115192646.12977-13-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is currently using SLI-4 WQE templates only for NVMe. Refactor
the template and the placement of the service routine so that it can be
used by both SCSI and NVMe.
Link: https://lore.kernel.org/r/20201115192646.12977-12-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing initiator-side cable swaps with NPIV, oops occur. The
reference counts for the Fabric nodes on the NPIV vports isn't balanced,
resulting in premature node removal.
The following fixes were made:
- Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
case for vports that didn't have them clean up just like the physical
port.
- Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
node is being removed in the context of a reference count release and a
mailbox command can't be issued at this point.
- Remove special case handling in the default mailbox completion handler
that allowed the skipping of a node reference. Now, reference counting
always requires the removal of the reference.
- Move the location of the DEVICE_RM event is done during LOGO handling as
the driver has additional work to do on the ndlp before puts/releases
can be performed.
Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.
Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots were some slightly reorganized routines worked better.
Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Due to bug history and code review, the node reference counting approach in
the driver isn't implemented consistently with how the scsi and nvme
transport perform registrations and unregistrations and their callbacks.
This resulted in many bad/stale node pointers.
Reword the driver so that reference handling is performed as follows:
- The initial node reference is taken on structure allocation
- Take a reference on any add/register call to the transport
- Remove a reference on any delete/unregister call to the transport
- After the node has fully removed from both the SCSI and NVMEe transports
(dev_loss_callbacks have called back) call the discovery engine
DEVICE_RM event which will remove the final reference and release the
node structure.
- Alter dev_loss handling when a vport or base port is unloading.
- Remove the put_node handling - no longer needed.
- Rewrite the vport_delete handling on reference counts. Part of this
effort was driven from the FDISC not registering with the transport and
disrupting the model for node reference counting.
- Deleted lpfc_nlp_remove. Pushed it's remaining ops into
lpfc_nlp_release.
- Several other small code cleanups.
Link: https://lore.kernel.org/r/20201115192646.12977-5-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now that the driver has gone to a normal ref interface (with no odd logic)
the discovery logic needs to be updated to reworked so that it properly
takes references when it should and give them up when it should.
Rework the driver for the following get/put model:
- Move gets to just before an I/O is issued. Add gets for places where an
I/O was issued without one.
- Ensure that failures from lpfc_nlp_get() are handled by the driver.
- Check and fix the placement of lpfc_nlp_puts relative to io completions.
Note: some of these paths may not release the reference on the exact io
completion as the reference is held as the code takes another step in
the discovery thread and which may cause another io to be issued.
- Rearrange some code for error processing and calling lpfc_nlp_put.
- Fix some places of incorrect reference freeing that was causing the
premature releasing of the structure.
- Nvmet plogi handling performs unreg_rpi's. The reference counts were
unbalanced resulting in premature node removal. In some cases this
caused loss of node discovery. Corrected the reftaking around nvmet
plogis.
Nodes that experience devloss now get released from the node list now that
there is a proper reference taking.
Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a remote port is disconnected and disappears, its node structure
(ndlp) stays allocated and on a vport node list. While on the list it can
be matched, thus requires validation checks on state to be added in
numerous code paths. If the node comes back, its possible for there to be
multiple node structures for the same device on the vport node list. There
is no reason to keep the node structure around after it is no longer in
existence, and the current implementation creates problems for itself
(multiple nodes) and lots of unnecessary code for state validation.
Additionally, the reference taking on the node structure didn't follow the
normal model used by the kernel kref api. It included lots of odd logic to
match state with reference count. The combination of this odd logic plus
the way it was implicitly used in the discovery engine made its reference
taking implementation suspect and extremely hard to follow.
Change the driver such that the reference taking routines are now normal
ref increments/decrements and callout on refcount=0.
With this in place, the rework can be done such that the node structure is
fully removed and deallocated when the remote port no longer exists and all
references are removed. This removal logic, and the basic ref counting are
intrically tied, thus in a single patch.
Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During code reviews duplicate code sections were found to determine the WQ
Create version. The duplication was potentially overriding logic that
validated page size.
Link: https://lore.kernel.org/r/20201020202719.54726-6-james.smart@broadcom.com
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following call trace was seen during HBA reset testing:
BUG: scheduling while atomic: swapper/2/0/0x10000100
...
Call Trace:
dump_stack+0x19/0x1b
__schedule_bug+0x64/0x72
__schedule+0x782/0x840
__cond_resched+0x26/0x30
_cond_resched+0x3a/0x50
mempool_alloc+0xa0/0x170
lpfc_unreg_rpi+0x151/0x630 [lpfc]
lpfc_sli_abts_recover_port+0x171/0x190 [lpfc]
lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc]
lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc]
lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc]
__lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc]
__lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc]
lpfc_cq_poll_hdler+0x1a/0x30 [lpfc]
irq_poll_softirq+0xc7/0x100
__do_softirq+0xf5/0x280
call_softirq+0x1c/0x30
do_softirq+0x65/0xa0
irq_exit+0x105/0x110
do_IRQ+0x56/0xf0
common_interrupt+0x16a/0x16a
With the conversion to blk_io_poll for better interrupt latency in normal
cases, it introduced this code path, executed when I/O aborts or logouts
are seen, which attempts to allocate memory for a mailbox command to be
issued. The allocation is GFP_KERNEL, thus it could attempt to sleep.
Fix by creating a work element that performs the event handling for the
remote port. This will have the mailbox commands and other items performed
in the work element, not the irq. A much better method as the "irq" routine
does not stall while performing all this deep handling code.
Ensure that allocation failures are handled and send LOGO on failure.
Additionally, enlarge the mailbox memory pool to reduce the possibility of
additional allocation in this path.
Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com
Fixes: 317aeb83c9 ("scsi: lpfc: Add blk_io_poll support for latency improvment")
Cc: <stable@vger.kernel.org> # v5.9+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following calltrace was seen:
BUG: sleeping function called from invalid context at mm/slab.h:494
...
Call Trace:
dump_stack+0x9a/0xf0
___might_sleep.cold.63+0x13d/0x178
slab_pre_alloc_hook+0x6a/0x90
kmem_cache_alloc_trace+0x3a/0x2d0
lpfc_sli4_nvmet_alloc+0x4c/0x280 [lpfc]
lpfc_post_rq_buffer+0x2e7/0xa60 [lpfc]
lpfc_sli4_hba_setup+0x6b4c/0xa4b0 [lpfc]
lpfc_pci_probe_one_s4.isra.15+0x14f8/0x2280 [lpfc]
lpfc_pci_probe_one+0x260/0x2880 [lpfc]
local_pci_probe+0xd4/0x180
work_for_cpu_fn+0x51/0xa0
process_one_work+0x8f0/0x17b0
worker_thread+0x536/0xb50
kthread+0x30c/0x3d0
ret_from_fork+0x3a/0x50
A prior patch introduced a spin_lock_irqsave(hbalock) in the
lpfc_post_rq_buffer() routine. Call trace is seen as the hbalock is held
with interrupts disabled during a GFP_KERNEL allocation in
lpfc_sli4_nvmet_alloc().
Fix by reordering locking so that hbalock not held when calling
sli4_nvmet_alloc() (aka rqb_buf_list()).
Link: https://lore.kernel.org/r/20201020202719.54726-2-james.smart@broadcom.com
Fixes: 411de511c6 ("scsi: lpfc: Fix RQ empty firmware trap")
Cc: <stable@vger.kernel.org> # v4.17+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While mds diagnostic tests are running, if the driver is requested to be
unloaded, oops or hangs are observed. The driver doesn't terminate the
processing of diag frames when the unload is started. As such: oops may be
seen for __lpfc_sli_release_iocbq_s4 because ring memory is referenced that
was already freed; or hangs see in lpfc_nvme_wait_for_io_drain as ios no
longer complete.
If unloading, don't process diag frames. Just clean them up.
Link: https://lore.kernel.org/r/20200803210229.23063-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
'pg_addr' is only used when CONFIG_X86 is defined. So only declare it if
CONFIG_X86 is defined.
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c: In function ‘lpfc_wq_create’:
drivers/scsi/lpfc/lpfc_sli.c:15813:16: warning: unused variable ‘pg_addr’ [-Wunused-variable]
15813 | unsigned long pg_addr;
| ^~~~~~~
Link: https://lore.kernel.org/r/20200721164148.2617584-37-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c:257: warning: Function parameter or member 'mqe' not described in 'lpfc_sli4_mq_put'
drivers/scsi/lpfc/lpfc_sli.c:257: warning: Excess function parameter 'wqe' description in 'lpfc_sli4_mq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'hq' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'dq' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'hrqe' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Function parameter or member 'drqe' not described in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Excess function parameter 'q' description in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:675: warning: Excess function parameter 'wqe' description in 'lpfc_sli4_rq_put'
drivers/scsi/lpfc/lpfc_sli.c:738: warning: Function parameter or member 'hq' not described in 'lpfc_sli4_rq_release'
drivers/scsi/lpfc/lpfc_sli.c:738: warning: Function parameter or member 'dq' not described in 'lpfc_sli4_rq_release'
drivers/scsi/lpfc/lpfc_sli.c:738: warning: Excess function parameter 'q' description in 'lpfc_sli4_rq_release'
drivers/scsi/lpfc/lpfc_sli.c:1021: warning: Function parameter or member 'xritag' not described in 'lpfc_test_rrq_active'
drivers/scsi/lpfc/lpfc_sli.c:1132: warning: Function parameter or member 'piocbq' not described in '__lpfc_sli_get_els_sglq'
drivers/scsi/lpfc/lpfc_sli.c:1132: warning: Excess function parameter 'piocb' description in '__lpfc_sli_get_els_sglq'
drivers/scsi/lpfc/lpfc_sli.c:1207: warning: Function parameter or member 'piocbq' not described in '__lpfc_sli_get_nvmet_sglq'
drivers/scsi/lpfc/lpfc_sli.c:1207: warning: Excess function parameter 'piocb' description in '__lpfc_sli_get_nvmet_sglq'
drivers/scsi/lpfc/lpfc_sli.c:2243: warning: Function parameter or member 'rb_list' not described in 'lpfc_sli_hbqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2243: warning: Excess function parameter 'phba' description in 'lpfc_sli_hbqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2243: warning: Excess function parameter 'hbqno' description in 'lpfc_sli_hbqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2262: warning: Function parameter or member 'hrq' not described in 'lpfc_sli_rqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:2262: warning: Excess function parameter 'hbqno' description in 'lpfc_sli_rqbuf_get'
drivers/scsi/lpfc/lpfc_sli.c:3429: warning: Function parameter or member 't' not described in 'lpfc_poll_eratt'
drivers/scsi/lpfc/lpfc_sli.c:3429: warning: Excess function parameter 'ptr' description in 'lpfc_poll_eratt'
drivers/scsi/lpfc/lpfc_sli.c:4115: warning: Excess function parameter 'pring' description in 'lpfc_sli_abort_fcp_rings'
drivers/scsi/lpfc/lpfc_sli.c:5331: warning: Excess function parameter 'mboxq' description in 'lpfc_sli4_read_fcoe_params'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'extnt_cnt' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'type' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'emb' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:5879: warning: Function parameter or member 'mbox' not described in 'lpfc_sli4_cfg_post_extnts'
drivers/scsi/lpfc/lpfc_sli.c:6459: warning: Function parameter or member 'pmb' not described in 'lpfc_sli4_ras_mbox_cmpl'
drivers/scsi/lpfc/lpfc_sli.c:6459: warning: Excess function parameter 'pmboxq' description in 'lpfc_sli4_ras_mbox_cmpl'
drivers/scsi/lpfc/lpfc_sli.c:6912: warning: Function parameter or member 'extnt_cnt' not described in 'lpfc_sli4_get_allocated_extnts'
drivers/scsi/lpfc/lpfc_sli.c:6912: warning: Excess function parameter 'extnt_count' description in 'lpfc_sli4_get_allocated_extnts'
drivers/scsi/lpfc/lpfc_sli.c:7064: warning: Excess function parameter 'pring' description in 'lpfc_sli4_repost_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:7312: warning: Function parameter or member 'phba' not described in 'lpfc_init_idle_stat_hb'
drivers/scsi/lpfc/lpfc_sli.c:8022: warning: Function parameter or member 't' not described in 'lpfc_mbox_timeout'
drivers/scsi/lpfc/lpfc_sli.c:8022: warning: Excess function parameter 'ptr' description in 'lpfc_mbox_timeout'
drivers/scsi/lpfc/lpfc_sli.c:8902: warning: Function parameter or member 'mboxq' not described in 'lpfc_sli_issue_mbox_s4'
drivers/scsi/lpfc/lpfc_sli.c:8902: warning: Excess function parameter 'pmbox' description in 'lpfc_sli_issue_mbox_s4'
drivers/scsi/lpfc/lpfc_sli.c:9413: warning: Function parameter or member 'piocbq' not described in 'lpfc_sli4_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:9413: warning: Excess function parameter 'piocb' description in 'lpfc_sli4_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:9518: warning: Function parameter or member 'iocbq' not described in 'lpfc_sli4_iocb2wqe'
drivers/scsi/lpfc/lpfc_sli.c:9518: warning: Excess function parameter 'piocb' description in 'lpfc_sli4_iocb2wqe'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'phba' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'ring_number' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'piocb' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10212: warning: Function parameter or member 'flag' not described in '__lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10300: warning: Function parameter or member 'ring_number' not described in 'lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:10300: warning: Excess function parameter 'pring' description in 'lpfc_sli_issue_iocb'
drivers/scsi/lpfc/lpfc_sli.c:11807: warning: Function parameter or member 'cmd' not described in 'lpfc_sli_abort_taskmgmt'
drivers/scsi/lpfc/lpfc_sli.c:11807: warning: Excess function parameter 'taskmgmt_cmd' description in 'lpfc_sli_abort_taskmgmt'
drivers/scsi/lpfc/lpfc_sli.c:12067: warning: Function parameter or member 'ring_number' not described in 'lpfc_sli_issue_iocb_wait'
drivers/scsi/lpfc/lpfc_sli.c:12067: warning: Excess function parameter 'pring' description in 'lpfc_sli_issue_iocb_wait'
drivers/scsi/lpfc/lpfc_sli.c:12262: warning: Function parameter or member 'mbx_action' not described in 'lpfc_sli_mbox_sys_shutdown'
drivers/scsi/lpfc/lpfc_sli.c:13219: warning: Function parameter or member 'irspiocbq' not described in 'lpfc_sli4_els_wcqe_to_rspiocbq'
drivers/scsi/lpfc/lpfc_sli.c:13219: warning: Excess function parameter 'wcqe' description in 'lpfc_sli4_els_wcqe_to_rspiocbq'
drivers/scsi/lpfc/lpfc_sli.c:13285: warning: Function parameter or member 'mcqe' not described in 'lpfc_sli4_sp_handle_async_event'
drivers/scsi/lpfc/lpfc_sli.c:13285: warning: Excess function parameter 'cqe' description in 'lpfc_sli4_sp_handle_async_event'
drivers/scsi/lpfc/lpfc_sli.c:13318: warning: Function parameter or member 'mcqe' not described in 'lpfc_sli4_sp_handle_mbox_event'
drivers/scsi/lpfc/lpfc_sli.c:13318: warning: Excess function parameter 'cqe' description in 'lpfc_sli4_sp_handle_mbox_event'
drivers/scsi/lpfc/lpfc_sli.c:13441: warning: Function parameter or member 'cq' not described in 'lpfc_sli4_sp_handle_mcqe'
drivers/scsi/lpfc/lpfc_sli.c:13768: warning: Function parameter or member 'speq' not described in 'lpfc_sli4_sp_handle_eqe'
drivers/scsi/lpfc/lpfc_sli.c:14126: warning: Function parameter or member 'cq' not described in 'lpfc_sli4_nvmet_handle_rcqe'
drivers/scsi/lpfc/lpfc_sli.c:14235: warning: Function parameter or member 'cqe' not described in 'lpfc_sli4_fp_handle_cqe'
drivers/scsi/lpfc/lpfc_sli.c:14235: warning: Excess function parameter 'eqe' description in 'lpfc_sli4_fp_handle_cqe'
drivers/scsi/lpfc/lpfc_sli.c:14336: warning: Function parameter or member 'eq' not described in 'lpfc_sli4_hba_handle_eqe'
drivers/scsi/lpfc/lpfc_sli.c:14808: warning: Function parameter or member 'entry_count' not described in 'lpfc_sli4_queue_alloc'
drivers/scsi/lpfc/lpfc_sli.c:15185: warning: Function parameter or member 'type' not described in 'lpfc_cq_create'
drivers/scsi/lpfc/lpfc_sli.c:15185: warning: Function parameter or member 'subtype' not described in 'lpfc_cq_create'
drivers/scsi/lpfc/lpfc_sli.c:15333: warning: Function parameter or member 'type' not described in 'lpfc_cq_create_set'
drivers/scsi/lpfc/lpfc_sli.c:15333: warning: Function parameter or member 'subtype' not described in 'lpfc_cq_create_set'
drivers/scsi/lpfc/lpfc_sli.c:16063: warning: Function parameter or member 'subtype' not described in 'lpfc_rq_create'
drivers/scsi/lpfc/lpfc_sli.c:16353: warning: Function parameter or member 'subtype' not described in 'lpfc_mrq_create'
drivers/scsi/lpfc/lpfc_sli.c:16533: warning: Function parameter or member 'phba' not described in 'lpfc_eq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16590: warning: Function parameter or member 'phba' not described in 'lpfc_cq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16644: warning: Function parameter or member 'phba' not described in 'lpfc_mq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16644: warning: Function parameter or member 'mq' not described in 'lpfc_mq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16644: warning: Excess function parameter 'qm' description in 'lpfc_mq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16698: warning: Function parameter or member 'phba' not described in 'lpfc_wq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Function parameter or member 'phba' not described in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Function parameter or member 'hrq' not described in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Function parameter or member 'drq' not described in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16754: warning: Excess function parameter 'rq' description in 'lpfc_rq_destroy'
drivers/scsi/lpfc/lpfc_sli.c:16940: warning: Function parameter or member 'xri' not described in '__lpfc_sli4_free_xri'
drivers/scsi/lpfc/lpfc_sli.c:16955: warning: Function parameter or member 'xri' not described in 'lpfc_sli4_free_xri'
drivers/scsi/lpfc/lpfc_sli.c:17002: warning: Function parameter or member 'post_cnt' not described in 'lpfc_sli4_post_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:17002: warning: Excess function parameter 'count' description in 'lpfc_sli4_post_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:17221: warning: Function parameter or member 'sb_count' not described in 'lpfc_sli4_post_io_sgl_list'
drivers/scsi/lpfc/lpfc_sli.c:17451: warning: Function parameter or member 'did' not described in 'lpfc_fc_frame_to_vport'
drivers/scsi/lpfc/lpfc_sli.c:17590: warning: Function parameter or member 'vport' not described in 'lpfc_fc_frame_add'
drivers/scsi/lpfc/lpfc_sli.c:17817: warning: Function parameter or member 'vport' not described in 'lpfc_sli4_seq_abort_rsp'
drivers/scsi/lpfc/lpfc_sli.c:17817: warning: Function parameter or member 'aborted' not described in 'lpfc_sli4_seq_abort_rsp'
drivers/scsi/lpfc/lpfc_sli.c:17817: warning: Excess function parameter 'phba' description in 'lpfc_sli4_seq_abort_rsp'
drivers/scsi/lpfc/lpfc_sli.c:18060: warning: Function parameter or member 'seq_dmabuf' not described in 'lpfc_prep_seq'
drivers/scsi/lpfc/lpfc_sli.c:18060: warning: Excess function parameter 'dmabuf' description in 'lpfc_prep_seq'
drivers/scsi/lpfc/lpfc_sli.c:18332: warning: Function parameter or member 'dmabuf' not described in 'lpfc_sli4_handle_received_buffer'
drivers/scsi/lpfc/lpfc_sli.c:18655: warning: Function parameter or member 'rpi' not described in '__lpfc_sli4_free_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18683: warning: Function parameter or member 'rpi' not described in 'lpfc_sli4_free_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Function parameter or member 'ndlp' not described in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Function parameter or member 'cmpl' not described in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Function parameter or member 'arg' not described in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:18714: warning: Excess function parameter 'phba' description in 'lpfc_sli4_resume_rpi'
drivers/scsi/lpfc/lpfc_sli.c:19103: warning: Function parameter or member 'phba' not described in 'lpfc_check_next_fcf_pri_level'
drivers/scsi/lpfc/lpfc_sli.c:19266: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_rr_index_set'
drivers/scsi/lpfc/lpfc_sli.c:19295: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_rr_index_clear'
drivers/scsi/lpfc/lpfc_sli.c:19331: warning: Function parameter or member 'mbox' not described in 'lpfc_mbx_cmpl_redisc_fcf_table'
drivers/scsi/lpfc/lpfc_sli.c:20027: warning: Function parameter or member 'pwqeq' not described in 'lpfc_wqe_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:20027: warning: Excess function parameter 'pwqe' description in 'lpfc_wqe_bpl2sgl'
drivers/scsi/lpfc/lpfc_sli.c:20141: warning: Function parameter or member 'qp' not described in 'lpfc_sli4_issue_wqe'
drivers/scsi/lpfc/lpfc_sli.c:20141: warning: Excess function parameter 'ring_number' description in 'lpfc_sli4_issue_wqe'
drivers/scsi/lpfc/lpfc_sli.c:20434: warning: Function parameter or member 'qp' not described in '_lpfc_move_xri_pbl_to_pvt'
drivers/scsi/lpfc/lpfc_sli.c:20552: warning: Function parameter or member 'hwqid' not described in 'lpfc_keep_pvt_pool_above_lowwm'
drivers/scsi/lpfc/lpfc_sli.c:20552: warning: Excess function parameter 'qp' description in 'lpfc_keep_pvt_pool_above_lowwm'
drivers/scsi/lpfc/lpfc_sli.c:20682: warning: Function parameter or member 'qp' not described in 'lpfc_get_io_buf_from_private_pool'
Link: https://lore.kernel.org/r/20200721164148.2617584-24-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_sli.c: In function ‘lpfc_wq_create’:
drivers/scsi/lpfc/lpfc_sli.c:15810:16: warning: variable ‘pg_addr’ set but not used [-Wunused-but-set-variable]
15810 | unsigned long pg_addr;
| ^~~~~~~
Link: https://lore.kernel.org/r/20200721164148.2617584-21-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix smatch warning:
drivers/scsi/lpfc/lpfc_sli.c:15156 lpfc_cq_poll_hdler() warn:
inconsistent indenting
Link: https://lore.kernel.org/r/20200707150018.823350-1-colin.king@canonical.com
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current logging methods typically end up requesting a reproduction with
a different logging level set to figure out what happened. This was mainly
by design to not clutter the kernel log messages with things that were
typically not interesting and the messages themselves could cause other
issues.
When looking to make a better system, it was seen that in many cases when
more data was wanted was when another message, usually at KERN_ERR level,
was logged. And in most cases, what the additional logging that was then
enabled was typically. Most of these areas fell into the discovery machine.
Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes). The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log. A new logging
level is defined - LOG_TRACE_EVENT. When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.
There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.
Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although the existing implementation is very good at high I/O load, on
tests involving light load, especially on only a few hardware queues,
latency was a little higher than it can be due to using workqueue
scheduling. Other tasks in the system can delay handling.
Change the lower level to use irq_poll by default which uses a softirq for
I/O completion. This gives better latency as variance in when the cq is
processed is reduced over the workqueue interface. However, as high load is
better served by not being in softirq when the CPU is loaded, work queues
are still used under high I/O load.
Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, if there has been an issue whereby an adapter dump was taken,
there is nothing displayed to hint that it is present. Utilities must be
run and they must query for the status in order to then download the dump.
Add a message to the driver to query dump image presence when initializing
the SLI Port.
Link: https://lore.kernel.org/r/20200630215001.70793-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change vocabulary of 0373 log msg from "error" to "cmpl" The current
language of the 0373 message contains the word "error" which caused a
number of customers to inquire about the "error" and if it should be a
concern. It isn't an error, it's simply an io completion status.
Revise the message to replace the word "error" with "cmpl" for completion.
Link: https://lore.kernel.org/r/20200630215001.70793-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the kdump kernel shuts down lpfc calls flush_work_queue on an
interrupt to schedule the cq handler. When there is only one CPU active on
the kdump kernel, it is possible for the work_on to get scheduled on a
non-active CPU causing it to never be scheduled.
When in the kdump environment, per-CPU affinity of cq's to cpus is not
necessary. In those cases, use a general queue_work rather than a
queue_work_on().
Link: https://lore.kernel.org/r/20200630215001.70793-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Call traces have been observed running different tests that involve aborts
and setting the rrq active flag. The lpfc_set_rrq_active routine is doing
a mempool_alloc under the soft_irq processing level. When the mempool needs
to get a new buffer from the free pool and has to wait for memory to become
free it will check the flags passed in on the alloc and dump the stack if
the thread is running in interrupt context.
Replace the GFP_KERNEL flag with GFP_ATOMIC so that the memory allocation
will not attempt to sleep if there is no mem available.
Link: https://lore.kernel.org/r/20200630215001.70793-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When using DUMP on SLI3 to read VPD and Port status data (config region
23), the adapter is overruning the kmalloc'd buffer causing havoc on other
consumers of the allocation pools.
Rework the loops processing the dump data and validate/size memory lengths
before performing bcopy.
Link: https://lore.kernel.org/r/20200630215001.70793-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Visual code inspection of the MDS implementation revealed two errors in
the driver:
- The set features Feature Code had an incorrect value
- The routine that classifies command type for cmd completions was missing
the Send Frame definition. Send Frame is used for MDS driver loopback.
Link: https://lore.kernel.org/r/20200630215001.70793-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates. There are no major core changes in this
series apart from a refactoring in scsi_lib.c.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXtq5QyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXyGAQCipTWx
7kHKHZBCVTU133bADt3+SstLrAm8PKZEXMnP9wEAzu4QkkW8URxEDRrpu7qk5gbA
9M/KyqvfRtTH7+BSK7M=
=J6aO
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
:This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates.
There are no major core changes in this series apart from a
refactoring in scsi_lib.c"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes
scsi: cxgb3i: Fix some leaks in init_act_open()
scsi: ibmvscsi: Make some functions static
scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
scsi: ufs: Fix WriteBooster flush during runtime suspend
scsi: ufs: Fix index of attributes query for WriteBooster feature
scsi: ufs: Allow WriteBooster on UFS 2.2 devices
scsi: ufs: Remove unnecessary memset for dev_info
scsi: ufs-qcom: Fix scheduling while atomic issue
scsi: mpt3sas: Fix reply queue count in non RDPQ mode
scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
scsi: vhost: Notify TCM about the maximum sg entries supported per command
scsi: qla2xxx: Remove return value from qla_nvme_ls()
scsi: qla2xxx: Remove an unused function
scsi: iscsi: Register sysfs for iscsi workqueue
scsi: scsi_debug: Parser tables and code interaction
scsi: core: Refactor scsi_mq_setup_tags function
scsi: core: Fix incorrect usage of shost_for_each_device
scsi: qla2xxx: Fix endianness annotations in source files
...
The axchg structure is a structure allocated early in the
lpfc_nvme_unsol_ls_handler() to represent the newly received exchange.
Upon error, the out_fail path in the routine unconditionally frees the
pointer, yet subsequently passes the pointer to the abort routine.
Additionally, the abort routine, lpfc_nvme_unsol_ls_issue_abort(), also
has a failure path that will attempt to delete the pointer on error.
Fix these errors by:
- Removing the unconditional free so that it stays valid if passed
to the abort routine.
- Revise the abort routine to not free the pointer. Instead, return
a success/failure status. Note: if success, the later completion of
the abort frees the structure.
- Back in the unsol_ls_handler() error path, if the abort routine was
skipped (thus no possible reference) or the abort routine returned
error, free the pointer.
Fixes: 3a8070c567 ("lpfc: Refactor NVME LS receive handling")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In preparation for supporting both intiator mode and target mode
receiving NVME LS's, commonize the existing NVME LS request receive
handling found in the base driver and in the nvmet side.
Using the original lpfc_nvmet_unsol_ls_event() and
lpfc_nvme_unsol_ls_buffer() routines as a templates, commonize the
reception of an NVME LS request. The common routine will validate the LS
request, that it was received from a logged-in node, and allocate a
lpfc_async_xchg_ctx that is used to manage the LS request. The role of
the port is then inspected to determine which handler is to receive the
LS - nvme or nvmet. As such, the nvmet handler is tied back in. A handler
is created in nvme and is stubbed out.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
To support FC-NVME-2 support (actually FC-NVME (rev 1) with Ammendment 1),
both the nvme (host) and nvmet (controller/target) sides will need to be
able to receive LS requests. Currently, this support is in the nvmet side
only. To prepare for both sides supporting LS receive, rename
lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A lot of files in lpfc include nvme headers, building up relationships that
require a file to change for its headers when there is no other change
necessary. It would be better to localize the nvme headers.
There is also no need for separate nvme (initiator) and nvmet (tgt)
header files.
Refactor the inclusion of nvme headers so that all nvme items are
included by lpfc_nvme.h
Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used
by both the nvme and nvmet sides. This prepares for structure sharing
between the two roles. Prep to add shared function prototypes for upcoming
shared routines.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Running make C=1 M=drivers/scsi/lpfc triggers sparse warnings
Correct the code generating the following errors:
- Incompatible address space assignment without proper conversion.
- Deference of usespace and per-cpu pointers.
Link: https://lore.kernel.org/r/20200501214310.91713-8-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In an audit of lockdep calls in the driver, there are multiple lockdep
checks in successive calling layers. E.g. a routine checks, and then calls
a lower routine that also checks, and so on. Calling sequences result in
many redundant checks.
Refine the code to remove lower-level lockdep checks. Update comments on
the lock, correcting a few places where lock object in comment was
incorrect.
Link: https://lore.kernel.org/r/20200501214310.91713-7-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A previous change introduced the atomic use of queue_claimed flag for eq's
and cq's. The code works fine, but the clearing of the queue_claimed flag
is not atomic.
Change queue_claimed = 0 into xchg(&queue_claimed, 0) to be consistent for
change under atomicity.
Link: https://lore.kernel.org/r/20200501214310.91713-3-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During code review, identified dss feature that was a prototype only and
was never productized in SLI3. They shouldn't be there and prevents reuse
of the command areas.
Remove any code in the driver to deal with dss, including code to deal with
fips, which is associated with the dss feature.
Link: https://lore.kernel.org/r/20200322181304.37655-12-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>