On devices that support FCP sequence error recovery, which attempts to
preserve the devices login across link bounce, adisc is used for device
validation. Turns out the device fc4 type is cleared as part of the link
bounce, but the ADISC handling doesn't restore the FC4 support as it
normally would with a PRLI. This caused situations where the device wasn't
reregistered with the transport thus scan logic and LUN discovery never
kicked in.
In the ADISC completion handling, reset the fc4 type so that transport port
reregistration occurs with the remote port.
Link: https://lore.kernel.org/r/20200803210229.23063-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_nportdisc.c:1079: warning: Function parameter or member 'ndlp' not described in 'lpfc_release_rpi'
Link: https://lore.kernel.org/r/20200723122446.1329773-20-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current logging methods typically end up requesting a reproduction with
a different logging level set to figure out what happened. This was mainly
by design to not clutter the kernel log messages with things that were
typically not interesting and the messages themselves could cause other
issues.
When looking to make a better system, it was seen that in many cases when
more data was wanted was when another message, usually at KERN_ERR level,
was logged. And in most cases, what the additional logging that was then
enabled was typically. Most of these areas fell into the discovery machine.
Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes). The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log. A new logging
level is defined - LOG_TRACE_EVENT. When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.
There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.
Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload/reload testing, the NVMe initiator would not
re-establish connectivity to NVMe controllers on reload.
The failing NVMe array supports concurrent FCP and NVMe operation via
different nport_id's. The array was repeatedly sending an ADISC every 2
seconds after PLOGI completed and while NVMe subsystems were executing
discovery. The target would continue this state for roughly 45 seconds.
The driver's current behavior on ADISC receipt is to validate a the ADISC
vs the device and issue a RESUME_RPI to restore transmission. The receipt
of the ADISC effectively caused a driver to take actions similar to a
logout and login for the remote port, causing the deregistration of the
nvme rport and a subsequent re-registration. This caused a constant reset
and re-connect of the NVMe controller while this 45s window occurred. There
was no need for the state changes as ADISC does not change login state.
This patch corrects this behavior by validating if the remoteport is
already logged in (MAPPED) and when true, avoids the call to set the ndlp
state to MAPPED, which triggers the unreg/re-reg. Thus ADISC does not
change the login state of the node.
Link: https://lore.kernel.org/r/20200630215001.70793-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As the nvmet layer does not have the concept of a remoteport object, which
can be used to identify the entity on the other end of the fabric that is
to receive an LS, the hosthandle was introduced. The driver passes the
hosthandle, a value representative of the remote port, with a ls request
receive. The LS request will create the association. The transport will
remember the hosthandle for the association, and if there is a need to
initiate a LS request to the remote port for the association, the
hosthandle will be used. When the driver loses connectivity with the
remote port, it needs to notify the transport that the hosthandle is no
longer valid, allowing the transport to terminate associations related to
the hosthandle.
This patch adds support to the driver for the hosthandle. The driver will
use the ndlp pointer of the remote port for the hosthandle in calls to
nvmet_fc_rcv_ls_req(). The discovery engine is updated to invalidate the
hosthandle whenever connectivity with the remote port is lost.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A lot of files in lpfc include nvme headers, building up relationships that
require a file to change for its headers when there is no other change
necessary. It would be better to localize the nvme headers.
There is also no need for separate nvme (initiator) and nvmet (tgt)
header files.
Refactor the inclusion of nvme headers so that all nvme items are
included by lpfc_nvme.h
Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used
by both the nvme and nvmet sides. This prepares for structure sharing
between the two roles. Prep to add shared function prototypes for upcoming
shared routines.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Fix sparse warning:
drivers/scsi/lpfc/lpfc_nportdisc.c:344:1: warning:
symbol 'lpfc_defer_acc_rsp' was not declared. Should it be static?
Link: https://lore.kernel.org/r/20200107014956.41748-1-yuehaibing@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVMe device re-discovery does not complete. Dev_loss_tmo messages seen on
initiator after recovery from a link disturbance.
The failing case is the following:
When the driver (as a NVME target) receives a PLOGI, the driver initiates
an "unreg rpi" mailbox command. While the mailbox command is in progress,
the driver requests that an ACC be sent to the initiator. The target's ACC
is received by the initiator and the initiator then transmits a PLOGI. The
driver receives the PLOGI prior to receiving the completion for the PLOGI
response WQE that sent the ACC. (Different delivery sources from the hw so
the race is very possible). Given the PLOGI is prior to the ACC completion
(signifying PLOGI exchange complete), the driver LS_RJT's the PRLI. The
"unreg rpi" mailbox then completes. Since PRLI has been received, the
driver transmits a PLOGI to restart discovery, which the initiator then
ACC's. If the driver processes the (re)PLOGI ACC prior to the completing
the handling for the earlier ACC it sent the intiators original PLOGI,
there is no state change for completion of the (re)PLOGI. The ndlp remains
in "PLOGI Sent" and the initiator continues sending PRLI's which are
rejected by the target until timeout or retry is reached.
Fix by: When in target mode, defer sending an ACC for the received PLOGI
until unreg RPI completes.
Link: https://lore.kernel.org/r/20191218235808.31922-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes. The two major core
changes are Al Viro's reworking of sg's handling of copy to/from user,
Ming Lei's removal of the host busy counter to avoid contention in the
multiqueue case and Damien Le Moal's fixing of residual tracking
across error handling.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXeKvHCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQJMAQDAjlAi
SNfbyndMqyf+rZGWufDI+43Up1VvW9GeWJHeDwEAxfO5XZsCks2uT8UxXhpEp9L7
HkiUww3zbcgl0FWFkUM=
=cdVU
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: aacraid, ufs, zfcp,
NCR5380, lpfc, qla2xxx, smartpqi, hisi_sas, target, mpt3sas, pm80xx
plus a whole load of minor updates and fixes.
The major core changes are Al Viro's reworking of sg's handling of
copy to/from user, Ming Lei's removal of the host busy counter to
avoid contention in the multiqueue case and Damien Le Moal's fixing of
residual tracking across error handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (251 commits)
scsi: bnx2fc: timeout calculation invalid for bnx2fc_eh_abort()
scsi: target: core: Fix a pr_debug() argument
scsi: iscsi: Don't send data to unbound connection
scsi: target: iscsi: Wait for all commands to finish before freeing a session
scsi: target: core: Release SPC-2 reservations when closing a session
scsi: target: core: Document target_cmd_size_check()
scsi: bnx2i: fix potential use after free
Revert "scsi: qla2xxx: Fix memory leak when sending I/O fails"
scsi: NCR5380: Add disconnect_mask module parameter
scsi: NCR5380: Unconditionally clear ICR after do_abort()
scsi: NCR5380: Call scsi_set_resid() on command completion
scsi: scsi_debug: num_tgts must be >= 0
scsi: lpfc: use hdwq assigned cpu for allocation
scsi: arcmsr: fix indentation issues
scsi: qla4xxx: fix double free bug
scsi: pm80xx: Modified the logic to collect fatal dump
scsi: pm80xx: Tie the interrupt name to the module instance
scsi: pm80xx: Controller fatal error through sysfs
scsi: pm80xx: Do not request 12G sas speeds
scsi: pm80xx: Cleanup command when a reset times out
...
Prior to the last FC-NVME-2 draft, SLER and CONF were independent. SLER
now requires CONF to be set.
Revise the NVME PRLI checking to look for both inorder to enable SLER.
Link: https://lore.kernel.org/r/20191105005708.7399-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When operating in private loop mode, PLOGI exchanges are racing and the
driver tries to abort it's PLOGI. But the PLOGI abort ends up terminating
the login with the other end causing the other end to abort its PLOGI as
well. Discovery never fully completes.
Fix by disabling the PLOGI abort when private loop and letting the state
machine play out.
Link: https://lore.kernel.org/r/20191018211832.7917-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The initial lpfc_desc_set_adisc implementation in commit
dea3101e0a ("lpfc: add Emulex FC driver version 8.0.28") enabled ADISC if
cfg_use_adisc && RSCN_MODE && FCP_2_DEVICE
In commit 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of
SLI-3") this changed to
(cfg_use_adisc && RSC_MODE) || FCP_2_DEVICE
and later in commit ffc954936b ("[SCSI] lpfc 8.3.13: FC Discovery Fixes
and enhancements.") to
(cfg_use_adisc && RSC_MODE) || (FCP_2_DEVICE && FCP_TARGET)
A customer reports that after a devloss, an ADISC failure is logged. It
turns out the ADISC flag is set even the user explicitly set lpfc_use_adisc
= 0.
[Sat Dec 22 22:55:58 2018] lpfc 0000:82:00.0: 2:(0):0203 Devloss timeout on WWPN 50:01:43:80:12:8e:40:20 NPort x05df00 Data: x82000000 x8 xa
[Sat Dec 22 23:08:20 2018] lpfc 0000:82:00.0: 2:(0):2755 ADISC failure DID:05DF00 Status:x9/x70000
[mkp: fixed Hannes' email]
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: James Smart <james.smart@broadcom.com>
Link: https://lore.kernel.org/r/20191022072112.132268-1-dwagner@suse.de
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix sparse warnings:
drivers/scsi/lpfc/lpfc_nportdisc.c:290:1: warning: symbol 'lpfc_defer_pt2pt_acc' was not declared. Should it be static?
Link: https://lore.kernel.org/r/1570183477-137273-1-git-send-email-zhengbin13@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Reviewed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After exchanging PLOGI on an SLI-3 adapter, the PRLI exchange failed. Link
trace showed the port was assigned a non-zero n_port_id, but didn't use the
address on the PRLI. The assigned address is set on the port by the
CONFIG_LINK mailbox command. The driver responded to the PRLI before the
mailbox command completed. Thus the PRLI response used the old n_port_id.
Defer the PRLI response until CONFIG_LINK completes.
Link: https://lore.kernel.org/r/20190922035906.10977-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FC-NVMe-2 added support for sequence level error recovery in the FC-NVME
protocol. This allows for the detection of errors and lost frames and
immediate retransmission of data to avoid exchange termination, which
escalates into NVMeoFC connection and association failures. A significant
RAS improvement.
The driver is modified to indicate support for SLER in the NVMe PRLI is
issues and to check for support in the PRLI response. When both sides
support it, the driver will set a bit in the WQE to enable the recovery
behavior on the exchange. The adapter will take care of all detection and
retransmission.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to see real addresses, convert %p with %px for kernel addresses
and replace %p with %pf for functions.
While converting, standardize on "x%px" throughout (not %px or 0x%px).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a target issues an ADISC to the port and the target is a NVME target,
the driver is inadvertantly invalidating the login and marking the remote
port as logged out. Communication with the target is lost.
Revise the ADISC check so that FCP or NVME targets will be marked valid at
the end of ADISC processing. Enhance logging to recognize condition
better.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some remote ports may be slow in registering their GID_FT protocol
information with the fabric. If the remote port is an initiator, it may
send PLOGI to the port before the GID_FT logic is complete. Meaning, after
accepting the PLOGI, when the driver may see no response to the GID_FT that
is issued after the login to determine the protocols supported so that
proper PRLI's may be transmit. If the driver has no fc4 information, it
currently stops and the remote port is not discovered.
Fix by issuing a LOGO when there is no GID_FT information. The LOGO
completion handling will attempt to re-login if the nport_id is still
present.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In cases of remote-port-side cable pull/replug, there happens to be a
target that upon replug will send the port a PLOGI, a PRLI, and a LOGO.
When this sequence is received by the driver, the PLOGI accepted and a
GFT_ID is issued to find the protocol support for the remote port. While
the GFT_ID is outstanding, a LOGO is received. The driver logs the remote
port out and unregisters the RPI and schedules a new PLOGI transmission.
However, the GFT_ID was not terminated. When it completed, the driver
attempted to transition the remote port to PRLI transmission, which cancels
the PLOGI scheduling. The PRLI transmit attempt is rejected by the adapter
as the remote port is not logged in. No retry is attempted as it's expected
the logout is noted and the supposedly scheduled PLOGI should address the
state. As there is no PLOGI, the remote port does not get re-discovered.
Fix by aborting the outstanding GFT_ID if the related remote port is logged
out.
Ensure a PRLI transmit attempt only occurs if the remote port is logging
in. This avoids the incorrect attempt while logged out.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch does not change any functionality but avoids that the compiler
complains about set-but-not-used variables when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the compiler warns about missing fall-through
annotation when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the compiler complains about missing declarations
when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For files modified as part of 12.2.0.0 patches, update copyright to 2019
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The conversion to enable SCSI and NVME fc4 support ran into an issue with
NPIV support. With NVME, NPIV is not currently supported, but with SCSI it
was. The driver reverted to its lowest setting meaning NPIV with SCSI was
not allowed.
Convert the NPIV checks and implementation so that SCSI can continue to
allow NPIV support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is getting hit with 100s of RSCNs during remote port address
changes. Each of those RSCN's ends up generating UNREG_RPI and REG_PRI
mailbox commands. The discovery engine within the driver doesn't wait for
the mailbox command completions. Instead it sets state flags and moves
forward. At some point, there's a massive backlog of mailbox commands which
take time for the adapter to process. Additionally, it appears there were
duplicate events from the switch so the driver generated duplicate mailbox
commands for the same remote port. During this window, failures on PLOGI
and PRLI ELS's are see as the adapter is rejecting them as they are for
remote ports that still have pending mailbox commands.
Streamline the discovery engine so that PLOGI log checks for outstanding
UNREG_RPIs and defer the processing until the commands complete. This
better synchronizes the ELS transmission vs the RPI registrations.
Filter out multiple UNREG_RPIs being queued up for the same remote port.
Beef up log messages in this area.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver data structure for managing a mailbox command contained two
context fields. Unfortunately, the context were considered "generic" to be
used at the whim of the command code. Of course, one section of code used
fields this way, while another did it that way, and eventually there were
mixups.
Refactored the structure so that the generic contexts become a node context
and a buffer context and all code standardizes on their use.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The switches seem to respond faster to GID_PT vs GID_FT NameServer
queries. Add support for GID_PT to be used over GID_FT to enable
faster storage failover detection. Includes addition of new module
parameter to select between GID_PT and GID_FT (GID_FT is default).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An address change for a remote port cause PRLI for the wrong protocol
to be sent. The node copy done in the discovery code skipped copying
the fc4 protocols supported as well.
Fix the copy logic for the address change. Beefed up log messages in
this area as well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After a LOGO in response to an ABTS timeout, a PLOGI wasn't issued to
re-establish the login. An nlp_type check in the LOGO completion
handler failed to restart discovery for NVME targets. Revised the
nlp_type check for NVME as well as SCSI.
While reviewing the LOGO handling a few other issues were seen and
were addressed:
- Better lock synchronization around ndlp data types
- When the ABTS times out, unregister the RPI before sending the LOGO
so that all local exchange contexts are cleared and nothing received
while awaiting LOGO/PLOGI handling will be accepted.
- LOGO handling optimized to:
Wait only R_A_TOV for a response.
It doesn't need to be retried on timeout. If there wasn't a
response, a PLOGI will be sent, thus an implicit logout
applies as well when the other port sees it.
If there is a response, any kind of response is considered "good"
and the XRI quarantined for a exchange qualifier window.
- PLOGI is issued as soon a LOGO state is resolved.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver only sends NVME PRLI to a device that also supports FCP. This resuls
in remote ports that don't have fc_remote_ports created for them. The driver
is clearing the nlp_fc4_type for a ndlp at the wrong time.
Fix by moving the nlp_fc4_type clearing to the discovery engine in the
DEVICE_RECOVERY state. Also ensure that rport registration is done for all
nlp_fc4_types.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Performance is affected when target queue depth is tracked. An atomic
counter is incremented on the submission path which competes with it being
decremented on the completion path. In addition, multiple CPUs can
simultaniously be manipulating this counter for the same ndlp.
Reduce the overhead by only performing the target increment/decrement when
the target queue depth is less than the overall adapter depth, thus is
actually meaningful.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For ABORT_XRI_CN command, firmware identifies XRI to abort by IOTAG and RPI
combination. For ELS aborts, driver specifies IOTAG correctly but RPI is
not specified.
Fix by setting RPI in WQE.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change references from "Broadcom Limited" to "Broadcom Inc." in the
copyright message. Update copyright duration if not yet updated for 2018.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Under large configurations, the driver would start to log message 6065 -
NVME out of buffers (exchanges).
The driver is using the ndlp cmd_qdepth value when determining the max
outstanding ios for an adapter. This value, by default, is set to 65536,
which exceeds the maximum exchange counts supported on an adapter. The ndlp
cmd_qdepth has no relevance and outstanding io count should be capped at
the max exchange count with IO requests beyond that level getting bounced
back with an EBUSY status so that they are retried by the block layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nodelist entry for SCSI array ends up in UNMAPPED state. This is due to
illegal discovery State machine transition because of two PRLIs and the
first one failing with LS_RJT. Also, the error path was designed
assuming the PRLIs complete in the order they were sent, FCP first, then
NVME. In a failing case, the array thinks about the first PRLI (FCP),
but issues LS_RJT for the 2nd PRLI immediately.
Fix PRLI completion error path for the ordering expectation. Ensure the
discovery state machine update is not set until all outstanding PRLIs
are complete.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
First Burst support was not properly indicated in NVMe PRLI.
Correct the bit position and the logic to check and set first burst support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Updated Copyright in files updated 11.4.0.7
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Revise the NVME PRLI to indicate CONF support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the lpfc discovery engine, when as a nvme target, where the driver
was performing mailbox io with the adapter for port login when a NVME
PRLI is received from the host. Rather than queue and eventually get
back to sending a response after the mailbox traffic, the driver
rejected the io with an error response.
Turns out this particular initiator didn't like the rejection values
(unable to process command/command in progress) so it never attempted a
retry of the PRLI. Thus the host never established nvme connectivity
with the lpfc target.
By changing the rejection values (to Logical Busy/nothing more), the
initiator accepted the response and would retry the PRLI, resulting in
nvme connectivity.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When enabled for both SCSI and NVME support, and connected pt2pt to a
SCSI only target, the driver nodelist entry for the remote port is left
in PRLI_ISSUE state and no SCSI LUNs are discovered. Works fine if only
configured for SCSI support.
Error was due to some of the prli points still reflecting the need to
send only 1 PRLI. On a lot of fabric configs, targets were NVME only,
which meant the fabric-reported protocol attributes were only telling
the driver one protocol or the other. Thus things worked fine. With
pt2pt, the driver must send a PRLI for both protocols as there are no
hints on what the target supports. Thus pt2pt targets were hitting the
multiple PRLI issues.
Complete the dual PRLI support. Track explicitly whether scsi (fcp) or
nvme prli's have been sent. Accurately track protocol support detected
on each node as reported by the fabric or probed by PRLI traffic.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Handling a rcv'ed PRLI incorrectly can cause the ndlp to end up in the
wrong state or the driver to ACC and PRLI when it should send LS_RJT.
The cause was due to the driver not properly looking at the PRLI type
and taking the multiple protocol support into consideration.
Resolved by adding checks in the various PRLI receive points to validate
PRLI type and reject if not valid for the enabled protocols and mode
(host vs target).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver crashes when attempting to use a freed ndpl pointer.
The pci_remove_one handler runs on a separate kernel thread. The order
of the removal is starting by freeing all of the ndlps and then
disabling interrupts. In between these two events the driver can still
receive an ELS and process it. When it tries to use the ndlp pointer
will be NULL
Change the order of the pci_remove_one vs disable interrupts so that
interrupts are disabled before the ndlp's are freed.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After link bounce in a NVME Pt2Pt config, the driver managed to map the
same nport twice, resulting in multiple device nodes for the same
namespace.
In Pt2Pt, the driver must send PRLI's for both (scsi) FCP and NVME
rather than using fabric aids. The driver was inconsistent on handling
various PRLI completions, especially rejects, which had reject codes
cross the different protocol PRLI completions.
Fixed to perform the following: if nvmet mode (fc port can only be a
nvme target) - rejects all unsolicitly FCP PRLI's. Never issues a FCP
PRLI.
The multiple protocol PRLI's are sent simultaneously. However, driver
will now only state transition after both PRLI's are complete. New flags
were added to aid tracking the responses from the different PRLI's.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the switch blade is pulled out then plugged back in, the driver
does not issue a PLOGI to the target
When the switch blade is pulled out, it does not reset the link. The
driver ends up issuing a LOGO to the target, and finally sees devloss.
Since the driver believes that a LOGO is outstanding, it does not issue
a PLOGI to the target upon link up
Correct by placing the ndlp in UNUSED state When devloss happens in
LOGO_ISSUE state.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver does not discover targets when in loop mode.
The NLP type is correctly getting set when a fabric connection is
detected but, not for loop. The unknown NLP type means that the driver
does not issue a PRLI when in loop topology. Thus target discovery
fails.
Fix by checking the topology during discovery. If it is loop, set the
NLP FC4 type to FCP.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We might have a NULL pring in lpfc_els_abort(), for example on error
recovery path, since queues are destroyed during error recovery
mechanism.
In this case, we should just drop the abort since the queues will be
recreated anyway. This patch just verifies for NULL pointer and stop the
abortion of the queue in case of a NULL pring.
Also, this patch converts return type of lpfc_els_abort() from int to
void, since it's not checked anywhere.
Reported-by: Harsha Thyagaraja <hathyaga@in.ibm.com>
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Tested-by: Raphael Silva <raphasil@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Code review of NVMEI's FC_PORT_ROLE_NVME_DISCOVERY looked wrong.
Discussions with storage architecture team clarified NVMEI's audit of
the PRLI response port roles. Following up discussion with code review
showed a few minor corrections were required - especially in
anticipation of NVME auto discovery.
During PRLI, NVMEI should sent prli_init - which it it does. NVMET
should send prli_tgt and prli_disc - which it does. When NVMEI receives
a PRLI Response now, it audits the incoming target bits and stores the
attributes in the corresponding NDLP. Later, when NVMEI registers the
NVME rport, it uses the stored ndlp attributes to set the rport
port_roles correctly.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVMET didn't have any RSCN handling at all and
would not execute implicit LOGO when receiving a PLOGI
from an rport that NVMET had in state UNMAPPED.
Clean up the logic in lpfc_nlp_state_cleanup for
initiators (FCP and NVME). NVMET should not respond to
RSCN including allocating new ndlps so this code was
conditionalized when nvmet_support is true. The check
for NLP_RCV_PLOGI in lpfc_setup_disc_node was moved
below the check for nvmet_support to allow the NVMET
to recover initiator nodes correctly. The implicit
logo was introduced with lpfc_rcv_plogi when NVMET gets
a PLOGI on an ndlp in UNMAPPED state. The RSCN handling
was modified to not respond to an RSCN in NVMET. Instead
NVMET sends a GID_FT and determines if an NVMEP_INITIATOR
it has is UNMAPPED but no longer in the zone membership.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Update copyrights to 2017 for all files touched in this patch set
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>