The G7 adapter supports 64G link speeds. Add support to the driver.
In addition, a small cleanup to replace the odd bitmap logic with
a switch case.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add PCI ids for the new G7 adapter
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
New if_type=6 adapters support an additional BAR that provides
apertures to allow direct WQE to adapter push support - termed
Direct Packet Push (DPP). WQ creation differs slightly to ask for
a WQ to be DPP-ized. When submitting a WQE to a DPP WQ, it is
submitted to the host memory for the WQ normally, but is also
written by the host cpu directly to a BAR aperture. Write buffer
coalescing in hardware is (hopefully) turned on, enabling single
pci write operation support. The doorbell is thing rung to indicate
the WQE is available and was pushed to the aperture.
This patch:
- Updates the WQ Create commands for the DPP options
- Adds the bar mapping for if_type=6 DPP bar
- Adds the WQE pushing to the DDP aperture received from WQ create
- Adds a new module parameter to disable DPP operation if desired.
Default is enabled.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
New hardware supports a SLI-4 interface, but with a new if_type
variant of 6.
If_type=6 has a different PCI BAR map, separate EQ/CQ doorbells,
and some changes in doorbell formats.
Add the changes for the if_type into headers, adapter initialization
and control flows. Add new eq and cq handlers.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Up until now, all SLI-4 devices had the same doorbells at the same
bar locations. With newer hardware, there are now independent EQ and
CQ doorbells and the bar locations differ.
Prepare the code for new hardware by separating the eq/cq doorbell into
separate components. The components can be set based on if_type.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Up until now, an SLI-4 device had no variance in the way it handled
its EQs and CQs. With newer hardware, there are now differences in
doorbells and some differences in how entries are valid.
Prepare the code for new hardware by creating a sli4-based callout
table that can be set based on if_type.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Updated Copyright in files updated 11.4.0.7
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 11.4.0.7
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a test that is doing large numbers of cable swaps on the target, the
nvme controllers wouldn't reconnect.
During the cable swaps, the targets n_port_id would change. This
information was passed to the nvme-fc transport, in the new remoteport
registration. However, the nvme-fc transport didn't update the n_port_id
value in the remoteport struct when it reused an existing structure.
Later, when a new association was attempted on the remoteport, the
driver's NVME LS routine would use the stale n_port_id from the
remoteport struct to address the LS. As the device is no longer at that
address, the LS would go into never never land.
Separately, the nvme-fc transport will be corrected to update the
n_port_id value on a re-registration.
However, for now, there's no reason to use the transports values. The
private pointer points to the drivers node structure and the node
structure is up to date. Therefore, revise the LS routine to use the
drivers data structures for the LS. Augmented the debug message for
better debugging in the future.
Also removed a duplicate if check that seems to have slipped in.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, write underruns (mismatch of amount transferred vs scsi
status and its residual) detected by the adapter are not being flagged
as an error. Its expected the target controls the data transfer and
would appropriately set the RSP values. Only read underruns are treated
as errors.
Revise the SCSI error handling to treat write underruns as an error as
well.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver was inappropriately pulling in the nvme host's nvme.h
header. What it really needed was the standard <linux/nvme.h> header.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When using the special option to suppress the response iu, ensure the
adapter fully supports the feature by checking feature flags from the
adapter and validating the support when formatting the WQE.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During SCSI error handling escalation to host reset, the SCSI io
routines were moved off the txcmplq, but the individual io's ON_CMPLQ
flag wasn't cleared. Thus, a background thread saw the io and attempted
to access it as if on the txcmplq.
Clear the flag upon removal.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Revise the NVME PRLI to indicate CONF support.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver ignored checks on whether the link should be kept
administratively down after a link bounce. Correct the checks.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During link bounce testing in a point-to-point topology, the host may
enter a soft lockup on the lpfc_worker thread:
Call Trace:
lpfc_work_done+0x1f3/0x1390 [lpfc]
lpfc_do_work+0x16f/0x180 [lpfc]
kthread+0xc7/0xe0
ret_from_fork+0x3f/0x70
The driver was simultaneously setting a combination of flags that caused
lpfc_do_work()to effectively spin between slow path work and new event
data, causing the lockup.
Ensure in the typical wq completions, that new event data flags are set
if the slow path flag is running. The slow path will eventually
reschedule the wq handling.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make the attribute writeable.
Remove the ramp up to logic as its unnecessary, simply set depth. Add
debug message if depth changed, possibly reducing limit, yet our
outstanding count has yet to catch up with it.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When nvme target deferred receive logic waits for exchange resources,
the corresponding receive buffer is not replenished with the hardware.
This can result in a lack of asynchronous receive buffer resources in
the hardware, resulting in a "2885 Port Status Event: ... error
1=0x52004a01 ..." message.
Correct by replenishing the buffer whenenver the deferred logic kicks
in. Update corresponding debug messages and statistics as well.
[mkp: applied by hand]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A stress test repeatedly resetting the adapter while performing io would
eventually report I/O failures and missing nvme namespaces.
The driver was setting the nvmefc_fcp_req->private pointer to NULL
during the IO completion routine before upcalling done(). If the
transport was also running an abort for that IO, the driver would fail
the abort with message 6140. Failing the abort is not allowed by the
nvme-fc transport, as it mandates that the io must be returned back to
the transport. As that does not happen, the transport controller delete
has an outstanding reference and can't complete teardown.
The NULL-ing of the private pointer should be done only when the io is
considered complete. It's complete when the adapter returns the exchange
with the "exchange busy" flag clear.
Move the NULL'ing of the structure to the done case. This leaves the io
contexts set while it is busy and until the subsequent XRI_ABORTED
completion which returns the exchange is received.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc driver does not discover a target when the topology changes
from switched-fabric to direct-connect. The target rejects the PRLI from
the initiator in direct-connect as the driver is using the old S_ID from
the switched topology.
The driver was inappropriately clearing the VP bit to register the VPI,
which is what is associated with the S_ID.
Fix by leaving the VP bit set (it was set earlier) and as the VFI is
being re-registered, set the UPDT bit.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I/O conditions on the nvme target may have the driver submitting to a
full hardware wq. The hardware wq is a shared resource among all nvme
controllers. When the driver hit a full wq, it failed the io posting
back to the nvme-fc transport, which then escalated it into errors.
Correct by maintaining a sideband queue within the driver that is added
to when the WQ full condition is hit, and drained from as soon as new WQ
space opens up.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Existing code was using the wrong field for the completion status when
comparing whether to increment abort statistics
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ensure nvme localports/targetports are torn down before dismantling the
adapter sli interface on driver detachment. This aids leaving
interfaces live while nvme may be making callbacks to abort it.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Increased CQ and WQ sizes for SCSI FCP, matching those used for NVMe
development.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver controls when the hardware sends completions that communicate
consumption of elements from the WQ. This is done by setting a WQEC bit
on a WQE.
The current driver sets it on every Nth WQE posting. However, the driver
isn't clearing the bit if the WQE is reused. Thus, if the queue depth
isn't evenly divisible by N, with enough time, it can be set on every
element, creating a lot of overhead and risking CQ full conditions.
Correct by clearing the bit when not setting it on an Nth element.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with reworks
to try to attempt to make the code easier to handle in the long run, but
no functional change. There's also some tree-wide sysfs attribute
fixups with lots of acks from the various subsystem maintainers, as well
as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWnLvPw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynNzACgkzjPoBytJWbpWFt6SR6L33/u4kEAnRFvVCGL
s6ygQPQhZIjKk2Lxa2hC
=Zihy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of "big" driver core patches for 4.16-rc1.
The majority of the work here is in the firmware subsystem, with
reworks to try to attempt to make the code easier to handle in the
long run, but no functional change. There's also some tree-wide sysfs
attribute fixups with lots of acks from the various subsystem
maintainers, as well as a handful of other normal fixes and changes.
And finally, some license cleanups for the driver core and sysfs code.
All have been in linux-next for a while with no reported issues"
* tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits)
device property: Define type of PROPERTY_ENRTY_*() macros
device property: Reuse property_entry_free_data()
device property: Move property_entry_free_data() upper
firmware: Fix up docs referring to FIRMWARE_IN_KERNEL
firmware: Drop FIRMWARE_IN_KERNEL Kconfig option
USB: serial: keyspan: Drop firmware Kconfig options
sysfs: remove DEBUG defines
sysfs: use SPDX identifiers
drivers: base: add coredump driver ops
sysfs: add attribute specification for /sysfs/devices/.../coredump
test_firmware: fix missing unlock on error in config_num_requests_store()
test_firmware: make local symbol test_fw_config static
sysfs: turn WARN() into pr_warn()
firmware: Fix a typo in fallback-mechanisms.rst
treewide: Use DEVICE_ATTR_WO
treewide: Use DEVICE_ATTR_RO
treewide: Use DEVICE_ATTR_RW
sysfs.h: Use octal permissions
component: add debugfs support
bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate
...
This is mostly updates of the usual driver suspects: arcmsr,
scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
hisi_sas. We also have a rework of the libsas hotplug handling to
make it more robust, a slew of 32 bit time conversions and fixes, and
a host of the usual minor updates and style changes. The biggest
potential for regressions is the libsas hotplug changes, but so far
they seem stable under testing.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp
MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB
WvyBieQs9qRit+2czd4=
=gJMf
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual driver suspects: arcmsr,
scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
hisi_sas.
We also have a rework of the libsas hotplug handling to make it more
robust, a slew of 32 bit time conversions and fixes, and a host of the
usual minor updates and style changes. The biggest potential for
regressions is the libsas hotplug changes, but so far they seem stable
under testing"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits)
scsi: qla2xxx: Fix logo flag for qlt_free_session_done()
scsi: arcmsr: avoid do_gettimeofday
scsi: core: Add VENDOR_SPECIFIC sense code definitions
scsi: qedi: Drop cqe response during connection recovery
scsi: fas216: fix sense buffer initialization
scsi: ibmvfc: Remove unneeded semicolons
scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
scsi: hisi_sas: directly attached disk LED feature for v2 hw
scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw
scsi: megaraid_sas: NVMe passthrough command support
scsi: megaraid: use ktime_get_real for firmware time
scsi: fnic: use 64-bit timestamps
scsi: qedf: Fix error return code in __qedf_probe()
scsi: devinfo: fix format of the device list
scsi: qla2xxx: Update driver version to 10.00.00.05-k
scsi: qla2xxx: Add XCB counters to debugfs
scsi: qla2xxx: Fix queue ID for async abort with Multiqueue
scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event()
scsi: qla2xxx: Fix warning during port_name debug print
scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
...
Several statements are indented too far, fix these
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
localport is being dereferenced to assign lport and then immediately
afterwards localport is being sanity checked to see if it is null. Fix
this by only dereferencing localport until after it has been null
checked.
Detected by CoverityScan, CID#1463038 ("Dereference before null check")
Fixes: 3a8cefbfc5ee ("scsi: lpfc: Beef up stat counters for debug")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prior patch mixed up what argument in the macro was what, so min value
was placed as the "default" argument, and the default value was placed
as the "min" argument. Thus, when the default was applied, it looked
like the default was smaller than the allowed min.
Swap argument postions to correct.
[mkp: fixed checkpatch warning]
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 11.4.0.6
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If log verbose in not turned on, its hard to tell when certain error
paths get hit. Add stats counters and corresponding logic to
debugfs/sysfs to aid understanding what paths were traversed.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When unregistering a remote port the lpfc driver would eventually wait
for the remoteport_unreg done callback. But the driver never completed
the io aborts that would allow the connections to terminate thus the
unreg done callback was never issued. Turns out the coding style of the
driver allowed for the wait to occur on the same cpu that the deferred
isr is called on. The blocking for the wait, blocked the isr, and as the
isr didn't run, the io aborts wouldn't finish.
Turns out there was never a good reason to block waiting for the unreg
done in the first place. The driver can continue execution and the ref
counting within the driver will do the right thing.
Resolve by removing the wait and patching up a few cases where the ref
counting didn't look right - mainly cases where the remote port comes
back before the aborts had completed and the unreg done had been
called. Additionally, a few places which used pointer values to guide
driver actions weren't protected by lock, so correct those.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the lpfc discovery engine, when as a nvme target, where the driver
was performing mailbox io with the adapter for port login when a NVME
PRLI is received from the host. Rather than queue and eventually get
back to sending a response after the mailbox traffic, the driver
rejected the io with an error response.
Turns out this particular initiator didn't like the rejection values
(unable to process command/command in progress) so it never attempted a
retry of the PRLI. Thus the host never established nvme connectivity
with the lpfc target.
By changing the rejection values (to Logical Busy/nothing more), the
initiator accepted the response and would retry the PRLI, resulting in
nvme connectivity.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When enabled for both SCSI and NVME support, and connected pt2pt to a
SCSI only target, the driver nodelist entry for the remote port is left
in PRLI_ISSUE state and no SCSI LUNs are discovered. Works fine if only
configured for SCSI support.
Error was due to some of the prli points still reflecting the need to
send only 1 PRLI. On a lot of fabric configs, targets were NVME only,
which meant the fabric-reported protocol attributes were only telling
the driver one protocol or the other. Thus things worked fine. With
pt2pt, the driver must send a PRLI for both protocols as there are no
hints on what the target supports. Thus pt2pt targets were hitting the
multiple PRLI issues.
Complete the dual PRLI support. Track explicitly whether scsi (fcp) or
nvme prli's have been sent. Accurately track protocol support detected
on each node as reported by the fabric or probed by PRLI traffic.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Increased the sizes of the SCSI WQ's and CQ's so that SCSI operation is
similar to that used by NVME. However, size increase restricted only to
those newer adapters that can support the larger WQE size, thus bigger
queue sizes.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Handling a rcv'ed PRLI incorrectly can cause the ndlp to end up in the
wrong state or the driver to ACC and PRLI when it should send LS_RJT.
The cause was due to the driver not properly looking at the PRLI type
and taking the multiple protocol support into consideration.
Resolved by adding checks in the various PRLI receive points to validate
PRLI type and reject if not valid for the enabled protocols and mode
(host vs target).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is all set to handle the defer_rcv api for the nvmet_fc
transport, yet didn't properly recognize the return status when the
defer_rcv occurred. The driver treated it simply as an error and aborted
the io. Several residual issues occurred at that point.
Finish the defer_rcv support: recognize the return status when the io
request is being handled in a deferred style. This stops the rogue
aborts; Replenish the async cmd rcv buffer in the deferred receive if
needed.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME targets appear to randomly disconnect from the initiator when
running heavy IO.
The error is due to the host aggregate (across all controllers) io load
was beyond the maximum exchange count for nvme on the adapter. The
driver was properly returning a resource busy status, but the io load
was so great heartbeat commands would be bounced and not have a
successful retry within the fuzz amount for the nvme heartbeat (yes, a
very high io load!). Thus the target was terminating the controller due
to a keep alive failure.
Resolve by reserving a few exchanges (by counters) which can be used
when the adapter is out of normal exchanges and the command is a NVME
heartbeat command. As counters are used, while the reserved command is
outstanding, as soon as any other exchange completes, the counters are
adjusted and the reserved count is replenished. The heartbeat completes
execution in a normal fashion.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 11.4.0.5
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The logic for sg_seg_cnt is a bit convoluted. This patch tries to clean
up a couple of areas, especially around the +2 and +1 logic.
This patch:
- Cleans up the lpfc_sg_seg_cnt attribute to specify a real minimum
rather than making the minimum be whatever the default is.
- Removes the hardcoding of +2 (for the number of elements we use in a
sgl for cmd iu and rsp iu) and +1 (an additional entry to compensate
for nvme's reduction of io size based on a possible partial page)
logic in sg list initialization. In the case where the +1 logic is
referenced in host and target io checks, use the values set in the
transport template as that value was properly set.
There can certainly be more done in this area and it will be addressed
in combined host/target driver effort.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload, the driver may crash due to NULL pointers. The
NULL pointers were due to the driver not protecting itself sufficiently
during some of the teardown paths. Additionally, the driver was not
waiting for and cleanup up nvme io resources. As such, the driver wasn't
making the callbacks to the transport, stalling the transports
association teardown.
This patch waits for io clean up before tearding down and adds checks
for possible NULL pointers.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the driver is unloading, the nvme transport could be in the process
of submitting new requests, will send abort requests to terminate
associations, or may make LS-related requests. The driver's abort and
request entry points currently is ignorant of the unloading state and is
starting the requests even though the infrastructure to complete them
continues to teardown.
Change the entry points for new requests to check whether unloading and
if so, reject the requests. Abort routines check unloading, and if so,
noop the request. An abort is noop'd as the teardown paths are already
aborting/terminating the io outstanding at the time the teardown
initiated.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver's interaction with the host nvme transport has been incorrect
for a while. The driver did not wait for the unregister callbacks
(waited only 5 jiffies). Thus the driver may remove objects that may be
referenced by subsequent abort commands from the transport, and the
actual unregister callback was effectively a noop. This was especially
problematic if the driver was unloaded.
The driver now waits for the unregister callbacks, as it should, before
continuing with teardown.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver currently registers any remote port that has NVME support.
It should only be registering target ports.
Register only target ports.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During RSCN storms, the driver does not rediscover some targets. The
driver marks some RSCN as to be handled after the ones it's working
on. The driver missed processing some deferred RSCN.
Move where the driver checks for deferred RSCNs and initiate deferred
RSCN handling if the flag was set. Also revise nport state within the
RSCN confirm routine. Add some state data to a possible debug print to
aid future debugging.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pt2pt ndlp ref count prematurely goes to 0. There was reference removed
that should only be removed if connected to a switch, not if in
point-to-point mode.
Add a mode check before the reference remove.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current default for async hw receive queues is 1, which presents
issues under heavy load as number of queues influence the available
async receive buffer limits.
Raise the default to the either the current hw limit (16) or the number
of hw qs configured (io channel value).
Revise the attribute definition for mrq to better reflect what we do for
hw queues. E.g. 0 means default to optimal (# of cpus), non-zero
specifies a specific limit. Before this change, mrq=0 meant target mode
was disabled. As 0 now has a different meaning, rework the if tests to
use the better nvmet_support check.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Display for lpfc/fnX/iDiag/queInfo isn't formatted perfectly. Corrected
the format strings for the queue info debug messages.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver does not respond to PLOGI from the direct attach target. The
driver uses incorrect S_ID in CONFIG_LINK, after FLOGI completion
Correct by issuing CONFIG_LINK with the correct S_ID after receiving the
PLOGI from the target
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Raise the maximum NVME sg list size allowed to 256 elements.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Performing an LS abort results in the following message being seen:
0603 Invalid CQ subtype 6: 00000300 22000002 ffff0016 d0050000
and the associated exchange is not properly freed.
The code did not recognize the exchange type that was aborted, thus it
was not properly handled.
Correct by adding the NVME LS ELS type to the exchange types that are
recognized.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In test cases where an instance of the driver is detached and
reattached, the driver will crash on reattachment. There is a compound
if statement that will skip over the bar setup if the pci_resource_start
call is not successful. The driver erroneously returns success to its
bar setup in this scenario even though the bars aren't properly
configured.
Rework the offending code segment for proper initialization steps. If
the pci_resource_start call fails, -ENOMEM is now returned.
Sample stack:
rport-5:0-10: blocked FC remote port time out: removing rport
BUG: unable to handle kernel NULL pointer dereference at (null)
... lpfc_sli4_wait_bmbx_ready+0x32/0x70 [lpfc]
...
... RIP: 0010:... ... lpfc_sli4_wait_bmbx_ready+0x32/0x70 [lpfc]
Call Trace:
... lpfc_sli4_post_sync_mbox+0x106/0x4d0 [lpfc]
... ? __alloc_pages_nodemask+0x176/0x420
... ? __kmalloc+0x2e/0x230
... lpfc_sli_issue_mbox_s4+0x533/0x720 [lpfc]
... ? mempool_alloc+0x69/0x170
... ? dma_generic_alloc_coherent+0x8f/0x140
... lpfc_sli_issue_mbox+0xf/0x20 [lpfc]
... lpfc_sli4_driver_resource_setup+0xa6f/0x1130 [lpfc]
... ? lpfc_pci_probe_one+0x23e/0x16f0 [lpfc]
... lpfc_pci_probe_one+0x445/0x16f0 [lpfc]
... local_pci_probe+0x45/0xa0
... work_for_cpu_fn+0x14/0x20
... process_one_work+0x17a/0x440
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
XRI_ABORTED_CQE completions were not being handled in the fast path.
They were being queued and deferred to the lpfc worker thread for
processing. This is an artifact of the driver design prior to moving
queue processing out of the isr and into a workq element. Now that queue
processing is already in a deferred context, remove this artifact and
process them directly.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hardware queues are a fast staging area to push commands into the
adapter. The adapter should drain them extremely quickly. However,
under heavy io load, the host cpu is pushing commands faster than the
drain rate of the adapter causing the driver to resource busy commands.
Enlarge the hardware queue (wq & cq) to support a larger number of queue
entries (4x the prior size) before backpressure. Enlarging the queue
requires larger contiguous buffers (16k) per logical page for the
hardware. This changed calling sequences that were expecting 4K page
sizes that now must pass a parameter with the page sizes. It also
required use of a new version of an adapter command that can vary the
page size values.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the HBA is connected to a private loop, the driver reports FLOGI
loop-open failure as functional error. This is an expected condition.
Mark loop-open failure as a warning instead of error.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The error message dereferences "rqb_entry" so we need to print it first
and then free the buffer.
Fixes: 6c621a2229 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes).
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaCxtCAAoJEAVr7HOZEZN4d9EQAI+OHP6ss6zjKKC21c9jNPcH
NhLrNv37gHg/LA2VXeUEL9RGUjCGLIUrI4HsrxzkFAMLKP4TkshMs8/2RvczY+Sa
VpayPqVybEKLIS6ipQyM1SLIQff2nvtDVcN/T+8z1lkk45TrbA6ZGuwUwd2aJyEA
2V2wtg51ObnL0Nr9QPPll0JrtL1AnCZyRlu9XrwTZuuSBZwk93opIuuvbZm/3dVg
Ir4GSS4Y+PuHIfu4cxqdsPMdzRdY9I2me1YiE4jeFSn1/VTAjL4HBz7fO9eITT42
VhXSpDz1XvFsa9dJ0ubkqoALpJzCfOcBw+EuGvSydLEvOBoEVwMccdfaD9lT1zc5
L9e1Z5qqJoq7hTA6xTXCYfWG73I9HYvljtmc8yudKHhADOdnSTUXhaO6uBF0RNqD
OxPSA1RZwRx3c6lDOcK6BTtvLAkTEuYKdrWSKJi0w+QXJAyQ6etqbmsKpmPdRim7
Z4ZSpJFro2gyo9gcdJO0ykTG+z3U7Z/ay1sNgnuprsv+eU/QjUdlAPl18o79EkRf
H54zZggZ4wC6q/cFVVt4Vx+V+oqIeu38s7NDXS9UltLoTZPm2EzDW6pXd/38Z4Tf
a1oBAUET8kYLC90P8sVZxUIHZjITlpgDbyE2Lq00PMYXhk8S4IxF0aMN5RvVqzUv
+7N2HrHkSSgG1nhw1t+E
=3O85
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: lpfc: Fix hard lock up NMI in els timeout handling.
scsi: mpt3sas: remove a stray KERN_INFO
scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event()
scsi: aacraid: use timespec64 instead of timeval
scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions
scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()
scsi: mpt3sas: fix dma_addr_t casts
scsi: be2iscsi: Use kasprintf
scsi: storvsc: Avoid excessive host scan on controller change
scsi: lpfc: fix kzalloc-simple.cocci warnings
scsi: mpt3sas: Update mpt3sas driver version.
scsi: mpt3sas: Fix sparse warnings
scsi: mpt3sas: Fix nvme drives checking for tlr.
scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info
scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives.
scsi: mpt3sas: scan and add nvme device after controller reset
scsi: mpt3sas: Set NVMe device queue depth as 128
scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware.
scsi: mpt3sas: API's to remove nvme drive from sml
scsi: mpt3sas: API 's to support NVMe drive addition to SML
...
Pull core block layer updates from Jens Axboe:
"This is the main pull request for block storage for 4.15-rc1.
Nothing out of the ordinary in here, and no API changes or anything
like that. Just various new features for drivers, core changes, etc.
In particular, this pull request contains:
- A patch series from Bart, closing the whole on blk/scsi-mq queue
quescing.
- A series from Christoph, building towards hidden gendisks (for
multipath) and ability to move bio chains around.
- NVMe
- Support for native multipath for NVMe (Christoph).
- Userspace notifications for AENs (Keith).
- Command side-effects support (Keith).
- SGL support (Chaitanya Kulkarni)
- FC fixes and improvements (James Smart)
- Lots of fixes and tweaks (Various)
- bcache
- New maintainer (Michael Lyle)
- Writeback control improvements (Michael)
- Various fixes (Coly, Elena, Eric, Liang, et al)
- lightnvm updates, mostly centered around the pblk interface
(Javier, Hans, and Rakesh).
- Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)
- Writeback series that fix the much discussed hundreds of millions
of sync-all units. This goes all the way, as discussed previously
(me).
- Fix for missing wakeup on writeback timer adjustments (Yafang
Shao).
- Fix laptop mode on blk-mq (me).
- {mq,name} tupple lookup for IO schedulers, allowing us to have
alias names. This means you can use 'deadline' on both !mq and on
mq (where it's called mq-deadline). (me).
- blktrace race fix, oopsing on sg load (me).
- blk-mq optimizations (me).
- Obscure waitqueue race fix for kyber (Omar).
- NBD fixes (Josef).
- Disable writeback throttling by default on bfq, like we do on cfq
(Luca Miccio).
- Series from Ming that enable us to treat flush requests on blk-mq
like any other request. This is a really nice cleanup.
- Series from Ming that improves merging on blk-mq with schedulers,
getting us closer to flipping the switch on scsi-mq again.
- BFQ updates (Paolo).
- blk-mq atomic flags memory ordering fixes (Peter Z).
- Loop cgroup support (Shaohua).
- Lots of minor fixes from lots of different folks, both for core and
driver code"
* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
nvme: fix visibility of "uuid" ns attribute
blk-mq: fixup some comment typos and lengths
ide: ide-atapi: fix compile error with defining macro DEBUG
blk-mq: improve tag waiting setup for non-shared tags
brd: remove unused brd_mutex
blk-mq: only run the hardware queue if IO is pending
block: avoid null pointer dereference on null disk
fs: guard_bio_eod() needs to consider partitions
xtensa/simdisk: fix compile error
nvme: expose subsys attribute to sysfs
nvme: create 'slaves' and 'holders' entries for hidden controllers
block: create 'slaves' and 'holders' entries for hidden gendisks
nvme: also expose the namespace identification sysfs files for mpath nodes
nvme: implement multipath access to nvme subsystems
nvme: track shared namespaces
nvme: introduce a nvme_ns_ids structure
nvme: track subsystems
block, nvme: Introduce blk_mq_req_flags_t
block, scsi: Make SCSI quiesce and resume work reliably
block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
...
This patch calls the new nvme transport routine for dev_loss_tmo
whenever the SCSI fc transport calls the lldd to make a dynamic
change to a remote ports dev_loss_tmo.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
System crashed due to a hard lockup at lpfc_els_timeout_handler+0x128.
The els ring's txcmplq list is corrupted: the last element in the list
does not point back the the head causing a loop. Issue is the els
processing path for sli4 hbas are using the hbalock instead of the
ring_lock for removing elements from the txcmplq list.
Use the adapter SLI_REV to determine which lock should be used for
removing iocbqs from the els rings txcmplq.
note: the future refactoring will address this so that we don't have
this ugly type-based lock code.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drivers/scsi/lpfc/lpfc_debugfs.c:5460:22-29: WARNING: kzalloc should be used for phba -> nvmeio_trc, instead of kmalloc/memset
drivers/scsi/lpfc/lpfc_debugfs.c:2230:20-27: WARNING: kzalloc should be used for phba -> nvmeio_trc, instead of kmalloc/memset
Use kzalloc rather than kmalloc followed by memset with 0
Generated by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
The ! has higher precedence than the & operation. I've added
parenthesis so this works as intended.
Fixes: 952c303b32 ("scsi: lpfc: Ensure io aborts interlocked with the target.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change version to 11.4.0.4
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The internal cfg flag is actually smaller, by 1 (for a partial page
sge), than the sg list maintained by the driver. Thus the check on sg
segments errored out when it shouldn't have
Ensure the check is +1
Note: having a value that is less than what it really is is bogus.
Correcting it now would be a significant rework. Add this item to the
list to be refactored in the merge with efct.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When running NVME io as a NVME host, if the driver is unloaded there
would be oops in lpfc_sli4_issue_wqe.
When unloading, controllers are torn down and the transport initiates
set_property commands to reset the controller and issues aborts to
terminate existing io. The drivers nvme abort and fcp io submit
routines needed to recognize the driver is unloading and fail the new
requests. It didn't, resulting in the oops.
Revise the ls and fcp io submit routines to detect the unloading state
and properly handle their cleanup.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Support RDP and Multiple Frames
If the remote Nport is not logged in, the driver would not populate all
the descriptors in the RDP response payload. Doing so would create a
payload length that requires multiple frames due to exceeding the
default rx buffer size without an explicit login. Currently FC-LS
explicitly states the RDP response must be a single frame sequence.
Thus we did not violate the standard.
Recently, a modification to FC-LS was accepted which allows multi-frame
sequences and all vendors have indicated they are interoperable with the
change. As such, extend RDP support with the additional fields and send
a multi-frame sequence.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Before releasing nvme io back to the io stack for possible retry on
other paths, ensure the io termination is interlocked with the target
device by ensuring the entire ABTS-LS protocol is complete.
Additionally, FC-NVME ABTS-LS protocol does not use RRQ. Remove RRQ
behavior from ABTS-LS.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Firmware update fails with: status x17 add_status x56 on the final write
If multiple DMA buffers are used for the download, some firmware revs
have difficulty with signatures and crcs split across the dma buffer
boundaries. Resolve by making all writes be a single 4k page in length.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is seeing a NULL pointer in lpfc_nvme_fcp_io_submit. This
was ultimately due to a transport AER being sent on a terminated
controller, thus some of the values were not set. In case we're in a
system without a corrected transport and in case a race condition occurs
where we enter the routine as the teardown is happening in a separate
thread, validate the parameters before starting the io.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The initial implementation of NVME didn't merge with NPIV support. As
such, there are several issues if NPIV is used with NVME. For now,
ensure that if NVME is enabled then NPIV is not enabled.
Support for NPIV with NVME will be added in the near future.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
if nvmet targetport registration fails, the driver encounters a NULL
pointer oops in lpfc_hb_timeout_handler.
To fix: if registration fails, ensure nvmet_support is cleared on the
port structure.
Also enhanced the log message on failure.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The descriptions for lpfc_xri_split and lpfc_enable_fc4_type were
poor. Revise for better understanding:
lpfc_xri_split - Percentage of FCP XRI resources versus NVME
lpfc_enable_fc4_type - Enable FC4 Protocol support - FCP / NVME
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Always set ctxp->state to LPFC_NVMET_STE_ABORT if ABORT op gets called
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are several log messages that report abnormal terminations that by
default are marked warn. These are typically the result of failures due
to invalid controller state or abort completions. They are all natural
when a controller resets.
Unfortunately, as they are logged by default, it makes the admin very
concerned.
Convert the messages to Info.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is encountering oops in lpfc_sli_calc_ring.
The driver is setting hba_wqidx for FCP based on the policy in use for
NVME. The two may not be the same. Change to set the wqidx based on the
FCP policy.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Under heavy target nvme load duration, the lpfc irq handler is
encountering cpu lockup warnings.
Convert the driver to a shortened ISR handler which identifies the
interrupting condition then schedules a workq thread to process the
completion queue the interrupt was for. This moves all the real work
into the workq element.
As nvmet_fc upcalls are no longer in ISR context, don't set the feature
flags
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Need to make ktime samples more accurate
If ktime is turned on in the middle of an IO, the max calculation could
be misleading. Base sampling on the start time of the IO as opposed to
ktime_on.
Make ISR ktime timestamps be from when CQE is read instead of EQE.
Added additional sanity checks when deciding whether to accept an IO
sample or not.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Local Reject/Invalid RPI errors seen during discovery.
Temporary RPI cleanup was occurring regardless of SLI rev. It's only
necessary on SLI-4.
Adjust the test for whether cleanup is necessary.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Warning messages when NVME_TARGET_FC not defined on ppc builds
The lpfc_nvmet_replenish_context() function is only meaningful when NVME
target mode enabled. Surround the function body with ifdefs for target
mode enablement.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a link bounce scenario, a condition can occur where the discovery
engine swaps an ndlp structure (address change for an nport). While the
swap was successfully executed by the discovery engine, the driver did
not properly detect a change in the ndlp bound to the nvme rport. This
error resulted in the nvme host transport issuing an IO to the correct
nvme rport, but the lpfc driver addressed a ndlp with an NLP_UNUSED
status and failed the io. This resulting it it looking like there were
missing namespaces and applications failed due to io errors.
To fix, in lpfc_nvme_register_rport, rework the "rebind" case to break
the nvme rport<->ndlp association when the ndlp already has an
nrport. Then rebind the rport to the correct ndlp data and backpointers.
[mkp: typo]
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver crashes when attempting to use a freed ndpl pointer.
The pci_remove_one handler runs on a separate kernel thread. The order
of the removal is starting by freeing all of the ndlps and then
disabling interrupts. In between these two events the driver can still
receive an ELS and process it. When it tries to use the ndlp pointer
will be NULL
Change the order of the pci_remove_one vs disable interrupts so that
interrupts are disabled before the ndlp's are freed.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During pci hot plug, the kernel crashes in a list_add_call
The lookup by tag function will return null if the IOCB is out of range
or does not have the on txcmplq flag set.
Fix: Check for null return from lookup by tag.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During pci hot plug, the kernel crashes in timer management code.
The sli4 remove_one handler is not stoping the timers as it starts to
remove the port so that it can be swapped.
Fix: Stop the timers early in the handler routine.
Note: Fix in SLI-4 only. SLI-3 already stopped the timers properly.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Eight mostly minor fixes for recently discovered issues in drivers.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZz2OGAAoJEAVr7HOZEZN4NTQQAL1avebL1Abau5ODtGTciJrZ
I4QqUuJ0PGo9AKWIA/s34cY7kdPt2qhIoUJY0bLKdv8QtR5MDqfJ6c+2ianKenWm
EkDKYBQ2csBqzH9VN0tEIPQhvLr7oxvCgEDjfvusxWoUX4AgSv2bMaxIpgs4GUxS
hOH4fg/6A0HuhP/HjtYfd89DBdhhKOPU6oRx6YLF4ctlfZxAcALsvVjNAGaJzZX8
Bpv2K/xWOwb/UghjJvxv1wYZ+AL0BHAFVZilBFpyX+ClRhmU9d9SVYs43CbwSu+Y
41qIAYLJXrls6CSWJMTEo/+mhbPMQBS6Q3wSewOaNZXkx4comjJHpx/k2HsiVk7K
ObN9eIXNzyby132pIIHc9ZRSpJRTr0/jjb9FetneZlLHaNabtIf67JA0j4fIoCEh
Qb73uR030mCNvQ+xvK8DxF9UE1zQXnXiJWoIH9NrGtFtknQOmFlfFfmo/gMDI8w6
aYkxuHdqn2oFjKDuZOQ4zYa8ptcZAAml64gj2YiNiwgNosEpt9UMJ5WRknrGJ7po
jHohzKyKkSykjOxgOL1Noh5d7AoWPtJ1Qbg+Leyg1WLnRGHTgAYYmv1LabOdMhbM
SIMhNRBFQunCBKTyes/RB2Nykl1OXipnxIaRgSrRhsEiTOVutmyy7jr6BbZSB2vY
XBpadk38SGSsZva+Sfk0
=v2G5
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight mostly minor fixes for recently discovered issues in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ILLEGAL REQUEST + ASC==27 => target failure
scsi: aacraid: Add a small delay after IOP reset
scsi: scsi_transport_fc: Also check for NOTPRESENT in fc_remote_port_add()
scsi: scsi_transport_fc: set scsi_target_id upon rescan
scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
scsi: aacraid: error: testing array offset 'bus' after use
scsi: lpfc: Don't return internal MBXERR_ERROR code from probe function
scsi: aacraid: Fix 2T+ drives on SmartIOC-2000
Use *_pool_zalloc rather than *_pool_alloc followed by memset with 0.
Found by coccinelle spatch "api/alloc/pool_zalloc-simple.cocci"
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc driver uses the FC-specific error when it needed to return an
error to the FC-NVME transport. Convert to use a generic value instead.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Internal error codes happen to be positive, thus the PCI driver core
won't treat them as failure, but we do. This would cause a crash later
on as lpfc_pci_remove_one() is called (e.g. as shutdown function).
Fixes: 6d368e5321 ("[SCSI] lpfc 8.3.24: Add resource extent support")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pointer eqe is always non-null inside the while loop, so the check
to see if eqe is NULL is redudant and hence can be removed.
Detected by CoverityScan CID#1248693 ("Logically Dead Code")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, zfcp and a host of minor updates.
The major driver change here is the elimination of the block based
cciss driver in favour of the SCSI based hpsa driver (which now drives
all the legacy cases cciss used to be required for). Plus a reset
handler clean up and the redo of the SAS SMP handler to use bsg lib.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZscDNAAoJEAVr7HOZEZN4DWIQAK/UkkrvKpV/jLATM/yi7CoL
QidY86Hmwwl7A9HQ+2fjLfAsye0xcCzRwkucKK90IP5b4pefHhiJJfiMKAAe3TUW
xstnY5z5jaOhDG4nyJFoSm5fH5qXkMnJ8NZRK8f6Qg5yBN5dStEKqoBboNsz4KBI
md7idw0mbp5i2GXlJwSpc5eDS97GiPL6WkwgGaGKfXF1NDau0GbEdjijfz55haCD
pMhY7WJh/71RfOq/1ThXT1Z3khOlVcKXrkdO+602n7zh/klRBRtBC8m2a6xCfZPj
n7Pb/s0jhCQPd+e/Xtv7WEbY8uNOCrGoVgZ6U5EGrT5IeTfep24ackYqerjMhE63
esi4BJY8lUP9SGleLMgjYWyCHdmxBJRa7UI614DWN/H0QoGP6j/2EzGoi5Fw04vC
H8/+aqPPWZc9KUBioRYo8xWO8YgMqL2eyXY+Tc9cwxqAe2T6k/NC1zJVgDFKXfzb
QoWW4v9NNmYwf5vL/7tNgkeTMFQV66yUR7dR3SGTSk8UIrJ40ok0JyUAsDg86ZAH
BfMkWwhWQ6Byoel0Y7Ti88T49Cox/64r/I0ux06Qgg99+KpRLT7z20+GLIEHgXxg
116C39rgvYKqzc7W8RCyj8qSROuMVzg6QFbB6n+1PEsYIX2O8A2Re3jdS34q2LbX
aBDm/Lfdl4kkJrV9xY6P
=nQUG
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, zfcp and a host of minor updates.
The major driver change here is the elimination of the block based
cciss driver in favour of the SCSI based hpsa driver (which now drives
all the legacy cases cciss used to be required for). Plus a reset
handler clean up and the redo of the SAS SMP handler to use bsg lib"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: scsi-mq: Always unprepare before requeuing a request
scsi: Show .retries and .jiffies_at_alloc in debugfs
scsi: Improve requeuing behavior
scsi: Call scsi_initialize_rq() for filesystem requests
scsi: qla2xxx: Reset the logo flag, after target re-login.
scsi: qla2xxx: Fix slow mem alloc behind lock
scsi: qla2xxx: Clear fc4f_nvme flag
scsi: qla2xxx: add missing includes for qla_isr
scsi: qla2xxx: Fix an integer overflow in sysfs code
scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2()
scsi: aacraid: get rid of one level of indentation
scsi: aacraid: fix indentation errors
scsi: storvsc: fix memory leak on ring buffer busy
scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough
scsi: smartpqi: remove the smp_handler stub
scsi: hpsa: remove the smp_handler stub
scsi: bsg-lib: pass the release callback through bsg_setup_queue
scsi: Rework handling of scsi_device.vpd_pg8[03]
scsi: Rework the code for caching Vital Product Data (VPD)
scsi: rcu: Introduce rcu_swap_protected()
...
This is an interesting regression with gcc-8, showing a harmless warning
for correct code:
In file included from include/linux/kernel.h:13:0,
...
from drivers/scsi/lpfc/lpfc_debugfs.c:23:
include/linux/printk.h:301:2: error: 'eq' may be used uninitialized in this function [-Werror=maybe-uninitialized]
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~
In file included from drivers/scsi/lpfc/lpfc_debugfs.c:58:0:
drivers/scsi/lpfc/lpfc_debugfs.h:451:31: note: 'eq' was declared here
I managed to reduce the warning into a small test case for gcc-8 that I
reported in the gcc bugzilla[1].
As a workaround, this changes the logic to move the two assignments of
'eq' out of the conditions and instead make the index conditional. This
works for all configurations I tried and avoids adding a bogus
initialization.
Acked-by: James Smart <james.smart@broadcom.com>
Link: [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81958
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The only reference to lpfc_nvmet_replenish_context() is inside of an
disabled:
drivers/scsi/lpfc/lpfc_nvmet.c:1457:1: error: 'lpfc_nvmet_replenish_context' defined but not used [-Werror=unused-function]
This replaces the preprocessor conditional with a C condition, so the
compiler can see that the function is intentionally unused.
Fixes: 9a38e4f1c82f ("scsi: lpfc: Fix MRQ > 1 context list handling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update driver version to 11.4.0.3
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
cc1: warnings being treated as errors
drivers/scsi/lpfc/lpfc_init.c: In function 'lpfc_get_wwpn':
drivers/scsi/lpfc/lpfc_init.c:3253: error: integer constant is too large for 'long' type
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add Buffer to buffer credit recovery support to the driver. This is a
negotiated feature with the peer that allows for both sides to detect
dropped RRDY's and FC Frames and recover credit.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change hw queue binding messages to info - not error.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>