There is a chance of the driver to be stuck in kdump if drives start
acting up in kdump discovery process and the kernel decides to send eh
resets, which would prompt rescan to be scheduled.
Do not perform a rescan in kdump context, since we do not expect a hotplug
event during kdump and all the devices are going to go away anyway.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add back the ability to scan for hotplug changes while eh was in progress.
Schedule a rescan for a later time in the eh recovery code and wait for
eh to complete in the rescan worker.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver fails to retrieve information from the fw (could happen when
the fw is not fully in its senses), the driver does nothing and change is
not processed correctly by the driver
Schedule host rescan in case of failure. This is only for SAFW, since
the information retrieval failure will happen on SAFW devices.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the fw exposed devices. The drivers resorts to queue
command checks to block out commands to _hidden_ devices.
Use the hotplug handler code to add new devices during driver init and
other areas, this is only for safw. For ARC scsi_scan_host will still
apply.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently driver will attempt to process hotplug events concurrently based
on the FW interrupt.
Protect safw update function with a scan mutex.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The device hotplug events are processed only after retrieving the updated
lun information from the fw. Does not make sense to keep them separate.
Merge both the hotplug handling and safw adapter setup code into single
function.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Resolve luns checks the if a sdev is already present in the os to figure
out if it needs to be removed. Internally the driver exposes HBA on bus
2 even though its bus 1 in the fw. Its mildly confusing.
Refactor out the sdev lookup into its function to check if sdev has been
added to the kernel or not. Add helper functions to add, remove and put
devices based on their fw bus and target number.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added macros to loop through the MAX SUPPORTED Buses and Targets. This
will make the code a bit easier to read.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The hotplug handler code is duplicated for hba handling and container
handling.
Merged function to handle hba and container hot plug events into the
resolve luns functions. Added a bunch of helper functions to check the
validity of a given target and to check if bus, target is container
device.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge aac_get_containers to setup target function, so that information
about all the present devices can be retrieved in one shot.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add helper function to set queue depth from information retrieved from
the bmic phy structure.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Save the bmic information for each phy, so that it can processed in
target setup function.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Created inline function to retrieve lun info for each device from the
phy luns structure.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move the function to get phy luns information to the top of function
to set target information
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove function call to process targets from the report phy luns function
and make it a function in its own right. This will help understand the
flow of the code.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add helper function to setup targets devices and create the base for the
upcoming patches
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename variables and functions to make bmic identify, report phy luns
to make them consistent across code internal existing code bases
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Edit function that retrieves phy lun information to use common
bmic function
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
safw command submission is duplicated across many functions.
Move the safw submission code from bmic identify into its own function
for common use
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ideally driver needs to wait for IO to be submitted or responded to before
shutdown.
Move code to wait for IO completion into shutdown path
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Refactored the reset_host store function to make consistent across code
bases
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is possible to restart the controller via the use of the reset_host
sysfs variable. This does work for controllers that can no longer respond,
since driver will attempt to send down a shutdown in this path.
Check if the controller is able to receive commands before sending down
a shutdown
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver would hang when attempting to send reset from the ioctl interface,
since it would wait to retrieve the ioctl mutex at send shutdown.
Set adapter shutdown and unlock mutex before sending down reset request.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As part of the recovery process, the drivers removes offline devices (
done by the kernel) and then tries to add them back in the rescan code.
Removing the device is like taking a sledgehammer to a nail.
Set the device as running if it is marked offline.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver attempts to perform a device scan and device add after coming out
of reset. At times when the kdump kernel loads and it tries to perform
eh recovery, the device scan hangs since its commands are blocked because
of the eh recovery. This should have shown up in normal eh recovery path
(Should have been obvious)
Remove the code that performs scanning.I can live without the rescanning
support in the stable kernels but a hanging kdump/eh recovery needs to be
fixed.
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Cc: <stable@vger.kernel.org>
Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Check if the adapter can receive abort requests, before sending aborts
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When udev requests for a devices inquiry string, it might create multiple
threads causing a race condition on the shared inquiry resource string.
Created a buffer with the string for each thread.
Cc: <stable@vger.kernel.org>
Fixes: 3bc8070fb7 ([SCSI] aacraid: SMC vendor identification)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable managed_request_id is being assigned but it is never read,
hence it is redundant and can be removed. Cleans up clang warning:
drivers/scsi/aacraid/linit.c:706:5: warning: Value stored to
'managed_request_id' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes).
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaCxtCAAoJEAVr7HOZEZN4d9EQAI+OHP6ss6zjKKC21c9jNPcH
NhLrNv37gHg/LA2VXeUEL9RGUjCGLIUrI4HsrxzkFAMLKP4TkshMs8/2RvczY+Sa
VpayPqVybEKLIS6ipQyM1SLIQff2nvtDVcN/T+8z1lkk45TrbA6ZGuwUwd2aJyEA
2V2wtg51ObnL0Nr9QPPll0JrtL1AnCZyRlu9XrwTZuuSBZwk93opIuuvbZm/3dVg
Ir4GSS4Y+PuHIfu4cxqdsPMdzRdY9I2me1YiE4jeFSn1/VTAjL4HBz7fO9eITT42
VhXSpDz1XvFsa9dJ0ubkqoALpJzCfOcBw+EuGvSydLEvOBoEVwMccdfaD9lT1zc5
L9e1Z5qqJoq7hTA6xTXCYfWG73I9HYvljtmc8yudKHhADOdnSTUXhaO6uBF0RNqD
OxPSA1RZwRx3c6lDOcK6BTtvLAkTEuYKdrWSKJi0w+QXJAyQ6etqbmsKpmPdRim7
Z4ZSpJFro2gyo9gcdJO0ykTG+z3U7Z/ay1sNgnuprsv+eU/QjUdlAPl18o79EkRf
H54zZggZ4wC6q/cFVVt4Vx+V+oqIeu38s7NDXS9UltLoTZPm2EzDW6pXd/38Z4Tf
a1oBAUET8kYLC90P8sVZxUIHZjITlpgDbyE2Lq00PMYXhk8S4IxF0aMN5RvVqzUv
+7N2HrHkSSgG1nhw1t+E
=3O85
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, pm80xx, mpt3sas, be2iscsi, hpsa. and a host of minor
updates.
There's no major behaviour change or additions to the core in all of
this, so the potential for regressions should be small (biggest
potential being in the scsi error handler changes)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: lpfc: Fix hard lock up NMI in els timeout handling.
scsi: mpt3sas: remove a stray KERN_INFO
scsi: mpt3sas: cleanup _scsih_pcie_enumeration_event()
scsi: aacraid: use timespec64 instead of timeval
scsi: scsi_transport_fc: add 64GBIT and 128GBIT port speed definitions
scsi: qla2xxx: Suppress a kernel complaint in qla_init_base_qpair()
scsi: mpt3sas: fix dma_addr_t casts
scsi: be2iscsi: Use kasprintf
scsi: storvsc: Avoid excessive host scan on controller change
scsi: lpfc: fix kzalloc-simple.cocci warnings
scsi: mpt3sas: Update mpt3sas driver version.
scsi: mpt3sas: Fix sparse warnings
scsi: mpt3sas: Fix nvme drives checking for tlr.
scsi: mpt3sas: NVMe drive support for BTDHMAPPING ioctl command and log info
scsi: mpt3sas: Add-Task-management-debug-info-for-NVMe-drives.
scsi: mpt3sas: scan and add nvme device after controller reset
scsi: mpt3sas: Set NVMe device queue depth as 128
scsi: mpt3sas: Handle NVMe PCIe device related events generated from firmware.
scsi: mpt3sas: API's to remove nvme drive from sml
scsi: mpt3sas: API 's to support NVMe drive addition to SML
...
aacraid passes the current time to the firmware in one of two ways,
either as year/month/day/... or as 32-bit unsigned seconds.
The first one is broken on 32-bit architectures as it cannot go past
year 2038. Using timespec64 here makes it behave properly on both 32-bit
and 64-bit architectures, and avoids relying on signed integer overflow
to pass times into the second interface.
The interface used in aac_send_hosttime() however is still problematic
in year 2106 when 32-bit seconds overflow. Hopefully we don't have to
worry about aacraid by that time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is a fix to an issue where the driver sends its periodic WELLNESS
command to the controller after the driver shut it down.This causes the
controller to crash. The window where this can happen is small, but it
can be hit at around 4 hours of constant resets.
Cc: <stable@vger.kernel.org>
Fixes: fbd185986e (aacraid: Fix AIF triggered IOP_RESET)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 0e9973ed33 ("scsi: aacraid: Add periodic checks to see IOP reset
status") changed the way driver checks if a reset succeeded. Now, after an
IOP reset, aacraid immediately start polling a register to verify the reset
is complete.
This behavior cause regressions on the reset path in PowerPC (at least).
Since the delay after the IOP reset was removed by the aforementioned patch,
the fact driver just starts to read a register instantly after the reset
was issued (by writing in another register) "corrupts" the reset procedure,
which ends up failing all the time.
The issue highly impacted kdump on PowerPC, since on kdump path we
proactively issue a reset in adapter (through the reset_devices kernel
parameter).
This patch (re-)adds a delay right after IOP reset is issued. Empirically
we measured that 3 seconds is enough, but for safety reasons we delay
for 5s (and since it was 30s before, 5s is still a small amount).
For reference, without this patch we observe the following messages
on kdump kernel boot process:
[ 76.294] aacraid 0003:01:00.0: IOP reset failed
[ 76.294] aacraid 0003:01:00.0: ARC Reset attempt failed
[ 86.524] aacraid 0003:01:00.0: adapter kernel panic'd ff.
[ 86.524] aacraid 0003:01:00.0: Controller reset type is 3
[ 86.524] aacraid 0003:01:00.0: Issuing IOP reset
[146.534] aacraid 0003:01:00.0: IOP reset failed
[146.534] aacraid 0003:01:00.0: ARC Reset attempt failed
Fixes: 0e9973ed33 ("scsi: aacraid: Add periodic checks to see IOP reset status")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix possible indexing array of bound for &aac->hba_map[bus][cid], where
bus and cid boundary check happens later.
Fixes: 0d643ff3c3 ("scsi: aacraid: use aac_tmf_callback for reset fib")
Signed-off-by: Nikola Pajkovsky <npajkovsky@suse.cz>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The logic for supporting large drives was previously tied to 4Kn support
for SmartIOC-2000. As SmartIOC-2000 does not support volumes using 4Kn
drives, use the intended option flag AAC_OPT_NEW_COMM_64 to determine
support for volumes greater than 2T.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_convert_sgraw2() kmalloc memory and return -1 on error, which should
be -ENOMEM. However, nobody is checking return value, so with this
change, -ENOMEM is propagated to upper layer.
Signed-off-by: Nikola Pajkovsky <npajkovsky@suse.cz>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
unsigned long byte_count = 0;
nseg = scsi_dma_map(scsicmd);
if (nseg < 0)
return nseg;
if (nseg) {
...
}
return byte_count;
is equal to
unsigned long byte_count = 0;
nseg = scsi_dma_map(scsicmd);
if (nseg <= 0)
return nseg;
...
return byte_count;
No other code has changed.
[mkp: fix checkpatch complaints]
Signed-off-by: Nikola Pajkovsky <npajkovsky@suse.cz>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fixes a potential race condition observed on Power systems.
Several places throughout the aacraid driver call aac_fib_send or
similar to send a command to the aacraid adapter, then check the return
code to determine if the command was actually sent to the adapter, then
update the phase field in the scsi command scratch pad area to track
that the firmware now owns this command. However, there is nothing that
ensures that by the time the aac_fib_send function returns and we go to
write to the scsi command, that the command hasn't already completed and
the scsi command has been freed. This was causing random crashes in the
TCP stack which was tracked down to be caused by memory that had been a
struct request + scsi_cmnd being now used for an skbuff. Memory
poisoning was enabled in the kernel to debug this which showed that the
last owner of the memory that had been freed was aacraid and that it was
a struct request. The memory that was corrupted was the exact data
pattern of AAC_OWNER_FIRMWARE and it was at the same offset that aacraid
writes, which is scsicmd->SCp.phase. The patch below resolves this
issue.
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We terminate the aac_get_name_resp on a byte that is outside the bounds
of the structure. Extend the return response by one byte to remove the
out of bounds reference.
Fixes: b836439faf ("aacraid: 4KB sector support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Carroll <david.carroll@microsemi.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When issuing a bus reset we should complete all commands, not
just the command triggering the reset.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To correctly identify which fib has a scsi command callback this
patch implements a flag FIB_CONTEXT_FLAG_SCSI_CMD.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_hba_send() will return FAILED for any non-SCSI command requests,
failing any TMFs. This patch updates the check to allow TMFs.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When sending a reset fib we shouldn't rely on the scsi command,
but rather set the TMF status in the map_info->reset_state variable.
That allows us to send a TMF independent on a scsi command.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Split off device, target, and bus reset functionality into
individual functions.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Split off the host reset parts of aac_eh_reset() into a separate
host reset function.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Split off reset FIB generation into separate functions.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
"qd.id" comes directly from the copy_from_user() on the line before so
we should verify that it's within bounds.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Both aac_send_raw_srb() and aac_get_hba_info() may copy stack allocated
structs to userspace without initializing all members of these
structs. Clear out this memory to prevent information leaks.
Fixes: 423400e64d ("scsi: aacraid: Include HBA direct interface")
Fixes: c799d519bf ("scsi: aacraid: Retrieve HBA host information ioctl")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>