Commit Graph

482 Commits

Author SHA1 Message Date
Dave Carroll
1c6b41fb92 scsi: aacraid: Insure command thread is not recursively stopped
If a recursive IOP_RESET is invoked, usually due to the eh_thread
handling errors after the first reset, be sure we flag that the command
thread has been stopped to avoid an Oops of the form;

 [ 336.620256] CPU: 28 PID: 1193 Comm: scsi_eh_0 Kdump: loaded Not tainted 4.14.0-49.el7a.ppc64le #1
 [ 336.620297] task: c000003fd630b800 task.stack: c000003fd61a4000
 [ 336.620326] NIP: c000000000176794 LR: c00000000013038c CTR: c00000000024bc10
 [ 336.620361] REGS: c000003fd61a7720 TRAP: 0300 Not tainted (4.14.0-49.el7a.ppc64le)
 [ 336.620395] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22084022 XER: 20040000
 [ 336.620435] CFAR: c000000000130388 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1
 [ 336.620435] GPR00: c00000000013038c c000003fd61a79a0 c0000000014c7e00 0000000000000000
 [ 336.620435] GPR04: 000000000000000c 000000000000000c 9000000000009033 0000000000000477
 [ 336.620435] GPR08: 0000000000000477 0000000000000000 0000000000000000 c008000010f7d940
 [ 336.620435] GPR12: c00000000024bc10 c000000007a33400 c0000000001708a8 c000003fe3b881d8
 [ 336.620435] GPR16: c000003fe3b88060 c000003fd61a7d10 fffffffffffff000 000000000000001e
 [ 336.620435] GPR20: 0000000000000001 c000000000ebf1a0 0000000000000001 c000003fe3b88000
 [ 336.620435] GPR24: 0000000000000003 0000000000000002 c000003fe3b88840 c000003fe3b887e8
 [ 336.620435] GPR28: c000003fe3b88000 c000003fc8181788 0000000000000000 c000003fc8181700
 [ 336.620750] NIP [c000000000176794] exit_creds+0x34/0x160
 [ 336.620775] LR [c00000000013038c] __put_task_struct+0x8c/0x1f0
 [ 336.620804] Call Trace:
 [ 336.620817] [c000003fd61a79a0] [c000003fe3b88000] 0xc000003fe3b88000 (unreliable)
 [ 336.620853] [c000003fd61a79d0] [c00000000013038c] __put_task_struct+0x8c/0x1f0
 [ 336.620889] [c000003fd61a7a00] [c000000000171418] kthread_stop+0x1e8/0x1f0
 [ 336.620922] [c000003fd61a7a40] [c008000010f7448c] aac_reset_adapter+0x14c/0x8d0 [aacraid]
 [ 336.620959] [c000003fd61a7b00] [c008000010f60174] aac_eh_host_reset+0x84/0x100 [aacraid]
 [ 336.621010] [c000003fd61a7b30] [c000000000864f24] scsi_try_host_reset+0x74/0x180
 [ 336.621046] [c000003fd61a7bb0] [c000000000867ac0] scsi_eh_ready_devs+0xc00/0x14d0
 [ 336.625165] [c000003fd61a7ca0] [c0000000008699e0] scsi_error_handler+0x550/0x730
 [ 336.632101] [c000003fd61a7dc0] [c000000000170a08] kthread+0x168/0x1b0
 [ 336.639031] [c000003fd61a7e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4
 [ 336.645971] Instruction dump:
 [ 336.648743] 384216a0 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c7f1b78 60000000 60000000
 [ 336.657056] 39400000 e87f0838 f95f0838 7c0004ac <7d401828> 314affff 7d40192d 40c2fff4
 [ 336.663997] -[ end trace 4640cf8d4945ad95 ]-

So flag when the thread is stopped by setting the thread pointer to NULL.

Signed-off-by: Dave Carroll <david.carroll@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-09 21:08:30 -04:00
Linus Torvalds
052c220da3 SCSI for-linus on 20180404
This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc,
 ufs, mpt3sas, hisi_sas.  In addition we have removed several really
 old drivers: sym53c416, NCR53c406a, fdomain, fdomain_cs and removed
 the old scsi_module.c initialization from all remaining drivers.  Plus
 an assortment of bug fixes, initialization errors and other minor
 fixes.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWsVSnSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbvbAP9ErpTZ
 OR5iJ5HIz4W3Bd8aTfEpJrDyeYwSUC+sra5SKQD/ZWyVB3fYFSg+ZROyT26pmtmd
 SdImhG7hLaHgVvF5qRQ=
 =SQ/n
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc,
  ufs, mpt3sas, hisi_sas.

  In addition we have removed several really old drivers: sym53c416,
  NCR53c406a, fdomain, fdomain_cs and removed the old scsi_module.c
  initialization from all remaining drivers.

  Plus an assortment of bug fixes, initialization errors and other minor
  fixes"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (168 commits)
  scsi: ufs: Add support for Auto-Hibernate Idle Timer
  scsi: ufs: sysfs: reworking of the rpm_lvl and spm_lvl entries
  scsi: qla2xxx: fx00 copypaste typo
  scsi: qla2xxx: fix error message on <qla2400
  scsi: smartpqi: update driver version
  scsi: smartpqi: workaround fw bug for oq deletion
  scsi: arcmsr: Change driver version to v1.40.00.05-20180309
  scsi: arcmsr: Sleep to avoid CPU stuck too long for waiting adapter ready
  scsi: arcmsr: Handle adapter removed due to thunderbolt cable disconnection.
  scsi: arcmsr: Rename ACB_F_BUS_HANG_ON to ACB_F_ADAPTER_REMOVED for adapter hot-plug
  scsi: qla2xxx: Update driver version to 10.00.00.06-k
  scsi: qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan
  scsi: qla2xxx: Cleanup code to improve FC-NVMe error handling
  scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset
  scsi: qla2xxx: Fix retry for PRLI RJT with reason of BUSY
  scsi: qla2xxx: Remove nvme_done_list
  scsi: qla2xxx: Return busy if rport going away
  scsi: qla2xxx: Fix n2n_ae flag to prevent dev_loss on PDB change
  scsi: qla2xxx: Add FC-NVMe abort processing
  scsi: qla2xxx: Add changes for devloss timeout in driver
  ...
2018-04-05 15:05:53 -07:00
James Bottomley
2e1f44f6ad Merge branch 'fixes' into misc
Somewhat nasty merge due to conflicts between "33b28357dd00 scsi:
qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan" and "2b5b96473efc
scsi: qla2xxx: Fix FC-NVMe LUN discovery"

Merge is non-trivial and has been verified by Qlogic (Cavium)

Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
2018-04-03 17:38:39 -07:00
Masanari Iida
bc8282a730 treewide: Fix typos in printk
This patch fixes spelling typos found in printk.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-27 09:51:22 +02:00
Linus Torvalds
170e07bf6b SCSI fixes on 20180222
These are mostly fixes for problems with merge window code.  In
 addition we have one doc update (alua) and two dead code removals
 (aiclib and octogon) a spurious assignment removal (csiostor) and a
 performance improvement for storvsc involving better interrupt
 spreading and increasing the command per lun handling.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWo+H2yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishe2eAQDyWfoK
 Mfjbrl6cdPop+JIoED0VtBzAQyeXceJt8GYDQwEApXTIZon2HTdJqGawfUhaapBA
 JnO6iOiC13/nZjl7C28=
 =K3Pk
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "These are mostly fixes for problems with merge window code.

  In addition we have one doc update (alua) and two dead code removals
  (aiclib and octogon) a spurious assignment removal (csiostor) and a
  performance improvement for storvsc involving better interrupt
  spreading and increasing the command per lun handling"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qla4xxx: skip error recovery in case of register disconnect.
  scsi: aacraid: fix shutdown crash when init fails
  scsi: qedi: Cleanup local str variable
  scsi: qedi: Fix truncation of CHAP name and secret
  scsi: qla2xxx: Fix incorrect handle for abort IOCB
  scsi: qla2xxx: Fix double free bug after firmware timeout
  scsi: storvsc: Increase cmd_per_lun for higher speed devices
  scsi: qla2xxx: Fix a locking imbalance in qlt_24xx_handle_els()
  scsi: scsi_dh: Document alua_rtpg_queue() arguments
  scsi: Remove Makefile entry for oktagon files
  scsi: aic7xxx: remove aiclib.c
  scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
  scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
  scsi: sym53c8xx_2: iterator underflow in sym_getsync()
  scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
  scsi: csiostor: remove redundant assignment to pointer 'ln'
  scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
  scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info
  scsi: qla2xxx: Fix memory corruption during hba reset test
  scsi: mpt3sas: fix an out of bound write
2018-02-23 14:09:43 -08:00
Raghava Aditya Renukunta
eee549e1e3 scsi: aacraid: Auto detect INTx or MSIx mode during sync cmd processing
During sync command processing, if legacy INTx status indicates command
is not completed, sample the MSIx register and check if it indicates
command completion, set controller MSIx enabled flag.

Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-13 21:37:04 -05:00
Raghava Aditya Renukunta
a5799d74d9 scsi: aacraid: Preserve MSIX mode in the OMR register
Preserve the current MSIX mode value in the OMR before rewriting the OMR
to initiate the IOP or Soft Reset.

Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-13 21:37:03 -05:00
Raghava Aditya Renukunta
44f1ce7d2f scsi: aacraid: Implement DropIO sync command
IOP_RESET takes a long time to complete. If controller is in a state
where we can bring it back with init struct, send a DropIO sync command
instead.

 - If controller is faulted perform standard IOP_RESET in aac_srcv_init.

 - If controller is not faulted get adapter properties and extended
   properties.

 - Update the sa_firmware variable and determine if DropIO request is
   supported.

 - Issue DropIO request, and get the number of outstanding commands.

 - If all commands are complete with success (CT_OK), consider IOP_RESET
   is complete.

 - If any commands timeout, Perform the IOP_RESET.

Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-13 21:37:03 -05:00
Meelis Roos
00c20cdc79 scsi: aacraid: fix shutdown crash when init fails
When aacraid init fails with "AAC0: adapter self-test failed.", shutdown
leads to UBSAN warning and then oops:

[154316.118423] ================================================================================
[154316.118508] UBSAN: Undefined behaviour in drivers/scsi/scsi_lib.c:2328:27
[154316.118566] member access within null pointer of type 'struct Scsi_Host'
[154316.118631] CPU: 2 PID: 14530 Comm: reboot Tainted: G        W        4.15.0-dirty #89
[154316.118701] Hardware name: Hewlett Packard HP NetServer/HP System Board, BIOS 4.06.46 PW 06/25/2003
[154316.118774] Call Trace:
[154316.118848]  dump_stack+0x48/0x65
[154316.118916]  ubsan_epilogue+0xe/0x40
[154316.118976]  __ubsan_handle_type_mismatch+0xfb/0x180
[154316.119043]  scsi_block_requests+0x20/0x30
[154316.119135]  aac_shutdown+0x18/0x40 [aacraid]
[154316.119196]  pci_device_shutdown+0x33/0x50
[154316.119269]  device_shutdown+0x18a/0x390
[...]
[154316.123435] BUG: unable to handle kernel NULL pointer dereference at 000000f4
[154316.123515] IP: scsi_block_requests+0xa/0x30

This is because aac_shutdown() does

        struct Scsi_Host *shost = pci_get_drvdata(dev);
        scsi_block_requests(shost);

and that assumes shost has been assigned with pci_set_drvdata().

However, pci_set_drvdata(pdev, shost) is done in aac_probe_one() far
after bailing out with error from calling the init function
((*aac_drivers[index].init)(aac)), and when the init function fails, no
error is returned from aac_probe_one() so PCI layer assumes there is
driver attached, and tries to shut it down later.

Fix it by returning error from aac_probe_one() when card-specific init
function fails.

This fixes reboot on my HP NetRAID-4M with dead battery.

Signed-off-by: Meelis Roos <mroos@linux.ee>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-02-13 21:35:40 -05:00
Linus Torvalds
28bc6fb959 SCSI misc on 20180131
This is mostly updates of the usual driver suspects: arcmsr,
 scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
 hisi_sas.  We also have a rework of the libsas hotplug handling to
 make it more robust, a slew of 32 bit time conversions and fixes, and
 a host of the usual minor updates and style changes.  The biggest
 potential for regressions is the libsas hotplug changes, but so far
 they seem stable under testing.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp
 MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB
 WvyBieQs9qRit+2czd4=
 =gJMf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual driver suspects: arcmsr,
  scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
  hisi_sas.

  We also have a rework of the libsas hotplug handling to make it more
  robust, a slew of 32 bit time conversions and fixes, and a host of the
  usual minor updates and style changes. The biggest potential for
  regressions is the libsas hotplug changes, but so far they seem stable
  under testing"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits)
  scsi: qla2xxx: Fix logo flag for qlt_free_session_done()
  scsi: arcmsr: avoid do_gettimeofday
  scsi: core: Add VENDOR_SPECIFIC sense code definitions
  scsi: qedi: Drop cqe response during connection recovery
  scsi: fas216: fix sense buffer initialization
  scsi: ibmvfc: Remove unneeded semicolons
  scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
  scsi: hisi_sas: directly attached disk LED feature for v2 hw
  scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw
  scsi: megaraid_sas: NVMe passthrough command support
  scsi: megaraid: use ktime_get_real for firmware time
  scsi: fnic: use 64-bit timestamps
  scsi: qedf: Fix error return code in __qedf_probe()
  scsi: devinfo: fix format of the device list
  scsi: qla2xxx: Update driver version to 10.00.00.05-k
  scsi: qla2xxx: Add XCB counters to debugfs
  scsi: qla2xxx: Fix queue ID for async abort with Multiqueue
  scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event()
  scsi: qla2xxx: Fix warning during port_name debug print
  scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
  ...
2018-01-31 11:23:28 -08:00
Raghava Aditya Renukunta
cfc350ab0e scsi: aacraid: Delay for rescan worker needs to be 10 seconds
The delay for the rescan worker needs to 10 seconds, missed the HZ in
there.

Fixes: a1367e4ade (scsi: aacraid: Reschedule host scan in case of failure)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-10 23:25:12 -05:00
Raghava Aditya Renukunta
bbd16d96d1 scsi: aacraid: Get correct lun count
The correct lun count needs to be divided by 24, missed it in the
previous patch set.

Fixes: 4b00022753 (scsi: aacraid: Create helper functions to get lun info)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-10 23:25:11 -05:00
Colin Ian King
9181474464 scsi: aacraid: remove redundant setting of variable c
A previous commit no longer stores the contents of c, so we now have a
situation where c is being updated but the value is never read. Clean up
the code by removing the now redundant setting of variable c.

Cleans up clang warning:
drivers/scsi/aacraid/aachba.c:943:3: warning: Value stored to 'c' is
never read

Fixes: f4e8708d31 ("scsi: aacraid: Fix udev inquiry race condition")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-08 21:49:09 -05:00
Meelis Roos
bef4e68830 scsi: aacraid: Fix driver oops with dead battery
The battery in my HP NetRAID-4M died of old age, and the aacraid driver
started oopsing with NULL pointer dereference on startup after that.

Fix it by reordering the init sequence to fill in function pointers
before ioremapping memory, or dev->a_ops.adapter_ioremap pointer will be
NULL.

Other subtypes of aacraid seem to have the order already correct.

This was the call trace:

 ? aac_probe_one+0x7a5/0xb30 [aacraid]
 pci_device_probe+0xc0/0x1a0
 driver_probe_device+0x1df/0x3b0
 __driver_attach+0xa9/0xe0
 ? driver_probe_device+0x3b0/0x3b0
 bus_for_each_dev+0x4c/0x90
 driver_attach+0x1d/0x40
 ? driver_probe_device+0x3b0/0x3b0
 bus_add_driver+0x1a7/0x2a0
 driver_register+0x6e/0x130
 __pci_register_driver+0x54/0x90
 ? 0xf81f4000
 aac_init+0x2b/0x1000 [aacraid]
 do_one_initcall+0x45/0x1e0
 ? kfree_skbmem+0x74/0xa0
 ? kfree+0x16d/0x240
 ? kvfree+0x45/0x50
 ? kvfree+0x45/0x50
 ? __vunmap+0x99/0x120
 ? do_init_module+0x1a/0x245
 do_init_module+0x83/0x245
 load_module+0x2764/0x34a0
 ? kernel_read_file+0x150/0x320
 SyS_finit_module+0x82/0xa0
 do_fast_syscall_32+0xba/0x340

Signed-off-by: Meelis Roos <mroos@linux.ee>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-04 01:03:41 -05:00
Raghava Aditya Renukunta
1cdb74b80f scsi: aacraid: Update driver version to 50877
Update driver Version to 50877

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
e51c4d703d scsi: aacraid: Remove AAC_HIDE_DISK check in queue command
Earlier driver would scan throgh all supported buses and targets and add
devices that responded. It would add devices that were _hidden_ by the fw.
Driver would invalidate commands sent to _hidden_ devices via the
AAC_HIDE_DISK check.

Since the driver now adds only the devices that are supposed to be
exposed, this code can be removed.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
75be67cd15 scsi: aacraid: Remove unused rescan variable
Remove unused rescan variable.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
fe5237590b scsi: aacraid: Skip schedule rescan in case of kdump
There is a chance of the driver to be stuck in kdump if drives start
acting up in kdump discovery process and the kernel decides to send eh
resets, which would prompt rescan to be scheduled.

Do not perform a rescan in kdump context, since we do not expect a hotplug
event during kdump and all the devices are going to go away anyway.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
8a30e50b72 scsi: aacraid: Fix hang while scanning in eh recovery
Add back the ability to scan for hotplug changes while eh was in progress.

Schedule a rescan for a later time in the eh recovery code and wait for
eh to complete in the rescan worker.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
a1367e4ade scsi: aacraid: Reschedule host scan in case of failure
If the driver fails to retrieve information from the fw (could happen when
the fw is not fully in its senses), the driver does nothing and change is
not processed correctly by the driver

Schedule host rescan in case of failure. This is only for SAFW, since
the information retrieval failure will happen on SAFW devices.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
8ebaa67fc2 scsi: aacraid: Use hotplug handling function in place of scsi_scan_host
Driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the fw exposed devices. The drivers resorts to queue
command checks to block out commands to _hidden_ devices.

Use the hotplug handler code to add new devices during driver init and
other areas, this is only for safw. For ARC scsi_scan_host will still
apply.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
3395614e48 scsi: aacraid: Block concurrent hotplug event handling
Currently driver will attempt to process hotplug events concurrently based
on the FW interrupt.

Protect safw update function with a scan mutex.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
6f44a22b2c scsi: aacraid: Merge adapter setup with resolve luns
The device hotplug events are processed only after retrieving the updated
lun information from the fw. Does not make sense to keep them separate.

Merge both the hotplug handling and safw adapter setup code into single
function.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
3031c6565f scsi: aacraid: Refactor resolve luns code and scsi functions
Resolve luns checks the if a sdev is already present in the os to figure
out if it needs to be removed. Internally the driver exposes HBA on bus
2 even though its bus 1 in the fw. Its mildly confusing.

Refactor out the sdev lookup into its function to check if sdev has been
added to the kernel or not. Add helper functions to add, remove and put
devices based on their fw bus and target number.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
2290678fed scsi: aacraid: Added macros to help loop through known buses and targets
Added macros to loop through the MAX SUPPORTED Buses and Targets. This
will make the code a bit easier to read.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:43 -05:00
Raghava Aditya Renukunta
f2d2cabadb scsi: aacraid: Process hba and container hot plug events in single function
The hotplug handler code is duplicated for hba handling and container
handling.

Merged function to handle hba and container hot plug events into the
resolve luns functions. Added a bunch of helper functions to check the
validity of a given target and to check if bus, target is container
device.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
1d1fec53dc scsi: aacraid: Merge func to get container information
Merge aac_get_containers to setup target function, so that information
about all the present devices can be retrieved in one shot.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
0bcb45fb20 scsi: aacraid: Add helper function to set queue depth
Add helper function to set queue depth from information retrieved from
the bmic phy structure.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
e2ee8c9480 scsi: aacraid: Save bmic phy information for each phy
Save the bmic information for each phy, so that it can processed in
target setup function.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
4b00022753 scsi: aacraid: Create helper functions to get lun info
Created inline function to retrieve lun info for each device from the
phy luns structure.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
a25b6ca1a9 scsi: aacraid: Move function around to match existing code
Move the function to get phy luns information to the top of function
to set target information

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
3edfb8b2e2 scsi: aacraid: Untangle targets setup from report phy luns
Remove function call to process targets from the report phy luns function
and make it a function in its own right. This will help understand the
flow of the code.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
fc0fdd9abc scsi: aacraid: Add target setup helper function
Add helper function to setup targets devices and create the base for the
upcoming patches

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
b5a475e944 scsi: aacraid: Refactor and rename to make mirror existing changes
Rename variables and functions to make bmic identify, report phy luns
to make them consistent across code internal existing code bases

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
5480aa1837 scsi: aacraid: Change phy luns function to use common bmic function
Edit function that retrieves phy lun information to use common
bmic function

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
8fb391827f scsi: aacraid: Create bmic submission function from bmic identify
safw command submission is duplicated across many functions.

Move the safw submission code from bmic identify into its own function
for common use

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
216ced02fa scsi: aacraid: Move code to wait for IO completion to shutdown func
Ideally driver needs to wait for IO to be submitted or responded to before
shutdown.

Move code to wait for IO completion into shutdown path

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:42 -05:00
Raghava Aditya Renukunta
97a4e8ac3f scsi: aacraid: Refactor reset_host store function
Refactored the reset_host store function to make consistent across code
bases

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta
d1471eb0fa scsi: aacraid: Allow reset_host sysfs var to recover Panicked Fw
It is possible to restart the controller via the use of the reset_host
sysfs variable. This does work for controllers that can no longer respond,
since driver will attempt to send down a shutdown in this path.

Check if the controller is able to receive commands before sending down
a shutdown

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta
f3a2327725 scsi: aacraid: Fix ioctl reset hang
Driver would hang when attempting to send reset from the ioctl interface,
since it would wait to retrieve the ioctl mutex at send shutdown.

Set adapter shutdown and unlock mutex before sending down reset request.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta
95900629fa scsi: aacraid: Do not remove offlined devices
As part of the recovery process, the drivers removes offline devices (
done by the kernel) and then tries to add them back in the rescan code.
Removing the device is like taking a sledgehammer to a nail.

Set the device as running if it is marked offline.

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta
c5313ae8e4 scsi: aacraid: Fix hang in kdump
Driver attempts to perform a device scan and device add after coming out
of reset. At times when the kdump kernel loads and it tries to perform
eh recovery, the device scan hangs since its commands are blocked because
of the eh recovery. This should have shown up in normal eh recovery path
(Should have been obvious)

Remove the code that performs scanning.I can live without the rescanning
support in the stable kernels but a hanging kdump/eh recovery needs to be
fixed.

Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Cc: <stable@vger.kernel.org>
Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta
dfb92a1f93 scsi: aacraid: Do not attempt abort when Fw panicked
Check if the adapter can receive abort requests, before sending aborts

Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Raghava Aditya Renukunta
f4e8708d31 scsi: aacraid: Fix udev inquiry race condition
When udev requests for a devices inquiry string, it might create multiple
threads causing a race condition on the shared inquiry resource string.

Created a buffer with the string for each thread.

Cc: <stable@vger.kernel.org>
Fixes: 3bc8070fb7 ([SCSI] aacraid: SMC vendor identification)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-01-03 23:26:41 -05:00
Prasad B Munirathnam
5771cfffdf scsi: aacraid: Fix I/O drop during reset
"FIB_CONTEXT_FLAG_TIMEDOUT" flag is set in aac_eh_abort to indicate
command timeout. Using the same flag in reset handler causes the command
to time out and the I/Os were dropped.

Define a new flag "FIB_CONTEXT_FLAG_EH_RESET" to make sure I/O is
properly handled in eh_reset handler.

[mkp: tweaked commit message]

Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-14 22:34:28 -05:00
Colin Ian King
efbbbb1023 scsi: aacraid: remove unused variable managed_request_id
Variable managed_request_id is being assigned but it is never read,
hence it is redundant and can be removed. Cleans up clang warning:

drivers/scsi/aacraid/linit.c:706:5: warning: Value stored to
'managed_request_id' is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-04 20:32:53 -05:00
Arnd Bergmann
d18539754d scsi: aacraid: address UBSAN warning regression
As reported by Meelis Roos, my previous patch causes an incorrect
calculation of the timeout, through an undefined signed integer
overflow:

[   12.228155] UBSAN: Undefined behaviour in drivers/scsi/aacraid/commsup.c:2514:49
[   12.228229] signed integer overflow:
[   12.228283] 964297611 * 250 cannot be represented in type 'long int'

The problem is that doing a multiplication with HZ first and then
dividing by USEC_PER_SEC worked correctly for 32-bit microseconds,
but not for 32-bit nanoseconds, which would require up to 41 bits.

This reworks the calculation to first convert the nanoseconds into
jiffies, which should give us the same result as before and not overflow.

Unfortunately I did not understand the exact intention of the algorithm,
in particular the part where we add half a second, so it's possible that
there is still a preexisting problem in this function. I added a comment
that this would be handled more nicely using usleep_range(), which
generally works better for waking up at a particular time than the
current schedule_timeout() based implementation. I did not feel
comfortable trying to implement that without being sure what the
intent is here though.

Fixes: 820f188659 ("scsi: aacraid: use timespec64 instead of timeval")
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-29 00:07:20 -05:00
Guilherme G. Piccoli
e4717292dd scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path
As part of the scsi EH path, aacraid performs a reinitialization of the
adapter, which encompass freeing resources and IRQs, NULLifying lots of
pointers, and then initialize it all over again.  We've identified a
problem during the free IRQ portion of this path if CONFIG_DEBUG_SHIRQ
is enabled on kernel config file.

Happens that, in case this flag was set, right after free_irq()
effectively clears the interrupt, it checks if it was requested as
IRQF_SHARED. In positive case, it performs another call to the IRQ
handler on driver. Problem is: since aacraid currently free some
resources *before* freeing the IRQ, once free_irq() path calls the
handler again (due to CONFIG_DEBUG_SHIRQ), aacraid crashes due to NULL
pointer dereference with the following trace:

  aac_src_intr_message+0xf8/0x740 [aacraid]
  __free_irq+0x33c/0x4a0
  free_irq+0x78/0xb0
  aac_free_irq+0x13c/0x150 [aacraid]
  aac_reset_adapter+0x2e8/0x970 [aacraid]
  aac_eh_reset+0x3a8/0x5d0 [aacraid]
  scsi_try_host_reset+0x74/0x180
  scsi_eh_ready_devs+0xc70/0x1510
  scsi_error_handler+0x624/0xa20

This patch prevents the crash by changing the order of the
deinitialization in this path of aacraid: first we clear the IRQ, then
we free other resources. No functional change intended.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-20 22:33:09 -05:00
Guilherme G. Piccoli
d9b6d85a38 scsi: aacraid: Perform initialization reset only once
Currently the driver accepts two ways of requesting an initialization
reset on the adapter: by passing aac_reset_devices module parameter,
or the generic kernel parameter reset_devices.

It's working as intended...but if we end up reaching a scsi hang and
the scsi EH mechanism takes place, aacraid performs resets as part of
the scsi error recovery procedure. These EH routines might reinitialize
the device, and if we have provided some of the reset parameters in the
kernel command-line, we again perform an "initialization" reset.

So, to avoid this duplication of resets in case of scsi EH path, this
patch adds a field to aac_dev struct to keep per-adapter track of the
init reset request - once it's done, we set it to false and don't
proactively reset anymore in case of reinitializations.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-20 22:32:00 -05:00
Guilherme G. Piccoli
bd257b2f3b scsi: aacraid: Check for PCI state of device in a generic way
Commit 16ae9dd35d ("scsi: aacraid: Fix for excessive prints on EEH")
introduced checks about the state of device before any PCI operations in
the driver. Basically, this prevents it to perform PCI accesses when
device is in the process of recover from a PCI error. In PowerPC, such
mechanism is called EEH, and the aforementioned commit introduced checks
that are based on EEH-specific primitives for that.

The potential problems with this approach are three: first, these checks
are "locked" to powerpc only - another archs could have error recovery
methods too, like AER in Intel. Also, the powerpc primitives perform
expensive FW accesses to validate the precise PCI state of a device.
Finally, code becomes more complicated and needs ifdef validation based
on arch config being set.

So, this patch makes use of generic PCI state checks, which are
lightweight and non-dependent of arch configs - also, it makes the code
cleaner.

Fixes: 16ae9dd35d ("scsi: aacraid: Fix for excessive prints on EEH")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-11-20 22:29:10 -05:00