Commit Graph

14468 Commits

Author SHA1 Message Date
Colin Ian King
a28b259b43 scsi: hisi_sas: add missing break in switch statement
It appears that a break in the TRANS_TX_OPEN_CNX_ERR_NO_DESTINATION case
got accidentally removed in an earlier commit, as it stands, the
ts->stat and ts->open_rej_reason are being updated twice for this case
which looks incorrect.  Fix this by adding in the missing break
statement.

Detected by CoverityScan, CID#1422110 ("Missing break in switch")

Fixes: 634a9585f4 ("scsi: hisi_sas: process error codes according to their priority")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-29 22:38:53 -04:00
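The class of bug fixed in a28b259b43 can be illustrated with a small standalone sketch. The enum values, struct, and function names below are hypothetical stand-ins and are not taken from the hisi_sas driver; the sketch only shows how a missing break lets a second case overwrite the values set by the first.

```c
/* Hypothetical stand-ins for the driver's error codes and task status. */
enum tx_err { ERR_NO_DESTINATION, ERR_PATH_BLOCKED };
enum rej_reason { REJ_UNKNOWN, REJ_NO_DEST, REJ_PATH_BLOCKED };

struct task_status {
	int stat;
	enum rej_reason open_rej_reason;
};

/* Buggy variant: the first case runs on into the second one. */
void classify_buggy(enum tx_err err, struct task_status *ts)
{
	switch (err) {
	case ERR_NO_DESTINATION:
		ts->stat = 1;
		ts->open_rej_reason = REJ_NO_DEST;
		/* fall through - unintended, the break is missing here */
	case ERR_PATH_BLOCKED:
		ts->stat = 1;
		ts->open_rej_reason = REJ_PATH_BLOCKED;
		break;
	}
}

/* Fixed variant: each case ends with its own break. */
void classify_fixed(enum tx_err err, struct task_status *ts)
{
	switch (err) {
	case ERR_NO_DESTINATION:
		ts->stat = 1;
		ts->open_rej_reason = REJ_NO_DEST;
		break;
	case ERR_PATH_BLOCKED:
		ts->stat = 1;
		ts->open_rej_reason = REJ_PATH_BLOCKED;
		break;
	}
}
```

With the buggy variant, ERR_NO_DESTINATION ends up reporting REJ_PATH_BLOCKED because both fields are written twice; the fixed variant keeps REJ_NO_DEST.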
Jitendra Bhivare
cbe9fc8594 scsi: be2iscsi: Update driver version
Version 11.4.0.0

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:42 -04:00
Jitendra Bhivare
942b76542e scsi: be2iscsi: Update Copyright
Update Broadcom Copyright markings in all files.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:41 -04:00
Jitendra Bhivare
0ddee50e3f scsi: be2iscsi: Check size before copying ASYNC handle
Data in buffers is gathered into a single buffer before being given to the
iSCSI layer. Although an ASYNC PDU payload of more than 8K is unlikely, the
data length is provided by FW and a check for overrun is missing.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:41 -04:00
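A size check of the kind 0ddee50e3f describes might look roughly like the sketch below. It is a userspace illustration, not the be2iscsi code: the constant GATHER_BUF_SIZE and the function gather_fragment() are hypothetical names, and the 8K limit is taken from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* 8K gather buffer, per the commit message; the name is hypothetical. */
#define GATHER_BUF_SIZE 8192

/*
 * Append one fragment to dst. The length comes from an untrusted source
 * (firmware, in the commit's case), so it is validated before copying.
 * Returns 0 on success, -1 if the fragment would overrun the buffer.
 */
int gather_fragment(unsigned char *dst, size_t *used,
		    const unsigned char *src, size_t len)
{
	/* Written as a subtraction so the check itself cannot overflow. */
	if (*used > GATHER_BUF_SIZE || len > GATHER_BUF_SIZE - *used)
		return -1;
	memcpy(dst + *used, src, len);
	*used += len;
	return 0;
}
```

A rejected fragment leaves the buffer and the running length untouched, so the caller can drop the PDU without further cleanup.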
Jitendra Bhivare
ba6983a745 scsi: be2iscsi: Remove free_list for ASYNC handles
With previous patch adding ASYNC Rx buffers to free_list is not
required.  Remove all free_list related operations.

Add in_use to track if buffer posted is being processed by driver and
purge all buffers received for connection if found so.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:40 -04:00
Jitendra Bhivare
1e2931f134 scsi: be2iscsi: Use num_cons field in Rx CQE
FW runs out of buffer if buffers are not posted back soon.  ASYNC Rx CQE
indicates that FW has consumed 8 RQEs.  Use it to post back buffers
instead of waiting for buffers to be processed and freed by driver.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:39 -04:00
Jitendra Bhivare
fecc382469 scsi: be2iscsi: Increase HDQ default queue size
Currently, ASYNC PDU default queue size is set to max connections.  This
leaves only one buffer per connection for any ASYNC PDUs from targets.

Double the size of the default queue.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:39 -04:00
Jitendra Bhivare
90e96313a9 scsi: scsi_transport_iscsi: Use flush_work in iscsi_remove_session
scsi_flush_work flushes workqueue for the Scsi_Host.  In iSCSI offload
enabled host, this would wait for all other sessions under the host.

Use flush_work for the session being removed instead.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:33 -04:00
Jitendra Bhivare
d1e1d63b32 scsi: be2iscsi: Replace spin_unlock_bh with spin_lock
spin_unlock_bh back_lock is used in beiscsi_eh_device_reset instead of
spin_lock.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
Jitendra Bhivare
49fc5152f5 scsi: be2iscsi: Fix closing of connection
CID needs to be freed even when invalidate or upload connection fails.
Attempt to close connection 3 times before freeing CID.

Set cleanup_type to INVALIDATE instead of forcing TCP_RST, which
unnecessarily terminates the connection with a reset instead of gracefully
closing it.

Set save_cfg to 0 - session not to be saved on flash.

Add delay and process CQ before uploading connection.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
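The close-then-free behaviour described in 49fc5152f5 (try to close up to 3 times, free the CID regardless of the outcome) can be sketched as follows. Everything here is an illustrative assumption: struct conn, try_close(), and close_and_free() are hypothetical, and try_close() merely simulates a request that fails a configurable number of times.

```c
#include <stdbool.h>

#define CLOSE_ATTEMPTS 3 /* attempt count from the commit message */

/* Hypothetical connection record used only for this sketch. */
struct conn {
	bool cid_allocated;
	int close_calls;  /* how many close attempts were made */
	int fail_first_n; /* simulation knob: first n attempts fail */
};

/* Stand-in for the invalidate/upload request; succeeds after n failures. */
bool try_close(struct conn *c)
{
	c->close_calls++;
	return c->close_calls > c->fail_first_n;
}

/* Returns true if the connection closed cleanly; frees the CID either way. */
bool close_and_free(struct conn *c)
{
	bool closed = false;

	for (int i = 0; i < CLOSE_ATTEMPTS && !closed; i++)
		closed = try_close(c);

	/* The CID must not leak even when every attempt failed. */
	c->cid_allocated = false;
	return closed;
}
```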
Jitendra Bhivare
eb419229be scsi: be2iscsi: Check tag in beiscsi_mccq_compl_wait
scsi host12: BS_1377 : mgmt_invalidate_connection Failed for cid=256
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff81332ebf>] __list_add+0xf/0xc0
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
...
CPU: 9 PID: 1542 Comm: iscsid Tainted: G               ------------ T 3.10.0-514.el7.x86_64 #1
Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/12/2016
task: ffff88076f310fb0 ti: ffff88076bba8000 task.ti: ffff88076bba8000
RIP: 0010:[<ffffffff81332ebf>]  [<ffffffff81332ebf>] __list_add+0xf/0xc0
RSP: 0018:ffff88076bbab8e8  EFLAGS: 00010046
RAX: 0000000000000246 RBX: ffff88076bbab990 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880468badf58 RDI: ffff88076bbab990
RBP: ffff88076bbab900 R08: 0000000000000246 R09: 00000000000020de
R10: 0000000000000000 R11: ffff88076bbab5be R12: 0000000000000000
R13: ffff880468badf58 R14: 000000000001adb0 R15: ffff88076f310fb0
FS:  00007f377124a880(0000) GS:ffff88046fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000771318000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffff88076bbab990 ffff880468badf50 0000000000000001 ffff88076bbab938
ffffffff810b128b 0000000000000246 00000000cf9b7040 ffff880468bac7a0
0000000000000000 ffff880468bac7a0 ffff88076bbab9d0 ffffffffa05a6ea3

Call Trace:
[<ffffffff810b128b>] prepare_to_wait+0x7b/0x90
[<ffffffffa05a6ea3>] beiscsi_mccq_compl_wait+0x153/0x330 [be2iscsi]
[<ffffffff810b1600>] ? wake_up_atomic_t+0x30/0x30
[<ffffffffa05981b1>] beiscsi_ep_disconnect+0x91/0x2d0 [be2iscsi]
[<ffffffffa0202ffa>] iscsi_if_ep_disconnect.isra.14+0x5a/0x70 [scsi_transport_iscsi]
[<ffffffffa02042fb>] iscsi_if_recv_msg+0x113b/0x14a0 [scsi_transport_iscsi]
[<ffffffff811dffd8>] ? __kmalloc_node_track_caller+0x58/0x290
[<ffffffffa02046ee>] iscsi_if_rx+0x8e/0x1f0 [scsi_transport_iscsi]
[<ffffffff815a351d>] netlink_unicast+0xed/0x1b0
[<ffffffff815a38fe>] netlink_sendmsg+0x31e/0x690
[<ffffffff815a03e4>] ? netlink_rcv_wake+0x44/0x60
[<ffffffff815a19e3>] ? netlink_recvmsg+0x1e3/0x450

beiscsi_mccq_compl_wait gets called even when MCC tag allocation failed
for mgmt_invalidate_connection.  mcc_wait is not initialized for tag 0
so causes crash in prepare_to_wait.

Signed-off-by: Jitendra Bhivare <jitendra.bhivare@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 22:03:04 -04:00
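The guard added by eb419229be can be pictured with a small sketch in which tag 0 means "allocation failed" and has no wait state behind it. The pool layout and the functions alloc_tag() and wait_on_tag() are hypothetical and do not mirror the be2iscsi data structures.

```c
#include <errno.h>

#define MAX_TAGS 4

/* Index 0 is reserved: it is the "no tag available" value. */
struct tag_pool {
	unsigned char in_use[MAX_TAGS + 1];
};

/* Returns a tag in 1..MAX_TAGS, or 0 when the pool is exhausted. */
unsigned int alloc_tag(struct tag_pool *p)
{
	for (unsigned int t = 1; t <= MAX_TAGS; t++) {
		if (!p->in_use[t]) {
			p->in_use[t] = 1;
			return t;
		}
	}
	return 0;
}

/*
 * Validate the tag before touching any per-tag state. A real driver
 * would sleep on a per-tag wait queue here; this sketch only releases
 * the tag.
 */
int wait_on_tag(struct tag_pool *p, unsigned int tag)
{
	if (tag == 0 || tag > MAX_TAGS || !p->in_use[tag])
		return -EINVAL;
	p->in_use[tag] = 0;
	return 0;
}
```

Rejecting tag 0 up front means a failed allocation turns into an error return instead of a dereference of uninitialized wait state.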
Tomohiro Kusumi
031d1e0f2d scsi: ufs: fix wrong/ambiguous fall through comments
These aren't really falling through to anywhere meaningful.

Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com>
Reviewed-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:56:03 -04:00
Dan Carpenter
03b1a06203 scsi: osd_uld: remove an unneeded NULL check
We don't call the remove() function unless probe() succeeds so "oud"
can't be NULL here.  Plus, if it were NULL, we dereference it on the
next line so it would crash anyway.

[mkp: applied by hand]

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-27 21:55:14 -04:00
Brian King
16a20b52d1 scsi: ipr: Driver version 2.6.4
Bump driver version

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
ef97d8ae12 scsi: ipr: Fix SATA EH hang
This patch fixes a hang that can occur in ATA EH with ipr. With ipr's
usage of libata, commands should never end up on ap->eh_done_q. The
timeout function we use for ipr, even for SATA devices, is
scsi_times_out, so ATA_QCFLAG_EH_SCHEDULED never gets set for ipr and EH
is driven completely by ipr and SCSI. The SCSI EH thread ends up calling
ipr's eh_device_reset_handler, which then calls
ata_std_error_handler. This ends up calling ipr_sata_reset, which issues
a reset to the device. This should result in all pending commands
getting failed back and having ata_qc_complete called for them, which
should end up clearing ATA_QCFLAG_FAILED as qc->flags gets zeroed in
ata_qc_free.  This ensures that when we end up in ata_eh_finish, we
don't do anything more with the command.

On adapters that only support a single interrupt and when running with
two MSI-X vectors or less, the adapter firmware guarantees that
responses to all outstanding commands are sent back prior to sending the
response to the SATA reset command.  On newer adapters supporting
multiple HRRQs, however, this can no longer be guaranteed, since the
command responses and reset response may be processed on different
HRRQs.

If ipr returns from ipr_sata_reset before the outstanding command was
returned, this sends us down the path of __ata_eh_qc_complete which then
moves the associated scsi_cmd from the work_q in
scsi_eh_bus_device_reset to ap->eh_done_q, which then will sit there
forever and we will be wedged.

This patch fixes this up by ensuring that any outstanding commands are
flushed before returning from eh_device_reset_handler for a SATA device.

Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
f646f325a8 scsi: ipr: Error path locking fixes
This patch closes up some potential race conditions observed in the
error handling paths in ipr while debugging an issue resulting in a hang
with SATA error handling. These patches ensure we are holding the
correct lock when adding and removing commands from the free and pending
queues in some error scenarios.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
439ae285b9 scsi: ipr: Fix abort path race condition
This fixes a race condition in the error handling paths of ipr. While a
command is outstanding to the adapter, it is placed on a pending queue
for the hrrq it is associated with, while holding the HRRQ lock. When a
command is completed, it is removed from the pending queue, under HRRQ
lock, and placed on a local list.  This list is then iterated through
without any locks and each command's done function is invoked, inside of
which, the command gets returned to the free list while grabbing the
HRRQ lock. This fixes two race conditions when commands have been
removed from the pending list but have not yet been added to the free
list. Both of these changes fix race conditions that could result in
returning success from eh_abort_handler and then later calling scsi_done
for the same request.

The first race condition is in ipr_cancel_op. It looks through each
pending queue to see if the command to be aborted is still outstanding
or not. Rather than looking on the pending queue, reverse the logic to
look for commands that are NOT on the free queue.  The second race
condition can occur in ipr_wait_for_ops, where we are waiting for
responses for commands we've aborted.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
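The reasoning in 439ae285b9 can be reduced to a three-state model: a command is on the free list, on a pending queue, or in the window between the two while its done function runs. The enum and the two predicates below are a hypothetical simplification, not ipr code; they only show why "not on the free queue" is the safer test.

```c
#include <stdbool.h>

/* Simplified life cycle of a command (names are illustrative). */
enum cmd_state {
	CMD_FREE,       /* on the free list */
	CMD_PENDING,    /* on an HRRQ pending queue */
	CMD_COMPLETING  /* off the pending queue, not yet back on the free list */
};

/* Old test: misses commands caught in the completion window. */
bool outstanding_by_pending(enum cmd_state s)
{
	return s == CMD_PENDING;
}

/* New test: anything not on the free list is still outstanding. */
bool outstanding_by_not_free(enum cmd_state s)
{
	return s != CMD_FREE;
}
```

For a command in CMD_COMPLETING, the old test reports "not outstanding", which is how an abort could return success shortly before scsi_done is called for the same request.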
Brian King
960e96480e scsi: ipr: Remove redundant initialization
Removes some code in __ipr_eh_dev_reset which was modifying the ipr_cmd
done function. This should have already been setup at command allocation
time, and if it's since been changed, it means we are in the ipr_erp*
functions and need to wait for them to complete and don't want to
override that here.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Brian King
66a0d59cdd scsi: ipr: Fix missed EH wakeup
Following a command abort or device reset, ipr's EH handlers wait for
the commands getting aborted to get sent back from the adapter prior to
returning from the EH handler. This fixes up some cases where the
completion handler was not getting called, which would have resulted in
the EH thread waiting until it timed out, greatly extending EH time.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Tested-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 12:04:05 -04:00
Xiaofei Tan
4935933e07 scsi: hisi_sas: add is_sata_phy_v2_hw()
Add helper function is_sata_phy_v2_hw() to judge whether the attached
device is SATA disk for a root PHY.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
6073b7719a scsi: hisi_sas: use dev_is_sata to identify SATA or SAS disk
When SMP IO is sent, sas_protocol_ata couldn't judge whether the disk is
SATA or SAS disk.  So use dev_is_sata to identify SATA or SAS disk.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
14d3f397f6 scsi: hisi_sas: check hisi_sas_lu_reset() error message
Unless we actually get some sort of failure in hisi_sas_lu_reset(),
don't print a message.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
ccbfe5a05a scsi: hisi_sas: release SMP slot in lldd_abort_task
When an SMP task times out, it will call lldd_abort_task to release the
associated slot, and then will release the sas_task.

Currently in lldd_abort_task, if we fail to internally abort IO, then
the slot of SMP IO is not released, but sas_task will still be later
released, so the slot's sas_task is NULL, which will cause NULL pointer
when hisi_sas_slot_task_free happens later.

To resolve, check the return value of internal abort, and release the
slot if it failed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
8b05ad6a9d scsi: hisi_sas: add hisi_sas_clear_nexus_ha()
Add function for upper-layer to reset controller when all else fails.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
4df642db5b scsi: hisi_sas: rename hisi_sas_link_timeout_{enable, disable}_link
For consistency, remove the "hisi_sas_" prefix.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiaofei Tan
981843c6c4 scsi: hisi_sas: handle PHY UP+DOWN simultaneous irq
Handle the situation where PHY UP and DOWN irqs happen simultaneously.
The SoC HW has no mechanism to ensure this situation never happens, so
we add this handling just in case.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
f1dc751876 scsi: hisi_sas: some modifications to v2 hw reg init values
This patch includes:
(1) Disable transport layer retry
(2) Support CQ time and count interrupt coal
(3) fix link FIFO full issue

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Zhao Nenglong <zhaonenglong@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
634a9585f4 scsi: hisi_sas: process error codes according to their priority
There are some rules to decide which error code has the high priority
when errors happen together:

(1) Error phase of CQ decides the error happens on RX or TX;

(2) For TX error, when DMA/TRANS TX error happen simultaneously, the
    priority of DMA TX error is higher than TRANS TX error, so for the
    priority of TX error: DW2 (DMA TX part) > DW0;

(3) For RX error, when TRANS/DMA/SIPC RX error happen simultaneously,
    the priority of TRANS RX error is higher than DMA and SIPC RX error,
    and we should also keep the rules (the priority of DW3 > DW2), so
    for the priority of RX error: DW1 > DW3 > DW2(SIPC RX part);

(4) There is also a priority we should keep within the same error type.

So, modify slot error code to handle this.

In addition to this, some error codes are modified according to
recommendation from SoC designer.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
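Rules (1) to (3) of 634a9585f4 can be sketched as a selection function. This is a simplified illustration under stated assumptions: the struct layout, enum names, and pick_error() are hypothetical, each field is treated as a plain "error present" word, and rule (4) (ordering inside one error type) is left out.

```c
#include <stdint.h>

enum err_phase { PHASE_TX, PHASE_RX };

enum err_source {
	SRC_NONE,
	SRC_DW0,         /* TRANS TX */
	SRC_DW1,         /* TRANS RX */
	SRC_DW2_DMA_TX,  /* DMA TX part of DW2 */
	SRC_DW2_SIPC_RX, /* SIPC RX part of DW2 */
	SRC_DW3
};

/* Hypothetical view of the error record: non-zero means an error is set. */
struct err_record {
	uint32_t dw0;
	uint32_t dw1;
	uint32_t dw2_dma_tx;
	uint32_t dw2_sipc_rx;
	uint32_t dw3;
};

/* Pick the highest-priority error source for the given phase. */
enum err_source pick_error(enum err_phase phase, const struct err_record *r)
{
	if (phase == PHASE_TX) {
		/* TX: DW2 (DMA TX part) > DW0 */
		if (r->dw2_dma_tx)
			return SRC_DW2_DMA_TX;
		if (r->dw0)
			return SRC_DW0;
		return SRC_NONE;
	}
	/* RX: DW1 > DW3 > DW2 (SIPC RX part) */
	if (r->dw1)
		return SRC_DW1;
	if (r->dw3)
		return SRC_DW3;
	if (r->dw2_sipc_rx)
		return SRC_DW2_SIPC_RX;
	return SRC_NONE;
}
```

Checking the phase first keeps TX and RX words from competing with each other, which is the point of rule (1).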
John Garry
6fcdda8051 scsi: hisi_sas: remove task free'ing for timeouts
When a TMF or internal abort times-out, do not free slot. We expect this
to be done upon later escalated error handling.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
54c9dd2d26 scsi: hisi_sas: fix some sas_task.task_state_lock locking
Some more locking needs to be added/modified for when
read-modify-writing sas_task.task_state_flags.

Note: since we can attempt to grab this lock in interrupt
      context we should use irq variant of spin_lock.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
6131243acd scsi: hisi_sas: free slots after hardreset
After hardreset, we clear up IOs of remote disks, so we need to free
those slots in LLDD.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
a305f33775 scsi: hisi_sas: check for SAS_TASK_STATE_ABORTED in slot complete
Check in slot_complete_v2_hw() for whether a task has already been
completed by upper layer.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
055945df4c scsi: hisi_sas: hardreset for SATA disk in LU reset
When issuing an LU reset for a SATA target, issue an internal abort and
a hard reset.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
c35279f2f1 scsi: hisi_sas: modify hisi_sas_abort_task() for SSP
Currently an internal abort is executed regardless of the result of the
TMF. We should also check the result of the internal abort to see if we
should free the slot.

So change the status code STAT_IO_COMPLETE to TMF_RESP_FUNC_SUCC,
meaning the slot has been successfully aborted.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
Xiang Chen
fc86695144 scsi: hisi_sas: modify error handling for v2 hw
For error codes which need abort-and-retry, simulate IO timeout and let
SCSI+ATA layers process those errors.

Previously for SSP, we should try to abort the IO in the LLDD and then
pass back to upper layer, but sometimes this would also error. So
instead of adding special error handling for this scenario in the LLDD,
allow the upper layer to handle completely.

No performance hit is seen by taking this approach.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
b4c67a6ca7 scsi: hisi_sas: only reset link for PHY_FUNC_LINK_RESET
We currently do a hard reset for a link reset. Change this to do a link
reset only.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:02 -04:00
John Garry
ddabca216c scsi: hisi_sas: error hisi_sas_task_prep() when port down
When sas_port is NULL, then return SAS_PHY_DOWN.

In addition, when the sas_dev is gone then explicitly return
SAS_PHY_DOWN.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
John Garry
405314df56 scsi: hisi_sas: remove hisi_sas_port_deformed()
Currently when a root PHY is deformed from an asd_sas_port we try to
release the slots in the LLDD, and fail.

Regardless, it is not right to release this early.

This patch removes the deformed function. As it was before, port
deformation is still done in hisi_sas_phy_down().

It would be nice to actually remove the hisi_sas_port_{de}formed() pair,
however we cannot as we need to know the asd_sas_port index libsas has
associated with an asd_sas_phy.

The hw does actually generate a port id for a PHY, but this seems to be a
random number, so ignored for this purpose.

This patch also changes the code to link slots to the hisi_sas_device,
and not hisi_sas_port.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
Xiang Chen
7c594f0407 scsi: hisi_sas: add softreset function for SATA disk
Add softreset to clear IO after internal abort device for SATA disk.

The SATA error handling for the controller is based on device internal
abort and softreset function.

The controller does not support internal abort for single IO, so we need
to execute internal abort for device.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
John Garry
396b80448f scsi: hisi_sas: move PHY init to hisi_sas_scan_start()
Relocate the PHY init code from LLDD hw init path to
hisi_sas_scan_start().

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
Xiang Chen
06ec0fb97c scsi: hisi_sas: add controller reset
There are some scenarios where we need to warm-reset to reset registers
of SAS controller. During reset we disable interrupts/DQs/PHYs, and
after reset we re-init the hardware and rescan the topology to see if
anything changed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
John Garry
2e244f0f5b scsi: hisi_sas: add to_hisi_sas_port()
Introduce function to get hisi_sas_port from asd_sas_port.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 11:12:01 -04:00
Satish Kharat
b8e1aa3c72 scsi: fnic: bug fix for fip.fip_subcode in fnic_fcoe_send_vlan_req
This is a bug introduced when the fip subcodes were moved to a central
place. FIP_SC_VL_NOTE was being sent in fip.fip_subcode for the VLAN
request in fnic_fcoe_send_vlan_req. The change is to use FIP_SC_VL_REQ
instead.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat
445d296086 scsi: fnic: Adding debug IO and Abort latency counter to fnic stats
The IO and Abort latency counter counts the time taken to complete the
IO and abort command into broad buckets. This is not intended for
performance measurement, just a debug statistic.  current_max_io_time
tries to keep track of the maximum time an IO has taken to complete if
it is > 30sec.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
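A bucketed latency counter of the kind 445d296086 describes could be sketched as below. The bucket boundaries, struct, and record_io_time() are assumptions made for illustration (the commit message only says "broad buckets"); the 30 second threshold for current_max_io_time comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS 4
#define MAX_TRACK_THRESHOLD_MS 30000 /* "> 30sec" from the commit message */

/* Hypothetical upper bounds (ms) for all but the last, open-ended bucket. */
static const uint64_t bucket_upper_ms[NUM_BUCKETS - 1] = { 10, 100, 1000 };

struct latency_stats {
	uint64_t buckets[NUM_BUCKETS];
	uint64_t current_max_io_time_ms;
};

/* Count one completed IO and track the slowest one above the threshold. */
void record_io_time(struct latency_stats *s, uint64_t ms)
{
	size_t i = 0;

	/* Find the first bucket whose upper bound covers this latency. */
	while (i < NUM_BUCKETS - 1 && ms > bucket_upper_ms[i])
		i++;
	s->buckets[i]++;

	if (ms > MAX_TRACK_THRESHOLD_MS && ms > s->current_max_io_time_ms)
		s->current_max_io_time_ms = ms;
}
```

Coarse buckets keep the per-IO cost to a few comparisons and one increment, which fits a debug statistic that is not meant for performance measurement.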
Satish Kharat
39fcbbc01b scsi: fnic: Adding Check Condition counter to misc fnicstats
Just a simple counter of number of check conditions encountered on that
host.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat
b9202b4ae8 scsi: fnic: Avoid false out-of-order detection for aborted command
If SCSI-ML has already issued abort on a command, i.e.
FNIC_IOREQ_ABTS_PENDING is set, and we get an IO completion, avoid this
being flagged as out-of-order completion by setting the FNIC_IO_DONE
flag in fnic_fcpio_icmnd_cmpl_handler.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat
7ef539c88d scsi: fnic: Fix for "Number of Active IOs" in fnicstats becoming negative
Fixing the IO stats update (Active IOs and IO completion) to prevent
"Number of Active IOs" from becoming negative in the fnicstats output.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:57:23 -04:00
Satish Kharat
ccc6d70460 scsi: fnic: minor cleanup in fnic_fcpio_itmf_cmpl_handler, removing else case
Getting rid of else case to make the flow look a bit simpler.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:25 -04:00
Satish Kharat
b43abcbbd5 scsi: fnic: Ratelimit printks to avoid flooding when vlan is not set by the switch
This is to avoid the log from being filled with vlan discovery messages
when there is no vlan configured on the switch.

Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Sesidhar Baddela <sebaddel@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:25 -04:00
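The general idea behind rate-limited logging, as used in b43abcbbd5, can be sketched with a fixed-window limiter. This is a userspace illustration and not the kernel's ratelimit implementation; struct ratelimit and ratelimit_allow() are hypothetical, and the caller passes the current time explicitly so the behaviour is easy to follow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow at most `burst` messages per `interval_ms` window. */
struct ratelimit {
	uint64_t interval_ms;
	unsigned int burst;
	uint64_t window_start_ms;
	unsigned int printed;
};

/* Returns true if a message may be logged at time now_ms. */
bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	/* Start a new window once the current one has expired. */
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}
```

A caller would wrap each log statement in `if (ratelimit_allow(&rl, now))`, so a repeating condition such as an unconfigured VLAN produces a bounded number of lines per window.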
Christoph Hellwig
cca678dfba scsi: fnic: switch to pci_alloc_irq_vectors
Not a full cleanup for the IRQ code, for that we'd need to know if the
max number of the various CQ types is going to stay 1 forever.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-03-23 09:51:10 -04:00