Commit Graph

185 Commits

Author SHA1 Message Date
Hannes Reinecke
d9a00459ef scsi: hisi_sas: fix calls to dma_set_mask_and_coherent()
The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA
mask value succeeded.

[mkp: fixed commit message]

Fixes: e4db40e7a1 ("scsi: hisi_sas: use dma_set_mask_and_coherent")
Cc: <stable@vger.kernel.org>
Suggested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-02-25 21:44:40 -05:00
Xiang Chen
6db831f4ef scsi: hisi_sas: Make sg_tablesize consistent value
Sht->sg_tablesize is set in the driver, and it will be assigned to
shost->sg_tablesize in SCSI mid-layer. So it is not necessary to assign
shost->sg_table one more time in the driver.

In addition to the change, change each scsi_host_template.sg_tablesize
to HISI_SAS_SGE_PAGE_CNT instead of SG_ALL.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-12 21:23:17 -05:00
Xiang Chen
6e1b731b53 scsi: hisi_sas: Relocate some code to reduce complexity
Relocate the codes related to dma_map/unmap in hisi_sas_task_prep() to
reduce complexity, with a view to add DIF/DIX support.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-12 21:23:16 -05:00
John Garry
735bcc77e6 scsi: hisi_sas: Fix warnings detected by sparse
This patchset fixes some warnings detected by the sparse tool, like these:
drivers/scsi/hisi_sas/hisi_sas_main.c:1469:52: warning: incorrect type in assignment (different base types)
drivers/scsi/hisi_sas/hisi_sas_main.c:1469:52:    expected unsigned short [unsigned] [assigned] [usertype] tag_of_task_to_be_managed
drivers/scsi/hisi_sas/hisi_sas_main.c:1469:52:    got restricted __le16 [usertype] <noident>
drivers/scsi/hisi_sas/hisi_sas_main.c:1723:52: warning: incorrect type in assignment (different base types)
drivers/scsi/hisi_sas/hisi_sas_main.c:1723:52:    expected unsigned short [unsigned] [assigned] [usertype] tag_of_task_to_be_managed
drivers/scsi/hisi_sas/hisi_sas_main.c:1723:52:    got restricted __le16 [usertype] <noident>

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-12-12 21:23:16 -05:00
Xiang Chen
745b684763 scsi: hisi_sas: Relocate some codes to avoid an unused check
In function hisi_sas_task_prep(), we check asd_sas_port, but in function
hisi_sas_task_exec(), we already refer to asd_sas_port by using function
dev_to_hisi_hba() implicitly. So to avoid this possible invalid
dereference, relocate the check to function hisi_sas_task_prep().

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 14:37:05 -05:00
Xiang Chen
c3566f9a61 scsi: hisi_sas: Create separate host attributes per HBA
Currently all the three HBA (v1/v2/v3 HW) share the same host attributes.

To support each HBA having separate attributes in future, create per-HBA
attributes.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-11-15 14:37:05 -05:00
Gustavo A. R. Silva
f4445bb93d scsi: hisi_sas: Fix NULL pointer dereference
There is a NULL pointer dereference in case *slot* happens to be NULL at
lines 1053 and 1878:

struct hisi_sas_cq *cq =
	&hisi_hba->cq[slot->dlvry_queue];

Notice that *slot* is being NULL checked at lines 1057 and 1881:
if (slot), which implies it may be NULL.

Fix this by placing the declaration and definition of variable cq, which
contains the pointer dereference slot->dlvry_queue, after slot has been
properly NULL checked.

Addresses-Coverity-ID: 1474515 ("Dereference before null check")
Addresses-Coverity-ID: 1474520 ("Dereference before null check")
Fixes: 584f53fe5f ("scsi: hisi_sas: Fix the race between IO completion and timeout for SMP/internal IO")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-19 18:27:49 -04:00
Xiang Chen
784b46b7cb scsi: hisi_sas: Use block layer tag instead for IPTT
Currently we use the IPTT defined in LLDD to identify IOs. Actually for
IOs which are from the block layer, they have tags to identify them. So
for those IOs, use tag of the block layer directly, and for IOs which is
not from the block layer (such as internal IOs from libsas/LLDD), reserve
96 IPTTs for them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Xiang Chen
584f53fe5f scsi: hisi_sas: Fix the race between IO completion and timeout for SMP/internal IO
If SMP/internal IO times out, we will possibly free the task immediately.

However if the IO actually completes at the same time, the IO completion
may refer to task which has been freed.

So to solve the issue, flush the tasklet to finish IO completion before
free'ing slot/task.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Luo Jiaxing
1668e3b6f8 scsi: hisi_sas: Move evaluation of hisi_hba in hisi_sas_task_prep()
In evaluating hisi_hba, the sas_port may be NULL, so for safety relocate
the the check to value possible NULL deference.

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:04 -04:00
Luo Jiaxing
5a54691f87 scsi: hisi_sas: Feed back linkrate(max/min) when re-attached
At directly attached situation, if the user modifies the sysfs interface
of maximum_linkrate and minimum_linkrate to renegotiate the linkrate
between SAS controller and target, the value of both files mentioned
above should have change to user setting after renegotiate is over, but
it remains unchanged.

To fix this bug, maximum_linkrate and minimum_linkrate will be directly
fed back to relevant sas_phy structure.

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-16 00:27:03 -04:00
Jason Yan
640208a1c9 scsi: libsas: make the lldd_port_deformed method optional
Now LLDDs have to implement lldd_port_deformed method otherwise NULL
dereference will happen. Make it optional and remove the dummy implementation
in hisi_sas.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: John Garry <john.garry@huawei.com>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Ewan Milne <emilne@redhat.com>
CC: Christoph Hellwig <hch@lst.de>
CC: Tomas Henzl <thenzl@redhat.com>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Hannes Reinecke <hare@suse.com>
Acked-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-09-25 21:20:23 -04:00
Xiaofei Tan
1c09b66316 scsi: hisi_sas: add memory barrier in task delivery function
In task start delivery function, we need to add a memory barrier to prevent
re-ordering of reading memory by hardware. Because the slot data is set in
task prepare function and it could be running in another CPU.

This patch adds an memory barrier after s->ready is read in the task start
delivery function, and uses WRITE_ONCE() in the places where s->ready is
set to ensure that the compiler does not re-order.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-19 21:57:40 -04:00
Xiang Chen
6cca51ee0a scsi: hisi_sas: Tidy hisi_sas_task_prep()
To decrease the usage of spinlock during delivery IO, relocate some code in
hisi_sas_task_prep().

Also an invalid comment is removed.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-19 21:57:40 -04:00
Xiaofei Tan
4522204ab2 scsi: hisi_sas: tidy host controller reset function a bit
This patch tidies host controller reset function by putting some code to
two new functions, and exports these two functions out, so that they could
be used by FLR feature to be realised.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-19 21:57:39 -04:00
John Garry
4e32b2f484 scsi: hisi_sas: Drop hisi_sas_slot_abort()
For some time now we have not used hisi_sas_slot_abort() to handle erroring
slots, apart from in archaic v1 hw.

As such, remove this function and associated code. For v1 hw, move error
handling to same scheme as other hw revisions, where we allow erroring
commands to timeout.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-07-19 21:57:39 -04:00
John Garry
ce70c2e6af scsi: hisi_sas: Add missing PHY spinlock init
The init is missed for hisi_sas_phy spinlock, so add it.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiang Chen
2ba5afb683 scsi: hisi_sas: Pre-allocate slot DMA buffers
Currently the driver spends much time allocating and freeing the slot DMA
buffer for command delivery/completion. To boost the performance,
pre-allocate the buffers for all IPTT. The downside of this approach is
that we are reallocating all buffer memory upfront, so hog memory which we
may not need.

However, the current method - DMA buffer pool - also caches all buffers and
does not free them until the pool is destroyed, so is not exactly efficient
either.

On top of this, since the slot DMA buffer is slightly bigger than a 4K
page, we need to allocate 2x4K pages per buffer (for 4K page kernel), which
is quite wasteful. For 64K page size this is not such an issue.

So, for the 4K page case, in order to make memory usage more efficient,
pre-allocating larger blocks of DMA memory for the buffers can be more
efficient.

To make DMA memory usage most efficient, we would choose a single
contiguous DMA memory block, but this could use up all the DMA memory in
the system (when CMA enabled and no IOMMU), or we may just not be able to
allocate a DMA buffer large enough when no CMA or IOMMU.

To decide the block size we use the LCM (least common multiple) of the
buffer size and the page size. We roundup(64) to ensure the LCM is not too
large, even though a little memory may be wasted per block.

So, with this, the total memory requirement is about is about 17MB for 4096
max IPTT.

Previously (for 4K pages case), it would be 32MB (for all slots
allocated).

With this change, the relative increase of IOPS for bs=4K read when
PAGE_SIZE=4K and PAGE_SIZE=64K is as follows:
    IODEPTH     4K PAGE_SIZE      64K PAGE_SIZE
    32          56%               47%
    64          53%               44%
    128         64%               43%
    256         67%               45%

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiaofei Tan
f2ae8d0432 scsi: hisi_sas: Release all remaining resources in clear nexus ha
In host reset, we use TMF or soft-reset to re-init device, and if success,
we will release all LLDD resources of this device. If the init fails -
maybe because the device was removed or link has not come up - then do not
release the LLDD resources, but rather rely on SCSI EH to handle the
timeout for these resources later on.

But if clear nexus ha calls host reset, which is the last effort of SCSI
EH, we should release all LLDD remain resources. Because SCSI EH will
release all tasks after clear nexus ha.

Before release, we do I_T nexus reset to try to clear target remain IOs.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiaofei Tan
ed99e1d949 scsi: hisi_sas: Add a flag to filter PHY events during reset
During reset, we don't want PHY events reported to libsas for PHYs which
were previously attached prior to reset.

So check hisi_hba->flags for HISI_SAS_RESET_BIT to filter PHY events during
reset.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiaofei Tan
214e702d4b scsi: hisi_sas: Adjust task reject period during host reset
After soft_reset() for host reset, we should not be allowed to send
commands to the HW before the PHYs have come up and the port ids have been
refreshed.

Prior to this point, any commands cannot be successfully completed.

This exclusion is achieved by grabbing the host reset semaphore.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiaofei Tan
d2fc401e47 scsi: hisi_sas: Fix the conflict between dev gone and host reset
There is a possible conflict when a device is removed and host reset occurs
concurrently.

The reason is that then the device is notified as gone, we try to clear the
ITCT, which is notified via an interrupt. The dev gone function pends on
this event with a completion, which is completed when the ITCT interrupt
occurs.

But host reset will disable all interrupts, the wait_for_completion() may
wait indefinitely.

This patch adds an semaphore to synchronise this two processes. The
semaphore is taken by the host reset as the basis of synchronising.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiang Chen
4e63ac82b9 scsi: hisi_sas: Use dmam_alloc_coherent()
This patch replaces the usage of dma_alloc_coherent() with the managed
version, dmam_alloc_coherent(), hereby reducing replicated code.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by; John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-06-19 22:02:25 -04:00
Xiang Chen
3e1fb1b8ab scsi: hisi_sas: Mark PHY as in reset for nexus reset
When issuing a nexus reset for directly attached device, we want to ignore
the PHY down events so libsas will not deform and reform the port.

In the case that the attached SAS changes for the reset, libsas will deform
and form a port.

For scenario that the PHY does not come up after a timeout period, then
report the PHY down to libsas.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiaofei Tan
d87e72fb4f scsi: hisi_sas: Fix return value when get_free_slot() failed
It is an step of executing task to get free slot. If the step fails, we
will cleanup LLDD resources and should return failure to upper layer or
internal caller to abort task execution of this time.

But in the current code, the caller of get_free_slot() doesn't return
failure when get_free_slot() failed. This patch is to fix it.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiaofei Tan
31709548d2 scsi: hisi_sas: Terminate STP reject quickly for v2 hw
For v2 hw, STP link from target is rejected after host reset because of a
SoC bug. The STP reject will be terminated after we have sent IO from each
PHY of a port.

This is not an problem before, as we don't need to setup STP link from
target immediately after host reset. But now, it is.  Because we want to
send soft-reset immediately after host reset.

In order to terminate STP reject quickly, this patch send ATA reset command
through each PHY of a port. Notes: ATA reset command don't need target's
response.

Besides, we do abort dev for each device before terminating STP reject.
This is a quirk of v2 hw.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiaofei Tan
78bd2b4f6e scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
In future scenarios we will want to use the TMF struct for more task types
than SSP.

As such, we can add struct hisi_sas_tmf_task directly into struct
hisi_sas_slot, and this will mean we can remove the TMF parameters from the
task prep functions.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiaofei Tan
a865ae14ff scsi: hisi_sas: Try wait commands before before controller reset
We may reset the controller in many scenarios, such as SCSI EH and HW
errors. There should be no IO which returns from target when SCSI EH is
active. But for other scenarios, there may be.  It is not necessary to make
such IOs fail.

This patch adds an function of trying to wait for any commands, or IO, to
complete before host reset. If no more CQ returned from host controller in
100ms, we assume no more IO can return, and then stop waiting. We wait 5s
at most.

The HW has a register CQE_SEND_CNT to indicate the total number of CQs that
has been reported to driver. We can use this register and it is reliable to
resd this register in such scenarios that require host reset.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiaofei Tan
6175abdeae scsi: hisi_sas: Init disks after controller reset
After the controller is reset, it is possible that the disks attached still
have outstanding IO to complete.

Thus, when the PHYs come back up after controller reset, it is possible
that these IOs complete at some unknown point later.

We want to ensure that all IOs are complete after the controller reset so
that all associated IPTT and other resources can be recycled safely.

To achieve this, re-init the disks by TMF or softreset (in case of ATA
devices).

If the init fails - maybe because the device was removed or link has not
come up - then do not release the device resources, but rather rely on SCSI
EH to handle the timeout for these resources later on.

This patch also does some cleanup to hisi_sas_init_disk(), including
removing superfluous cases in the switch statement.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiang Chen
235bfc7ff6 scsi: hisi_sas: Create a scsi_host_template per HW module
When a SCSI host is registered, the SCSI mid-layer takes a reference to a
module in Scsi_host.hostt.module. In doing this, we are prevented from
removing the driver module for the host in dangerous scenario, like when a
disk is mounted.

Currently there is only one scsi_host_template (sht) for all HW versions,
and this is the main.c module. So this means that we can possibly remove
the HW module in this dangerous scenario, as SCSI mid-layer is only
referencing the main.c module.

To fix this, create a sht per module, referencing that same module to
create the Scsi host.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiang Chen
d5a60dfdb3 scsi: hisi_sas: Reset disks when discovered
When a disk is discovered, it may be in an error state, or there may be
residual commands remaining in the disk.

To ensure any disk is in good state after discovery, reset via TMF (for SAS
disk) or softreset (for a SATA disk).

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:32 -04:00
Xiang Chen
1b86518581 scsi: hisi_sas: Change common allocation mode of device id
To reduce possibility of hitting unknown SoC bugs and aid debugging and
test, change allocation mode of device id from last used device id instead
of lowest available index.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:31 -04:00
Xiang Chen
fa3be0f231 scsi: hisi_sas: change slot index allocation mode
Currently we find the lowest available empty bit in the IPTT bitmap to
allocate the IPTT for a command.

To reduce possibility of hitting unknown SoC bugs and also aid in the
debugging of those same bugs, change the allocation mode.

The next allocation method is to use the next free slot adjacent to the
most recently allocated slot, in a round-robin fashion.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:31 -04:00
John Garry
757db2dae2 scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
There is much common code and functionality between the HW versions to set
the PHY linkrate.

As such, this patch factors out the common code into a generic function
hisi_sas_phy_set_linkrate().

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 22:40:31 -04:00
Wei Yongjun
eb217359eb scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
Fix a typo in hisi_sas_task_prep().

Fixes: 7eee4b9218 ("scsi: hisi_sas: relocate smp sg map")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-28 21:36:51 -04:00
Xiaofei Tan
2f6bca202b scsi: hisi_sas: add check of device in hisi_sas_task_exec()
Currently we don't check that device is not gone before dereferencing
its elements in the function hisi_sas_task_exec() (specifically, the DQ
pointer).

This patch fixes this issue by filling in the DQ pointer in
hisi_sas_task_prep() after we check that the device pointer is still
safe to reference.

[mkp: typo]

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:09 -04:00
Xiang Chen
e85d93b212 scsi: hisi_sas: Use device lock to protect slot alloc/free
The IPTT of a slot is unique, and we currently use hisi_hba lock to
protect it.

Now slot is managed on hisi_sas_device.list, so use DQ lock to protect
for allocating and freeing the slot.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:09 -04:00
Xiang Chen
fa222db0b0 scsi: hisi_sas: Don't lock DQ for complete task sending
Currently we lock the DQ to protect whole delivery process.  So this
stops us building slots for the same queue in parallel, and can affect
performance.

To optimise it, only lock the DQ during special periods, specifically
when allocating a slot from the DQ and when delivering a slot to the HW.

This approach is now safe, thanks to the previous patches to ensure that
we always deliver a slot to the HW once allocated.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:09 -04:00
Xiang Chen
3de0026dad scsi: hisi_sas: allocate slot buffer earlier
Currently we allocate the slot's memory buffer after allocating the DQ
slot.

To aid DQ lockout reduction, and allow slots to be built in parallel,
move this step (which can fail) prior to allocating the slot.

Also a stray spin_unlock_irqrestore() is removed from internal task exec
function.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:09 -04:00
Xiang Chen
a2b3820bdd scsi: hisi_sas: make return type of prep functions void
Since the task prep functions now should not fail, adjust the return
types to void.

In addition, some checks in the task prep functions are relocated to the
main module; this is specifically the check for the number of elements
in an sg list exceeded the HW SGE limit.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:09 -04:00
Xiang Chen
7eee4b9218 scsi: hisi_sas: relocate smp sg map
Currently we use DQ lock to protect delivery of DQ entry one by one.

To optimise to allow more than one slot to be built for a single DQ in
parallel, we need to remove the DQ lock when preparing slots, prior to
delivery.

To achieve this, we rearrange the slot build order to ensure that once
we allocate a slot for a task, we do cannot fail to deliver the task.

In this patch, we rearrange the slot building for SMP tasks to ensure
that sg mapping part (which can fail) happens before we allocate the
slot in the DQ.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-18 11:22:09 -04:00
Xiang Chen
c2c1d9ded0 scsi: hisi_sas: update PHY linkrate after a controller reset
After the controller is reset, we currently may not honour the PHY max
linkrate set via sysfs, in that after a reset we always revert to max
linkrate of 12Gbps, ignoring the value set via sysfs.

This patch modifies to policy to set the programmed PHY linkrate,
honouring the max linkrate programmed via sysfs.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:10:44 -04:00
John Garry
6f7c32d605 scsi: hisi_sas: stop controller timer for reset
We should only have the timer enabled after PHY up after controller
reset, so disable prior to reset.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:10:44 -04:00
Xiang Chen
c6ef895472 scsi: hisi_sas: check sas_dev gone earlier in hisi_sas_abort_task()
It is possible to dereference a NULL-pointer in hisi_sas_abort_task() in
special scenario when the device has been removed.

If an SMP task times-out, it will call hisi_sas_abort_task() to
recover. And currently there is a check in hisi_sas_abort_task() to
avoid the situation of processing the abort for the removed device.

However we have an ordering problem, in that we may reference a task for
the removed device before checking if the device has been removed.

Fix this by only referencing the sas_dev after we know it is still
present.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:10:44 -04:00
Xiang Chen
cd938e535e scsi: hisi_sas: check host frozen before calling "done" function
When the host is frozen in SCSI EH state, at any point after the LLDD
sets SAS_TASK_STATE_DONE for the sas_task task state, libsas may free
the task; see sas_scsi_find_task().

This puts the LLDD in a difficult position, in that once it sets
SAS_TASK_STATE_DONE for the task state it should not reference the
sas_task again. But the LLDD needs will check the sas_task indirectly in
calling task->task_done()->sas_scsi_task_done() or sas_ata_task_done()
(to check if the host is frozen state actually).

And the LLDD cannot set SAS_TASK_STATE_DONE for the task state after
task->task_done() is called (as the sas_task is free'd at this point).

This situation would seem to be a problem made by libsas.

To work around, check in the LLDD whether the host is in frozen state to
ensure it is ok to call task->task_done() function. If in the frozen
state, we rely on SCSI EH and libsas to free the sas_task directly.

We do not do this for the following IO types:

 - SMP - they are managed in libsas directly, outside SCSI EH
 - Any internally originated IO, for similar reason

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:10:44 -04:00
Xiang Chen
b81b6cce58 scsi: hisi_sas: Add some checks to avoid free'ing a sas_task twice
If the SCSI host enters EH, any pending IO will be processed by SCSI
EH. However it is possible that SCSI EH will try to abort the IO and
also at the same time the IO completes in the driver. In this situation
there is a small chance of freeing the sas_task twice.

Then if another IO re-uses freed sas_task before the second time of
free'ing sas_task, it is possible to free incorrect sas_task.

To avoid this situation, add some checks to increase reliability.  The
sas_task task state flag SAS_TASK_STATE_ABORTED is used to mutually
protect the LLDD and libsas freeing the task.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:10:43 -04:00
John Garry
c90a0bea4f scsi: hisi_sas: remove some unneeded structure members
This patch removes unneeded structure elements:

- hisi_sas_phy.dev_sas_addr: only ever written
	- Also remove associated function which writes it,
	  hisi_sas_init_add().

- hisi_sas_device.attached_phy: only ever written
	- Also remove code to set it in hisi_sas_dev_found()

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiaofei Tan
3ff0f0b657 scsi: hisi_sas: consolidate command check in hisi_sas_get_ata_protocol()
Currently we check the fis->command value in 2 locations in
hisi_sas_get_ata_protocol() switch statement. Fix this by consolidating
the check for fis->command value to 1 location only.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiang Chen
4f4e21b8ff scsi: hisi_sas: use dma_zalloc_coherent()
This is a warning coming from Coccinelle, and need to use new interface
dma_zalloc_coherent() instead of dma_alloc_coherent()/memset().

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00
Xiang Chen
5df41af4b1 scsi: hisi_sas: delete timer when removing hisi_sas driver
Delete timer for v1 and v3 hw when removing hisi_sas driver.

Signed-off-by: Xiang chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-18 19:32:51 -04:00