When expander connected in x2 or x4 mode and with IO runnning, if
a cable from wideport is plugged out from the phy, IO's start failing
on all the targets.
Observed that when cable is pulled with IO running, cominit is
happening on all the links and IO's start dropping to 0 and eventually
the whole IO fails. Second observation, target is trying to open and
SCU is responding with "Open reject no destination".
A cause of the problem is when the port went from the "ready
configuring substate" back to "ready configuring substate" as a result
of phy being pulled off, scic suspended the port task scheduler
register. As a result no IO was allowed and in the "substate
configuring enter" routine the IO never goes back to 0. As a result
the port never comes out of "ready substate configuring".
The patch adds a mechanism of activate and deactivate phy when a port
link up, which fixes the problem.
Signed-off-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Marcin Tomczak <marcin.tomczak@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Arrange for task_contexts prepared for the wide targets to account for
all the attached phys in the port.
Signed-off-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Failure seen pulling a cable from a x4 port configured in manual port
configuration (MPC) mode (MPC mode is set by the the OEM paramaters
provided by the platform or isci_firmware.bin). While IO running to
devices behind and expander, plugging out the cable from phy is causing
IO failures and IO drops on disks and never recover.
It happens because during link up/down the phy were being taken out of
the port.
Fix: during link down the phy is kept in the same logical port.
Signed-off-by: Marcin Tomczak <marcin.tomczak@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The initial bcn filtering implementation was validated on a kernel
baseline that predated the switch to new libata error handling. Also,
prior to that conversion we borrowed the mvsas MVS_DEV_EH approach to
prevent the unwanted extra ap->ops->phy_reset(ap) that occurred in the
ata_bus_probe() path.
After the conversion to new libata eh resets at discovery are more
frequent and get filtered prematurely by IDEV_EH. The result is that
our bcn filtering has been blocked from running and at discovery and it
appears to stall discovery completion to the point of triggering hung
task timeouts. So, revert the implementation for now. When it returns
it will go into libsas proper.
The domain rediscovery that takes place due to ->lldd_I_T_nexus_reset()
events should now be properly waited for by the ata_port_wait_eh() call
in ata_port_probe(). So the hard coded delay in the isci
->lldd_I_T_nexus_reset() and other libsas drivers should help debounce
the libsas thread from seeing temporary device removals.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
A hard reset can timeout before or after the last phy in the
port goes away. If after, then notify the OS that the last
phy has failed.
The recovery for the failed hard reset has been removed.
This recovery code was unecessary in that the link would
recover from the failure normally by a new link reset sequence
or hotplug of the remote device.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Fixes a bug where any phy removed from the port set the port
state to "stopping" - do this only when the last phy removed
from the port.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Most of these simple dereference macros are longer than their open coded
equivalent. Deleting enum sci_controller_mode is thrown in for good
measure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The distinction between scic_sds_ scic_ and sci_ are no longer relevant
so just unify the prefixes on sci_. The distinction between isci_ and
sci_ is historically significant, and useful for comparing the old
'core' to the current Linux driver. 'sci_' represents the former core as
well as the routines that are closer to the hardware and protocol than
their 'isci_' brethren. sci == sas controller interface.
Also unwind the 'sds1' out of the parameter structs.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the distinction between these two implementations and unify on
isci_host (local instances named ihost). Hmmm, we had two
'oem_parameters' instances, one was unused... nice.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the distinction between these two implementations and unify on
isci_remote_device (local instances named idev).
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove the distinction between these two implementations and unify on
isci_port (local instances named iport). The duplicate '->owning_port' and
'->isci_port' in both isci_phy and isci_remote_device will be fixed in a later
patch... this is just the straightforward rename/unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
They are one in the same object so remove the distinction. The near
duplicate fields (owning_port, and isci_port) will be cleaned up
after the scic_sds_port isci_port unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
They are one in the same object so remove the distinction. The near
duplicate fields (owning_controller, and isci_host) will be cleaned up
after the scic_sds_contoller isci_host unification.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The tci_pool tracks our outstanding command slots which are also the 'index'
portion of our tags. Grabbing the tag early in ->lldd_execute_task let's us
drop the isci_host_can_queue() and ->was_tag_assigned_by_user infrastructure.
->was_tag_assigned_by_user required the task context to be duplicated in
request-local buffer. With the tci established early we can build the
task_context directly into its final location and skip a memcpy.
With the task context buffer at a known address at request construction we
have the opportunity/obligation to also fix sgl handling. This rework feels
like it belongs in another patch but the sgl handling and task_context are too
intertwined.
1/ fix the 'ab' pair embedded in the task context to point to the 'cd' pair in
the task context (previously we were prematurely linking to the staging
buffer).
2/ fix the broken iteration of pio sgls that assumes all sgls are relative to
the request, and does a dangerous looking reverse lookup of physical
address to virtual address.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We have unsafe references to remote devices that are notified to
disappear at lldd_dev_gone. In order to clean this up we need a single
canonical source for device lookups and stable references once a lookup
succeeds. Towards that end guarantee that domain_device.lldd_dev is
NULL as soon as we start the process of stopping a device. Any code
path that wants to safely lookup a remote device must do so through
task->dev->lldd_dev (isci_lookup_device()).
For in-flight references outside of scic_lock we need reference counting
to ensure that the device is not recycled before we are done with it.
Simplify device back references to just scic_sds_request.target_device
which is now the only permissible internal reference that is maintained
relative to the reference count.
There were two occasions where we wanted new i/o's to be treated as
SAS_TASK_UNDELIVERED but where the domain_dev->lldd_dev link is still
intact. Introduce a 'gone' flag to prevent i/o while waiting for libsas
to take action on the port down event.
One 'core' leftover is that we currently call
scic_remote_device_destruct() from isci_remote_device_deconstruct()
which is called when the 'core' says the device is stopped. It would be
more natural for the final put to trigger
isci_remote_device_deconstruct() but this implementation is deferred as
it requires other changes.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A tag is a 16 bit number where the upper four bits is a sequence number
and the remainder is the task context index (tci). Sanitize the macro
names and shave 256-bytes out of scic_sds_controller by reducing the size of
io_request_sequence.
scic_sds_io_tag_construct --> ISCI_TAG
scic_sds_io_tag_get_sequence --> ISCI_TAG_SEQ
scic_sds_io_tag_get_index() --> ISCI_TAG_TCI
scic_sds_io_sequence_increment() [delete / open code]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In the case where the hard reset process fails, each link in
the port is put through a link reset sequence.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When resetting a sata device in the domain we have seen occasions where
libsas prematurely marks a device gone in the time it takes for the
device to re-establish the link. This plays badly with software raid
arrays. Other libsas drivers have non-uniform delays in their reset
handlers to try to cover this condition, but not sufficient to close the
hole. Given that a sata device can take many seconds to recover we
filter bcns and poll for the device reattach state before notifying
libsas that the port needs the domain to be rediscovered. Once this has
been proven out at the lldd level we can think about uplevelling this
feature to a common implementation in libsas.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
[ use kzalloc instead of kmem_cache ]
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[ use eventq and time macros ]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Additional state machine cleanups:
o Remove static functions sci_state_machine_exit_state() and
sci_state_machine_enter_state()
o Combines sci_base_state_machine_construct() and
sci_base_state_machine_start() into a single function,
sci_init_sm()
o Remove sci_base_state_machine_stop() which is unused.
o Kill state_machine.[ch]
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
[fixed too large to inline functions]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This cleans up several areas of the state machine mechanism:
o Rename sci_base_state_machine_change_state to sci_change_state
o Remove sci_base_state_machine_get_state function
o Rename 'state_machine' struct member to 'sm' in client structs
o Shorten the name of request states
o Shorten state machine state names as follows:
SCI_BASE_CONTROLLER_STATE_xxx to SCIC_xxx
SCI_BASE_PHY_STATE_xxx to SCI_PHY_xxx
SCIC_SDS_PHY_STARTING_SUBSTATE_xxx to SCI_PHY_SUB_xxx
SCI_BASE_PORT_STATE_xxx to SCI_PORT_xxx and
SCIC_SDS_PORT_READY_SUBSTATE_xxx to SCI_PORT_SUB_xxx
SCI_BASE_REMOTE_DEVICE_STATE_xxx to SCI_DEV_xxx
SCIC_SDS_STP_REMOTE_DEVICE_READY_SUBSTATE_xxx to SCI_STP_DEV_xxx
SCIC_SDS_SMP_REMOTE_DEVICE_READY_SUBSTATE_xxx to SCI_SMP_DEV_xxx
SCIC_SDS_REMOTE_NODE_CONTEXT_xxx_STATE to SCI_RNC_xxx
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Rather than preallocating a list of timers and doling them out at runtime,
embed a struct timerlist in each object that needs one. A struct sci_timer
interface is introduced to manage the timer cancellation semantics which
currently need to guarantee the timer is cancelled while holding
spin_lock(ihost->scic_lock). Since the timeout functions also need to acquire
the lock it currently prevents the driver from using del_timer_sync() for
runtime cancellations.
del_timer_sync() is used however before the objects go out of scope.
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that any given object type only has one state_machine we can use
container_of() to get back to the given state machine owner.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unify the handlers and kill the state handler infrastructure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unify the handlers and kill the state handler implementations.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unused infrastructure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Unify the implementations and remove the state handlers.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Implement the stop handlers directly in scic_sds_port_stop()
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
remove the handler from the port state handler table and implement the
logic directly in scic_sds_port_start().
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
[remove a level of indirection]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This conversion was complicated by the fact that the ready state exit routine
took unconditional action beyond just stopping the substate machine (like in
previous conversions). In order to ensure identical behaviour every state
transition needs to be instrumented to catch ready-->!ready transitions and
execute scic_sds_port_invalidate_dummy_remote_node()
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
[fix ready state exit handling]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Name the table fields for consistancy and clarity.
Signed-off-by: Piotr Sawicki <piotr.sawicki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
While cleaning up the driver it is very tempting to convert scic_sds_get_*
macros to their open coded equivalent. They are all just pointer dereferences
*except* scic_sds_phy_get_port() which returns NULL if the phy is assigned to
the dummy port. Clarify this by renaming it to phy_get_non_dummy_port().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Move port configuration agent implementation
* Merge core/scic_sds_port.[ch] into port.[ch]
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Consolidate tiny header files
* Move files out of core/ (drop core/scic_sds_ prefix)
* Merge core/scic_sds_request.[ch] into request.[ch]
* Cleanup request.c namespace (clean forward declarations and global
namespace pollution)
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that the data structures are unified unify the implementation in
host.[ch] and cleanup namespace pollution.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Make scic_sds_port a member of isci_port and merge their lifetimes which
means removing the port table from scic_sds_controller in favor of the
one at the isci_host level. Merge ihost->sas_ports into ihost->ports.
_
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Make scic_sds_phy a member of isci_phy and merge their lifetimes which
means removing the phy table from scic_sds_controller in favor of the
one at that isci_host level.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This function is just overkill and its usage is inconsistent. Replace
with inlined code.
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Make it explicit that isci_host and scic_sds_controller are one in the same
object.
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
[removed ->ihost back pointer]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Moved the actual data structure that's read from the phy register to phy
header. Removed the parsing of identify address frame protocol bits as
that seemed not necessary and we can use existing information.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We need to remove the extra copies of identify address frame that's
being kept around. We only need the one copy that libsas is using.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[further cleanups]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Convert struct sci_sas_identify_address_frame to struct sas_identify_frame
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Converting of sata_fis_reg_d2h to dev_to_host_fis
Converting of sata_fis_reg_h2d to host_to_dev_fis
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The 'struct sci_base_object' was removed from the struct
scic_sds_port and was replaced by a pointer to
struct isci_port.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The 'struct sci_base_object' was removed from the struct
scic_sds_phy and was replaced by a pointer to
struct isci_phy.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The 'struct sci_base_object' was removed from the struct
scic_sds_controller and was replaced by a pointer to
struct isci_host.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that the core/lldd remote_device data structures are nominally unified
merge the corresponding sources into the top-level directory. Also move the
remote_node_context infrastructure which has no analog at the lldd level.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Removes unnecessary usage of BUG_ON macro, excluding core directory.
In some cases macro is unnecesary, check is done in caller function.
In other cases macro is replaced by if construction with
appropriate warning.
Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
[changed some survivable bug conditions to WARN_ONCE]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>