linux/drivers/firewire
Stefan Richter a481e97d3c firewire: sbp2: fix stall with "Unsolicited response"
Fix I/O stalls with some 4-bay RAID enclosures which are based on
OXUF936QSE:
  - Onnto dataTale RSM4QO, old firmware (not anymore with current
    firmware),
  - inXtron Hydra Super-S LCM, old as well as current firmware
when used in RAID-5 mode, perhaps also in other RAID modes.

The stalls happen during heavy or moderate disk traffic in periods that
are a multiple of 5 minutes, roughly twice per hour.  They are caused
by the target responding too late to an ORB_Pointer register write:
The target responds after Split_Timeout, hence firewire-core cancels
the transaction, and firewire-sbp2 fails the SCSI request.  The SCSI
core retries the request, which fails again (and again), hence the SCSI
core calls firewire-sbp2's abort handler (and even the Management_Agent
register write in the abort handler has the transaction timeout
problem).

During all that, the process which issued the I/O is stalled in I/O
wait state.

Meanwhile, the target actually acts on the first failed SCSI request:
It responds to the ORB_Pointer write later (seen in the kernel log as
"firewire_core: Unsolicited response") and also finishes the SCSI
request with proper status (seen in the kernel log as "firewire_sbp2:
status write for unknown orb").

So let's just ignore RCODE_CANCELLED in the transaction callback and
wait for the target to complete the ORB nevertheless.  This requires
a small modification to sbp2_cancel_orbs(); it now needs to call
orb->callback() regardless of whether fw_cancel_transaction() found the
transaction unfinished or finished.
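
The change described above can be illustrated with a small user-space
sketch.  This is not the driver code: struct orb is reduced to a few
fields, and transaction_failed(), cancel_transaction(), cancel_orb()
and record_completion() are hypothetical stand-ins for the transaction
callback check, fw_cancel_transaction(), the per-ORB step of
sbp2_cancel_orbs() and orb->callback().  The RCODE values are assumed
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Response codes; the numeric values are assumptions for this sketch. */
enum { RCODE_COMPLETE = 0x00, RCODE_SEND_ERROR = 0x10, RCODE_CANCELLED = 0x11 };

/* Heavily simplified, hypothetical model of an ORB. */
struct orb {
	bool transaction_pending;	/* ORB_Pointer write still in flight */
	bool completed;			/* set once the callback has run */
	int rcode;
	void (*callback)(struct orb *orb);
};

/*
 * Transaction callback check: a cancelled write (Split_Timeout expired)
 * is no longer treated as a failure, because the target may still fetch
 * and complete the ORB later.
 */
bool transaction_failed(int rcode)
{
	return rcode != RCODE_COMPLETE && rcode != RCODE_CANCELLED;
}

/* Stand-in for fw_cancel_transaction(): true if it was still pending. */
bool cancel_transaction(struct orb *orb)
{
	bool was_pending = orb->transaction_pending;

	orb->transaction_pending = false;
	return was_pending;
}

/*
 * Per-ORB step of the cancel path: the callback runs whether or not the
 * transaction was still pending, so an ORB whose write already timed out
 * is cleaned up too.
 */
void cancel_orb(struct orb *orb)
{
	assert(orb->callback != NULL);
	(void)cancel_transaction(orb);	/* result intentionally ignored */
	orb->rcode = RCODE_CANCELLED;
	orb->callback(orb);
}

/* Example callback that only records that completion happened. */
void record_completion(struct orb *orb)
{
	orb->completed = true;
}
```

In this sketch, an ORB whose write transaction has already finished
(transaction_pending == false) still gets its callback invoked by
cancel_orb(), which is the behaviour the abort path relies on once
RCODE_CANCELLED no longer fails the ORB early.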

A different solution is to increase Split_Timeout on the local node.
(Tested: 2000ms timeout; maybe 1000ms or something like that works too.
200ms is insufficient.  Standard is 100ms.)  However, I would rather not
do this because any software on any node could change the Split_Timeout
to something unsuitable.  Also, such a large Split_Timeout may be
undesirable for other purposes.
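
For reference, a sketch of how the timeouts mentioned above would map
onto the Split_Timeout register pair.  It assumes the usual CSR layout
(whole seconds in the low 3 bits of SPLIT_TIMEOUT_HI, 1/8000 s cycles
in the upper 13 bits of SPLIT_TIMEOUT_LO); check that layout against
the IEEE 1394 specification before relying on it.  The struct and
function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the two register values. */
struct split_timeout {
	uint32_t hi;	/* seconds, low 3 bits (assumed layout) */
	uint32_t lo;	/* 1/8000 s cycles, upper 13 bits (assumed layout) */
};

/* Encode a timeout in milliseconds; clamped to just under 8 seconds. */
struct split_timeout split_timeout_from_ms(uint32_t ms)
{
	struct split_timeout t;
	uint32_t cycles;

	if (ms > 7999)
		ms = 7999;
	cycles = ms * 8;		/* 8000 cycles per second */
	t.hi = (cycles / 8000) & 0x7;
	t.lo = (cycles % 8000) << 19;
	return t;
}

/* Decode the register pair back to milliseconds. */
uint32_t split_timeout_to_ms(struct split_timeout t)
{
	return ((t.hi & 0x7) * 8000 + (t.lo >> 19)) / 8;
}
```

Under these assumptions the standard 100ms is hi = 0 with 800 cycles in
lo, and the tested 2000ms is hi = 2 with lo = 0.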

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2010-08-19 20:28:25 +02:00
core-card.c Merge firewire branches to be released post v2.6.35 2010-08-02 10:09:04 +02:00
core-cdev.c Merge firewire branches to be released post v2.6.35 2010-08-02 10:09:04 +02:00
core-device.c Merge firewire branches to be released post v2.6.35 2010-08-02 10:09:04 +02:00
core-iso.c Merge firewire branches to be released post v2.6.35 2010-08-02 10:09:04 +02:00
core-topology.c firewire: core: fix fw_send_request kerneldoc comment 2010-07-13 09:47:47 +02:00
core-transaction.c firewire: core: fix upper bound of possible CSR allocations 2010-07-23 13:36:28 +02:00
core.h firewire: add isochronous multichannel reception 2010-07-29 23:09:18 +02:00
Kconfig tools/firewire: add userspace front-end of nosy 2010-07-27 11:04:11 +02:00
Makefile firewire: new driver: nosy - IEEE 1394 traffic sniffer 2010-07-27 11:04:10 +02:00
net.c Merge firewire branches to be released post v2.6.35 2010-08-02 10:09:04 +02:00
nosy-user.h firewire: nosy: endianess fixes and annotations 2010-07-27 11:04:11 +02:00
nosy.c firewire: nosy: use generic printk macros 2010-07-27 11:04:11 +02:00
nosy.h firewire: nosy: misc cleanups 2010-07-27 11:04:10 +02:00
ohci.c Merge firewire branches to be released post v2.6.35 2010-08-02 10:09:04 +02:00
ohci.h firewire: add CSR cmstr support 2010-06-10 08:36:37 +02:00
sbp2.c firewire: sbp2: fix stall with "Unsolicited response" 2010-08-19 20:28:25 +02:00