linux/arch/powerpc/platforms/pseries
Mike Mason c58dc575f3 powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset()
While adding native EEH support to Emulex and QLogic drivers, it was
discovered that dev->error_state was set to pci_channel_io_normal too
late in the recovery process. These drivers rely on error_state to
determine whether they can access the device in their slot_reset callback,
so error_state needs to be set to pci_channel_io_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) of why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS, the
adapter will be shut down. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function, which returns a Boolean value
based on pci_dev->error_state, to determine whether PCI MMIO or
DMA accesses are safe. If the wrapper returns TRUE, drivers must not
make PCI MMIO or DMA accesses to their hardware.

The pci_dev structure member error_state holds one of three values:
pci_channel_io_normal, pci_channel_io_frozen, or
pci_channel_io_perm_failure. pci_channel_offline(dev) returns TRUE if
error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.

The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume() before
calling the driver's resume callback. However, when the EEH driver calls
the driver's slot_reset callback from eeh_report_reset(), error_state
still incorrectly reads pci_channel_io_frozen.

Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for the Emulex and QLogic FC drivers and
any other drivers designed to use common code paths in these
two cases: i) paths called after the driver's slot_reset callback and
ii) paths called after the PCI slot is frozen but before the driver's
slot_reset callback is called. Case i) covers all driver paths executed to
reinitialize the hardware after a reset; case ii) covers all code paths
executed by driver kernel threads that run asynchronously to the main
driver thread, such as interrupt handlers and worker threads that process
driver work queues.

Emulex and QLogic FC drivers are designed with common code paths which
require that pci_channel_offline(dev) reflect the true state of the
hardware. The state transitions that the hardware takes from Normal
Operations to Slot Frozen to Reset to Normal Operations are documented
in the Power Architecture™ Platform Requirements+ (PAPR+), Table 75,
"PE State Control".

PAPR defines the following 3 states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)

An EEH error places the slot in state 2 (Frozen) and the adapter driver
is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset), and
eeh_reset_device() completes by placing the slot into state 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset(), which then calls the adapter's slot_reset callback. At
the time the adapter's slot_reset callback is called, the true state of
the hardware is Normal Operations, and this should be accurately reflected
by setting dev->error_state to pci_channel_io_normal.

The current implementation of the EEH driver does not do so; this change
corrects that deficiency.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Acked-by: Linas Vepstas <linasvepstas@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-15 15:23:53 +10:00
cmm.c powerpc: Add reboot notifier to Collaborative Memory Manager 2008-12-21 14:21:15 +11:00
dtl.c powerpc: Add write barrier before enabling DTL flags 2009-03-27 16:58:23 +11:00
eeh_cache.c [POWERPC] Add CONFIG_PPC_PSERIES_DEBUG to enable debugging for platforms/pseries 2008-04-24 21:08:12 +10:00
eeh_driver.c powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset() 2009-04-15 15:23:53 +10:00
eeh_event.c
eeh_sysfs.c [POWERPC] Show EEH per-device false positives 2007-06-14 22:29:55 +10:00
eeh.c powerpc/eeh: Make EEH device add/remove more robust 2008-11-06 09:25:15 +11:00
firmware.c [POWERPC] pseries/firmware.c should include pseries/pseries.h 2008-05-14 22:32:00 +10:00
hotplug-cpu.c powerpc/pseries: Fix cpu hotplug 2008-12-23 15:13:27 +11:00
hotplug-memory.c powerpc: Add missing sparsemem.h include 2009-02-10 14:39:09 +11:00
hvCall_inst.c
hvCall.S
hvconsole.c
hvcserver.c
iommu.c powerpc: Change u64/s64 to a long long integer type 2009-01-13 14:47:59 +11:00
Kconfig powerpc: Add virtual processor dispatch trace log 2009-03-24 13:47:28 +11:00
kexec.c powerpc/pseries: Call pseries_kexec_setup only on pseries 2008-06-30 22:30:57 +10:00
lpar.c powerpc: Remove unnecessary condition when sanity-checking WIMG bits 2008-07-15 12:24:59 +10:00
Makefile powerpc: Add virtual processor dispatch trace log 2009-03-24 13:47:28 +11:00
msi.c powerpc/pseries: Reject discontiguous/non-zero based MSI-X requests 2009-03-11 17:11:33 +11:00
nvram.c [POWERPC] Add missing of_node_put in pseries/nvram.c 2008-06-16 15:00:32 +10:00
pci_dlpar.c powerpc/pseries: Remove write only variable in PCI DLPAR 2009-02-11 13:37:59 +11:00
pci.c
phyp_dump.c powerpc: Printing fix for l64 to ll64 conversion: phyp_dump.c 2009-01-28 17:15:51 +11:00
plpar_wrappers.h powerpc: Add virtual processor dispatch trace log 2009-03-24 13:47:28 +11:00
power.c [POWERPC] Fix warning in pseries/power.c 2008-02-20 13:33:37 +11:00
pseries.h [POWERPC] Move prototype for find_udbg_vterm() into a header file 2008-04-17 10:00:59 +10:00
ras.c [POWERPC] Fix sparse warnings in arch/powerpc/platforms/pseries 2008-05-14 22:32:02 +10:00
reconfig.c powerpc/pseries: Failed reconfig notifier chain call cleanup 2009-03-24 13:43:52 +11:00
rtasd.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
scanlog.c [POWERPC] Assign PDE->data before gluing PDE into /proc tree 2008-05-05 16:47:14 +10:00
setup.c Remove asm/a.out.h files for all architectures without a.out support. 2008-09-06 19:30:24 +01:00
smp.c powerpc: Use cpu_thread_in_core in smp_init for of_spin_map 2008-10-21 15:19:12 +11:00
xics.c irq: update all arches for new irq_desc 2009-01-12 15:27:13 -08:00
xics.h powerpc/xics: Consolidate ipi message encode and decode 2008-10-13 16:24:16 +11:00