linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-28 13:51:44 +00:00

Author	SHA1	Message	Date
Richard A Lary	82578e192b	powerpc/eeh: Display eeh error location for bus and device For adapters which have devices under a PCIe switch/bridge it is informative to display information for both the PCIe switch/bridge and the device on which the bus error was detected. rebased to powerpc-next Signed-off-by: Richard A Lary <rlary@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-06 13:32:31 +10:00
Breno Leitao	8d3d50bf19	powerpc/eeh: Fix a bug when pci structure is null During a EEH recover, the pci_dev structure can be null, mainly if an eeh event is detected during cpi config operation. In this case, the pci_dev will not be known (and will be null) the kernel will crash with the following message: Unable to handle kernel paging request for data at address 0x000000a0 Faulting instruction address: 0xc00000000006b8b4 Oops: Kernel access of bad area, sig: 11 [#1] NIP [c00000000006b8b4] .eeh_event_handler+0x10c/0x1a0 LR [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 Call Trace: [c0000003a80dff00] [c00000000006b8a8] .eeh_event_handler+0x100/0x1a0 [c0000003a80dff90] [c000000000031f1c] .kernel_thread+0x54/0x70 The bug occurs because pci_name() tries to access a null pointer. This patch just guarantee that pci_name() is not called on Null pointers. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:02:47 +11:00
Frans Pop	8354be9c10	powerpc: Remove trailing space in messages Signed-off-by: Frans Pop <elendil@planet.nl> Cc: linuxppc-dev@ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-09 13:56:23 +11:00
Michael Ellerman	59e3f83702	powerpc/pseries: Use irq_has_action() in eeh_disable_irq() Rather than open-coding our own check, use irq_has_action() to check if an irq has an action - ie. is "in use". irq_has_action() doesn't take the descriptor lock, but it shouldn't matter - we're just using it as an indicator that the irq is in use. disable_irq_nosync() will take the descriptor lock before doing anything also. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:54 +11:00
Zhang, Yanmin	70298c6e6c	PCI AER: support Multiple Error Received and no error source id Based on PCI Express AER specs, a root port might receive multiple TLP errors while it could only save a correctable error source id and an uncorrectable error source id at the same time. In addition, some root port hardware might be unable to provide a correct source id, i.e., the source id, or the bus id part of the source id provided by root port might be equal to 0. The patchset implements the support in kernel by searching the device tree under the root port. Patch 1 changes parameter cb of function pci_walk_bus to return a value. When cb return non-zero, pci_walk_bus stops more searching on the device tree. Reviewed-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-06-16 14:30:13 -07:00
Mike Mason	c58dc575f3	powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset() While adding native EEH support to Emulex and Qlogic drivers, it was discovered that dev->error_state was set to pci_io_channel_normal too late in the recovery process. These drivers rely on error_state to determine if they can access the device in their slot_reset callback, thus error_state needs to be set to pci_io_channel_normal in eeh_report_reset(). Below is a detailed explanation (courtesy of Richard Lary) as to why this is necessary. Background: PCI MMIO or DMA accesses to a frozen slot generate additional EEH errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the adapter will be shutdown. To avoid triggering excessive EEH errors and an undesirable adapter shutdown, some drivers use the pci_channel_offline(dev) wrapper function to return a Boolean value based on the value of pci_dev->error_state to determine if PCI MMIO or DMA accesses are safe. If the wrapper returns TRUE, drivers must not make PCI MMIO or DMA access to their hardware. The pci_dev structure member error_state reflects one of three values, 1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3) pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure. The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the point where the PCI slot is frozen. Currently, the EEH driver restores dev->error_state to pci_channel_io_normal in eeh_report_resume() before calling the driver's resume callback. However, when the EEH driver calls the driver's slot_reset callback() from eeh_report_reset(), it incorrectly indicates the error state is still pci_channel_io_frozen. Waiting until eeh_report_resume() to restore dev->error_state to pci_channel_io_normal is too late for Emulex and QLogic FC drivers and any other drivers which are designed to use common code paths in these two cases: i) those called after the driver's slot_reset callback() and ii) those called after the PCI slot is frozen but before the driver's slot_reset callback is called. Case i) all driver paths executed to reinitialize the hardware after a reset and case ii) all code paths executed by driver kernel threads that run asynchronous to the main driver thread, such as interrupt handlers and worker threads to process driver work queues. Emulex and QLogic FC drivers are designed with common code paths which require that pci_channel_offline(dev) reflect the true state of the hardware. The state transitions that the hardware takes from Normal Operations to Slot Frozen to Reset to Normal Operations are documented in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75. PE State Control. PAPR defines the following 3 states: 0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed (Normal Operations) 1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled 2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled (Slot Frozen) An EEH error places the slot in state 2 (Frozen) and the adapter driver is notified that an EEH error was detected. If the adapter driver returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls eeh_reset_device() to place the slot into state 1 (Reset) and eeh_reset_device completes by placing the slot into State 0 (Normal Operations). Upon return from eeh_reset_device(), the EEH driver calls eeh_report_reset, which then calls the adapter's slot_reset callback. At the time the adapter's slot_reset callback is called, the true state of the hardware is Normal Operations and should be accurately reflected by setting dev->error_state to pci_channel_io_normal. The current implementation of EEH driver does not do so and requires this change to correct this deficiency. Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Acked-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-04-15 15:23:53 +10:00
Mike Mason	8535ef05a6	powerpc/eeh: Only disable/enable LSI interrupts in EEH The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled. Signed-off-by: Mike Mason <mmlnx@us.ibm.com> Acked-by: Linas Vepstas <linasvepstas@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:08 +11:00
Tony Breeds	dcfcfe7567	powerpc: Guard print_device_node_tree() with #if 0 Currently print_device_node_tree() isn't called but it can be useful for debugging. Leave the function there but hide it behind '#if 0' to save it being rewritten. If you want to call it you're already editing this file anyway. ;P Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-20 16:34:57 +10:00
Andrew Morton	8e01520c06	[POWERPC] Fix warning in pseries/eeh_driver.c Fix this: /usr/src/devel/arch/powerpc/platforms/pseries/eeh_driver.c: In function 'print_device_node_tree': /usr/src/devel/arch/powerpc/platforms/pseries/eeh_driver.c:55: warning: ISO C90 forbids mixed declarations and code also make that function look like it's part of Linux. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-16 15:00:44 +10:00
Stephen Rothwell	b76e5e9398	[POWERPC] EEH: Avoid a possible NULL pointer dereference Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-12-11 13:46:12 +11:00
Linas Vepstas	5f1a7c811b	[POWERPC] EEH: Report errors as soon as possible Do not wait for the pci slot status before reporting an error to the device driver. Some systems may take many seconds to report the slot status, and this can confuse unsuspecting device drivers. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-12-03 13:56:26 +11:00
Linas Vepstas	2a50f144fc	[POWERPC] EEH: Drivers that need reset trump others Bugfix: if a driver controlling one part of a multi-function PCI card has asked for a reset, honor that request above all others. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-11-08 14:15:32 +11:00
Linas Vepstas	638799b335	[POWERPC] EEH: Clean up comments Clean up commentary, remove dead code. Signed-off-by Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-11-08 14:15:32 +11:00
Linas Vepstas	3c8c90ab88	[POWERPC] Tweak EEH copyright info Twiddle the copyright notices. Per current guidelines, the use of the (C) or (c) in source code is deprecated. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c \| 6 +++++- arch/powerpc/platforms/pseries/eeh_cache.c \| 3 ++- arch/powerpc/platforms/pseries/eeh_driver.c \| 6 +++--- 3 files changed, 10 insertions(+), 5 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-06-14 22:29:56 +10:00
Linas Vepstas	17213c3bf6	[POWERPC] Assorted janitorial EEH cleanups Assorted minor cleanups to EEH code; -- use literals, use kerneldoc format. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/eeh.c \| 13 ++++++++++--- arch/powerpc/platforms/pseries/eeh_driver.c \| 7 ++++--- include/asm-powerpc/ppc-pci.h \| 18 +++++++++++++++--- 3 files changed, 29 insertions(+), 9 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-10 21:28:13 +10:00
Linas Vepstas	b455b24cf2	[POWERPC] EEH: Split up long error msg Make some minor adjustments to the EEH error messages. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:35:01 +10:00
Linas Vepstas	ede8ca269f	[POWERPC] EEH: log error only after driver notification. It turns out many/most versions of firmware enable MMIO when the slto-error-detail rtas call is made (in violation of the architecture). Thus, it would be best to call slot-error-detail only after notifying device drivers of a freeze, as otherwise, a variety of strange and unexpected things may happen. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-09 16:35:00 +10:00
Stephen Rothwell	e2eb63927b	[POWERPC] Rename get_property to of_get_property: arch/powerpc Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-13 03:55:19 +10:00
Linas Vepstas	4980d5eb75	[POWERPC] EEH: restructure multi-function support Rework how multi-function PCI devices are identified and traversed. This fixes a bug with multi-function recovery on Power4 that was introduced by a recent Power4 EEH patch. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-22 22:52:57 +11:00
Linas Vepstas	fa1be476a2	[POWERPC] EEH: verify state change After requesting a state change, verify that the state change actually ocurred, and the system ends up in the expected state. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-22 22:52:56 +11:00
Linas Vepstas	d0ab95ca98	[POWERPC] EEH: rm un-needed data The EEH event notification system passes around data that is not needed or at least, not used properly. Stop passing this data; get it in a more reliable fashion. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-22 22:52:55 +11:00
Linas Vepstas	5794dbcbab	[POWERPC] EEH: multifunction recovery bugfix If the second or higher function of a multi-function device fails to recover, this failure is not reported upwards. Fix this. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-22 22:52:53 +11:00
Linas Vepstas	90fdd6130f	[POWERPC] EEH: hotplug recovery bugfix If a device driver does not have native PCI error recovery, a hotplug error recovery will be attemped. In this case, the device driver will not report back whether its healthy or not; simply assume that it is. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-22 22:52:52 +11:00
Linas Vepstas	e0f90b6418	[POWERPC] EEH: Add clarifying messages. There are multiple code patchs tht resuls in a "permanent failure"; when examining rare events, it can be hard to see which was taken. This patch adds printk's to assist. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-03-22 22:52:50 +11:00
Linas Vepstas	a885902de3	[POWERPC] Clarify EEH error message Clarify error message re EEH permanent failure. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-01-24 21:13:56 +11:00
Linas Vepstas	d0e70341c0	[POWERPC] EEH recovery tweaks If one attempts to create a device driver recovery sequence that does not depend on a hard reset of the device, but simply just attempts to resume processing, then one discovers that the recovery sequence implemented on powerpc is not quite right. This patch fixes this up. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-12-08 17:10:18 +11:00
Linas Vepstas	6a1ca373a1	[POWERPC] EEH: support MMIO enable recovery step Update to the PowerPC PCI error recovery code. Add code to enable MMIO if a device driver reports that it is capable of recovering on its own. One anticipated use of this having a device driver enable MMIO so that it can take a register dump, which might then be followed by the device driver requesting a full reset. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-09-21 22:59:20 +10:00
Linas Vepstas	cb5b562444	[POWERPC] EEH: code comment cleanup Clean up subroutine documentation; mostly formatting changes, with some new content. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-09-21 22:59:10 +10:00
Jeremy Kerr	954a46e2d5	[POWERPC] pseries: Constify & voidify get_property() Now that get_property() returns a void *, there's no need to cast its return value. Also, treat the return value as const, so we can constify get_property later. pseries platform changes. Built for pseries_defconfig Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-07-31 15:55:04 +10:00
Adrian Bunk	80f7228b59	typo fixes: occuring -> occurring Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 18:27:16 +02:00
Linas Vepstas	0aa8d15b01	[POWERPC] pseries: Print PCI slot location code on failure The PCI error recovery code will printk diagnostic info when a PCI error event occurs. Change the messages to include the slot location code, which is how most sysadmins will know the device. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-06-21 15:01:32 +10:00
Linas Vepstas	4240545661	[PATCH] powerpc/pseries: Increment fail counter in PCI recovery When a PCI device driver does not support PCI error recovery, the powerpc/pseries code takes a walk through a branch of code that resets the failure counter. Because of this, if a broken PCI card is present, the kernel will attempt to reset it an infinite number of times. (This is annoying but mostly harmless: each reset takes about 10-20 seconds, and uses almost no CPU time). This patch preserves the failure count across resets. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-05-19 13:51:12 +10:00
Linas Vepstas	ac325acd50	[PATCH] powerpc/pseries: clear PCI failure counter if no new failures The current PCI error recovery system keeps track of the number of PCI card resets, and refuses to bring a card back up if this number is too large. The goal of doing this was to avoid an infinite loop of resets if a card is obviously dead. However, if the failures are rare, but the machine has a high uptime, this mechanism might still be triggered; this is too harsh. This patch will avoids this problem by decrementing the fail count after an hour. Thus, as long as a pci card BSOD's less than 6 times an hour, it will continue to be reset indefinitely. If it's failure rate is greater than that, it will be taken off-line permanently. This patch is larger than it might otherwise be because it changes indentation by removing a pointless while-loop. The while loop is not needed, as the handler is invoked once fo each event (by schedule_work()); the loop is leftover cruft from an earlier implementation. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-22 18:46:13 +10:00
Linas Vepstas	a219be2cf4	[PATCH] powerpc/pseries: fix device name printing, again. The recent patch to print device names in EEH reset messages was lacking ... this patch works better. Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:37:02 +11:00
Linas Vepstas	8df83028cf	[PATCH] powerpc/pseries: print message if EEH recovery fails The current code prints an ambiguous message if the recovery of a failed PCI device fails. Give this special case its own unique message. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-04-01 22:35:01 +11:00
Linas Vepstas	b4f382a3e5	[PATCH] powerpc/pseries: Cleanup device name printing. This avoids printk'ing a NULL string. Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-03-27 14:48:46 +11:00
Olaf Hering	273d280381	[PATCH] powerpc: fix NULL pointer in handle_eeh_events This patch fixes a crash in handle_eeh_events, but ethtool -t still doesnt work right. ... pepino:~ # cpu 0x3: Vector: 300 (Data Access) at [c00000005192bbe0] pc: c00000000004a380: .handle_eeh_events+0xe0/0x23c lr: c00000000004a374: .handle_eeh_events+0xd4/0x23c sp: c00000005192be60 msr: 9000000000009032 dar: 268 dsisr: 40000000 current = 0xc0000001fe7bf1a0 paca = 0xc00000000048b280 pid = 16322, comm = eehd enter ? for help [c00000005192bf00] c00000000004a808 .eeh_event_handler+0xcc/0x130 [c00000005192bf90] c000000000025e00 .kernel_thread+0x4c/0x68 ... (none):/# /usr/sbin/ethtool -i eth0 driver: e100 version: 3.5.10-k2-NAPI firmware-version: N/A bus-info: 0000:21:01.0 (none):/# /usr/sbin/ethtool -t eth0 Call Trace: [C00000000F8DEFF0] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000F8DF0A0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000F8DF150] [C000000000049E58] .eeh_check_failure+0x10c/0x138 [C00000000F8DF1E0] [C0000000002DFDB0] .e100_hw_reset+0x70/0xf4 [C00000000F8DF270] [C0000000002E1BBC] .e100_hw_init+0x2c/0x260 [C00000000F8DF310] [C0000000002E2464] .e100_loopback_test+0x8c/0x220 [C00000000F8DF3C0] [C0000000002E28DC] .e100_diag_test+0xdc/0x16c [C00000000F8DF490] [C000000000420BE0] .dev_ethtool+0xf24/0x14f8 [C00000000F8DF8F0] [C00000000041F4A8] .dev_ioctl+0x5cc/0x740 [C00000000F8DFA20] [C00000000040FEFC] .sock_ioctl+0x3d0/0x404 [C00000000F8DFAC0] [C0000000000D513C] .do_ioctl+0x68/0x108 [C00000000F8DFB50] [C0000000000D56B0] .vfs_ioctl+0x4d4/0x510 [C00000000F8DFC10] [C0000000000D5740] .sys_ioctl+0x54/0x94 [C00000000F8DFCC0] [C0000000000FB6EC] .ethtool_ioctl+0x11c/0x150 [C00000000F8DFD60] [C0000000000F7E40] .compat_sys_ioctl+0x338/0x3bc [C00000000F8DFE30] [C00000000000871C] syscall_exit+0x0/0x40 EEH: Detected PCI bus error on device 0000:21:01.0 EEH: This PCI device has failed 1 times since last reboot: <NULL> - modprobe: FATAL: Could not load /lib/modules/2.6.16-rc4-git7/modules.dep: No such file or directory Cannot get strings: No such device (none):/# (none):/# EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 (none):/# Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> EEH: This PCI device has failed 1 times since last reboot: <NULL> - EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> EEH: This PCI device has failed 1 times since last reboot: <NULL> - EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2 Call Trace: [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable) [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8 [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154 [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8 [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110 [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250 [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140 [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68 EEH: Detected PCI bus error on device <NULL> and so on Signed-off-by: Olaf Hering <olh@suse.de> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2006-02-28 16:25:54 +11:00
Al Viro	d04e4e115b	[PATCH] eeh_driver NULL noise removal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-02-07 20:58:33 -05:00
Paul Mackerras	18eb3b398d	powerpc: Fix up some compile errors in the PCI error recovery code <asm/systemcfg.h> is gone now, and the PCI error recovery constants in include/linux/pci.h changed their names in the process of getting accepted. Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 5a2516156c591fc3d2059fbd93f97e15eb6010d6 commit)	2006-01-10 15:32:31 +11:00
Linas Vepstas	3914ac7b0e	[PATCH] powerpc: handle multifunction PCI devices properly 239-eeh-multifunction-consolidate.patch New-style firmware will often place multiple different functions under a non-EEH-aware parent. However, these devices might share a common PE "partition endpoint" and config address, ad thus any EEH events will affect all of the devices in common. This patch makes the effort to find all of these common devices and handle them together. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 216810296bb97d39da8e176822e9de78d2f00187 commit)	2006-01-10 15:30:23 +11:00
Linas Vepstas	b6495c0c8f	[PATCH] powerpc: Don't continue with PCI Error recovery if slot reset failed. 238-eeh-stop-if-reset_failed.patch If the firmware is unable to reset the PCI slot for some reason, then don't attempt any further recovery steps after that point. Instead, mark the device as permanently failed. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e06b942521eb2cdaf232726f45a820d5837acb12 commit)	2006-01-10 15:30:14 +11:00
Linas Vepstas	9fb40eb883	[PATCH] powerpc: Remove duplicate code 234-eeh-find-pe.patch The find_device_pe() routine is duplicated in two files. Remove one of the two copies, declare the other extern. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from 48408e708282d4d0269136ff27ea5acbd9410b5a commit)	2006-01-10 15:29:33 +11:00
Linas Vepstas	77bd741561	[PATCH] powerpc: PCI Error Recovery: PPC64 core recovery routines Various PCI bus errors can be signaled by newer PCI controllers. The core error recovery routines are architecture dependent. This patch adds a recovery infrastructure for the PPC64 pSeries systems. Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> (cherry picked from e8ca11b460c4c9c7fa6b529be221529ebd770e38 commit)	2006-01-10 15:28:32 +11:00

43 Commits