Commit Graph

19 Commits

Author SHA1 Message Date
Zhang, Yanmin
3d5505c56d PCI AER: multiple error support
When a root port receives the same errors more than once before the
kernel process them, the Multiple Error Messages Received flags are set
by hardware. Because the root port could only save one kind of
correctable error source id and another uncorrectable error source id at
the same time, the second message sender id is lost if the 2 messages
are sent from 2 different devices. This patch makes the kernel search
all devices under the root port when multiple messages are received.

Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-16 14:30:14 -07:00
Zhang, Yanmin
28eb27cf08 PCI AER: support invalid error source IDs
When the bus id part of error source id is equal to 0 or nosourceid=1,
make the kernel probe the AER status registers of all devices under the
root port to find the initial error reporter.

Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-16 14:30:13 -07:00
Zhang, Yanmin
70298c6e6c PCI AER: support Multiple Error Received and no error source id
Based on PCI Express AER specs, a root port might receive multiple
TLP errors while it could only save a correctable error source id
and an uncorrectable error source id at the same time. In addition,
some root port hardware might be unable to provide a correct source
id, i.e., the source id, or the bus id part of the source id provided
by root port might be equal to 0.

The patchset implements the support in kernel by searching the device
tree under the root port.

Patch 1 changes parameter cb of function pci_walk_bus to return a value.
When cb return non-zero, pci_walk_bus stops more searching on the
device tree.

Reviewed-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-16 14:30:13 -07:00
Andrew Patterson
43c1640884 PCI: Add support for turning PCIe ECRC on or off
Adds support for PCI Express transaction layer end-to-end CRC checking
(ECRC).  This patch will enable/disable ECRC checking by setting/clearing
the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
support ECRC.

The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
option. If this option is not set or is set to 'bios", the enable and
generation bits are left in whatever state that firmware/BIOS set them to.
The "off" setting turns them off, and the "on" option turns them on (if the
device supports it).

Turning ECRC on or off can be a data integrity versus performance
tradeoff.  In theory, turning it on will catch more data errors, turning
it off means possibly better performance since CRC does not need to be
calculated by the PCIe hardware and packet sizes are reduced.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-11 12:04:21 -07:00
Rafael J. Wysocki
22106368c9 PCI: PCIe portdrv: Remove struct pcie_port_service_id
The PCI Express port driver uses 'struct pcie_port_service_id' for
matching port service devices and drivers, but this structure
contains fields that duplicate information from the port device
itself (vendor, device, subvendor, subdevice) and fields that are not
used by any existing port service driver (class, class_mask,
drvier_data).  Also, both existing port service drivers (AER and
PCIe HP) don't even use the vendor and device fields for device
matching.  Therefore 'struct pcie_port_service_id' can be removed
altogether and the only useful members of it (port_type, service) can
be introduced directly into the port service device and port service
driver structures.  That simplifies the code quite a bit and reduces
its size.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-19 19:29:23 -07:00
Alex Chiang
cb4cb4ac73 PCIe: AER: during disable, check subordinate before walking
Commit 47a8b0cc (Enable PCIe AER only after checking firmware
support) wants to walk the PCI bus in the remove path to disable
AER, and calls pci_walk_bus for downstream bridges.

Unfortunately, in the remove path, we remove devices and bridges
in a depth-first manner, starting with the furthest downstream
bridge and working our way backwards.

The furthest downstream bridges will not have a dev->subordinate,
and we hit a NULL deref in pci_walk_bus.

Check for dev->subordinate first before attempting to walk the
PCI hierarchy below us.

Acked-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2009-03-12 15:09:51 -04:00
Andrew Patterson
1f9f13c8d5 PCI: Enable PCIe AER only after checking firmware support
The PCIe port driver currently sets the PCIe AER error reporting bits for
any root or switch port without first checking to see if firmware will grant
control. This patch moves setting these bits to the AER service driver
aer_enable_port routine.  The bits are then set for the root port and any
downstream switch ports after the check for firmware support (aer_osc_setup)
is made. The patch also unsets the bits in a similar fashion when the AER
service driver is unloaded.

Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@hobbes.lan>
2009-02-24 09:47:46 -08:00
Hidetoshi Seto
b0b801dd7d PCI: fix aer resume sanity check
What we have to check here before calling is err_handler->resume, not
->slot_reset.  Looks like a copy & paste error from report_slot_reset.

Acked-by: Yanmin Zhang <yanmin.zhang@intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-12-16 13:26:45 -08:00
Yu Zhao
270c66be9b PCI: fix AER capability check
The 'use pci_find_ext_capability everywhere' cleanup brought a new bug,
which makes the AER stop working.  Fix it by actually using find_ext_cap
instead of just find_cap.  Drop the unused config space size define while
we're at it.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-20 11:01:52 -07:00
Jesse Barnes
0927678f55 PCI: use pci_find_ext_capability everywhere
Remove some open coded (and buggy) versions of pci_find_ext_capability
in favor of the real routine in the PCI core.

Tested-by: Tomasz Czernecki <czernecki@gmail.com>
Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-10-20 11:01:51 -07:00
Bjorn Helgaas
531f254e5c PCIE: aer: use dev_printk when possible
Convert printks to use dev_printk().

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-25 16:05:16 -07:00
Harvey Harrison
66bef8c059 PCI: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:47:09 -07:00
Adrian Bunk
21c6847406 PCI: #if 0 pci_cleanup_aer_correct_error_status()
#if 0 the no longer used pci_cleanup_aer_correct_error_status().

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-20 21:47:02 -07:00
Randy Dunlap
d885c6b75b pci-aer: fix kernel-doc mistakes
Fix kernel-doc parameter names and ending block comments (change **/
to */).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Linas Vepstas <linas@linas.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-28 14:35:26 -08:00
Stephen Hemminger
f0dce41193 PCI aer: add pci_cleanup_aer_correct_aer_status
Function to clear bogus correctable errors. Analog to pci_aer_uncorrect_are_status.
The Marvell chips seem to start out with a bogus value that needs to be
cleared.

Yanmin ported it to 2.6.22-rc4 by fixing a fuzz patch applying info.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Acked-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:02:08 -07:00
Zhang, Yanmin
8d29bfb79e PCI: fix AER driver error information
Below patch fixes aer driver error information and enables aer driver
although CONFIG_ACPI=n.

As a matter of fact, the new patch is created from below 2 patches plus
a minor patch apply fuzz fixing. Because the second patch fixed a compilation
error introduced by the first patch, I merge them to facilitate bisect.


1) http://marc.info/?l=linux-kernel&m=117783233918191&w=2;
2) http://marc.info/?l=linux-mm-commits&m=118046936720790&w=2


Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-07-11 16:02:08 -07:00
David Howells
65f27f3844 WorkStruct: Pass the work_struct pointer instead of context data
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:55:48 +00:00
Greg Kroah-Hartman
b19441af18 PCI: fix __must_check warnings
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-26 17:43:53 -07:00
Zhang, Yanmin
6c2b374d74 PCI-Express AER implemetation: AER core and aerdriver
Patch 3 implements the core part of PCI-Express AER and aerdrv
port service driver.

When a root port service device is probed, the aerdrv will call
request_irq to register irq handler for AER error interrupt.

When a device sends an PCI-Express error message to the root port,
the root port will trigger an interrupt, by either MSI or IO-APIC,
then kernel would run the irq handler. The handler collects root
error status register and schedules a work. The work will call
the core part to process the error based on its type
(Correctable/non-fatal/fatal).

As for Correctable errors, the patch chooses to just clear the correctable
error status register of the device.

As for the non-fatal error, the patch follows generic PCI error handler
rules to call the error callback functions of the endpoint's driver. If
the device is a bridge, the patch chooses to broadcast the error to
downstream devices.

As for the fatal error, the patch resets the pci-express link and
follows generic PCI error handler rules to call the error callback
functions of the endpoint's driver. If the device is a bridge, the patch
chooses to broadcast the error to downstream devices.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-26 17:43:53 -07:00