linux

Author	SHA1	Message	Date
Borislav Petkov	cab4d27764	amd64_edac: Do not falsely trigger kerneloops An unfortunate "WARNING" in the message amd64_edac dumps when the system doesn't support DRAM ECC or ECC checking is not enabled in the BIOS used to trigger kerneloops which qualified the message as an OOPS thus misleading the users. See, e.g. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/422536 http://bugzilla.kernel.org/show_bug.cgi?id=15238 Downgrade the message level to KERN_NOTICE and fix the formulation. Cc: stable@kernel.org # .32.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-02-11 20:32:14 +01:00
Tamas Vincze	118f3e1afd	edac: i5000_edac critical fix panic out of bounds EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4) Kernel panic - not syncing: EDAC MC0: Uncorrected Error (XEN) Domain 0 crashed: 'noreboot' set - not rebooting. This happens because FERR_NF_FBD bit 28 is not updated on i5000. Due to that, both bits 28 and 29 may be equal to one, returning channel = 3. As this value is invalid, EDAC core generates the panic. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14568 Signed-off-by: Tamas Vincze <tom@vincze.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-16 12:15:38 -08:00
Roel Kluin	926311fd7d	amd64_edac: Ensure index stays within bounds in amd64_get_scrub_rate Add a missing iterator variable thus fixing the conditional of the for-loop in amd64_get_scrub_rate(). Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-01-15 10:45:58 +01:00
Borislav Petkov	5213c32f9d	edac, pci: remove pesky debug printk Do not spam the logs needlessly with the sole info that edac_pci_dev_parity_clear is being called. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:09 +01:00
Borislav Petkov	92389102b6	amd64_edac: restrict PCI config space access Do not access F2x19[0,4] on K8 since they're undefined there. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:08 +01:00
Borislav Petkov	43f5e68733	amd64_edac: fix forcing module load/unload Clear the override flag after force-loading the module. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:08 +01:00
Borislav Petkov	56b34b91e2	amd64_edac: make driver loading more robust Currently, the module does not initialize fully when the DIMMs aren't ECC but remains still loaded. Propagate the error when no instance of the driver is properly initialized and prevent further loading. Reorganize and polish error handling in amd64_edac_init() while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:07 +01:00
Borislav Petkov	8f68ed9728	amd64_edac: fix driver instance freeing Fix use-after-free errors by pushing all memory-freeing calls to the end of amd64_remove_one_instance(). Reported-by: Darren Jenkins <darrenrjenkins@gmail.com> LKML-Reference: <1261370306.11354.52.camel@ICE-BOX> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:07 +01:00
Borislav Petkov	603adaf6b3	amd64_edac: fix K8 chip select reporting Fix the case when amd64_debug_display_dimm_sizes() reports only half the amount of DRAM on it because it doesn't account for when the single DCT operates in 128-bit mode and merges chip selects from different DIMMs. Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-24 11:07:07 +01:00
Linus Torvalds	661e338f72	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: edac, mce, amd: silence GART TLB errors edac, mce: correct corenum reporting	2009-12-16 10:09:43 -08:00
Borislav Petkov	256f7276af	edac, mce, amd: silence GART TLB errors Although reporting of benign GART TLB errors is disabled in __mcheck_cpu_apply_quirks, those are still being logged, and, as a result, trip up amd64_edac. Pull up reporting check so that machines with loaded edac module bail out early and don't spit fragments into dmesg. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-16 17:48:39 +01:00
Nils Carlson	bbead2104e	edac: i5100 add 6 ranks per channel Add support for 6 ranks per channel to the i5100 chipset. I have tested the patch as far as possible with correctible errors and things appear good. The DIMM mapping is correct for our board, but boards may differ. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Acked-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Nils Carlson	295439f2a3	edac: i5100 add scrubbing Add scrubbing to the i5100 chipset. The i5100 chipset only supports one scrubbing rate, which is not constant but dependent on memory load. The rate returned by this driver is an estimate based on some experimentation, but is substantially closer to the truth than the speed supplied in the documentation. Also, scrubbing is done once, and then a done-bit is set. This means that to accomplish continuous scrubbing a re-enabling mechanism must be used. I have created the simplest possible such mechanism in the form of a work-queue which will check every five minutes. This interval is quite arbitrary but should be sufficient for all sizes of system memory. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Nils Carlson	b18dfd05f9	edac: i5100 clean controller to channel terms The i5100 driver uses the word controller instead of channel in a lot of places, this is simply a cleanup of the patch. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Borislav Petkov	35d8069234	edac, mce: correct corenum reporting Fix core number reporting with NB MCEs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-15 15:52:13 +01:00
Borislav Petkov	505422517d	x86, msr: Add support for non-contiguous cpumasks The current rd/wrmsr_on_cpus helpers assume that the supplied cpumasks are contiguous. However, there are machines out there like some K8 multinode Opterons which have a non-contiguous core enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see http://www.gossamer-threads.com/lists/linux/kernel/1160268. This patch fixes out-of-bounds writes (see URL above) by adding per-CPU msr structs which are used on the respective cores. Additionally, two helpers, msrs_{alloc,free}, are provided for use by the callers of the MSR accessors. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091211171440.GD31998@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-11 10:59:21 -08:00
Borislav Petkov	df5b1606bd	amd64_edac: bump driver version This was long overdue ... Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:38:14 +01:00
Andrew Morton	18ba54ac12	amd64_edac: fix use-uninitialised bug drivers/edac/amd64_edac.c: In function 'amd64_edac_init': drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:38:13 +01:00
Borislav Petkov	bdc30a0c8c	amd64_edac: correct sys address to chip select mapping The routine does the reverse mapping of the error address of a CECC back to the node id, DRAM controller and chip select of the DIMM which caused the error. We should lookup the channel using the syndromes _only_ when the DCTs are ganged so fix that. Also, add an early exit when there's an error while scanning for the csrow thus decreasing indentation levels for better readability. Finally, fixup comments. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:38:12 +01:00
Borislav Petkov	bfc04aec7d	amd64_edac: add a leaner syndrome decoding algorithm Instead of using the whole syndrome tables for channel decoding, use a set of eigenvectors with which the tables can be generated to search for the syndrome in error. The algorithm operates independently of symbol size and can be used for both x4 and x8 syndromes. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-08 13:37:59 +01:00
Borislav Petkov	986a42a250	amd64_edac: remove early hw support check The .probe_valid_hardware low_ops member checked whether the DCTs are in DDR3 mode and bailed out if so. Now that all the needed changes for DDR3 support are in place, remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:31 +01:00
Borislav Petkov	6b4c0bdeb0	amd64_edac: detect DDR3 memory type Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:31 +01:00
Borislav Petkov	239642fe19	edac: add memory types strings for debugging Instead of using deeply-nested conditionals for dumping the DIMM type in debug mode, add a strings array of the supported DIMM types. This is useful in cases where an edac driver supports multiple DRAM types and is only defined in debug builds. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:31 +01:00
Borislav Petkov	cec7924f56	edac, mce: update AMD F10h revD check F10h revD start with model number 8. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:30 +01:00
Borislav Petkov	1f6bcee75e	amd64_edac: remove unneeded extract_error_address wrapper Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:30 +01:00
Borislav Petkov	44e9e2ee21	amd64_edac: rename StinkyIdentifier SystemAddress -> sys_addr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:30 +01:00
Borislav Petkov	ad858bfa14	amd64_edac: remove superfluous dbg printk Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:29 +01:00
Borislav Petkov	1433eb9903	amd64_edac: enhance address to DRAM bank mapping Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10 and all K8 flavors and remove kludgy table of pseudo values. Add a low_ops->dbam_to_cs member which is family-specific and replaces low_ops->dbam_map_to_pages since the pages calculation is a one liner now. Further cleanups, while at it: - shorten family name defines - align amd64_family_types struct members Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:29 +01:00
Borislav Petkov	d16149e8c3	amd64_edac: cleanup f10_early_channel_count Do not read DCLR[01] again since this is done in amd64_read_mc_registers() earlier. There can be more than two physical DIMMs present so clamp the channels value to max 2. Also, do not report DCT data width - it is also done earlier. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:29 +01:00
Borislav Petkov	8566c4df16	amd64_edac: dump DIMM sizes on K8 too Extend f10_debug_display_dimm_sizes to dump the logical DIMMs configuration on K8 revF too. Remove the ganged arg since we print the DCT operating mode (ganged vs unganged) earlier. Also, DCT csrow configuration is relevant therefore dump it as KERN_DEBUG instead of only on debug builds. Remove misleading DIMM output since there's no reliable way of mapping of chip selects to actual physical DIMMs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:28 +01:00
Borislav Petkov	8de1d91e62	amd64_edac: cleanup rest of amd64_dump_misc_regs Clarify bitfields description, add PCI config function/offset names to registers for easy reference, simplify code layout, remove unneeded info. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:28 +01:00
Borislav Petkov	68798e1760	amd64_edac: cleanup DRAM cfg low debug output Carve out the register-specific debug statements into a separate function, clarify meanings of the single bitfields in the register, remove irrelevant output and macros. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:28 +01:00
Borislav Petkov	6ba5dcdc44	amd64_edac: wrap-up pci config read error handling Add a pci config read wrapper for signaling pci config space access errors instead of them being visible only on a debug build. This is important on amd64_edac since it uses all those pci config register values to access the DRAM/DIMM configuration of the nodes. In addition, the wrapper makes a _lot_ (look at the diffstat!) of error handling code superfluous and improves much of the overall code readability by removing error handling details out of the way. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Borislav Petkov	f6d6ae9657	amd64_edac: unify MCGCTL ECC switching Unify almost identical code into one function and remove NUMA-specific usage (specifically cpumask_of_node()) in favor of generic topology methods. Remove unused defines, while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Rusty Russell	ba578cb34a	cpumask: use modern cpumask style in drivers/edac/amd64_edac.c cpumask_t -> struct cpumask, and don't put one on the stack. (Note: this is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Borislav Petkov	e97f8bb8ce	amd64_edac: make DRAM regions output more human-readable Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits are only a subset of the 64bit register, the values are correct since the remaining bits are Read-As-Zero and no shifting is needed. Also, cleanup DRAM base/limit debug output. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:27 +01:00
Borislav Petkov	72381bd55e	amd64_edac: clarify DRAM CTL debug reporting Make debug info formulations about the DRAM and DCT configuration of the machine more human readable. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-12-07 19:14:26 +01:00
Ingo Molnar	26fb20d008	Merge branch 'perf/mce' into perf/core Merge reason: It's ready for v2.6.33. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-03 20:11:06 +01:00
Borislav Petkov	17adea01b9	amd64_edac: fix CECCs reporting Shift error type bits properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-11-04 14:04:06 +01:00
Li Hong	a3c4c58085	amd64_edac: fix a wrong goto clause in amd64_edac.c In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is called before pci_register_driver. If it fails, should exit with err directly. Signed-off-by: Li Hong <lihong.hi@gmail.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-11-04 14:02:32 +01:00
Keith Mannthey	c2494ace99	edac: i5100 fix initialization code Allow csrows to properly initialize when the topology only has active channels on 2 and 3. This new check allows proper detection and initialization in this topology. Only checking the first mrt that represented channels 0 and 1 is not sufficient. I also fixed up the related debug information path. I can submit as a 2nd patch if needed. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Acked-by: Aristeu Rozanski <aris@ruivo.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:30 -07:00
Ira W. Snyder	0616fb003d	edac: i5400 fix missing CONFIG_PCI define When building without CONFIG_PCI the edac_pci_idx variable is unused, causing a build-time warning. Wrap the variable in #ifdef CONFIG_PCI, just like the rest of the PCI support. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:30 -07:00
Jeff Roberson	156edd4aaa	edac: i5400 fix csrow mapping The i5400 EDAC driver has several bugs with chip-select row computation which most likely lead to bugs in detailed error reporting. Attempts to contact the authors have gone mostly unanswered so I am presenting my diff here. I do not subscribe to lkml and would appreciate being kept in the cc. The most egregious problem was miscalculating the addresses of MTR registers after register 0 by assuming they are 32bit rather than 16. This caused the driver to miss half of the memories. Most motherboards tend to have only 8 dimm slots and not 16, so this may not have been noticed before. Further, the row calculations multiplied the number of dimms several times, ultimately ending up with a maximum row of 32. The chipset only supports 4 dimms in each of 4 channels, so csrow could not be higher than 4 unless you use a row per-rank with dual-rank dimms. I opted to eliminate this behavior as it is confusing to the user and the error reporting works by slot and not rank. This gives a much clearer view of memory by slot and channel in /sys. Signed-off-by: Jeff Roberson <jroberson@jroberson.net> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-10-29 07:39:30 -07:00
Borislav Petkov	4997811e3b	amd64_edac: fix DRAM base and limit extraction masks, v2 This is a proper fix as a follow-up to `66216a7` and `916d11b`. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-16 18:51:22 +02:00
Borislav Petkov	fb2531953f	mce, edac: Use an atomic notifier for MCEs decoding Add an atomic notifier which ensures proper locking when conveying MCE info to EDAC for decoding. The actual notifier call overrides a default, negative priority notifier. Note: make sure we register the default decoder only once since mcheck_init() runs on each CPU. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20091003065752.GA8935@liondog.tnic> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 12:24:45 +02:00
Linus Torvalds	624235c5b3	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pci: Correct spelling in a comment x86: Simplify bound checks in the MTRR code x86: EDAC: carve out AMD MCE decoding logic initcalls: Add early_initcall() for modules x86: EDAC: MCE: Fix MCE decoding callback logic	2009-10-08 12:06:36 -07:00
Borislav Petkov	94baaee494	amd64_edac: beef up DRAM error injection When injecting DRAM ECC errors (F3xBC_x8), EccVector[15:0] is a bitmask of which bits should be error injected when written to and holds the payload of 16-bit DRAM word when read, respectively. Add /sysfs members to show the DRAM ECC section/word/vector. Fail wrong injection values entered over /sysfs instead of truncating them. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:51:28 +02:00
Borislav Petkov	66216a7a15	amd64_edac: fix DRAM base and limit extraction On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers which specify the destination node of a DRAM address. Those address boundaries are being extracted into ->dram_base[] and ->dram_limit[]. Correct the extraction masks to match the respective address bits. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:51:15 +02:00
Borislav Petkov	9d858bb10a	amd64_edac: fix chip select handling Different processor families support a different number of chip selects. Handle this in a family-dependent way with the proper values assigned at init time (see amd64_set_dct_base_and_mask). Remove _DCSM_COUNT defines since they're used at one place and originate from public documentation. CC: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:50:50 +02:00
Keith Mannthey	2cff18c22c	amd64_edac: simple fix to allow reporting of CECC errors This allows the errors to be further decoded and mapped to csrows. Tested with ECC debug dimms and a Rev F CPU based system. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:49:58 +02:00
Borislav Petkov	8edc544589	amd64_edac: fix K8 intlv_sel check The check when DRAM interleaving is enabled should be done against the pvt->dram_IntlvSel field and not against the ->dram_limit. Simplify first loop and fixup printk formatting while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:49:43 +02:00
Borislav Petkov	72f158fe6f	amd64_edac: fix interleave enable tests The pvt->dram_IntlvEn saves the 3 "Interleave Enable" bits already right-shifted by 8 so the check in find_mc_by_sys_addr() by shifting the values to the left 8 bits is wrong. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:48:08 +02:00
Borislav Petkov	916d11b2b5	amd64_edac: fix DRAM base and limit address extraction K8 DRAM base and limit addresses from F1x40 +8i and F1x44 + 8i, where i in (0..7) are both bits 39-24 and therefore the shifting should be done by 24 and not by 8. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:47:51 +02:00
Borislav Petkov	3011b20da9	amd64_edac: fix driver instance lookup table allocation Allocate memory statically for 8-node machines max for simplicity instead of relying on MAX_NUMNODES which is 0 on !CONFIG_NUMA builds. Spotted by Jan Beulich. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-10-07 16:47:34 +02:00
Borislav Petkov	0d18b2e34b	x86: EDAC: carve out AMD MCE decoding logic This converts the MCE decoding logic into a standalone config option which can be built-in or a module, the first one being the default for MCEs happening early on in the boot process. This, beyond being separated in a cleaner way, also saves RAM by making the decoding logic modular. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <20091002133148.GD28682@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-02 15:42:19 +02:00
Ingo Molnar	f436f8bb73	x86: EDAC: MCE: Fix MCE decoding callback logic Make decoding of MCEs happen only on AMD hardware by registering a non-default callback only on CPU families which support it. While looking at the interaction of decode_mce() with the other MCE code I also noticed a few other things and made the following cleanups/fixes: - Fixed the mce_decode() weak alias - a weak alias is really not good here, it should be a proper callback. A weak alias will be overridden if a piece of code is built into the kernel - not good, obviously. - The patch initializes the callback on AMD family 10h and 11h. - Added the more correct fallback printk of: No support for human readable MCE decoding on this CPU type. Transcribe the message and run it through 'mcelog --ascii' to decode. On CPUs that don't have a decoder. - Made the surrounding code more readable. Note that the callback allows us to have a default fallback - without having to check the CPU versions during the printout itself. When an EDAC module registers itself, it can install the decode-print function. (there's no unregister needed as this is core code.) version -v2 by Borislav Petkov: - add K8 to the set of supported CPUs - always build in edac_mce_amd since we use an early_initcall now - fix checkpatch warnings Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <20091001141432.GA11410@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-02 15:42:18 +02:00
Jesper Dangaard Brouer	458e5ff13e	edac: core: remove completion-wait for complete with rcu_barrier Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c and edac_pci.c. They all use a wait_for_completion() scheme, but this scheme is not 100% safe on multiple CPUs. See the _rcu_barrier() implementation which explains why extra precaution is needed. The patch adds a comment about rcu_barrier() and as a precaution calls rcu_barrier(). A maintainer needs to look at removing the wait_for_completion code. [dougthompson@xmission.com: remove the wait_for_completion code] Signed-off-by Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:05 -07:00
Jason Uhlenkott	dd8ef1db87	edac: i3200 memory controller driver A driver for the Intel 3200 and 3210 memory controllers. It has only had light testing so far, and currently makes no attempt to decode error addresses at anything finer than csrow granularity. Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:04 -07:00
Julia Lawall	30a61fff3a	edac: fix resource size calculation Use the function resource_size, which reduces the chance of introducing off-by-one errors in calculating the resource size. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ struct resource *res; @@ - (res->end - res->start) + 1 + resource_size(res) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:04 -07:00
Ira W. Snyder	b484625172	edac: mpc85xx add mpc83xx support Add support for the Freescale MPC83xx memory controller to the existing driver for the Freescale MPC85xx memory controller. The only difference between the two processors is in the CS_BNDS register parsing code, which has been changed so it will work on both processors. The L2 cache controller does not exist on the MPC83xx, but the OF subsystem will not use the driver if the device is not present in the OF device tree. I had to change the nr_pages calculation to make the math work out. I checked it on my board and did the math by hand for a 64GB 85xx using 64K pages. In both cases, nr_pages * PAGE_SIZE comes out to the correct value. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Kumar Gala <galak@gate.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:04 -07:00
Yang Shi	a014554e66	edac: mpc85xx add P2020DS support Based on Kumar's new compatible types patch, add P2020 into MPC85xx EDAC compatible lists so that EDAC can recognize P2020 memory controller and L2 cache controller and export the relevant fields to sysfs. EDAC MPC85xx DDR3 support is needed if DDR3 memory stick is installed on a P2020DS board so that EDAC core can recognize DDR3 memory type. Signed-off-by: Yang Shi <yang.shi@windriver.com> Acked-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-24 07:21:04 -07:00
Anand Gadiyar	411c940385	trivial: fix typo "for for" in multiple files trivial: fix typo "for for" in multiple files Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-09-21 15:14:54 +02:00
Borislav Petkov	06724535f8	amd64_edac: check NB MCE bank enable on the current node properly The old code was using smp_call_function_many which skips the current cpu if it is in the supplied cpumask. Switch to the rdmsr_on_cpus() interface which takes care of that. In addition, add get_cpus_on_this_dct_cpumask helper which computes a cpumask of all the cores on a node and thus on a DCT. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-16 13:05:46 +02:00
Wan Wei	57a30854c8	amd64_edac: Rewrite unganged mode code of f10_early_channel_count Simplify the procedure by checking if there is any DIMM in each channel. This patch will fix the bugs such as when there are no DIMMs under a certain node, two DIMMs in the same channel, and only one DIMM in each channel of the node. Borislav: minor fixups Signed-off-by: Wan Wei <wanwei@mail.dawning.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-16 12:42:55 +02:00
Borislav Petkov	be3468e8ff	amd64_edac: cleanup amd64_check_ecc_enabled Simplify code flow and make sure return value is always valid since further driver init depends on it. Carve out long warning string and make code more readable. Shorten some names, while at it. There should be no functional change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-16 12:40:38 +02:00
Andreas Herrmann	6a8126911a	x86, EDAC: Provide function to return NodeId of a CPU Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: H. Peter Anvin <hpa@zytor.com>	2009-09-16 11:33:40 +02:00
Ingo Molnar	b9183f9b99	amd64_edac: build driver only on AMD hardware -tip testing found the following build failure (config attached): drivers/built-in.o: In function `amd64_check': amd64_edac.c:(.text+0x3e9491): undefined reference to `amd_decode_nb_mce' drivers/built-in.o: In function `amd64_init_2nd_stage': amd64_edac.c:(.text+0x3e9b46): undefined reference to `amd_report_gart_errors' amd64_edac.c:(.text+0x3e9b55): undefined reference to `amd_register_ecc_decoder' drivers/built-in.o: In function `amd64_nbea_store': amd64_edac_dbg.c:(.text+0x3ea22e): undefined reference to `amd_decode_nb_mce' drivers/built-in.o: In function `amd64_remove_one_instance': amd64_edac.c:(.devexit.text+0x3eea): undefined reference to `amd_report_gart_errors' amd64_edac.c:(.devexit.text+0x3ef6): undefined reference to `amd_unregister_ecc_decoder' the AMD EDAC code has a dependency on CONFIG_CPU_SUP_AMD facilities. The patch below solves the problem here. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-16 11:31:57 +02:00
Borislav Petkov	53bd5fedca	EDAC, AMD: decode FR MCEs See Fam10h BKDG (31116, rev. 3.28), Table 101. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:37 +02:00
Borislav Petkov	f9350efd6f	EDAC, AMD: decode load store MCEs See Fam10h BKDG (31116, rev. 3.28), Table 100. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:33 +02:00
Borislav Petkov	56cad2d6fb	EDAC, AMD: decode bus unit MCEs ... according to Table 69, Fam10h BKDG (31116, rev. 3.28). Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:30 +02:00
Borislav Petkov	ab5535e70f	EDAC, AMD: decode instruction cache MCEs See Fam10h BKDG (31116, rev. 3.28), Table 95 Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:27 +02:00
Borislav Petkov	5196624136	EDAC, AMD: decode data cache MCEs Those get reported in MC0_STATUS, see Table 92, F10h BKDG (31116, rev. 3.28) for more details. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:23 +02:00
Borislav Petkov	d93cc222ad	EDAC, AMD: carve out decoding of MCi_STATUS ErrorCode This is the MCE error code from the MCi_STATUS banks, bits [15:0] which describe what type of error was encountered: GART TLB, Memory or Bus error. The semantics of those bits are identical across all MCE banks so decode those separately, irrespectively of MCE type. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:20 +02:00
Borislav Petkov	b69b29de65	EDAC, AMD: carve out MCi_STATUS decoding The MCi_STATUS registers have most field definitions in common so decode them in the general path. Do not pass ecc_type along and compute it in __amd64_decode_bus_error instead. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 19:01:07 +02:00
Borislav Petkov	549d042df2	x86, mce: pass mce info to EDAC for decoding Move NB decoder along with required defines to EDAC MCE core. Add registration routines for further decoding of the MCE info in the AMD64 EDAC module. CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:59:17 +02:00
Borislav Petkov	ecaf5606de	amd64_edac: cleanup amd64_decode_bus_error Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:58:37 +02:00
Borislav Petkov	b7225e4fc1	amd64_edac: remove memory and GART TLB error decoders Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:58:29 +02:00
Borislav Petkov	5110dbdeab	amd64_edac: cleanup/complete NB MCE decoding * don't dump info which mcheck already does * update to newest BKDG * mv amd64_process_error_info -> amd64_decode_nb_mce * shorten error struct names * remove redundant info ptr in amd64_process_error_info * remove unused ErrorCodeExt[19:16] (MCx_STATUS) defines Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:58:25 +02:00
Borislav Petkov	ef44cc4c22	amd64_edac: cleanup amd64_process_error_info * mv amd64_error_info_regs -> err_regs * remove redundant info ptr Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:58:18 +02:00
Borislav Petkov	1c43f2e24d	EDAC: beef up ErrorCodeExt error signatures Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:58:14 +02:00
Borislav Petkov	b70ef01016	EDAC: move MCE error descriptions to EDAC core This is in preparation of adding AMD-specific MCE decoding functionality to the EDAC core. The error decoding macros originate from the AMD64 EDAC driver albeit in a simplified and cleaned up version here. While at it, add macros to generate the error description strings and use them in the error type decoders directly which removes a bunch of code and makes the decoding functions much more readable. Also, fix strings and shorten macro names. Remove superfluous htlink_msgs. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-09-14 18:57:48 +02:00
Doug Thompson	c2718348b4	amd64_edac: print debug statements only on error Add forgotten return calls for the successful cases. Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-08-04 12:10:06 +02:00
Doug Thompson	126b67b8d2	amd64_edac: fix ECC checking On the good path of BIOS enabled ECC and no override, the value returned is 1 by omission and thus is deemed failing by the probe-function. Allow proper module initialization by clearing the retval explicitly. Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-08-03 16:54:20 +02:00
Lu Zhihe	3d768213a6	edac: x38 fix mchbar high register addr Intel X38 MCHBAR is a 64bits register, base from 0x48, so its higher base is 0x4C. Signed-off-by: Lu Zhihe <tombowfly@gmail.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: <stable@kernel.org> [2.6.30.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-29 19:10:34 -07:00
Wan Wei	4afcd2dcc6	amd64_edac: read the right F2 maskoffset reg Signed-off-by: Wan Wei <onewayforever@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-07-27 14:42:24 +02:00
Yang Shi	b1cfebc923	edac: add DDR3 memory type for MPC85xx EDAC Since some new MPC85xx SOCs support DDR3 memory now, so add DDR3 memory type for MPC85xx EDAC. Signed-off-by: Yang Shi <yang.shi@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:55:59 -07:00
Borislav Petkov	37da045067	amd64_edac: misc small cleanups - cleanup debug calls - shorten function names - cleanup error exit paths Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-26 13:06:41 +02:00
Borislav Petkov	30c875cbc1	amd64_edac: fix ecc_enable_override handling amd64_check_ecc_enabled() returns non-zero status when ECC checking/correcting is disabled and this fails further loading of the driver even when 'ecc_enable_override' boot param is used. Fix that by clearing return status in that case. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-26 13:06:41 +02:00
Borislav Petkov	584fcff428	amd64_edac: check only ECC bit in amd64_determine_edac_cap Checking whether the machine is using ECC enabled DRAM is done through testing the DimmEccEn bit in the DRAM Cfg Low register (F2x[1,0]90). Do that instead of testing all bits from the DimmEccEn upwards. Also, remove mci->edac_cap assignment and use value returned from amd64_determine_edac_cap(). Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-26 13:06:40 +02:00
GeunSik Lim	e24aca672f	edac: Kconfig: fix the meaning of EDAC abbreviation Fix the meaning of EDAC(Error Detection And Correction) correctly. [akpm@linux-foundation.org: add missing space] Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Mike Frysinger	20ea8fad9e	edac: add missing __devexit_p() The remove function uses __devexit, so the .remove assignment needs __devexit_p() to fix a build error with hotplug disabled. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:57 -07:00
Harry Ciao	1dc9b70d7d	edac: add edac_device_alloc_index() Add edac_device_alloc_index(), because for MAPLE platform there may exist several EDAC driver modules that could make use of edac_device_ctl_info structure at the same time. The index allocation for these structures should be taken care of by EDAC core. [akpm@linux-foundation.org: cleanups] Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:56 -07:00
Harry Ciao	2a9036afff	edac: add CPC925 Memory Controller driver Introduce IBM CPC925 EDAC driver, which makes use of ECC, CPU and HyperTransport Link error detections and corrections on the IBM CPC925 Bridge and Memory Controller. [akpm@linux-foundation.org: cleanup] Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-18 13:03:56 -07:00
Martin Olsson	98a1708de1	trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in documentation and source comments. Signed-off-by: Martin Olsson <martin@minimum.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:46 +02:00
Borislav Petkov	9456ffffcf	EDAC: do not enable modules by default Prevent EDAC compilation units from being built by default and let the user explicitly select the needed modules. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:19:41 +02:00
Borislav Petkov	3d37329045	amd64_edac: do not enable module by default While at it, fix a link failure when !K8_NB. Acked-by: Doug Thompson <dougthompson@xmission.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:19:40 +02:00
Doug Thompson	7d6034d321	amd64_edac: add module registration routines Also, link into Kbuild by adding Kconfig and Makefile entries. Borislav: - Kconfig/Makefile splitting - use zero-sized arrays for the sysfs attrs if not enabled - rename sysfs attrs to more conform values - shorten CONFIG_ names - make multiple structure members assignment vertically aligned - fix/cleanup comments - fix function return value patterns - fix err labels - fix a memleak bug caught by Ingo - remove the NUMA dependency and use num_k8_northbridges for initializing a driver instance per NB. - do not copy the pvt contents into the mci struct in amd64_init_2nd_stage() and save it in the mci->pvt_info void ptr instead. - cleanup debug calls - simplify amd64_setup_pci_device() Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:19:28 +02:00
Doug Thompson	f9431992b6	amd64_edac: add ECC reporting initializers Borislav: - convert to the new {rd|wr}msr_on_cpus interfaces. - convert pvt->old_mcgctl to a bitmask thus saving some bytes - fix/cleanup comments - fix function return value patterns - add a proper bugfix found by Doug to amd64_check_ecc_enabled where we missed checking for the ECC enabled bit in NB CFG. - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:19:01 +02:00
Doug Thompson	0ec449ee95	amd64_edac: add EDAC core-related initializers Borislav: - add a amd64_free_mc_sibling_devices() helper instead of opencoding the release-path. - fix/cleanup comments - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:19:00 +02:00
Doug Thompson	d27bf6fa36	amd64_edac: add error decoding logic Borislav: - fold amd64_error_info_valid() into its only user - fix/cleanup comments - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:59 +02:00
Doug Thompson	b1289d6f9d	amd64_edac: add ECC chipkill syndrome mapping table Borislav: - fix comments - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:58 +02:00
Doug Thompson	4d37607adb	amd64_edac: add per-family descriptors Borislav: - fix comments - fix function return value patterns Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:57 +02:00
Doug Thompson	f71d0a0500	amd64_edac: add F10h-and-later methods-p3 Borislav: - compute dct_sel_base_off in f10_match_to_this_node() correctly since it cannot be assumed that the Reserved bits are zero and they have to be masked out instead. - cleanup, remove StinkyIdentifiers, simplify logic - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:56 +02:00
Doug Thompson	6163b5d4fb	amd64_edac: add F10h-and-later methods-p2 Borislav: - fix a wrong negation in f10_determine_base_addr_offset() - fix a wrong mask in f10_determine_base_addr_offset() which should select DctSelBaseAddr[31:11] and not [31:16] as it was before - remove StinkyIdentifiers, trivially simplify code. - fix/cleanup comments - fix function return value patterns Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:56 +02:00
Doug Thompson	1afd3c98b5	amd64_edac: add F10h-and-later methods-p1 Borislav: Fail f10_early_channel_count() if error encountered while reading a NB register since those cached register contents are accessed afterwards. - fix/cleanup comments - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:55 +02:00
Doug Thompson	ddff876d20	amd64_edac: add k8-specific methods Borislav: - fix/cleanup/move comments - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:54 +02:00
Doug Thompson	94be4bff21	amd64_edac: assign DRAM chip select base and mask in a family-specific way Borislav: - cleanup/fix comments - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:53 +02:00
Doug Thompson	2da11654ea	amd64_edac: add helper to dump relevant registers Borislav: - cleanup/fix comments - fix function return value patterns - cleanup dbg calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:52 +02:00
Doug Thompson	93c2df58b5	amd64_edac: add DRAM address type conversion facilities Borislav: - cleanup/fix comments, add BKDG refs - fix function return value patterns - cleanup dbg calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:51 +02:00
Doug Thompson	e2ce7255e8	amd64_edac: add functionality to compute the DRAM hole Borislav: - cleanup/fix comments, add BKDG refs - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:50 +02:00
Doug Thompson	6775763a23	amd64_edac: add sys addr to memory controller mapping helpers Borislav: - cleanup comments - cleanup debug calls - simplify find_mc_by_sys_addr's exit path Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:49 +02:00
Doug Thompson	2bc6541872	amd64_edac: add memory scrubber interface Borislav: - fix/cleanup comments - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:49 +02:00
Doug Thompson	b52401cece	amd64_edac: add MCA error types Borislav: - cleanup comments Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:48 +02:00
Doug Thompson	eb919690be	amd64_edac: add DRAM error injection logic using sysfs Borislav: - rename sysfs attrs to more conform names - cleanup/fix comments according to BKDG text - fix function return value patterns - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:47 +02:00
Doug Thompson	fd3d6780f7	amd64_edac: add debugging/testing code This is for dumping different registers and testing the address mapping logic using the ECC syndromes. Borislav: - split sysfs attrs per file - use more conform names for the sysfs attrs - fix function return value patterns - cleanup/fix comments - cleanup debug calls Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:46 +02:00
Doug Thompson	cfe40fdb4a	amd64_edac: add driver header Borislav: - remove register bit descriptions (complete text in BKDG) - cleanup and remove excessive/superfluous comments Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:45 +02:00
Borislav Petkov	d357cbb445	edac: fold __func__ into edac_debug_printk This shortens debugfX() calls a bit. Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com> CC: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2009-06-10 12:18:44 +02:00
Harry Ciao	715fe7af9f	edac: AMD8111 & AMD8131 Kconfig fixup The amd8111_edac.c driver will fail allmodconfig on architectures other than PPC, introduce Kconfig dependency to avoid this, since both AMD8111 and AMD8131 chips are only adopted on Maple so far. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-29 08:40:03 -07:00
Harry Ciao	56ec0c7b88	edac: AMD8111 & AMD8131 use dev_name() The "bus_id" member in the device structure has been obsolete, use dev_name() instead. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-29 08:40:03 -07:00
Dave Jiang	55e5750b3e	edac: ppc mpc85xx fix mc err detect Error found by Jeff Haran. The error detect register is 0s when no errors are detected. The check code is incorrect, so reverse check sense. Reported-by: Jeff Haran <jharan@Brocade.COM> Signed-off-by: Dave Jiang <djiang@mvista.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-21 13:41:51 -07:00
Jean Delvare	fbeb438474	edac: use to_delayed_work() The edac-core driver includes code which assumes that the work_struct which is included in every delayed_work is the first member of that structure. This is currently the case but might change in the future, so use to_delayed_work() instead, which doesn't make such an assumption. linux-2.6.30-rc1 has the to_delayed_work() function that will allow this patch to work Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-13 15:04:34 -07:00
Jeff Haran	e6da46b273	edac: fix local pci_write_bits32 Fix the edac local pci_write_bits32 to properly note the 'escape' mask if all ones in a 32-bit word. Currently no consumer of this function uses that mask, so there is no danger to existing code. Signed-off-by: Jeff Haran <jharan@Brocade.COM> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-13 15:04:33 -07:00
Harry Ciao	58b4ce6f24	edac: AMD8111 driver Kconfig & Makefile Introduce Kconfig and Makefile options for AMD8111 EDAC driver. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:04 -07:00
Harry Ciao	e876558415	edac: AMD8131 driver Kconfig & Makefile Introduce Kconfig and Makefile options for AMD8131 EDAC driver. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:04 -07:00
Harry Ciao	28d16272b1	edac: AMD8131 driver source file Introduce AMD8131 EDAC driver source file, which makes use of error detections on the PCI-X Bridge Controllers on the AMD8131 HyperTransport PCI-X Tunnel. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:04 -07:00
Harry Ciao	a35a281880	edac: AMD8131 driver header file Introduce AMD8131 EDAC driver header file, which adds register and bits definitions for the PCI-X Bridge Controller on the AMD8131 HyperTransport I/O Hub. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Harry Ciao	8641a3845d	edac: Add edac_pci_alloc_index() Add edac_pci_alloc_index(), because for MAPLE platform there may exist several EDAC driver modules that could make use of edac_pci_ctl_info structure at the same time. The index allocation for these structures should be taken care of by EDAC core. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Harry Ciao	697dab6484	edac: AMD8111 driver source file Introduce AMD8111 EDAC driver source file, which makes use of error detections on the LPC Bridge Controller and PCI Bridge Controller on the AMD8111 HyperTransport I/O Hub. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Harry Ciao	ec2cf2e272	edac: AMD8111 driver header file Introduce AMD8111 EDAC driver header file, which adds register and bits definitions for the LPC Bridge Controller and PCI Bridge Controller on the AMD8111 HyperTransport I/O Hub. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Grant Erickson	dba7a77c0e	edac: new ppc4xx driver module This adds support for an EDAC memory controller adaptation driver for the "ibm,sdram-4xx-ddr2" ECC controller realized in the AMCC PowerPC 405EX[r]. At present, this driver has been developed and tested against the controller realization in the AMCC PPC405EX[r] on the AMCC Kilauea and Haleakala boards (256 MiB w/o ECC memory soldered onto the board) and a proprietary board based on those designs (128 MiB ECC memory, also soldered onto the board). In the future, dynamic feature detection and handling needs to be added for the other realizations of this controller found in the 440SP, 440SPe, 460EX, 460GT and 460SX. Eventually, this driver will likely be evolved and adapted to the above variant realizations of this controller as well as broken apart to handle the other known ECC-capable controllers prevalent in other PPC4xx processors: - IBM SDRAM (405GP, 405CR and 405EP) "ibm,sdram-4xx" - IBM DDR1 (440GP, 440GX, 440EP and 440GR) "ibm,sdram-4xx-ddr" - Denali DDR1/DDR2 (440EPX and 440GRX) "denali,sdram-4xx-ddr2" [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Grant Erickson <gerickson@nuovations.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Doug Thompson	4577ca5568	edac: remove EDAC's experimental status After 3 years, this is a patch to remove the EXPERIMENTAL tag on EDAC. We now have many module drivers submitters in EDAC and believe EDAC is no longer EXPERIMENTAL Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Hitoshi Mitake	cc18e3cd53	edac: add more verbose debug info A patch for making a debugging information more verbose for use in development debugging. By enabling the new option "More verbose debugging", information about source file and line number will be added to debugging message. This is sample output, EDAC MC0: Giving out device to 'e7xxx_edac' 'E7205': DEV 0000:00:00.0 EDAC DEBUG: in drivers/edac/edac_pci.c, line at 48: edac_pci_alloc_ctl_info() EDAC DEBUG: in drivers/edac/edac_pci.c, line at 334: edac_pci_add_device() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:02 -07:00
Kay Sievers	031d551859	edac: struct device - replace bus_id with dev_name(), dev_set_name() Cc: dougthompson@xmission.com Cc: bluesmoke-devel@lists.sourceforge.net Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>	2009-03-24 16:38:21 -07:00
Stephen Rothwell	4712fff9be	powerpc: More printing warning fixes for the l64 to ll64 conversion These are all powerpc specific drivers. res.start in fsl_elbc_nand.c needs to be cast since it may be either 32 or 64 bit. Thanks to Scott Wood for noticing. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Arnd Bergmann <arnd@arndb.de> call_edac bits in particular Acked-by: Olof Johansson <olof@lixom.net> pasemi_nand pieces Acked-by: Scott Wood <scottwood@freescale.com> fsl_elbc fixes Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-28 17:15:52 +11:00
Mauro Carvalho Chehab	8375d4909a	edac: driver for i5400 MCH (update) Signed-off-by: Ben Woodard <woodard@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:30 -08:00
Mauro Carvalho Chehab	920c8df6ac	edac: driver for i5400 MCH (Seaburg) EDAC driver for i5400 MCH (Seaburg) This driver adds support for i5400 MCH chipset. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Ben Woodard <woodard@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:30 -08:00
Kumar Gala	29d6cf26a7	edac: fix mpc85xx and add mpc8536 mpc8560 All other compatibles that are uniquely identifying the processor use a prefix of the form 'fsl,mpc85...'. We add support for it so we can deprecate the older 'fsl,85...' that was improperly used here. Additionally added mpc8536 & mpc8560 to the compatible lists. This patch is based on Nate's 8572 patch. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Acked-by: Dave Jiang <djiang@mvista.com> Cc: Nate Case <ncase@xes-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:30 -08:00
Kay Sievers	281efb17d8	edac: struct device: replace bus_id with dev_name(), dev_set_name() This patch is part of a larger patch series which will remove the "char bus_id[20]" name string from struct device. The device name is managed in the kobject anyway, and without any size limitation, and just needlessly copied into "struct device". [akpm@linux-foundation.org: coding-style fixes] Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:30 -08:00
Arjan van de Ven	1dca00bd02	pci: use pci_ioremap_bar() in drivers/edac Use the newly introduced pci_ioremap_bar() function in drivers/edac. pci_ioremap_bar() just takes a pci device and a bar number, with the goal of making it really hard to get wrong, while also having a central place to stick sanity checks. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:30 -08:00
Linus Torvalds	3c92ec8ae9	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions	2008-12-28 16:54:33 -08:00
Harry Ciao	d519c8d9cc	edac: fix edac core deadlock when removing a device When deleting an edac device, we have to wait for its edac_dev.work to be completed before deleting the whole edac_dev structure. Since we have no idea which work in current edac_poller's workqueue is the work we are concerned about, we wait for all work in the edac_poller's workqueue to be processed. This is done via flush_cpu_workqueue() which inserts a wq_barrier into the tail of the workqueue and then sleeping on the completion of this wq_barrier. The edac_poller will wake up sleepers when it is found. EDAC core creates only one kernel worker thread, edac_poller, to run the works of all current edac devices. They share the same callback function of edac_device_workq_function(), which would grab the mutex of device_ctls_mutex first before it checks the device. This is exactly where edac_poller and rmmod would have a great chance to deadlock. In below call trace of rmmod > ... > edac_device_del_device > edac_device_workq_teardown > flush_workqueue > flush_cpu_workqueue, device_ctls_mutex would have already been grabbed by edac_device_del_device(). So, on one hand rmmod would sleep on the completion of a wq_barrier, holding device_ctls_mutex; on the other hand edac_poller would be blocked on the same mutex when it's running any one of works of existing edac devices (Note, this edac_dev.work is likely to be totally irrelevant to the one that is being removed right now) and never would have a chance to run the work of above wq_barrier to wake rmmod up. edac_device_workq_teardown() should not be called within the critical region of device_ctls_mutex. Just like is done in edac_pci_del_device() and edac_mc_del_mc(), where edac_pci_workq_teardown() and edac_mc_workq_teardown() are called after related mutex are released. Moreover, an edac_dev.work should check first if it is being removed. If this is the case, then it should bail out immediately. Since not all of existing edac devices are to be removed, this "shutting flag" should be contained to edac device being removed. The current edac_dev.op_state can be used to serve this purpose. The original deadlock problem and the solution have been witnessed and tested on actual hardware. Without the solution, rmmod an edac driver would result in below deadlock: root@localhost:/root> rmmod mv64x60_edac EDAC DEBUG: mv64x60_dma_err_remove() EDAC DEBUG: edac_device_del_device() EDAC DEBUG: find_edac_device_by_dev() (hang for a moment) INFO: task edac-poller:2030 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. edac-poller D 00000000 0 2030 2 Call Trace: [df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable) [df159e80] [c000a024] __switch_to+0x6c/0xa0 [df159ea0] [c03587d8] schedule+0x2f4/0x4d8 [df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174 [df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core] [df159f60] [c003beb4] run_workqueue+0x114/0x218 [df159f90] [c003c674] worker_thread+0x5c/0xc8 [df159fd0] [c004106c] kthread+0x5c/0xa0 [df159ff0] [c0013538] original_kernel_thread+0x44/0x60 INFO: task rmmod:2062 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. rmmod D 0ff2c9fc 0 2062 1839 Call Trace: [df119c00] [c0437a74] 0xc0437a74 (unreliable) [df119cc0] [c000a024] __switch_to+0x6c/0xa0 [df119ce0] [c03587d8] schedule+0x2f4/0x4d8 [df119d40] [c03591dc] schedule_timeout+0xb0/0xf4 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-23 15:58:21 -08:00
Benjamin Krill	def434c231	powerpc/cell: add QPACE as a separate Cell platform Since the QPACE (Chromodynamics Parallel Computing on the Cell Broadband Engine) platform doesn't use a iommu, doesn't have PCI devices and a MPIC much lesser setup and configurations are needed. So far all devices are detected as OF device. A notifier function is used to set the dma_ops for the of_platform bus. Further this patch splits the PPC_CELL_NATIVE into PPC_CELL_COMMON which are parts that are shared with the QPACE platform and the rest. Signed-off-by: Benjamin Krill <ben@codiert.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2008-12-22 22:19:19 +01:00
Jarkko Lavinen	09a81269c7	i82875p_edac: fix module remove Fix module removal bugs of i82875p_edac. Also i82975x_edac code seems to have the same module removal bugs as in i82875p_edac. The problems were: 1. In module removal i82875p_remove_one() is never called. Variable i82875p_registered is never changed from 1, which guarantees i82875p_remove_one() is not called (and even if it were called, it would be called in wrong order). As a result, the edac_mc workqueue is not stopped and keeps probing. If kernel debugging options are not enabled, user may not notice anything going wrong. If debugging options are enabled and I do "rmmod i82875p_edac", I get: edac debug: edac_pci_workq_function() checking BUG: unable to handle kernel paging request at f882d16f ... call trace: [<f8834df3>] ? edac_mc_workq_function+0x55/0x7e [edac_core] [<c0233974>] ? run_workqueue+0xd7/0x1a5 [<c023392f>] ? run_workqueue+0x92/0x1a5 [<f8834d9e>] ? edac_mc_workq_function+0x0/0x7e [edac_core] [<c0233af9>] ? worker_thread+0xb7/0xc3 [<c0236a7b>] ? autoremove_wake_function+0x0/0x33 [<c0233a42>] ? worker_thread+0x0/0xc3 [<c0236809>] ? kthread+0x3b/0x61 [<c02367ce>] ? kthread+0x0/0x61 [<c0204587>] ? kernel_thread_helper+0x7/0x10 Fix for this is to get rid of needless variable i82875p_registered altogether and run i82875p_remove_one() before pci_unregister_driver(). 2. edac_mc_del_mc() uses mci after freeing mci edac_mc_del_mc() calls edac_remove_sysfs_mci_device(). The kobject refcount of mci drops to 0 and mci is freed. After this mci is accessed via debug print and i82875p_remove_one() still uses mci->pvt and tries to free mci again with edac_mc_free(). The fix for this is to add kobject_get(&mci->edac_mci_kobj) after edac_mc_alloc(). Then the mci is still available after returning from edac_mc_del_mc() with refcount 1, and mci->pvt is still available. When i82875p_remove_one() finally calls edac_mc_free(), this will cause kobject_put() and mci is released properly. 
Signed-off-by: Jarkko Lavinen <jlavi@iki.fi> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-01 19:55:25 -08:00
Jarkko Lavinen	307d114441	i82875p_edac: fix overflow device resource setup When I do "modprobe i82875p_edac" on my Asus P4C800 MB on kernels 2.6.26 or later, the module load fails due to BAR 0 collision. On 2.6.25 the module loads just fine. The overflow device on the MB seems to be hidden and its resources are not allocated at normal PCI bus init. Log shows the missing resource problem: EDAC DEBUG: i82875p_probe1() PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff] pci 0000:00:06.0: device not available because of BAR 0 [0xfecf0000-0xfecf0fff] collisions EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow device The patch below fixes this by calling pci_bus_assign_resources() after the overflow device is revealed and added to the bus. With this patch I am again able to load and use the module. Signed-off-by: Jarkko Lavinen <jlavi@iki.fi> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-12-01 19:55:25 -08:00
Darrick J. Wong	f0f7e0dc73	i5000-edac: hold reference to mci kobject It turns out that edac_mc_del_mc will kobject_put the last kref on the mci object. If the timing is just right, that means that the mci object is freed before i5000_remove_one has a chance to free the resources associated with it, causing a null pointer exception when unloading the driver. Insert a kobject_{get,put} pair so that this doesn't happen. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-12 17:17:16 -08:00
Benjamin Herrenschmidt	992b692dcf	edac: fix enabling of polling cell module The edac driver on cell turned out to be not enabled because of a missing op_state. This patch introduces it. Verified to work on top of Ben's next branch. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Osterkamp <jens@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-30 11:38:46 -07:00
Hitoshi Mitake	df8bc08c19	edac x38: new MC driver module I wrote a new module for Intel X38 chipset. This chipset is very similar to Intel 3200 chipset, but there are some differences, so I copied i3200_edac.c and modified it. This is Intel's web page describing this chipset. http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm I've tested this new module with broken memory, and it seems to be working well. Signed-off-by: Hitoshi Mitake <mitake@clustcom.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-30 11:38:45 -07:00
Benjamin Herrenschmidt	3b274f44d2	edac cell: fix incorrect edac_mode The cell_edac driver is setting the edac_mode field of the csrow's to an incorrect value, causing the sysfs show routine for that field to go out of an array bound and Oopsing the kernel when used. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: <stable@kernel.org> [2.6.27.x, 2.6.26.x. 2.6.25.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:40 -07:00
Aristeu Rozanski	8360e81b5d	edac i5000: fix thermal issues Make the Thermal messages (temperature got past Tmid) be displayed only once because: 1) it's the BIOS job to configure and handle the memory throttling 2) if the BIOS is broken or is aware about the condition, flooding the system logs won't help anything. 3) According to the specification update for Intel 5000 MCHs, all the revisions of this MCH have problems on the thermal sensors, making not automatic (a.k.a. intelligent thermal throttling) impossible. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-16 11:21:48 -07:00
Aristeu Rozanski	c066740739	edac i5000: fix error messages Update the i5000_edac messages, making everything pass through the EDAC (so the log controls will work) and being more specific about the errors. Also, it makes the miscellaneous errors optional and disabled by default. As I didn't find information anywhere about M23ERR-M26ERR (FERR_NF_THERMAL) on FERR_NF_FBD, I'm removing them. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-16 11:21:48 -07:00


418 Commits