linux

Author	SHA1	Message	Date
Mauro Carvalho Chehab	116389ed21	i7300_edac: Add a FIXME note about the error correction type Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-08-30 14:56:45 -03:00
Mauro Carvalho Chehab	c3af2eaf7a	i7300_edac: add global error registers Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-08-30 14:56:44 -03:00
Mauro Carvalho Chehab	af3d8831e7	i7300_edac: display info if ECC is enabled or not Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-08-30 14:56:43 -03:00
Mauro Carvalho Chehab	fcaf780b2a	i7300_edac: start a driver for i7300 chipset (Clarksboro) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-08-30 14:56:42 -03:00
Borislav Petkov	37b7370a8d	amd64_edac: Do not report error overflow as a separate error When the Overflow MCi_STATUS bit is set, EDAC reports the lost error with a "no information available" message which often puzzles users parsing the dmesg. This doesn't make much sense since this error has been lost anyway so no need for reporting it separately. Thus, report the overflow bit setting in the MCE dump instead. While at it, remove reporting of MiscV and ErrorEnable (en) which are superfluous. Now it looks like this: [ 1501.650024] MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error [ 1501.666887] Northbridge Error, node 2 Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-08-26 12:46:03 +02:00
Borislav Petkov	e045c29126	MCE, AMD: Limit MCE decoding to current families for now Limit MCE error decoding to current and older families only (K8-F11h). Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-08-24 18:06:54 +02:00
Linus Torvalds	58d4ea65b9	Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6 * 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: mmc_spi: Fix unterminated of_match_table of/sparc: fix build regression from of_device changes of/device: Replace struct of_device with struct platform_device	2010-08-12 09:11:31 -07:00
Anton Vorontsov	cd1542c819	edac: mpc85xx: add support for new MPCxxx/Pxxxx EDAC controllers Simply add proper IDs into the device table. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Dave Jiang <djiang@mvista.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:21 -07:00
Kulikov Vasiliy	b425d5c82d	edac: i5400: improve handling of pci_enable_device() return value -EIO is not the only error code that pci_enable_device() may return, also the set of errors can be enhanced in future. We should compare return code with zero, not with concrete error value. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Jeff Roberson <jroberson@jroberson.net> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:21 -07:00
Kulikov Vasiliy	44aa80f005	edac: i5000: improve handling of pci_enable_device() return value -EIO is not the only error code that pci_enable_device() may return, also the set of errors can be enhanced in future. We should compare return code with zero, not with concrete error value. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Jeff Roberson <jroberson@jroberson.net> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:21 -07:00
Christoph Egger	bd1688dcdf	edac: add wissing pieces from MPC85xx -> FSL_SOC_BOOKE In `5753c082f6` ("powerpc/85xx: Kconfig cleanup") menuconfig MPC85xx was replaced by FSL_SOC_BOOKE but some references inside the code were not adjusted accordingly. This patch addresses these missing pieces. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Scott Wood <scottwood@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:20 -07:00
Grant Likely	2dc1158137	of/device: Replace struct of_device with struct platform_device of_device is just an alias for platform_device, so remove it entirely. Also replace to_of_device() with to_platform_device() and update comment blocks. This patch was initially generated from the following semantic patch, and then edited by hand to pick up the bits that coccinelle didn't catch. @@ @@ -struct of_device +struct platform_device Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Reviewed-by: David S. Miller <davem@davemloft.net>	2010-08-06 09:25:50 -06:00
Borislav Petkov	c4799c7570	amd64_edac: Minor formatting fix EDAC MC3: CE page 0xc32281, offset 0x8a0, grain 0, syndrome 0x1, row 2, channel 1, label "": amd64_edac EDAC MC3: CE - no information available: amd64_edacError Overflow Add the missing space before "Error Overflow" on the second line. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-08-04 11:16:01 +02:00
Borislav Petkov	962b70a1eb	amd64_edac: Fix operator precendence error The bitwise AND is of higher precedence, make that explicit. Cc: <stable@kernel.org> # 34.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-08-04 11:15:09 +02:00
Borislav Petkov	eba042a81e	edac, mc: Improve scrub rate handling Fortify the interface to not accept negative values, remove memctrl_int_store() as a result. Also, sanitize bandwidth setting by making the argument a simple u32 instead of strange u32 pointer being passed around for no obvious reason. Then, fix error handling and teach it to return proper error values. Finally, make code more readable, simplify debug messages. Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Arthur Jones <ajones@riverbed.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:06 +02:00
Borislav Petkov	bc57117856	amd64_edac: Correct scrub rate setting Exit early when setting scrub rate on unknown/unsupported families. Cc: <stable@kernel.org> # 32.x 33.x 34.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:05 +02:00
Borislav Petkov	9975a5f22a	amd64_edac: Fix DCT base address selector The correct check is to verify whether in high range we're below 4GB and not to extract the DctSelBaseAddr again. See "2.8.5 Routing DRAM Requests" in the F10h BKDG. Cc: <stable@kernel.org> # .32.x .33.x .34.x Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:04 +02:00
Borislav Petkov	f4347553b3	amd64_edac: Remove polling mechanism Switch to reusing the mcheck core's machine check polling mechanism instead of duplicating functionality by using the EDAC polling routine. Correct formatting while at it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:03 +02:00
Borislav Petkov	695426506e	amd64_edac: Remove unneeded defines All F2x110-related bit defines are used at only one place so replace them with simple BIT() macros. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:01 +02:00
Borislav Petkov	935ab88e34	edac: Remove EDAC_DEBUG_VERBOSE This option differs from EDAC_DEBUG only by printing the file and line of where the debug statement is placed, which contains unneeded information. So remove it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:00 +02:00
Borislav Petkov	ad6a32e969	amd64_edac: Sanitize syndrome extraction Remove the two syndrome extraction macros and add a single function which does the same thing but with proper typechecking. While at it, make sure to cache ECC syndrome size and dump it in debug output. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-08-03 16:13:31 +02:00
Anton Vorontsov	952e1c6632	edac: mpc85xx: fix coldplug/hotplug module autoloading The MPC85xx EDAC driver is missing module device aliases, so the driver won't load automatically on boot. This patch fixes the issue by adding proper MODULE_DEVICE_TABLE() macros. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Dave Jiang <djiang@mvista.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-27 14:32:06 -07:00
Daniel J Blueman	ab08937400	quiesce EDAC initialisation on desktop/mobile i7 Don't print failure to detect Core i7 EDAC facilities to the console at boot time, most often occurring on Core i7 desktops and laptops. Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-26 08:17:44 -07:00
Anton Vorontsov	5528e229f0	edac: mpc85xx: add support for MPC8569 EDAC controllers Simply add a proper ID into the device table. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Dave Jiang <djiang@mvista.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-20 16:25:40 -07:00
Anton Vorontsov	1cd8521e7d	edac: mpc85xx: fix MPC85xx dependency Since commit `5753c082f6` ("powerpc/85xx: Kconfig cleanup"), there is no MPC85xx Kconfig symbol anymore, so the driver became non-selectable. This patch fixes the issue by switching to PPC_85xx symbol. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Dave Jiang <djiang@mvista.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-07-20 16:25:40 -07:00
Linus Torvalds	62fd985717	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: MAINTAINERS: Add an entry for i7core_edac i7core_edac: Avoid doing multiple probes for the same card i7core_edac: Properly discover the first QPI device	2010-07-04 20:12:06 -07:00
Mauro Carvalho Chehab	2d95d8158b	i7core_edac: Avoid doing multiple probes for the same card As Nehalem/Nehalem-EP/Westmere devices use several devices for the same functionality (memory controller), the default way of probing devices doesn't work. So, instead of a per-device probe, all devices should be probed at once. This means that we should block any new attempt to probe; otherwise, it will try to register the same device several times. Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-07-02 18:04:29 -03:00
Mauro Carvalho Chehab	bda142890e	i7core_edac: Properly discover the first QPI device On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus. The last bus is generally at 0x3f or 0xff, but there are also other systems using different setups. For example, HP Z800 has 0x7f as the last bus. This patch adds a logic to discover the last bus, dynamically detecting it at runtime. Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-07-02 18:04:05 -03:00
Borislav Petkov	41c310447f	amd64_edac: Fix syndrome calculation on K8 When calculating the DCT channel from the syndrome we need to know the syndrome type (x4 vs x8). On F10h, this is read out from extended PCI cfg space register F3x180 while on K8 we only support x4 syndromes and don't have extended PCI config space anyway. Make the code accessing F3x180 F10h only and fall back to x4 syndromes on everything else. Cc: <stable@kernel.org> # .33.x .34.x Reported-by: Jeffrey Merkey <jeffmerkey@gmail.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-07-02 17:32:34 +02:00
Linus Torvalds	9a9620db07	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits) i7core_edac: Better describe the supported devices Add support for Westmere to i7core_edac driver i7core_edac: don't free on success i7core_edac: Add support for X5670 Always call i7core_[ur]dimm_check_mc_ecc_err i7core_edac: fix memory leak of i7core_dev EDAC: add __init to i7core_xeon_pci_fixup i7core_edac: Fix wrong device id for channel 1 devices i7core: add support for Lynnfield alternate address i7core_edac: Add initial support for Lynnfield i7core_edac: do not export static functions edac: fix i7core build edac: i7core_edac produces undefined behaviour on 32bit i7core_edac: Use a more generic approach for probing PCI devices i7core_edac: PCI device is called NONCORE, instead of NOCORE i7core_edac: Fix ringbuffer maxsize i7core_edac: First store, then increment i7core_edac: Better parse "any" addrmask i7core_edac: Use a lockless ringbuffer edac: Create an unique instance for each kobj ...	2010-06-04 15:39:54 -07:00
Anatolij Gustschin	a26f95fed3	of/edac: fix build breakage in drivers Fixes build errors in EDAC drivers caused by the OF device_node pointer being moved into struct device Signed-off-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2010-06-02 21:02:41 -06:00
Joe Perches	63ae96be98	drivers/edac: convert logging messages direct uses of __FILE__ to %s, __FILE Reduces text by eliminating multiple __FILE__ uses. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Joe Perches <joe@perches.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:52 -07:00
Grant Likely	cf9b59e9d3	Merge remote branch 'origin' into secretlab/next-devicetree Merging in current state of Linus' tree to deal with merge conflicts and build failures in vio.c after merge. Conflicts: drivers/i2c/busses/i2c-cpm.c drivers/i2c/busses/i2c-mpc.c drivers/net/gianfar.c Also fixed up one line in arch/powerpc/kernel/vio.c to use the correct node pointer. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2010-05-22 00:36:56 -06:00
Grant Likely	4018294b53	of: Remove duplicate fields from of_platform_driver .name, .match_table and .owner are duplicated in both of_platform_driver and device_driver. This patch is a removes the extra copies from struct of_platform_driver and converts all users to the device_driver members. This patch is a pretty mechanical change. The usage model doesn't change and if any drivers have been missed, or if anything has been fixed up incorrectly, then it will fail with a compile time error, and the fixup will be trivial. This patch looks big and scary because it touches so many files, but it should be pretty safe. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Sean MacLennan <smaclennan@pikatech.com>	2010-05-22 00:10:40 -06:00
Mauro Carvalho Chehab	52707f918c	i7core_edac: Better describe the supported devices Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 20:43:52 -03:00
Vernon Mauery	bd9e19ca46	Add support for Westmere to i7core_edac driver This adds new PCI IDs for the Westmere's memory controller devices and modifies the i7core_edac driver to be able to probe both Nehalem and Westmere processors. Signed-off-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 20:23:56 -03:00
Roman Fietze	ee6583f6e8	PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments This fixes all occurrences of pci_enable_device and pci_disable_device in all comments. There are no code changes involved. Signed-off-by: Roman Fietze <roman.fietze@telemotive.de> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-05-18 14:59:08 -07:00
Tony Luck	d4d1ef4515	i7core_edac: don't free on success Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 14:47:31 -03:00
Mauro Carvalho Chehab	ac1ececea9	i7core_edac: Add support for X5670 As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a different register for one of the uncore PCI devices. Add support for it. Those are the PCI ID's on this new chipset: fe:00.0 0600: 8086:2c70 (rev 02) fe:00.1 0600: 8086:2d81 (rev 02) fe:02.0 0600: 8086:2d90 (rev 02) fe:02.1 0600: 8086:2d91 (rev 02) fe:02.2 0600: 8086:2d92 (rev 02) fe:02.3 0600: 8086:2d93 (rev 02) fe:02.4 0600: 8086:2d94 (rev 02) fe:02.5 0600: 8086:2d95 (rev 02) fe:03.0 0600: 8086:2d98 (rev 02) fe:03.1 0600: 8086:2d99 (rev 02) fe:03.2 0600: 8086:2d9a (rev 02) fe:03.4 0600: 8086:2d9c (rev 02) fe:04.0 0600: 8086:2da0 (rev 02) fe:04.1 0600: 8086:2da1 (rev 02) fe:04.2 0600: 8086:2da2 (rev 02) fe:04.3 0600: 8086:2da3 (rev 02) fe:05.0 0600: 8086:2da8 (rev 02) fe:05.1 0600: 8086:2da9 (rev 02) fe:05.2 0600: 8086:2daa (rev 02) fe:05.3 0600: 8086:2dab (rev 02) fe:06.0 0600: 8086:2db0 (rev 02) fe:06.1 0600: 8086:2db1 (rev 02) fe:06.2 0600: 8086:2db2 (rev 02) fe:06.3 0600: 8086:2db3 (rev 02) (as usual, the same PCI devices repeat at ff: bus) The PCI device 8086:2c70 is shown as: fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic Non-core Registers (rev 02) So, for this device to be recognized, it is only a matter of adding this new PCI ID to the driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 13:15:42 -03:00
Vernon Mauery	8a311e179e	Always call i7core_[ur]dimm_check_mc_ecc_err This fixes an error in function i7core_check_error In commit `ca9c90ba09` which converts the driver to use double buffering, there is a change in the logic. Before, if mce_count was zero, it skipped over a couple of statements and finished out with a call to the check_mc_ecc_err function. The current code checks to see if mce_count is 0 and then exits. This change reverts the behavior back to the original where if there are no errors to report, we skip to the end and call the check_mc_ecc_err function. This fix allows the driver to work again on my Nehalem based blades again. Signed-off-by: Vernon Mauery <vernux@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 12:43:23 -03:00
Alexander Beregalov	2a6fae3267	i7core_edac: fix memory leak of i7core_dev Free already allocated i7core_dev. Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 11:45:20 -03:00
Jiri Slaby	71753e0141	EDAC: add __init to i7core_xeon_pci_fixup It's called only from an __init function and is the only user of pcibios_scan_specific_bus which will be marked as __devinit in the next patch. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-18 11:45:19 -03:00
Mauro Carvalho Chehab	508fa179f8	i7core_edac: Fix wrong device id for channel 1 devices Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 12:18:31 -03:00
Mauro Carvalho Chehab	f05da2f785	i7core: add support for Lynnfield alternate address Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 12:18:29 -03:00
Mauro Carvalho Chehab	52a2e4fc37	i7core_edac: Add initial support for Lynnfield Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 12:18:28 -03:00
Randy Dunlap	3b918c12df	edac: fix i7core build Fix build warning (missing header file) and build error when CONFIG_SMP=n. drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep' drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:32 -03:00
Alan Cox	486dd09f12	edac: i7core_edac produces undefined behaviour on 32bit Fix the shifts up Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:32 -03:00
Mauro Carvalho Chehab	de06eeef58	i7core_edac: Use a more generic approach for probing PCI devices Currently, only one PCI set of tables is allowed. This prevents using the driver for other devices like Lynnfield, which has a different set of PCI IDs. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	fd3826549d	i7core_edac: PCI device is called NONCORE, instead of NOCORE Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	321ece4dda	i7core_edac: Fix ringbuffer maxsize Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	6e103be1c7	i7core_edac: First store, then increment Fix ringbuffer store logic. While here, add a few comments to the code and remove the undesired printk that could otherwise be called during NMI time. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab	4f87fad1d3	i7core_edac: Better parse "any" addrmask Instead of accepting just "any", accept also "any\n" Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab	ca9c90ba09	i7core_edac: Use a lockless ringbuffer Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab	b968759ee7	edac: Create an unique instance for each kobj Current code only works when there's just one memory controller, since we need one kobj for each instance. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab	f338d73691	i7core_edac: Convert UDIMM error counters into a proper sysfs group Instead of displaying 3 values at the same var, break it into 3 different sysfs nodes: /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2 For registered dimms, however, the error counters are already being displayed at: /sys/devices/system/edac/mc/mc0/csrow*/ce_count So, there's no need to add any extra sysfs nodes. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab	c419d921e6	edac: Don't create csrow entries on instance groups Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab	cc301b3ae3	edac: store/show methods for device groups weren't working Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab	a5538e531f	i7core_edac: Add support for sysfs addrmatch group Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab	9fa2fc2e2d	edac_core: Allow the creation of sysfs groups Currently, all sysfs nodes are stored at /sys/.*/mc. (regex) However, sometimes it is needed to create attribute groups. This patch extends edac_core to allow groups creation. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab	4af91889e0	i7core_edac: Avoid printing a warning when debug is disabled Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	4253868034	i7core_edac: We need to use list_for_each_entry_safe to avoid errors Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	22e6bcbdcf	i7core_edac: change remove module strategy The old remove module strategy didn't work on devices with multiple cores, since only one PCI device is used to open all mc's, due to Nehalem nature. Also, it was based on the pdev value. However, this doesn't point to the PCI device used at mci->dev. So, instead, it unregisters all devices at once, deleting them from the device list. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	0f062792b4	i7core_edac: remove static counter for max sockets The number of sockets is now fully dynamic. Get rid of this obsolete var. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab	13d6e9b653	i7core_edac: at remove, don't remove all pci devices at once Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab	d88b85072f	i7core_edac: Fix a bug when printing error counts with RDIMMs Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab	d4c277957f	i7core_edac: a few fixes for multiple mc's Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab	6c6aa3afdb	i7core_edac: sanity check: print a warning if a mcelog is ignored In thesis, the other mc controller should handle it. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	f47429494f	i7core_edac: create one mc per socket/QPI Instead of creating just one memory controller, create one per socket (e. g. per Quick Link Path Interconnect). This better reflects the Nehalem architecture. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	66607706ce	Dynamically allocate memory for PCI devices Instead of using a static table assuming always 2 CPU sockets, allocate space dynamically for Nehalem PCI devs. This patch is part of a series of patches that changes i7core_edac to allow more than 2 sockets and to properly report one memory controller per socket.	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	a55456f344	i7core: temporary workaround to allow it to compile against 2.6.30 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab	3a3bb4a647	i7core_edac: Improve corrected_error_counts output for RDIMM Just cosmetics. instead of showing something like: socket 0, channel 2dimm0: 1 dimm1: 0 dimm2: 0 socket 1, channel 2dimm0: 0 dimm1: 0 dimm2: 0 Show: socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0 socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0 This is more synthetic and easier to parse. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:58 -03:00
Keith Mannthey	bc2d7245ff	i7core_edac: Probe on Xeons eariler On the Xeon 55XX series CPUs the PCI devices are not exposed via ACPI, so we must explicitly probe them to make them usable as Linux PCI devices. This moves the detection of this state to before pci_register_driver is called. Its present position was not working on my systems; the driver would complain about not finding a specific device. This patch allows the driver to load on my systems. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab	14d2c08343	i7core: Use registered memories per processor Instead of assuming that the entire machine has either registered or unregistered memories, decide it per CPU socket. While here, fix a bug at i7core_mce_output_error(), where we're using m->cpu directly as if it would represent a socket. Instead, the proper socket_id is given by cpu_data[m->cpu].phys_proc_id. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab	b4e8f0b6ea	i7core_edac: Use Device 3 function 2 to report errors with RDIMM's Nehalem and upper chipsets provide a special device that has corrected memory error counters detected with registered DIMMs. This device is only seen if there are registered memories plugged. After this patch, on a machine fully equipped with RDIMMs, it will use the Device 3 function 2 to count corrected errors instead of relying on mcelog. For unregistered DIMMs, it will keep the old behavior, counting errors via mcelog. This patch was developed together with Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Keith Mannthey	61053fdedb	i7core_edac: Fix ecc enable shift From: Keith Mannthey <kmannth@us.ibm.com> Simple correction to a shift value. ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c This correctly identifies the state of the ECC at the machine. Signed-off-by: Keith Mannthey <kmannth@us.ibm.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab	3ef288a983	i7core_edac: Print an error message if pci register fails Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab	b990538a78	i7core_edac: CodingSyle fixes/cleanups No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab	4157d9f554	i7core_edac: fix error injection There were two stupid error injection bugs introduced by wrong cut-and-paste: one at socket store, and another at the error inject register. The last one were causing the code to not work at all. While here, adds debug messages to allow seeing what registers are being set while sending error injection. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab	2068def56c	i7core_edac: fix error codes for sysfs error injection interface Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab	276b824c30	i7core_edac: some fixes at error injection code Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	17cb7b0cf7	i7core_edac: Some cleanups at displayed info Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	086271a037	i7core: remove some uneeded noisy debug messages Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab	3a7dde7fcd	i7core: add socket info at the debug msg Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	ec6df24c15	i7core: better document i7core_get_active_channels() Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	c77720b954	i7core: fix get_devices routine for Xeon55xx i7core_get_devices() was prepared to get just the first found device of each type. Due to that, on Xeon 55xx, only socket 1 was retrieved. Rework i7core_get_devices() to clean it and to properly support Xeon 55xx. While here, fix a small typo. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	a639539fa2	i7core: enrich error information based on memory transaction type Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	c5d3452869	i7core: check if the memory error is fatal or non-fatal Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab	310cbb7284	i7core: fix probing on Xeon55xx Xeon55xx fails to probe with this error message: EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init() EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41 i7core_edac: probe of 0000:00:14.0 failed with error -22 This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has PCI ID 8086:2c40. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	f237fcf2b7	i7core_edac: some fixes at memory error parser m->bank is not related to the memory bank but, instead, to the MCA Error register bank. Fix it accordingly. While here, improves the comments for Nehalem bank. A later fix is needed, in order to get bank/rank information from MCA error log. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	8a2f118e3a	i7core_edac: decode mcelog error and send it via edac interface Enriches mcelog error by using the encoded information at MCE status and misc registers (IA32_MCx_STATUS, IA32_MCx_MISC). Some fixes are still needed here, in order to properly fill the EDAC fields. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	ba6c5c62ee	i7core_edac: maps all sockets as if they are one MC controller Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab	67166af4ab	i7core_edac: add support for more than one MC socket Some Nehalem architectures have more than one MC socket. Socket 0 is located at bus 255. Currently, it is using up to 2 sockets, but increasing it to a larger number is just a matter of increasing MAX_SOCKETS definition. This seems to be required for properly support of Xeon 55xx. Still needs testing with Xeon 55xx. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	d1fd4fb69e	i7core_edac: Add a code to probe Xeon 55xx bus This code changes the detection procedure of i7core_edac. Instead of directly probing for MC registers, it probes for another register found on Nehalem. If found, it tries to pick the first MC PCI BUS. This should work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255 that are not properly detected by the non-legacy PCI methods. The new detection code scans specifically at buses 254 and 255 for the Xeon 55xx devices. This code has not been tested yet. Once it works, a further change to the code will be needed, since the i7core is not yet ready for working with 2 sets of MC. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab	e9bd2e7379	i7core_edac: Adds write unlock to MC registers The public Intel Xeon 5500 volume 2 datasheet describes, on page 53, section 2.6.7, a register that can lock/unlock the Memory Controller configuration registers, called MC_CFG_CONTROL. Adds support for it in the hope that software error injection would work. With my tests with Xeon 35xx, there's still something missing. With a program that does sequential bit writes at dev 0.0, sometimes, it produces error injection, after unblocking the MC_CFG_CONTROL (and, sometimes, it just locks my testing machine). I'll try later to discover by trial and error what's the register that solves this issue on Xeon 35xx. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	d5381642ab	i7core_edac: Add edac_mce glue Adds a glue code to allow i7core to work with mcelog. With the glue, i7core registers itself on edac_mce. At mce, when an error is detected, it calls all registered drivers (in this case, i7core), for EDAC error handling. TODO: It currently just prints the MCE error log using about the same format as mce panic messages. The error message should be enhanced with mcelog userspace info and converted into the proper EDAC format, to feed the EDAC error counts. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	963c5ba359	edac/Kconfig: edac_mce can't be module Since mcelog is bool, edac_mce glue should also be bool, or otherwise will not work. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab	696e409dbd	edac_mce: Add an interface driver to report mce errors via edac edac_mce module is an interface module that gets mcelog data and forwards to any registered edac module that expects to receive data via mce. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:49 -03:00
Mauro Carvalho Chehab	41fcb7feed	i7core_edac: CodingStyle fixes Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab	eb94fc402f	i7core_edac: fill csrows edac sysfs info csrows is still fake, since we can't identify its representation with Nehalem registers. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab	5566cb7c91	i7core_edac: Memory info fixes and preparation for properly filling cswrow data Now, memory size is properly displayed: EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400 EDAC i7core: Memory channel configuration: EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 EDAC i7core: dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs EDAC i7core: dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8, numrank: 1, numrow: 0x4000, numcol: 0x400 Still, as the way to retrieve csrows info is not known, it does a mapping of what's available to csrows basic unit at edac core. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2010-05-10 11:44:48 -03:00


497 Commits