linux

Author	SHA1	Message	Date
Linus Torvalds	047486d8e7	Merge tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: - Altera: L2 cache and On-Chip RAM support (Thor Thayer). - EDAC: Workqueue handling cleanups (Borislav Petkov). - Xgene: Register bus error handling (Loc Ho). - Misc small fixes. * tag 'edac_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: ARM: socfpga: Enable OCRAM ECC on startup ARM: socfpga: Enable L2 cache ECC on startup ARM: dts: Add Altera L2 Cache and OCRAM EDAC entries EDAC, altera: Add Altera L2 cache and OCRAM support EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() EDAC, mpc85xx: Silence unused variable warning EDAC: Cleanup/sync workqueue functions EDAC: Kill workqueue setup/teardown functions EDAC: Balance workqueue setup and teardown arm64: Update the APM X-Gene EDAC node with the RB register resource EDAC, xgene: Add missing SoC register bus error handling Documentation, EDAC: Update xgene binding for missing register bus EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr()	2016-03-16 08:36:55 -07:00
Linus Torvalds	d88bfe1d68	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call	2016-03-14 18:43:51 -07:00
Luck, Tony	eb1af3b71f	EDAC/sb_edac: Fix computation of channel address Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems: 1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3. 3=4). 2) Actual use of the socket interleave value didn't interpret as 2^X 3) Conversion of address to channel address was complicated, and wrong. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 18:31:55 +01:00
Aravind Gopalakrishnan	be0aec23bf	x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents. We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt. Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors. [ Minor fixups. ] Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-08 11:48:14 +01:00
Hubert Chrzaniuk	83bdaad4d9	EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi Correct a typo introduced by `d0cdf90031` ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support") As a result under some configurations DIMMs were not correctly recognized. Problem affects only Xeon Phi architecture. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-07 19:07:40 +01:00
Thor Thayer	c3eea1942a	EDAC, altera: Add Altera L2 cache and OCRAM support Add L2 Cache and On-Chip RAM EDAC support for the Altera SoCs. The SDRAM controller is using the Memory Controller model. Each type of ECC is individually configurable. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux@arm.linux.org.uk Cc: linux-doc@vger.kernel.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1455132384-17108-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-11 12:23:06 +01:00
Thor Thayer	9bf4f00567	EDAC: Use edac_debugfs_remove_recursive() in edac_debugfs_exit() debugfs_remove() is used to remove a file or a directory from the debugfs filesystem on an EDAC device exit. However edac_debugfs might not be empty. This is similar to `30f84a891b` ("EDAC: Use edac_debugfs_remove_recursive()") which changed the EDAC MCI code to use edac_debugfs_remove_recursive(). Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1455064165-3816-1-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-10 10:37:46 +01:00
Sudip Mukherjee	f2b59ac66f	EDAC, mpc85xx: Silence unused variable warning We were getting this build warning: drivers/edac/mpc85xx_edac.c:1247:6: warning: unused variable 'pvr' pvr is only used if CONFIG_FSL_SOC_BOOKE is defined. Declare it __maybe_unused. Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1454427573-7994-1-git-send-email-sudipm.mukherjee@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 18:53:15 +01:00
Borislav Petkov	06e912d4d4	EDAC: Cleanup/sync workqueue functions They're both running only when ->edac_check is initialized so remove that check from the workqueue function itself. Synchronize/generalize the ->op_state check between the two. Kill useless comments, while at it. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 11:38:50 +01:00
Borislav Petkov	626a7a4dba	EDAC: Kill workqueue setup/teardown functions We have the generic wrappers now, use those. edac_pci_workq_setup() had an unused argument anyway. Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 11:38:43 +01:00
Borislav Petkov	0966760619	EDAC: Balance workqueue setup and teardown We use the ->edac_check function pointers to determine whether we need to setup a polling workqueue. However, the destroy path is not balanced and we might try to teardown an unitialized workqueue. Balance init and destroy paths by looking at ->edac_check in both cases. Set op_state to OP_OFFLINE before destroying anything. Reported-by: Zhiqiang Hou <Zhiqiang.Hou@freescale.com> Cc: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2016-02-02 11:04:29 +01:00
Loc Ho	4d67e3ce75	EDAC, xgene: Add missing SoC register bus error handling Add missing register bus error handling for APM X-Gene EDAC SoC and fix a checking condition for CE error promoted to UE. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Link: http://lkml.kernel.org/r/1453495625-28006-3-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-01-25 11:17:22 +01:00
Dan Carpenter	6f3508f61c	EDAC, amd64_edac: Shift wrapping issue in f1x_get_norm_dct_addr() dct_sel_base_off is declared as a u64 but we're only using the lower 32 bits because of a shift wrapping bug. This can possibly truncate the upper 16 bits of DctSelBaseOffset[47:26], causing us to misdecode the CS row. Fixes: `c8e518d567` ('amd64_edac: Sanitize f10_get_base_addr_offset') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20160120095451.GB19898@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>	2016-01-25 11:17:14 +01:00
Geliang Tang	1cac5503fb	EDAC, i5100: Use to_delayed_work() Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@163.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/58c0e319c7263a10b692100c657c06c42814aecf.1451659910.git.geliangtang@163.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-01-01 18:31:34 +01:00
Hubert Chrzaniuk	45f4d3ab3e	EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing Knights Landing does not come with register that could be used to fetch DIMM width. However the value is fixed for this architecture so it can be hardcoded. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:58:32 +01:00
Borislav Petkov	c4cf3b454e	EDAC: Rework workqueue handling Hide the EDAC workqueue pointer in a separate compilation unit and add accessors for the workqueue manipulations needed. Remove edac_pci_reset_delay_period() which wasn't used by anything. It seems it got added without a user with `91b99041c1` ("drivers/edac: updated PCI monitoring") Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:43 +01:00
Borislav Petkov	e136fa016f	EDAC: Make edac_device workqueue setup/teardown functions static They're not used anywhere else. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:42 +01:00
Borislav Petkov	d4538000ca	EDAC: Remove edac_get_sysfs_subsys() error handling It cannot fail now. We either load EDAC core after having successfully initialized edac_subsys or we don't. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:41 +01:00
Borislav Petkov	a97d262701	EDAC: Unexport and make edac_subsys static ... and use the accessor instead. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:40 +01:00
Borislav Petkov	733476cf20	EDAC: Rip out the edac_subsys reference counting This was really dumb - reference counting for the main EDAC sysfs object. While we could've simply registered it as the first thing in the module init path and then hand it around to what needs it. Do that and rip out all the code around it, thus simplifying the whole handling significantly. Move the edac_subsys node back to edac_module.c. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:39 +01:00
Borislav Petkov	fcd5c4dd82	EDAC: Robustify workqueues destruction EDAC workqueue destruction is really fragile. We cancel delayed work but if it is still running and requeues itself, we still go ahead and destroy the workqueue and the queued work explodes when workqueue core attempts to run it. Make the destruction more robust by switching op_state to offline so that requeuing stops. Cancel any pending work synchronously too. EDAC i7core: Driver loaded. general protection fault: 0000 [#1] SMP CPU 12 Modules linked in: Supported: Yes Pid: 0, comm: kworker/0:1 Tainted: G IE 3.0.101-0-default #1 HP ProLiant DL380 G7 RIP: 0010:[<ffffffff8107dcd7>] [<ffffffff8107dcd7>] __queue_work+0x17/0x3f0 < ... regs ...> Process kworker/0:1 (pid: 0, threadinfo ffff88019def6000, task ffff88019def4600) Stack: ... Call Trace: call_timer_fn run_timer_softirq __do_softirq call_softirq do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt intel_idle cpuidle_idle_call cpu_idle Code: ... RIP __queue_work RSP <...> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org>	2015-12-11 16:56:39 +01:00
Borislav Petkov	12e26969b3	EDAC, mc_sysfs: Fix freeing bus' name I get the splat below when modprobing/rmmoding EDAC drivers. It happens because bus->name is invalid after bus_unregister() has run. The Code: section below corresponds to: .loc 1 1108 0 movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus .loc 1 1109 0 popq %rbx # .loc 1 1108 0 movq (%rax), %rdi # _7->name, jmp kfree # and %rax has some funky stuff 2030203020312030 which looks a lot like something walked over it. Fix that by saving the name ptr before doing stuff to string it points to. general protection fault: 0000 [#1] SMP Modules linked in: ... CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48 Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011 task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000 RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292 RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286 RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110 R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68 R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000 FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0 Stack: Call Trace: i7core_unregister_mci.isra.9 i7core_remove pci_device_remove __device_release_driver driver_detach bus_remove_driver pci_unregister_driver i7core_exit SyS_delete_module system_call_fastpath 0x7fc9bf426536 Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP <ffff88030da3fe28> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: <stable@vger.kernel.org> # v3.6.. Fixes: `7a623c0390` ("edac: rewrite the sysfs code to use struct device")	2015-12-11 16:56:38 +01:00
Scott Wood	666db563d3	EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device Originally the mpc85xx-pci-edac driver bound directly to the PCI controller node. Commit `905e75c46d` ("powerpc/fsl-pci: Unify pci/pcie initialization code") turned the PCI controller code into a platform device. Since we can't have two drivers binding to the same device, the EDAC code was changed to be called into as a library-style submodule. However, this doesn't work if the EDAC driver is built as a module. Commit 8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting") exposed another problem with this approach -- mpc85xx_pci_err_probe() was being called in the same early boot phase that the PCI controller is initialized, rather than in the device_initcall phase that the EDAC layer expects. This caused a crash on boot. To fix this, the PCI controller code now creates a child platform device specifically for EDAC, which the mpc85xx-pci-edac driver binds to. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daniel Axtens <dja@axtens.net> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Jia Hongtao <B38951@freescale.com> Cc: Jiri Kosina <jkosina@suse.com> Cc: Kim Phillips <kim.phillips@freescale.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Masanari Iida <standby24x7@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rob Herring <robh@kernel.org> Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:56:16 +01:00
Jim Snow	d0cdf90031	EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support Knights Landing is the next generation architecture for HPC market. KNL introduces concept of a tile and CHA - Cache/Home Agent for memory accesses. Some things are fixed in KNL: () There's single DIMM slot per channel () There's 2 memory controllers with 3 channels each, however, from EDAC standpoint, it is presented as single memory controller with 6 channels. In order to represent 2 MCs w/ 3 CH, it would require major redesign of EDAC core driver. Basically, two functionalities are added/extended: () during driver initialization KNL topology is being recognized, i.e. which channels are populated with what DIMM sizes (knl_get_dimm_capacity function) () handle MCE errors - channel swizzling Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Jim Snow <jim.m.snow@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 19:00:52 +01:00
Jim Snow	c1979ba254	EDAC, sb_edac: Add support for duplicate device IDs Add options to sbridge_get_all_devices() to allow for duplicate device IDs and devices that are scattered across mulitple PCI buses. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-4-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 18:57:41 +01:00
Jim Snow	c59f9c06bd	EDAC, sb_edac: Virtualize several hard-coded functions SAD limit, interleave mode and DRAM related functionalities are now virtualized, so that overriding them is easier. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-3-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 18:54:45 +01:00
Thierry Reding	768ce42cce	EDAC, mv64x60: Use platform_register/unregister_drivers() These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449073138-10852-2-git-send-email-thierry.reding@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-03 12:05:40 +01:00
Thierry Reding	d54051f1cc	EDAC, mpc85xx: Use platform_register/unregister_drivers() These new helpers simplify implementing multi-driver modules and properly handle failure to register one driver by unregistering all previously registered drivers. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1449136632-11680-1-git-send-email-thierry.reding@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-03 12:03:36 +01:00
Borislav Petkov	e8f937da74	EDAC, pci: Remove old disabled code Remove an unused edac_pci_find() function iterating over edac_pci_list. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-11-18 14:00:05 +01:00
Linus Torvalds	9cf5c095b6	Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut" tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: temporarily add back asm-generic/io-64-nonatomic.h asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations gpio-mxc: stop including <asm-generic/bug> n_tracesink: stop including <asm-generic/bug> n_tracerouter: stop including <asm-generic/bug> mlx5: stop including <asm-generic/kmap_types.h> hifn_795x: stop including <asm-generic/kmap_types.h> drbd: stop including <asm-generic/kmap_types.h> move count_zeroes.h out of asm-generic move io-64-nonatomic.h out of asm-generic	2015-11-06 14:22:15 -08:00
Linus Torvalds	b831ef2cad	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS changes from Ingo Molnar: "The main system reliability related changes were from x86, but also some generic RAS changes: - AMD MCE error injection subsystem enhancements. (Aravind Gopalakrishnan) - Fix MCE and CPU hotplug interaction bug. (Ashok Raj) - kcrash bootup robustness fix. (Baoquan He) - kcrash cleanups. (Borislav Petkov) - x86 microcode driver rework: simplify it by unmodularizing it and other cleanups. (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init() x86/mce: Add a Scalable MCA vendor flags bit MAINTAINERS: Unify the microcode driver section x86/microcode/intel: Move #ifdef DEBUG inside the function x86/microcode/amd: Remove maintainers from comments x86/microcode: Remove modularization leftovers x86/microcode: Merge the early microcode loader x86/microcode: Unmodularize the microcode driver x86/mce: Fix thermal throttling reporting after kexec kexec/crash: Say which char is the unrecognized x86/setup/crash: Check memblock_reserve() retval x86/setup/crash: Cleanup some more x86/setup/crash: Remove alignment variable x86/setup: Cleanup crashkernel reservation functions x86/amd_nb, EDAC: Rename amd_get_node_id() x86/setup: Do not reserve crashkernel high memory if low reservation failed x86/microcode/amd: Do not overwrite final patch levels x86/microcode/amd: Extract current patch level read to a function x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts ...	2015-11-03 17:51:33 -08:00
Tan Xiaojun	990995bad1	EDAC: Fix PAGES_TO_MiB macro misuse The PAGES_TO_MiB macro is used for unit conversion but the trace_mc_event() tracepoint expects a page address. Fix that. Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445341538-24271-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-10-22 22:57:30 +02:00
Aravind Gopalakrishnan	1a6775c1a2	x86/amd_nb, EDAC: Rename amd_get_node_id() This function doesn't give us the "Node ID" as the function name suggests. Rather, it receives a PCI device as argument, checks the available F3 PCI device IDs in the system and returns the index of the matching Bus/Device IDs. Rename it to amd_pci_dev_to_node_id(). No functional change is introduced. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-10-21 11:10:55 +02:00
Dinh Nguyen	941fd2e709	EDAC, altera: SoCFPGA EDAC should not look for ECC_CORR_EN The bootloader may or may not enable the ECC_CORR_EN bit. By not enabling ECC_CORR_EN, when error happens, it is the user's responsibility to perform a full SDRAM scrub. Remove the check for ECC_CORR_EN. Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Thor Thayer <tthayer@opensource.altera.com> Link: http://lkml.kernel.org/r/1444864456-21778-1-git-send-email-dinguyen@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-10-15 11:57:23 +02:00
Christoph Hellwig	2f8e2c8777	move io-64-nonatomic*.h out of asm-generic These are not implementations of default architecture code but helpers for drivers. Move them to the place they belong to. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darren Hart <dvhart@linux.intel.com> Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2015-10-15 00:21:07 +02:00
Tan Xiaojun	30f84a891b	EDAC: Use edac_debugfs_remove_recursive() debugfs_remove() is used to remove a file or a directory from the debugfs filesystem, but mci->debugfs might not empty. This can be triggered by the following sequence: 1) Enable CONFIG_EDAC_DEBUG 2) insmod an EDAC module (like i3000_edac or similar) 3) rmmod this module 4) we can see files remaining under <debugfs_mountpoint>/edac/ like "fake_inject", for example. Removing edac_core then, causes a NULL pointer dereference. Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-10-14 18:50:32 +02:00
Luis de Bethencourt	90b3b37383	EDAC, ppc4xx_edac: Fix module autoload for OF platform driver This platform driver has an OF device ID table but the OF module alias information is not created so module autoloading won't work. Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20150917114619.GA13145@goodgumbo.baconseed.org Signed-off-by: Borislav Petkov <bp@suse.de>	2015-10-03 12:19:42 +02:00
Aravind Gopalakrishnan	1a8bc7707e	EDAC, amd64_edac: Update copyright and remove changelog Git provides us all the changelogs anyway. So trim the comments section here. Update the copyrights info while at it. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443440593-2316-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-29 13:41:04 +02:00
Aravind Gopalakrishnan	da92110dfd	EDAC, amd64_edac: Extend scrub rate support to F15hM60h The scrub rate control register has moved to function 2 in PCI config space and is at a different offset on family 0x15, models 0x60 and later. The minimum recommended scrub rate has also changed. (Refer to D18F2x1c9_dct[1:0][DramScrub] in Fam15hM60h BKDG). Adjust set_scrub_rate() and get_scrub_rate() functions to accommodate this. Tested on F15hM60h, Fam15h, models 00h-0fh and Fam10h systems. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443440593-2316-2-git-send-email-Aravind.Gopalakrishnan@amd.com [ Cleanup conditionals. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-29 13:25:33 +02:00
Toshi Kani	d0c9c93019	EDAC: Don't allow empty DIMM labels Updating dimm_label to an empty string does not make much sense. Change the sysfs dimm_label store operation to fail a request when an input string is empty. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: elliott@hpe.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-28 16:39:05 +02:00
Toshi Kani	438470b84c	EDAC: Fix sysfs dimm_label store operation Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues in their store operation: 1) A newline-terminated input string causes redundant newlines: # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label # cat /sys/bus/mc0/devices/dimm0/dimm_label test # od -bc /sys/bus/mc0/devices/dimm0/dimm_label 0000000 164 145 163 164 012 012 t e s t \n \n 0000006 2) The original label string (31 characters) cannot be stored due to an improper size check: # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label # cat /sys/bus/mc0/devices/dimm0/dimm_label # od -bc /sys/bus/mc0/devices/dimm0/dimm_label 0000000 012 012 \n \n 0000002 3) An input string longer than the buffer size results a wrong label info as it allows a retry with the remaining string: # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label # cat /sys/bus/mc0/devices/dimm0/dimm_label _TEST Fix these issues by making the following changes: 1) Replace a newline character at the end by setting a null. It also assures that the string is null-terminated in the label buffer. 2) Check the label buffer size with 'sizeof(dimm->label)'. 3) Fail a request if its string exceeds the label buffer size. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Robert Elliott <elliott@hpe.com> Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-25 19:45:59 +02:00
Toshi Kani	1ea62c59c8	EDAC: Fix sysfs dimm_label show operation After `7d375bffa5` ("sb_edac: Fix support for systems with two home agents per socket") sysfs "dimm_label" and "chX_dimm_label" show their label string without a newline "\n" at the end. [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]# [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]# The label strings now have 31 characters, which are the same as EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show() and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN, the newline in the format "%s\n" is ignored. [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label 0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060 C P U _ S r c I D # 0 _ H a # 0 0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000 _ C h a n # 0 _ D I M M # 0 \0 0000037 Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the snprintf()s in channel_dimm_label_show() and dimmdev_label_show(). Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-25 19:19:21 +02:00
Loc Ho	f864b79ba2	EDAC, xgene: Add SoC support Add support for the SoC component. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1443055261-8613-4-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-25 15:41:46 +02:00
Loc Ho	9bc1c0c0ec	EDAC, xgene: Fix possible sprintf() overflow issue Replace sprintf() with snprintf() to avoid possible string array overflow. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1443116287-11752-1-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-25 15:39:52 +02:00
Loc Ho	9347473c7d	EDAC, xgene: Add L3 support Add EDAC support for the L3 component. Signed-off-by: Loc Ho <lho@apm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1443055261-8613-3-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-25 15:36:31 +02:00
Seth Jennings	2900ea6096	EDAC, sb_edac: Fix TAD presence check for sbridge_mci_bind_devs() In commit `7d375bffa5` ("sb_edac: Fix support for systems with two home agents per socket") NUM_CHANNELS was changed to 8 and the channel space was renumerated to handle EN, EP, and EX configurations. The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() - got a new device presence check in the form of saw_chan_mask. However, sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop. With the increase in NUM_CHANNELS, this loop fails at index 4 since SB only has 4 TADs. This results in the following error on SB machines: EDAC sbridge: Some needed devices are missing EDAC sbridge: Couldn't find mci handler EDAC sbridge: Couldn't find mci handle This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as well. After this patch: EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED) EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED) Signed-off-by: Seth Jennings <sjenning@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # v4.2 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-24 20:40:50 +02:00
Aravind Gopalakrishnan	58a9c251c9	EDAC, ghes_edac: Remove redundant memory_type array We already have edac_mem_types[] that enumerates the different kinds of memory. So, use that and remove the redundant memory_type[] array here. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1442436811-23382-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-23 16:59:25 +02:00
Borislav Petkov	09bd1b4f81	EDAC, xgene: Convert to debugfs wrappers Drop CONFIG_EDAC_DEBUG ifdeffery too, while at it. Tested-by: Loc Ho <lho@apm.com> Cc: linux-edac@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-23 11:37:45 +02:00
Borislav Petkov	52019e406c	EDAC, i5100: Convert to debugfs wrappers This driver creates its debugfs hierarchy under the toplevel debugfs dir - see i5100_init() - so make it use edac_debugfs_create_dir_at( , NULL) because we're not breaking userspace. Oh well. Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-22 18:21:12 +02:00
Borislav Petkov	bba3b31e44	EDAC, altera: Convert to debugfs wrappers Use the EDAC-specific wrappers. Drop CONFIG_EDAC_DEBUG ifdeffery. Cc: Thor Thayer <tthayer@opensource.altera.com> Cc: <linux-edac@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-22 18:20:56 +02:00

1 2 3 4 5 ...

1170 Commits