linux/drivers/edac
Tony Luck f7cf2a22a2 sb_edac: Fix discovery of top-of-low-memory for Haswell
Haswell moved the TOLM/TOHM registers to a different device and offset.
The sb_edac driver accounted for the change of device, but not for the
new offset.  There was also a typo in the constant to fill in the low
26 bits (was 0x1ffffff, should be 0x3ffffff).
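To make the arithmetic concrete, here is a minimal C sketch of how a TOLM value can be built from a register whose upper bits give the boundary at 64 MB (2^26 byte) granularity. The low 26 bits are then filled with ones to form the last byte below the boundary. It assumes the field sits in bits 26..31 of a 32-bit register. `get_bitfield`, `tolm_from_reg` and the `TOLM_LOW_FILL_*` names are hypothetical and are not taken from sb_edac.c. The sketch shows that 0x3ffffff sets all 26 low bits, while the typo 0x1ffffff sets only 25.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract bits lo..hi (inclusive) of a 32-bit register. */
static uint64_t get_bitfield(uint32_t reg, unsigned lo, unsigned hi)
{
	unsigned width;
	uint32_t mask;

	assert(lo <= hi && hi < 32);
	width = hi - lo + 1;
	mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);
	return (reg >> lo) & mask;
}

/* Low-bit fill constants: the fixed value sets 26 bits, the typo only 25. */
#define TOLM_LOW_FILL_GOOD 0x3ffffffULL
#define TOLM_LOW_FILL_BAD  0x1ffffffULL

/*
 * Assumed layout: bits 26..31 of the register hold TOLM in 64 MB units.
 * Shift the field back into place and fill the low 26 bits so the result
 * is the address of the last byte of low memory.
 */
static uint64_t tolm_from_reg(uint32_t reg, uint64_t low_fill)
{
	return (get_bitfield(reg, 26, 31) << 26) | low_fill;
}
```

For example, a register value of 0x7c000000 yields 0x7fffffff with the corrected constant but 0x7dffffff with the typo. A register that reads back as 0, which is assumed here as one possible result of using the wrong offset, gives exactly the bogus 0x1ffffff when combined with the typo.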

This resulted in a bogus value for the top of low memory:

  EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff)

which would result in EDAC refusing to translate addresses for
errors above the bogus value and below 4GB:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523eac86
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0
   MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0)

With the fix we see the correct TOLM value:

   DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff)

and we decode address 2000000 correctly:

   sbridge MC3: HANDLING MCE MEMORY ERROR
   sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090
   sbridge MC3: TSC 0
   sbridge MC3: ADDR 2000000
   sbridge MC3: MISC 523e1086
   sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0
   DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0
   DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01
   DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1
   DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0
   DEBUG: sbridge_mce_output_error:  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0
   MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
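The range check behind the "Error at TOLM area" message can be sketched as follows. This is a simplified model, not the driver's code. `classify_addr` and the `REGION_*` names are hypothetical, and the above-4GB case is not modeled here (it would be checked against TOHM). Passing in the address 0x2000000 from the logs above shows why the bogus TOLM rejected the error and why the corrected TOLM accepts it.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GB (1ULL << 32)

/* Hypothetical classification of a physical address against TOLM. */
enum addr_region {
	REGION_LOW_DRAM,  /* at or below TOLM: translated to a DIMM */
	REGION_TOLM_HOLE, /* above TOLM but below 4 GB: refused */
	REGION_HIGH       /* 4 GB and up: would be checked against TOHM (not modeled) */
};

static enum addr_region classify_addr(uint64_t addr, uint64_t tolm)
{
	assert(tolm < FOUR_GB); /* TOLM marks a boundary below 4 GB */

	if (addr <= tolm)
		return REGION_LOW_DRAM;
	if (addr < FOUR_GB)
		return REGION_TOLM_HOLE;
	return REGION_HIGH;
}
```

With the bogus TOLM of 0x1ffffff, address 0x2000000 lands in REGION_TOLM_HOLE and is not decoded. With the corrected 0x7fffffff, the same address is REGION_LOW_DRAM and decoding proceeds to the SAD/TAD/RIR steps seen in the debug output.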

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02 12:06:52 -02:00
altera_edac.c edac: altera: Add Altera SDRAM EDAC support 2014-09-04 13:41:46 -05:00
amd64_edac_dbg.c amd64_edac: convert sysfs logic to use struct device 2012-06-11 13:23:40 -03:00
amd64_edac_inj.c EDAC: Replace strict_strtoul() with kstrtoul() 2013-06-08 10:16:33 +02:00
amd64_edac.c amd64_edac: Modify usage of amd64_read_dct_pci_cfg() 2014-09-23 13:16:05 +02:00
amd64_edac.h amd64_edac: Modify usage of amd64_read_dct_pci_cfg() 2014-09-23 13:16:05 +02:00
amd76x_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
amd8111_edac.c amd8111_edac: Fix leaks in probe error paths 2014-02-25 10:09:09 +01:00
amd8111_edac.h
amd8131_edac.c edac: Drop __DATE__ usage 2011-04-19 00:23:22 +02:00
amd8131_edac.h tree-wide: fix comment/printk typos 2010-11-01 15:38:34 -04:00
cell_edac.c of: Migrate of_find_node_by_name() users to for_each_node_by_name() 2014-06-26 17:12:24 +01:00
cpc925_edac.c cpc925_edac: Report UE events properly 2014-10-22 22:58:45 +02:00
e7xxx_edac.c e7xxx_edac: Report CE events properly 2014-10-22 22:59:00 +02:00
e752x_edac.c e752x_edac: Drop pvt->bridge_ck 2014-02-25 10:01:30 +01:00
edac_core.h EDAC: Fix mem_types strings type 2014-09-02 09:11:16 +02:00
edac_device_sysfs.c edac: Convert debugfX to edac_dbg(X, 2012-06-11 13:23:49 -03:00
edac_device.c EDAC: Don't try to cancel workqueue when it's never setup 2014-01-10 15:57:36 +01:00
edac_mc_sysfs.c sb_edac: Fix off-by-one error in number of channels 2014-12-02 12:06:51 -02:00
edac_mc.c EDAC: Fix mem_types strings type 2014-09-02 09:11:16 +02:00
edac_module.c EDAC, edac_module.c: Remove unnecessary test on unsigned value 2014-06-24 15:13:08 +02:00
edac_module.h EDAC: Poll timeout cannot be zero, p2 2014-02-14 10:40:29 +01:00
edac_pci_sysfs.c Linux 3.8-rc7 2013-02-20 15:45:52 -03:00
edac_pci.c edac: Unify reporting of device info for device, mc and pci 2013-11-04 17:01:09 -06:00
edac_stub.c EDAC: Add an edac_report parameter to EDAC 2013-12-11 18:06:47 +01:00
ghes_edac.c [media, edac] Change my email address 2014-02-07 08:03:07 -02:00
highbank_l2_edac.c edac, highbank: Improve and unify naming 2013-11-04 17:01:07 -06:00
highbank_mc_edac.c edac, highbank: Moving error injection to sysfs for edac 2013-11-04 17:01:11 -06:00
i7core_edac.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-04-04 09:50:07 -07:00
i3000_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
i3200_edac.c i3200_edac: Report CE events properly 2014-10-22 22:58:13 +02:00
i5000_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
i5100_edac.c i5100_edac: Remove an unneeded condition in i5100_init_csrows() 2014-02-20 11:52:58 +01:00
i5400_edac.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2014-04-04 09:50:07 -07:00
i7300_edac.c Linux 3.14-rc5 2014-03-11 06:55:49 -03:00
i82443bxgx_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
i82860_edac.c i82860_edac: Report CE events properly 2014-10-22 22:58:31 +02:00
i82875p_edac.c Merge branches 'pci/host-exynos', 'pci/host-imx6', 'pci/resource' and 'pci/misc' into next 2014-05-30 11:41:17 -06:00
i82975x_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
ie31200_edac.c ie31200_edac: Allocate mci and map mchbar first 2014-07-10 10:55:12 +02:00
Kconfig edac: altera: Add Altera SDRAM EDAC support 2014-09-04 13:41:46 -05:00
Makefile edac: altera: Add Altera SDRAM EDAC support 2014-09-04 13:41:46 -05:00
mce_amd_inj.c EDAC: Replace strict_strtoul() with kstrtoul() 2013-06-08 10:16:33 +02:00
mce_amd.c EDAC, MCE, AMD: Add MCE decoding for F15h M60h 2014-07-14 16:58:19 +02:00
mce_amd.h EDAC, MCE, AMD: Remove unneeded exports 2013-01-22 22:40:03 +01:00
mpc85xx_edac.c mpc85xx_edac: Make L2 interrupt shared too 2014-09-30 12:55:41 +02:00
mpc85xx_edac.h edac/85xx: Add PCIe error interrupt edac support 2013-11-25 11:29:15 +01:00
mv64x60_edac.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
mv64x60_edac.h edac: Drop __DATE__ usage 2011-04-19 00:23:22 +02:00
octeon_edac-l2c.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
octeon_edac-lmc.c EDAC: Octeon: Add error injection support 2014-03-31 18:17:12 +02:00
octeon_edac-pc.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
octeon_edac-pci.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
pasemi_edac.c Drivers: edac: remove __dev* attributes. 2013-01-03 15:57:03 -08:00
ppc4xx_edac.c ppc4xx_edac: Fix build error caused by wrong member access 2014-09-15 14:20:56 +02:00
ppc4xx_edac.h
r82600_edac.c EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro 2013-12-06 10:23:41 +01:00
sb_edac.c sb_edac: Fix discovery of top-of-low-memory for Haswell 2014-12-02 12:06:52 -02:00
tile_edac.c edac: Remove redundant platform_set_drvdata() 2013-07-17 12:49:55 -04:00
x38_edac.c x38_edac: make use of lo_hi_readq() 2014-07-04 13:46:03 +02:00