linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-27 06:31:52 +00:00

Author	SHA1	Message	Date
Uwe Kleine-König	9441e5ca3a	EDAC/xgene: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-21-u.kleine-koenig@pengutronix.de	2023-11-20 23:31:44 +01:00
Uwe Kleine-König	8312b2bbdd	EDAC/ti: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-20-u.kleine-koenig@pengutronix.de	2023-11-20 23:30:09 +01:00
Uwe Kleine-König	f30e2fac7d	EDAC/synopsys: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-19-u.kleine-koenig@pengutronix.de	2023-11-20 23:29:21 +01:00
Uwe Kleine-König	bfee05aa38	EDAC/qcom: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-18-u.kleine-koenig@pengutronix.de	2023-11-20 23:28:17 +01:00
Uwe Kleine-König	58758ffa11	EDAC/ppc4xx: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-17-u.kleine-koenig@pengutronix.de	2023-11-20 23:24:46 +01:00
Uwe Kleine-König	524d3e56fb	EDAC/octeon-pci: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20231004131254.2673842-16-u.kleine-koenig@pengutronix.de	2023-11-20 23:11:21 +01:00
Uwe Kleine-König	a92dd68e16	EDAC/octeon-pc: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20231004131254.2673842-15-u.kleine-koenig@pengutronix.de	2023-11-20 23:10:37 +01:00
Uwe Kleine-König	c2a962933c	EDAC/octeon-lmc: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20231004131254.2673842-14-u.kleine-koenig@pengutronix.de	2023-11-20 23:09:49 +01:00
Uwe Kleine-König	01314f2772	EDAC/octeon-l2c: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20231004131254.2673842-13-u.kleine-koenig@pengutronix.de	2023-11-20 23:09:07 +01:00
Uwe Kleine-König	8510e004d5	EDAC/npcm: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-12-u.kleine-koenig@pengutronix.de	2023-11-20 22:52:14 +01:00
Uwe Kleine-König	1baf49724e	EDAC/mpc85xx: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-11-u.kleine-koenig@pengutronix.de	2023-11-20 22:44:37 +01:00
Uwe Kleine-König	81b3e87411	EDAC/highbank_mc: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20231004131254.2673842-10-u.kleine-koenig@pengutronix.de	2023-11-20 22:33:52 +01:00
Uwe Kleine-König	7aca2e9b7b	EDAC/highbank_l2: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20231004131254.2673842-9-u.kleine-koenig@pengutronix.de	2023-11-20 22:33:22 +01:00
Uwe Kleine-König	d27cb32e00	EDAC/dmc520: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-8-u.kleine-koenig@pengutronix.de	2023-11-20 22:01:16 +01:00
Uwe Kleine-König	0576ded05b	EDAC/cpc925: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-7-u.kleine-koenig@pengutronix.de	2023-11-20 21:56:49 +01:00
Uwe Kleine-König	d8d9f99fd0	EDAC/cell: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-6-u.kleine-koenig@pengutronix.de	2023-11-20 21:39:13 +01:00
Uwe Kleine-König	a5347591eb	EDAC/bluefield: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-5-u.kleine-koenig@pengutronix.de	2023-11-20 21:28:08 +01:00
Uwe Kleine-König	2546fffd91	EDAC/aspeed: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-4-u.kleine-koenig@pengutronix.de	2023-11-20 21:26:01 +01:00
Uwe Kleine-König	5aafd02da7	EDAC/armada_xp: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-3-u.kleine-koenig@pengutronix.de	2023-11-20 20:07:53 +01:00
Uwe Kleine-König	b73e11c873	EDAC/altera: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231004131254.2673842-2-u.kleine-koenig@pengutronix.de	2023-11-20 19:43:18 +01:00
Rob Herring	1b09892962	EDAC/altera: Use device_get_match_data() Use preferred device_get_match_data() instead of of_match_device() to get the driver match data. With this, adjust the includes to explicitly include the correct headers. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231115210201.3743564-1-robh@kernel.org	2023-11-20 09:59:16 +01:00
Linus Torvalds	befaa609f4	hardening updates for v6.7-rc1 - Add LKDTM test for stuck CPUs (Mark Rutland) - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo) - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva) - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh) - Convert group_info.usage to refcount_t (Elena Reshetova) - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva) - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn) - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook) - Fix strtomem() compile-time check for small sources (Kees Cook) Merge tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One of the more voluminous set of changes is for adding the new __counted_by annotation[1] to gain run-time bounds checking of dynamically sized arrays with UBSan. - Add LKDTM test for stuck CPUs (Mark Rutland) - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo) - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva) - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh) - Convert group_info.usage to refcount_t (Elena Reshetova) - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva) - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn) - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook) - Fix strtomem() compile-time check for small sources (Kees Cook)" * tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits) hwmon: (acpi_power_meter) replace open-coded kmemdup_nul reset: Annotate struct reset_control_array with __counted_by kexec: Annotate struct crash_mem with __counted_by virtio_console: Annotate struct port_buffer with __counted_by ima: Add __counted_by for struct modsig and use struct_size() MAINTAINERS: Include stackleak paths in hardening entry string: Adjust strtomem() logic to allow for smaller sources hardening: x86: drop reference to removed config AMD_IOMMU_V2 randstruct: Fix gcc-plugin performance mode to stay in group mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by irqchip/imx-intmux: Annotate struct intmux_data with __counted_by KVM: Annotate struct kvm_irq_routing_table with __counted_by virt: acrn: Annotate struct vm_memory_region_batch with __counted_by hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by sparc: Annotate struct cpuinfo_tree with __counted_by isdn: kcapi: replace deprecated strncpy with strscpy_pad isdn: replace deprecated strncpy with strscpy NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by ...	2023-10-30 19:09:55 -10:00
Shubhrajyoti Datta	6f15b178cd	EDAC/versal: Add a Xilinx Versal memory controller driver Add an EDAC driver for the RAS capabilities on the Xilinx integrated DDR Memory Controllers (DDRMCs) which support both DDR4 and LPDDR4/4X memory interfaces. It has four programmable Network-on-Chip (NoC) interface ports and is designed to handle multiple streams of traffic. The driver reports correctable and uncorrectable errors, and also creates debugfs entries for testing through error injection. [ bp: - Add a pointer to the documentation about the register unlock code. - Squash in a fix for a Smatch static checker issue as reported by Dan Carpenter: https://lore.kernel.org/r/a4db6f93-8e5f-4d55-a7b8-b5a987d48a58@moroto.mountain ] Co-developed-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231005101242.14621-3-shubhrajyoti.datta@amd.com	2023-10-23 19:41:27 +02:00
Justin Stitt	6b343a4642	EDAC/mc_sysfs: Replace deprecated strncpy() with memcpy() `strncpy` is deprecated for use on NUL-terminated destination strings [1]. We've already calculated bounds, possible truncation with '\0' or '\n' and manually NUL-terminated. The situation is now just a literal byte copy from one buffer to another, let's treat it as such and use a less ambiguous interface in memcpy. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230918-strncpy-drivers-edac-edac_mc_sysfs-c-v4-1-38a23d2fcdd8@google.com Signed-off-by: Kees Cook <keescook@chromium.org>	2023-09-29 14:48:32 -07:00
Linus Torvalds	bb511d4b25	Intel EDAC fixes: - Old igen6 driver could lose pending events during initialization - Sapphire Rapids workstations have fewer memory controllers than their bigger siblings. This confused the driver. Merge tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull intel EDAC fixes from Tony Luck: - Old igen6 driver could lose pending events during initialization - Sapphire Rapids workstations have fewer memory controllers than their bigger siblings. This confused the driver. * tag 'edac_updates_for_v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/igen6: Fix the issue of no error events EDAC/i10nm: Skip the absent memory controllers	2023-08-30 19:23:00 -07:00
Linus Torvalds	ef2a0b7cdb	Devicetree include cleanups for v6.6: These are the remaining few clean-ups of DT related includes which didn't get applied to subsystem trees. Merge tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree include cleanups from Rob Herring: "These are the remaining few clean-ups of DT related includes which didn't get applied to subsystem trees" * tag 'devicetree-header-cleanups-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: ipmi: Explicitly include correct DT includes tpm: Explicitly include correct DT includes lib/genalloc: Explicitly include correct DT includes parport: Explicitly include correct DT includes sbus: Explicitly include correct DT includes mux: Explicitly include correct DT includes macintosh: Explicitly include correct DT includes hte: Explicitly include correct DT includes EDAC: Explicitly include correct DT includes clocksource: Explicitly include correct DT includes sparc: Explicitly include correct DT includes riscv: Explicitly include correct DT includes	2023-08-30 17:04:28 -07:00
Linus Torvalds	1a7c611546	Perf events changes for v6.6: - AMD IBS improvements - Intel PMU driver updates - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events - Micro-optimize software events and the ring-buffer code - Misc cleanups & fixes Signed-off-by: Ingo Molnar <mingo@kernel.org> Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event updates from Ingo Molnar: - AMD IBS improvements - Intel PMU driver updates - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events - Micro-optimize software events and the ring-buffer code - Misc cleanups & fixes * tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call perf/x86/intel: Add Crestmont PMU x86/cpu: Update Hybrids x86/cpu: Fix Crestmont uarch x86/cpu: Fix Gracemont uarch perf: Remove unused extern declaration arch_perf_get_page_size() perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA perf/mem: Introduce PERF_MEM_LVLNUM_UNC perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin locking/arch: Avoid variable shadowing in local_try_cmpxchg() perf/core: Use local64_try_cmpxchg in perf_swevent_set_period perf/x86: Use local64_try_cmpxchg perf/amd: Prevent grouping of IBS events	2023-08-28 16:35:01 -07:00
Rob Herring	408d808893	EDAC: Explicitly include correct DT includes The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Link: https://lore.kernel.org/r/20230714174434.4054728-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>	2023-08-28 13:31:01 -05:00
Avadhut Naik	c4d07c3712	EDAC/amd64: Add support for AMD family 1Ah models 00h-1Fh and 40h-4Fh Add support for family 1Ah-based models 00h-1Fh and 40h-4Fh. [ bp: Simplify. ] Signed-off-by: Avadhut Naik <Avadhut.Naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230809035244.2722455-4-avadhut.naik@amd.com	2023-08-10 14:25:21 +02:00
Peter Zijlstra	0cfd8fbadd	x86/cpu: Fix Crestmont uarch Sierra Forest and Grand Ridge are both E-core only using Crestmont micro-architecture. They fit the pre-existing naming scheme perfectly fine; adhere to it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230807150405.757666627@infradead.org	2023-08-09 21:51:06 +02:00
Qiuxu Zhuo	ce53ad81ed	EDAC/igen6: Fix the issue of no error events Current igen6_edac checks for pending errors before the registration of the error handler. However, there is a possibility that the error occurs during the registration process, leading to unhandled pending errors and no future error events. This issue can be reproduced by repeatedly injecting errors during the loading of the igen6_edac. Fix this issue by moving the pending error handler after the registration of the error handler, ensuring that no pending errors are left unhandled. Fixes: `10590a9d4f` ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: Ee Wey Lim <ee.wey.lim@intel.com> Tested-by: Ee Wey Lim <ee.wey.lim@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230725080427.23883-1-qiuxu.zhuo@intel.com	2023-08-02 13:09:56 -07:00
Qiuxu Zhuo	c545f5e412	EDAC/i10nm: Skip the absent memory controllers Some Sapphire Rapids workstations' absent memory controllers still appear as PCIe devices that fool the i10nm_edac driver and result in "shift exponent -66 is negative" call traces from skx_get_dimm_info(). Skip the absent memory controllers to avoid the call traces. Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Closes: https://lore.kernel.org/linux-edac/CAAd53p41Ku1m1rapeqb1xtD+kKuk+BaUW=dumuoF0ZO3GhFjFA@mail.gmail.com/T/#m5de16dce60a8c836ec235868c7c16e3fefad0cc2 Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Reported-by: Koba Ko <koba.ko@canonical.com> Closes: https://lore.kernel.org/linux-edac/SA1PR11MB71305B71CCCC3D9305835202892AA@SA1PR11MB7130.namprd11.prod.outlook.com/T/#t Tested-by: Koba Ko <koba.ko@canonical.com> Fixes: `d4dc89d069` ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230710013232.59712-1-qiuxu.zhuo@intel.com	2023-07-24 08:57:26 -07:00
Linus Torvalds	aa35a4835e	- Add initial support for RAS hardware found on AMD server GPUs (MI200). Those GPUs and CPUs are connected together through the coherent fabric and the GPU memory controllers report errors through x86's MCA so EDAC needs to support them. The amd64_edac driver supports now HBM (High Bandwidth Memory) and thus such heterogeneous memory controller systems - Other small cleanups and improvements Merge tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Add initial support for RAS hardware found on AMD server GPUs (MI200). Those GPUs and CPUs are connected together through the coherent fabric and the GPU memory controllers report errors through x86's MCA so EDAC needs to support them. The amd64_edac driver supports now HBM (High Bandwidth Memory) and thus such heterogeneous memory controller systems - Other small cleanups and improvements * tag 'ras_core_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/amd64: Cache and use GPU node map EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh EDAC/amd64: Document heterogeneous system enumeration x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors x86/amd_nb: Re-sort and re-indent PCI defines x86/amd_nb: Add MI200 PCI IDs ras/debugfs: Fix error checking for debugfs_create_dir() x86/MCE: Check a hw error's address to determine proper recovery action	2023-06-26 15:09:18 -07:00
Linus Torvalds	e5ce2f196f	- amd64_edac: Add support for Zen4 client hardware - amd64_edac: Remove the version string as it is useless and actively confusing when looking at backported versions of the driver - Add a driver for the Nuvoton NPCM memory controller - A debugfs error checking cleanup Merge tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - amd64_edac: Add support for Zen4 client hardware - amd64_edac: Remove the version string as it is useless and actively confusing when looking at backported versions of the driver - Add a driver for the Nuvoton NPCM memory controller - A debugfs error checking cleanup * tag 'edac_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/npcm: Add NPCM memory controller driver dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller EDAC/thunderx: Check debugfs file creation retval properly EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh EDAC/amd64: Remove module version string	2023-06-26 15:06:42 -07:00
Yazen Ghannam	4251566ebc	EDAC/amd64: Cache and use GPU node map AMD systems have historically provided an "AMD Node ID" that is a unique identifier for each die in a multi-die package. This was associated with a unique instance of the AMD Northbridge on a legacy system. And now it is associated with a unique instance of the AMD Data Fabric on modern systems. Each instance is referred to as a "Node"; this is an AMD-specific term not to be confused with NUMA nodes. The data fabric provides a number of interfaces accessible through a set of functions in a single PCI device. There is one PCI device per Data Fabric (AMD Node), and multi-die systems will see multiple such PCI devices. The AMD Node ID matches a Node's position in the PCI hierarchy. For example, the Node 0 is accessed using the first PCI device, Node 1 is accessed using the second, and so on. A logical CPU can find its AMD Node ID using CPUID. Furthermore, the AMD Node ID is used within the hardware fabric, so it is not purely a logical value. Heterogeneous AMD systems, with a CPU Data Fabric connected to GPU data fabrics, follow a similar convention. Each CPU and GPU die has a unique AMD Node ID value, and each Node ID corresponds to PCI devices in sequential order. However, there are two caveats: 1) GPUs are not x86, and they don't have CPUID to read their AMD Node ID like on CPUs. This means the value is more implicit and based on PCI enumeration and hardware-specifics. 2) There is a gap in the hardware values for AMD Node IDs. Values 0-7 are for CPUs and values 8-15 are for GPUs. For example, a system with one CPU die and two GPUs dies will have the following values: CPU0 -> AMD Node 0 GPU0 -> AMD Node 8 GPU1 -> AMD Node 9 EDAC is the only subsystem where this has a practical effect. Memory errors on AMD systems are commonly reported through MCA to a CPU on the local AMD Node. 
The error information is passed along to EDAC where the AMD EDAC modules use the AMD Node ID of reporting logical CPU to access AMD Node information. However, memory errors from a GPU die will be reported to the CPU die. Therefore, the logical CPU's AMD Node ID can't be used since it won't match the AMD Node ID of the GPU die. The AMD Node ID of the GPU die is provided as part of the MCA information, and the value will match the hardware enumeration (e.g. 8-15). Handle this situation by discovering GPU dies the same way as CPU dies in the AMD NB code. But do a "node id" fixup in AMD64 EDAC where it's needed. The GPU data fabrics provide a register with the base AMD Node ID for their local "type", i.e. GPU data fabric. This value is the same for all fabrics of the same type in a system. Read and cache the base AMD Node ID from one of the GPU devices during module initialization. Use this to fixup the "node id" when reporting memory errors at runtime. [ bp: Squash a fix making gpu_node_map static as reported by Tom Rix <trix@redhat.com>. Link: https://lore.kernel.org/r/20230610210930.174074-1-trix@redhat.com ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230515113537.1052146-6-muralimk@amd.com	2023-06-19 13:01:44 +02:00
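The "node id" fixup described in the commit above can be illustrated with a small userspace sketch. The struct and function names (`gpu_node_map_sketch`, `fixup_node_id_sketch`) and the field layout are hypothetical and chosen for illustration only; they are not the driver's actual code. The sketch also assumes that logical GPU nodes are numbered directly after the CPU nodes.

```c
#include <stdint.h>

/* Hypothetical cache of values read once during module initialization. */
struct gpu_node_map_sketch {
	uint16_t cpu_node_count; /* CPU nodes enumerated before any GPU node (assumption) */
	uint16_t base_node_id;   /* hardware base AMD Node ID for GPU fabrics, e.g. 8 */
};

/*
 * Convert the hardware AMD Node ID reported with an error into a logical,
 * gap-free index. IDs below the GPU base belong to CPU nodes and pass
 * through unchanged.
 */
int fixup_node_id_sketch(const struct gpu_node_map_sketch *map, int hw_node_id)
{
	if (hw_node_id < map->base_node_id)
		return hw_node_id;

	return hw_node_id - map->base_node_id + map->cpu_node_count;
}
```

With the commit's example (one CPU die, GPU base ID 8), hardware nodes 8 and 9 would map to logical nodes 1 and 2 under these assumptions, while CPU node 0 stays 0.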
Borislav Petkov (AMD)	852667c317	Merge ras/edac-drivers into for-next * ras/edac-drivers: EDAC/npcm: Add NPCM memory controller driver dt-bindings: memory-controllers: nuvoton: Add NPCM memory controller Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2023-06-12 15:15:36 +02:00
Marvin Lin	d244c610f1	EDAC/npcm: Add NPCM memory controller driver Add driver for memory controller present on Nuvoton NPCM SoCs. The memory controller supports single bit error correction and double bit error detection. Signed-off-by: Marvin Lin <milkfafa@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230111093245.318745-4-milkfafa@gmail.com	2023-06-12 15:14:10 +02:00
Borislav Petkov (AMD)	0a81fa5d74	Merge ras/edac-misc into for-next * ras/edac-misc: EDAC/thunderx: Check debugfs file creation retval properly Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2023-06-07 10:50:07 +02:00
Yeqi Fu	bf5c04ddd3	EDAC/thunderx: Check debugfs file creation retval properly edac_debugfs_create_file() returns ERR_PTR by way of the respective debugfs function it calls, if an error occurs. The appropriate way to verify for errors is to use IS_ERR(). Do so. [ bp: Rewrite all text. ] Signed-off-by: Yeqi Fu <asuk4.q@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230517173111.365787-1-asuk4.q@gmail.com	2023-06-06 23:04:56 +02:00
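The reason a plain NULL check is not enough for ERR_PTR-style return values can be shown with a userspace sketch. The helpers below (`err_ptr_sketch`, `is_err_sketch`, `create_file_sketch`) are hypothetical stand-ins that mimic the idea behind the kernel's `ERR_PTR()` and `IS_ERR()`; they are not the kernel's implementation, and the 4095 limit is stated here as an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top MAX_ERRNO_SKETCH addresses, i.e. encodes an error. */
int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Hypothetical file-creation helper that can fail with -ENOMEM. */
void *create_file_sketch(int fail)
{
	static int dummy_file;

	return fail ? err_ptr_sketch(-ENOMEM) : (void *)&dummy_file;
}
```

A failed `create_file_sketch(1)` returns a non-NULL pointer, so `if (!ptr)` would treat the failure as success; only the `is_err_sketch()` test catches it.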
Muralidhara M K	9c42edd571	EDAC/amd64: Add support for AMD heterogeneous Family 19h Model 30h-3Fh AMD Family 19h Model 30h-3Fh systems can be connected to AMD MI200 accelerator/GPU devices such that the CPU and GPU data fabrics are connected together. In this configuration, the CPU manages error logging and reporting for MCA banks located on the GPUs. This includes HBM memory errors reported from Unified Memory Controllers (UMCs) on the GPUs. The GPU memory errors are handled like CPU memory errors. AMD CPU UMC support in EDAC can be re-used for GPU UMC support. However, keeping them separate means drastic changes in one path (e.g. to support newer products) should have less impact on the other path. Also, simplify the "gpu_" helper functions where possible. GPU product configuration, like memory type and channel count, is fixed compared to CPU products. GPU UMCs each have four physical connections (phys) connected to eight channels. There is a single "chip select". This differs from CPUs where each UMC has one physical connection connected to one channel, and each channel has up to four "chip selects". Enumerate each UMC "phy" as an EDAC CSROW, since there is only a single chip select for each physical connection. This is similar to how a CPU UMC "phy" is enumerated as an EDAC CHANNEL, since there is only a single channel for each physical connection. Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230515113537.1052146-5-muralimk@amd.com	2023-06-05 12:27:18 +02:00
Yazen Ghannam	c35977b00f	x86/MCE/AMD, EDAC/mce_amd: Decode UMC_V2 ECC errors The MI200 (Aldebaran) series of devices introduced a new SMCA bank type for Unified Memory Controllers. The MCE subsystem already has support for this new type. The MCE decoder module will decode the common MCA error information for the new bank type, but it will not pass the information to the AMD64 EDAC module for detailed memory error decoding. Have the MCE decoder module recognize the new bank type as an SMCA UMC memory error and pass the MCA information to AMD64 EDAC. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230515113537.1052146-3-muralimk@amd.com	2023-06-05 12:27:11 +02:00
Manivannan Sadhasivam	cbd77119b6	EDAC/qcom: Get rid of hardcoded register offsets The LLCC EDAC register offsets vary between SoCs. Hardcoding the register offsets won't work and will often result in a crash due to accessing the wrong locations. Hence, get the register offsets from the LLCC driver matching the individual SoCs. Cc: <stable@vger.kernel.org> # 6.0: `5365cea199` ("soc: qcom: llcc: Rename reg_offset structs to reflect LLCC version") Cc: <stable@vger.kernel.org> # 6.0: `c13d7d261e` ("soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver") Cc: <stable@vger.kernel.org> # 6.0 Fixes: `a6e9d7ef25` ("soc: qcom: llcc: Add configuration data for SM8450 SoC") Acked-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230517114635.76358-3-manivannan.sadhasivam@linaro.org	2023-05-26 20:56:55 -07:00
Manivannan Sadhasivam	3d49f7406b	EDAC/qcom: Remove superfluous return variable assignment in qcom_llcc_core_setup() "ret" variable will be assigned on both success and failure cases. So there is no need to initialize it during start of qcom_llcc_core_setup(). Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230517114635.76358-2-manivannan.sadhasivam@linaro.org	2023-05-26 20:56:54 -07:00
Hristo Venev	6c79e42169	EDAC/amd64: Add support for ECC on family 19h model 60h-7Fh Ryzen 9 7950X uses model 61h. Treat it as Epyc 9004, but with 2 channels instead of 12. With two 32GB dual-rank DIMMs the sizes appear to be reported correctly: EDAC MC0: Giving out device to module amd64_edac controller F19h_M60h: DEV 0000:00:18.3 (INTERRUPT) EDAC amd64: F19h_M60h detected (node 0). EDAC MC: UMC0 chip selects: EDAC amd64: MC: 0: 0MB 1: 0MB EDAC amd64: MC: 2: 16384MB 3: 16384MB EDAC MC: UMC1 chip selects: EDAC amd64: MC: 0: 0MB 1: 0MB EDAC amd64: MC: 2: 16384MB 3: 16384MB AMD64 EDAC driver v3.5.0 ECC errors can also be detected: mce: [Hardware Error]: Machine check events logged [Hardware Error]: Corrected error, no action required. [Hardware Error]: CPU:0 (19:61:2) MC21_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0xdc2040000400011b [Hardware Error]: Error Addr: 0x00000007ff7e93c0 [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x000100010a801203 [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error. EDAC MC0: 1 CE Cannot decode normalized address on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0x0 offset:0x0 grain:64 syndrome:0x1) [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD According to Mario Limonciello, the same code should also work for models 70h-7Fh (follow thread in Link). [ bp: Massage, the translation logic updates are pending. ] Signed-off-by: Hristo Venev <hristo@venev.name> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20230425201239.324476-1-hristo@venev.name Link: https://lore.kernel.org/r/20230511174506.875153-2-hristo@venev.name	2023-05-15 16:32:47 +02:00
Yazen Ghannam	b34348a0d7	EDAC/amd64: Remove module version string The AMD64 EDAC module version information is not exposed through ABI like MODULE_VERSION(). Instead it is printed during module init. Version numbers can be confusing in cases where module updates are partly backported resulting in a difference between upstream and backported module versions. Remove the AMD64 EDAC module version information to avoid user confusion. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230410190959.3367528-1-yazen.ghannam@amd.com	2023-05-10 15:49:52 +02:00
Linus Torvalds	556eb8b791	Driver core changes for 6.4-rc1 Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firmware loader dependency fixes. All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firmware loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...	2023-04-27 11:53:57 -07:00
Linus Torvalds	a907047732	ARM: SoC drivers for v6.4 The most notable updates this time are for Qualcomm Snapdragon platforms. The Inline-Crypto-Engine gets a new DT binding and driver. A number of drivers now support additional Snapdragon variants, in particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc (edac) and rpm drivers get notable functionality updates. Updates on other platforms include: - Various updates to the Mediatek mutex and mmsys drivers, including support for the Helio X10 SoC - Support for unidirectional mailbox channels in Arm SCMI firmware - Support for per cpu asynchronous notification in OP-TEE firmware - Minor updates for memory controller drivers. - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra, Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX SoC drivers, mainly updating the use of MODULE_LICENSE() macros and obsolete DT driver interfaces. Merge tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "The most notable updates this time are for Qualcomm Snapdragon platforms. The Inline-Crypto-Engine gets a new DT binding and driver, and a number of drivers now support additional Snapdragon variants, in particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc (edac) and rpm drivers get notable functionality updates. Updates on other platforms include: - Various updates to the Mediatek mutex and mmsys drivers, including support for the Helio X10 SoC - Support for unidirectional mailbox channels in Arm SCMI firmware - Support for per cpu asynchronous notification in OP-TEE firmware - Minor updates for memory controller drivers. - Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra, Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX SoC drivers, mainly updating the use of MODULE_LICENSE() macros and obsolete DT driver interfaces" * tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits) soc: ti: smartreflex: Simplify getting the opam_sr pointer bus: vexpress-config: Add explicit of_platform.h include soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS memory: mtk-smi: mt8365: Add SMI Support dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365 dt-bindings: memory-controllers: mediatek,smi-common: add mt8365 memory: tegra: read values from correct device dt-bindings: crypto: Add Qualcomm Inline Crypto Engine soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver dt-bindings: firmware: document Qualcomm QCM2290 SCM soc: qcom: rpmh-rsc: Support RSC v3 minor versions soc: qcom: smd-rpm: Use GFP_ATOMIC in write path soc/tegra: fuse: Remove nvmem root only access soc/tegra: cbb: tegra194: Use of_address_count() helper soc/tegra: cbb: Remove MODULE_LICENSE in non-modules ARM: tegra: Remove MODULE_LICENSE in non-modules soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource() soc: tegra: cbb: Drop empty platform remove function firmware: arm_scmi: Add support for unidirectional mailbox channels dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels ...	2023-04-25 12:02:16 -07:00
Borislav Petkov (AMD)	ce8ac91130	Merge branches 'edac-drivers', 'edac-amd64' and 'edac-misc' into edac-updates Combine all queued EDAC changes for submission into v6.4: * ras/edac-drivers: EDAC/i10nm: Add Intel Sierra Forest server support EDAC/skx: Fix overflows on the DRAM row address mapping arrays * ras/edac-amd64: (27 commits) EDAC/amd64: Fix indentation in umc_determine_edac_cap() EDAC/amd64: Add get_err_info() to pvt->ops EDAC/amd64: Split dump_misc_regs() into dct/umc functions EDAC/amd64: Split init_csrows() into dct/umc functions EDAC/amd64: Split determine_edac_cap() into dct/umc functions EDAC/amd64: Rename f17h_determine_edac_ctl_cap() EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions EDAC/amd64: Split ecc_enabled() into dct/umc functions EDAC/amd64: Split read_mc_regs() into dct/umc functions EDAC/amd64: Split determine_memory_type() into dct/umc functions EDAC/amd64: Split read_base_mask() into dct/umc functions EDAC/amd64: Split prep_chip_selects() into dct/umc functions EDAC/amd64: Rework hw_info_{get,put} EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt EDAC/amd64: Do not discover ECC symbol size for Family 17h and later EDAC/amd64: Drop dbam_to_cs() for Family 17h and later EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions EDAC/amd64: Rename debug_display_dimm_sizes() * ras/edac-misc: EDAC/altera: Remove MODULE_LICENSE in non-module EDAC: Sanitize MODULE_AUTHOR strings EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR EDAC/i5100: Fix typo in comment EDAC/altera: Remove redundant error logging Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2023-04-24 09:14:30 +02:00
Qiuxu Zhuo	96ae3995c6	EDAC/i10nm: Add Intel Sierra Forest server support The Sierra Forest CPU model uses similar memory controller registers as Granite Rapids server. Add Sierra Forest CPU model ID for EDAC support. Tested-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20230410131531.11914-1-qiuxu.zhuo@intel.com	2023-04-10 09:33:51 -07:00
Yang Li	49aba1c589	EDAC/amd64: Fix indentation in umc_determine_edac_cap() Use consistent indentation to improve the readability and fix: drivers/edac/amd64_edac.c:1279 umc_determine_edac_cap() warn: inconsistent indenting Fixes: `f6a4b4a1aa` ("EDAC/amd64: Split determine_edac_cap() into dct/umc functions") Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230404022557.46409-1-yang.lee@linux.alibaba.com	2023-04-04 17:22:55 +02:00
Nick Alcock	e088d80e2a	EDAC/altera: Remove MODULE_LICENSE in non-module Since `8b41fc4454` ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. altera_edac is not a module for a while now, remove the macro call. Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com	2023-04-01 13:18:50 +02:00
Borislav Petkov (AMD)	371b27f2f3	EDAC: Sanitize MODULE_AUTHOR strings Fixup the remaining MODULE_AUTHOR strings to not contain newlines. Shorten and unbreak others. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230328134309.23159-1-bp@alien8.de	2023-03-28 15:43:30 +02:00
Jonathan Neuschäfer	01db1030f1	EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR MODULE_AUTHOR strings don't usually include a newline character. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230129165054.1675554-1-j.neuschaefer@gmx.net	2023-03-28 15:26:52 +02:00
Muralidhara M K	b3ece3a6a2	EDAC/amd64: Add get_err_info() to pvt->ops GPU Nodes will use a different method to determine the chip select and channel of an error. A function pointer should be used rather than introduce another branching condition. Prepare for this by adding get_err_info() to pvt->ops. This function is only called from the modern code path, so a legacy function is not defined. Make sure to call this after MCA_STATUS[SyndV] is checked, since the csrow value is found in MCA_SYND. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com	2023-03-24 13:03:21 +01:00
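The design choice in the commit above, dispatching through a function pointer in an ops table rather than adding another branch, can be sketched in userspace C as follows. All type and function names carry a `_sketch` suffix and are hypothetical; the bit positions used by the two decoders are illustrative assumptions and do not reflect the driver's actual register layout.

```c
#include <stdint.h>

/* Minimal stand-ins for the MCA record and the decoded error location. */
struct mce_sketch {
	uint64_t ipid;
	uint64_t synd;
};

struct err_info_sketch {
	int channel;
	int csrow;
};

/* Per-node-type operations; each node type supplies its own decoder. */
struct low_ops_sketch {
	void (*get_err_info)(const struct mce_sketch *m, struct err_info_sketch *err);
};

struct pvt_sketch {
	const struct low_ops_sketch *ops;
};

/* Illustrative CPU decoder: bit positions are assumptions. */
void cpu_get_err_info_sketch(const struct mce_sketch *m, struct err_info_sketch *err)
{
	err->channel = (int)((m->ipid >> 20) & 0xfff);
	err->csrow = (int)(m->synd & 0x7);
}

/* Illustrative GPU decoder with a different, equally hypothetical layout. */
void gpu_get_err_info_sketch(const struct mce_sketch *m, struct err_info_sketch *err)
{
	err->channel = (int)((m->ipid >> 12) & 0xf);
	err->csrow = (int)(m->synd & 0x3);
}

const struct low_ops_sketch cpu_ops_sketch = { .get_err_info = cpu_get_err_info_sketch };
const struct low_ops_sketch gpu_ops_sketch = { .get_err_info = gpu_get_err_info_sketch };

/* The caller dispatches through the pointer instead of branching on node type. */
void decode_error_sketch(const struct pvt_sketch *pvt, const struct mce_sketch *m,
			 struct err_info_sketch *err)
{
	pvt->ops->get_err_info(m, err);
}
```

Adding a new node type in this arrangement means supplying one more ops table; `decode_error_sketch()` itself does not change.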
Muralidhara M K	f6f36382d6	EDAC/amd64: Split dump_misc_regs() into dct/umc functions Add a function pointer to pvt->ops. No functional change is intended. [ Yazen: Rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-22-yazen.ghannam@amd.com	2023-03-24 13:03:21 +01:00
Muralidhara M K	6fb8b5fb9e	EDAC/amd64: Split init_csrows() into dct/umc functions Call them from their respective setup_mci_misc_attrs() paths. Also, drop the check for an "empty" device, i.e. one without memory. This is redundant and already done in instance_has_memory() earlier in the init path. No functional change is intended. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com	2023-03-24 13:03:21 +01:00
Muralidhara M K	f6a4b4a1aa	EDAC/amd64: Split determine_edac_cap() into dct/umc functions Call them from their respective setup_mci_misc_attrs() paths. No functional change is intended. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-20-yazen.ghannam@amd.com	2023-03-24 13:03:21 +01:00
Yazen Ghannam	9369239e8d	EDAC/amd64: Rename f17h_determine_edac_ctl_cap() ...to match the "umc_" prefix convention. No functional change is intended. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-19-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	0a42a37f65	EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions The init_one_instance() path is shared between legacy and modern systems. So add the new functions to a function pointer in pvt->ops. No functional change is intended. [ Yazen: Rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	eb2bcdfc37	EDAC/amd64: Split ecc_enabled() into dct/umc functions Call them using a function pointer in pvt->ops. The "ECC enabled" check is done outside of the hardware information gathering done in hw_info_get(). So a high-level function pointer is needed to separate the legacy and modern paths. No functional change is intended. [Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	32ecdf8688	EDAC/amd64: Split read_mc_regs() into dct/umc functions Call them from their respective hw_info_get() paths. ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz() is left out of the UMC path. Do not save TOP_MEM* values on modern controllers because they're not needed there (read: they were used only for debugging, if anything). [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	78ec161a91	EDAC/amd64: Split determine_memory_type() into dct/umc functions Call them from their respective hw_info_get() paths. Call them after all other hardware registers have been saved, since the memory type for a device will be determined based on the saved information. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	b29dad9bf3	EDAC/amd64: Split read_base_mask() into dct/umc functions Call them from their respective hw_info_get() paths. Call the new functions after setting the chip select base and mask counts, since those are needed to read the correct number of chip select base and mask registers. And call the new functions before the remaining setup, because the base and mask register values will be needed later. [Yazen: Rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	637f60ef2c	EDAC/amd64: Split prep_chip_selects() into dct/umc functions Call them from their respective hw_info_get() function. Avoid the need for family/model-based function pointers. Add the calls before reading hardware registers from the memory controllers, since the number of chip select bases and masks needs to be known first. [ Yazen: rebased/reworked patch and reworded commit message. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-13-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Yazen Ghannam	9a97a7f4d7	EDAC/amd64: Rework hw_info_{get,put} The bulk of system-specific information is gathered at init time with hw_info_get(). This function calls a number of helper functions, and many of these helper functions are split between a modern UMC/DF path and a legacy DCT path. Split hw_info_get() into legacy and modern versions. This creates two separate code paths early on, and legacy and modern helper functions can be called directly in the appropriate code path. Also, simplify hw_info_put() and share it between legacy and modern systems. NULL pointer checks are done in pci_dev_put() and kfree(), so they can be called unconditionally. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-12-yazen.ghannam@amd.com	2023-03-24 13:03:20 +01:00
Muralidhara M K	ed623d55ee	EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt Future AMD systems will support heterogeneous "AMD Node" types, e.g. CPU and GPU types. Therefore, a global family type shared across all AMD nodes is no longer appropriate. Move struct low_ops routines and members of struct amd64_family_type to struct amd64_pvt. Currently, there are many code branches that split between "modern" and "legacy" systems. Another code branch will be needed in order to cover GPU cases. However, rather than introduce another branching case in multiple functions, the current branching code should be switched to a set of function pointers. This change makes the code more readable and simplifies adding support for new families/models. In order to reuse code, define two sets of function pointers. Use one for modern systems (Family 17h and later). This will not change between current CPU families. Use another set of function pointers for legacy systems (before Family 17h). Use the Family 16h versions as default for the legacy ops since these are the latest, and adjust the function pointers as needed for older families. [ Yazen: rebased/reworked patch and reworded commit message. ] [ bp: Fix rev8 or later check. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com> Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-11-yazen.ghannam@amd.com	2023-03-24 13:03:19 +01:00
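The switch from scattered family checks to per-node function pointers described in the commit above can be sketched in plain userspace C. This is a minimal illustration, not the driver's code: `pvt_sketch`, `ops_sketch`, `select_ops()` and the two `*_hw_info_get()` helpers are hypothetical names, and the only detail taken from the commit message is that "modern" means Family 17h and later.

```c
/* Which register-access path a node uses; names are illustrative only. */
enum path_sketch { PATH_DCT, PATH_UMC };

struct pvt_sketch;

/* Hypothetical per-node operations table (the real pvt->ops has other members). */
struct ops_sketch {
	enum path_sketch (*hw_info_get)(struct pvt_sketch *pvt);
};

/* Hypothetical per-node private data carrying its own ops pointer. */
struct pvt_sketch {
	unsigned int fam;               /* CPU family, e.g. 0x17 */
	const struct ops_sketch *ops;
};

/* Legacy (DCT) path stand-in. */
static enum path_sketch dct_hw_info_get(struct pvt_sketch *pvt)
{
	(void)pvt;
	return PATH_DCT;
}

/* Modern (UMC) path stand-in. */
static enum path_sketch umc_hw_info_get(struct pvt_sketch *pvt)
{
	(void)pvt;
	return PATH_UMC;
}

static const struct ops_sketch dct_ops = { .hw_info_get = dct_hw_info_get };
static const struct ops_sketch umc_ops = { .hw_info_get = umc_hw_info_get };

/* Decide once, at init time; later callers just use pvt->ops. */
void select_ops(struct pvt_sketch *pvt)
{
	pvt->ops = (pvt->fam >= 0x17) ? &umc_ops : &dct_ops;
}
```

Because the choice is stored per node rather than in a global family type, a further node type would only need another ops table, not another branch in every helper.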
Yazen Ghannam	5a1adb375d	EDAC/amd64: Do not discover ECC symbol size for Family 17h and later The ECC symbol size was needed on legacy systems to look up the ECC syndrome. This is not needed on modern systems because the ECC syndrome is explicitly provided in the MCA information. Remove the ECC symbol size discovery code for modern UMC-based systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-10-yazen.ghannam@amd.com	2023-03-24 13:03:19 +01:00
Yazen Ghannam	a2e59ab8e9	EDAC/amd64: Drop dbam_to_cs() for Family 17h and later The same function is used to calculate chip select size for all Zen-based family/models. Therefore, a family/model function pointer is not necessary. Drop the dbam_to_cs() function pointer for Family 17h and later systems. Also, move the Family 17h function to avoid a forward declaration. Rename it to indicate that the UMC Address Mask is used rather than the legacy DBAM value. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-9-yazen.ghannam@amd.com	2023-03-24 13:03:19 +01:00
Yazen Ghannam	c0984666fd	EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions Split get_csrow_nr_pages() into legacy and modern versions in preparation for further legacy/modern refactoring. Also, rename f17_get_cs_mode() to match the new convention. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-8-yazen.ghannam@amd.com	2023-03-24 13:03:19 +01:00
Yazen Ghannam	00e4feb8c0	EDAC/amd64: Rename debug_display_dimm_sizes() Use the "dct" and "umc" prefixes for legacy and modern versions respectively. Also, move the "dct" version to avoid a forward declaration, and fixup some checkpatch warnings in the process. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-7-yazen.ghannam@amd.com	2023-03-24 12:54:47 +01:00
Jongwoo Han	5b6cb45072	EDAC/i5100: Fix typo in comment Correct typo from 'preform' to 'perform' in comment. Signed-off-by: Jongwoo Han <jongwooo.han@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230302021120.56794-1-jongwooo.han@gmail.com	2023-03-23 12:04:04 +01:00
Greg Kroah-Hartman	cb4a0bec0b	EDAC/sysfs: move to use bus_get_dev_root() Direct access to the struct bus_type dev_root pointer is going away soon so replace that with a call to bus_get_dev_root() instead, which is what it is there for. Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: linux-edac@vger.kernel.org Link: https://lore.kernel.org/r/20230313182918.1312597-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2023-03-22 09:25:49 +01:00
Deepak R Varma	4e89780a4c	EDAC/altera: Remove redundant error logging A call to platform_get_irq() already prints an error on failure within its own implementation. So printing another error based on its return value in the caller is redundant and should be removed. The clean up also makes if condition block braces unnecessary. Remove that as well. Issue identified using platform_get_irq.cocci coccinelle semantic patch. Signed-off-by: Deepak R Varma <drv@mailo.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/Y/+j27kqdhflPtaj@ubun2204.myguest.virtualbox.org	2023-03-21 22:50:46 +01:00
Manivannan Sadhasivam	721d3e91bf	qcom: llcc/edac: Support polling mode for ECC handling Not all Qcom platforms support IRQ mode for ECC handling. For those platforms, the current EDAC driver will not be probed due to missing ECC IRQ in devicetree. So add support for polling mode so that the EDAC driver can be used on all Qcom platforms supporting LLCC. The polling delay of 5000ms is chosen based on Qcom downstream/vendor driver. Reported-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230314080443.64635-14-manivannan.sadhasivam@linaro.org	2023-03-15 15:17:08 -07:00
Manivannan Sadhasivam	ee13b50087	qcom: llcc/edac: Fix the base address used for accessing LLCC banks The Qualcomm LLCC/EDAC drivers were using a fixed register stride for accessing the Control and Status Registers (CSRs) of each LLCC bank. This stride only works for some SoCs like SDM845, for which driver support was initially added. But later SoCs use different register strides that vary between the banks, with holes in between. So it is not possible to use a single register stride for accessing the CSRs of each bank. Doing so could result in a crash. To fix this issue, obtain the base address of each LLCC bank from devicetree and get rid of the fixed stride. This also means there is no need to rely on the reg-names property, and the base addresses can be obtained using the index. First index is LLCC bank 0 and last index is LLCC broadcast. If the SoC supports more than one bank, then those need to be defined in devicetree for index from 1..N-1. Reported-by: Parikshit Pareek <quic_ppareek@quicinc.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20230314080443.64635-13-manivannan.sadhasivam@linaro.org	2023-03-15 15:17:08 -07:00
Qiuxu Zhuo	71b1e3ba3f	EDAC/skx: Fix overflows on the DRAM row address mapping arrays The current DRAM row address mapping arrays skx_{open,close}_row[] only support ranks with sizes up to 16G. Decoding a rank address to a DRAM row address for a 32G rank by using either one of the above arrays by the skx_edac driver, will result in an overflow on the array. For a 32G rank, the most significant DRAM row address bit (the bit17) is mapped from the bit34 of the rank address. Add this new mapping item to both arrays to fix the overflow issue. Fixes: `4ec656bdf4` ("EDAC, skx_edac: Add EDAC driver for Skylake") Reported-by: Feng Xu <feng.f.xu@intel.com> Tested-by: Feng Xu <feng.f.xu@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com	2023-03-13 10:42:00 -07:00
Linus Torvalds	d9de5ce8a5	Merge tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add a driver for the RAS functionality on Xilinx's on chip memory controller - Add support for decoding errors from the first and second level memory on SKL-based hardware - Add support for the memory controllers in Intel Granite Rapids and Emerald Rapids machines - First round of amd64_edac driver simplification and removal of unneeded functionality - The usual cleanups and fixes * tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive EDAC/amd64: Remove early_channel_count() EDAC/amd64: Remove PCI Function 0 EDAC/amd64: Remove PCI Function 6 EDAC/amd64: Remove scrub rate control for Family 17h and later EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ EDAC/i10nm: Add driver decoder for Sapphire Rapids server EDAC/i10nm: Add Intel Granite Rapids server support EDAC/i10nm: Make more configurations CPU model specific EDAC/i10nm: Add Intel Emerald Rapids server support EDAC/skx_common: Delete duplicated and unreachable code EDAC/skx_common: Enable EDAC support for the "near" memory EDAC/qcom: Add platform_device_id table for module autoloading EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM dt-bindings: edac: Add bindings for Xilinx ZynqMP OCM	2023-02-21 08:10:03 -08:00
Linus Torvalds	0246725d73	Merge tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Add support for reporting more bits of the physical address on error, on newer AMD CPUs - Mask out bits which don't belong to the address of the error being reported * tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Mask out non-address bits from machine check bank x86/mce: Add support for Extended Physical Address MCA changes x86/mce: Define a function to extract ErrorAddr from MCA_ADDR	2023-02-21 08:04:51 -08:00
Yazen Ghannam	28980db947	EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive Reportedly, clang cannot do interprocedural analysis: https://lore.kernel.org/r/20230213-amd64_edac-wsometimes-uninitialized-v1-1-5bde32b89e02@kernel.org and see that those arguments won't be used uninitialized. So, yeah, the code's fine even without this. Normally, such a "fix" won't be applied but that warning gets automatically enabled in -Wall builds and when CONFIG_WERROR is set in allmodconfig builds, the build fails. So shut it up with a minimal fix as this code will see more reorganization very soon. [ bp: Write commit message. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/Y%2BqdVHidnrrKvxiD@dev-arch.thelio-3990X	2023-02-14 17:56:14 +01:00
Yazen Ghannam	c4605bde33	EDAC/amd64: Remove early_channel_count() The early_channel_count() function seems to have been useful in the past for knowing how many EDAC mci structures to populate. However, this is no longer needed as the maximum channel count for a system is used instead. Remove the early_channel_count() helper functions and related code. Use the size of the channel layer when iterating over channel structures. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-6-yazen.ghannam@amd.com	2023-02-09 14:43:39 +01:00
Yazen Ghannam	cf981562e6	EDAC/amd64: Remove PCI Function 0 PCI Function 0 is used on Family 17h and later only to read the "dhar" value. This value is printed and provided through a module-specific debug sysfs file. The value is not used for any Family 17h and later code, and it does not have any apparent debug value on these systems. Remove "dhar", Function 0 PCI IDs, and all related code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-5-yazen.ghannam@amd.com	2023-02-09 14:41:56 +01:00
Yazen Ghannam	6229235f7c	EDAC/amd64: Remove PCI Function 6 PCI Function 6 is used on Family 17h and later to access scrub registers. With scrub access removed, this function has no other use. Remove all Function 6 PCI IDs and related code. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-4-yazen.ghannam@amd.com	2023-02-09 11:50:52 +01:00
Yazen Ghannam	6e241bc93c	EDAC/amd64: Remove scrub rate control for Family 17h and later The scrub registers on AMD Family 17h and later may be inaccessible to the OS. Furthermore, hardware designers recommend that the scrubbing feature is managed by the firmware. Remove support for the sdram_scrub_rate interface for AMD Family 17h systems and later by not setting the scrub function pointers. The EDAC MC core will then not expose the scrub files in sysfs. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-3-yazen.ghannam@amd.com	2023-02-09 11:25:21 +01:00
Yazen Ghannam	fdce765a13	EDAC/amd64: Don't set up EDAC PCI control on Family 17h+ EDAC PCI control is used to detect/report legacy PCI errors like "Parity" and "SERROR". Modern AMD systems use PCIe Advanced Error Reporting (AER), and legacy PCI errors should not be reported. Remove EDAC PCI control setup on AMD Family 17h and later systems. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230127170419.1824692-2-yazen.ghannam@amd.com	2023-02-09 11:14:53 +01:00
Youquan Song	221aa03fb4	EDAC/i10nm: Add driver decoder for Sapphire Rapids server Intel SDM (December 2022) vol3B 17.13.2 contains IMC MC error codes for Sapphire Rapids. Current i10nm_edac only supports firmware decoder (ACPI DSM methods) for Sapphire Rapids. So add the driver decoder (decoding DDR memory errors via extracting error information from the IMC MC error codes) for Sapphire Rapids for better decoding performance. Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-02-08 08:29:11 -08:00
Qiuxu Zhuo	ba987eaaab	EDAC/i10nm: Add Intel Granite Rapids server support The Granite Rapids CPU model uses similar memory controller registers as Sapphire Rapids server but with some different configurations: - Various memory controller numbers for different Granite Rapids CPUs. So detect the number of present memory controllers at run time. - Different MMIO offsets of memory controllers. - Different triples of bus/dev/fun of some PCI devices used in i10nm_edac. Add above configurations and Granite Rapids CPU model ID for EDAC support. [Tony: Fixed 2 typos s/strcture/structure/] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com	2023-01-25 08:17:30 -08:00
Qiuxu Zhuo	dd7814b785	EDAC/i10nm: Make more configurations CPU model specific The numbers of memory controllers per socket, channels per memory controller, DIMMs per channel and the triples of bus/device/function of PCI devices used in i10nm_edac can be CPU model specific. Add new fields to the structure res_config for above numbers and triples to make them CPU model specific. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com	2023-01-25 08:17:20 -08:00
Qiuxu Zhuo	e4b2bc6616	EDAC/i10nm: Add Intel Emerald Rapids server support The Emerald Rapids CPU model uses similar memory controller registers as Sapphire Rapids server. Add Emerald Rapids CPU model number ID for EDAC support. Tested-by: Li Zhang <li4.zhang@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com	2023-01-25 08:17:08 -08:00
Qiuxu Zhuo	d2415e2e53	EDAC/skx_common: Delete duplicated and unreachable code skx_mce_check_error() returns early if the error isn't from memory. So when skx_mce_output_error() is invoked from skx_mce_check_error(), it doesn't need to re-check whether the error is from memory. Delete the duplicated and unreachable code from skx_mce_output_error(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com	2023-01-25 08:16:53 -08:00
Qiuxu Zhuo	6e8746cb73	EDAC/skx_common: Enable EDAC support for the "near" memory The current {skx,i10nm}_edac miss the EDAC support to decode errors from the 1st level memory (the fast "near" memory as cache) of the 2-level memory system. Introduce a helper function skx_error_in_mem() to check whether errors are from memory at the beginning of skx_mce_check_error(). As long as the errors are from memory (either the 1-level memory system or the 2-level memory system), decode the errors. Reported-and-tested-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com	2023-01-25 08:16:44 -08:00
Manivannan Sadhasivam	977c6ba624	EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info The memory for llcc_driv_data is allocated by the LLCC driver. But when it is passed as the private driver info to the EDAC core, it will get freed during the qcom_edac driver release. So when the qcom_edac driver gets probed again, it will try to use the freed data leading to the use-after-free bug. Hence, do not pass llcc_driv_data as pvt_info but rather reference it using the platform_data pointer in the qcom_edac driver. Fixes: `27450653f1` ("drivers: edac: Add EDAC driver support for QCOM SoCs") Reported-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.20 Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org	2023-01-20 19:47:34 +01:00
Manivannan Sadhasivam	8d8fcc391f	EDAC/qcom: Add platform_device_id table for module autoloading Add a device ID table so that the driver loads automatically when the associated platform_device gets registered. Reported-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Link: https://lore.kernel.org/r/20230118150904.26913-3-manivannan.sadhasivam@linaro.org	2023-01-20 16:59:53 +01:00
Manivannan Sadhasivam	cec669ff71	EDAC/device: Respect any driver-supplied workqueue polling value The EDAC drivers may optionally pass the poll_msec value. Use that value if available, else fall back to 1000ms. [ bp: Touchups. ] Fixes: `e27e3dac65` ("drivers/edac: add edac_device class") Reported-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.9 Link: https://lore.kernel.org/r/COZYL8MWN97H.MROQ391BGA09@otso	2023-01-19 11:43:16 +01:00
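The fallback rule in the commit above ("use that value if available, else fall back to 1000ms") can be sketched as follows. This is a minimal userspace illustration under one stated assumption: a `poll_msec` of zero is taken to mean "the driver supplied nothing". `edac_dev_sketch` and `effective_poll_msec()` are hypothetical names, not the EDAC core's API.

```c
/* Fallback interval named in the commit message. */
#define DEFAULT_POLL_MSEC 1000U

/* Hypothetical, trimmed-down device descriptor. */
struct edac_dev_sketch {
	unsigned int poll_msec;   /* 0 is assumed to mean "not set by the driver" */
};

/* Respect a driver-supplied interval; otherwise use the default. */
unsigned int effective_poll_msec(const struct edac_dev_sketch *dev)
{
	return dev->poll_msec ? dev->poll_msec : DEFAULT_POLL_MSEC;
}
```

With this rule, a driver that asks for the 5000 ms interval mentioned in the qcom polling commit keeps it, while a driver that leaves the field untouched gets 1000 ms.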
Tony Luck	8a01ec97dc	x86/mce: Mask out non-address bits from machine check bank Systems that support various memory encryption schemes (MKTME, TDX, SEV) use high order physical address bits to indicate which key should be used for a specific memory location. When a memory error is reported, some systems may report those key bits in the IA32_MCi_ADDR machine check MSR. The Intel SDM has a footnote for the contents of the address register that says: "Useful bits in this field depend on the address methodology in use when the register state is saved." AMD Processor Programming Reference has a more explicit description of the MCA_ADDR register: "For physical addresses, the most significant bit is given by Core::X86::Cpuid::LongModeInfo[PhysAddrSize]." Add a new #define MCI_ADDR_PHYSADDR for the mask of valid physical address bits within the machine check bank address register. Use this mask for recoverable machine check handling and in the EDAC driver to ignore any key bits that may be present. [ Tony: Based on independent fixes proposed by Fan Du and Isaku Yamahata ] Reported-by: Isaku Yamahata <isaku.yamahata@intel.com> Reported-by: Fan Du <fan.du@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20230109152936.397862-1-tony.luck@intel.com	2023-01-10 11:47:07 +01:00
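The masking idea in the commit above can be illustrated with a small userspace sketch. It assumes the valid physical address occupies the low `phys_bits` bits and that any key bits sit above them; `physaddr_mask()` and `strip_key_bits()` are hypothetical helpers, and the actual MCI_ADDR_PHYSADDR definition in the kernel may be written differently.

```c
#include <stdint.h>

/* Build a mask covering the low phys_bits bits of a 64-bit address. */
uint64_t physaddr_mask(unsigned int phys_bits)
{
	if (phys_bits >= 64)
		return UINT64_MAX;   /* avoid an undefined 64-bit shift */
	return (UINT64_C(1) << phys_bits) - 1;
}

/* Drop anything above the physical address width, e.g. encryption key bits. */
uint64_t strip_key_bits(uint64_t mci_addr, unsigned int phys_bits)
{
	return mci_addr & physaddr_mask(phys_bits);
}
```

For example, with an assumed 46-bit physical address width, a reported value with bits 46 and 47 set is reduced to the plain address before it is handed to recovery or decoding code.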
Sai Krishna Potthuri	3bd2706c91	EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM Add EDAC support for Xilinx ZynqMP OCM Controller, so this driver reports CE and UE errors upon interrupt generation. Also add debugfs files for error injection. On Xilinx ZynqMP platform, both OCM Controller driver(zynqmp_edac) and DDR Memory Controller driver(synopsys_edac) co-exist which means both can be loaded at a time. This scenario is tested on Xilinx ZynqMP platform. Fix following issue reported by the robot: "MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/edac/xlnx,zynqmp-ocmc.yaml" [ bp: - Massage commit message - s/EDAC_ZYNQMP_OCM/EDAC_ZYNQMP/ - Touchups ] Reported-by: kernel test robot <lkp@intel.com> Co-developed-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230104084512.1855243-3-sai.krishna.potthuri@amd.com	2023-01-09 11:13:58 +01:00
Miaoqian Lin	e7a293658c	EDAC/highbank: Fix memory leak in highbank_mc_probe() When devres_open_group() fails, it returns -ENOMEM without freeing memory allocated by edac_mc_alloc(). Call edac_mc_free() on the error handling path to avoid a memory leak. [ bp: Massage commit message. ] Fixes: `a1b01edb27` ("edac: add support for Calxeda highbank memory controller") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20221229054825.1361993-1-linmq006@gmail.com	2023-01-03 17:03:57 +01:00
Eliav Farber	e840774379	EDAC/device: Fix period calculation in edac_device_reset_delay_period() Fix period calculation in case user sets a value of 1000. The input of round_jiffies_relative() should be in jiffies and not in milli-seconds. [ bp: Use the same code pattern as in edac_device_workq_setup() for clarity. ] Fixes: `c4cf3b454e` ("EDAC: Rework workqueue handling") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20221020124458.22153-1-farbere@amazon.com	2022-12-30 15:51:41 +01:00
Borislav Petkov (AMD)	3919430fe9	Merge branches 'edac-ghes' and 'edac-misc' into edac-updates-for-v6.2 Combine all queued EDAC changes for submission into v6.2: * ras/edac-ghes: EDAC/igen6: Return the correct error type when not the MC owner apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() EDAC: Check for GHES preference in the chipset-specific EDAC drivers EDAC/ghes: Make ghes_edac a proper module EDAC/ghes: Prepare to make ghes_edac a proper module EDAC/ghes: Add a notifier for reporting memory errors efi/cper: Export several helpers for ghes_edac to use * ras/edac-misc: EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper() EDAC/i5400: Fix typo in comment: vaious -> various EDAC/mc_sysfs: Increase legacy channel support to 12 MAINTAINERS: Make Mauro EDAC reviewer MAINTAINERS: Make Manivannan Sadhasivam the maintainer of qcom_edac EDAC/i5000: Mark as BROKEN Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2022-12-12 15:40:03 +01:00
Yang Yingliang	9c89215559	EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper() As the comment of pci_get_domain_bus_and_slot() says, it returns a PCI device with refcount incremented, so it doesn't need to call an extra pci_dev_get() in pci_get_dev_wrapper(), and the PCI device needs to be put in the error path. Fixes: `d4dc89d069` ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20221128065512.3572550-1-yangyingliang@huawei.com	2022-11-28 09:42:41 -08:00
Chen Zhang	b586a59e14	EDAC/i5400: Fix typo in comment: vaious -> various Fix spelling typo in comment: vaious -> various. [ bp: Massage. ] Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Chen Zhang <chenzhang@kylinos.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221102081248.45694-1-chenzhang@kylinos.cn	2022-11-25 19:29:02 +01:00

1982 Commits