linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 22:21:42 +00:00

Author	SHA1	Message	Date
Ilpo Järvinen	f8367a74ae	EDAC/igen6: Convert PCIBIOS_* return codes to errnos errcmd_enable_error_reporting() uses pci_{read,write}_config_word() that return PCIBIOS_* codes. The return code is then returned all the way into the probe function igen6_probe() that returns it as is. The probe functions, however, should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning it from errcmd_enable_error_reporting(). Fixes: `10590a9d4f` ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527132236.13875-2-ilpo.jarvinen@linux.intel.com	2024-06-04 11:29:52 +02:00
Ilpo Järvinen	3ec8ebd8a5	EDAC/amd64: Convert PCIBIOS_* return codes to errnos gpu_get_node_map() uses pci_read_config_dword() that returns PCIBIOS_* codes. The return code is then returned all the way into the module init function amd64_edac_init() that returns it as is. The module init functions, however, should return normal errnos. Convert PCIBIOS_* returns code using pcibios_err_to_errno() into normal errno before returning it from gpu_get_node_map(). For consistency, convert also the other similar cases which return PCIBIOS_* codes even if they do not have any bugs at the moment. Fixes: `4251566ebc` ("EDAC/amd64: Cache and use GPU node map") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240527132236.13875-1-ilpo.jarvinen@linux.intel.com	2024-06-04 11:24:16 +02:00
Linus Torvalds	eba77c0477	- Have skx_edac decode error addresses belonging to SGX properly - Remove a bunch of unused struct members - Other cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmZB1k0ACgkQEsHwGGHe VUryTQ/8DuVnwHwPcRMrQmge6x2ZPuKZ73RBuFrDqnAcJdNau6YTzd1Iav2r5DE3 Op6ubfUT1RJv7pmE5Q8EBZ5qoWJQ3SnIifFXT8HDqg2iqTlibXS7NUJCxHzeOzTs Z+YgAU618x18IZ0j+Dq55U7yUtvQTviwY8FkO+D+mr/4TFt7w6zCfKNomZm5sDi8 3RfQD10OGVAlDBFdHVziKyhj82dNyQ20OMLrQ0RnhSG6D2e/3+gB88t9SaM0yiJ2 ogWHlGiB9vLQEnGuru9+HXahHqJd0DZQPJc5ygO4EufNTpuDWFctm5zGzNcnk1rz tMvvyaN8ix7KTo5a9gWRqb5ElW7dDHJkM86z/uvGsNhD1DjVGZl5VUwgJp+sSuL6 oepW1t6zqmNw81OgiZuhvWWk99HPEDQT2u1zxzmTkjXKEa2cY6Ju1KpzPxNECD4Y WwJPyUZhUsdEJ8+oQdZT2MzG3enAE/CxGlxcDEKbZU6WL19N+ofDiWYgMJaLLmW4 5k0zejE6GMgts6seFNu7NfEAVieaT7proar0GPdi4WR+oERrlEDExyzkNPyUHShR H+Q7tlEQlKQQdApoa4H6WuKiSpPZtxkRgOW5W7AE4LHEvd3MGzT5PC+qu/s/vo/H uzL9rCnYjBdKNwNEg0bpWGXHk/hXIRJcXWdPtcIXMP5w+vTSIY4= =SVW7 -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Have skx_edac decode error addresses belonging to SGX properly - Remove a bunch of unused struct members - Other cleanups * tag 'edac_updates_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/skx_common: Allow decoding of SGX addresses EDAC/mc_sysfs: Convert sprintf()/snprintf() to sysfs_emit() EDAC: Remove unused struct members EDAC: Remove dynamic attributes from edac_device_alloc_ctl_info() EDAC/device: Remove edac_dev_sysfs_block_attribute::store() EDAC/device: Remove edac_dev_sysfs_block_attribute::{block,value} EDAC/amd64: Remove unused struct member amd64_pvt::ext_nbcfg	2024-05-14 08:31:10 -07:00
Serge Semin	591c946675	EDAC/synopsys: Fix ECC status and IRQ control race condition The race condition around the ECCCLR register access happens in the IRQ disable method called in the device remove() procedure and in the ECC IRQ handler: 1. Enable IRQ: a. ECCCLR = EN_CE \| EN_UE 2. Disable IRQ: a. ECCCLR = 0 3. IRQ handler: a. ECCCLR = CLR_CE \| CLR_CE_CNT \| CLR_CE \| CLR_CE_CNT b. ECCCLR = 0 c. ECCCLR = EN_CE \| EN_UE So if the IRQ disabling procedure is called concurrently with the IRQ handler method the IRQ might be actually left enabled due to the statement 3c. The root cause of the problem is that ECCCLR register (which since v3.10a has been called as ECCCTL) has intermixed ECC status data clear flags and the IRQ enable/disable flags. Thus the IRQ disabling (clear EN flags) and handling (write 1 to clear ECC status data) procedures must be serialised around the ECCCTL register modification to prevent the race. So fix the problem described above by adding the spin-lock around the ECCCLR modifications and preventing the IRQ-handler from modifying the IRQs enable flags (there is no point in disabling the IRQ and then re-enabling it again within a single IRQ handler call, see the statements 3a/3b and 3c above). Fixes: `f7824ded41` ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240222181324.28242-2-fancer.lancer@gmail.com	2024-05-06 14:19:07 +02:00
Shubhrajyoti Datta	1a24733e80	EDAC/versal: Do not log total error counts When logging errors, the driver currently logs the total error count. However, it should log the current error only. Fix it. [ bp: Rewrite text. ] Fixes: `6f15b178cd` ("EDAC/versal: Add a Xilinx Versal memory controller driver") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240425121942.26378-4-shubhrajyoti.datta@amd.com	2024-04-25 18:08:05 +02:00
Shubhrajyoti Datta	de87ba848d	EDAC/versal: Check user-supplied data before injecting an error The function inject_data_ue_store() lacks a NULL check for the user passed values. To prevent below kernel crash include a NULL check. Call trace: kstrtoull kstrtou8 inject_data_ue_store full_proxy_write vfs_write ksys_write __arm64_sys_write invoke_syscall el0_svc_common.constprop.0 do_el0_svc el0_svc el0t_64_sync_handler el0t_64_sync Fixes: `83bf24051a` ("EDAC/versal: Make the bit position of injected errors configurable") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240425121942.26378-3-shubhrajyoti.datta@amd.com	2024-04-25 18:04:47 +02:00
Shubhrajyoti Datta	edbe59428e	EDAC/versal: Do not register for NOC errors The NOC errors are not handled in the driver. Remove the request for registration. Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240425121942.26378-2-shubhrajyoti.datta@amd.com	2024-04-25 18:04:14 +02:00
Qiuxu Zhuo	e0d3350778	EDAC/skx_common: Allow decoding of SGX addresses There are no "struct page" associations with SGX pages, causing the check pfn_to_online_page() to fail. This results in the inability to decode the SGX addresses and warning messages like: Invalid address 0x34cc9a98840 in IA32_MC17_ADDR Add an additional check to allow the decoding of the error address and to skip the warning message, if the error address is an SGX address. Fixes: `1e92af09fa` ("EDAC/skx_common: Filter out the invalid address") Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240408120419.50234-1-qiuxu.zhuo@intel.com	2024-04-08 09:49:45 -07:00
Li Zhijian	d7518ad4ed	EDAC/mc_sysfs: Convert sprintf()/snprintf() to sysfs_emit() Per Documentation/filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Generated by: make coccicheck M=<path/to/file> MODE=patch \ COCCI=scripts/coccinelle/api/device_attr_show.cocci No functional change intended. [ bp: Massage. ] Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240314084628.1322006-1-lizhijian@fujitsu.com	2024-04-04 18:23:57 +02:00
Jiri Slaby (SUSE)	c8d37084e9	EDAC: Remove unused struct members Remove unused - edac_pci_ctl_info::edac_subsys - edac_pci_ctl_info::complete - edac_device_ctl_info::removal_complete members. Found by https://github.com/jirislaby/clang-struct. [ bp: Squash three almost identical trivial patches into one. ] Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-6-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2024-03-27 18:26:58 +01:00
Jiri Slaby (SUSE)	48bc8869c5	EDAC: Remove dynamic attributes from edac_device_alloc_ctl_info() Dynamic attributes are not passed from any caller of edac_device_alloc_ctl_info(). Drop this unused/untested functionality completely. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-5-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2024-03-27 18:26:58 +01:00
Jiri Slaby (SUSE)	9186695ef7	EDAC/device: Remove edac_dev_sysfs_block_attribute::store() No one uses this store hook (both BLOCK_ATTR() pass NULL). It actually never was since its addition in `fd309a9d8e` ("drivers/edac: fix leaf sysfs attribute") so drop it. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-4-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2024-03-27 18:26:57 +01:00
Jiri Slaby (SUSE)	3667a35a50	EDAC/device: Remove edac_dev_sysfs_block_attribute::{block,value} They're unused. And they were never used since their addition in `fd309a9d8e` ("drivers/edac: fix leaf sysfs attribute") Drop it. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240213112051.27715-3-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2024-03-27 18:26:57 +01:00
Jiri Slaby (SUSE)	f5ca0d5156	EDAC/amd64: Remove unused struct member amd64_pvt::ext_nbcfg Commit `cfe40fdb4a` ("amd64_edac: add driver header") added amd64_pvt struct with ext_nbcfg in it. But no one used that member since then. Therefore, remove it. Found by https://github.com/jirislaby/clang-struct. Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20240213112051.27715-2-jirislaby@kernel.org Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2024-03-27 18:26:57 +01:00
Linus Torvalds	b0402403e5	- Add a FRU (Field Replaceable Unit) memory poison manager which collects and manages previously encountered hw errors in order to save them to persistent storage across reboots. Previously recorded errors are "replayed" upon reboot in order to poison memory which has caused said errors in the past. The main use case is stacked, on-chip memory which cannot simply be replaced so poisoning faulty areas of it and thus making them inaccessible is the only strategy to prolong its lifetime. - Add an AMD address translation library glue which converts the reported addresses of hw errors into system physical addresses in order to be used by other subsystems like memory failure, for example. Add support for MI300 accelerators to that library. - igen6: Add support for Alder Lake-N SoC - i10nm: Add Grand Ridge support - The usual fixlets and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmXvKHcACgkQEsHwGGHe VUo4Lg/+OwXDI1EaCDyaHJ+f6JRmNok1EGjKMVjpp71/XmE3eUjiXfCv/b0bwl3V oIXGlXpJ5RSME+9aFDWADaE3h5zAGzTwQXuKtOUQPiJ6UuCebXodm8SaIG8V8trG yaW/hhP98AoJD+fN6qzv4XWYvTG8VRQs4tdISg9FXiljTjv4mKA+sxuCu8KpfrDh Tg+9F4Rre6gyR5GaB6N7Cc0k97DM7n5yKBZZGKucv+oYzDyf6n631ZSJ2zA9NC51 CJlux917hCXI/IWrCQ2nkyfPPXxn8AaznUAA30wKgwlt8TFSdKTW+DvRA2zyuAU3 0UDHO4FezOKuzVnWkzdnKsIMAnDyTGOz3Fi2LU4mC+JHaHHmI2quSWDxp5phWBuy S+T3XHxpbSsLGEI7zxT5F9u1oAlCvYu1C7HJw+yxNSn2iCy5LoNo0H/kl/nhR8Xr FgVp8SYgQRU2Pp8vgGOibMYY/TAHX55EticKdxvBI0yY+iqoJyAbZ0fb0XyLNc7s GqoWfvrK1KQzf5/Ya1Mm//0/QTPyFmJwujMJ2eEnMRRER+23bYpGvVBBT8E1sG9s gqEJkKjmVCPt9xJTcivm96sLJ7CG36w8+r/axSqpKXdcvDG9ec8G8PRqjlo5pcvh gYevmCBIcKny1xuhALwD6Rn2mkPip7araycDx9X9nd5z1qCxBaU= =FR2l -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - Add a FRU (Field Replaceable Unit) memory poison manager which collects and manages previously encountered hw errors in order to save them to persistent storage across reboots. Previously recorded errors are "replayed" upon reboot in order to poison memory which has caused said errors in the past. The main use case is stacked, on-chip memory which cannot simply be replaced so poisoning faulty areas of it and thus making them inaccessible is the only strategy to prolong its lifetime. - Add an AMD address translation library glue which converts the reported addresses of hw errors into system physical addresses in order to be used by other subsystems like memory failure, for example. Add support for MI300 accelerators to that library. - igen6: Add support for Alder Lake-N SoC - i10nm: Add Grand Ridge support - The usual fixlets and cleanups * tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/versal: Convert to platform remove callback returning void RAS/AMD/FMPM: Fix off by one when unwinding on error RAS/AMD/FMPM: Add debugfs interface to print record entries RAS/AMD/FMPM: Save SPA values RAS: Export helper to get ras_debugfs_dir RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2() RAS: Introduce a FRU memory poison manager RAS/AMD/ATL: Add MI300 row retirement support Documentation: Move RAS section to admin-guide EDAC/versal: Make the bit position of injected errors configurable EDAC/i10nm: Add Intel Grand Ridge micro-server support EDAC/igen6: Add one more Intel Alder Lake-N SoC support RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300() RAS/AMD/ATL: Add MI300 support Documentation: RAS: Add index and address translation section EDAC/amd64: Use new AMD Address Translation Library RAS: Introduce AMD Address Translation Library EDAC/synopsys: Convert to devm_platform_ioremap_resource()	2024-03-11 18:14:06 -07:00
Borislav Petkov (AMD)	af65545a0f	Merge remote-tracking branches 'ras/edac-drivers', 'ras/edac-misc' and 'ras/edac-amd-atl' into edac-updates-for-v6.9 * ras/edac-drivers: EDAC/i10nm: Add Intel Grand Ridge micro-server support EDAC/igen6: Add one more Intel Alder Lake-N SoC support * ras/edac-misc: EDAC/versal: Convert to platform remove callback returning void EDAC/versal: Make the bit position of injected errors configurable EDAC/synopsys: Convert to devm_platform_ioremap_resource() * ras/edac-amd-atl: RAS/AMD/FMPM: Fix off by one when unwinding on error RAS/AMD/FMPM: Add debugfs interface to print record entries RAS/AMD/FMPM: Save SPA values RAS: Export helper to get ras_debugfs_dir RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2() RAS: Introduce a FRU memory poison manager RAS/AMD/ATL: Add MI300 row retirement support Documentation: Move RAS section to admin-guide RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300() RAS/AMD/ATL: Add MI300 support Documentation: RAS: Add index and address translation section EDAC/amd64: Use new AMD Address Translation Library RAS: Introduce AMD Address Translation Library Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>	2024-03-11 16:24:20 +01:00
Uwe Kleine-König	4527a2194e	EDAC/versal: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve this, there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Link: https://lore.kernel.org/r/83deca1ce260f7e17ff3cb106c9a6946d4ca4505.1709886922.git.u.kleine-koenig@pengutronix.de	2024-03-08 13:57:49 +01:00
Thomas Gleixner	7e3ec62867	x86/cpu/amd: Provide a separate accessor for Node ID AMD (ab)uses topology_die_id() to store the Node ID information and topology_max_dies_per_pkg to store the number of nodes per package. This collides with the proper processor die level enumeration which is coming on AMD with CPUID 8000_0026, unless there is a correlation between the two. There is zero documentation about that. So provide new storage and new accessors which for now still access die_id and topology_max_die_per_pkg(). Will be mopped up after AMD and HYGON are converted over. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Juergen Gross <jgross@suse.com> Tested-by: Sohil Mehta <sohil.mehta@intel.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Wang Wendy <wendy.wang@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Link: https://lore.kernel.org/r/20240212153624.956116738@linutronix.de	2024-02-15 22:07:37 +01:00
Shubhrajyoti Datta	83bf24051a	EDAC/versal: Make the bit position of injected errors configurable Currently, the bit positions to inject correctable and uncorrectable errors are hardcoded. To make that configurable add separate sysfs entries to set the bit positions for injecting CE and UE errors. Allow for single bit error for CE and two bits errors for UE injection. [ bp: Massage. ] Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240208094653.11704-1-shubhrajyoti.datta@amd.com	2024-02-14 08:57:35 +01:00
Qiuxu Zhuo	e77086c375	EDAC/i10nm: Add Intel Grand Ridge micro-server support The Grand Ridge CPU model uses similar memory controller registers with Granite Rapids server. Add Grand Ridge CPU model ID for EDAC support. Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20240129062040.60809-3-qiuxu.zhuo@intel.com	2024-02-01 12:36:50 -08:00
Lili Li	65c441ec58	EDAC/igen6: Add one more Intel Alder Lake-N SoC support Add a new Intel Alder Lake-N SoC compute die ID for EDAC support. Signed-off-by: Lili Li <lili.li@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20240129062040.60809-2-qiuxu.zhuo@intel.com	2024-02-01 12:36:50 -08:00
Yazen Ghannam	6c9058f490	EDAC/amd64: Use new AMD Address Translation Library Remove old address translation code and use the new AMD Address Translation Library. Use "imply" in Kconfig so that the "AMD_ATL" config option takes the value of "EDAC_AMD64" as its default. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240123041401.79812-3-yazen.ghannam@amd.com	2024-01-24 12:55:00 +01:00
Yangtao Li	b57c1a1e7e	EDAC/synopsys: Convert to devm_platform_ioremap_resource() Use devm_platform_ioremap_resource() to simplify code. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/20230704101811.49637-3-frank.li@vivo.com	2024-01-23 14:32:41 +01:00
Linus Torvalds	80955ae955	Driver core changes for 6.8-rc1 Here are the set of driver core and kernfs changes for 6.8-rc1. Nothing major in here this release cycle, just lots of small cleanups and some tweaks on kernfs that in the very end, got reverted and will come back in a safer way next release cycle. Included in here are: - more driver core 'const' cleanups and fixes - fw_devlink=rpm is now the default behavior - kernfs tiny changes to remove some string functions - cpu handling in the driver core is updated to work better on many systems that add topologies and cpus after booting - other minor changes and cleanups All of the cpu handling patches have been acked by the respective maintainers and are coming in here in one series. Everything has been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeOrg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymtcwCffzvKKkSY9qAp6+0v2WQNkZm1JWoAoJCPYUwF If6wEoPLWvRfKx4gIoq9 =D96r -----END PGP SIGNATURE----- Merge tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are the set of driver core and kernfs changes for 6.8-rc1. Nothing major in here this release cycle, just lots of small cleanups and some tweaks on kernfs that in the very end, got reverted and will come back in a safer way next release cycle. Included in here are: - more driver core 'const' cleanups and fixes - fw_devlink=rpm is now the default behavior - kernfs tiny changes to remove some string functions - cpu handling in the driver core is updated to work better on many systems that add topologies and cpus after booting - other minor changes and cleanups All of the cpu handling patches have been acked by the respective maintainers and are coming in here in one series. Everything has been in linux-next for a while with no reported issues" * tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits) Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" kernfs: convert kernfs_idr_lock to an irq safe raw spinlock class: fix use-after-free in class_register() PM: clk: make pm_clk_add_notifier() take a const pointer EDAC: constantify the struct bus_type usage kernfs: fix reference to renamed function driver core: device.h: fix Excess kernel-doc description warning driver core: class: fix Excess kernel-doc description warning driver core: mark remaining local bus_type variables as const driver core: container: make container_subsys const driver core: bus: constantify subsys_register() calls driver core: bus: make bus_sort_breadthfirst() take a const pointer kernfs: d_obtain_alias(NULL) will do the right thing... driver core: Better advertise dev_err_probe() kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy() kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy() kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy() initramfs: Expose retained initrd as sysfs file fs/kernfs/dir: obey S_ISGID kernel/cgroup: use kernfs_create_dir_ns() ...	2024-01-18 09:48:40 -08:00
Linus Torvalds	296455ade1	Char/Misc and other Driver changes for 6.8-rc1 Here is the big set of char/misc and other driver subsystem changes for 6.8-rc1. Lots of stuff in here, but first off, you will get a merge conflict in drivers/android/binder_alloc.c when merging this tree due to changing coming in through the -mm tree. The resolution of the merge issue can be found here: https://lore.kernel.org/r/20231207134213.25631ae9@canb.auug.org.au or in a simpler patch form in that thread: https://lore.kernel.org/r/ZXHzooF07LfQQYiE@google.com If there are issues with the merge of this file, please let me know. Other than lots of binder driver changes (as you can see by the merge conflicts) included in here are: - lots of iio driver updates and additions - spmi driver updates - eeprom driver updates - firmware driver updates - ocxl driver updates - mhi driver updates - w1 driver updates - nvmem driver updates - coresight driver updates - platform driver remove callback api changes - tags.sh script updates - bus_type constant marking cleanups - lots of other small driver updates All of these have been in linux-next for a while with no reported issues (other than the binder merge conflict.) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZaeMMQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynWNgCfQ/Yz7QO6EMLDwHO5LRsb3YMhjL4AoNVdanjP YoI7f1I4GBcC0GKNfK6s =+Kyv -----END PGP SIGNATURE----- Merge tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc and other driver updates from Greg KH: "Here is the big set of char/misc and other driver subsystem changes for 6.8-rc1. Other than lots of binder driver changes (as you can see by the merge conflicts) included in here are: - lots of iio driver updates and additions - spmi driver updates - eeprom driver updates - firmware driver updates - ocxl driver updates - mhi driver updates - w1 driver updates - nvmem driver updates - coresight driver updates - platform driver remove callback api changes - tags.sh script updates - bus_type constant marking cleanups - lots of other small driver updates All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (341 commits) android: removed duplicate linux/errno uio: Fix use-after-free in uio_open drivers: soc: xilinx: add check for platform firmware: xilinx: Export function to use in other module scripts/tags.sh: remove find_sources scripts/tags.sh: use -n to test archinclude scripts/tags.sh: add local annotation scripts/tags.sh: use more portable -path instead of -wholename scripts/tags.sh: Update comment (addition of gtags) firmware: zynqmp: Convert to platform remove callback returning void firmware: turris-mox-rwtm: Convert to platform remove callback returning void firmware: stratix10-svc: Convert to platform remove callback returning void firmware: stratix10-rsu: Convert to platform remove callback returning void firmware: raspberrypi: Convert to platform remove callback returning void firmware: qemu_fw_cfg: Convert to platform remove callback returning void firmware: mtk-adsp-ipc: Convert to platform remove callback returning void firmware: imx-dsp: Convert to platform remove callback returning void firmware: coreboot_table: Convert to platform remove callback returning void firmware: arm_scpi: Convert to platform remove callback returning void firmware: arm_scmi: Convert to platform remove callback returning void ...	2024-01-17 16:47:17 -08:00
Linus Torvalds	3edbe8afb6	- Convert the hw error storm handling into a finer-grained, per-bank solution which allows for more timely detection and reporting of errors - Start a documentation section which will hold down relevant RAS features description and how they should be used - Add new AMD error bank types - Slim down and remove error type descriptions from the kernel side of error decoding to rasdaemon which can be used from now on to decode hw errors on AMD - Mark pages containing uncorrectable errors as poison so that kdump can avoid them and thus not cause another panic - The usual cleanups and fixlets -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWamqkACgkQEsHwGGHe VUrkMw/+Jv//z5pKFMXF2GlI/Xefh8pXDB57D22B1T6zNJm+Qq1b58U44WpxoU24 b9gPtkgxjzA3JwoQG+cBDkCSs1xLV3McUS0UoEqQ28QvFN+mzxYKu4ura3F+rcZG PiCT4gPEYgZirYl0PKYeypnBPq3Krx/RYeTUE0vQ9HtmeBCmH71X1egyB4TqFbFU 7ui9aITLAnLBwgO6On1qviGPvJoEDGAsQ656XoY0Js+8dUeYwAI4qpaaUwtXUo1D ARGzss55/qTRo2G+IkDDhbJ8e8G+eX22oa0n1tdhNFBqfYAN6gM+t4NiFlNnn+18 nlbaSZfDygciE8DVjEnkVIrRJtkq7uj0dk7LvnqEI2y7J0LybHojC3hDKrqYLK3o PRgfPwmykOCwZQldGFYbShvmY8KoEQc/V9OWi/+A/M/uTJsForQmHn78Z2YkO9kG K6VaLuYszSCqz47wf56pHBwtMrivEPmkcxaz9ErkK90vM/NmV7kfLl+R8IK8apNJ cJkuLBjfgGpBP+AlpXGl9OE0lRJK5MCbEBlbGBBl58REBaB4DNkM4QHItrUSRR82 ADLcfLAIRWT8UwwXieDbWF1jb+4L+IJnXCKGCCQ7+eYxcFI9V9TABD2B8io+Dzvz evZwLCPKmjuPc2CMcDu/eUdBKLNTn3QAoN/NLcVmzJ23lguQW/M= =o3UM -----END PGP SIGNATURE----- Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Convert the hw error storm handling into a finer-grained, per-bank solution which allows for more timely detection and reporting of errors - Start a documentation section which will hold down relevant RAS features description and how they should be used - Add new AMD error bank types - Slim down and remove error type descriptions from the kernel side of error decoding to rasdaemon which can be used from now on to decode hw errors on AMD - Mark pages containing uncorrectable errors as poison so that kdump can avoid them and thus not cause another panic - The usual cleanups and fixlets * tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Handle Intel threshold interrupt storms x86/mce: Add per-bank CMCI storm mitigation x86/mce: Remove old CMCI storm mitigation code Documentation: Begin a RAS section x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types EDAC/mce_amd: Remove SMCA Extended Error code descriptions x86/mce/amd, EDAC/mce_amd: Move long names to decoder module x86/mce/inject: Clear test status value x86/mce: Remove redundant check from mce_device_create() x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel	2024-01-08 16:03:00 -08:00
Linus Torvalds	1dee7f509d	- The EDAC drivers part of the effort to make the ->remove() platform driver callback return void - Add support for AMD AI accelerators - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P, Meteor Lake-{P,PS} - Random fixes and cleanups all over the place -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWZRIAACgkQEsHwGGHe VUpesxAAux/mgyqJ4lfzpP4I3LkDA7X9uU3Iv6j9NtS64TGvulT4ZMiSL/U/nqTY /jsJYNJE4yIXm2Bf/T3zx+Dxh6MmUZNEYzSgZyyNx9NPEkHi4gX+b9F2nUx9i66s D0Pn9IVbM1liNfGr2RSSsKwFa/5UBg+58nGF1DSUz+rOaQ0KOoeta9WE98ne4n81 fC0o4RCzcPswy3NVhi3wPoBiAfyMYmrtvBcSezIbV6CaVJB8aFkvZK4xGKkEBFoJ ApVrSr6bEjJ7I7DdudisfIhLfU6iji7ZGfkwiouN2TIfVCaOZdAVan/OETpPLC6b IPNac1iIIl6v2OfxQp6s6b/Vq+dL+vVa4SW5tNj242s9eHunuevEcly37x4W254m vt91K9dvZI3I5DzAbVagw3Z9tFHbIH7uiSYyjMoyYw11EvwY+bZPXfZExvEUuPQK 2f+VL4ELlBj0imrWSCur5hF1g8R2ejTjb+kYlTcULyaXB4ANeBVzeShff3qc/21g 3QojVA7v+7xkPwURKxRwlqkwE113cz0Rr/kKjXhXnBOuPulX0T5eSzX8WExZzSCT 4FsEvAwZPFt7iCTxB+cspykNxyxLqzMb8GZFG+Ji8szGNbVgiidn/QEQzohs8yBu aFenuLEM7XwvpPCjmLGGnRkFV9mBDfPATLegTFvMpPQVYhjPdp4= =Bptq -----END PGP SIGNATURE----- Merge tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - The EDAC drivers part of the effort to make the ->remove() platform driver callback return void - Add support for AMD AI accelerators - Add support for a number of Intel SoCs: Alder Lake-N, Raptor Lake-P, Meteor Lake-{P,PS} - Random fixes and cleanups all over the place * tag 'edac_updates_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (39 commits) EDAC/skx_common: Filter out the invalid address EDAC, pnd2: Sort headers alphabetically EDAC, pnd2: Correct misleading error message in mk_region_mask() EDAC, pnd2: Apply bit macros and helpers where it makes sense EDAC, pnd2: Replace custom definition by one from sizes.h EDAC/igen6: Add Intel Meteor Lake-P SoCs support EDAC/igen6: Add Intel Meteor Lake-PS SoCs support EDAC/igen6: Add Intel Raptor Lake-P SoCs support EDAC/igen6: Add Intel Alder Lake-N SoCs support EDAC/igen6: Make get_mchbar() helper function EDAC/amd64: Add support for family 0x19, models 0x90-9f devices EDAC/mc: Add support for HBM3 memory type EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer EDAC/armada_xp: Explicitly include correct DT includes EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals EDAC/thunderx: Fix possible out-of-bounds string access EDAC/fsl_ddr: Convert to platform remove callback returning void EDAC/zynqmp: Convert to platform remove callback returning void EDAC/xgene: Convert to platform remove callback returning void EDAC/ti: Convert to platform remove callback returning void ...	2024-01-08 13:22:30 -08:00
Jay Buddhabhatti	97d62760e4	drivers: soc: xilinx: add check for platform Some error event IDs for Versal and Versal NET are different. Both the platforms should access their respective error event IDs so use sub_family_code to check for platform and check error IDs for respective platforms. The family code is passed via platform data to avoid platform detection again. Platform data is setup when even driver is registered. Signed-off-by: Jay Buddhabhatti <jay.buddhabhatti@amd.com> Link: https://lore.kernel.org/r/20231219055025.27570-3-jay.buddhabhatti@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-01-04 17:02:49 +01:00
Greg Kroah-Hartman	f36be9ce81	EDAC: constantify the struct bus_type usage In many places in the edac code, struct bus_type pointers are passed around and then eventually sent to the driver core, which can handle a constant pointer. So constantify all of the edac usage of these as well because the data in them is never modified by the edac code either. Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: James Morse <james.morse@arm.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: <linux-edac@vger.kernel.org> Link: https://lore.kernel.org/r/2023121909-tribute-punctuate-4b22@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2024-01-04 14:34:27 +01:00
Qiuxu Zhuo	1e92af09fa	EDAC/skx_common: Filter out the invalid address Decoding an invalid address with certain firmware decoders could cause a #PF (Page Fault) in the EFI runtime context, which could subsequently hang the system. To make {i10nm,skx}_edac more robust against such bogus firmware decoders, filter out invalid addresses before allowing the firmware decoder to process them. Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20231207014512.78564-1-qiuxu.zhuo@intel.com	2024-01-02 09:20:08 -08:00
Shubhrajyoti Datta	9483aa4491	EDAC/versal: Read num_csrows and num_chans using the correct bitfield macro Fix the extraction of num_csrows and num_chans. The extraction of the num_rows is wrong. Instead of extracting using the FIELD_GET it is calling FIELD_PREP. The issue was masked as the default design has the rows as 0. Fixes: `6f15b178cd` ("EDAC/versal: Add a Xilinx Versal memory controller driver") Closes: https://lore.kernel.org/all/60ca157e-6eff-d12c-9dc0-8aeab125edda@linux-m68k.org/ Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231215053352.8740-1-shubhrajyoti.datta@amd.com	2023-12-15 13:01:27 +01:00
Andy Shevchenko	a69badad73	EDAC, pnd2: Sort headers alphabetically Sort the headers in alphabetic order in order to ease the maintenance for this part. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 13:04:57 -08:00
Andy Shevchenko	f1b0b1167f	EDAC, pnd2: Correct misleading error message in mk_region_mask() The mask parameter is expected to be of a sequence of the set bits. It does not mean it must be power of two (only single bit set). Correct misleading error message. Suggested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 13:04:57 -08:00
Andy Shevchenko	530258f872	EDAC, pnd2: Apply bit macros and helpers where it makes sense Apply bit macros (BIT()/BIT_ULL()/GENMASK()/etc) and helpers (for_each_set_bit()/etc) where it makes sense. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 13:04:57 -08:00
Andy Shevchenko	a50cc8de99	EDAC, pnd2: Replace custom definition by one from sizes.h The sizes.h provides a set of common size definitions, use it. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 13:04:57 -08:00
Qiuxu Zhuo	6807434ff0	EDAC/igen6: Add Intel Meteor Lake-P SoCs support Add Intel Meteor Lake-P SoC compute die IDs for EDAC support. These Meteor Lake-P SoCs share similar IBECC registers with Alder Lake-P SoCs. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 10:59:13 -08:00
Qiuxu Zhuo	3c77090c12	EDAC/igen6: Add Intel Meteor Lake-PS SoCs support Add Intel Meteor Lake-PS SoC compute die IDs for EDAC support. These SoCs share similar IBECC registers with Alder Lake-P SoCs. The only difference is that IBECC presence is detected through an MMIO-mapped register instead of the capability register in the PCI configuration space. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 10:59:13 -08:00
Qiuxu Zhuo	d23627a768	EDAC/igen6: Add Intel Raptor Lake-P SoCs support Add Intel Raptor Lake-P SoC compute die IDs for EDAC support. These Raptor Lake-P SoCs share similar IBECC registers with Alder Lake-P SoCs but extend the most significant bit of the error address logged in IBECC from bit 38 to bit 45. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 10:59:13 -08:00
Qiuxu Zhuo	c4a5398991	EDAC/igen6: Add Intel Alder Lake-N SoCs support Add Intel Alder Lake-N SoC compute die IDs for EDAC support. Alder Lake-N, with one memory controller, is a reduced version of Alder Lake-P, which has two memory controllers. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 10:59:13 -08:00
Qiuxu Zhuo	a264f715ec	EDAC/igen6: Make get_mchbar() helper function Make get_mchbar() helper function to retrieve the BAR address of the memory controller. No function changes. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2023-12-05 10:59:13 -08:00
Muralidhara M K	12f230c07a	EDAC/amd64: Add support for family 0x19, models 0x90-9f devices AMD Models 90h-9fh are APUs. They have built-in HBM3 memory. ECC support is enabled by default. APU models have a single Data Fabric (DF) per Package. Each DF is visible to the OS in the same way as chiplet-based systems like Zen2 CPUs and later. However, the Unified Memory Controllers (UMCs) are arranged in the same way as GPU-based MI200 devices rather than CPU-based systems. Use the existing gpu_ops for hetergeneous systems to support enumeration of nodes and memory topology with few fixups. [ bp: Massage comments. ] Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231102114225.2006878-5-muralimk@amd.com	2023-11-29 11:21:05 +01:00
Muralidhara M K	9a5f580c1c	EDAC/mc: Add support for HBM3 memory type AMD MI300A models use HBM3 (High Bandwidth Memory Gen 3) memory. HBM is a high-speed computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM). Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231102114225.2006878-4-muralimk@amd.com	2023-11-28 16:39:12 +01:00
Abhinav Singh	a2f99fbae4	EDAC/{sb,i7core}_edac: Do not use a plain integer for a NULL pointer Sparse warns about the use of the integer constant 0 as a NULL pointer with the -Wnon-pointer-null switch. Even though the C standard requires that 0 == NULL and type conversion rules turn an integer constant 0 into a NULL pointer when cast to a void * type, Linus notes that this is a very poor situation from a type safety angle and a pointer should be initialized with a pointer type - not an integer constant. See https://www.spinics.net/lists/linux-sparse/msg10066.html for more info. [ bp: Rewrite commit message, drop useless comments in the code. ] Signed-off-by: Abhinav Singh <singhabhinav9051571833@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231128141703.614605-1-singhabhinav9051571833@gmail.com	2023-11-28 15:43:43 +01:00
Muralidhara M K	9f988030e8	EDAC/mce_amd: Remove SMCA Extended Error code descriptions On AMD systems with Scalable MCA each machine check error of a SMCA bank type has an associated bit position in the bank's control (CTL) register. An error's bit position in the CTL register is used during error decoding for offsetting into the corresponding bank's error description structure. As new errors are being added in newer AMD systems for existing SMCA bank types, the underlying SMCA architecture guarantees that the bit positions of existing errors are not altered. However, on some AMD systems some of the existing bit definitions in the CTL register of SMCA bank type are reassigned without defining new HWID and McaType. Consequently, the errors whose bit definitions have been reassigned in the CTL register are being erroneously decoded. Remove SMCA Extended Error Code descriptions, this avoids decoding issues for incorrectly reassigned bits, and avoids the related maintenance burden in the kernel. But the bank type and Extended Error Code value for an error will continue to be printed as a convenience. The decoding of SMCA Extended Error Code description can be done by referring to AMD documentation or use external tools such as rasdaemon. Offline decoding can be done using below option in rasdaemon. For example: $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca Also, the user can pass particular family and model to decode the error string. $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM> Refer to the rasdaemon commit for details: https://github.com/mchehab/rasdaemon/commit/932118b04a04104dfac6b8536 Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://lore.kernel.org/r/20231102114225.2006878-2-muralimk@amd.com	2023-11-28 15:17:09 +01:00
Rob Herring	9e08ac1b5e	EDAC/armada_xp: Explicitly include correct DT includes The DT of_device.h and of_platform.h date back to the separate of_platform_bus_type before it was merged into the regular platform bus. As part of that merge prepping Arm DT support 13 years ago, they "temporarily" include each other. They also include platform_device.h and of.h. As a result, there's a pretty much random mix of those include files used throughout the tree. In order to detangle these headers and replace the implicit includes with struct declarations, users need to explicitly include the correct includes. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Jan Luebbe <jlu@pengutronix.de> Link: https://lore.kernel.org/r/20231013190342.246973-1-robh@kernel.org	2023-11-27 18:20:51 +01:00
Yazen Ghannam	ff03ff328f	x86/mce/amd, EDAC/mce_amd: Move long names to decoder module The long names of the SMCA banks are only used by the MCE decoder module. Move them out of the arch code and into the decoder module. [ bp: Name the long names array "smca_long_names", drop local ptr in decode_smca_error(), constify arrays. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231118193248.1296798-5-yazen.ghannam@amd.com	2023-11-27 12:16:51 +01:00
Ilpo Järvinen	5f57b717cc	EDAC/pci_sysfs: Use PCI_HEADER_TYPE_MASK instead of literals Replace literal 0x7f with PCI_HEADER_TYPE_MASK. No functional changes: $ sha1sum drivers/edac/edac_pci_sysfs.o.* 805b33a090d8019d8b3b348191f630c72c748c9c drivers/edac/edac_pci_sysfs.o.before 805b33a090d8019d8b3b348191f630c72c748c9c drivers/edac/edac_pci_sysfs.o.after Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231124090919.23687-5-ilpo.jarvinen@linux.intel.com	2023-11-24 15:08:37 +01:00
Arnd Bergmann	475c58e1a4	EDAC/thunderx: Fix possible out-of-bounds string access Enabling -Wstringop-overflow globally exposes a warning for a common bug in the usage of strncat(): drivers/edac/thunderx_edac.c: In function 'thunderx_ocx_com_threaded_isr': drivers/edac/thunderx_edac.c:1136:17: error: 'strncat' specified bound 1024 equals destination size [-Werror=stringop-overflow=] 1136 \| strncat(msg, other, OCX_MESSAGE_SIZE); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... 1145 \| strncat(msg, other, OCX_MESSAGE_SIZE); ... 1150 \| strncat(msg, other, OCX_MESSAGE_SIZE); ... Apparently the author of this driver expected strncat() to behave the way that strlcat() does, which uses the size of the destination buffer as its third argument rather than the length of the source buffer. The result is that there is no check on the size of the allocated buffer. Change it to strlcat(). [ bp: Trim compiler output, fixup commit message. ] Fixes: `41003396f9` ("EDAC, thunderx: Add Cavium ThunderX EDAC driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20231122222007.3199885-1-arnd@kernel.org	2023-11-23 17:44:09 +01:00
Uwe Kleine-König	0c7c7ba0c7	EDAC/fsl_ddr: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). fsl_mc_err_remove() is used as callback in two drivers. So these have to be converted together to the void returning remove callback. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20231013100422.1382040-2-u.kleine-koenig@pengutronix.de	2023-11-20 23:34:04 +01:00
Uwe Kleine-König	ec886cf881	EDAC/zynqmp: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/20231004131254.2673842-22-u.kleine-koenig@pengutronix.de	2023-11-20 23:33:21 +01:00

1 2 3 4 5 ...

1982 Commits