Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
The most notable updates this time are for Qualcomm Snapdragon platforms.
The Inline-Crypto-Engine gets a new DT binding and driver. A number of
drivers now support additional Snapdragon variants, in particular the
rsc, scm, geni, bwm, glink and socinfo, while the llcc (edac) and rpm
drivers get notable functionality updates.
Updates on other platforms include:
- Various updates to the Mediatek mutex and mmsys drivers, including
support for the Helio X10 SoC
- Support for unidirectional mailbox channels in Arm SCMI firmware
- Support for per cpu asynchronous notification in OP-TEE firmware
- Minor updates for memory controller drivers.
- Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
obsolete DT driver interfaces.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRGmncACgkQYKtH/8kJ
Uif6ghAAw1TiPTJzJLLCNx+txOVFB62WDglv3T1CufjfcWp0Eh0RJSCcsCOPV+/7
UHi4+X4nPAcudeOFMFtslCR8ExLRWY4j7t2ZYo/k+VI3jdB8Qkbr6NAQgAuRdLYX
WZ1cV6o76B3bhO2HqSVNVZ8/3Z7OAYw4j9VDD/4AbW+l3GyentlQTjabpJNREvSS
5HzT3ZI33o7M8mM4uYmmEXVrg8sCupbRyL9S7jTiFXRLcfqujclhfezJ4UrJJv7b
wxGf+e2YNMqKH6PiKYufzN1TYI2D0YQeB1m56Y9FsAKxgAyHh2xWpsHeyVnaw0jc
KaKjRN/H3JDlW/VCMAjQOIShCZdAs02xHnEXxY6pKLMM6i8/FkzzNIxNQwXrx5KH
zYESXVd6suOI0eCZT8zkKKLHRT5EJRaliUv5Z+Qp2BBe3vJVZD0JqSlZ7lOznplF
lviwL6ydAMr2cfTgfMxbRiYQVDzncFkfnR3t55SC6rYjGt6QWjeS0dDbGHf4WVC4
FDbnST4JaBmi+frh55VooX7EpzIv9wa0/taayaChd9qvXnh22uqaqho1sPYKZ6BI
OXduHQ3qojJhKKKK1VJKzN5Ef3OHLQLNrvcc1DsKILrrES4w4LX1C9dmyh2CLXLo
q5cX6L1iB1Hx5tujalDYBsHBBmbiT/1tNM2S7pAGigiGy4KEc28=
=r6jm
-----END PGP SIGNATURE-----
Merge tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"The most notable updates this time are for Qualcomm Snapdragon
platforms. The Inline-Crypto-Engine gets a new DT binding and driver,
and a number of drivers now support additional Snapdragon variants, in
particular the rsc, scm, geni, bwm, glink and socinfo, while the llcc
(edac) and rpm drivers get notable functionality updates.
Updates on other platforms include:
- Various updates to the Mediatek mutex and mmsys drivers, including
support for the Helio X10 SoC
- Support for unidirectional mailbox channels in Arm SCMI firmware
- Support for per cpu asynchronous notification in OP-TEE firmware
- Minor updates for memory controller drivers.
- Minor updates for Renesas, TI, Amlogic, Apple, Broadcom, Tegra,
Allwinner, Versatile Express, Canaan, Microchip, Mediatek and i.MX
SoC drivers, mainly updating the use of MODULE_LICENSE() macros and
obsolete DT driver interfaces"
* tag 'soc-drivers-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
soc: ti: smartreflex: Simplify getting the opam_sr pointer
bus: vexpress-config: Add explicit of_platform.h include
soc: mediatek: Kconfig: Add MTK_CMDQ dependency to MTK_MMSYS
memory: mtk-smi: mt8365: Add SMI Support
dt-bindings: memory-controllers: mediatek,smi-larb: add mt8365
dt-bindings: memory-controllers: mediatek,smi-common: add mt8365
memory: tegra: read values from correct device
dt-bindings: crypto: Add Qualcomm Inline Crypto Engine
soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver
dt-bindings: firmware: document Qualcomm QCM2290 SCM
soc: qcom: rpmh-rsc: Support RSC v3 minor versions
soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
soc/tegra: fuse: Remove nvmem root only access
soc/tegra: cbb: tegra194: Use of_address_count() helper
soc/tegra: cbb: Remove MODULE_LICENSE in non-modules
ARM: tegra: Remove MODULE_LICENSE in non-modules
soc/tegra: flowctrl: Use devm_platform_get_and_ioremap_resource()
soc: tegra: cbb: Drop empty platform remove function
firmware: arm_scmi: Add support for unidirectional mailbox channels
dt-bindings: firmware: arm,scmi: Support mailboxes unidirectional channels
...
Combine all queued EDAC changes for submission into v6.4:
* ras/edac-drivers:
EDAC/i10nm: Add Intel Sierra Forest server support
EDAC/skx: Fix overflows on the DRAM row address mapping arrays
* ras/edac-amd64: (27 commits)
EDAC/amd64: Fix indentation in umc_determine_edac_cap()
EDAC/amd64: Add get_err_info() to pvt->ops
EDAC/amd64: Split dump_misc_regs() into dct/umc functions
EDAC/amd64: Split init_csrows() into dct/umc functions
EDAC/amd64: Split determine_edac_cap() into dct/umc functions
EDAC/amd64: Rename f17h_determine_edac_ctl_cap()
EDAC/amd64: Split setup_mci_misc_attrs() into dct/umc functions
EDAC/amd64: Split ecc_enabled() into dct/umc functions
EDAC/amd64: Split read_mc_regs() into dct/umc functions
EDAC/amd64: Split determine_memory_type() into dct/umc functions
EDAC/amd64: Split read_base_mask() into dct/umc functions
EDAC/amd64: Split prep_chip_selects() into dct/umc functions
EDAC/amd64: Rework hw_info_{get,put}
EDAC/amd64: Merge struct amd64_family_type into struct amd64_pvt
EDAC/amd64: Do not discover ECC symbol size for Family 17h and later
EDAC/amd64: Drop dbam_to_cs() for Family 17h and later
EDAC/amd64: Split get_csrow_nr_pages() into dct/umc functions
EDAC/amd64: Rename debug_display_dimm_sizes()
* ras/edac-misc:
EDAC/altera: Remove MODULE_LICENSE in non-module
EDAC: Sanitize MODULE_AUTHOR strings
EDAC/amd81[13]1: Remove trailing newline from MODULE_AUTHOR
EDAC/i5100: Fix typo in comment
EDAC/altera: Remove redundant error logging
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
The Sierra Forest CPU model uses similar memory controller registers as
Granite Rapids server. Add Sierra Forest CPU model ID for EDAC support.
Tested-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20230410131531.11914-1-qiuxu.zhuo@intel.com
Since
8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"),
MODULE_LICENSE declarations are used to identify modules. As
a consequence, uses of the macro in non-modules will cause modprobe to
misidentify their containing object file as a module when it is not
(false positives), and modprobe might succeed rather than failing with
a suitable error message.
altera_edac is not a module for a while now, remove the macro call.
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230217141059.392471-24-nick.alcock@oracle.com
GPU Nodes will use a different method to determine the chip select
and channel of an error. A function pointer should be used rather than
introduce another branching condition.
Prepare for this by adding get_err_info() to pvt->ops. This function is
only called from the modern code path, so a legacy function is not
defined.
Make sure to call this after MCA_STATUS[SyndV] is checked, since the
csrow value is found in MCA_SYND.
[ Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-23-yazen.ghannam@amd.com
Call them from their respective setup_mci_misc_attrs() paths.
Also, drop the check for an "empty" device, i.e. one without memory.
This is redundant and already done in instance_has_memory() earlier in
the init path.
No functional change is intended.
[ Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-21-yazen.ghannam@amd.com
The init_one_instance() path is shared between legacy and modern
systems. So add the new functions to a function pointer in pvt->ops.
No functional change is intended.
[ Yazen: Rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-18-yazen.ghannam@amd.com
Call them using a function pointer in pvt->ops. The "ECC enabled"
check is done outside of the hardware information gathering done in
hw_info_get(). So a high-level function pointer is needed to separate
the legacy and modern paths.
No functional change is intended.
[Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-17-yazen.ghannam@amd.com
Call them from their respective hw_info_get() paths.
ECC symbol size is not needed on UMC systems, so determine_ecc_sym_sz()
is left out of the UMC path. Do not save TOP_MEM* values on modern
controllers because they're not needed there (read: they were used only
for debugging, if anything).
[ Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-16-yazen.ghannam@amd.com
Call them from their respective hw_info_get() paths.
Call them after all other hardware registers have been saved, since the
memory type for a device will be determined based on the saved
information.
[ Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-15-yazen.ghannam@amd.com
Call them from their respective hw_info_get() paths.
Call the new functions after the setting the chip select base and mask
counts, since those are need to read the correct number of chip select
base and mask registers. And call the new functions before the remaining
set up, because the base and mask register values will be needed later.
[Yazen: Rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-14-yazen.ghannam@amd.com
Call them from their respective hw_info_get() function. Avoid the
need for family/model-based function pointers.
Add the calls before reading hardware registers from the memory
controllers, since the number of chip select bases and masks needs to be
known first.
[ Yazen: rebased/reworked patch and reworded commit message. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-13-yazen.ghannam@amd.com
The bulk of system-specific information is gathered at init time with
hw_info_get(). This function calls a number of helper functions, and
many of these helper functions are split between a modern UMC/DF path
and a legacy DCT path.
Split hw_info_get() into legacy and modern versions. This creates two
separate code paths early on, and legacy and modern helper functions can
be called directly in the appropriate code path.
Also, simplify hw_info_put() and share it between legacy and modern
systems. NULL pointer checks are done in pci_dev_put() and kfree(), so
they can be called unconditionally.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-12-yazen.ghannam@amd.com
Future AMD systems will support heterogeneous "AMD Node" types, e.g.
CPU and GPU types. Therefore, a global family type shared across all
AMD nodes is no longer appropriate.
Move struct low_ops routines and members of struct amd64_family_type
to struct amd64_pvt.
Currently, there are many code branches that split between "modern" and
"legacy" systems. Another code branch will be needed in order to cover
GPU cases. However, rather than introduce another branching case in
multiple functions, the current branching code should be switched to a
set of function pointers. This change makes the code more readable and
simplifies adding support for new families/models.
In order to reuse code, define two sets of function pointers. Use one
for modern systems (Family 17h and later). This will not change between
current CPU families. Use another set of function pointers for legacy
systems (before Family 17h). Use the Family 16h versions as default
for the legacy ops since these are the latest, and adjust the function
pointers as needed for older families.
[ Yazen: rebased/reworked patch and reworded commit message. ]
[ bp: Fix rev8 or later check. ]
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Signed-off-by: Naveen Krishna Chatradhi <naveenkrishna.chatradhi@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-11-yazen.ghannam@amd.com
The ECC symbol size was needed on legacy system to lookup the ECC syndrome.
This is not needed on modern systems because the ECC syndrome is explicitly
provided in the MCA information.
Remove the ECC symbol size discovery code for modern UMC-based systems.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-10-yazen.ghannam@amd.com
The same function is used to calculate chip select size for all Zen-based
family/models. Therefore, a family/model function pointer is not necessary.
Drop the dbam_to_cs() function pointer for Family 17h and later systems.
Also, move the Family 17h function to avoid a forward declaration. Rename
it to indicate that the UMC Address Mask is used rather than the legacy
DBAM value.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-9-yazen.ghannam@amd.com
Split get_csrow_nr_pages() into a legacy and modern versions in preparation
for further legacy/modern refactoring.
Also, rename f17_get_cs_mode() to match the new convention.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-8-yazen.ghannam@amd.com
Use the "dct" and "umc" prefixes for legacy and modern versions
respectively.
Also, move the "dct" version to avoid a forward declaration, and fixup
some checkpatch warnings in the process.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-7-yazen.ghannam@amd.com
Direct access to the struct bus_type dev_root pointer is going away soon
so replace that with a call to bus_get_dev_root() instead, which is what
it is there for.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rric@kernel.org>
Cc: linux-edac@vger.kernel.org
Link: https://lore.kernel.org/r/20230313182918.1312597-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A call to platform_get_irq() already prints an error on failure within
its own implementation. So printing another error based on its return
value in the caller is redundant and should be removed. The clean up
also makes if condition block braces unnecessary. Remove that as well.
Issue identified using platform_get_irq.cocci coccinelle semantic patch.
Signed-off-by: Deepak R Varma <drv@mailo.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/Y/+j27kqdhflPtaj@ubun2204.myguest.virtualbox.org
Not all Qcom platforms support IRQ mode for ECC handling. For those
platforms, the current EDAC driver will not be probed due to missing ECC
IRQ in devicetree.
So add support for polling mode so that the EDAC driver can be used on all
Qcom platforms supporting LLCC.
The polling delay of 5000ms is chosen based on Qcom downstream/vendor
driver.
Reported-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230314080443.64635-14-manivannan.sadhasivam@linaro.org
The Qualcomm LLCC/EDAC drivers were using a fixed register stride for
accessing the (Control and Status Registers) CSRs of each LLCC bank.
This stride only works for some SoCs like SDM845 for which driver
support was initially added.
But the later SoCs use different register stride that vary between the
banks with holes in-between. So it is not possible to use a single register
stride for accessing the CSRs of each bank. By doing so could result in a
crash.
For fixing this issue, let's obtain the base address of each LLCC bank from
devicetree and get rid of the fixed stride. This also means, there is no
need to rely on reg-names property and the base addresses can be obtained
using the index.
First index is LLCC bank 0 and last index is LLCC broadcast. If the SoC
supports more than one bank, then those need to be defined in devicetree
for index from 1..N-1.
Reported-by: Parikshit Pareek <quic_ppareek@quicinc.com>
Tested-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20230314080443.64635-13-manivannan.sadhasivam@linaro.org
The current DRAM row address mapping arrays skx_{open,close}_row[]
only support ranks with sizes up to 16G. Decoding a rank address
to a DRAM row address for a 32G rank by using either one of the
above arrays by the skx_edac driver, will result in an overflow on
the array.
For a 32G rank, the most significant DRAM row address bit (the
bit17) is mapped from the bit34 of the rank address. Add this new
mapping item to both arrays to fix the overflow issue.
Fixes: 4ec656bdf4 ("EDAC, skx_edac: Add EDAC driver for Skylake")
Reported-by: Feng Xu <feng.f.xu@intel.com>
Tested-by: Feng Xu <feng.f.xu@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com
controller
- Add support for decoding errors from the first and second level memory
on SKL-based hardware
- Add support for the memory controllers in Intel Granite Rapids and
Emerald Rapids machines
- First round of amd64_edac driver simplification and removal of
unneeded functionality
- The usual cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPw5xsACgkQEsHwGGHe
VUohSA//bS/iFiglmpTKiY1qynPuVRfZCYGZov5JN+fRzpFQos1HISHGHTKtGbGJ
pau8Y6+QJG5LkFdR8Nf1u25WOEaYhBHHj1crUAkmSIz9zYyinrdYyDTOC2LBTmSf
BziOElAtuhTrvQ4WNL75cFzpaAKCGE7yuwZZFLVM3gHXiuVeZ3Spzbe0I9eJ4uDe
Hvgg1/IVoGAsvhNouxG5ABgVzKWxoyqEDFZtLo1adLuv8cm0hwFKWqC7zw9Y/gj0
b8tiqnoRxrEDNt8uc+D+y9HIXunB+YPBUcGhDZFrYAMlWQbENQ2WJSodIg0klNtv
Nd62wWZavdtCv9rMjOdGFPuLvWV1Lr5uIsNVSEhuqRpXjywFdYycMfmuD30YIfA6
k1t71pxGSB5fJ6qr/y0a4HkoRz9HON03Ki00gkVIMMo48k0DJKtzt6Mui8rtzIe3
uNlSDxyMXQvEUg/nR54kPAropL5DvKRx7QJ3Z1Yh4KcFmH1NtjIqoJfDghK2Gz1X
XIzIzeTJy+LRepZ6KRSEDOM8FrFzHkUKU9OZTnn/RlWha6nKyBaVyeb5kutJCW+N
Ytj9DqSxpAFDRBvbUpHRRFL1h5bgss7+AXLpkmYBF0QKmYiYV/MBSBdNpEZ1B3VC
CsRlD1IT6FSUhAdPqhAvbCDPOGpd/AvGhmLnfmn78wGIIWR0W24=
=i3bo
-----END PGP SIGNATURE-----
Merge tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Add a driver for the RAS functionality on Xilinx's on chip memory
controller
- Add support for decoding errors from the first and second level
memory on SKL-based hardware
- Add support for the memory controllers in Intel Granite Rapids and
Emerald Rapids machines
- First round of amd64_edac driver simplification and removal of
unneeded functionality
- The usual cleanups and fixes
* tag 'edac_updates_for_v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Shut up an -Werror,-Wsometimes-uninitialized clang false positive
EDAC/amd64: Remove early_channel_count()
EDAC/amd64: Remove PCI Function 0
EDAC/amd64: Remove PCI Function 6
EDAC/amd64: Remove scrub rate control for Family 17h and later
EDAC/amd64: Don't set up EDAC PCI control on Family 17h+
EDAC/i10nm: Add driver decoder for Sapphire Rapids server
EDAC/i10nm: Add Intel Granite Rapids server support
EDAC/i10nm: Make more configurations CPU model specific
EDAC/i10nm: Add Intel Emerald Rapids server support
EDAC/skx_common: Delete duplicated and unreachable code
EDAC/skx_common: Enable EDAC support for the "near" memory
EDAC/qcom: Add platform_device_id table for module autoloading
EDAC/zynqmp: Add EDAC support for Xilinx ZynqMP OCM
dt-bindings: edac: Add bindings for Xilinx ZynqMP OCM
on newer AMD CPUs
- Mask out bits which don't belong to the address of the error being
reported
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPzUAEACgkQEsHwGGHe
VUr22RAAh7fi3s8sDP4B2WBe1LPKZystZamxlLObBG2eLT7g0YmSKV12+bHCGf/B
nGqz9iy+e/T1Khxv0gdEyuENwzuitXgiEOYgB4u70HimWy5422ZCzn1EiOMFtyST
g0ehOR+tU84YwMVR40ui3spI1DHgeVPqVBLHBARZ1OAaA58N8eVREC6MqJAeAzIU
+VYiBbn69quECTuU1P7yaT8NDnbm5G6pA1dhKLc7vLl9QWzoW1yWLLcp+oGFN6B8
rcGDKEDK1OYtdHScRCfhFrznkeYP6SVnSt4wlAgX+HVGPoMpvq8nJygxCWdE0yjd
aQGhdcVJkQlSqm1iJUv0MK9nkolqXVVSVTurpHunAq7ctul6Qm/X+fsfwBgSIXXn
Gdj3in374MLWCz/xGqeBS8IiiPxGxJA9s350jyk02LK6Np6sXeuc4PpR66+6FAKQ
Ypen+uWJ6oBof04bW7DBK0R14atA8EpOOLUrrGIsSkNSEIjLaCipMZOpRCbOw76N
bXcdnKKsaEDjKtHClvx/vZXklfzWk0OgF8qtY0nGF+khvDAi3pQaIIlCehf0Qemh
6j00TqIYBCXa0kuKktdPzVJSM7A7TZ5ftboa1IPhE+GYrFFee/VJ3yfgqz102FWI
RJsY8JXt+EP3VMSOQYqQ5KzcLBJ2uDiRYtgUo4P1CITNpRfZEMc=
=e9v9
-----END PGP SIGNATURE-----
Merge tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:
- Add support for reporting more bits of the physical address on error,
on newer AMD CPUs
- Mask out bits which don't belong to the address of the error being
reported
* tag 'ras_core_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Mask out non-address bits from machine check bank
x86/mce: Add support for Extended Physical Address MCA changes
x86/mce: Define a function to extract ErrorAddr from MCA_ADDR
Reportedly, clang cannot do interprocedural analysis:
https://lore.kernel.org/r/20230213-amd64_edac-wsometimes-uninitialized-v1-1-5bde32b89e02@kernel.org
and see that those arguments won't be used uninitialized.
So, yeah, the code's fine even without this. Normally, such a "fix"
won't be applied but that warning gets automatically enabled in -Wall
builds and when CONFIG_WERROR is set in allmodconfig builds, the build
fails.
So shut it up with a minimal fix as this code will see more
reorganization very soon.
[ bp: Write commit message. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/Y%2BqdVHidnrrKvxiD@dev-arch.thelio-3990X
The early_channel_count() function seems to have been useful in the past
for knowing how many EDAC mci structures to populate. However, this is no
longer needed as the maximum channel count for a system is used instead.
Remove the early_channel_count() helper functions and related code. Use the
size of the channel layer when iterating over channel structures.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-6-yazen.ghannam@amd.com
PCI Function 0 is used on Family 17h and later only to read the "dhar"
value. This value is printed and provided through a module-specific
debug sysfs file. The value is not used for any Family 17h and later
code, and it does not have any apparent debug value on these systems.
Remove "dhar", Function 0 PCI IDs, and all related code.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-5-yazen.ghannam@amd.com
PCI Function 6 is used on Family 17h and later to access scrub
registers. With scrub access removed, this function has no other use.
Remove all Function 6 PCI IDs and related code.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-4-yazen.ghannam@amd.com
The scrub registers on AMD Family 17h and later may be inaccessible to
the OS. Furthermore, hardware designers recommend that the scrubbing
feature is managed by the firmware.
Remove support for the sdram_scrub_rate interface for AMD Family 17h
systems and later by not setting the scrub function pointers. The EDAC MC
core will then not expose the scrub files in sysfs.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-3-yazen.ghannam@amd.com
EDAC PCI control is used to detect/report legacy PCI errors like
"Parity" and "SERROR". Modern AMD systems use PCIe Advanced Error
Reporting (AER), and legacy PCI errors should not be reported.
Remove EDAC PCI control setup on AMD Family 17h and later systems.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230127170419.1824692-2-yazen.ghannam@amd.com
Intel SDM (December 2022) vol3B 17.13.2 contains IMC MC error codes
for Sapphire Rapids. Current i10nm_edac only supports firmware decoder
(ACPI DSM methods) for Sapphire Rapids. So add the driver decoder
(decoding DDR memory errors via extracting error information from the
IMC MC error codes) for Sapphire Rapids for better decoding performance.
Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The Granite Rapids CPU model uses similar memory controller registers
as Sapphire Rapids server but with some different configurations:
- Various memory controller numbers for different Granite Rapids CPUs.
So detect the number of present memory controllers at run time.
- Different MMIO offsets of memory controllers.
- Different triples of bus/dev/fun of some PCI devices used in i10nm_edac.
Add above configurations and Granite Rapids CPU model ID for EDAC support.
[Tony: Fixed 2 typos s/strcture/structure/]
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
The numbers of memory controllers per socket, channels per memory
controller, DIMMs per channel and the triples of bus/device/function
of PCI devices used in i10nm_edac can be CPU model specific.
Add new fields to the structure res_config for above numbers and
triples to make them CPU model specific.
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
The Emerald Rapids CPU model uses similar memory controller registers
as Sapphire Rapids server. Add Emerald Rapids CPU model number ID for
EDAC support.
Tested-by: Li Zhang <li4.zhang@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
skx_mce_check_error() returns early if the error isn't from memory.
So when skx_mce_output_error() is invoked from skx_mce_check_error(),
it doesn't need to re-check whether the error is from memory. Delete
the duplicated and unreachable code from skx_mce_output_error().
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
The current {skx,i10nm}_edac miss the EDAC support to decode errors from
the 1st level memory (the fast "near" memory as cache) of the 2-level
memory system. Introduce a helper function skx_error_in_mem() to check
whether errors are from memory at the beginning of skx_mce_check_error().
As long as the errors are from memory (either the 1-level memory system
or the 2-level memory system), decode the errors.
Reported-and-tested-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/20230113032802.41752-1-qiuxu.zhuo@intel.com
The memory for llcc_driv_data is allocated by the LLCC driver. But when
it is passed as the private driver info to the EDAC core, it will get freed
during the qcom_edac driver release. So when the qcom_edac driver gets probed
again, it will try to use the freed data leading to the use-after-free bug.
Hence, do not pass llcc_driv_data as pvt_info but rather reference it
using the platform_data pointer in the qcom_edac driver.
Fixes: 27450653f1 ("drivers: edac: Add EDAC driver support for QCOM SoCs")
Reported-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Cc: <stable@vger.kernel.org> # 4.20
Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org
Add a device ID table so that the driver loads automatically when the
associated platform_device gets registered.
Reported-by: Andrew Halaney <ahalaney@redhat.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Link: https://lore.kernel.org/r/20230118150904.26913-3-manivannan.sadhasivam@linaro.org
Systems that support various memory encryption schemes (MKTME, TDX, SEV)
use high order physical address bits to indicate which key should be
used for a specific memory location.
When a memory error is reported, some systems may report those key
bits in the IA32_MCi_ADDR machine check MSR.
The Intel SDM has a footnote for the contents of the address register
that says: "Useful bits in this field depend on the address methodology
in use when the register state is saved."
AMD Processor Programming Reference has a more explicit description
of the MCA_ADDR register:
"For physical addresses, the most significant bit is given by
Core::X86::Cpuid::LongModeInfo[PhysAddrSize]."
Add a new #define MCI_ADDR_PHYSADDR for the mask of valid physical
address bits within the machine check bank address register. Use this
mask for recoverable machine check handling and in the EDAC driver to
ignore any key bits that may be present.
[ Tony: Based on independent fixes proposed by Fan Du and Isaku Yamahata ]
Reported-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reported-by: Fan Du <fan.du@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20230109152936.397862-1-tony.luck@intel.com
Add EDAC support for Xilinx ZynqMP OCM Controller, so this driver reports CE and
UE errors upon interrupt generation. Also add debugfs files for error injection.
On Xilinx ZynqMP platform, both OCM Controller driver(zynqmp_edac) and DDR
Memory Controller driver(synopsys_edac) co-exist which means both can be loaded
at a time. This scenario is tested on Xilinx ZynqMP platform.
Fix following issue reported by the robot:
"MAINTAINERS references a file that doesn't exist:
Documentation/devicetree/bindings/edac/xlnx,zynqmp-ocmc.yaml"
[ bp:
- Massage commit message
- s/EDAC_ZYNQMP_OCM/EDAC_ZYNQMP/
- Touchups
]
Reported-by: kernel test robot <lkp@intel.com>
Co-developed-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Signed-off-by: Sai Krishna Potthuri <sai.krishna.potthuri@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230104084512.1855243-3-sai.krishna.potthuri@amd.com