linux/drivers/cxl/cxl.h

914 lines
30 KiB
C
Raw Normal View History

cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
/* SPDX-License-Identifier: GPL-2.0-only */
/* Copyright(c) 2020 Intel Corporation. */
#ifndef __CXL_H__
#define __CXL_H__
#include <linux/libnvdimm.h>
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#include <linux/bitfield.h>
#include <linux/notifier.h>
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#include <linux/bitops.h>
#include <linux/log2.h>
#include <linux/node.h>
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#include <linux/io.h>
extern const struct nvdimm_security_ops *cxl_security_ops;
/**
* DOC: cxl objects
*
* The CXL core objects like ports, decoders, and regions are shared
* between the subsystem drivers cxl_acpi, cxl_pci, and core drivers
* (port-driver, region-driver, nvdimm object-drivers... etc).
*/
/* CXL 2.0 8.2.4 CXL Component Register Layout and Definition */
#define CXL_COMPONENT_REG_BLOCK_SIZE SZ_64K
/* CXL 2.0 8.2.5 CXL.cache and CXL.mem Registers*/
#define CXL_CM_OFFSET 0x1000
#define CXL_CM_CAP_HDR_OFFSET 0x0
#define CXL_CM_CAP_HDR_ID_MASK GENMASK(15, 0)
#define CM_CAP_HDR_CAP_ID 1
#define CXL_CM_CAP_HDR_VERSION_MASK GENMASK(19, 16)
#define CM_CAP_HDR_CAP_VERSION 1
#define CXL_CM_CAP_HDR_CACHE_MEM_VERSION_MASK GENMASK(23, 20)
#define CM_CAP_HDR_CACHE_MEM_VERSION 1
#define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
#define CXL_CM_CAP_CAP_ID_RAS 0x2
#define CXL_CM_CAP_CAP_ID_HDM 0x5
#define CXL_CM_CAP_CAP_HDM_VERSION 1
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
#define CXL_HDM_DECODER_TARGET_COUNT_MASK GENMASK(7, 4)
#define CXL_HDM_DECODER_INTERLEAVE_11_8 BIT(8)
#define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
cxl/region: check interleave capability Since interleave capability is not verified, if the interleave capability of a target does not match the region need, committing decoder should have failed at the device end. In order to checkout this error as quickly as possible, driver needs to check the interleave capability of target during attaching it to region. Per CXL specification r3.1(8.2.4.20.1 CXL HDM Decoder Capability Register), bits 11 and 12 indicate the capability to establish interleaving in 3, 6, 12 and 16 ways. If these bits are not set, the target cannot be attached to a region utilizing such interleave ways. Additionally, bits 8 and 9 represent the capability of the bits used for interleaving in the address, Linux tracks this in the cxl_port interleave_mask. Per CXL specification r3.1(8.2.4.20.13 Decoder Protection): eIW means encoded Interleave Ways. eIG means encoded Interleave Granularity. in HPA: if eIW is 0 or 8 (interleave ways: 1, 3), all the bits of HPA are used, the interleave bits are none, the following check is ignored. if eIW is less than 8 (interleave ways: 2, 4, 8, 16), the interleave bits start at bit position eIG + 8 and end at eIG + eIW + 8 - 1. if eIW is greater than 8 (interleave ways: 6, 12), the interleave bits start at bit position eIG + 8 and end at eIG + eIW - 1. if the interleave mask is insufficient to cover the required interleave bits, the target cannot be attached to the region. Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Yao Xingtao <yaoxt.fnst@fujitsu.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20240614084755.59503-2-yaoxt.fnst@fujitsu.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-14 08:47:54 +00:00
#define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
#define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
#define CXL_HDM_DECODER_ENABLE BIT(1)
#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
#define CXL_HDM_DECODER0_BASE_HIGH_OFFSET(i) (0x20 * (i) + 0x14)
#define CXL_HDM_DECODER0_SIZE_LOW_OFFSET(i) (0x20 * (i) + 0x18)
#define CXL_HDM_DECODER0_SIZE_HIGH_OFFSET(i) (0x20 * (i) + 0x1c)
#define CXL_HDM_DECODER0_CTRL_OFFSET(i) (0x20 * (i) + 0x20)
#define CXL_HDM_DECODER0_CTRL_IG_MASK GENMASK(3, 0)
#define CXL_HDM_DECODER0_CTRL_IW_MASK GENMASK(7, 4)
#define CXL_HDM_DECODER0_CTRL_LOCK BIT(8)
#define CXL_HDM_DECODER0_CTRL_COMMIT BIT(9)
#define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
#define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
#define CXL_HDM_DECODER0_SKIP_HIGH(i) CXL_HDM_DECODER0_TL_HIGH(i)
/* HDM decoder control register constants CXL 3.0 8.2.5.19.7 */
#define CXL_DECODER_MIN_GRANULARITY 256
#define CXL_DECODER_MAX_ENCODED_IG 6
static inline int cxl_hdm_decoder_count(u32 cap_hdr)
{
int val = FIELD_GET(CXL_HDM_DECODER_COUNT_MASK, cap_hdr);
return val ? val * 2 : 1;
}
/* Encode defined in CXL 2.0 8.2.5.12.7 HDM Decoder Control Register */
static inline int eig_to_granularity(u16 eig, unsigned int *granularity)
{
if (eig > CXL_DECODER_MAX_ENCODED_IG)
return -EINVAL;
*granularity = CXL_DECODER_MIN_GRANULARITY << eig;
return 0;
}
/* Encode defined in CXL ECN "3, 6, 12 and 16-way memory Interleaving" */
static inline int eiw_to_ways(u8 eiw, unsigned int *ways)
{
switch (eiw) {
case 0 ... 4:
*ways = 1 << eiw;
break;
case 8 ... 10:
*ways = 3 << (eiw - 8);
break;
default:
return -EINVAL;
}
return 0;
}
static inline int granularity_to_eig(int granularity, u16 *eig)
{
if (granularity > SZ_16K || granularity < CXL_DECODER_MIN_GRANULARITY ||
!is_power_of_2(granularity))
return -EINVAL;
*eig = ilog2(granularity) - 8;
return 0;
}
static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
{
if (ways > 16)
return -EINVAL;
if (is_power_of_2(ways)) {
*eiw = ilog2(ways);
return 0;
}
if (ways % 3)
return -EINVAL;
ways /= 3;
if (!is_power_of_2(ways))
return -EINVAL;
*eiw = ilog2(ways) + 8;
return 0;
}
/* RAS Registers CXL 2.0 8.2.5.9 CXL RAS Capability Structure */
#define CXL_RAS_UNCORRECTABLE_STATUS_OFFSET 0x0
#define CXL_RAS_UNCORRECTABLE_STATUS_MASK (GENMASK(16, 14) | GENMASK(11, 0))
#define CXL_RAS_UNCORRECTABLE_MASK_OFFSET 0x4
#define CXL_RAS_UNCORRECTABLE_MASK_MASK (GENMASK(16, 14) | GENMASK(11, 0))
#define CXL_RAS_UNCORRECTABLE_MASK_F256B_MASK BIT(8)
#define CXL_RAS_UNCORRECTABLE_SEVERITY_OFFSET 0x8
#define CXL_RAS_UNCORRECTABLE_SEVERITY_MASK (GENMASK(16, 14) | GENMASK(11, 0))
#define CXL_RAS_CORRECTABLE_STATUS_OFFSET 0xC
#define CXL_RAS_CORRECTABLE_STATUS_MASK GENMASK(6, 0)
#define CXL_RAS_CORRECTABLE_MASK_OFFSET 0x10
#define CXL_RAS_CORRECTABLE_MASK_MASK GENMASK(6, 0)
#define CXL_RAS_CAP_CONTROL_OFFSET 0x14
#define CXL_RAS_CAP_CONTROL_FE_MASK GENMASK(5, 0)
#define CXL_RAS_HEADER_LOG_OFFSET 0x18
#define CXL_RAS_CAPABILITY_LENGTH 0x58
#define CXL_HEADERLOG_SIZE SZ_512
#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
#define CXLDEV_CAP_ARRAY_OFFSET 0x0
#define CXLDEV_CAP_ARRAY_CAP_ID 0
#define CXLDEV_CAP_ARRAY_ID_MASK GENMASK_ULL(15, 0)
#define CXLDEV_CAP_ARRAY_COUNT_MASK GENMASK_ULL(47, 32)
/* CXL 2.0 8.2.8.2 CXL Device Capability Header Register */
#define CXLDEV_CAP_HDR_CAP_ID_MASK GENMASK(15, 0)
/* CXL 2.0 8.2.8.2.1 CXL Device Capabilities */
#define CXLDEV_CAP_CAP_ID_DEVICE_STATUS 0x1
#define CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX 0x2
#define CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX 0x3
#define CXLDEV_CAP_CAP_ID_MEMDEV 0x4000
cxl/mem: Read, trace, and clear events on driver load CXL devices have multiple event logs which can be queried for CXL event records. Devices are required to support the storage of at least one event record in each event log type. Devices track event log overflow by incrementing a counter and tracking the time of the first and last overflow event seen. Software queries events via the Get Event Record mailbox command; CXL rev 3.0 section 8.2.9.2.2 and clears events via CXL rev 3.0 section 8.2.9.2.3 Clear Event Records mailbox command. If the result of negotiating CXL Error Reporting Control is OS control, read and clear all event logs on driver load. Ensure a clean slate of events by reading and clearing the events on driver load. The status register is not used because a device may continue to trigger events and the only requirement is to empty the log at least once. This allows for the required transition from empty to non-empty for interrupt generation. Handling of interrupts is in a follow on patch. The device can return up to 1MB worth of event records per query. Allocate a shared large buffer to handle the max number of records based on the mailbox payload size. This patch traces a raw event record and leaves specific event record type tracing to subsequent patches. Macros are created to aid in tracing the common CXL Event header fields. Each record is cleared explicitly. A clear all bit is specified but is only valid when the log overflows. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-1-2316a5c8f7d8@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-18 05:53:36 +00:00
/* CXL 3.0 8.2.8.3.1 Event Status Register */
#define CXLDEV_DEV_EVENT_STATUS_OFFSET 0x00
#define CXLDEV_EVENT_STATUS_INFO BIT(0)
#define CXLDEV_EVENT_STATUS_WARN BIT(1)
#define CXLDEV_EVENT_STATUS_FAIL BIT(2)
#define CXLDEV_EVENT_STATUS_FATAL BIT(3)
#define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO | \
CXLDEV_EVENT_STATUS_WARN | \
CXLDEV_EVENT_STATUS_FAIL | \
CXLDEV_EVENT_STATUS_FATAL)
cxl/mem: Wire up event interrupts Currently the only CXL features targeted for irq support require their message numbers to be within the first 16 entries. The device may however support less than 16 entries depending on the support it provides. Attempt to allocate these 16 irq vectors. If the device supports less then the PCI infrastructure will allocate that number. Upon successful allocation, users can plug in their respective isr at any point thereafter. CXL device events are signaled via interrupts. Each event log may have a different interrupt message number. These message numbers are reported in the Get Event Interrupt Policy mailbox command. Add interrupt support for event logs. Interrupts are allocated as shared interrupts. Therefore, all or some event logs can share the same message number. In addition all logs are queried on any interrupt in order of the most to least severe based on the status register. Finally place all event configuration logic into cxl_event_config(). Previously the logic was a simple 'read all' on start up. But interrupts must be configured prior to any reads to ensure no events are missed. A single event configuration function results in a cleaner over all implementation. Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20221216-cxl-ev-log-v7-2-2316a5c8f7d8@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-01-18 05:53:37 +00:00
/* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
#define CXLDEV_EVENT_INT_MODE_MASK GENMASK(1, 0)
#define CXLDEV_EVENT_INT_MSGNUM_MASK GENMASK(7, 4)
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
/* CXL 2.0 8.2.8.4 Mailbox Registers */
#define CXLDEV_MBOX_CAPS_OFFSET 0x00
#define CXLDEV_MBOX_CAP_PAYLOAD_SIZE_MASK GENMASK(4, 0)
#define CXLDEV_MBOX_CAP_BG_CMD_IRQ BIT(6)
#define CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK GENMASK(10, 7)
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#define CXLDEV_MBOX_CTRL_OFFSET 0x04
#define CXLDEV_MBOX_CTRL_DOORBELL BIT(0)
#define CXLDEV_MBOX_CTRL_BG_CMD_IRQ BIT(2)
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#define CXLDEV_MBOX_CMD_OFFSET 0x08
#define CXLDEV_MBOX_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
#define CXLDEV_MBOX_CMD_PAYLOAD_LENGTH_MASK GENMASK_ULL(36, 16)
#define CXLDEV_MBOX_STATUS_OFFSET 0x10
#define CXLDEV_MBOX_STATUS_BG_CMD BIT(0)
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#define CXLDEV_MBOX_STATUS_RET_CODE_MASK GENMASK_ULL(47, 32)
#define CXLDEV_MBOX_BG_CMD_STATUS_OFFSET 0x18
#define CXLDEV_MBOX_BG_CMD_COMMAND_OPCODE_MASK GENMASK_ULL(15, 0)
#define CXLDEV_MBOX_BG_CMD_COMMAND_PCT_MASK GENMASK_ULL(22, 16)
#define CXLDEV_MBOX_BG_CMD_COMMAND_RC_MASK GENMASK_ULL(47, 32)
#define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
cxl/mem: Introduce 'struct cxl_regs' for "composable" CXL devices CXL MMIO register blocks are organized by device type and capabilities. There are Component registers, Device registers (yes, an ambiguous name), and Memory Device registers (a specific extension of Device registers). It is possible for a given device instance (endpoint or port) to implement register sets from multiple of the above categories. The driver code that enumerates and maps the registers is type specific so it is useful to have a dedicated type and helpers for each block type. At the same time, once the registers are mapped the origin type does not matter. It is overly pedantic to reference the register block type in code that is using the registers. In preparation for the endpoint driver to incorporate Component registers into its MMIO operations reorganize the registers to allow typed enumeration + mapping, but anonymous usage. With the end state of 'struct cxl_regs' to be: struct cxl_regs { union { struct { CXL_DEVICE_REGS(); }; struct cxl_device_regs device_regs; }; union { struct { CXL_COMPONENT_REGS(); }; struct cxl_component_regs component_regs; }; }; With this arrangement the driver can share component init code with ports, but when using the registers it can directly reference the component register block type by name without the 'component_regs' prefix. So, map + enumerate can be shared across drivers of different CXL classes e.g.: void cxl_setup_device_regs(struct device *dev, void __iomem *base, struct cxl_device_regs *regs); void cxl_setup_component_regs(struct device *dev, void __iomem *base, struct cxl_component_regs *regs); ...while inline usage in the driver need not indicate where the registers came from: readl(cxlm->regs.mbox + MBOX_OFFSET); readl(cxlm->regs.hdm + HDM_OFFSET); ...instead of: readl(cxlm->regs.device_regs.mbox + MBOX_OFFSET); readl(cxlm->regs.component_regs.hdm + HDM_OFFSET); This complexity of the definition in .h yields improvement in code readability in .c while maintaining type-safety for organization of setup code. It prepares the implementation to maintain organization in the face of CXL devices that compose register interfaces consisting of multiple types. Given that this new container is named 'regs' rename the common register base pointer @base, and fixup the kernel-doc for the missing @cxlmd description. Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/162096971451.1865304.13540251513463515153.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-05-14 05:21:54 +00:00
/*
* Using struct_group() allows for per register-block-type helper routines,
* without requiring block-type agnostic code to include the prefix.
cxl/mem: Introduce 'struct cxl_regs' for "composable" CXL devices CXL MMIO register blocks are organized by device type and capabilities. There are Component registers, Device registers (yes, an ambiguous name), and Memory Device registers (a specific extension of Device registers). It is possible for a given device instance (endpoint or port) to implement register sets from multiple of the above categories. The driver code that enumerates and maps the registers is type specific so it is useful to have a dedicated type and helpers for each block type. At the same time, once the registers are mapped the origin type does not matter. It is overly pedantic to reference the register block type in code that is using the registers. In preparation for the endpoint driver to incorporate Component registers into its MMIO operations reorganize the registers to allow typed enumeration + mapping, but anonymous usage. With the end state of 'struct cxl_regs' to be: struct cxl_regs { union { struct { CXL_DEVICE_REGS(); }; struct cxl_device_regs device_regs; }; union { struct { CXL_COMPONENT_REGS(); }; struct cxl_component_regs component_regs; }; }; With this arrangement the driver can share component init code with ports, but when using the registers it can directly reference the component register block type by name without the 'component_regs' prefix. So, map + enumerate can be shared across drivers of different CXL classes e.g.: void cxl_setup_device_regs(struct device *dev, void __iomem *base, struct cxl_device_regs *regs); void cxl_setup_component_regs(struct device *dev, void __iomem *base, struct cxl_component_regs *regs); ...while inline usage in the driver need not indicate where the registers came from: readl(cxlm->regs.mbox + MBOX_OFFSET); readl(cxlm->regs.hdm + HDM_OFFSET); ...instead of: readl(cxlm->regs.device_regs.mbox + MBOX_OFFSET); readl(cxlm->regs.component_regs.hdm + HDM_OFFSET); This complexity of the definition in .h yields improvement in code readability in .c while maintaining type-safety for organization of setup code. It prepares the implementation to maintain organization in the face of CXL devices that compose register interfaces consisting of multiple types. Given that this new container is named 'regs' rename the common register base pointer @base, and fixup the kernel-doc for the missing @cxlmd description. Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/162096971451.1865304.13540251513463515153.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-05-14 05:21:54 +00:00
*/
struct cxl_regs {
/*
* Common set of CXL Component register block base pointers
* @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
* @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
*/
struct_group_tagged(cxl_component_regs, component,
void __iomem *hdm_decoder;
void __iomem *ras;
);
/*
* Common set of CXL Device register block base pointers
* @status: CXL 2.0 8.2.8.3 Device Status Registers
* @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
* @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
*/
struct_group_tagged(cxl_device_regs, device_regs,
void __iomem *status, *mbox, *memdev;
);
struct_group_tagged(cxl_pmu_regs, pmu_regs,
void __iomem *pmu;
);
/*
* RCH downstream port specific RAS register
* @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
*/
struct_group_tagged(cxl_rch_regs, rch_regs,
void __iomem *dport_aer;
);
cxl/mem: Introduce 'struct cxl_regs' for "composable" CXL devices CXL MMIO register blocks are organized by device type and capabilities. There are Component registers, Device registers (yes, an ambiguous name), and Memory Device registers (a specific extension of Device registers). It is possible for a given device instance (endpoint or port) to implement register sets from multiple of the above categories. The driver code that enumerates and maps the registers is type specific so it is useful to have a dedicated type and helpers for each block type. At the same time, once the registers are mapped the origin type does not matter. It is overly pedantic to reference the register block type in code that is using the registers. In preparation for the endpoint driver to incorporate Component registers into its MMIO operations reorganize the registers to allow typed enumeration + mapping, but anonymous usage. With the end state of 'struct cxl_regs' to be: struct cxl_regs { union { struct { CXL_DEVICE_REGS(); }; struct cxl_device_regs device_regs; }; union { struct { CXL_COMPONENT_REGS(); }; struct cxl_component_regs component_regs; }; }; With this arrangement the driver can share component init code with ports, but when using the registers it can directly reference the component register block type by name without the 'component_regs' prefix. So, map + enumerate can be shared across drivers of different CXL classes e.g.: void cxl_setup_device_regs(struct device *dev, void __iomem *base, struct cxl_device_regs *regs); void cxl_setup_component_regs(struct device *dev, void __iomem *base, struct cxl_component_regs *regs); ...while inline usage in the driver need not indicate where the registers came from: readl(cxlm->regs.mbox + MBOX_OFFSET); readl(cxlm->regs.hdm + HDM_OFFSET); ...instead of: readl(cxlm->regs.device_regs.mbox + MBOX_OFFSET); readl(cxlm->regs.component_regs.hdm + HDM_OFFSET); This complexity of the definition in .h yields improvement in code readability in .c while maintaining type-safety for organization of setup code. It prepares the implementation to maintain organization in the face of CXL devices that compose register interfaces consisting of multiple types. Given that this new container is named 'regs' rename the common register base pointer @base, and fixup the kernel-doc for the missing @cxlmd description. Reviewed-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/162096971451.1865304.13540251513463515153.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-05-14 05:21:54 +00:00
};
struct cxl_reg_map {
bool valid;
int id;
unsigned long offset;
unsigned long size;
};
struct cxl_component_reg_map {
struct cxl_reg_map hdm_decoder;
struct cxl_reg_map ras;
};
struct cxl_device_reg_map {
struct cxl_reg_map status;
struct cxl_reg_map mbox;
struct cxl_reg_map memdev;
};
struct cxl_pmu_reg_map {
struct cxl_reg_map pmu;
};
/**
* struct cxl_register_map - DVSEC harvested register block mapping parameters
* @host: device for devm operations and logging
* @base: virtual base of the register-block-BAR + @block_offset
* @resource: physical resource base of the register block
* @max_size: maximum mapping size to perform register search
* @reg_type: see enum cxl_regloc_type
* @component_map: cxl_reg_map for component registers
* @device_map: cxl_reg_maps for device registers
* @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
*/
struct cxl_register_map {
struct device *host;
void __iomem *base;
resource_size_t resource;
resource_size_t max_size;
u8 reg_type;
union {
struct cxl_component_reg_map component_map;
struct cxl_device_reg_map device_map;
struct cxl_pmu_reg_map pmu_map;
};
};
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
struct cxl_device_reg_map *map);
int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
int cxl_map_device_regs(const struct cxl_register_map *map,
struct cxl_device_regs *regs);
int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
enum cxl_regloc_type;
int cxl_count_regblock(struct pci_dev *pdev, enum cxl_regloc_type type);
int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map, int index);
int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map);
int cxl_setup_regs(struct cxl_register_map *map);
2023-06-25 18:35:20 +00:00
struct cxl_dport;
resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
struct cxl_dport *dport);
2022-12-03 08:40:29 +00:00
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
#define CXL_TARGET_STRLEN 20
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
/*
* cxl_decoder flags that define the type of memory / devices this
* decoder supports as well as configuration lock status See "CXL 2.0
* 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
* Additionally indicate whether decoder settings were autodetected,
* user customized.
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
*/
#define CXL_DECODER_F_RAM BIT(0)
#define CXL_DECODER_F_PMEM BIT(1)
#define CXL_DECODER_F_TYPE2 BIT(2)
#define CXL_DECODER_F_TYPE3 BIT(3)
#define CXL_DECODER_F_LOCK BIT(4)
#define CXL_DECODER_F_ENABLE BIT(5)
#define CXL_DECODER_F_MASK GENMASK(5, 0)
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
enum cxl_decoder_type {
CXL_DECODER_DEVMEM = 2,
CXL_DECODER_HOSTONLYMEM = 3,
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
};
/*
* Current specification goes up to 8, double that seems a reasonable
* software max for the foreseeable future
*/
#define CXL_DECODER_MAX_INTERLEAVE 16
#define CXL_QOS_CLASS_INVALID -1
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
/**
* struct cxl_decoder - Common CXL HDM Decoder Attributes
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
* @dev: this decoder's device
* @id: kernel device name id
* @hpa_range: Host physical address range mapped by this decoder
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
* @interleave_ways: number of cxl_dports in this decode
* @interleave_granularity: data stride per dport
* @target_type: accelerator vs expander (type2 vs type3) selector
* @region: currently assigned region for this decoder
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
* @flags: memory type capabilities and locking
* @commit: device/decoder-type specific callback to commit settings to hw
* @reset: device/decoder-type specific callback to reset hw settings
*/
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct cxl_decoder {
struct device dev;
int id;
struct range hpa_range;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
int interleave_ways;
int interleave_granularity;
enum cxl_decoder_type target_type;
struct cxl_region *region;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
unsigned long flags;
int (*commit)(struct cxl_decoder *cxld);
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-23 01:43:49 +00:00
void (*reset)(struct cxl_decoder *cxld);
};
/*
* CXL_DECODER_DEAD prevents endpoints from being reattached to regions
* while cxld_unregister() is running
*/
enum cxl_decoder_mode {
CXL_DECODER_NONE,
CXL_DECODER_RAM,
CXL_DECODER_PMEM,
CXL_DECODER_MIXED,
CXL_DECODER_DEAD,
};
static inline const char *cxl_decoder_mode_name(enum cxl_decoder_mode mode)
{
static const char * const names[] = {
[CXL_DECODER_NONE] = "none",
[CXL_DECODER_RAM] = "ram",
[CXL_DECODER_PMEM] = "pmem",
[CXL_DECODER_MIXED] = "mixed",
};
if (mode >= CXL_DECODER_NONE && mode <= CXL_DECODER_MIXED)
return names[mode];
return "mixed";
}
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
/*
* Track whether this decoder is reserved for region autodiscovery, or
* free for userspace provisioning.
*/
enum cxl_decoder_state {
CXL_DECODER_STATE_MANUAL,
CXL_DECODER_STATE_AUTO,
};
/**
* struct cxl_endpoint_decoder - Endpoint / SPA to DPA decoder
* @cxld: base cxl_decoder_object
* @dpa_res: actively claimed DPA span of this decoder
* @skip: offset into @dpa_res where @cxld.hpa_range maps
* @mode: which memory type / access-mode-partition this decoder targets
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
* @state: autodiscovery state
* @pos: interleave position in @cxld.region
*/
struct cxl_endpoint_decoder {
struct cxl_decoder cxld;
struct resource *dpa_res;
resource_size_t skip;
enum cxl_decoder_mode mode;
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
enum cxl_decoder_state state;
int pos;
};
/**
* struct cxl_switch_decoder - Switch specific CXL HDM Decoder
* @cxld: base cxl_decoder object
* @nr_targets: number of elements in @target
* @target: active ordered target list in current decoder configuration
*
* The 'switch' decoder type represents the decoder instances of cxl_port's that
* route from the root of a CXL memory decode topology to the endpoints. They
* come in two flavors, root-level decoders, statically defined by platform
* firmware, and mid-level decoders, where interleave-granularity,
* interleave-width, and the target list are mutable.
*/
struct cxl_switch_decoder {
struct cxl_decoder cxld;
int nr_targets;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct cxl_dport *target[];
};
struct cxl_root_decoder;
cxl: Restore XOR'd position bits during address translation When a device reports a DPA in events like poison, general_media, and dram, the driver translates that DPA back to an HPA. Presently, the CXL driver translation only considers the Modulo position and will report the wrong HPA for XOR configured root decoders. Add a helper function that restores the XOR'd bits during DPA->HPA address translation. Plumb a root decoder callback to the new helper when XOR interleave arithmetic is in use. For Modulo arithmetic, just let the callback be NULL - as in no extra work required. Upon completion of a DPA->HPA translation a couple of checks are performed on the result. One simply confirms that the calculated HPA is within the address range of the region. That test is useful for both Modulo and XOR interleave arithmetic decodes. A second check confirms that the HPA is within an expected chunk based on the endpoints position in the region and the region granularity. An XOR decode disrupts the Modulo pattern making the chunk check useless. To align the checks with the proper decode, pull the region range check inline and use the helper to do the chunk check for Modulo decodes only. A cxl-test unit test is posted for upstream review here: https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/ Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-03 05:29:50 +00:00
typedef u64 (*cxl_hpa_to_spa_fn)(struct cxl_root_decoder *cxlrd, u64 hpa);
/**
* struct cxl_root_decoder - Static platform CXL address decoder
* @res: host / parent resource for region allocations
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
* @region_id: region id for next region provisioning event
cxl: Restore XOR'd position bits during address translation When a device reports a DPA in events like poison, general_media, and dram, the driver translates that DPA back to an HPA. Presently, the CXL driver translation only considers the Modulo position and will report the wrong HPA for XOR configured root decoders. Add a helper function that restores the XOR'd bits during DPA->HPA address translation. Plumb a root decoder callback to the new helper when XOR interleave arithmetic is in use. For Modulo arithmetic, just let the callback be NULL - as in no extra work required. Upon completion of a DPA->HPA translation a couple of checks are performed on the result. One simply confirms that the calculated HPA is within the address range of the region. That test is useful for both Modulo and XOR interleave arithmetic decodes. A second check confirms that the HPA is within an expected chunk based on the endpoints position in the region and the region granularity. An XOR decode disrupts the Modulo pattern making the chunk check useless. To align the checks with the proper decode, pull the region range check inline and use the helper to do the chunk check for Modulo decodes only. A cxl-test unit test is posted for upstream review here: https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/ Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-03 05:29:50 +00:00
* @hpa_to_spa: translate CXL host-physical-address to Platform system-physical-address
* @platform_data: platform specific configuration data
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
* @range_lock: sync region autodiscovery by address range
* @qos_class: QoS performance class cookie
* @cxlsd: base cxl switch decoder
*/
struct cxl_root_decoder {
struct resource *res;
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
atomic_t region_id;
cxl: Restore XOR'd position bits during address translation When a device reports a DPA in events like poison, general_media, and dram, the driver translates that DPA back to an HPA. Presently, the CXL driver translation only considers the Modulo position and will report the wrong HPA for XOR configured root decoders. Add a helper function that restores the XOR'd bits during DPA->HPA address translation. Plumb a root decoder callback to the new helper when XOR interleave arithmetic is in use. For Modulo arithmetic, just let the callback be NULL - as in no extra work required. Upon completion of a DPA->HPA translation a couple of checks are performed on the result. One simply confirms that the calculated HPA is within the address range of the region. That test is useful for both Modulo and XOR interleave arithmetic decodes. A second check confirms that the HPA is within an expected chunk based on the endpoints position in the region and the region granularity. An XOR decode disrupts the Modulo pattern making the chunk check useless. To align the checks with the proper decode, pull the region range check inline and use the helper to do the chunk check for Modulo decodes only. A cxl-test unit test is posted for upstream review here: https://lore.kernel.org/20240624210644.495563-1-alison.schofield@intel.com/ Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Diego Garcia Rodriguez <diego.garcia.rodriguez@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/1a1ac880d9f889bd6384e657e810431b9a0a72e5.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-07-03 05:29:50 +00:00
cxl_hpa_to_spa_fn hpa_to_spa;
void *platform_data;
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
struct mutex range_lock;
int qos_class;
struct cxl_switch_decoder cxlsd;
};
/*
* enum cxl_config_state - State machine for region configuration
* @CXL_CONFIG_IDLE: Any sysfs attribute can be written freely
* @CXL_CONFIG_INTERLEAVE_ACTIVE: region size has been set, no more
* changes to interleave_ways or interleave_granularity
* @CXL_CONFIG_ACTIVE: All targets have been added the region is now
* active
* @CXL_CONFIG_RESET_PENDING: see commit_store()
* @CXL_CONFIG_COMMIT: Soft-config has been committed to hardware
*/
enum cxl_config_state {
CXL_CONFIG_IDLE,
CXL_CONFIG_INTERLEAVE_ACTIVE,
CXL_CONFIG_ACTIVE,
CXL_CONFIG_RESET_PENDING,
CXL_CONFIG_COMMIT,
};
/**
* struct cxl_region_params - region settings
* @state: allow the driver to lockdown further parameter changes
* @uuid: unique id for persistent regions
* @interleave_ways: number of endpoints in the region
* @interleave_granularity: capacity each endpoint contributes to a stripe
* @res: allocated iomem capacity for this region
* @targets: active ordered targets in current decoder configuration
* @nr_targets: number of targets
*
* State transitions are protected by the cxl_region_rwsem
*/
struct cxl_region_params {
enum cxl_config_state state;
uuid_t uuid;
int interleave_ways;
int interleave_granularity;
struct resource *res;
struct cxl_endpoint_decoder *targets[CXL_DECODER_MAX_INTERLEAVE];
int nr_targets;
};
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
* detection from being added to the region.
*/
cxl/region: Move cache invalidation before region teardown, and before setup Vikram raised a concern with the theoretical case of a CPU sending MemClnEvict to a device that is not prepared to receive. MemClnEvict is a message that is sent after a CPU has taken ownership of a cacheline from accelerator memory (HDM-DB). In the case of hotplug or HDM decoder reconfiguration it is possible that the CPU is holding old contents for a new device that has taken over the physical address range being cached by the CPU. To avoid this scenario, invalidate caches prior to tearing down an HDM decoder configuration. Now, this poses another problem that it is possible for something to speculate into that space while the decode configuration is still up, so to close that gap also invalidate prior to establish new contents behind a given physical address range. With this change the cache invalidation is now explicit and need not be checked in cxl_region_probe(), and that obviates the need for CXL_REGION_F_INCOHERENT. Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Fixes: d18bc74aced6 ("cxl/region: Manage CPU caches relative to DPA invalidation events") Reported-by: Vikram Sethi <vsethi@nvidia.com> Closes: http://lore.kernel.org/r/BYAPR12MB33364B5EB908BF7239BB996BBD53A@BYAPR12MB3336.namprd12.prod.outlook.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168696506886.3590522.4597053660991916591.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-06-17 01:24:28 +00:00
#define CXL_REGION_F_AUTO 0
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
/*
* Require that a committed region successfully complete a teardown once
* any of its associated decoders have been torn down. This maintains
* the commit state for the region since there are committed decoders,
* but blocks cxl_region_probe().
*/
#define CXL_REGION_F_NEEDS_RESET 1
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
/**
* struct cxl_region - CXL region
* @dev: This region's device
* @id: This region's id. Id is globally unique across all regions
* @mode: Endpoint decoder allocation / access mode
* @type: Endpoint decoder target type
cxl/pmem: Refactor nvdimm device registration, delete the workqueue The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and 'struct cxl_pmem_region' manage CXL persistent memory resources. The bridge represents base platform resources, the nvdimm represents one or more endpoints, and the region is a collection of nvdimms that contribute to an assembled address range. Their relationship is such that a region is torn down if any component endpoints are removed. All regions and endpoints are torn down if the foundational bridge device goes down. A workqueue was deployed to manage these interdependencies, but it is difficult to reason about, and fragile. A recent attempt to take the CXL root device lock in the cxl_mem driver was reported by lockdep as colliding with the flush_work() in the cxl_pmem flows. Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn down immediately and hierarchically. A similar change is made to both the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both changes are made in the same patch which unfortunately makes the patch bigger than desired. Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and cxl_pmem_region as a devres release action of the bridge device. Additionally, include a devres release action of the cxl_memdev or cxl_region device that triggers the bridge's release action if an endpoint exits before the bridge. I.e. this allows either unplugging the bridge, or unplugging and endpoint to result in the same cleanup actions. To keep the patch smaller the cleanup of the now defunct workqueue infrastructure is saved for a follow-on patch. Tested-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-01 21:33:37 +00:00
* @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
* @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
cxl/region: Manage CPU caches relative to DPA invalidation events A "DPA invalidation event" is any scenario where the contents of a DPA (Device Physical Address) is modified in a way that is incoherent with CPU caches, or if the HPA (Host Physical Address) to DPA association changes due to a remapping event. PMEM security events like Unlock and Passphrase Secure Erase already manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events that need cache management by the CXL core. Those only happen when the boot time CXL configuration has changed. That event occurs when userspace attaches an endpoint decoder to a region configuration, and that region is subsequently activated. The implications of not invalidating caches between remap events is that reads from the region at different points in time may return different results due to stale cached data from the previous HPA to DPA mapping. Without a guarantee that the region contents after cxl_region_probe() are written before being read (a layering-violation assumption that cxl_region_probe() can not make) the CXL subsystem needs to ensure that reads that precede writes see consistent results. A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug and unit testing of the CXL implementation in QEMU or other environments where cpu_cache_has_invalidate_memregion() returns false. This may prove too restrictive for QEMU where the HDM decoders are emulated, but in that case the CXL subsystem needs some new mechanism / indication that the HDM decoder is emulated and not a passthrough of real hardware. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166993222098.1995348.16604163596374520890.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-01 22:03:41 +00:00
* @flags: Region state flags
* @params: active + config params for the region
* @coord: QoS access coordinates for the region
* @memory_notifier: notifier for setting the access coordinates to node
* @adist_notifier: notifier for calculating the abstract distance of node
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
*/
struct cxl_region {
struct device dev;
int id;
enum cxl_decoder_mode mode;
enum cxl_decoder_type type;
cxl/pmem: Refactor nvdimm device registration, delete the workqueue The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and 'struct cxl_pmem_region' manage CXL persistent memory resources. The bridge represents base platform resources, the nvdimm represents one or more endpoints, and the region is a collection of nvdimms that contribute to an assembled address range. Their relationship is such that a region is torn down if any component endpoints are removed. All regions and endpoints are torn down if the foundational bridge device goes down. A workqueue was deployed to manage these interdependencies, but it is difficult to reason about, and fragile. A recent attempt to take the CXL root device lock in the cxl_mem driver was reported by lockdep as colliding with the flush_work() in the cxl_pmem flows. Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn down immediately and hierarchically. A similar change is made to both the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both changes are made in the same patch which unfortunately makes the patch bigger than desired. Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and cxl_pmem_region as a devres release action of the bridge device. Additionally, include a devres release action of the cxl_memdev or cxl_region device that triggers the bridge's release action if an endpoint exits before the bridge. I.e. this allows either unplugging the bridge, or unplugging and endpoint to result in the same cleanup actions. To keep the patch smaller the cleanup of the now defunct workqueue infrastructure is saved for a follow-on patch. Tested-by: Robert Richter <rrichter@amd.com> Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-01 21:33:37 +00:00
struct cxl_nvdimm_bridge *cxl_nvb;
struct cxl_pmem_region *cxlr_pmem;
cxl/region: Manage CPU caches relative to DPA invalidation events A "DPA invalidation event" is any scenario where the contents of a DPA (Device Physical Address) is modified in a way that is incoherent with CPU caches, or if the HPA (Host Physical Address) to DPA association changes due to a remapping event. PMEM security events like Unlock and Passphrase Secure Erase already manage caches through LIBNVDIMM, so that leaves HPA to DPA remap events that need cache management by the CXL core. Those only happen when the boot time CXL configuration has changed. That event occurs when userspace attaches an endpoint decoder to a region configuration, and that region is subsequently activated. The implications of not invalidating caches between remap events is that reads from the region at different points in time may return different results due to stale cached data from the previous HPA to DPA mapping. Without a guarantee that the region contents after cxl_region_probe() are written before being read (a layering-violation assumption that cxl_region_probe() can not make) the CXL subsystem needs to ensure that reads that precede writes see consistent results. A CONFIG_CXL_REGION_INVALIDATION_TEST option is added to support debug and unit testing of the CXL implementation in QEMU or other environments where cpu_cache_has_invalidate_memregion() returns false. This may prove too restrictive for QEMU where the HDM decoders are emulated, but in that case the CXL subsystem needs some new mechanism / indication that the HDM decoder is emulated and not a passthrough of real hardware. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/166993222098.1995348.16604163596374520890.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-01 22:03:41 +00:00
unsigned long flags;
struct cxl_region_params params;
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
struct notifier_block memory_notifier;
struct notifier_block adist_notifier;
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
};
struct cxl_nvdimm_bridge {
int id;
struct device dev;
struct cxl_port *port;
struct nvdimm_bus *nvdimm_bus;
struct nvdimm_bus_descriptor nd_desc;
};
#define CXL_DEV_ID_LEN 19
struct cxl_nvdimm {
struct device dev;
struct cxl_memdev *cxlmd;
u8 dev_id[CXL_DEV_ID_LEN]; /* for nvdimm, string of 'serial' */
};
struct cxl_pmem_region_mapping {
struct cxl_memdev *cxlmd;
struct cxl_nvdimm *cxl_nvd;
u64 start;
u64 size;
int position;
};
struct cxl_pmem_region {
struct device dev;
struct cxl_region *cxlr;
struct nd_region *nd_region;
struct range hpa_range;
int nr_mappings;
struct cxl_pmem_region_mapping mapping[];
};
struct cxl_dax_region {
struct device dev;
struct cxl_region *cxlr;
struct range hpa_range;
};
/**
* struct cxl_port - logical collection of upstream port devices and
* downstream port devices to construct a CXL memory
* decode hierarchy.
* @dev: this port's device
* @uport_dev: PCI or platform device implementing the upstream port capability
* @host_bridge: Shortcut to the platform attach point for this port
* @id: id for port device-name
* @dports: cxl_dport instances referenced by decoders
* @endpoints: cxl_ep instances, endpoints that are a descendant of this port
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
* @regions: cxl_region_ref instances, regions mapped by this port
* @parent_dport: dport that points to this port in the parent
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
cxl/region: Fix 'distance' calculation with passthrough ports When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04 00:30:54 +00:00
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
* @dead: last ep has been removed, force port re-creation
* @depth: How deep this port is relative to the root. depth 0 is the root.
cxl/port: Read CDAT table The per-device CDAT data provides performance data that is relevant for mapping which CXL devices can participate in which CXL ranges by QTG (QoS Throttling Group) (per ECN: CXL 2.0 CEDT CFMWS & QTG_DSM) [1]. The QTG association specified in the ECN is advisory. Until the cxl_acpi driver grows support for invoking the QTG _DSM method the CDAT data is only of interest to userspace that may need it for debug purposes. Search the DOE mailboxes available, query CDAT data, cache the data and make it available via a sysfs binary attribute per endpoint at: /sys/bus/cxl/devices/endpointX/CDAT ...similar to other ACPI-structured table data in /sys/firmware/ACPI/tables. The CDAT is relative to 'struct cxl_port' objects since switches in addition to endpoints can host a CDAT instance. Switch CDAT support is not implemented. This does not support table updates at runtime. It will always provide whatever was there when first cached. It is also the case that table updates are not expected outside of explicit DPA address map affecting commands like Set Partition with the immediate flag set. Given that the driver does not support Set Partition with the immediate flag set there is no current need for update support. Link: https://www.computeexpresslink.org/spec-landing [1] Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> [djbw: drop in-kernel parsing infra for now, and other minor fixups] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220719205249.566684-7-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-19 20:52:49 +00:00
* @cdat: Cached CDAT data
* @cdat_available: Should a CDAT attribute be available in sysfs
* @pci_latency: Upstream latency in picoseconds
*/
struct cxl_port {
struct device dev;
struct device *uport_dev;
struct device *host_bridge;
int id;
struct xarray dports;
struct xarray endpoints;
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
struct xarray regions;
struct cxl_dport *parent_dport;
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct ida decoder_ida;
struct cxl_register_map reg_map;
cxl/region: Fix 'distance' calculation with passthrough ports When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible. The @distance for the host-bridge's cxl_port in a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1, i.e. An x2 region divided by 2 dports to reach 2 region targets. An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices). However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port. Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance. Cc: <stable@vger.kernel.org> Reported-by: Bobo WL <lmw.bobo@gmail.com> Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/166752185440.947915.6617495912508299445.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-04 00:30:54 +00:00
int nr_dports;
int hdm_end;
int commit_end;
bool dead;
unsigned int depth;
cxl/port: Read CDAT table The per-device CDAT data provides performance data that is relevant for mapping which CXL devices can participate in which CXL ranges by QTG (QoS Throttling Group) (per ECN: CXL 2.0 CEDT CFMWS & QTG_DSM) [1]. The QTG association specified in the ECN is advisory. Until the cxl_acpi driver grows support for invoking the QTG _DSM method the CDAT data is only of interest to userspace that may need it for debug purposes. Search the DOE mailboxes available, query CDAT data, cache the data and make it available via a sysfs binary attribute per endpoint at: /sys/bus/cxl/devices/endpointX/CDAT ...similar to other ACPI-structured table data in /sys/firmware/ACPI/tables. The CDAT is relative to 'struct cxl_port' objects since switches in addition to endpoints can host a CDAT instance. Switch CDAT support is not implemented. This does not support table updates at runtime. It will always provide whatever was there when first cached. It is also the case that table updates are not expected outside of explicit DPA address map affecting commands like Set Partition with the immediate flag set. Given that the driver does not support Set Partition with the immediate flag set there is no current need for update support. Link: https://www.computeexpresslink.org/spec-landing [1] Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> [djbw: drop in-kernel parsing infra for now, and other minor fixups] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220719205249.566684-7-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-07-19 20:52:49 +00:00
struct cxl_cdat {
void *table;
size_t length;
} cdat;
bool cdat_available;
long pci_latency;
};
/**
* struct cxl_root - logical collection of root cxl_port items
*
* @port: cxl_port member
* @ops: cxl root operations
*/
struct cxl_root {
struct cxl_port port;
const struct cxl_root_ops *ops;
};
static inline struct cxl_root *
to_cxl_root(const struct cxl_port *port)
{
return container_of(port, struct cxl_root, port);
}
struct cxl_root_ops {
int (*qos_class)(struct cxl_root *cxl_root,
struct access_coordinate *coord, int entries,
int *qos_class);
};
static inline struct cxl_dport *
cxl_find_dport_by_dev(struct cxl_port *port, const struct device *dport_dev)
{
return xa_load(&port->dports, (unsigned long)dport_dev);
}
struct cxl_rcrb_info {
resource_size_t base;
u16 aer_cap;
};
/**
* struct cxl_dport - CXL downstream port
* @dport_dev: PCI bridge or firmware device representing the downstream link
* @reg_map: component and ras register mapping parameters
* @port_id: unique hardware identifier for dport in decoder target list
* @rcrb: Data about the Root Complex Register Block layout
2022-12-03 08:40:29 +00:00
* @rch: Indicate whether this dport was enumerated in RCH or VH mode
* @port: reference to cxl_port that contains this downstream port
* @regs: Dport parsed register blocks
* @coord: access coordinates (bandwidth and latency performance attributes)
* @link_latency: calculated PCIe downstream latency
*/
struct cxl_dport {
struct device *dport_dev;
struct cxl_register_map reg_map;
int port_id;
struct cxl_rcrb_info rcrb;
2022-12-03 08:40:29 +00:00
bool rch;
struct cxl_port *port;
struct cxl_regs regs;
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
long link_latency;
};
/**
* struct cxl_ep - track an endpoint's interest in a port
* @ep: device that hosts a generic CXL endpoint (expander or accelerator)
* @dport: which dport routes to this endpoint on @port
* @next: cxl switch port across the link attached to @dport NULL if
* attached to an endpoint
*/
struct cxl_ep {
struct device *ep;
struct cxl_dport *dport;
struct cxl_port *next;
};
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
/**
* struct cxl_region_ref - track a region's interest in a port
* @port: point in topology to install this reference
* @decoder: decoder assigned for @region in @port
* @region: region for this reference
* @endpoints: cxl_ep references for region members beneath @port
* @nr_targets_set: track how many targets have been programmed during setup
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
* @nr_eps: number of endpoints beneath @port
* @nr_targets: number of distinct targets needed to reach @nr_eps
*/
struct cxl_region_ref {
struct cxl_port *port;
struct cxl_decoder *decoder;
struct cxl_region *region;
struct xarray endpoints;
int nr_targets_set;
cxl/region: Attach endpoint decoders CXL regions (interleave sets) are made up of a set of memory devices where each device maps a portion of the interleave with one of its decoders (see CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure). As endpoint decoders are identified by a provisioning tool they can be added to a region provided the region interleave properties are set (way, granularity, HPA) and DPA has been assigned to the decoder. The attach event triggers several validation checks, for example: - is the DPA sized appropriately for the region - is the decoder reachable via the host-bridges identified by the region's root decoder - is the device already active in a different region position slot - are there already regions with a higher HPA active on a given port (per CXL 2.0 8.2.5.12.20 Committing Decoder Programming) ...and the attach event affords an opportunity to collect data and resources relevant to later programming the target lists in switch decoders, for example: - allocate a decoder at each cxl_port in the decode chain - for a given switch port, how many the region's endpoints are hosted through the port - how many unique targets (next hops) does a port need to map to reach those endpoints The act of reconciling this information and deploying it to the decoder configuration is saved for a follow-on patch. Co-developed-by: Ben Widawsky <bwidawsk@kernel.org> Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784337277.1758207.4108508181328528703.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-06-07 17:56:10 +00:00
int nr_eps;
int nr_targets;
};
/*
* The platform firmware device hosting the root is also the top of the
* CXL port topology. All other CXL ports have another CXL port as their
* parent and their ->uport_dev / host device is out-of-line of the port
* ancestry.
*/
static inline bool is_cxl_root(struct cxl_port *port)
{
return port->uport_dev == port->dev.parent;
}
int cxl_num_decoders_committed(struct cxl_port *port);
bool is_cxl_port(const struct device *dev);
struct cxl_port *to_cxl_port(const struct device *dev);
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com>
2024-10-23 01:43:49 +00:00
void cxl_port_commit_reap(struct cxl_decoder *cxld);
struct pci_bus;
int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
struct pci_bus *bus);
struct pci_bus *cxl_port_to_pci_bus(struct cxl_port *port);
struct cxl_port *devm_cxl_add_port(struct device *host,
struct device *uport_dev,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport);
struct cxl_root *devm_cxl_add_root(struct device *host,
const struct cxl_root_ops *ops);
struct cxl_root *find_cxl_root(struct cxl_port *port);
void put_cxl_root(struct cxl_root *cxl_root);
DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))
DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
void cxl_bus_rescan(void);
void cxl_bus_drain(void);
struct cxl_port *cxl_pci_find_port(struct pci_dev *pdev,
struct cxl_dport **dport);
struct cxl_port *cxl_mem_find_port(struct cxl_memdev *cxlmd,
struct cxl_dport **dport);
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
bool schedule_cxl_memdev_detach(struct cxl_memdev *cxlmd);
struct cxl_dport *devm_cxl_add_dport(struct cxl_port *port,
struct device *dport, int port_id,
resource_size_t component_reg_phys);
2022-12-03 08:40:29 +00:00
struct cxl_dport *devm_cxl_add_rch_dport(struct cxl_port *port,
struct device *dport_dev, int port_id,
resource_size_t rcrb);
cxl/pci: Add RCH downstream port AER register discovery Restricted CXL host (RCH) downstream port AER information is not currently logged while in the error state. One problem preventing the error logging is the AER and RAS registers are not accessible. The CXL driver requires changes to find RCH downstream port AER and RAS registers for purpose of error logging. RCH downstream ports are not enumerated during a PCI bus scan and are instead discovered using system firmware, ACPI in this case.[1] The downstream port is implemented as a Root Complex Register Block (RCRB). The RCRB is a 4k memory block containing PCIe registers based on the PCIe root port.[2] The RCRB includes AER extended capability registers used for reporting errors. Note, the RCH's AER Capability is located in the RCRB memory space instead of PCI configuration space, thus its register access is different. Existing kernel PCIe AER functions can not be used to manage the downstream port AER capabilities and RAS registers because the port was not enumerated during PCI scan and the registers are not PCI config accessible. Discover RCH downstream port AER extended capability registers. Use MMIO accesses to search for extended AER capability in RCRB register space. [1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy [2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 22:08:06 +00:00
#ifdef CONFIG_PCIEAER_CXL
void cxl_setup_parent_dport(struct device *host, struct cxl_dport *dport);
void cxl_dport_init_ras_reporting(struct cxl_dport *dport, struct device *host);
cxl/pci: Add RCH downstream port AER register discovery Restricted CXL host (RCH) downstream port AER information is not currently logged while in the error state. One problem preventing the error logging is the AER and RAS registers are not accessible. The CXL driver requires changes to find RCH downstream port AER and RAS registers for purpose of error logging. RCH downstream ports are not enumerated during a PCI bus scan and are instead discovered using system firmware, ACPI in this case.[1] The downstream port is implemented as a Root Complex Register Block (RCRB). The RCRB is a 4k memory block containing PCIe registers based on the PCIe root port.[2] The RCRB includes AER extended capability registers used for reporting errors. Note, the RCH's AER Capability is located in the RCRB memory space instead of PCI configuration space, thus its register access is different. Existing kernel PCIe AER functions can not be used to manage the downstream port AER capabilities and RAS registers because the port was not enumerated during PCI scan and the registers are not PCI config accessible. Discover RCH downstream port AER extended capability registers. Use MMIO accesses to search for extended AER capability in RCRB register space. [1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy [2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 22:08:06 +00:00
#else
static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
struct device *host) { }
cxl/pci: Add RCH downstream port AER register discovery Restricted CXL host (RCH) downstream port AER information is not currently logged while in the error state. One problem preventing the error logging is the AER and RAS registers are not accessible. The CXL driver requires changes to find RCH downstream port AER and RAS registers for purpose of error logging. RCH downstream ports are not enumerated during a PCI bus scan and are instead discovered using system firmware, ACPI in this case.[1] The downstream port is implemented as a Root Complex Register Block (RCRB). The RCRB is a 4k memory block containing PCIe registers based on the PCIe root port.[2] The RCRB includes AER extended capability registers used for reporting errors. Note, the RCH's AER Capability is located in the RCRB memory space instead of PCI configuration space, thus its register access is different. Existing kernel PCIe AER functions can not be used to manage the downstream port AER capabilities and RAS registers because the port was not enumerated during PCI scan and the registers are not PCI config accessible. Discover RCH downstream port AER extended capability registers. Use MMIO accesses to search for extended AER capability in RCRB register space. [1] CXL 3.0 Spec, 9.11.2 - System Firmware View of CXL 1.1 Hierarchy [2] CXL 3.0 Spec, 8.2.1.1 - RCH Downstream Port RCRB Co-developed-by: Robert Richter <rrichter@amd.com> Signed-off-by: Terry Bowman <terry.bowman@amd.com> Signed-off-by: Robert Richter <rrichter@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20231018171713.1883517-12-rrichter@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-10-27 22:08:06 +00:00
#endif
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
struct cxl_decoder *to_cxl_decoder(struct device *dev);
struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
bool is_root_decoder(struct device *dev);
bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets);
struct cxl_switch_decoder *cxl_switch_decoder_alloc(struct cxl_port *port,
unsigned int nr_targets);
int cxl_decoder_add(struct cxl_decoder *cxld, int *target_map);
struct cxl_endpoint_decoder *cxl_endpoint_decoder_alloc(struct cxl_port *port);
int cxl_decoder_add_locked(struct cxl_decoder *cxld, int *target_map);
int cxl_decoder_autoremove(struct device *host, struct cxl_decoder *cxld);
cxl/acpi: Cleanup __cxl_parse_cfmws() As a follow on to the recent rework of __cxl_parse_cfmws() to always return errors [1], use cleanup.h helpers to remove goto and other cleanups now that logging is moved to the cxl_parse_cfmws() wrapper. This ends up adding more code than it deletes, but __cxl_parse_cfmws() itself does get smaller. The takeaway from the cond_no_free_ptr() discussion [2] was to not add new macros to handle the cases where no_free_ptr() is awkward, instead rework the code to have helpers and clearer delineation of responsibility. Now one might say that __free(del_cxl_resource) is excessive given it is immediately registered with add_or_reset_cxl_resource(). The rationale for keeping it is that it forces use of "no_free_ptr()" on the argument passed to add_or_reset_cxl_resource(). That in turn makes it clear that @res is NULL for the rest of the function which is part of the point of the cleanup helpers, to turn subtle use after free errors [3] into loud NULL pointer de-references. Link: http://lore.kernel.org/r/170820177238.631006.1012639681618409284.stgit@dwillia2-xfh.jf.intel.com [1] Link: http://lore.kernel.org/r/CAHk-=whBVhnh=KSeBBRet=E7qJAwnPR_aj5em187Q3FiD+LXnA@mail.gmail.com [2] Link: http://lore.kernel.org/r/20230714093146.2253438-1-leitao@debian.org [3] Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Closes: http://lore.kernel.org/r/20240219124041.00002bda@Huawei.com Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/171235474028.2718248.14109646123143505522.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-04-05 22:05:50 +00:00
static inline int cxl_root_decoder_autoremove(struct device *host,
struct cxl_root_decoder *cxlrd)
{
return cxl_decoder_autoremove(host, &cxlrd->cxlsd.cxld);
}
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
int cxl_endpoint_autoremove(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
/**
* struct cxl_endpoint_dvsec_info - Cached DVSEC info
* @mem_enabled: cached value of mem_enabled in the DVSEC at init time
* @ranges: Number of active HDM ranges this device uses.
* @port: endpoint port associated with this info instance
* @dvsec_range: cached attributes of the ranges in the DVSEC, PCIE_DEVICE
*/
struct cxl_endpoint_dvsec_info {
bool mem_enabled;
int ranges;
struct cxl_port *port;
struct range dvsec_range[2];
};
struct cxl_hdm;
struct cxl_hdm *devm_cxl_setup_hdm(struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_enumerate_decoders(struct cxl_hdm *cxlhdm,
struct cxl_endpoint_dvsec_info *info);
int devm_cxl_add_passthrough_decoder(struct cxl_port *port);
int cxl_dvsec_rr_decode(struct device *dev, struct cxl_port *port,
struct cxl_endpoint_dvsec_info *info);
cxl/acpi: Introduce cxl_decoder objects A cxl_decoder is a child of a cxl_port. It represents a hardware decoder configuration of an upstream port to one or more of its downstream ports. The decoder is either represented in CXL standard HDM decoder registers (see CXL 2.0 section 8.2.5.12 CXL HDM Decoder Capability Structure), or it is a static decode configuration communicated by platform firmware (see the CXL Early Discovery Table: Fixed Memory Window Structure). The firmware described and hardware described decoders differ slightly leading to 2 different sub-types of decoders, cxl_decoder_root and cxl_decoder_switch. At the root level the decode capabilities restrict what can be mapped beneath them. Mid-level switch decoders are configured for either acclerator (type-2) or memory-expander (type-3) operation, but they are otherwise agnostic to the type of memory (volatile vs persistent) being mapped. Here is an example topology from a single-ported host-bridge environment without CFMWS decodes enumerated. /sys/bus/cxl/devices/root0 ├── devtype ├── dport0 -> ../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── port1 │   ├── decoder1.0 │   │   ├── devtype │   │   ├── locked │   │   ├── size │   │   ├── start │   │   ├── subsystem -> ../../../../../../bus/cxl │   │   ├── target_list │   │   ├── target_type │   │   └── uevent │   ├── devtype │   ├── dport0 -> ../../../../pci0000:34/0000:34:00.0 │   ├── subsystem -> ../../../../../bus/cxl │   ├── uevent │   └── uport -> ../../../../LNXSYSTM:00/LNXSYBUS:00/ACPI0016:00 ├── subsystem -> ../../../../bus/cxl ├── uevent └── uport -> ../../ACPI0017:00 Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/162325695128.2293823.17519927266014762694.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-09 16:43:29 +00:00
cxl/region: Add region creation support CXL 2.0 allows for dynamic provisioning of new memory regions (system physical address resources like "System RAM" and "Persistent Memory"). Whereas DDR and PMEM resources are conveyed statically at boot, CXL allows for assembling and instantiating new regions from the available capacity of CXL memory expanders in the system. Sysfs with an "echo $region_name > $create_region_attribute" interface is chosen as the mechanism to initiate the provisioning process. This was chosen over ioctl() and netlink() to keep the configuration interface entirely in a pseudo-fs interface, and it was chosen over configfs since, aside from this one creation event, the interface is read-mostly. I.e. configfs supports cases where an object is designed to be provisioned each boot, like an iSCSI storage target, and CXL region creation is mostly for PMEM regions which are created usually once per-lifetime of a server instance. This is an improvement over nvdimm that pre-created "seed" devices that tended to confuse users looking to determine which devices are active and which are idle. Recall that the major change that CXL brings over previous persistent memory architectures is the ability to dynamically define new regions. Compare that to drivers like 'nfit' where the region configuration is statically defined by platform firmware. Regions are created as a child of a root decoder that encompasses an address space with constraints. When created through sysfs, the root decoder is explicit. When created from an LSA's region structure a root decoder will possibly need to be inferred by the driver. Upon region creation through sysfs, a vacant region is created with a unique name. Regions have a number of attributes that must be configured before the region can be bound to the driver where HDM decoder program is completed. An example of creating a new region: - Allocate a new region name: region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) - Create a new region by name: while region=$(cat /sys/bus/cxl/devices/decoder0.0/create_pmem_region) ! echo $region > /sys/bus/cxl/devices/decoder0.0/create_pmem_region do true; done - Region now exists in sysfs: stat -t /sys/bus/cxl/devices/decoder0.0/$region - Delete the region, and name: echo $region > /sys/bus/cxl/devices/decoder0.0/delete_region Signed-off-by: Ben Widawsky <bwidawsk@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/165784333909.1758207.794374602146306032.stgit@dwillia2-xfh.jf.intel.com [djbw: simplify locking, reword changelog] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-06-08 17:28:34 +00:00
bool is_cxl_region(struct device *dev);
extern struct bus_type cxl_bus_type;
struct cxl_driver {
const char *name;
int (*probe)(struct device *dev);
void (*remove)(struct device *dev);
struct device_driver drv;
int id;
};
#define to_cxl_drv(__drv) container_of_const(__drv, struct cxl_driver, drv)
int __cxl_driver_register(struct cxl_driver *cxl_drv, struct module *owner,
const char *modname);
#define cxl_driver_register(x) __cxl_driver_register(x, THIS_MODULE, KBUILD_MODNAME)
void cxl_driver_unregister(struct cxl_driver *cxl_drv);
#define module_cxl_driver(__cxl_driver) \
module_driver(__cxl_driver, cxl_driver_register, cxl_driver_unregister)
#define CXL_DEVICE_NVDIMM_BRIDGE 1
#define CXL_DEVICE_NVDIMM 2
cxl/port: Add a driver for 'struct cxl_port' objects The need for a CXL port driver and a dedicated cxl_bus_type is driven by a need to simultaneously support 2 independent physical memory decode domains (cache coherent CXL.mem and uncached PCI.mmio) that also intersect at a single PCIe device node. A CXL Port is a device that advertises a CXL Component Register block with an "HDM Decoder Capability Structure". >From Documentation/driver-api/cxl/memory-devices.rst: Similar to how a RAID driver takes disk objects and assembles them into a new logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and assemble them into a CXL.mem decode topology. The need for runtime configuration of the CXL.mem topology is also similar to RAID in that different environments with the same hardware configuration may decide to assemble the topology in contrasting ways. One may choose performance (RAID0) striping memory across multiple Host Bridges and endpoints while another may opt for fault tolerance and disable any striping in the CXL.mem topology. The port driver identifies whether an endpoint Memory Expander is connected to a CXL topology. If an active (bound to the 'cxl_port' driver) CXL Port is not found at every PCIe Switch Upstream port and an active "root" CXL Port then the device is just a plain PCIe endpoint only capable of participating in PCI.mmio and DMA cycles, not CXL.mem coherent interleave sets. The 'cxl_port' driver lets the CXL subsystem leverage driver-core infrastructure for setup and teardown of register resources and communicating device activation status to userspace. The cxl_bus_type can rendezvous the async arrival of platform level CXL resources (via the 'cxl_acpi' driver) with the asynchronous enumeration of Memory Expander endpoints, while also implementing a hierarchical locking model independent of the associated 'struct pci_dev' locking model. The locking for dport and decoder enumeration is now handled in the core rather than callers. For now the port driver only enumerates and registers CXL resources (downstream port metadata and decoder resources) later it will be used to take action on its decoders in response to CXL.mem region provisioning requests. Note1: cxlpci.h has long depended on pci.h, but port.c was the first to not include pci.h. Carry that dependency in cxlpci.h. Note2: cxl port enumeration and probing complicates CXL subsystem init to the point that it helps to have centralized debug logging of probe events in cxl_bus_probe(). Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Co-developed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/164374948116.464348.1772618057599155408.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-01 21:07:51 +00:00
#define CXL_DEVICE_PORT 3
#define CXL_DEVICE_ROOT 4
cxl/mem: Add the cxl_mem driver At this point the subsystem can enumerate all CXL ports (CXL.mem decode resources in upstream switch ports and host bridges) in a system. The last mile is connecting those ports to endpoints. The cxl_mem driver connects an endpoint device to the platform CXL.mem protoctol decode-topology. At ->probe() time it walks its device-topology-ancestry and adds a CXL Port object at every Upstream Port hop until it gets to CXL root. The CXL root object is only present after a platform firmware driver registers platform CXL resources. For ACPI based platform this is managed by the ACPI0017 device and the cxl_acpi driver. The ports are registered such that disabling a given port automatically unregisters all descendant ports, and the chain can only be registered after the root is established. Given ACPI device scanning may run asynchronously compared to PCI device scanning the root driver is tasked with rescanning the bus after the root successfully probes. Conversely if any ports in a chain between the root and an endpoint becomes disconnected it subsequently triggers the endpoint to unregister. Given lock depenedencies the endpoint unregistration happens in a workqueue asynchronously. If userspace cares about synchronizing delayed work after port events the /sys/bus/cxl/flush attribute is available for that purpose. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [djbw: clarify changelog, rework hotplug support] Link: https://lore.kernel.org/r/164398782997.903003.9725273241627693186.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-02-04 15:18:31 +00:00
#define CXL_DEVICE_MEMORY_EXPANDER 5
#define CXL_DEVICE_REGION 6
#define CXL_DEVICE_PMEM_REGION 7
#define CXL_DEVICE_DAX_REGION 8
#define CXL_DEVICE_PMU 9
#define MODULE_ALIAS_CXL(type) MODULE_ALIAS("cxl:t" __stringify(type) "*")
#define CXL_MODALIAS_FMT "cxl:t%d"
struct cxl_nvdimm_bridge *to_cxl_nvdimm_bridge(struct device *dev);
struct cxl_nvdimm_bridge *devm_cxl_add_nvdimm_bridge(struct device *host,
struct cxl_port *port);
struct cxl_nvdimm *to_cxl_nvdimm(struct device *dev);
bool is_cxl_nvdimm(struct device *dev);
cxl/pmem: Fix module reload vs workqueue state A test of the form: while true; do modprobe -r cxl_pmem; modprobe cxl_pmem; done May lead to a crash signature of the form: BUG: unable to handle page fault for address: ffffffffc0660030 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page [..] Workqueue: cxl_pmem 0xffffffffc0660030 RIP: 0010:0xffffffffc0660030 Code: Unable to access opcode bytes at RIP 0xffffffffc0660006. [..] Call Trace: ? process_one_work+0x4ec/0x9c0 ? pwq_dec_nr_in_flight+0x100/0x100 ? rwlock_bug.part.0+0x60/0x60 ? worker_thread+0x2eb/0x700 In that report the 0xffffffffc0660030 address corresponds to the former function address of cxl_nvb_update_state() from a previous load of the module, not the current address. Fix that by arranging for ->state_work in the 'struct cxl_nvdimm_bridge' object to be reinitialized on cxl_pmem module reload. Details: Recall that CXL subsystem wants to link a CXL memory expander device to an NVDIMM sub-hierarchy when both a persistent memory range has been registered by the CXL platform driver (cxl_acpi) *and* when that CXL memory expander has published persistent memory capacity (Get Partition Info). To this end the cxl_nvdimm_bridge driver arranges to rescan the CXL bus when either of those conditions change. The helper bus_rescan_devices() can not be called underneath the device_lock() for any device on that bus, so the cxl_nvdimm_bridge driver uses a workqueue for the rescan. Typically a driver allocates driver data to hold a 'struct work_struct' for a driven device, but for a workqueue that may run after ->remove() returns, driver data will have been freed. The 'struct cxl_nvdimm_bridge' object holds the state and work_struct directly. Unfortunately it was only arranging for that infrastructure to be initialized once per device creation rather than the necessary once per workqueue (cxl_pmem_wq) creation. Introduce is_cxl_nvdimm_bridge() and cxl_nvdimm_bridge_reset() in support of invalidating stale references to a recently destroyed cxl_pmem_wq. Cc: <stable@vger.kernel.org> Fixes: 8fdcb1704f61 ("cxl/pmem: Add initial infrastructure for pmem support") Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/163665474585.3505991.8397182770066720755.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-11 18:19:05 +00:00
bool is_cxl_nvdimm_bridge(struct device *dev);
cxl/mem: Fix no cxl_nvd during pmem region auto-assembling When CXL subsystem is auto-assembling a pmem region during cxl endpoint port probing, always hit below calltrace. BUG: kernel NULL pointer dereference, address: 0000000000000078 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page RIP: 0010:cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x82/0x160 ? do_user_addr_fault+0x65/0x6b0 ? exc_page_fault+0x7d/0x170 ? asm_exc_page_fault+0x26/0x30 ? cxl_pmem_region_probe+0x22e/0x360 [cxl_pmem] ? cxl_pmem_region_probe+0x1ac/0x360 [cxl_pmem] cxl_bus_probe+0x1b/0x60 [cxl_core] really_probe+0x173/0x410 ? __pfx___device_attach_driver+0x10/0x10 __driver_probe_device+0x80/0x170 driver_probe_device+0x1e/0x90 __device_attach_driver+0x90/0x120 bus_for_each_drv+0x84/0xe0 __device_attach+0xbc/0x1f0 bus_probe_device+0x90/0xa0 device_add+0x51c/0x710 devm_cxl_add_pmem_region+0x1b5/0x380 [cxl_core] cxl_bus_probe+0x1b/0x60 [cxl_core] The cxl_nvd of the memdev needs to be available during the pmem region probe. Currently the cxl_nvd is registered after the endpoint port probe. The endpoint probe, in the case of autoassembly of regions, can cause a pmem region probe requiring the not yet available cxl_nvd. Adjust the sequence so this dependency is met. This requires adding a port parameter to cxl_find_nvdimm_bridge() that can be used to query the ancestor root port. The endpoint port is not yet available, but will share a common ancestor with its parent, so start the query from there instead. Fixes: f17b558d6663 ("cxl/pmem: Refactor nvdimm device registration, delete the workqueue") Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Li Ming <ming4.li@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20240612064423.2567625-1-ming4.li@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-06-12 06:44:23 +00:00
int devm_cxl_add_nvdimm(struct cxl_port *parent_port, struct cxl_memdev *cxlmd);
struct cxl_nvdimm_bridge *cxl_find_nvdimm_bridge(struct cxl_port *port);
#ifdef CONFIG_CXL_REGION
bool is_cxl_pmem_region(struct device *dev);
struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev);
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled);
struct cxl_dax_region *to_cxl_dax_region(struct device *dev);
#else
static inline bool is_cxl_pmem_region(struct device *dev)
{
return false;
}
static inline struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev)
{
return NULL;
}
cxl/region: Add region autodiscovery Region autodiscovery is an asynchronous state machine advanced by cxl_port_probe(). After the decoders on an endpoint port are enumerated they are scanned for actively enabled instances. Each active decoder is flagged for auto-assembly CXL_DECODER_F_AUTO and attached to a region. If a region does not already exist for the address range setting of the decoder one is created. That creation process may race with other decoders of the same region being discovered since cxl_port_probe() is asynchronous. A new 'struct cxl_root_decoder' lock, @range_lock, is introduced to mitigate that race. Once all decoders have arrived, "p->nr_targets == p->interleave_ways", they are sorted by their relative decode position. The sort algorithm involves finding the point in the cxl_port topology where one leg of the decode leads to deviceA and the other deviceB. At that point in the topology the target order in the 'struct cxl_switch_decoder' indicates the relative position of those endpoint decoders in the region. >From that point the region goes through the same setup and validation steps as user-created regions, but instead of programming the decoders it validates that driver would have written the same values to the decoders as were already present. Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167601999958.1924368.9366954455835735048.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2023-02-11 01:31:17 +00:00
static inline int cxl_add_to_region(struct cxl_port *root,
struct cxl_endpoint_decoder *cxled)
{
return 0;
}
static inline struct cxl_dax_region *to_cxl_dax_region(struct device *dev)
{
return NULL;
}
#endif
tools/testing/cxl: Introduce a mocked-up CXL port hierarchy Create an environment for CXL plumbing unit tests. Especially when it comes to an algorithm for HDM Decoder (Host-managed Device Memory Decoder) programming, the availability of an in-kernel-tree emulation environment for CXL configuration complexity and corner cases speeds development and deters regressions. The approach taken mirrors what was done for tools/testing/nvdimm/. I.e. an external module, cxl_test.ko built out of the tools/testing/cxl/ directory, provides mock implementations of kernel APIs and kernel objects to simulate a real world device hierarchy. One feedback for the tools/testing/nvdimm/ proposal was "why not do this in QEMU?". In fact, the CXL development community has developed a QEMU model for CXL [1]. However, there are a few blocking issues that keep QEMU from being a tight fit for topology + provisioning unit tests: 1/ The QEMU community has yet to show interest in merging any of this support that has had patches on the list since November 2020. So, testing CXL to date involves building custom QEMU with out-of-tree patches. 2/ CXL mechanisms like cross-host-bridge interleave do not have a clear path to be emulated by QEMU without major infrastructure work. This is easier to achieve with the alloc_mock_res() approach taken in this patch to shortcut-define emulated system physical address ranges with interleave behavior. The QEMU enabling has been critical to get the driver off the ground, and may still move forward, but it does not address the ongoing needs of a regression testing environment and test driven development. This patch adds an ACPI CXL Platform definition with emulated CXL multi-ported host-bridges. A follow on patch adds emulated memory expander devices. Acked-by: Ben Widawsky <ben.widawsky@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20210202005948.241655-1-ben.widawsky@intel.com [1] Link: https://lore.kernel.org/r/163164680798.2831381.838684634806668012.stgit@dwillia2-desk3.amr.corp.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-14 19:14:22 +00:00
void cxl_endpoint_parse_cdat(struct cxl_port *port);
void cxl_switch_parse_cdat(struct cxl_port *port);
int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
struct access_coordinate *coord);
void cxl_region_perf_data_calculate(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled);
cxl: Calculate region bandwidth of targets with shared upstream link The current bandwidth calculation aggregates all the targets. This simple method does not take into account where multiple targets sharing under a switch or a root port where the aggregated bandwidth can be greater than the upstream link of the switch. To accurately account for the shared upstream uplink cases, a new update function is introduced by walking from the leaves to the root of the hierarchy and clamp the bandwidth in the process as needed. This process is done when all the targets for a region are present but before the final values are send to the HMAT handling code cached access_coordinate targets. The original perf calculation path was kept to calculate the latency performance data that does not require the shared link consideration. The shared upstream link calculation is done as a second pass when all the endpoints have arrived. Testing is done via qemu with CXL hierarchy. run_qemu[1] is modified to support several CXL hierarchy layouts. The following layouts are tested: HB: Host Bridge RP: Root Port SW: Switch EP: End Point 2 HB 2 RP 2 EP: resulting bandwidth: 624 1 HB 2 RP 2 EP: resulting bandwidth: 624 2 HB 2 RP 2 SW 4 EP: resulting bandwidth: 624 Current testing, perf number from SRAT/HMAT is hacked into the kernel code. However with new QEMU support of Generic Target Port that's incoming, the perf data injection is no longer needed. [1]: https://github.com/pmem/run_qemu Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://lore.kernel.org/linux-cxl/20240501152503.00002e60@Huawei.com/ Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20240904001316.1688225-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2024-09-04 00:11:51 +00:00
void cxl_region_shared_upstream_bandwidth_update(struct cxl_region *cxlr);
void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
void cxl_coordinates_combine(struct access_coordinate *out,
struct access_coordinate *c1,
struct access_coordinate *c2);
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
tools/testing/cxl: Introduce a mocked-up CXL port hierarchy Create an environment for CXL plumbing unit tests. Especially when it comes to an algorithm for HDM Decoder (Host-managed Device Memory Decoder) programming, the availability of an in-kernel-tree emulation environment for CXL configuration complexity and corner cases speeds development and deters regressions. The approach taken mirrors what was done for tools/testing/nvdimm/. I.e. an external module, cxl_test.ko built out of the tools/testing/cxl/ directory, provides mock implementations of kernel APIs and kernel objects to simulate a real world device hierarchy. One feedback for the tools/testing/nvdimm/ proposal was "why not do this in QEMU?". In fact, the CXL development community has developed a QEMU model for CXL [1]. However, there are a few blocking issues that keep QEMU from being a tight fit for topology + provisioning unit tests: 1/ The QEMU community has yet to show interest in merging any of this support that has had patches on the list since November 2020. So, testing CXL to date involves building custom QEMU with out-of-tree patches. 2/ CXL mechanisms like cross-host-bridge interleave do not have a clear path to be emulated by QEMU without major infrastructure work. This is easier to achieve with the alloc_mock_res() approach taken in this patch to shortcut-define emulated system physical address ranges with interleave behavior. The QEMU enabling has been critical to get the driver off the ground, and may still move forward, but it does not address the ongoing needs of a regression testing environment and test driven development. This patch adds an ACPI CXL Platform definition with emulated CXL multi-ported host-bridges. A follow on patch adds emulated memory expander devices. Acked-by: Ben Widawsky <ben.widawsky@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Link: https://lore.kernel.org/r/20210202005948.241655-1-ben.widawsky@intel.com [1] Link: https://lore.kernel.org/r/163164680798.2831381.838684634806668012.stgit@dwillia2-desk3.amr.corp.intel.com Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-09-14 19:14:22 +00:00
/*
* Unit test builds overrides this to __weak, find the 'strong' version
* of these symbols in tools/testing/cxl/.
*/
#ifndef __mock
#define __mock static
#endif
cxl/mem: Find device capabilities Provide enough functionality to utilize the mailbox of a memory device. The mailbox is used to interact with the firmware running on the memory device. The flow is proven with one implemented command, "identify". Because the class code has already told the driver this is a memory device and the identify command is mandatory. CXL devices contain an array of capabilities that describe the interactions software can have with the device or firmware running on the device. A CXL compliant device must implement the device status and the mailbox capability. Additionally, a CXL compliant memory device must implement the memory device capability. Each of the capabilities can [will] provide an offset within the MMIO region for interacting with the CXL device. The capabilities tell the driver how to find and map the register space for CXL Memory Devices. The registers are required to utilize the CXL spec defined mailbox interface. The spec outlines two mailboxes, primary and secondary. The secondary mailbox is earmarked for system firmware, and not handled in this driver. Primary mailboxes are capable of generating an interrupt when submitting a background command. That implementation is saved for a later time. Reported-by: Colin Ian King <colin.king@canonical.com> (coverity) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> (smatch) Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> (v2) Link: https://www.computeexpresslink.org/download-the-specification Link: https://lore.kernel.org/r/20210217040958.1354670-3-ben.widawsky@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-02-17 04:09:51 +00:00
#endif /* __CXL_H__ */