mirror of
https://github.com/torvalds/linux.git
synced 2024-12-12 14:12:51 +00:00
c221c0b030
This is intended for use with NVDIMMs that are physically persistent
(physically like flash) so that they can be used as a cost-effective
RAM replacement. Intel Optane DC persistent memory is one
implementation of this kind of NVDIMM.

Currently, a persistent memory region is "owned" by a device driver,
either the "Device DAX" or "Filesystem DAX" driver. These drivers
allow applications to use persistent memory explicitly, generally by
being modified to use special, new libraries. (DIMM-based persistent
memory hardware/software is described in great detail here:
Documentation/nvdimm/nvdimm.txt).

However, this limits persistent memory use to applications which
*have* been modified. To make it more broadly usable, this driver
"hotplugs" memory into the kernel, to be managed and used just like
normal RAM would be.

To make this work, management software must remove the device from
being controlled by the "Device DAX" infrastructure:

	echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind

and then tell the new driver that it can bind to the device:

	echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id

After this, there will be a number of new memory sections visible in
sysfs that can be onlined, or that may get onlined by existing
udev-initiated memory hotplug rules.

This rebinding procedure is currently a one-way trip. Once memory is
bound to "kmem", it's there permanently and cannot be unbound and
assigned back to device_dax.

The kmem driver will never bind to a dax device unless the device is
*explicitly* bound to the driver. There are two reasons for this:
One, since it is a one-way trip, it cannot be undone if bound
incorrectly. Two, the kmem driver destroys data on the device.
Suppose you had good data on a pmem device. It would be catastrophic
if you compiled in "kmem" but left out the "device_dax" driver: if
kmem bound automatically, it would take over the device and write
volatile data all over your good data.
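
The two echo commands above could be wrapped by management software
roughly as follows. This is only a sketch: the function name and the
sysfs_root parameter are hypothetical (added here so the logic can be
dry-run against a fake directory tree), while the two sysfs paths are
the ones given in the text. It checks that both control files exist
before writing anything, since the procedure cannot be undone.

```python
import os


def rebind_to_kmem(device, sysfs_root="/sys"):
    """Hand a DAX device from device_dax to kmem (illustrative sketch).

    `sysfs_root` is a hypothetical parameter for dry runs and tests;
    on a real system the default "/sys" would be used, as root.
    """
    drivers = os.path.join(sysfs_root, "bus", "dax", "drivers")
    unbind = os.path.join(drivers, "device_dax", "unbind")
    new_id = os.path.join(drivers, "kmem", "new_id")

    # Check both control files before touching anything: unbinding while
    # the kmem driver is unavailable would leave the device driverless.
    for path in (unbind, new_id):
        if not os.path.exists(path):
            raise FileNotFoundError(path)

    # Same order as the echo commands: unbind first, then offer to kmem.
    for path in (unbind, new_id):
        with open(path, "w") as f:
            f.write(device + "\n")
    return unbind, new_id
```

On real hardware, a call such as rebind_to_kmem("dax0.0") would be
the one-way trip described above, so it should only be made for a
device whose contents can be destroyed.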

This inherits any existing NUMA information for the newly-added
memory from the persistent memory device that came from the firmware.
On Intel platforms, the firmware guarantees that each socket's
persistent memory is in a separate, memory-only NUMA node. That means
that this patch is not expected to create NUMA nodes, but will simply
hotplug memory into existing nodes.

Because the persistent memory already has its own NUMA nodes, the
existing NUMA APIs and tools are sufficient to create policies for
applications or memory areas to have affinity for or an aversion to
using this memory.

There is currently some metadata at the beginning of pmem regions.
The section-size memory hotplug restrictions, plus this small
reserved area, can cause the "loss" of a section or two of capacity.
This should be fixable in follow-on patches. But, as a first step,
losing 256MB of memory (worst case) out of hundreds of gigabytes is a
good tradeoff vs. the required code to fix this up precisely. This
calculation is also the reason we export memory_block_size_bytes().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: linux-nvdimm@lists.01.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Huang Ying <ying.huang@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
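
The capacity "loss" mentioned in the commit message can be
illustrated with a little arithmetic. The sketch below assumes that
the usable range is obtained by rounding the region's start up and
its end down to the memory block size; the function name and the
128 MiB block size are assumptions for illustration (the kernel
reports the real value through memory_block_size_bytes()). With two
128 MiB blocks at stake, the worst case approaches the 256MB figure
quoted above.

```python
MIB = 1 << 20
GIB = 1 << 30


def usable_range(start, size, block_size):
    """Trim [start, start + size) to whole memory blocks.

    Returns (aligned_start, aligned_size). Illustrative only.
    """
    end = start + size
    # Round the start up and the end down to a block boundary.
    aligned_start = -(-start // block_size) * block_size
    aligned_end = (end // block_size) * block_size
    return aligned_start, max(0, aligned_end - aligned_start)


# Hypothetical region: 512 GiB at the 4 GiB mark, with 2 MiB of
# metadata at the beginning that must be skipped.
block = 128 * MIB  # assumed block size
start = 4 * GIB + 2 * MIB
size = 512 * GIB - 2 * MIB

aligned_start, aligned_size = usable_range(start, size, block)
lost = size - aligned_size
print(f"usable from {aligned_start:#x}, lost {lost // MIB} MiB")
```

In this example only the start is misaligned, so a bit less than one
block is lost. A region whose end is also misaligned would lose up to
nearly one more block.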
61 lines
1.8 KiB
Plaintext
config DAX_DRIVER
	select DAX
	bool

menuconfig DAX
	tristate "DAX: direct access to differentiated memory"
	select SRCU
	default m if NVDIMM_DAX

if DAX

config DEV_DAX
	tristate "Device DAX: direct access mapping device"
	depends on TRANSPARENT_HUGEPAGE
	help
	  Support raw access to differentiated (persistence, bandwidth,
	  latency...) memory via an mmap(2) capable character
	  device. Platform firmware or a device driver may identify a
	  platform memory resource that is differentiated from the
	  baseline memory pool. Mappings of a /dev/daxX.Y device impose
	  restrictions that make the mapping behavior deterministic.

config DEV_DAX_PMEM
	tristate "PMEM DAX: direct access to persistent memory"
	depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
	depends on m # until we can kill DEV_DAX_PMEM_COMPAT
	default DEV_DAX
	help
	  Support raw access to persistent memory. Note that this
	  driver consumes memory ranges allocated and exported by the
	  libnvdimm sub-system.

	  Say M if unsure

config DEV_DAX_KMEM
	tristate "KMEM DAX: volatile-use of persistent memory"
	default DEV_DAX
	depends on DEV_DAX
	depends on MEMORY_HOTPLUG # for add_memory() and friends
	help
	  Support access to persistent memory as if it were RAM. This
	  allows easier use of persistent memory by unmodified
	  applications.

	  To use this feature, a DAX device must be unbound from the
	  device_dax driver (PMEM DAX) and bound to this kmem driver
	  on each boot.

	  Say N if unsure.

config DEV_DAX_PMEM_COMPAT
	tristate "PMEM DAX: support the deprecated /sys/class/dax interface"
	depends on DEV_DAX_PMEM
	default DEV_DAX_PMEM
	help
	  Older versions of the libdaxctl library expect to find all
	  device-dax instances under /sys/class/dax. If libdaxctl in
	  your distribution is older than v58 say M, otherwise say N.

endif