x86/intel_rdt: Documentation for Cache Pseudo-Locking

Add description of Cache Pseudo-Locking feature, its interface,
as well as an example of its usage.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/6e118c15d2c254a27b8891783505cd1bb94a2b10.1529706536.git.reinette.chatre@intel.com
parent d9b48c86eb
commit e17e733070
@@ -29,7 +29,11 @@ mount options are:
 L2 and L3 CDP are controlled seperately.
 
 RDT features are orthogonal. A particular system may support only
-monitoring, only control, or both monitoring and control.
+monitoring, only control, or both monitoring and control. Cache
+pseudo-locking is a unique way of using cache control to "pin" or
+"lock" data in the cache. Details can be found in
+"Cache Pseudo-Locking".
+
 
 The mount succeeds if either of allocation or monitoring is present, but
 only those files and directories supported by the system will be created.
@@ -86,6 +90,8 @@ related to allocation:
       and available for sharing.
 "E" - Corresponding region is used exclusively by
       one resource group. No sharing allowed.
+"P" - Corresponding region is pseudo-locked. No
+      sharing allowed.
 
 Memory bandwitdh(MB) subdirectory contains the following files
 with respect to allocation:
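The "P" annotation added in the hunk above can also be checked from a program. The following is a minimal sketch, not part of this commit: the helper name `bit_usage_mask()` is hypothetical, and it assumes the line format shown in this patch's examples (for instance `0=SSSSSSSS;1=SSSSSSPP`), with the rightmost character corresponding to bit 0 of the capacity bitmask.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a bitmask of the capacity bits annotated
 * with `mark` (for example 'P') for cache instance `cache_id`, given one
 * line of a bit_usage file such as "0=SSSSSSSS;1=SSSSSSPP".
 * Assumption: the rightmost character of each entry is bit 0.
 * Returns 0 if the cache id is absent or the line is malformed.
 */
unsigned long bit_usage_mask(const char *line, int cache_id, char mark)
{
	const char *p = line;

	assert(line != NULL);
	while (*p) {
		char *end;
		long id = strtol(p, &end, 10);
		unsigned long mask = 0;
		size_t len, i;

		if (end == p || *end != '=')
			return 0;		/* malformed entry */
		p = end + 1;
		len = strcspn(p, ";\n");	/* length of the annotation string */
		if (len > sizeof(mask) * 8)
			return 0;		/* would not fit in the mask */
		if (id == cache_id) {
			for (i = 0; i < len; i++)
				if (p[i] == mark)
					mask |= 1UL << (len - 1 - i);
			return mask;
		}
		p += len;
		if (*p != ';')
			break;			/* end of line */
		p++;				/* skip the separator */
	}
	return 0;
}
```

Called with the line `0=SSSSSSSS;1=SSSSSSPP`, cache id 1 and mark `'P'`, the helper yields 0x3, which is the CBM used for the pseudo-locked region in the example later in this patch.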
@@ -192,7 +198,12 @@ When control is enabled all CTRL_MON groups will also contain:
 "mode":
 	The "mode" of the resource group dictates the sharing of its
 	allocations. A "shareable" resource group allows sharing of its
-	allocations while an "exclusive" resource group does not.
+	allocations while an "exclusive" resource group does not. A
+	cache pseudo-locked region is created by first writing
+	"pseudo-locksetup" to the "mode" file before writing the cache
+	pseudo-locked region's schemata to the resource group's "schemata"
+	file. On successful pseudo-locked region creation the mode will
+	automatically change to "pseudo-locked".
 
 When monitoring is enabled all MON groups will also contain:
 
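The ordering described in the hunk above ("pseudo-locksetup" into "mode" first, then the schemata) can be sketched in C as follows. This is an illustration only and not part of this commit: the function names are hypothetical, the resource group directory is assumed to exist already, and each write appends a newline in the same way as `echo`.

```c
#include <assert.h>
#include <stdio.h>

/* Write `value` plus a newline to `dir`/`file`. Returns 0 on success, -1 on error. */
static int write_group_file(const char *dir, const char *file, const char *value)
{
	char path[4096];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;		/* path did not fit */
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", value) >= 0;
	/* A rejected write may only be reported when the stream is flushed. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: request a pseudo-locked region for an existing
 * resource group directory. The mode must be written before the schemata.
 */
int request_pseudo_lock(const char *group_dir, const char *schemata)
{
	assert(group_dir && schemata);
	if (write_group_file(group_dir, "mode", "pseudo-locksetup") < 0)
		return -1;
	return write_group_file(group_dir, "schemata", schemata);
}
```

A caller would pass something like `/sys/fs/resctrl/newlock` and `L2:1=0x3`, then read the "mode" file back to confirm that it now contains "pseudo-locked".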
@@ -410,6 +421,170 @@ L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
 L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
 
+Cache Pseudo-Locking
+--------------------
+CAT enables a user to specify the amount of cache space that an
+application can fill. Cache pseudo-locking builds on the fact that a
+CPU can still read and write data pre-allocated outside its current
+allocated area on a cache hit. With cache pseudo-locking, data can be
+preloaded into a reserved portion of cache that no application can
+fill, and from that point on will only serve cache hits. The cache
+pseudo-locked memory is made accessible to user space where an
+application can map it into its virtual address space and thus have
+a region of memory with reduced average read latency.
+
+The creation of a cache pseudo-locked region is triggered by a request
+from the user to do so that is accompanied by a schemata of the region
+to be pseudo-locked. The cache pseudo-locked region is created as follows:
+- Create a CAT allocation CLOSNEW with a CBM matching the schemata
+  from the user of the cache region that will contain the pseudo-locked
+  memory. This region must not overlap with any current CAT allocation/CLOS
+  on the system and no future overlap with this cache region is allowed
+  while the pseudo-locked region exists.
+- Create a contiguous region of memory of the same size as the cache
+  region.
+- Flush the cache, disable hardware prefetchers, disable preemption.
+- Make CLOSNEW the active CLOS and touch the allocated memory to load
+  it into the cache.
+- Set the previous CLOS as active.
+- At this point the closid CLOSNEW can be released - the cache
+  pseudo-locked region is protected as long as its CBM does not appear in
+  any CAT allocation. Even though the cache pseudo-locked region will from
+  this point on not appear in any CBM of any CLOS an application running with
+  any CLOS will be able to access the memory in the pseudo-locked region since
+  the region continues to serve cache hits.
+- The contiguous region of memory loaded into the cache is exposed to
+  user-space as a character device.
+
+Cache pseudo-locking increases the probability that data will remain
+in the cache by carefully configuring the CAT feature and controlling
+application behavior. There is no guarantee that data is placed in
+cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
+“locked” data from cache. Power management C-states may shrink or
+power off cache. It is thus recommended to limit the processor maximum
+C-state, for example, by setting the processor.max_cstate kernel parameter.
+
+It is required that an application using a pseudo-locked region runs
+with affinity to the cores (or a subset of the cores) associated
+with the cache on which the pseudo-locked region resides. A sanity check
+within the code will not allow an application to map pseudo-locked memory
+unless it runs with affinity to cores associated with the cache on which the
+pseudo-locked region resides. The sanity check is only done during the
+initial mmap() handling. There is no enforcement afterwards, and the
+application itself needs to ensure it remains affine to the correct cores.
+
+Pseudo-locking is accomplished in two stages:
+1) During the first stage the system administrator allocates a portion
+   of cache that should be dedicated to pseudo-locking. At this time an
+   equivalent portion of memory is allocated, loaded into the allocated
+   cache portion, and exposed as a character device.
+2) During the second stage a user-space application maps (mmap()) the
+   pseudo-locked memory into its address space.
+
+Cache Pseudo-Locking Interface
+------------------------------
+A pseudo-locked region is created using the resctrl interface as follows:
+
+1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
+2) Change the new resource group's mode to "pseudo-locksetup" by writing
+   "pseudo-locksetup" to the "mode" file.
+3) Write the schemata of the pseudo-locked region to the "schemata" file. All
+   bits within the schemata should be "unused" according to the "bit_usage"
+   file.
+
+On successful pseudo-locked region creation the "mode" file will contain
+"pseudo-locked" and a new character device with the same name as the resource
+group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
+by user space in order to obtain access to the pseudo-locked memory region.
+
+An example of cache pseudo-locked region creation and usage can be found below.
+
+Cache Pseudo-Locking Debugging Interface
+----------------------------------------
+The pseudo-locking debugging interface is enabled by default (if
+CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.
+
+There is no explicit way for the kernel to test if a provided memory
+location is present in the cache. The pseudo-locking debugging interface uses
+the tracing infrastructure to provide two ways to measure cache residency of
+the pseudo-locked region:
+1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
+   from these measurements are best visualized using a hist trigger (see
+   example below). In this test the pseudo-locked region is traversed at
+   a stride of 32 bytes while hardware prefetchers and preemption
+   are disabled. This also provides a substitute visualization of cache
+   hits and misses.
+2) Cache hit and miss measurements using model specific precision counters if
+   available. Depending on the levels of cache on the system the pseudo_lock_l2
+   and pseudo_lock_l3 tracepoints are available.
+   WARNING: triggering this measurement uses from two (for just L2
+   measurements) to four (for L2 and L3 measurements) precision counters on
+   the system. If any other measurements are in progress, the counters and
+   their corresponding event registers will be clobbered.
+
+When a pseudo-locked region is created a new debugfs directory is created for
+it as /sys/kernel/debug/resctrl/<newdir>. A single write-only file,
+pseudo_lock_measure, is present in this directory. The measurement on the
+pseudo-locked region depends on the number written to this debugfs file:
+1 for the memory access latency measurement, 2 for the cache hit and miss
+measurement. Since the measurements are recorded with the
+tracing infrastructure the relevant tracepoints need to be enabled before the
+measurement is triggered.
+
+Example of latency debugging interface:
+In this example a pseudo-locked region named "newlock" was created. Here is
+how we can measure the latency in cycles of reading from this region and
+visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
+is set:
+# :> /sys/kernel/debug/tracing/trace
+# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
+# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
+# echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
+# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
+# cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist
+
+# event histogram
+#
+# trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
+#
+
+{ latency:        456 } hitcount:          1
+{ latency:         50 } hitcount:         83
+{ latency:         36 } hitcount:         96
+{ latency:         44 } hitcount:        174
+{ latency:         48 } hitcount:        195
+{ latency:         46 } hitcount:        262
+{ latency:         42 } hitcount:        693
+{ latency:         40 } hitcount:       3204
+{ latency:         38 } hitcount:       3484
+
+Totals:
+    Hits: 8192
+    Entries: 9
+    Dropped: 0
+
+Example of cache hits/misses debugging:
+In this example a pseudo-locked region named "newlock" was created on the L2
+cache of a platform. Here is how we can obtain details of the cache hits
+and misses using the platform's precision counters.
+
+# :> /sys/kernel/debug/tracing/trace
+# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
+# echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
+# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
+# cat /sys/kernel/debug/tracing/trace
+
+# tracer: nop
+#
+#                              _-----=> irqs-off
+#                             / _----=> need-resched
+#                            | / _---=> hardirq/softirq
+#                            || / _--=> preempt-depth
+#                            ||| /     delay
+#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
+#              | |       |   ||||       |         |
+ pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0
+
+
 Examples for RDT allocation usage:
 
 Example 1
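The region-creation steps in the hunk above call for "a contiguous region of memory of the same size as the cache region". As a rough sketch of that arithmetic (not taken from this commit), the hypothetical helper `cbm_to_size()` below assumes that a full capacity bitmask of `cbm_len` bits divides the cache into equally sized portions, so each set bit in the CBM contributes `cache_size / cbm_len` bytes. The cache size and mask length are inputs the caller must obtain elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: estimate how many bytes of cache are covered by
 * capacity bitmask `cbm`.
 * Assumption: the `cbm_len` bits of a full mask split `cache_size` bytes
 * into equal portions.
 */
size_t cbm_to_size(size_t cache_size, unsigned int cbm_len, unsigned long cbm)
{
	unsigned int bits = 0;

	assert(cbm_len > 0);
	for (; cbm; cbm &= cbm - 1)	/* clear the lowest set bit */
		bits++;
	return cache_size / cbm_len * bits;
}
```

For an illustrative 1 MiB cache with an 8-bit mask, a CBM of 0x3 would correspond to 256 KiB under this assumption. The numbers are made up for the example and do not describe any particular platform.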
@@ -596,6 +771,107 @@ A resource group cannot be forced to overlap with an exclusive resource group:
 # cat info/last_cmd_status
 overlaps with exclusive group
 
+Example of Cache Pseudo-Locking
+-------------------------------
+Lock a portion of the L2 cache with cache id 1 using CBM 0x3. The
+pseudo-locked region is exposed at /dev/pseudo_lock/newlock, which an
+application can open and map with mmap().
+
+# mount -t resctrl resctrl /sys/fs/resctrl/
+# cd /sys/fs/resctrl
+
+Ensure that there are bits available that can be pseudo-locked. Since only
+unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
+removed from the default resource group's schemata:
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSSSS
+# echo 'L2:1=0xfc' > schemata
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSS00
+
+Create a new resource group that will be associated with the pseudo-locked
+region, indicate that it will be used for a pseudo-locked region, and
+configure the requested pseudo-locked region capacity bitmask:
+
+# mkdir newlock
+# echo pseudo-locksetup > newlock/mode
+# echo 'L2:1=0x3' > newlock/schemata
+
+On success the resource group's mode will change to pseudo-locked, the
+bit_usage will reflect the pseudo-locked region, and the character device
+exposing the pseudo-locked region will exist:
+
+# cat newlock/mode
+pseudo-locked
+# cat info/L2/bit_usage
+0=SSSSSSSS;1=SSSSSSPP
+# ls -l /dev/pseudo_lock/newlock
+crw------- 1 root root 243, 0 Apr  3 05:01 /dev/pseudo_lock/newlock
+
+/*
+ * Example code to access one page of pseudo-locked cache region
+ * from user space.
+ */
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+/*
+ * It is required that the application runs with affinity to only
+ * cores associated with the pseudo-locked region. Here the cpu
+ * is hardcoded for convenience of example.
+ */
+static int cpuid = 2;
+
+int main(int argc, char *argv[])
+{
+	cpu_set_t cpuset;
+	long page_size;
+	void *mapping;
+	int dev_fd;
+	int ret;
+
+	page_size = sysconf(_SC_PAGESIZE);
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(cpuid, &cpuset);
+	ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
+	if (ret < 0) {
+		perror("sched_setaffinity");
+		exit(EXIT_FAILURE);
+	}
+
+	dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
+	if (dev_fd < 0) {
+		perror("open");
+		exit(EXIT_FAILURE);
+	}
+
+	mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+		       dev_fd, 0);
+	if (mapping == MAP_FAILED) {
+		perror("mmap");
+		close(dev_fd);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Application interacts with pseudo-locked memory @mapping */
+
+	ret = munmap(mapping, page_size);
+	if (ret < 0) {
+		perror("munmap");
+		close(dev_fd);
+		exit(EXIT_FAILURE);
+	}
+
+	close(dev_fd);
+	exit(EXIT_SUCCESS);
+}
+
 Locking between applications
 ----------------------------
 
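The example program in the hunk above hardcodes the CPU "for convenience of example". One possible alternative, sketched here under stated assumptions and not part of this commit, is to build the affinity mask from a CPU list string such as "2-3,6". The helper name `parse_cpu_list()` is hypothetical, and where the string comes from is left to the caller; the `shared_cpu_list` file of the relevant cache under /sys/devices/system/cpu/ would be one candidate source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a CPU list such as "2-3,6" into `set`.
 * A trailing newline is accepted. Returns 0 on success, -1 on
 * malformed input or out-of-range CPU numbers.
 */
int parse_cpu_list(const char *list, cpu_set_t *set)
{
	const char *p = list;

	assert(list && set);
	CPU_ZERO(set);
	while (*p && *p != '\n') {
		char *end;
		long first, last, cpu;

		first = last = strtol(p, &end, 10);
		if (end == p)
			return -1;		/* no digits found */
		if (*end == '-') {		/* a range "first-last" */
			p = end + 1;
			last = strtol(p, &end, 10);
			if (end == p)
				return -1;
		}
		if (first < 0 || last < first || last >= CPU_SETSIZE)
			return -1;
		for (cpu = first; cpu <= last; cpu++)
			CPU_SET(cpu, set);
		p = end;
		if (*p == ',')
			p++;			/* next list element */
		else if (*p && *p != '\n')
			return -1;		/* unexpected character */
	}
	return 0;
}
```

The resulting set can be passed to sched_setaffinity() in place of the single hardcoded CPU. As the documentation notes, the affinity check is made only at mmap() time, so the application still has to keep itself on those CPUs afterwards.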