rdmacg: Added documentation for rdmacg
Added documentation for v1 and v2 version describing high level design and usage examples on using rdma controller.

Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
109
Documentation/cgroup-v1/rdma.txt
Normal file
@@ -0,0 +1,109 @@
RDMA Controller
---------------

Contents
--------

1. Overview
1-1. What is RDMA controller?
1-2. Why is RDMA controller needed?
1-3. How is RDMA controller implemented?
2. Usage Examples

1. Overview

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows users to limit the RDMA/IB-specific resources that
a given set of processes can use. These processes are grouped using the RDMA
controller.

The RDMA controller defines two resources, hca_handle and hca_object, which
can be limited for the processes of a cgroup.

1-2. Why is RDMA controller needed?
-----------------------------------

Currently, user-space applications can easily consume all of the RDMA
verb-specific resources, such as AH, CQ, QP, MR, etc. As a result,
applications in other cgroups, or kernel-space ULPs, may not even get a chance
to allocate any RDMA resources. This can lead to service unavailability.

Therefore, an RDMA controller is needed through which the resource consumption
of processes can be limited. Through this controller, different RDMA
resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows resource limits to be configured. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each such resource pool is limited to 64 resources by the RDMA cgroup, which
can be extended later if required.

This resource pool object is linked to the cgroup css. Typically there
are 0 to 4 resource pool instances per cgroup, per device, in most use cases,
but nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no
known use case or requirement for such a configuration either.

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes that share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources to the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This also allows a process that migrated with active RDMA resources
to charge new resources to its new owner cgroup. It also allows the resources
of a process that has migrated to a new cgroup to be uncharged from the
previously charged cgroup, even though that is not a primary use case.

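The charge/uncharge ownership rule described above can be illustrated with a
small user-space model. This is only a sketch under stated assumptions: the
names RdmaCgroup, Task, try_charge and uncharge are hypothetical and are not
the kernel interface; the model only shows why the owner returned at charge
time must be the one passed back at uncharge time.

```python
class RdmaCgroup:
    """Hypothetical stand-in for one rdma cgroup's accounting state."""

    def __init__(self, name):
        self.name = name
        self.usage = 0  # number of resources currently charged here


class Task:
    """Hypothetical process that belongs to exactly one cgroup at a time."""

    def __init__(self, cgroup):
        self.cgroup = cgroup


def try_charge(task):
    """Charge one resource to the task's current cgroup and return the owner."""
    owner = task.cgroup
    owner.usage += 1
    return owner


def uncharge(owner):
    """Uncharge from the cgroup that was returned at charge time."""
    owner.usage -= 1


cg_a, cg_b = RdmaCgroup("a"), RdmaCgroup("b")
task = Task(cg_a)
first_owner = try_charge(task)   # charged to "a"
task.cgroup = cg_b               # the process migrates with an active resource
second_owner = try_charge(task)  # new resources are charged to "b"
uncharge(first_owner)            # the old resource is uncharged from "a", not "b"
```

Because the caller keeps the returned owner, the model never needs to look up
which cgroup a migrated process used to belong to.
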
A resource pool object is created in the following situations.
(a) The user sets a limit and no previous resource pool exists for the device
of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
charge the resource. Creating the pool at this point ensures that resources
charged while applications run without limits are uncharged correctly if
limits are enforced later; otherwise the usage count would drop below zero.

A resource pool is destroyed if all of its resource limits are set to max and
the last resource is being deallocated.

Users should set all limits to the max value if they intend to
remove/unconfigure the resource pool for a particular device.

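The pool lifecycle rules above can be sketched as a small model. It is an
illustration only, not the kernel data structure: CgroupPools and its methods
are hypothetical names, and destroying an idle pool at the moment its last
limit is reset to max is an assumption of this sketch.

```python
RESOURCES = ("hca_handle", "hca_object")
MAX = "max"


class CgroupPools:
    """Hypothetical per-cgroup table of resource pools, keyed by device name."""

    def __init__(self):
        self.pools = {}

    def _get_or_create(self, device):
        # Covers cases (a) and (b): the pool appears on first limit or charge.
        if device not in self.pools:
            self.pools[device] = {
                "limit": {r: MAX for r in RESOURCES},
                "usage": {r: 0 for r in RESOURCES},
            }
        return self.pools[device]

    def _destroy_if_idle(self, device):
        pool = self.pools[device]
        unlimited = all(v == MAX for v in pool["limit"].values())
        unused = all(v == 0 for v in pool["usage"].values())
        if unlimited and unused:
            del self.pools[device]

    def set_limit(self, device, resource, value):
        self._get_or_create(device)["limit"][resource] = value
        self._destroy_if_idle(device)

    def charge(self, device, resource):
        pool = self._get_or_create(device)
        limit = pool["limit"][resource]
        if limit != MAX and pool["usage"][resource] >= limit:
            return False  # over the configured limit
        pool["usage"][resource] += 1
        return True

    def uncharge(self, device, resource):
        pool = self.pools[device]
        pool["usage"][resource] -= 1
        self._destroy_if_idle(device)


pools = CgroupPools()
pools.charge("mlx4_0", "hca_object")          # case (b): created on first charge
pools.set_limit("mlx4_0", "hca_object", 1)    # a limit is enforced later
pools.uncharge("mlx4_0", "hca_object")        # usage returns to 0, not below
pools.set_limit("mlx4_0", "hca_object", MAX)  # all limits max, nothing in use
```
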
The IB stack honors the limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what the user configured for the given cgroup and what
the IB device supports.

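The "minimum of configured and supported" rule can be written out as a short
sketch. The function name effective_limit and the numbers used with it are
hypothetical; "max" is treated as "no cgroup limit", matching the value shown
in rdma.max.

```python
def effective_limit(configured, device_supported):
    """Return the limit an application would see for one resource.

    configured is the cgroup value (an integer or the string "max");
    device_supported is what the device itself can provide.
    """
    if configured == "max":
        return device_supported
    return min(int(configured), device_supported)
```

For instance, with a cgroup limit of 2000 and a device that supports 1024
objects, the application would be told 1024.
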
The following resources can be accounted by the RDMA controller.
  hca_handle	Maximum number of HCA Handles
  hca_object	Maximum number of HCA Objects

2. Usage Examples
-----------------

(a) Configure resource limit:
echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit:
cat /sys/fs/cgroup/rdma/2/rdma.max
#Output:
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage:
cat /sys/fs/cgroup/rdma/2/rdma.current
#Output:
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit:
echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
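
For scripted configuration, the lines written in (a) and (d) can be built
programmatically. The helper names below are hypothetical, and the sketch
assumes the cgroup directory already exists and is writable by the caller.

```python
def format_limit_line(device, **limits):
    """Build one rdma.max line, e.g. "mlx4_0 hca_handle=2 hca_object=2000"."""
    parts = [device]
    for resource, value in limits.items():
        parts.append(f"{resource}={value}")
    return " ".join(parts)


def write_limit(cgroup_dir, device, **limits):
    """Write one device line to rdma.max inside the given cgroup directory."""
    with open(f"{cgroup_dir}/rdma.max", "w") as f:
        f.write(format_limit_line(device, **limits) + "\n")
```

Calling write_limit("/sys/fs/cgroup/rdma/1", "mlx4_0", hca_handle="max",
hca_object="max") would correspond to example (d).
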
@@ -47,6 +47,8 @@ CONTENTS
5-3. IO
5-3-1. IO Interface Files
5-3-2. Writeback
5-4. RDMA
5-4-1. RDMA Interface Files
6. Namespace
6-1. Basics
6-2. The Root and Views
@@ -1119,6 +1121,42 @@ writeback as follows.
vm.dirty[_background]_ratio.


5-4. RDMA

The "rdma" controller regulates the distribution and accounting of
RDMA resources.

5-4-1. RDMA Interface Files

  rdma.max
	A read-write nested-keyed file that exists for all cgroups
	except the root and describes the currently configured resource
	limits for an RDMA/IB device.

	Lines are keyed by device name and are not ordered.
	Each line contains space-separated resource names and their
	configured limits that can be distributed.

	The following nested keys are defined.

	  hca_handle	Maximum number of HCA Handles
	  hca_object	Maximum number of HCA Objects

	An example for mlx4 and ocrdma devices follows.

	  mlx4_0 hca_handle=2 hca_object=2000
	  ocrdma1 hca_handle=3 hca_object=max

  rdma.current
	A read-only file that describes current resource usage.
	It exists for all cgroups except the root.

	An example for mlx4 and ocrdma devices follows.

	  mlx4_0 hca_handle=1 hca_object=20
	  ocrdma1 hca_handle=1 hca_object=23


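A monitoring script could read both interface files and compare them. The
following is a minimal sketch that parses the nested-keyed format shown in
the examples; parse_rdma_file and headroom are hypothetical helper names, and
the input strings are copied from the examples rather than read from a live
cgroup.

```python
def parse_rdma_file(text):
    """Parse rdma.max or rdma.current content into {device: {key: value}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        device, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            values[key] = value if value == "max" else int(value)
        result[device] = values
    return result


def headroom(limits, usage):
    """Remaining allowance per device and resource; None where limit is max."""
    remaining = {}
    for device, dev_limits in limits.items():
        used = usage.get(device, {})
        remaining[device] = {
            key: None if limit == "max" else limit - used.get(key, 0)
            for key, limit in dev_limits.items()
        }
    return remaining


limits = parse_rdma_file(
    "mlx4_0 hca_handle=2 hca_object=2000\n"
    "ocrdma1 hca_handle=3 hca_object=max\n"
)
usage = parse_rdma_file(
    "mlx4_0 hca_handle=1 hca_object=20\n"
    "ocrdma1 hca_handle=1 hca_object=23\n"
)
left = headroom(limits, usage)
```

Since lines are not ordered, the parser keys everything by device name
instead of relying on line position.
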
6. Namespace

6-1. Basics
