Commit Graph

11 Commits

Author SHA1 Message Date
Tejun Heo
610467270f cgroup: don't call migration methods if there are no tasks to migrate
Subsystem migration methods shouldn't be called for empty migrations.
cgroup_migrate_execute() implements this guarantee by bailing early if
there are no source css_sets.  This used to be correct before
a79a908fd2 ("cgroup: introduce cgroup namespaces"), but no longer
since the commit because css_sets can stay pinned without tasks in
them.

This caused cgroup_migrate_execute() call into cpuset migration
methods with an empty cgroup_taskset.  cpuset migration methods
correctly assume that cgroup_taskset_first() never returns NULL;
however, due to the bug, it can, leading to the following oops.

  Unable to handle kernel paging request for data at address 0x00000960
  Faulting instruction address: 0xc0000000001d6868
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  CPU: 14 PID: 16947 Comm: kworker/14:0 Tainted: G        W
  4.12.0-rc4-next-20170609 #2
  Workqueue: events cpuset_hotplug_workfn
  task: c00000000ca60580 task.stack: c00000000c728000
  NIP: c0000000001d6868 LR: c0000000001d6858 CTR: c0000000001d6810
  REGS: c00000000c72b720 TRAP: 0300   Tainted: GW (4.12.0-rc4-next-20170609)
  MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44722422  XER: 20000000
  CFAR: c000000000008710 DAR: 0000000000000960 DSISR: 40000000 SOFTE: 1
  GPR00: c0000000001d6858 c00000000c72b9a0 c000000001536e00 0000000000000000
  GPR04: c00000000c72b9c0 0000000000000000 c00000000c72bad0 c000000766367678
  GPR08: c000000766366d10 c00000000c72b958 c000000001736e00 0000000000000000
  GPR12: c0000000001d6810 c00000000e749300 c000000000123ef8 c000000775af4180
  GPR16: 0000000000000000 0000000000000000 c00000075480e9c0 c00000075480e9e0
  GPR20: c00000075480e8c0 0000000000000001 0000000000000000 c00000000c72ba20
  GPR24: c00000000c72baa0 c00000000c72bac0 c000000001407248 c00000000c72ba20
  GPR28: c00000000141fc80 c00000000c72bac0 c00000000c6bc790 0000000000000000
  NIP [c0000000001d6868] cpuset_can_attach+0x58/0x1b0
  LR [c0000000001d6858] cpuset_can_attach+0x48/0x1b0
  Call Trace:
  [c00000000c72b9a0] [c0000000001d6858] cpuset_can_attach+0x48/0x1b0 (unreliable)
  [c00000000c72ba00] [c0000000001cbe80] cgroup_migrate_execute+0xb0/0x450
  [c00000000c72ba80] [c0000000001d3754] cgroup_transfer_tasks+0x1c4/0x360
  [c00000000c72bba0] [c0000000001d923c] cpuset_hotplug_workfn+0x86c/0xa20
  [c00000000c72bca0] [c00000000011aa44] process_one_work+0x1e4/0x580
  [c00000000c72bd30] [c00000000011ae78] worker_thread+0x98/0x5c0
  [c00000000c72bdc0] [c000000000124058] kthread+0x168/0x1b0
  [c00000000c72be30] [c00000000000b2e8] ret_from_kernel_thread+0x5c/0x74
  Instruction dump:
  f821ffa1 7c7d1b78 60000000 60000000 38810020 7fa3eb78 3f42ffed 4bff4c25
  60000000 3b5a0448 3d420020 eb610020 <e9230960> 7f43d378 e9290000 f92af200
  ---[ end trace dcaaf98fb36d9e64 ]---

This patch fixes the bug by adding an explicit nr_tasks counter to
cgroup_taskset and skipping calling the migration methods if the
counter is zero.  While at it, remove the now spurious check on no
source css_sets.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org # v4.6+
Fixes: a79a908fd2 ("cgroup: introduce cgroup namespaces")
Link: http://lkml.kernel.org/r/1497266622.15415.39.camel@abdul.in.ibm.com
2017-07-08 07:37:50 -04:00
Waiman Long
a28f8f5e99 cgroup: Move debug cgroup to its own file
The debug cgroup currently resides within cgroup-v1.c and is enabled
only for v1 cgroup. To enable the debug cgroup also for v2, it makes
sense to put the code into its own file as it will no longer be v1
specific. There is no change to the debug cgroup specific code.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-14 16:01:21 -04:00
Zefan Li
9732adc5d6 cgroup: avoid attaching a cgroup root to two different superblocks, take 2
Commit bfb0b80db5 ("cgroup: avoid attaching a cgroup root to two
different superblocks") is broken.  Now we try to fix the race by
delaying the initialization of cgroup root refcnt until a superblock
has been allocated.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrei Vagin <avagin@virtuozzo.com>
Tested-by: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-04-28 18:04:54 -04:00
Elena Reshetova
4b9502e63b kernel: convert css_set.refcount from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-08 17:46:03 -05:00
Tejun Heo
bfc2cf6f61 cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
Currently, subsys->*attach() callbacks are called for all subsystems
which are attached to the hierarchy on which the migration is taking
place.

With cgroup_migrate_prepare_dst() filtering out identity migrations,
v1 hierarchies can avoid spurious ->*attach() callback invocations
where the source and destination csses are identical; however, this
isn't enough on v2 as only a subset of the attached controllers can be
affected on controller enable/disable.

While spurious ->*attach() invocations aren't critically broken,
they're unnecessary overhead and can lead to temporary overcharges on
certain controllers.  Fix it by tracking which subsystems are affected
by a migration and invoking ->*attach() callbacks only on those
subsystems.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2017-01-15 19:03:41 -05:00
Tejun Heo
e595cd7069 cgroup: track migration context in cgroup_mgctx
cgroup migration is performed in four steps - css_set preloading,
addition of target tasks, actual migration, and clean up.  A list
named preloaded_csets is used to track the preloading.  This is a bit
too restricted and the code is already depending on the subtlety that
all source css_sets appear before destination ones.

Let's create struct cgroup_mgctx which keeps track of everything
during migration.  Currently, it has separate preload lists for source
and destination csets and also embeds cgroup_taskset which is used
during the actual migration.  This moves struct cgroup_taskset
definition to cgroup-internal.h.

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2017-01-15 19:03:41 -05:00
Tejun Heo
dcfe149b9f cgroup: move namespace code to kernel/cgroup/namespace.c
get/put_css_set() get exposed in cgroup-internal.h in the process.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
2016-12-27 14:49:09 -05:00
Tejun Heo
d62beb7f3d cgroup: rename functions for consistency
Now that v1 functions are separated out, rename some functions for
consistency.

 cgroup_dfl_base_files		-> cgroup_base_files
 cgroup_legacy_base_files	-> cgroup1_base_files
 cgroup_ssid_no_v1()		-> cgroup1_ssid_disabled()
 cgroup_pidlist_destroy_all	-> cgroup1_pidlist_destroy_all()
 cgroup_release_agent()		-> cgroup1_release_agent()
 check_for_release()		-> cgroup1_check_for_release()

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
2016-12-27 14:49:08 -05:00
Tejun Heo
1592c9b223 cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
Now that the v1 mount code is split into separate functions, move them
to kernel/cgroup/cgroup-v1.c along with the mount option handling
code.  As this puts all v1-only kernfs_syscall_ops in cgroup-v1.c,
move cgroup1_kf_syscall_ops to cgroup-v1.c too.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
2016-12-27 14:49:08 -05:00
Tejun Heo
fa069904dd cgroup: separate out cgroup1_kf_syscall_ops
Currently, cgroup_kf_syscall_ops is shared by v1 and v2 and the
specific methods test the version and take different actions.  Split
out v1 functions and put them in cgroup1_kf_syscall_ops and remove the
now unnecessary explicit branches in specific methods.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
2016-12-27 14:49:07 -05:00
Tejun Heo
0a268dbd79 cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
cgroup.c is getting too unwieldy.  Let's move out cgroup v1 specific
code along with the debug controller into kernel/cgroup/cgroup-v1.c.

v2: cgroup_mutex and css_set_lock made available in cgroup-internal.h
    regardless of CONFIG_PROVE_RCU.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Acked-by: Zefan Li <lizefan@huawei.com>
2016-12-27 14:49:06 -05:00