linux

Author	SHA1	Message	Date
Tejun Heo	d55262c4d1	workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity Unbound workqueues are now NUMA aware. Let's add some control knobs and update sysfs interface accordingly. * Add kernel param workqueue.numa_disable which disables NUMA affinity globally. * Replace sysfs file "pool_id" with "pool_ids" which contain node:pool_id pairs. This change is userland-visible but "pool_id" hasn't seen a release yet, so this is okay. * Add a new sysf files "numa" which can toggle NUMA affinity on individual workqueues. This is implemented as attrs->no_numa whichn is special in that it isn't part of a pool's attributes. It only affects how apply_workqueue_attrs() picks which pools to use. After "pool_ids" change, first_pwq() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:38 -07:00
Tejun Heo	4c16bd327c	workqueue: implement NUMA affinity for unbound workqueues Currently, an unbound workqueue has single current, or first, pwq (pool_workqueue) to which all new work items are queued. This often isn't optimal on NUMA machines as workers may jump around across node boundaries and work items get assigned to workers without any regard to NUMA affinity. This patch implements NUMA affinity for unbound workqueues. Instead of mapping all entries of numa_pwq_tbl[] to the same pwq, apply_workqueue_attrs() now creates a separate pwq covering the intersecting CPUs for each NUMA node which has online CPUs in @attrs->cpumask. Nodes which don't have intersecting possible CPUs are mapped to pwqs covering whole @attrs->cpumask. As CPUs come up and go down, the pool association is changed accordingly. Changing pool association may involve allocating new pools which may fail. To avoid failing CPU_DOWN, each workqueue always keeps a default pwq which covers whole attrs->cpumask which is used as fallback if pool creation fails during a CPU hotplug operation. This ensures that all work items issued on a NUMA node is executed on the same node as long as the workqueue allows execution on the CPUs of the node. As this maps a workqueue to multiple pwqs and max_active is per-pwq, this change the behavior of max_active. The limit is now per NUMA node instead of global. While this is an actual change, max_active is already per-cpu for per-cpu workqueues and primarily used as safety mechanism rather than for active concurrency control. Concurrency is usually limited from workqueue users by the number of concurrently active work items and this change shouldn't matter much. v2: Fixed pwq freeing in apply_workqueue_attrs() error path. Spotted by Lai. v3: The previous version incorrectly made a workqueue spanning multiple nodes spread work items over all online CPUs when some of its nodes don't have any desired cpus. Reimplemented so that NUMA affinity is properly updated as CPUs go up and down. This problem was spotted by Lai Jiangshan. v4: destroy_workqueue() was putting wq->dfl_pwq and then clearing it; however, wq may be freed at any time after dfl_pwq is put making the clearing use-after-free. Clear wq->dfl_pwq before putting it. v5: apply_workqueue_attrs() was leaking @tmp_attrs, @new_attrs and @pwq_tbl after success. Fixed. Retry loop in wq_update_unbound_numa_attrs() isn't necessary as application of new attrs is excluded via CPU hotplug. Removed. Documentation on CPU affinity guarantee on CPU_DOWN added. All changes are suggested by Lai Jiangshan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:36 -07:00
Tejun Heo	dce90d47c4	workqueue: introduce put_pwq_unlocked() Factor out lock pool, put_pwq(), unlock sequence into put_pwq_unlocked(). The two existing places are converted and there will be more with NUMA affinity support. This is to prepare for NUMA affinity support for unbound workqueues and doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	1befcf3073	workqueue: introduce numa_pwq_tbl_install() Factor out pool_workqueue linking and installation into numa_pwq_tbl[] from apply_workqueue_attrs() into numa_pwq_tbl_install(). link_pwq() is made safe to call multiple times. numa_pwq_tbl_install() links the pwq, installs it into numa_pwq_tbl[] at the specified node and returns the old entry. @last_pwq is removed from link_pwq() as the return value of the new function can be used instead. This is to prepare for NUMA affinity support for unbound workqueues. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	e50aba9aea	workqueue: use NUMA-aware allocation for pool_workqueues Use kmem_cache_alloc_node() with @pool->node instead of kmem_cache_zalloc() when allocating a pool_workqueue so that it's allocated on the same node as the associated worker_pool. As there's no no kmem_cache_zalloc_node(), move zeroing to init_pwq(). This was suggested by Lai Jiangshan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	f147f29eb7	workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() Break init_and_link_pwq() into init_pwq() and link_pwq() and move unbound-workqueue specific handling into apply_workqueue_attrs(). Also, factor out unbound pool and pool_workqueue allocation into alloc_unbound_pwq(). This reorganization is to prepare for NUMA affinity and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	df2d5ae499	workqueue: map an unbound workqueues to multiple per-node pool_workqueues Currently, an unbound workqueue has only one "current" pool_workqueue associated with it. It may have multple pool_workqueues but only the first pool_workqueue servies new work items. For NUMA affinity, we want to change this so that there are multiple current pool_workqueues serving different NUMA nodes. Introduce workqueue->numa_pwq_tbl[] which is indexed by NUMA node and points to the pool_workqueue to use for each possible node. This replaces first_pwq() in __queue_work() and workqueue_congested(). numa_pwq_tbl[] is currently initialized to point to the same pool_workqueue as first_pwq() so this patch doesn't make any behavior changes. v2: Use rcu_dereference_raw() in unbound_pwq_by_node() as the function may be called only with wq->mutex held. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	2728fd2f09	workqueue: move hot fields of workqueue_struct to the end Move wq->flags and ->cpu_pwqs to the end of workqueue_struct and align them to the cacheline. These two fields are used in the work item issue path and thus hot. The scheduled NUMA affinity support will add dispatch table at the end of workqueue_struct and relocating these two fields will allow us hitting only single cacheline on hot paths. Note that wq->pwqs isn't moved although it currently is being used in the work item issue path for unbound workqueues. The dispatch table mentioned above will replace its use in the issue path, so it will become cold once NUMA support is implemented. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:35 -07:00
Tejun Heo	ecf6881ff3	workqueue: make workqueue->name[] fixed len Currently workqueue->name[] is of flexible length. We want to use the flexible field for something more useful and there isn't much benefit in allowing arbitrary name length anyway. Make it fixed len capping at 24 bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:34 -07:00
Tejun Heo	6029a91829	workqueue: add workqueue->unbound_attrs Currently, when exposing attrs of an unbound workqueue via sysfs, the workqueue_attrs of first_pwq() is used as that should equal the current state of the workqueue. The planned NUMA affinity support will make unbound workqueues make use of multiple pool_workqueues for different NUMA nodes and the above assumption will no longer hold. Introduce workqueue->unbound_attrs which records the current attrs in effect and use it for sysfs instead of first_pwq()->attrs. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:34 -07:00
Tejun Heo	f3f90ad469	workqueue: determine NUMA node of workers accourding to the allowed cpumask When worker tasks are created using kthread_create_on_node(), currently only per-cpu ones have the matching NUMA node specified. All unbound workers are always created with NUMA_NO_NODE. Now that an unbound worker pool may have an arbitrary cpumask associated with it, this isn't optimal. Add pool->node which is determined by the pool's cpumask. If the pool's cpumask is contained inside a NUMA node proper, the pool is associated with that node, and all workers of the pool are created on that node. This currently only makes difference for unbound worker pools with cpumask contained inside single NUMA node, but this will serve as foundation for making all unbound pools NUMA-affine. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:34 -07:00
Tejun Heo	e3c916a4c7	workqueue: drop 'H' from kworker names of unbound worker pools Currently, all workqueue workers which have negative nice value has 'H' postfixed to their names. This is necessary for per-cpu workers as they use the CPU number instead of pool->id to identify the pool and the 'H' postfix is the only thing distinguishing normal and highpri workers. As workers for unbound pools use pool->id, the 'H' postfix is purely informational. TASK_COMM_LEN is 16 and after the static part and delimiters, there are only five characters left for the pool and worker IDs. We're expecting to have more unbound pools with the scheduled NUMA awareness support. Let's drop the non-essential 'H' postfix from unbound kworker name. While at it, restructure kthread_create*() invocation to help future NUMA related changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:32 -07:00
Tejun Heo	bce903809a	workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] Unbound workqueues are going to be NUMA-affine. Add wq_numa_tbl_len and wq_numa_possible_cpumask[] in preparation. The former is the highest NUMA node ID + 1 and the latter is masks of possibles CPUs for each NUMA node. This patch only introduces these. Future patches will make use of them. v2: NUMA initialization move into wq_numa_init(). Also, the possible cpumask array is not created if there aren't multiple nodes on the system. wq_numa_enabled bool added. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:32 -07:00
Tejun Heo	a892cacc7f	workqueue: move pwq_pool_locking outside of get/put_unbound_pool() The scheduled NUMA affinity support for unbound workqueues would need to walk workqueues list and pool related operations on each workqueue. Move wq_pool_mutex locking out of get/put_unbound_pool() to their callers so that pool operations can be performed while walking the workqueues list, which is also protected by wq_pool_mutex. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:32 -07:00
Tejun Heo	4862125b02	workqueue: fix memory leak in apply_workqueue_attrs() apply_workqueue_attrs() wasn't freeing temp attrs variable @new_attrs in its success path. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-04-01 11:23:31 -07:00
Tejun Heo	13e2e55601	workqueue: fix unbound workqueue attrs hashing / comparison `29c91e9912` ("workqueue: implement attribute-based unbound worker_pool management") implemented attrs based worker_pool matching. It tried to avoid false negative when comparing cpumasks with custom hash function; unfortunately, the hash and comparison functions fail to ignore CPUs which are not possible. It incorrectly assumed that bitmap_copy() skips leftover bits in the last word of bitmap and cpumask_equal() ignores impossible CPUs. This patch updates attrs->cpumask handling such that impossible CPUs are properly ignored. * Hash and copy functions no longer do anything special. They expect their callers to clear impossible CPUs. * alloc_workqueue_attrs() initializes the cpumask to cpu_possible_mask instead of setting all bits and explicit cpumask_setall() for unbound_std_wq_attrs[] in init_workqueues() is dropped. * apply_workqueue_attrs() is now responsible for ignoring impossible CPUs. It makes a copy of @attrs and clears impossible CPUs before doing anything else. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-01 11:23:31 -07:00
Tejun Heo	bc0caf099d	workqueue: fix race condition in unbound workqueue free path `8864b4e59` ("workqueue: implement get/put_pwq()") implemented pwq (pool_workqueue) refcnting which frees workqueue when the last pwq goes away. It determined whether it was the last pwq by testing wq->pwqs is empty. Unfortunately, the test was done outside wq->mutex and multiple pwq release could race and try to free wq multiple times leading to oops. Test wq->pwqs emptiness while holding wq->mutex. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-04-01 11:23:31 -07:00
Lai Jiangshan	b592760547	workqueue: remove pwq_lock which is no longer used To simplify locking, the previous patches expanded wq->mutex to protect all fields of each workqueue instance including the pwqs list leaving pwq_lock without any user. Remove the unused pwq_lock. tj: Rebased on top of the current dev branch. Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-25 16:57:19 -07:00
Lai Jiangshan	a357fc0326	workqueue: protect wq->saved_max_active with wq->mutex We're expanding wq->mutex to cover all fields specific to each workqueue with the end goal of replacing pwq_lock which will make locking simpler and easier to understand. This patch makes wq->saved_max_active protected by wq->mutex instead of pwq_lock. As pwq_lock locking around pwq_adjust_mac_active() is no longer necessary, this patch also replaces pwq_lock lockings of for_each_pwq() around pwq_adjust_max_active() to wq->mutex. tj: Rebased on top of the current dev branch. Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-25 16:57:19 -07:00
Lai Jiangshan	b09f4fd39c	workqueue: protect wq->pwqs and iteration with wq->mutex We're expanding wq->mutex to cover all fields specific to each workqueue with the end goal of replacing pwq_lock which will make locking simpler and easier to understand. init_and_link_pwq() and pwq_unbound_release_workfn() already grab wq->mutex when adding or removing a pwq from wq->pwqs list. This patch makes it official that the list is wq->mutex protected for writes and updates readers accoridingly. Explicit IRQ toggles for sched-RCU read-locking in flush_workqueue_prep_pwqs() and drain_workqueues() are removed as the surrounding wq->mutex can provide sufficient synchronization. Also, assert_rcu_or_pwq_lock() is renamed to assert_rcu_or_wq_mutex() and checks for wq->mutex too. pwq_lock locking and assertion are not removed by this patch and a couple of for_each_pwq() iterations are still protected by it. They'll be removed by future patches. tj: Rebased on top of the current dev branch. Updated description. Folded in assert_rcu_or_wq_mutex() renaming from a later patch along with associated comment updates. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-25 16:57:18 -07:00
Lai Jiangshan	87fc741e94	workqueue: protect wq->nr_drainers and ->flags with wq->mutex We're expanding wq->mutex to cover all fields specific to each workqueue with the end goal of replacing pwq_lock which will make locking simpler and easier to understand. wq->nr_drainers and ->flags are specific to each workqueue. Protect ->nr_drainers and ->flags with wq->mutex instead of pool_mutex. tj: Rebased on top of the current dev branch. Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-25 16:57:18 -07:00
Lai Jiangshan	3c25a55daa	workqueue: rename wq->flush_mutex to wq->mutex Currently pwq->flush_mutex protects many fields of a workqueue including, especially, the pwqs list. We're going to expand this mutex to protect most of a workqueue and eventually replace pwq_lock, which will make locking simpler and easier to understand. Drop the "flush_" prefix in preparation. This patch is pure rename. tj: Rebased on top of the current dev branch. Updated description. Use WQ: and WR: instead of Q: and QR: for synchronization labels. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-25 16:57:17 -07:00
Lai Jiangshan	68e13a67dd	workqueue: rename wq_mutex to wq_pool_mutex wq->flush_mutex will be renamed to wq->mutex and cover all fields specific to each workqueue and eventually replace pwq_lock, which will make locking simpler and easier to understand. Rename wq_mutex to wq_pool_mutex to avoid confusion with wq->mutex. After the scheduled changes, wq_pool_mutex won't be protecting anything specific to each workqueue instance anyway. This patch is pure rename. tj: s/wqs_mutex/wq_pool_mutex/. Rewrote description. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-25 16:57:17 -07:00
Lai Jiangshan	519e3c1163	workqueue: avoid false negative in assert_manager_or_pool_lock() If lockdep complains something for other subsystem, lockdep_is_held() can be false negative, so we need to also test debug_locks before triggering WARN. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-20 11:21:34 -07:00
Lai Jiangshan	881094532e	workqueue: use rcu_read_lock_sched() instead for accessing pwq in RCU rcu_read_lock_sched() is better than preempt_disable() if the code is protected by RCU_SCHED. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-20 11:00:57 -07:00
Lai Jiangshan	951a078a52	workqueue: kick a worker in pwq_adjust_max_active() If pwq_adjust_max_active() changes max_active from 0 to saved_max_active, it needs to wakeup worker. This is already done by thaw_workqueues(). If pwq_adjust_max_active() increases max_active for an unbound wq, while not strictly necessary for correctness, it's still desirable to wake up a worker so that the requested concurrency level is reached sooner. Move wake_up_worker() call from thaw_workqueues() to pwq_adjust_max_active() so that it can handle both of the above two cases. This also makes thaw_workqueues() simpler. tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-20 10:52:30 -07:00
Lai Jiangshan	6a092dfd51	workqueue: simplify current_is_workqueue_rescuer() We can test worker->recue_wq instead of reaching into current_pwq->wq->rescuer and then comparing it to self. tj: Commit message. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-20 10:40:25 -07:00
Lai Jiangshan	12ee4fc67c	workqueue: add missing POOL_FREEZING get_unbound_pool() forgot to set POOL_FREEZING if workqueue_freezing is set and a new pool could go out of sync with the global freezing state. Fix it by adding POOL_FREEZING if workqueue_freezing. wq_mutex is already held so no further locking is necessary. This also removes the unused static variable warning when !CONFIG_FREEZER. tj: Updated commit message. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-20 10:19:09 -07:00
Tejun Heo	7dbc725e47	workqueue: restore CPU affinity of unbound workers on CPU_ONLINE With the recent addition of the custom attributes support, unbound pools may have allowed cpumask which isn't full. As long as some of CPUs in the cpumask are online, its workers will maintain cpus_allowed as set on worker creation; however, once no online CPU is left in cpus_allowed, the scheduler will reset cpus_allowed of any workers which get scheduled so that they can execute. To remain compliant to the user-specified configuration, CPU affinity needs to be restored when a CPU becomes online for an unbound pool which doesn't currently have any online CPUs before. This patch implement restore_unbound_workers_cpumask(), which is called from CPU_ONLINE for all unbound pools, checks whether the coming up CPU is the first allowed online one, and, if so, invokes set_cpus_allowed_ptr() with the configured cpumask on all workers. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-19 13:45:21 -07:00
Tejun Heo	a9ab775bca	workqueue: directly restore CPU affinity of workers from CPU_ONLINE Rebinding workers of a per-cpu pool after a CPU comes online involves a lot of back-and-forth mostly because only the task itself could adjust CPU affinity if PF_THREAD_BOUND was set. As CPU_ONLINE itself couldn't adjust affinity, it had to somehow coerce the workers themselves to perform set_cpus_allowed_ptr(). Due to the various states a worker can be in, this led to three different paths a worker may be rebound. worker->rebind_work is queued to busy workers. Idle ones are signaled by unlinking worker->entry and call idle_worker_rebind(). The manager isn't covered by either and implements its own mechanism. PF_THREAD_BOUND has been relaced with PF_NO_SETAFFINITY and CPU_ONLINE itself now can manipulate CPU affinity of workers. This patch replaces the existing rebind mechanism with direct one where CPU_ONLINE iterates over all workers using for_each_pool_worker(), restores CPU affinity, and clears WORKER_UNBOUND. There are a couple subtleties. All bound idle workers should have their runqueues set to that of the bound CPU; however, if the target task isn't running, set_cpus_allowed_ptr() just updates the cpus_allowed mask deferring the actual migration to when the task wakes up. This is worked around by waking up idle workers after restoring CPU affinity before any workers can become bound. Another subtlety is stems from matching @pool->nr_running with the number of running unbound workers. While DISASSOCIATED, all workers are unbound and nr_running is zero. As workers become bound again, nr_running needs to be adjusted accordingly; however, there is no good way to tell whether a given worker is running without poking into scheduler internals. Instead of clearing UNBOUND directly, rebind_workers() replaces UNBOUND with another new NOT_RUNNING flag - REBOUND, which will later be cleared by the workers themselves while preparing for the next round of work item execution. The only change needed for the workers is clearing REBOUND along with PREP. * This patch leaves for_each_busy_worker() without any user. Removed. * idle_worker_rebind(), busy_worker_rebind_fn(), worker->rebind_work and rebind logic in manager_workers() removed. * worker_thread() now looks at WORKER_DIE instead of testing whether @worker->entry is empty to determine whether it needs to do something special as dying is the only special thing now. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-19 13:45:21 -07:00
Tejun Heo	bd7c089eb2	workqueue: relocate rebind_workers() rebind_workers() will be reimplemented in a way which makes it mostly decoupled from the rest of worker management. Move rebind_workers() so that it's located with other CPU hotplug related functions. This patch is pure function relocation. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-19 13:45:21 -07:00
Tejun Heo	822d8405d1	workqueue: convert worker_pool->worker_ida to idr and implement for_each_pool_worker() Make worker_ida an idr - worker_idr and use it to implement for_each_pool_worker() which will be used to simplify worker rebinding on CPU_ONLINE. pool->worker_idr is protected by both pool->manager_mutex and pool->lock so that it can be iterated while holding either lock. * create_worker() allocates ID without installing worker pointer and installs the pointer later using idr_replace(). This is because worker ID is needed when creating the actual task to name it and the new worker shouldn't be visible to iterations before fully initialized. * In destroy_worker(), ID removal is moved before kthread_stop(). This is again to guarantee that only fully working workers are visible to for_each_pool_worker(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-19 13:45:21 -07:00
Tejun Heo	14a40ffccd	sched: replace PF_THREAD_BOUND with PF_NO_SETAFFINITY PF_THREAD_BOUND was originally used to mark kernel threads which were bound to a specific CPU using kthread_bind() and a task with the flag set allows cpus_allowed modifications only to itself. Workqueue is currently abusing it to prevent userland from meddling with cpus_allowed of workqueue workers. What we need is a flag to prevent userland from messing with cpus_allowed of certain kernel tasks. In kernel, anyone can (incorrectly) squash the flag, and, for worker-type usages, restricting cpus_allowed modification to the task itself doesn't provide meaningful extra proection as other tasks can inject work items to the task anyway. This patch replaces PF_THREAD_BOUND with PF_NO_SETAFFINITY. sched_setaffinity() checks the flag and return -EINVAL if set. set_cpus_allowed_ptr() is no longer affected by the flag. This will allow simplifying workqueue worker CPU affinity management. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>	2013-03-19 13:45:20 -07:00
Tejun Heo	2e109a2855	workqueue: rename workqueue_lock to wq_mayday_lock With the recent locking updates, the only thing protected by workqueue_lock is workqueue->maydays list. Rename workqueue_lock to wq_mayday_lock. This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	794b18bc8a	workqueue: separate out pool_workqueue locking into pwq_lock This patch continues locking cleanup from the previous patch. It breaks out pool_workqueue synchronization from workqueue_lock into a new spinlock - pwq_lock. The followings are protected by pwq_lock. * workqueue->pwqs * workqueue->saved_max_active The conversion is straight-forward. workqueue_lock usages which cover the above two are converted to pwq_lock. New locking label PW added for things protected by pwq_lock and FR is updated to mean flush_mutex + pwq_lock + sched-RCU. This patch shouldn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	5bcab3355a	workqueue: separate out pool and workqueue locking into wq_mutex Currently, workqueue_lock protects most shared workqueue resources - the pools, workqueues, pool_workqueues, draining, ID assignments, mayday handling and so on. The coverage has grown organically and there is no identified bottleneck coming from workqueue_lock, but it has grown a bit too much and scheduled rebinding changes need the pools and workqueues to be protected by a mutex instead of a spinlock. This patch breaks out pool and workqueue synchronization from workqueue_lock into a new mutex - wq_mutex. The followings are protected by wq_mutex. * worker_pool_idr and unbound_pool_hash * pool->refcnt * workqueues list * workqueue->flags, ->nr_drainers Most changes are mostly straight-forward. workqueue_lock is replaced with wq_mutex where applicable and workqueue_lock lock/unlocks are added where wq_mutex conversion leaves data structures not protected by wq_mutex without locking. irq / preemption flippings were added where the conversion affects them. Things worth noting are * New WQ and WR locking lables added along with assert_rcu_or_wq_mutex(). * worker_pool_assign_id() now expects to be called under wq_mutex. * create_mutex is removed from get_unbound_pool(). It now just holds wq_mutex. This patch shouldn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	7d19c5ce66	workqueue: relocate global variable defs and function decls in workqueue.c They're split across debugobj code for some reason. Collect them. This patch is pure relocation. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:40 -07:00
Tejun Heo	cd549687a7	workqueue: better define locking rules around worker creation / destruction When a manager creates or destroys workers, the operations are always done with the manager_mutex held; however, initial worker creation or worker destruction during pool release don't grab the mutex. They are still correct as initial worker creation doesn't require synchronization and grabbing manager_arb provides enough exclusion for pool release path. Still, let's make everyone follow the same rules for consistency and such that lockdep annotations can be added. Update create_and_start_worker() and put_unbound_pool() to grab manager_mutex around thread creation and destruction respectively and add lockdep assertions to create_worker() and destroy_worker(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:39 -07:00
Tejun Heo	ebf44d16ec	workqueue: factor out initial worker creation into create_and_start_worker() get_unbound_pool(), workqueue_cpu_up_callback() and init_workqueues() have similar code pieces to create and start the initial worker factor those out into create_and_start_worker(). This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:39 -07:00
Tejun Heo	bc3a1afc92	workqueue: rename worker_pool->assoc_mutex to ->manager_mutex Manager operations are currently governed by two mutexes - pool->manager_arb and ->assoc_mutex. The former is used to decide who gets to be the manager and the latter to exclude the actual manager operations including creation and destruction of workers. Anyone who grabs ->manager_arb must perform manager role; otherwise, the pool might stall. Grabbing ->assoc_mutex blocks everyone else from performing manager operations but doesn't require the holder to perform manager duties as it's merely blocking manager operations without becoming the manager. Because the blocking was necessary when [dis]associating per-cpu workqueues during CPU hotplug events, the latter was named assoc_mutex. The mutex is scheduled to be used for other purposes, so this patch gives it a more fitting generic name - manager_mutex - and updates / adds comments to explain synchronization around the manager role and operations. This patch is pure rename / doc update. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 19:47:39 -07:00
Tejun Heo	8425e3d5bd	workqueue: inline trivial wrappers There's no reason to make these trivial wrappers full (exported) functions. Inline the followings. queue_work() queue_delayed_work() mod_delayed_work() schedule_work_on() schedule_work() schedule_delayed_work_on() schedule_delayed_work() keventd_up() Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:36 -07:00
Tejun Heo	611c92a020	workqueue: rename @id to @pi in for_each_each_pool() Rename @id argument of for_each_pool() to @pi so that it doesn't get reused accidentally when for_each_pool() is used in combination with other iterators. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:36 -07:00
Tejun Heo	c5aa87bbf4	workqueue: update comments and a warning message * Update incorrect and add missing synchronization labels. * Update incorrect or misleading comments. Add new comments where clarification is necessary. Reformat / rephrase some comments. * drain_workqueue() can be used separately from destroy_workqueue() but its warning message was incorrectly referring to destruction. Other than the warning message change, this patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:36 -07:00
Tejun Heo	983ca25e73	workqueue: fix max_active handling in init_and_link_pwq() Since `9e8cd2f589` ("workqueue: implement apply_workqueue_attrs()"), init_and_link_pwq() may be called to initialize a new pool_workqueue for a workqueue which is already online, but the function was setting pwq->max_active to wq->saved_max_active without proper synchronization. Fix it by calling pwq_adjust_max_active() under proper locking instead of manually setting max_active. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:35 -07:00
Tejun Heo	699ce097ef	workqueue: implement and use pwq_adjust_max_active() Rename pwq_set_max_active() to pwq_adjust_max_active() and move pool_workqueue->max_active synchronization and max_active determination logic into it. The new function should be called with workqueue_lock held for stable workqueue->saved_max_active, determines the current max_active value the target pool_workqueue should be using from @wq->saved_max_active and the state of the associated pool, and applies it with proper synchronization. The current two users - workqueue_set_max_active() and thaw_workqueues() - are updated accordingly. In addition, the manual freezing handling in __alloc_workqueue_key() and freeze_workqueues_begin() are replaced with calls to pwq_adjust_max_active(). This centralizes max_active handling so that it's less error-prone. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:35 -07:00
Tejun Heo	0fbd95aa8a	workqueue: relocate pwq_set_max_active() pwq_set_max_active() is gonna be modified and used during pool_workqueue init. Move it above init_and_link_pwq(). This patch is pure code reorganization and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-13 16:51:35 -07:00
Tejun Heo	e626761691	workqueue: implement current_is_workqueue_rescuer() Implement a function which queries whether it currently is running off a workqueue rescuer. This will be used to convert writeback to workqueue. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 17:42:01 -07:00
Tejun Heo	226223ab3c	workqueue: implement sysfs interface for workqueues There are cases where workqueue users want to expose control knobs to userland. e.g. Unbound workqueues with custom attributes are scheduled to be used for writeback workers and depending on configuration it can be useful to allow admins to tinker with the priority or allowed CPUs. This patch implements workqueue_sysfs_register(), which makes the workqueue visible under /sys/bus/workqueue/devices/WQ_NAME. There currently are two attributes common to both per-cpu and unbound pools and extra attributes for unbound pools including nice level and cpumask. If alloc_workqueue*() is called with WQ_SYSFS, workqueue_sysfs_register() is called automatically as part of workqueue creation. This is the preferred method unless the workqueue user wants to apply workqueue_attrs before making the workqueue visible to userland. v2: Disallow exposing ordered workqueues as ordered workqueues can't be tuned in any way. Signed-off-by: Tejun Heo <tj@kernel.org>	2013-03-12 11:37:07 -07:00
Tejun Heo	8719dceae2	workqueue: reject adjusting max_active or applying attrs to ordered workqueues Adjusting max_active of or applying new workqueue_attrs to an ordered workqueue breaks its ordering guarantee. The former is obvious. The latter is because applying attrs creates a new pwq (pool_workqueue) and there is no ordering constraint between the old and new pwqs. Make apply_workqueue_attrs() and workqueue_set_max_active() trigger WARN_ON() if those operations are requested on an ordered workqueue and fail / ignore respectively. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00
Tejun Heo	618b01eb42	workqueue: make it clear that WQ_DRAINING is an internal flag We're gonna add another internal WQ flag. Let's make the distinction clear. Prefix WQ_DRAINING with __ and move it to bit 16. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>	2013-03-12 11:30:04 -07:00

1 2 3 4 5 ...

15156 Commits