linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 22:21:42 +00:00

Author	SHA1	Message	Date
Paul E. McKenney	dd7dafd1ad	rcu: Make kernel-mode nohz_full CPUs invoke the RCU core processing If a nohz_full CPU is idle or executing in userspace, it makes good sense to keep it out of RCU core processing. After all, the RCU grace-period kthread can see its quiescent states and all of its callbacks are offloaded, so there is nothing for RCU core processing to do. However, if a nohz_full CPU is executing in kernel space, the RCU grace-period kthread cannot do anything for it, so such a CPU must report its own quiescent states. This commit therefore makes nohz_full CPUs skip RCU core processing only if the scheduler-clock interrupt caught them in idle or in userspace. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-28 07:02:21 -07:00
Paul E. McKenney	ed93dfc6bc	rcu: Confine ->core_needs_qs accesses to the corresponding CPU Commit `671a63517c` ("rcu: Avoid unnecessary softirq when system is idle") fixed a bug that could result in an indefinite number of unnecessary invocations of the RCU_SOFTIRQ handler at the trailing edge of a scheduler-clock interrupt. However, the fix introduced off-CPU stores to ->core_needs_qs. These writes did not conflict with the on-CPU stores because the CPU's leaf rcu_node structure's ->lock was held across all such stores. However, the loads from ->core_needs_qs were not promoted to READ_ONCE() and, worse yet, the code loading from ->core_needs_qs was written assuming that it was only ever updated by the corresponding CPU. So operation has been robust, but only by luck. This situation is therefore an accident waiting to happen. This commit therefore takes a different approach. Instead of clearing ->core_needs_qs from the grace-period kthread's force-quiescent-state processing, it modifies the rcu_pending() function to suppress the rcu_sched_clock_irq() function's call to invoke_rcu_core() if there is no grace period in progress. This avoids the infinite needless RCU_SOFTIRQ handlers while still keeping all accesses to ->core_needs_qs local to the corresponding CPU. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-28 07:02:21 -07:00
Joel Fernandes (Google)	516e5ae0c9	rcu: Reset CPU hints when reporting a quiescent state In some cases, tracing shows that need_heavy_qs is still set even though urgent_qs was cleared upon reporting of a quiescent state. One such case is when the softirq reports that a CPU has passed quiescent state. Commit `671a63517c` ("rcu: Avoid unnecessary softirq when system is idle") fixed a bug where core_needs_qs was not being cleared. In order to avoid running into similar situations with the urgent-grace-period flags, this commit causes rcu_disable_urgency_upon_qs(), previously rcu_disable_tick_upon_qs(), to clear the urgency hints, ->rcu_urgent_qs and ->rcu_need_heavy_qs. Note that it is possible for CPUs to go offline with these urgency hints still set. This is handled because rcu_disable_urgency_upon_qs() is also invoked during the online process. Because these hints can be cleared both by the corresponding CPU and by the grace-period kthread, this commit also adds a number of READ_ONCE() and WRITE_ONCE() calls. Tested overnight with rcutorture running for 60 minutes on all configurations of RCU. Signed-off-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> [ paulmck: Clear urgency flags in rcu_disable_urgency_upon_qs(). ] [ paulmck: Remove ->core_needs_qs from the set cleared at quiescent state. ] [ paulmck: Make rcu_disable_urgency_upon_qs static per kbuild test robot. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-28 07:02:21 -07:00
Paul E. McKenney	b200a04895	rcu: Force nohz_full tick on upon irq enter instead of exit There is interrupt-exit code that forces on the tick for nohz_full CPUs failing to respond to the current grace period in a timely fashion. However, this code must compare ->dynticks_nmi_nesting to the value 2 in the interrupt-exit fastpath. This commit therefore moves this code to the interrupt-entry fastpath, where a lighter-weight comparison to zero may be used. Reported-by: Joel Fernandes <joel@joelfernandes.org> [ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-28 07:02:21 -07:00
Paul E. McKenney	66e4c33b51	rcu: Force tick on for nohz_full CPUs not reaching quiescent states CPUs running for long time periods in the kernel in nohz_full mode might leave the scheduling-clock interrupt disabled for then full duration of their in-kernel execution. This can (among other things) delay grace periods. This commit therefore forces the tick back on for any nohz_full CPU that is failing to pass through a quiescent state upon return from interrupt, which the resched_cpu() will induce. Reported-by: Joel Fernandes <joel@joelfernandes.org> [ paulmck: Clear ->rcu_forced_tick as reported by Joel Fernandes testing. ] [ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-28 07:02:21 -07:00
Paul E. McKenney	79ba7ff5a9	rcutorture: Emulate dyntick aspect of userspace nohz_full sojourn During an actual call_rcu() flood, there would be frequent trips to userspace (in-kernel call_rcu() floods must be otherwise housebroken). Userspace execution on nohz_full CPUs implies an RCU dyntick idle/not-idle transition pair, so this commit adds emulation of that pair. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-05 10:46:05 -07:00
Paul E. McKenney	96926686de	rcu: Make CPU-hotplug removal operations enable tick CPU-hotplug removal operations run the multi_cpu_stop() function, which relies on the scheduler to gain control from whatever is running on the various online CPUs, including any nohz_full CPUs running long loops in kernel-mode code. Lack of the scheduler-clock interrupt on such CPUs can delay multi_cpu_stop() for several minutes and can also result in RCU CPU stall warnings. This commit therefore causes CPU-hotplug removal operations to enable the scheduler-clock interrupt on all online CPUs. [ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ] [ paulmck: Apply simplifications suggested by Frederic Weisbecker. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-05 10:46:05 -07:00
Paul E. McKenney	366237e7b0	stop_machine: Provide RCU quiescent state in multi_cpu_stop() When multi_cpu_stop() loops waiting for other tasks, it can trigger an RCU CPU stall warning. This can be misleading because what is instead needed is information on whatever task is blocking multi_cpu_stop(). This commit therefore inserts an RCU quiescent state into the multi_cpu_stop() function's waitloop. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-05 10:46:05 -07:00
Paul E. McKenney	6a949b7af8	rcu: Force on tick when invoking lots of callbacks Callback invocation can run for a significant time period, and within CONFIG_NO_HZ_FULL=y kernels, this period will be devoid of scheduler-clock interrupts. In-kernel execution without such interrupts can cause all manner of malfunction, with RCU CPU stall warnings being but one result. This commit therefore forces scheduling-clock interrupts on whenever more than a few RCU callbacks are invoked. Because offloaded callback invocation can be preempted, this forcing is withdrawn on each context switch. This in turn requires that the loop invoking RCU callbacks reiterate the forcing periodically. [ paulmck: Apply Joel Fernandes TICK_DEP_MASK_RCU->TICK_DEP_BIT_RCU fix. ] [ paulmck: Remove NO_HZ_FULL check per Frederic Weisbecker feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2019-10-05 10:46:04 -07:00
Linus Torvalds	7e67a85999	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann, Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers. As perf and the scheduler is getting bigger and more complex, document the status quo of current responsibilities and interests, and spread the review pain^H^H^H^H fun via an increase in the Cc: linecount generated by scripts/get_maintainer.pl. :-) - Add another series of patches that brings the -rt (PREEMPT_RT) tree closer to mainline: split the monolithic CONFIG_PREEMPT dependencies into a new CONFIG_PREEMPTION category that will allow the eventual introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches to go though. - Extend the CPU cgroup controller with uclamp.min and uclamp.max to allow the finer shaping of CPU bandwidth usage. - Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS). - Improve the behavior of high CPU count, high thread count applications running under cpu.cfs_quota_us constraints. - Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present. - Improve CPU isolation housekeeping CPU allocation NUMA locality. - Fix deadline scheduler bandwidth calculations and logic when cpusets rebuilds the topology, or when it gets deadline-throttled while it's being offlined. - Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from setscheduler() system calls without creating global serialization. Add new synchronization between cpuset topology-changing events and the deadline acceptance tests in setscheduler(), which were broken before. - Rework the active_mm state machine to be less confusing and more optimal. - Rework (simplify) the pick_next_task() slowpath. - Improve load-balancing on AMD EPYC systems. - ... and misc cleanups, smaller fixes and improvements - please see the Git log for more details. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) sched/psi: Correct overly pessimistic size calculation sched/fair: Speed-up energy-aware wake-ups sched/uclamp: Always use 'enum uclamp_id' for clamp_id values sched/uclamp: Update CPU's refcount on TG's clamp changes sched/uclamp: Use TG's clamps to restrict TASK's clamps sched/uclamp: Propagate system defaults to the root group sched/uclamp: Propagate parent clamps sched/uclamp: Extend CPU's cgroup controller sched/topology: Improve load balancing on AMD EPYC systems arch, ia64: Make NUMA select SMP sched, perf: MAINTAINERS update, add submaintainers and reviewers sched/fair: Use rq_lock/unlock in online_fair_sched_group cpufreq: schedutil: fix equation in comment sched: Rework pick_next_task() slow-path sched: Allow put_prev_task() to drop rq->lock sched/fair: Expose newidle_balance() sched: Add task_struct pointer to sched_class::set_curr_task sched: Rework CPU hotplug task selection sched/{rt,deadline}: Fix set_next_task vs pick_next_task sched: Fix kerneldoc comment for ia64_set_curr_task ...	2019-09-16 17:25:49 -07:00
Ingo Molnar	563c4f85f9	Merge branch 'sched/rt' into sched/core, to pick up -rt changes Pick up the first couple of patches working towards PREEMPT_RT. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-09-16 14:05:04 +02:00
Eric Dumazet	cfcdef5e30	rcu: Allow rcu_do_batch() to dynamically adjust batch sizes Bimodal behavior of rcu_do_batch() is not really suited to Google applications like gfe servers. When a process with millions of sockets exits, closing all files queues two rcu callbacks per socket. This eventually reaches the point where RCU enters an emergency mode, where rcu_do_batch() do not return until whole queue is flushed. Each rcu callback lasts at least 70 nsec, so with millions of elements, we easily spend more than 100 msec without rescheduling. Goal of this patch is to avoid the infamous message like following "need_resched set for > 51999388 ns (52 ticks) without schedule" We dynamically adjust the number of elements we process, instead of 10 / INFINITE choices, we use a floor of ~1 % of current entries. If the number is above 1000, we switch to a time based limit of 3 msec per batch, adjustable with /sys/module/rcutree/parameters/rcu_resched_ns Signed-off-by: Eric Dumazet <edumazet@google.com> [ paulmck: Forward-port and remove debug statements. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:38:24 -07:00
Paul E. McKenney	23651d9b96	rcu/nocb: Advance CBs after merge in rcutree_migrate_callbacks() The rcutree_migrate_callbacks() invokes rcu_advance_cbs() on both the offlined CPU's ->cblist and that of the surviving CPU, then merges them. However, after the merge, and of the offlined CPU's callbacks that were not ready to be invoked will no longer be associated with a grace-period number. This commit therefore invokes rcu_advance_cbs() one more time on the merged ->cblist in order to assign a grace-period number to these callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:38:24 -07:00
Paul E. McKenney	d1b222c6be	rcu/nocb: Add bypass callback queueing Use of the rcu_data structure's segmented ->cblist for no-CBs CPUs takes advantage of unrelated grace periods, thus reducing the memory footprint in the face of floods of call_rcu() invocations. However, the ->cblist field is a more-complex rcu_segcblist structure which must be protected via locking. Even though there are only three entities which can acquire this lock (the CPU invoking call_rcu(), the no-CBs grace-period kthread, and the no-CBs callbacks kthread), the contention on this lock is excessive under heavy stress. This commit therefore greatly reduces contention by provisioning an rcu_cblist structure field named ->nocb_bypass within the rcu_data structure. Each no-CBs CPU is permitted only a limited number of enqueues onto the ->cblist per jiffy, controlled by a new nocb_nobypass_lim_per_jiffy kernel boot parameter that defaults to about 16 enqueues per millisecond (16 * 1000 / HZ). When that limit is exceeded, the CPU instead enqueues onto the new ->nocb_bypass. The ->nocb_bypass is flushed into the ->cblist every jiffy or when the number of callbacks on ->nocb_bypass exceeds qhimark, whichever happens first. During call_rcu() floods, this flushing is carried out by the CPU during the course of its call_rcu() invocations. However, a CPU could simply stop invoking call_rcu() at any time. The no-CBs grace-period kthread therefore carries out less-aggressive flushing (every few jiffies or when the number of callbacks on ->nocb_bypass exceeds (2 * qhimark), whichever comes first). This means that the no-CBs grace-period kthread cannot be permitted to do unbounded waits while there are callbacks on ->nocb_bypass. A ->nocb_bypass_timer is used to provide the needed wakeups. [ paulmck: Apply Coverity feedback reported by Colin Ian King. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:37:32 -07:00
Paul E. McKenney	6608c3a027	rcu/nocb: Reduce contention at no-CBs registry-time CB advancement Currently, __call_rcu_nocb_wake() conditionally acquires the leaf rcu_node structure's ->lock, and only afterwards does rcu_advance_cbs_nowake() check to see if it is possible to advance callbacks without potentially needing to awaken the grace-period kthread. Given that the no-awaken check can be done locklessly, this commit reverses the order, so that rcu_advance_cbs_nowake() is invoked without holding the leaf rcu_node structure's ->lock and rcu_advance_cbs_nowake() checks the grace-period state before conditionally acquiring that lock, thus reducing the number of needless acquistions of the leaf rcu_node structure's ->lock. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	7f36ef82e5	rcu/nocb: Avoid needless wakeups of no-CBs grace-period kthread Currently, the code provides an extra wakeup for the no-CBs grace-period kthread if one of its CPUs is generating excessive numbers of callbacks. But satisfying though it is to wake something up when things are going south, unless the thing being awakened can actually help solve the problem, that extra wakeup does nothing but consume additional CPU time, which is exactly what you don't want during a call_rcu() flood. This commit therefore avoids doing anything if the corresponding no-CBs callback kthread is going full tilt. Otherwise, if advancing callbacks immediately might help and if the leaf rcu_node structure's lock is immediately available, this commit invokes a new variant of rcu_advance_cbs() that advances callbacks only if doing so won't require awakening the grace-period kthread (not to be confused with any of the no-CBs grace-period kthreads). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	921bb5fad1	rcu/nocb: Use build-time no-CBs check in rcu_pending() Currently, rcu_pending() invokes rcu_segcblist_is_offloaded() even in CONFIG_RCU_NOCB_CPU=n kernels, which cannot possibly be offloaded. Given that rcu_pending() is on a fastpath, it makes sense to check for CONFIG_RCU_NOCB_CPU=y before invoking rcu_segcblist_is_offloaded(). This commit therefore makes this change. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	c1ab99d66e	rcu/nocb: Use build-time no-CBs check in rcu_core() Currently, rcu_core() invokes rcu_segcblist_is_offloaded() each time it needs to know whether the current CPU is a no-CBs CPU. Given that it is not possible to change the no-CBs status of a CPU after boot, and given that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n kernels, this repeated runtime invocation wastes CPU. This commit therefore created a const on-stack variable to allow this check to be done only once per rcu_core() invocation. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	ec5ef87bac	rcu/nocb: Use build-time no-CBs check in rcu_do_batch() Currently, rcu_do_batch() invokes rcu_segcblist_is_offloaded() each time it needs to know whether the current CPU is a no-CBs CPU. Given that it is not possible to change the no-CBs status of a CPU after boot, and given that it is not possible to even have no-CBs CPUs in CONFIG_RCU_NOCB_CPU=n kernels, this per-callback invocation wastes CPU. This commit therefore created a const on-stack variable to allow this check to be done only once per rcu_do_batch() invocation. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	c035280f17	rcu/nocb: Remove obsolete nocb_q_count and nocb_q_count_lazy fields This commit removes the obsolete nocb_q_count and nocb_q_count_lazy fields, also removing rcu_get_n_cbs_nocb_cpu(), adjusting rcu_get_n_cbs_cpu(), and making rcutree_migrate_callbacks() once again disable the ->cblist fields of offline CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	5d6742b377	rcu/nocb: Use rcu_segcblist for no-CBs CPUs Currently the RCU callbacks for no-CBs CPUs are queued on a series of ad-hoc linked lists, which means that these callbacks cannot benefit from "drive-by" grace periods, thus suffering needless delays prior to invocation. In addition, the no-CBs grace-period kthreads first wait for callbacks to appear and later wait for a new grace period, which means that callbacks appearing during a grace-period wait can be delayed. These delays increase memory footprint, and could even result in an out-of-memory condition. This commit therefore enqueues RCU callbacks from no-CBs CPUs on the rcu_segcblist structure that is already used by non-no-CBs CPUs. It also restructures the no-CBs grace-period kthread to be checking for incoming callbacks while waiting for grace periods. Also, instead of waiting for a new grace period, it waits for the closest grace period that will cause some of the callbacks to be safe to invoke. All of these changes reduce callback latency and thus the number of outstanding callbacks, in turn reducing the probability of an out-of-memory condition. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	e83e73f5b0	rcu/nocb: Leave ->cblist enabled for no-CBs CPUs As a first step towards making no-CBs CPUs use the ->cblist, this commit leaves the ->cblist enabled for these CPUs. The main reason to make no-CBs CPUs use ->cblist is to take advantage of callback numbering, which will reduce the effects of missed grace periods which in turn will reduce forward-progress problems for no-CBs CPUs. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	85f69b3212	rcu/nocb: Check for deferred nocb wakeups before nohz_full early exit In theory, a timer is used to defer wakeups of no-CBs grace-period kthreads when the wakeup cannot be done safely directly from the call_rcu(). In practice, the one-jiffy delay is not always consistent with timely callback invocation under heavy call_rcu() loads. Therefore, there are a number of checks for a pending deferred wakeup, including from the scheduling-clock interrupt. Unfortunately, this check follows the rcu_nohz_full_cpu() early exit, which renders it useless on such CPUs. This commit therefore moves the check for the pending deferred no-CB wakeup to precede the rcu_nohz_full_cpu() early exit. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	c00045be32	rcu/nocb: Make rcutree_migrate_callbacks() start at leaf rcu_node structure Because rcutree_migrate_callbacks() is invoked infrequently and because an exact snapshot of the grace-period state might save some callbacks a second trip through a grace period, this function has used the root rcu_node structure. However, this safe-second-trip optimization happens only if rcutree_migrate_callbacks() races with grace-period initialization, so it is not worth the added mental load. This commit therefore makes rcutree_migrate_callbacks() start with the leaf rcu_node structures, as is done elsewhere. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	750d7f6a43	rcu/nocb: Add checks for offloaded callback processing This commit is a preparatory patch for offloaded callbacks using the same ->cblist structure used by non-offloaded callbacks. It therefore adds rcu_segcblist_is_offloaded() calls where they will be needed when !rcu_segcblist_is_enabled() no longer flags the offloaded case. It also adds checks in rcu_do_batch() to ensure that there are no missed checks: Currently, it should not be possible for offloaded execution to reach rcu_do_batch(), though this will change later in this series. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Paul E. McKenney	ce5215c134	rcu/nocb: Use separate flag to indicate offloaded ->cblist RCU callback processing currently uses rcu_is_nocb_cpu() to determine whether or not the current CPU's callbacks are to be offloaded. This works, but it is not so good for cache locality. Plus use of ->cblist for offloaded callbacks will greatly increase the frequency of these checks. This commit therefore adds a ->offloaded flag to the rcu_segcblist structure to provide a more flexible and cache-friendly means of checking for callback offloading. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-08-13 14:35:49 -07:00
Peter Zijlstra	130d9c331b	rcu/tree: Fix SCHED_FIFO params A rather embarrasing mistake had us call sched_setscheduler() before initializing the parameters passed to it. Fixes: `1a763fd7c6` ("rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Juri Lelli <juri.lelli@redhat.com>	2019-08-08 09:09:30 +02:00
Thomas Gleixner	01b1d88b09	rcu: Use CONFIG_PREEMPTION CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT. Both PREEMPT and PREEMPT_RT require the same functionality which today depends on CONFIG_PREEMPT. Switch the conditionals in RCU to use CONFIG_PREEMPTION. That's the first step towards RCU on RT. The further tweaks are work in progress. This neither touches the selftest bits which need a closer look by Paul. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20190726212124.210156346@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-07-31 19:03:35 +02:00
Juri Lelli	1a763fd7c6	rcu/tree: Call setschedule() gp ktread to SCHED_FIFO outside of atomic region sched_setscheduler() needs to acquire cpuset_rwsem, but it is currently called from an invalid (atomic) context by rcu_spawn_gp_kthread(). Fix that by simply moving sched_setscheduler_nocheck() call outside of the atomic region, as it doesn't actually require to be guarded by rcu_node lock. Suggested-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bristot@redhat.com Cc: claudio@evidence.eu.com Cc: lizefan@huawei.com Cc: longman@redhat.com Cc: luca.abeni@santannapisa.it Cc: mathieu.poirier@linaro.org Cc: rostedt@goodmis.org Cc: tj@kernel.org Cc: tommaso.cucinotta@santannapisa.it Link: https://lkml.kernel.org/r/20190719140000.31694-8-juri.lelli@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2019-07-25 15:55:03 +02:00
Paul E. McKenney	11ca7a9d54	Merge branches 'consolidate.2019.05.28a', 'doc.2019.05.28a', 'fixes.2019.06.13a', 'srcu.2019.05.28a', 'sync.2019.05.28a' and 'torture.2019.05.28a' into HEAD consolidate.2019.05.28a: RCU flavor consolidation cleanups and optmizations. doc.2019.05.28a: Documentation updates. fixes.2019.06.13a: Miscellaneous fixes. srcu.2019.05.28a: SRCU updates. sync.2019.05.28a: RCU-sync flavor consolidation. torture.2019.05.28a: Torture-test updates.	2019-06-19 09:21:46 -07:00
Paul E. McKenney	d5a9a8c3bc	rcu: Set a maximum limit for back-to-back callback invocation Currently, if a CPU has more than 10,000 callbacks pending, it will increase rdp->blimit to LONG_MAX. If you are lucky, LONG_MAX is only about two billion, but this is still a bit too many callbacks to invoke back-to-back while otherwise ignoring the world. This commit therefore sets a maximum limit of DEFAULT_MAX_RCU_BLIMIT, which is set to 10,000, for rdp->blimit. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 09:02:57 -07:00
Joel Fernandes (Google)	eddded8012	rcu: Add checks for dynticks counters in rcu_is_cpu_rrupt_from_idle() It would be good to combine the dynticks and dynticks_nesting counters in order to simplify the code. Unfortunately, there are concerns about usermode upcalls appearing to RCU as half of an interrupt, as Byungchul learned [1]. The "half" in "half interrupt" is due to an unpaired rcu_irq_enter(): Normally, each rcu_irq_enter() has a later call to rcu_irq_exit(). Out of an abundance of caution, Paul added warnings [2] in the RCU code which if not fired by 2021 will be interpreted as meaning that this half-interrupt scenario cannot happen any more, thus permitting simplification of this code. In the meantime, this commit makes the following changes: (1) Combining these two counters requires that rcu_rrupt_from_idle() is invoked only from hard-interrupt contexts as discussed here [3]. This commit therefore adds the required lockdep_assert_in_irq() to check this constraint. (2) Furthermore, rcu_rrupt_from_idle() is not explicit about how it is using the counters which can lead to weird future bugs. This commit therefore adds comments indicating the meaning and use of each counter. (3) Lastly, this commit checks for counter underflows as another check that half interrupts don't occur. (Previously, the function would simply return true upon underflow.) All these checks checks are NOOPs if PROVE_LOCKING (and thus PROVE_RCU) are disabled. [1] https://lore.kernel.org/patchwork/patch/952349/ [2] Commit `e11ec65cc8` ("rcu: Add warning to detect half-interrupts") [3] https://lore.kernel.org/lkml/20190312150514.GB249405@google.com/ Cc: byungchul.park@lge.com Cc: kernel-team@android.com Cc: rcu@vger.kernel.org Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-28 08:48:19 -07:00
Paul E. McKenney	43e903ad3e	rcu: Inline invoke_rcu_callbacks() into its sole remaining caller This commit saves a few lines of code by inlining invoke_rcu_callbacks() into its sole remaining caller. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:49 -07:00
Sebastian Andrzej Siewior	48d07c04b4	rcu: Enable elimination of Tree-RCU softirq processing Some workloads need to change kthread priority for RCU core processing without affecting other softirq work. This commit therefore introduces the rcutree.use_softirq kernel boot parameter, which moves the RCU core work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing to from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Note that rcutree.use_softirq=0 must be specified to move RCU core processing to the rcuc kthreads: rcutree.use_softirq=1 is the default. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked from RCU core processing, in contrast to softirq->rcuc transition in old mainline RCU priority boosting. ] [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock() while holding rq or pi locks, also possibly fixing a pre-existing latent bug involving raise_softirq()-induced wakeups. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-05-25 14:50:46 -07:00
Linus Torvalds	d2d8b14604	The major changes in this tracing update includes: - Removing of non-DYNAMIC_FTRACE from 32bit x86 - Removing of mcount support from x86 - Emulating a call from int3 on x86_64, fixes live kernel patching - Consolidated Tracing Error logs file Minor updates: - Removal of klp_check_compiler_support() - kdb ftrace dumping output changes - Accessing and creating ftrace instances from inside the kernel - Clean up of #define if macro - Introduction of TRACE_EVENT_NOP() to disable trace events based on config options And other minor fixes and clean ups -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw= =RVQo -----END PGP SIGNATURE----- Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The major changes in this tracing update includes: - Removal of non-DYNAMIC_FTRACE from 32bit x86 - Removal of mcount support from x86 - Emulating a call from int3 on x86_64, fixes live kernel patching - Consolidated Tracing Error logs file Minor updates: - Removal of klp_check_compiler_support() - kdb ftrace dumping output changes - Accessing and creating ftrace instances from inside the kernel - Clean up of #define if macro - Introduction of TRACE_EVENT_NOP() to disable trace events based on config options And other minor fixes and clean ups" * tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits) x86: Hide the int3_emulate_call/jmp functions from UML livepatch: Remove klp_check_compiler_support() ftrace/x86: Remove mcount support ftrace/x86_32: Remove support for non DYNAMIC_FTRACE tracing: Simplify "if" macro code tracing: Fix documentation about disabling options using trace_options tracing: Replace kzalloc with kcalloc tracing: Fix partial reading of trace event's id file tracing: Allow RCU to run between postponed startup tests tracing: Fix white space issues in parse_pred() function tracing: Eliminate const char[] auto variables ring-buffer: Fix mispelling of Calculate tracing: probeevent: Fix to make the type of $comm string tracing: probeevent: Do not accumulate on ret variable tracing: uprobes: Re-enable $comm support for uprobe events ftrace/x86_64: Emulate call function while updating in breakpoint handler x86_64: Allow breakpoints to emulate call instructions x86_64: Add gap to int3 to allow for call emulation tracing: kdb: Allow ftdump to skip all but the last few entries tracing: Add trace_total_entries() / trace_total_entries_cpu() ...	2019-05-15 16:05:47 -07:00
Linus Torvalds	0968621917	Printk changes for 5.2 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY= =WZfC -----END PGP SIGNATURE----- Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Allow state reset of printk_once() calls. - Prevent crashes when dereferencing invalid pointers in vsprintf(). Only the first byte is checked for simplicity. - Make vsprintf warnings consistent and inlined. - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf modifiers. - Some clean up of vsprintf and test_printf code. * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: lib/vsprintf: Make function pointer_string static vsprintf: Limit the length of inlined error messages vsprintf: Avoid confusion between invalid address and value vsprintf: Prevent crash when dereferencing invalid pointers vsprintf: Consolidate handling of unknown pointer specifiers vsprintf: Factor out %pO handler as kobject_string() vsprintf: Factor out %pV handler as va_format() vsprintf: Factor out %p[iI] handler as ip_addr_string() vsprintf: Do not check address of well-known strings vsprintf: Consistent %pK handling for kptr_restrict == 0 vsprintf: Shuffle restricted_pointer() printk: Tie printk_once / printk_deferred_once into .data.once for reset treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively lib/test_printf: Switch to bitmap_zalloc()	2019-05-07 09:18:12 -07:00
Paul E. McKenney	6cdbc07a5a	Merge branches 'consolidate.2019.04.09a', 'doc.2019.03.26b', 'fixes.2019.03.26b', 'srcu.2019.03.26b', 'stall.2019.03.26b' and 'torture.2019.03.26b' into HEAD consolidate.2019.04.09a: Lingering RCU flavor consolidation cleanups. doc.2019.03.26b: Documentation updates. fixes.2019.03.26b: Miscellaneous fixes. srcu.2019.03.26b: SRCU updates. stall.2019.03.26b: RCU CPU stall warning updates. torture.2019.03.26b: Torture-test updates.	2019-04-09 08:08:13 -07:00
Sakari Ailus	d75f773c86	treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively %pF and %pf are functionally equivalent to %pS and %ps conversion specifiers. The former are deprecated, therefore switch the current users to use the preferred variant. The changes have been produced by the following command: git grep -l '%p[fF]' \| grep -v '^$tools\\|Documentation$/' \| \ while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done And verifying the result. Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: sparclinux@vger.kernel.org Cc: linux-um@lists.infradead.org Cc: xen-devel@lists.xenproject.org Cc: linux-acpi@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-nvdimm@lists.01.org Cc: linux-pci@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-btrfs@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: linux-mm@kvack.org Cc: ceph-devel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Acked-by: David Sterba <dsterba@suse.com> (for btrfs) Acked-by: Mike Rapoport <rppt@linux.ibm.com> (for mm/memblock.c) Acked-by: Bjorn Helgaas <bhelgaas@google.com> (for drivers/pci) Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Petr Mladek <pmladek@suse.com>	2019-04-09 14:19:06 +02:00
Yafang Shao	4f5fbd78a7	rcu: validate arguments for rcu tracepoints When CONFIG_RCU_TRACE is not set, all these tracepoints are defined as do-nothing macro. We'd better make those inline functions that take proper arguments. As RCU_TRACE() is defined as do-nothing marco as well when CONFIG_RCU_TRACE is not set, so we can clean it up. Link: http://lkml.kernel.org/r/1553602391-11926-4-git-send-email-laoar.shao@gmail.com Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-04-08 09:22:51 -04:00
Paul E. McKenney	b51bcbbf16	rcu: Move forward-progress checkers into tree_stall.h This commit further consolidates stall-warning functionality by moving forward-progress checkers into kernel/rcu/tree_stall.h, updating a comment or two while in the area. More specifically, this commit moves show_rcu_gp_kthreads(), rcu_check_gp_start_stall(), rcu_fwd_progress_check(), sysrq_rcu, sysrq_show_rcu(), sysrq_rcudump_op, and rcu_sysrq_init() from kernel/rcu/tree.c to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	7ac1907c9e	rcu: Move irq-disabled stall-warning checking to tree_stall.h The rcu_iw_handler() function's sole purpose in life is to indicate whether a stalled CPU had interrupts disabled, so it belongs in kernel/rcu/tree_stall.h. This commit therefore makes that move, clarifying its header comment while in the area. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	32255d51b6	rcu: Move RCU CPU stall-warning code out of tree.c This commit completes the process of consolidating the code for RCU CPU stall warnings for normal grace periods by moving the remaining such code from kernel/rcu/tree.c to kernel/rcu/tree_stall.h. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Paul E. McKenney	10462d6f58	rcu: Move RCU CPU stall-warning code out of update.c The RCU CPU stall-warning code for normal grace periods is currently scattered across three files, due to earlier Tiny RCU support for RCU CPU stall warnings and for old Kconfig options that have long since been retired. Given that it is hard for the lead RCU maintainer to find relevant stall-warning code, it would be good to consolidate it. This commit starts this process by moving stall-warning code from kernel/rcu/update.c to a new kernel/rcu/tree_stall.h file. Note that the definitions of rcu_cpu_stall_suppress and rcu_cpu_stall_timeout must remain in kernel/rcu/update.h to provide compatibility for kernel boot parameter lists. Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:40:13 -07:00
Zhouyi Zhou	5d8a752e31	rcu: Fix force_qs_rnp() header comment Previously, threads blocked on offlining CPUS were migrated to the root rcu_node structure, thus requiring RCU priority boosting on this structure. However, since commit `d19fb8d1f3` ("rcu: Don't migrate blocked tasks even if all corresponding CPUs offline"), RCU does not migrate blocked tasks. Consequently, RCU no longer does RCU priority boosting on the root rcu_node structure as of commit `1be0085b51` ("rcu: Don't initiate RCU priority boosting on root rcu_node"). This commit therefore brings comments for the force_qs_rnp() function's header comment in line with this new no-root-boosting reality. Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> [ paulmck: Also remove obsolete comment on suppressing new grace periods. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:38:38 -07:00
Paul E. McKenney	85f2b60c43	rcu: Update jiffies_to_sched_qs and adjust_jiffies_till_sched_qs() comments This commit better documents the jiffies_to_sched_qs default-value strategy used by adjust_jiffies_till_sched_qs() Reported-by: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:38:38 -07:00
Neeraj Upadhyay	6973032a60	rcu: Default jiffies_to_sched_qs to jiffies_till_sched_qs The current code only calls adjust_jiffies_till_sched_qs() if jiffies_till_sched_qs is left at its default value, so when the jiffies_till_sched_qs kernel-boot parameter actually is specified, jiffies_to_sched_qs will be left with the value zero, which will result in useless slowdowns of cond_resched(). This commit therefore changes rcu_init_geometry() to unconditionally invoke adjust_jiffies_till_sched_qs(), which ensures that jiffies_to_sched_qs will be initialized in all cases, thus maintaining good cond_resched() performance. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:38:38 -07:00
Neeraj Upadhyay	0f58d2ac2c	rcu: Fix self-wakeups for grace-period kthread The current rcu_gp_kthread_wake() function uses in_interrupt() and thus does a self-wakeup from all interrupt contexts, including the pointless case where the GP kthread happens to be running with bottom halves disabled, along with the impossible case where the GP kthread is running within an NMI handler (you are not supposed to invoke rcu_gp_kthread_wake() from within an NMI handler. This commit therefore replaces the in_interrupt() with in_irq(), so that the self-wakeups happen only from handlers for hardware interrupts and softirqs. This also makes the code match the comment. Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2019-03-26 14:37:49 -07:00
Akira Yokosawa	b2eb85b49a	rcu: Move common code out of if-else block As the result of recent addition of "rdp->core_needs_qs = false;" in the "if" block, now both branches of the if-else have the same assignment. Factor it out and reduce line count. Signed-off-by: Akira Yokosawa <akiyks@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>	2019-03-26 14:37:49 -07:00
Liu Song	3ffe3d1adc	rcu: Set rcutree.kthread_prio sysfs access to read-only The rcutree.kthread_prio kernel-boot parameter is used to set the priority for boost (rcub), per-CPU (rcuc), and grace-period (rcu_preempt or rcu_sched) kthreads. It is also used by rcutorture to check whether it is possible to meaningfully test RCU priority boosting. However, all of these cases will either ignore or be confused by any post-boot changes to rcutree.kthread_prio. Note that the user really can change the priorities of all of these kthreads using chrt, given sufficient privileges. Therefore, the read-write nature of sysfs access to rcutree.kthread_prio is thus at best an attractive nuisance. This commit therefore changes sysfs access to rcutree.kthread_prio to be read-only. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:37:49 -07:00
Joel Fernandes (Google)	671a63517c	rcu: Avoid unnecessary softirq when system is idle When there are no callbacks pending on an idle system, I noticed that RCU softirq is continuously firing. During this the cpu_no_qs is set to false, and core_needs_qs is set to true indefinitely. This causes rcu_process_callbacks to be repeatedly called, even though the node corresponding to the CPU has that CPU's mask bit cleared and the system is idle. I believe the race is when such mask clearing is done during idle CPU scan of the quiescent state forcing stage in the kthread instead of the softirq. Since the rnp mask is cleared, but the flags on the CPU's rdp are not cleared, the CPU thinks it still needs to report to core RCU. Cure this by clearing the core_needs_qs flag when the CPU detects that its node is already updated which will avoid the unwanted softirq raises to the benefit of real-time systems. Test: Ran rcutorture for various tree RCU configs. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>	2019-03-26 14:37:49 -07:00

1 2 3 4 5 ...

649 Commits