diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst index 42a81e30619e..0da9133fa13a 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.rst +++ b/Documentation/RCU/Design/Requirements/Requirements.rst @@ -2598,6 +2598,24 @@ also includes DEFINE_SRCU(), DEFINE_STATIC_SRCU(), and init_srcu_struct() APIs for defining and initializing ``srcu_struct`` structures. +More recently, the SRCU API has added polling interfaces: + +#. start_poll_synchronize_srcu() returns a cookie identifying + the completion of a future SRCU grace period and ensures + that this grace period will be started. +#. poll_state_synchronize_srcu() returns ``true`` iff the + specified cookie corresponds to an already-completed + SRCU grace period. +#. get_state_synchronize_srcu() returns a cookie just like + start_poll_synchronize_srcu() does, but differs in that + it does nothing to ensure that any future SRCU grace period + will be started. + +These functions are used to avoid unnecessary SRCU grace periods in +certain types of buffer-cache algorithms having multi-stage age-out +mechanisms. The idea is that by the time the block has aged completely +from the cache, an SRCU grace period will be very likely to have elapsed. + Tasks RCU ~~~~~~~~~ diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst index e97d1b4876ef..7148e9be08c3 100644 --- a/Documentation/RCU/stallwarn.rst +++ b/Documentation/RCU/stallwarn.rst @@ -92,7 +92,9 @@ warnings: buggy timer hardware through bugs in the interrupt or exception path (whether hardware, firmware, or software) through bugs in Linux's timer subsystem through bugs in the scheduler, and, - yes, even including bugs in RCU itself. + yes, even including bugs in RCU itself. It can also result in + the ``rcu_.*timer wakeup didn't happen for`` console-log message, + which will include additional debugging information. - A bug in the RCU implementation. @@ -292,6 +294,25 @@ kthread is waiting for a short timeout, the "state" precedes value of the task_struct ->state field, and the "cpu" indicates that the grace-period kthread last ran on CPU 5. +If the relevant grace-period kthread does not wake from FQS wait in a +reasonable time, then the following additional line is printed:: + + kthread timer wakeup didn't happen for 23804 jiffies! g7076 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 + +The "23804" indicates that kthread's timer expired more than 23 thousand +jiffies ago. The rest of the line has meaning similar to the kthread +starvation case. + +Additionally, the following line is printed:: + + Possible timer handling issue on cpu=4 timer-softirq=11142 + +Here "cpu" indicates that the grace-period kthread last ran on CPU 4, +where it queued the fqs timer. The number following the "timer-softirq" +is the current ``TIMER_SOFTIRQ`` count on cpu 4. If this value does not +change on successive RCU CPU stall warnings, there is further reason to +suspect a timer problem. + Multiple Warnings From One Stall ================================ diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index b5baa8a54df0..b1d6cd58a04c 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4092,6 +4092,10 @@ value, meaning that RCU_SOFTIRQ is used by default. Specify rcutree.use_softirq=0 to use rcuc kthreads. 
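
A minimal usage sketch of the SRCU polling interfaces documented above in
Requirements.rst; the my_srcu domain, the cache_block type, and the
synchronize_srcu() fallback are illustrative assumptions rather than part
of this patch:

	#include <linux/slab.h>
	#include <linux/srcu.h>

	DEFINE_SRCU(my_srcu);

	struct cache_block {
		unsigned long gp_cookie;	/* From start_poll_synchronize_srcu(). */
		void *payload;
	};

	/* First age-out stage: start a grace period and remember it. */
	static void cache_block_retire(struct cache_block *cbp)
	{
		cbp->gp_cookie = start_poll_synchronize_srcu(&my_srcu);
	}

	/* Final age-out stage: free, waiting synchronously only if needed. */
	static void cache_block_free(struct cache_block *cbp)
	{
		if (!poll_state_synchronize_srcu(&my_srcu, cbp->gp_cookie))
			synchronize_srcu(&my_srcu);	/* Rare if age-out is slow. */
		kfree(cbp);
	}
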
+ But note that CONFIG_PREEMPT_RT=y kernels disable + this kernel boot parameter, forcibly setting it + to zero. + rcutree.rcu_fanout_exact= [KNL] Disable autobalancing of the rcu_node combining tree. This is used by rcutorture, and might @@ -4332,6 +4336,14 @@ stress RCU, they don't participate in the actual test, hence the "fake". + rcutorture.nocbs_nthreads= [KNL] + Set number of RCU callback-offload togglers. + Zero (the default) disables toggling. + + rcutorture.nocbs_toggle= [KNL] + Set the delay in milliseconds between successive + callback-offload toggling attempts. + rcutorture.nreaders= [KNL] Set number of RCU readers. The value -1 selects N-1, where N is the number of CPUs. A value @@ -4464,6 +4476,13 @@ only normal grace-period primitives. No effect on CONFIG_TINY_RCU kernels. + But note that CONFIG_PREEMPT_RT=y kernels enables + this kernel boot parameter, forcibly setting + it to the value one, that is, converting any + post-boot attempt at an expedited RCU grace + period to instead use normal non-expedited + grace-period processing. + rcupdate.rcu_task_ipi_delay= [KNL] Set time in jiffies during which RCU tasks will avoid sending IPIs, starting with the beginning @@ -4551,6 +4570,12 @@ refscale.verbose= [KNL] Enable additional printk() statements. + refscale.verbose_batched= [KNL] + Batch the additional printk() statements. If zero + (the default) or negative, print everything. Otherwise, + print every Nth verbose statement, where N is the value + specified. + relax_domain_level= [KNL, SMP] Set scheduler's default relax_domain_level. See Documentation/admin-guide/cgroup-v1/cpusets.rst. @@ -5325,6 +5350,14 @@ are running concurrently, especially on systems with rotating-rust storage. + torture.verbose_sleep_frequency= [KNL] + Specifies how many verbose printk()s should be + emitted between each sleep. The default of zero + disables verbose-printk() sleeping. + + torture.verbose_sleep_duration= [KNL] + Duration of each verbose-printk() sleep in jiffies. + tp720= [HW,PS2] tpm_suspend_pcr=[HW,TPM] diff --git a/include/linux/cpu.h b/include/linux/cpu.h index d6428aaf67e7..3aaa0687e8df 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -111,6 +111,8 @@ static inline void cpu_maps_update_done(void) #endif /* CONFIG_SMP */ extern struct bus_type cpu_subsys; +extern int lockdep_is_cpus_held(void); + #ifdef CONFIG_HOTPLUG_CPU extern void cpus_write_lock(void); extern void cpus_write_unlock(void); diff --git a/include/linux/mm.h b/include/linux/mm.h index 5299b90a6c40..af7d050900e7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3169,5 +3169,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping, extern int sysctl_nr_trim_pages; +void mem_dump_obj(void *object); + #endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h index b36afe7b22c9..8afe886e85f1 100644 --- a/include/linux/rcu_segcblist.h +++ b/include/linux/rcu_segcblist.h @@ -63,6 +63,122 @@ struct rcu_cblist { #define RCU_NEXT_TAIL 3 #define RCU_CBLIST_NSEGS 4 + +/* + * ==NOCB Offloading state machine== + * + * + * ---------------------------------------------------------------------------- + * | SEGCBLIST_SOFTIRQ_ONLY | + * | | + * | Callbacks processed by rcu_core() from softirqs or local | + * | rcuc kthread, without holding nocb_lock. 
| + * ---------------------------------------------------------------------------- + * | + * v + * ---------------------------------------------------------------------------- + * | SEGCBLIST_OFFLOADED | + * | | + * | Callbacks processed by rcu_core() from softirqs or local | + * | rcuc kthread, while holding nocb_lock. Waking up CB and GP kthreads, | + * | allowing nocb_timer to be armed. | + * ---------------------------------------------------------------------------- + * | + * v + * ----------------------------------- + * | | + * v v + * --------------------------------------- ----------------------------------| + * | SEGCBLIST_OFFLOADED | | | SEGCBLIST_OFFLOADED | | + * | SEGCBLIST_KTHREAD_CB | | SEGCBLIST_KTHREAD_GP | + * | | | | + * | | | | + * | CB kthread woke up and | | GP kthread woke up and | + * | acknowledged SEGCBLIST_OFFLOADED. | | acknowledged SEGCBLIST_OFFLOADED| + * | Processes callbacks concurrently | | | + * | with rcu_core(), holding | | | + * | nocb_lock. | | | + * --------------------------------------- ----------------------------------- + * | | + * ----------------------------------- + * | + * v + * |--------------------------------------------------------------------------| + * | SEGCBLIST_OFFLOADED | | + * | SEGCBLIST_KTHREAD_CB | | + * | SEGCBLIST_KTHREAD_GP | + * | | + * | Kthreads handle callbacks holding nocb_lock, local rcu_core() stops | + * | handling callbacks. | + * ---------------------------------------------------------------------------- + */ + + + +/* + * ==NOCB De-Offloading state machine== + * + * + * |--------------------------------------------------------------------------| + * | SEGCBLIST_OFFLOADED | | + * | SEGCBLIST_KTHREAD_CB | | + * | SEGCBLIST_KTHREAD_GP | + * | | + * | CB/GP kthreads handle callbacks holding nocb_lock, local rcu_core() | + * | ignores callbacks. | + * ---------------------------------------------------------------------------- + * | + * v + * |--------------------------------------------------------------------------| + * | SEGCBLIST_KTHREAD_CB | | + * | SEGCBLIST_KTHREAD_GP | + * | | + * | CB/GP kthreads and local rcu_core() handle callbacks concurrently | + * | holding nocb_lock. Wake up CB and GP kthreads if necessary. | + * ---------------------------------------------------------------------------- + * | + * v + * ----------------------------------- + * | | + * v v + * ---------------------------------------------------------------------------| + * | | + * | SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP | + * | | | + * | GP kthread woke up and | CB kthread woke up and | + * | acknowledged the fact that | acknowledged the fact that | + * | SEGCBLIST_OFFLOADED got cleared. | SEGCBLIST_OFFLOADED got cleared. | + * | | The CB kthread goes to sleep | + * | The callbacks from the target CPU | until it ever gets re-offloaded. | + * | will be ignored from the GP kthread | | + * | loop. | | + * ---------------------------------------------------------------------------- + * | | + * ----------------------------------- + * | + * v + * ---------------------------------------------------------------------------- + * | 0 | + * | | + * | Callbacks processed by rcu_core() from softirqs or local | + * | rcuc kthread, while holding nocb_lock. Forbid nocb_timer to be armed. | + * | Flush pending nocb_timer. Flush nocb bypass callbacks. 
| + * ---------------------------------------------------------------------------- + * | + * v + * ---------------------------------------------------------------------------- + * | SEGCBLIST_SOFTIRQ_ONLY | + * | | + * | Callbacks processed by rcu_core() from softirqs or local | + * | rcuc kthread, without holding nocb_lock. | + * ---------------------------------------------------------------------------- + */ +#define SEGCBLIST_ENABLED BIT(0) +#define SEGCBLIST_SOFTIRQ_ONLY BIT(1) +#define SEGCBLIST_KTHREAD_CB BIT(2) +#define SEGCBLIST_KTHREAD_GP BIT(3) +#define SEGCBLIST_OFFLOADED BIT(4) + struct rcu_segcblist { struct rcu_head *head; struct rcu_head **tails[RCU_CBLIST_NSEGS]; @@ -72,8 +188,8 @@ struct rcu_segcblist { #else long len; #endif - u8 enabled; - u8 offloaded; + long seglen[RCU_CBLIST_NSEGS]; + u8 flags; }; #define RCU_SEGCBLIST_INITIALIZER(n) \ diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index de0826411311..ebd8dcca4997 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -33,6 +33,8 @@ #define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b)) #define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b)) #define ulong2long(a) (*(long *)(&(a))) +#define USHORT_CMP_GE(a, b) (USHRT_MAX / 2 >= (unsigned short)((a) - (b))) +#define USHORT_CMP_LT(a, b) (USHRT_MAX / 2 < (unsigned short)((a) - (b))) /* Exported common interfaces */ void call_rcu(struct rcu_head *head, rcu_callback_t func); @@ -86,6 +88,12 @@ void rcu_sched_clock_irq(int user); void rcu_report_dead(unsigned int cpu); void rcutree_migrate_callbacks(int cpu); +#ifdef CONFIG_TASKS_RCU_GENERIC +void rcu_init_tasks_generic(void); +#else +static inline void rcu_init_tasks_generic(void) { } +#endif + #ifdef CONFIG_RCU_STALL_COMMON void rcu_sysrq_start(void); void rcu_sysrq_end(void); @@ -104,8 +112,12 @@ static inline void rcu_user_exit(void) { } #ifdef CONFIG_RCU_NOCB_CPU void rcu_init_nohz(void); +int rcu_nocb_cpu_offload(int cpu); +int rcu_nocb_cpu_deoffload(int cpu); #else /* #ifdef CONFIG_RCU_NOCB_CPU */ static inline void rcu_init_nohz(void) { } +static inline int rcu_nocb_cpu_offload(int cpu) { return -EINVAL; } +static inline int rcu_nocb_cpu_deoffload(int cpu) { return 0; } #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */ /** @@ -840,19 +852,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) */ #define __is_kvfree_rcu_offset(offset) ((offset) < 4096) -/* - * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain. - */ -#define __kvfree_rcu(head, offset) \ - do { \ - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offset)); \ - kvfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \ - } while (0) - /** * kfree_rcu() - kfree an object after a grace period. - * @ptr: pointer to kfree - * @rhf: the name of the struct rcu_head within the type of @ptr. + * @ptr: pointer to kfree for both single- and double-argument invocations. + * @rhf: the name of the struct rcu_head within the type of @ptr, + * but only for double-argument invocations. * * Many rcu callbacks functions just call kfree() on the base structure. * These functions are trivial, but their size adds up, and furthermore @@ -865,7 +869,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) * Because the functions are not allowed in the low-order 4096 bytes of * kernel virtual memory, offsets up to 4095 bytes can be accommodated. * If the offset is larger than 4095 bytes, a compile-time error will - * be generated in __kvfree_rcu(). 
If this error is triggered, you can + * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can * either fall back to use of call_rcu() or rearrange the structure to * position the rcu_head structure into the first 4096 bytes. * @@ -875,13 +879,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) * The BUILD_BUG_ON check must not involve any function calls, hence the * checks are done in macros here. */ -#define kfree_rcu(ptr, rhf) \ -do { \ - typeof (ptr) ___p = (ptr); \ - \ - if (___p) \ - __kvfree_rcu(&((___p)->rhf), offsetof(typeof(*(ptr)), rhf)); \ -} while (0) +#define kfree_rcu kvfree_rcu /** * kvfree_rcu() - kvfree an object after a grace period. @@ -913,7 +911,17 @@ do { \ kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__) #define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME -#define kvfree_rcu_arg_2(ptr, rhf) kfree_rcu(ptr, rhf) +#define kvfree_rcu_arg_2(ptr, rhf) \ +do { \ + typeof (ptr) ___p = (ptr); \ + \ + if (___p) { \ + BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \ + kvfree_call_rcu(&((___p)->rhf), (rcu_callback_t)(unsigned long) \ + (offsetof(typeof(*(ptr)), rhf))); \ + } \ +} while (0) + #define kvfree_rcu_arg_1(ptr) \ do { \ typeof(ptr) ___p = (ptr); \ diff --git a/include/linux/slab.h b/include/linux/slab.h index be4ba5867ac5..7ae604076767 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -186,6 +186,8 @@ void kfree(const void *); void kfree_sensitive(const void *); size_t __ksize(const void *); size_t ksize(const void *); +bool kmem_valid_obj(void *object); +void kmem_dump_obj(void *object); #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR void __check_heap_object(const void *ptr, unsigned long n, struct page *page, diff --git a/include/linux/srcu.h b/include/linux/srcu.h index e432cc92c73d..a0895bbf71ce 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -60,6 +60,9 @@ void cleanup_srcu_struct(struct srcu_struct *ssp); int __srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp); void __srcu_read_unlock(struct srcu_struct *ssp, int idx) __releases(ssp); void synchronize_srcu(struct srcu_struct *ssp); +unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp); +unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp); +bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie); #ifdef CONFIG_DEBUG_LOCK_ALLOC diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h index 5a5a1941ca15..0e0cf4d6a72a 100644 --- a/include/linux/srcutiny.h +++ b/include/linux/srcutiny.h @@ -15,7 +15,8 @@ struct srcu_struct { short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */ - short srcu_idx; /* Current reader array element. */ + unsigned short srcu_idx; /* Current reader array element in bit 0x2. */ + unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */ u8 srcu_gp_running; /* GP workqueue running? */ u8 srcu_gp_waiting; /* GP waiting for readers? 
*/ struct swait_queue_head srcu_wq; @@ -59,7 +60,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp) { int idx; - idx = READ_ONCE(ssp->srcu_idx); + idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1; WRITE_ONCE(ssp->srcu_lock_nesting[idx], ssp->srcu_lock_nesting[idx] + 1); return idx; } @@ -80,7 +81,7 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp, { int idx; - idx = READ_ONCE(ssp->srcu_idx) & 0x1; + idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1; pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n", tt, tf, idx, READ_ONCE(ssp->srcu_lock_nesting[!idx]), diff --git a/include/linux/timer.h b/include/linux/timer.h index fda13c9d1256..4118a97e62fb 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -192,6 +192,8 @@ extern int try_to_del_timer_sync(struct timer_list *timer); #define del_singleshot_timer_sync(t) del_timer_sync(t) +extern bool timer_curr_running(struct timer_list *timer); + extern void init_timers(void); struct hrtimer; extern enum hrtimer_restart it_real_fn(struct hrtimer *); diff --git a/include/linux/torture.h b/include/linux/torture.h index 7f65bd1dd307..0910c5803f35 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -32,11 +32,27 @@ #define TOROUT_STRING(s) \ pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s) #define VERBOSE_TOROUT_STRING(s) \ - do { if (verbose) pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s); } while (0) +do { \ + if (verbose) { \ + verbose_torout_sleep(); \ + pr_alert("%s" TORTURE_FLAG " %s\n", torture_type, s); \ + } \ +} while (0) #define VERBOSE_TOROUT_ERRSTRING(s) \ - do { if (verbose) pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); } while (0) +do { \ + if (verbose) { \ + verbose_torout_sleep(); \ + pr_alert("%s" TORTURE_FLAG "!!! %s\n", torture_type, s); \ + } \ +} while (0) +void verbose_torout_sleep(void); /* Definitions for online/offline exerciser. */ +#ifdef CONFIG_HOTPLUG_CPU +int torture_num_online_cpus(void); +#else /* #ifdef CONFIG_HOTPLUG_CPU */ +static inline int torture_num_online_cpus(void) { return 1; } +#endif /* #else #ifdef CONFIG_HOTPLUG_CPU */ typedef void torture_ofl_func(void); bool torture_offline(int cpu, long *n_onl_attempts, long *n_onl_successes, unsigned long *sum_offl, int *min_onl, int *max_onl); @@ -61,6 +77,13 @@ static inline void torture_random_init(struct torture_random_state *trsp) trsp->trs_count = 0; } +/* Definitions for high-resolution-timer sleeps. */ +int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, struct torture_random_state *trsp); +int torture_hrtimeout_us(u32 baset_us, u32 fuzzt_ns, struct torture_random_state *trsp); +int torture_hrtimeout_ms(u32 baset_ms, u32 fuzzt_us, struct torture_random_state *trsp); +int torture_hrtimeout_jiffies(u32 baset_j, struct torture_random_state *trsp); +int torture_hrtimeout_s(u32 baset_s, u32 fuzzt_ms, struct torture_random_state *trsp); + /* Task shuffler, which causes CPUs to occasionally go idle. 
*/ void torture_shuffle_task_register(struct task_struct *tp); int torture_shuffle_init(long shuffint); diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 80c0181c411d..c18f4751a704 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -246,4 +246,10 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms) int register_vmap_purge_notifier(struct notifier_block *nb); int unregister_vmap_purge_notifier(struct notifier_block *nb); +#ifdef CONFIG_MMU +bool vmalloc_dump_obj(void *object); +#else +static inline bool vmalloc_dump_obj(void *object) { return false; } +#endif + #endif /* _LINUX_VMALLOC_H */ diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h index 155b5cb43cfd..5fc29400e1a2 100644 --- a/include/trace/events/rcu.h +++ b/include/trace/events/rcu.h @@ -505,6 +505,32 @@ TRACE_EVENT_RCU(rcu_callback, __entry->qlen) ); +TRACE_EVENT_RCU(rcu_segcb_stats, + + TP_PROTO(struct rcu_segcblist *rs, const char *ctx), + + TP_ARGS(rs, ctx), + + TP_STRUCT__entry( + __field(const char *, ctx) + __array(unsigned long, gp_seq, RCU_CBLIST_NSEGS) + __array(long, seglen, RCU_CBLIST_NSEGS) + ), + + TP_fast_assign( + __entry->ctx = ctx; + memcpy(__entry->seglen, rs->seglen, RCU_CBLIST_NSEGS * sizeof(long)); + memcpy(__entry->gp_seq, rs->gp_seq, RCU_CBLIST_NSEGS * sizeof(unsigned long)); + + ), + + TP_printk("%s seglen: (DONE=%ld, WAIT=%ld, NEXT_READY=%ld, NEXT=%ld) " + "gp_seq: (DONE=%lu, WAIT=%lu, NEXT_READY=%lu, NEXT=%lu)", __entry->ctx, + __entry->seglen[0], __entry->seglen[1], __entry->seglen[2], __entry->seglen[3], + __entry->gp_seq[0], __entry->gp_seq[1], __entry->gp_seq[2], __entry->gp_seq[3]) + +); + /* * Tracepoint for the registration of a single RCU callback of the special * kvfree() form. The first argument is the RCU type, the second argument diff --git a/init/main.c b/init/main.c index 6feee7f11eaf..421640fca375 100644 --- a/init/main.c +++ b/init/main.c @@ -1518,6 +1518,7 @@ static noinline void __init kernel_init_freeable(void) init_mm_internals(); + rcu_init_tasks_generic(); do_pre_smp_initcalls(); lockup_detector_init(); diff --git a/kernel/cpu.c b/kernel/cpu.c index 4e11e91010e1..1b6302ecbabe 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -330,6 +330,13 @@ void lockdep_assert_cpus_held(void) percpu_rwsem_assert_held(&cpu_hotplug_lock); } +#ifdef CONFIG_LOCKDEP +int lockdep_is_cpus_held(void) +{ + return percpu_rwsem_is_held(&cpu_hotplug_lock); +} +#endif + static void lockdep_acquire_cpus_lock(void) { rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_); diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c index fd838cea3934..0ab94e1f1276 100644 --- a/kernel/locking/locktorture.c +++ b/kernel/locking/locktorture.c @@ -27,7 +27,6 @@ #include #include #include -#include #include #include diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig index cdc57b4f6d48..3128b7cf8e1f 100644 --- a/kernel/rcu/Kconfig +++ b/kernel/rcu/Kconfig @@ -95,6 +95,7 @@ config TASKS_RUDE_RCU config TASKS_TRACE_RCU def_bool 0 + select IRQ_WORK help This option enables a task-based RCU implementation that uses explicit rcu_read_lock_trace() read-side markers, and allows @@ -188,8 +189,8 @@ config RCU_FAST_NO_HZ config RCU_BOOST bool "Enable RCU priority boosting" - depends on RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT - default n + depends on (RT_MUTEXES && PREEMPT_RCU && RCU_EXPERT) || PREEMPT_RT + default y if PREEMPT_RT help This option boosts the priority of preempted RCU readers that block the current preemptible RCU grace period for too 
long. diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 59ef1ae6dc37..bf0827d4b659 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -378,7 +378,11 @@ do { \ smp_mb__after_unlock_lock(); \ } while (0) -#define raw_spin_unlock_rcu_node(p) raw_spin_unlock(&ACCESS_PRIVATE(p, lock)) +#define raw_spin_unlock_rcu_node(p) \ +do { \ + lockdep_assert_irqs_disabled(); \ + raw_spin_unlock(&ACCESS_PRIVATE(p, lock)); \ +} while (0) #define raw_spin_lock_irq_rcu_node(p) \ do { \ @@ -387,7 +391,10 @@ do { \ } while (0) #define raw_spin_unlock_irq_rcu_node(p) \ - raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock)) +do { \ + lockdep_assert_irqs_disabled(); \ + raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock)); \ +} while (0) #define raw_spin_lock_irqsave_rcu_node(p, flags) \ do { \ @@ -396,7 +403,10 @@ do { \ } while (0) #define raw_spin_unlock_irqrestore_rcu_node(p, flags) \ - raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags) +do { \ + lockdep_assert_irqs_disabled(); \ + raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags); \ +} while (0) #define raw_spin_trylock_rcu_node(p) \ ({ \ diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c index 2d2a6b6b9dfb..7f181c9675f7 100644 --- a/kernel/rcu/rcu_segcblist.c +++ b/kernel/rcu/rcu_segcblist.c @@ -7,10 +7,10 @@ * Authors: Paul E. McKenney */ -#include -#include +#include #include -#include +#include +#include #include "rcu_segcblist.h" @@ -88,23 +88,135 @@ static void rcu_segcblist_set_len(struct rcu_segcblist *rsclp, long v) #endif } +/* Get the length of a segment of the rcu_segcblist structure. */ +static long rcu_segcblist_get_seglen(struct rcu_segcblist *rsclp, int seg) +{ + return READ_ONCE(rsclp->seglen[seg]); +} + +/* Return number of callbacks in segmented callback list by summing seglen. */ +long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp) +{ + long len = 0; + int i; + + for (i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS; i++) + len += rcu_segcblist_get_seglen(rsclp, i); + + return len; +} + +/* Set the length of a segment of the rcu_segcblist structure. */ +static void rcu_segcblist_set_seglen(struct rcu_segcblist *rsclp, int seg, long v) +{ + WRITE_ONCE(rsclp->seglen[seg], v); +} + +/* Increase the numeric length of a segment by a specified amount. */ +static void rcu_segcblist_add_seglen(struct rcu_segcblist *rsclp, int seg, long v) +{ + WRITE_ONCE(rsclp->seglen[seg], rsclp->seglen[seg] + v); +} + +/* Move from's segment length to to's segment. */ +static void rcu_segcblist_move_seglen(struct rcu_segcblist *rsclp, int from, int to) +{ + long len; + + if (from == to) + return; + + len = rcu_segcblist_get_seglen(rsclp, from); + if (!len) + return; + + rcu_segcblist_add_seglen(rsclp, to, len); + rcu_segcblist_set_seglen(rsclp, from, 0); +} + +/* Increment segment's length. */ +static void rcu_segcblist_inc_seglen(struct rcu_segcblist *rsclp, int seg) +{ + rcu_segcblist_add_seglen(rsclp, seg, 1); +} + /* * Increase the numeric length of an rcu_segcblist structure by the * specified amount, which can be negative. This can cause the ->len * field to disagree with the actual number of callbacks on the structure. * This increase is fully ordered with respect to the callers accesses * both before and after. + * + * So why on earth is a memory barrier required both before and after + * the update to the ->len field??? + * + * The reason is that rcu_barrier() locklessly samples each CPU's ->len + * field, and if a given CPU's field is zero, avoids IPIing that CPU. 
+ * This can of course race with both queuing and invoking of callbacks. + * Failing to correctly handle either of these races could result in + * rcu_barrier() failing to IPI a CPU that actually had callbacks queued + * which rcu_barrier() was obligated to wait on. And if rcu_barrier() + * failed to wait on such a callback, unloading certain kernel modules + * would result in calls to functions whose code was no longer present in + * the kernel, for but one example. + * + * Therefore, ->len transitions from 1->0 and 0->1 have to be carefully + * ordered with respect with both list modifications and the rcu_barrier(). + * + * The queuing case is CASE 1 and the invoking case is CASE 2. + * + * CASE 1: Suppose that CPU 0 has no callbacks queued, but invokes + * call_rcu() just as CPU 1 invokes rcu_barrier(). CPU 0's ->len field + * will transition from 0->1, which is one of the transitions that must + * be handled carefully. Without the full memory barriers after the ->len + * update and at the beginning of rcu_barrier(), the following could happen: + * + * CPU 0 CPU 1 + * + * call_rcu(). + * rcu_barrier() sees ->len as 0. + * set ->len = 1. + * rcu_barrier() does nothing. + * module is unloaded. + * callback invokes unloaded function! + * + * With the full barriers, any case where rcu_barrier() sees ->len as 0 will + * have unambiguously preceded the return from the racing call_rcu(), which + * means that this call_rcu() invocation is OK to not wait on. After all, + * you are supposed to make sure that any problematic call_rcu() invocations + * happen before the rcu_barrier(). + * + * + * CASE 2: Suppose that CPU 0 is invoking its last callback just as + * CPU 1 invokes rcu_barrier(). CPU 0's ->len field will transition from + * 1->0, which is one of the transitions that must be handled carefully. + * Without the full memory barriers before the ->len update and at the + * end of rcu_barrier(), the following could happen: + * + * CPU 0 CPU 1 + * + * start invoking last callback + * set ->len = 0 (reordered) + * rcu_barrier() sees ->len as 0 + * rcu_barrier() does nothing. + * module is unloaded + * callback executing after unloaded! + * + * With the full barriers, any case where rcu_barrier() sees ->len as 0 + * will be fully ordered after the completion of the callback function, + * so that the module unloading operation is completely safe. + * */ -static void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v) +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v) { #ifdef CONFIG_RCU_NOCB_CPU - smp_mb__before_atomic(); /* Up to the caller! */ + smp_mb__before_atomic(); // Read header comment above. atomic_long_add(v, &rsclp->len); - smp_mb__after_atomic(); /* Up to the caller! */ + smp_mb__after_atomic(); // Read header comment above. #else - smp_mb(); /* Up to the caller! */ + smp_mb(); // Read header comment above. WRITE_ONCE(rsclp->len, rsclp->len + v); - smp_mb(); /* Up to the caller! */ + smp_mb(); // Read header comment above. #endif } @@ -119,26 +231,6 @@ void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp) rcu_segcblist_add_len(rsclp, 1); } -/* - * Exchange the numeric length of the specified rcu_segcblist structure - * with the specified value. This can cause the ->len field to disagree - * with the actual number of callbacks on the structure. This exchange is - * fully ordered with respect to the callers accesses both before and after. 
- */ -static long rcu_segcblist_xchg_len(struct rcu_segcblist *rsclp, long v) -{ -#ifdef CONFIG_RCU_NOCB_CPU - return atomic_long_xchg(&rsclp->len, v); -#else - long ret = rsclp->len; - - smp_mb(); /* Up to the caller! */ - WRITE_ONCE(rsclp->len, v); - smp_mb(); /* Up to the caller! */ - return ret; -#endif -} - /* * Initialize an rcu_segcblist structure. */ @@ -149,10 +241,12 @@ void rcu_segcblist_init(struct rcu_segcblist *rsclp) BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq)); BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq)); rsclp->head = NULL; - for (i = 0; i < RCU_CBLIST_NSEGS; i++) + for (i = 0; i < RCU_CBLIST_NSEGS; i++) { rsclp->tails[i] = &rsclp->head; + rcu_segcblist_set_seglen(rsclp, i, 0); + } rcu_segcblist_set_len(rsclp, 0); - rsclp->enabled = 1; + rcu_segcblist_set_flags(rsclp, SEGCBLIST_ENABLED); } /* @@ -163,16 +257,21 @@ void rcu_segcblist_disable(struct rcu_segcblist *rsclp) { WARN_ON_ONCE(!rcu_segcblist_empty(rsclp)); WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp)); - rsclp->enabled = 0; + rcu_segcblist_clear_flags(rsclp, SEGCBLIST_ENABLED); } /* * Mark the specified rcu_segcblist structure as offloaded. This * structure must be empty. */ -void rcu_segcblist_offload(struct rcu_segcblist *rsclp) +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload) { - rsclp->offloaded = 1; + if (offload) { + rcu_segcblist_clear_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY); + rcu_segcblist_set_flags(rsclp, SEGCBLIST_OFFLOADED); + } else { + rcu_segcblist_clear_flags(rsclp, SEGCBLIST_OFFLOADED); + } } /* @@ -245,7 +344,7 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, struct rcu_head *rhp) { rcu_segcblist_inc_len(rsclp); - smp_mb(); /* Ensure counts are updated before callback is enqueued. */ + rcu_segcblist_inc_seglen(rsclp, RCU_NEXT_TAIL); rhp->next = NULL; WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rhp); WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], &rhp->next); @@ -274,27 +373,13 @@ bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, for (i = RCU_NEXT_TAIL; i > RCU_DONE_TAIL; i--) if (rsclp->tails[i] != rsclp->tails[i - 1]) break; + rcu_segcblist_inc_seglen(rsclp, i); WRITE_ONCE(*rsclp->tails[i], rhp); for (; i <= RCU_NEXT_TAIL; i++) WRITE_ONCE(rsclp->tails[i], &rhp->next); return true; } -/* - * Extract only the counts from the specified rcu_segcblist structure, - * and place them in the specified rcu_cblist structure. This function - * supports both callback orphaning and invocation, hence the separation - * of counts and callbacks. (Callbacks ready for invocation must be - * orphaned and adopted separately from pending callbacks, but counts - * apply to all callbacks. Locking must be used to make sure that - * both orphaned-callbacks lists are consistent.) - */ -void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp, - struct rcu_cblist *rclp) -{ - rclp->len = rcu_segcblist_xchg_len(rsclp, 0); -} - /* * Extract only those callbacks ready to be invoked from the specified * rcu_segcblist structure and place them in the specified rcu_cblist @@ -307,6 +392,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp, if (!rcu_segcblist_ready_cbs(rsclp)) return; /* Nothing to do. 
*/ + rclp->len = rcu_segcblist_get_seglen(rsclp, RCU_DONE_TAIL); *rclp->tail = rsclp->head; WRITE_ONCE(rsclp->head, *rsclp->tails[RCU_DONE_TAIL]); WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL); @@ -314,6 +400,7 @@ void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp, for (i = RCU_CBLIST_NSEGS - 1; i >= RCU_DONE_TAIL; i--) if (rsclp->tails[i] == rsclp->tails[RCU_DONE_TAIL]) WRITE_ONCE(rsclp->tails[i], &rsclp->head); + rcu_segcblist_set_seglen(rsclp, RCU_DONE_TAIL, 0); } /* @@ -330,11 +417,15 @@ void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp, if (!rcu_segcblist_pend_cbs(rsclp)) return; /* Nothing to do. */ + rclp->len = 0; *rclp->tail = *rsclp->tails[RCU_DONE_TAIL]; rclp->tail = rsclp->tails[RCU_NEXT_TAIL]; WRITE_ONCE(*rsclp->tails[RCU_DONE_TAIL], NULL); - for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) + for (i = RCU_DONE_TAIL + 1; i < RCU_CBLIST_NSEGS; i++) { + rclp->len += rcu_segcblist_get_seglen(rsclp, i); WRITE_ONCE(rsclp->tails[i], rsclp->tails[RCU_DONE_TAIL]); + rcu_segcblist_set_seglen(rsclp, i, 0); + } } /* @@ -345,7 +436,6 @@ void rcu_segcblist_insert_count(struct rcu_segcblist *rsclp, struct rcu_cblist *rclp) { rcu_segcblist_add_len(rsclp, rclp->len); - rclp->len = 0; } /* @@ -359,6 +449,7 @@ void rcu_segcblist_insert_done_cbs(struct rcu_segcblist *rsclp, if (!rclp->head) return; /* No callbacks to move. */ + rcu_segcblist_add_seglen(rsclp, RCU_DONE_TAIL, rclp->len); *rclp->tail = rsclp->head; WRITE_ONCE(rsclp->head, rclp->head); for (i = RCU_DONE_TAIL; i < RCU_CBLIST_NSEGS; i++) @@ -379,6 +470,8 @@ void rcu_segcblist_insert_pend_cbs(struct rcu_segcblist *rsclp, { if (!rclp->head) return; /* Nothing to do. */ + + rcu_segcblist_add_seglen(rsclp, RCU_NEXT_TAIL, rclp->len); WRITE_ONCE(*rsclp->tails[RCU_NEXT_TAIL], rclp->head); WRITE_ONCE(rsclp->tails[RCU_NEXT_TAIL], rclp->tail); } @@ -403,6 +496,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq) if (ULONG_CMP_LT(seq, rsclp->gp_seq[i])) break; WRITE_ONCE(rsclp->tails[RCU_DONE_TAIL], rsclp->tails[i]); + rcu_segcblist_move_seglen(rsclp, i, RCU_DONE_TAIL); } /* If no callbacks moved, nothing more need be done. */ @@ -423,6 +517,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq) if (rsclp->tails[j] == rsclp->tails[RCU_NEXT_TAIL]) break; /* No more callbacks. */ WRITE_ONCE(rsclp->tails[j], rsclp->tails[i]); + rcu_segcblist_move_seglen(rsclp, i, j); rsclp->gp_seq[j] = rsclp->gp_seq[i]; } } @@ -444,7 +539,7 @@ void rcu_segcblist_advance(struct rcu_segcblist *rsclp, unsigned long seq) */ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq) { - int i; + int i, j; WARN_ON_ONCE(!rcu_segcblist_is_enabled(rsclp)); if (rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL)) @@ -487,6 +582,10 @@ bool rcu_segcblist_accelerate(struct rcu_segcblist *rsclp, unsigned long seq) if (rcu_segcblist_restempty(rsclp, i) || ++i >= RCU_NEXT_TAIL) return false; + /* Accounting: everything below i is about to get merged into i. */ + for (j = i + 1; j <= RCU_NEXT_TAIL; j++) + rcu_segcblist_move_seglen(rsclp, j, i); + /* * Merge all later callbacks, including newly arrived callbacks, * into the segment located by the for-loop above. 
Assign "seq" @@ -514,13 +613,24 @@ void rcu_segcblist_merge(struct rcu_segcblist *dst_rsclp, struct rcu_cblist donecbs; struct rcu_cblist pendcbs; + lockdep_assert_cpus_held(); + rcu_cblist_init(&donecbs); rcu_cblist_init(&pendcbs); - rcu_segcblist_extract_count(src_rsclp, &donecbs); + rcu_segcblist_extract_done_cbs(src_rsclp, &donecbs); rcu_segcblist_extract_pend_cbs(src_rsclp, &pendcbs); + + /* + * No need smp_mb() before setting length to 0, because CPU hotplug + * lock excludes rcu_barrier. + */ + rcu_segcblist_set_len(src_rsclp, 0); + rcu_segcblist_insert_count(dst_rsclp, &donecbs); + rcu_segcblist_insert_count(dst_rsclp, &pendcbs); rcu_segcblist_insert_done_cbs(dst_rsclp, &donecbs); rcu_segcblist_insert_pend_cbs(dst_rsclp, &pendcbs); + rcu_segcblist_init(src_rsclp); } diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h index 492262bcb591..9a19328ff251 100644 --- a/kernel/rcu/rcu_segcblist.h +++ b/kernel/rcu/rcu_segcblist.h @@ -15,6 +15,9 @@ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp) return READ_ONCE(rclp->len); } +/* Return number of callbacks in segmented callback list by summing seglen. */ +long rcu_segcblist_n_segment_cbs(struct rcu_segcblist *rsclp); + void rcu_cblist_init(struct rcu_cblist *rclp); void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp); void rcu_cblist_flush_enqueue(struct rcu_cblist *drclp, @@ -50,19 +53,51 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp) #endif } +static inline void rcu_segcblist_set_flags(struct rcu_segcblist *rsclp, + int flags) +{ + rsclp->flags |= flags; +} + +static inline void rcu_segcblist_clear_flags(struct rcu_segcblist *rsclp, + int flags) +{ + rsclp->flags &= ~flags; +} + +static inline bool rcu_segcblist_test_flags(struct rcu_segcblist *rsclp, + int flags) +{ + return READ_ONCE(rsclp->flags) & flags; +} + /* * Is the specified rcu_segcblist enabled, for example, not corresponding * to an offline CPU? */ static inline bool rcu_segcblist_is_enabled(struct rcu_segcblist *rsclp) { - return rsclp->enabled; + return rcu_segcblist_test_flags(rsclp, SEGCBLIST_ENABLED); } -/* Is the specified rcu_segcblist offloaded? */ +/* Is the specified rcu_segcblist offloaded, or is SEGCBLIST_SOFTIRQ_ONLY set? */ static inline bool rcu_segcblist_is_offloaded(struct rcu_segcblist *rsclp) { - return IS_ENABLED(CONFIG_RCU_NOCB_CPU) && rsclp->offloaded; + if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) && + !rcu_segcblist_test_flags(rsclp, SEGCBLIST_SOFTIRQ_ONLY)) + return true; + + return false; +} + +static inline bool rcu_segcblist_completely_offloaded(struct rcu_segcblist *rsclp) +{ + int flags = SEGCBLIST_KTHREAD_CB | SEGCBLIST_KTHREAD_GP | SEGCBLIST_OFFLOADED; + + if (IS_ENABLED(CONFIG_RCU_NOCB_CPU) && (rsclp->flags & flags) == flags) + return true; + + return false; } /* @@ -75,10 +110,22 @@ static inline bool rcu_segcblist_restempty(struct rcu_segcblist *rsclp, int seg) return !READ_ONCE(*READ_ONCE(rsclp->tails[seg])); } +/* + * Is the specified segment of the specified rcu_segcblist structure + * empty of callbacks? 
+ */ +static inline bool rcu_segcblist_segempty(struct rcu_segcblist *rsclp, int seg) +{ + if (seg == RCU_DONE_TAIL) + return &rsclp->head == rsclp->tails[RCU_DONE_TAIL]; + return rsclp->tails[seg - 1] == rsclp->tails[seg]; +} + void rcu_segcblist_inc_len(struct rcu_segcblist *rsclp); +void rcu_segcblist_add_len(struct rcu_segcblist *rsclp, long v); void rcu_segcblist_init(struct rcu_segcblist *rsclp); void rcu_segcblist_disable(struct rcu_segcblist *rsclp); -void rcu_segcblist_offload(struct rcu_segcblist *rsclp); +void rcu_segcblist_offload(struct rcu_segcblist *rsclp, bool offload); bool rcu_segcblist_ready_cbs(struct rcu_segcblist *rsclp); bool rcu_segcblist_pend_cbs(struct rcu_segcblist *rsclp); struct rcu_head *rcu_segcblist_first_cb(struct rcu_segcblist *rsclp); @@ -88,8 +135,6 @@ void rcu_segcblist_enqueue(struct rcu_segcblist *rsclp, struct rcu_head *rhp); bool rcu_segcblist_entrain(struct rcu_segcblist *rsclp, struct rcu_head *rhp); -void rcu_segcblist_extract_count(struct rcu_segcblist *rsclp, - struct rcu_cblist *rclp); void rcu_segcblist_extract_done_cbs(struct rcu_segcblist *rsclp, struct rcu_cblist *rclp); void rcu_segcblist_extract_pend_cbs(struct rcu_segcblist *rsclp, diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c index 528ed10b78fd..99657ffa6688 100644 --- a/kernel/rcu/rcutorture.c +++ b/kernel/rcu/rcutorture.c @@ -85,6 +85,7 @@ torture_param(bool, gp_cond, false, "Use conditional/async GP wait primitives"); torture_param(bool, gp_exp, false, "Use expedited GP wait primitives"); torture_param(bool, gp_normal, false, "Use normal (non-expedited) GP wait primitives"); +torture_param(bool, gp_poll, false, "Use polling GP wait primitives"); torture_param(bool, gp_sync, false, "Use synchronous GP wait primitives"); torture_param(int, irqreader, 1, "Allow RCU readers from irq handlers"); torture_param(int, leakpointer, 0, "Leak pointer dereferences from readers"); @@ -97,6 +98,8 @@ torture_param(int, object_debug, 0, torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)"); torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (jiffies), 0=disable"); +torture_param(int, nocbs_nthreads, 0, "Number of NOCB toggle threads, 0 to disable"); +torture_param(int, nocbs_toggle, 1000, "Time between toggling nocb state (ms)"); torture_param(int, read_exit_delay, 13, "Delay between read-then-exit episodes (s)"); torture_param(int, read_exit_burst, 16, @@ -127,10 +130,12 @@ static char *torture_type = "rcu"; module_param(torture_type, charp, 0444); MODULE_PARM_DESC(torture_type, "Type of RCU to torture (rcu, srcu, ...)"); +static int nrealnocbers; static int nrealreaders; static struct task_struct *writer_task; static struct task_struct **fakewriter_tasks; static struct task_struct **reader_tasks; +static struct task_struct **nocb_tasks; static struct task_struct *stats_task; static struct task_struct *fqs_task; static struct task_struct *boost_tasks[NR_CPUS]; @@ -142,11 +147,22 @@ static struct task_struct *read_exit_task; #define RCU_TORTURE_PIPE_LEN 10 +// Mailbox-like structure to check RCU global memory ordering. +struct rcu_torture_reader_check { + unsigned long rtc_myloops; + int rtc_chkrdr; + unsigned long rtc_chkloops; + int rtc_ready; + struct rcu_torture_reader_check *rtc_assigner; +} ____cacheline_internodealigned_in_smp; + +// Update-side data structure used to check RCU readers. 
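+// The new ->rtort_chkp field below is the mailbox slot: a reader
+// volunteering a memory-ordering check cmpxchg()es its
+// rcu_torture_reader_check structure into the rcu_torture about to be
+// replaced, and rcu_torture_pipe_update_one() hands it back (setting
+// ->rtc_ready) once the corresponding grace period has elapsed.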
struct rcu_torture { struct rcu_head rtort_rcu; int rtort_pipe_count; struct list_head rtort_free; int rtort_mbtest; + struct rcu_torture_reader_check *rtort_chkp; }; static LIST_HEAD(rcu_torture_freelist); @@ -157,10 +173,13 @@ static DEFINE_SPINLOCK(rcu_torture_lock); static DEFINE_PER_CPU(long [RCU_TORTURE_PIPE_LEN + 1], rcu_torture_count); static DEFINE_PER_CPU(long [RCU_TORTURE_PIPE_LEN + 1], rcu_torture_batch); static atomic_t rcu_torture_wcount[RCU_TORTURE_PIPE_LEN + 1]; +static struct rcu_torture_reader_check *rcu_torture_reader_mbchk; static atomic_t n_rcu_torture_alloc; static atomic_t n_rcu_torture_alloc_fail; static atomic_t n_rcu_torture_free; static atomic_t n_rcu_torture_mberror; +static atomic_t n_rcu_torture_mbchk_fail; +static atomic_t n_rcu_torture_mbchk_tries; static atomic_t n_rcu_torture_error; static long n_rcu_torture_barrier_error; static long n_rcu_torture_boost_ktrerror; @@ -174,6 +193,8 @@ static unsigned long n_read_exits; static struct list_head rcu_torture_removed; static unsigned long shutdown_jiffies; static unsigned long start_gp_seq; +static atomic_long_t n_nocb_offload; +static atomic_long_t n_nocb_deoffload; static int rcu_torture_writer_state; #define RTWS_FIXED_DELAY 0 @@ -183,9 +204,11 @@ static int rcu_torture_writer_state; #define RTWS_EXP_SYNC 4 #define RTWS_COND_GET 5 #define RTWS_COND_SYNC 6 -#define RTWS_SYNC 7 -#define RTWS_STUTTER 8 -#define RTWS_STOPPING 9 +#define RTWS_POLL_GET 7 +#define RTWS_POLL_WAIT 8 +#define RTWS_SYNC 9 +#define RTWS_STUTTER 10 +#define RTWS_STOPPING 11 static const char * const rcu_torture_writer_state_names[] = { "RTWS_FIXED_DELAY", "RTWS_DELAY", @@ -194,6 +217,8 @@ static const char * const rcu_torture_writer_state_names[] = { "RTWS_EXP_SYNC", "RTWS_COND_GET", "RTWS_COND_SYNC", + "RTWS_POLL_GET", + "RTWS_POLL_WAIT", "RTWS_SYNC", "RTWS_STUTTER", "RTWS_STOPPING", @@ -311,7 +336,9 @@ struct rcu_torture_ops { void (*deferred_free)(struct rcu_torture *p); void (*sync)(void); void (*exp_sync)(void); - unsigned long (*get_state)(void); + unsigned long (*get_gp_state)(void); + unsigned long (*start_gp_poll)(void); + bool (*poll_gp_state)(unsigned long oldstate); void (*cond_sync)(unsigned long oldstate); call_rcu_func_t call; void (*cb_barrier)(void); @@ -386,7 +413,12 @@ static bool rcu_torture_pipe_update_one(struct rcu_torture *rp) { int i; + struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp); + if (rtrcp) { + WRITE_ONCE(rp->rtort_chkp, NULL); + smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire(). 
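+		// A full grace period now separates the ->rtc_chkloops
+		// snapshot from this release of ->rtc_ready, so the
+		// checker's smp_load_acquire() of ->rtc_ready also
+		// guarantees visibility of that snapshot.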
+ } i = READ_ONCE(rp->rtort_pipe_count); if (i > RCU_TORTURE_PIPE_LEN) i = RCU_TORTURE_PIPE_LEN; @@ -461,7 +493,7 @@ static struct rcu_torture_ops rcu_ops = { .deferred_free = rcu_torture_deferred_free, .sync = synchronize_rcu, .exp_sync = synchronize_rcu_expedited, - .get_state = get_state_synchronize_rcu, + .get_gp_state = get_state_synchronize_rcu, .cond_sync = cond_synchronize_rcu, .call = call_rcu, .cb_barrier = rcu_barrier, @@ -570,6 +602,21 @@ static void srcu_torture_synchronize(void) synchronize_srcu(srcu_ctlp); } +static unsigned long srcu_torture_get_gp_state(void) +{ + return get_state_synchronize_srcu(srcu_ctlp); +} + +static unsigned long srcu_torture_start_gp_poll(void) +{ + return start_poll_synchronize_srcu(srcu_ctlp); +} + +static bool srcu_torture_poll_gp_state(unsigned long oldstate) +{ + return poll_state_synchronize_srcu(srcu_ctlp, oldstate); +} + static void srcu_torture_call(struct rcu_head *head, rcu_callback_t func) { @@ -601,6 +648,9 @@ static struct rcu_torture_ops srcu_ops = { .deferred_free = srcu_torture_deferred_free, .sync = srcu_torture_synchronize, .exp_sync = srcu_torture_synchronize_expedited, + .get_gp_state = srcu_torture_get_gp_state, + .start_gp_poll = srcu_torture_start_gp_poll, + .poll_gp_state = srcu_torture_poll_gp_state, .call = srcu_torture_call, .cb_barrier = srcu_torture_barrier, .stats = srcu_torture_stats, @@ -1018,42 +1068,26 @@ rcu_torture_fqs(void *arg) return 0; } -/* - * RCU torture writer kthread. Repeatedly substitutes a new structure - * for that pointed to by rcu_torture_current, freeing the old structure - * after a series of grace periods (the "pipeline"). - */ -static int -rcu_torture_writer(void *arg) -{ - bool can_expedite = !rcu_gp_is_expedited() && !rcu_gp_is_normal(); - int expediting = 0; - unsigned long gp_snap; - bool gp_cond1 = gp_cond, gp_exp1 = gp_exp, gp_normal1 = gp_normal; - bool gp_sync1 = gp_sync; - int i; - int oldnice = task_nice(current); - struct rcu_torture *rp; - struct rcu_torture *old_rp; - static DEFINE_TORTURE_RANDOM(rand); - bool stutter_waited; - int synctype[] = { RTWS_DEF_FREE, RTWS_EXP_SYNC, - RTWS_COND_GET, RTWS_SYNC }; - int nsynctypes = 0; +// Used by writers to randomly choose from the available grace-period +// primitives. The only purpose of the initialization is to size the array. +static int synctype[] = { RTWS_DEF_FREE, RTWS_EXP_SYNC, RTWS_COND_GET, RTWS_POLL_GET, RTWS_SYNC }; +static int nsynctypes; - VERBOSE_TOROUT_STRING("rcu_torture_writer task started"); - if (!can_expedite) - pr_alert("%s" TORTURE_FLAG - " GP expediting controlled from boot/sysfs for %s.\n", - torture_type, cur_ops->name); +/* + * Determine which grace-period primitives are available. + */ +static void rcu_torture_write_types(void) +{ + bool gp_cond1 = gp_cond, gp_exp1 = gp_exp, gp_normal1 = gp_normal; + bool gp_poll1 = gp_poll, gp_sync1 = gp_sync; /* Initialize synctype[] array. If none set, take default. 
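	 * Each grace-period flavor that is requested (or defaulted) and
	 * whose primitives are supplied by the current rcu_torture_ops
	 * gets a slot in synctype[]; writer kthreads then choose uniformly
	 * at random from among the populated slots.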
*/ - if (!gp_cond1 && !gp_exp1 && !gp_normal1 && !gp_sync1) - gp_cond1 = gp_exp1 = gp_normal1 = gp_sync1 = true; - if (gp_cond1 && cur_ops->get_state && cur_ops->cond_sync) { + if (!gp_cond1 && !gp_exp1 && !gp_normal1 && !gp_poll1 && !gp_sync1) + gp_cond1 = gp_exp1 = gp_normal1 = gp_poll1 = gp_sync1 = true; + if (gp_cond1 && cur_ops->get_gp_state && cur_ops->cond_sync) { synctype[nsynctypes++] = RTWS_COND_GET; pr_info("%s: Testing conditional GPs.\n", __func__); - } else if (gp_cond && (!cur_ops->get_state || !cur_ops->cond_sync)) { + } else if (gp_cond && (!cur_ops->get_gp_state || !cur_ops->cond_sync)) { pr_alert("%s: gp_cond without primitives.\n", __func__); } if (gp_exp1 && cur_ops->exp_sync) { @@ -1068,12 +1102,46 @@ rcu_torture_writer(void *arg) } else if (gp_normal && !cur_ops->deferred_free) { pr_alert("%s: gp_normal without primitives.\n", __func__); } + if (gp_poll1 && cur_ops->start_gp_poll && cur_ops->poll_gp_state) { + synctype[nsynctypes++] = RTWS_POLL_GET; + pr_info("%s: Testing polling GPs.\n", __func__); + } else if (gp_poll && (!cur_ops->start_gp_poll || !cur_ops->poll_gp_state)) { + pr_alert("%s: gp_poll without primitives.\n", __func__); + } if (gp_sync1 && cur_ops->sync) { synctype[nsynctypes++] = RTWS_SYNC; pr_info("%s: Testing normal GPs.\n", __func__); } else if (gp_sync && !cur_ops->sync) { pr_alert("%s: gp_sync without primitives.\n", __func__); } +} + +/* + * RCU torture writer kthread. Repeatedly substitutes a new structure + * for that pointed to by rcu_torture_current, freeing the old structure + * after a series of grace periods (the "pipeline"). + */ +static int +rcu_torture_writer(void *arg) +{ + bool boot_ended; + bool can_expedite = !rcu_gp_is_expedited() && !rcu_gp_is_normal(); + unsigned long cookie; + int expediting = 0; + unsigned long gp_snap; + int i; + int idx; + int oldnice = task_nice(current); + struct rcu_torture *rp; + struct rcu_torture *old_rp; + static DEFINE_TORTURE_RANDOM(rand); + bool stutter_waited; + + VERBOSE_TOROUT_STRING("rcu_torture_writer task started"); + if (!can_expedite) + pr_alert("%s" TORTURE_FLAG + " GP expediting controlled from boot/sysfs for %s.\n", + torture_type, cur_ops->name); if (WARN_ONCE(nsynctypes == 0, "rcu_torture_writer: No update-side primitives.\n")) { /* @@ -1087,7 +1155,7 @@ rcu_torture_writer(void *arg) do { rcu_torture_writer_state = RTWS_FIXED_DELAY; - schedule_timeout_uninterruptible(1); + torture_hrtimeout_us(500, 1000, &rand); rp = rcu_torture_alloc(); if (rp == NULL) continue; @@ -1107,6 +1175,18 @@ rcu_torture_writer(void *arg) atomic_inc(&rcu_torture_wcount[i]); WRITE_ONCE(old_rp->rtort_pipe_count, old_rp->rtort_pipe_count + 1); + if (cur_ops->get_gp_state && cur_ops->poll_gp_state) { + idx = cur_ops->readlock(); + cookie = cur_ops->get_gp_state(); + WARN_ONCE(rcu_torture_writer_state != RTWS_DEF_FREE && + cur_ops->poll_gp_state(cookie), + "%s: Cookie check 1 failed %s(%d) %lu->%lu\n", + __func__, + rcu_torture_writer_state_getname(), + rcu_torture_writer_state, + cookie, cur_ops->get_gp_state()); + cur_ops->readunlock(idx); + } switch (synctype[torture_random(&rand) % nsynctypes]) { case RTWS_DEF_FREE: rcu_torture_writer_state = RTWS_DEF_FREE; @@ -1119,15 +1199,21 @@ rcu_torture_writer(void *arg) break; case RTWS_COND_GET: rcu_torture_writer_state = RTWS_COND_GET; - gp_snap = cur_ops->get_state(); - i = torture_random(&rand) % 16; - if (i != 0) - schedule_timeout_interruptible(i); - udelay(torture_random(&rand) % 1000); + gp_snap = cur_ops->get_gp_state(); + 
torture_hrtimeout_jiffies(torture_random(&rand) % 16, &rand); rcu_torture_writer_state = RTWS_COND_SYNC; cur_ops->cond_sync(gp_snap); rcu_torture_pipe_update(old_rp); break; + case RTWS_POLL_GET: + rcu_torture_writer_state = RTWS_POLL_GET; + gp_snap = cur_ops->start_gp_poll(); + rcu_torture_writer_state = RTWS_POLL_WAIT; + while (!cur_ops->poll_gp_state(gp_snap)) + torture_hrtimeout_jiffies(torture_random(&rand) % 16, + &rand); + rcu_torture_pipe_update(old_rp); + break; case RTWS_SYNC: rcu_torture_writer_state = RTWS_SYNC; cur_ops->sync(); @@ -1137,6 +1223,14 @@ rcu_torture_writer(void *arg) WARN_ON_ONCE(1); break; } + if (cur_ops->get_gp_state && cur_ops->poll_gp_state) + WARN_ONCE(rcu_torture_writer_state != RTWS_DEF_FREE && + !cur_ops->poll_gp_state(cookie), + "%s: Cookie check 2 failed %s(%d) %lu->%lu\n", + __func__, + rcu_torture_writer_state_getname(), + rcu_torture_writer_state, + cookie, cur_ops->get_gp_state()); } WRITE_ONCE(rcu_torture_current_version, rcu_torture_current_version + 1); @@ -1155,12 +1249,13 @@ rcu_torture_writer(void *arg) !rcu_gp_is_normal(); } rcu_torture_writer_state = RTWS_STUTTER; + boot_ended = rcu_inkernel_boot_has_ended(); stutter_waited = stutter_wait("rcu_torture_writer"); if (stutter_waited && !READ_ONCE(rcu_fwd_cb_nodelay) && !cur_ops->slow_gps && !torture_must_stop() && - rcu_inkernel_boot_has_ended()) + boot_ended) for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++) if (list_empty(&rcu_tortures[i].rtort_free) && rcu_access_pointer(rcu_torture_current) != @@ -1194,26 +1289,43 @@ rcu_torture_writer(void *arg) static int rcu_torture_fakewriter(void *arg) { + unsigned long gp_snap; DEFINE_TORTURE_RANDOM(rand); VERBOSE_TOROUT_STRING("rcu_torture_fakewriter task started"); set_user_nice(current, MAX_NICE); do { - schedule_timeout_uninterruptible(1 + torture_random(&rand)%10); - udelay(torture_random(&rand) & 0x3ff); + torture_hrtimeout_jiffies(torture_random(&rand) % 10, &rand); if (cur_ops->cb_barrier != NULL && torture_random(&rand) % (nfakewriters * 8) == 0) { cur_ops->cb_barrier(); - } else if (gp_normal == gp_exp) { - if (cur_ops->sync && torture_random(&rand) & 0x80) - cur_ops->sync(); - else if (cur_ops->exp_sync) + } else { + switch (synctype[torture_random(&rand) % nsynctypes]) { + case RTWS_DEF_FREE: + break; + case RTWS_EXP_SYNC: cur_ops->exp_sync(); - } else if (gp_normal && cur_ops->sync) { - cur_ops->sync(); - } else if (cur_ops->exp_sync) { - cur_ops->exp_sync(); + break; + case RTWS_COND_GET: + gp_snap = cur_ops->get_gp_state(); + torture_hrtimeout_jiffies(torture_random(&rand) % 16, &rand); + cur_ops->cond_sync(gp_snap); + break; + case RTWS_POLL_GET: + gp_snap = cur_ops->start_gp_poll(); + while (!cur_ops->poll_gp_state(gp_snap)) { + torture_hrtimeout_jiffies(torture_random(&rand) % 16, + &rand); + } + break; + case RTWS_SYNC: + cur_ops->sync(); + break; + default: + WARN_ON_ONCE(1); + break; + } } stutter_wait("rcu_torture_fakewriter"); } while (!torture_must_stop()); @@ -1227,6 +1339,62 @@ static void rcu_torture_timer_cb(struct rcu_head *rhp) kfree(rhp); } +// Set up and carry out testing of RCU's global memory ordering +static void rcu_torture_reader_do_mbchk(long myid, struct rcu_torture *rtp, + struct torture_random_state *trsp) +{ + unsigned long loops; + int noc = torture_num_online_cpus(); + int rdrchked; + int rdrchker; + struct rcu_torture_reader_check *rtrcp; // Me. + struct rcu_torture_reader_check *rtrcp_assigner; // Assigned us to do checking. + struct rcu_torture_reader_check *rtrcp_chked; // Reader being checked. 
+ struct rcu_torture_reader_check *rtrcp_chker; // Reader doing checking when not me. + + if (myid < 0) + return; // Don't try this from timer handlers. + + // Increment my counter. + rtrcp = &rcu_torture_reader_mbchk[myid]; + WRITE_ONCE(rtrcp->rtc_myloops, rtrcp->rtc_myloops + 1); + + // Attempt to assign someone else some checking work. + rdrchked = torture_random(trsp) % nrealreaders; + rtrcp_chked = &rcu_torture_reader_mbchk[rdrchked]; + rdrchker = torture_random(trsp) % nrealreaders; + rtrcp_chker = &rcu_torture_reader_mbchk[rdrchker]; + if (rdrchked != myid && rdrchked != rdrchker && noc >= rdrchked && noc >= rdrchker && + smp_load_acquire(&rtrcp->rtc_chkrdr) < 0 && // Pairs with smp_store_release below. + !READ_ONCE(rtp->rtort_chkp) && + !smp_load_acquire(&rtrcp_chker->rtc_assigner)) { // Pairs with smp_store_release below. + rtrcp->rtc_chkloops = READ_ONCE(rtrcp_chked->rtc_myloops); + WARN_ON_ONCE(rtrcp->rtc_chkrdr >= 0); + rtrcp->rtc_chkrdr = rdrchked; + WARN_ON_ONCE(rtrcp->rtc_ready); // This gets set after the grace period ends. + if (cmpxchg_relaxed(&rtrcp_chker->rtc_assigner, NULL, rtrcp) || + cmpxchg_relaxed(&rtp->rtort_chkp, NULL, rtrcp)) + (void)cmpxchg_relaxed(&rtrcp_chker->rtc_assigner, rtrcp, NULL); // Back out. + } + + // If assigned some completed work, do it! + rtrcp_assigner = READ_ONCE(rtrcp->rtc_assigner); + if (!rtrcp_assigner || !smp_load_acquire(&rtrcp_assigner->rtc_ready)) + return; // No work or work not yet ready. + rdrchked = rtrcp_assigner->rtc_chkrdr; + if (WARN_ON_ONCE(rdrchked < 0)) + return; + rtrcp_chked = &rcu_torture_reader_mbchk[rdrchked]; + loops = READ_ONCE(rtrcp_chked->rtc_myloops); + atomic_inc(&n_rcu_torture_mbchk_tries); + if (ULONG_CMP_LT(loops, rtrcp_assigner->rtc_chkloops)) + atomic_inc(&n_rcu_torture_mbchk_fail); + rtrcp_assigner->rtc_chkloops = loops + ULONG_MAX / 2; + rtrcp_assigner->rtc_ready = 0; + smp_store_release(&rtrcp->rtc_assigner, NULL); // Someone else can assign us work. + smp_store_release(&rtrcp_assigner->rtc_chkrdr, -1); // Assigner can again assign. +} + /* * Do one extension of an RCU read-side critical section using the * current reader state in readstate (set to zero for initial entry @@ -1362,8 +1530,9 @@ rcutorture_loop_extend(int *readstate, struct torture_random_state *trsp, * no data to read. Can be invoked both from process context and * from a timer handler. 
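 * If the current rcu_torture_ops supplies both ->get_gp_state() and
 * ->poll_gp_state(), also samples a grace-period cookie just after
 * entering the critical section and complains if that grace period is
 * seen to have completed before the critical section ended.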
*/ -static bool rcu_torture_one_read(struct torture_random_state *trsp) +static bool rcu_torture_one_read(struct torture_random_state *trsp, long myid) { + unsigned long cookie; int i; unsigned long started; unsigned long completed; @@ -1379,6 +1548,8 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) WARN_ON_ONCE(!rcu_is_watching()); newstate = rcutorture_extend_mask(readstate, trsp); rcutorture_one_extend(&readstate, newstate, trsp, rtrsp++); + if (cur_ops->get_gp_state && cur_ops->poll_gp_state) + cookie = cur_ops->get_gp_state(); started = cur_ops->get_gp_seq(); ts = rcu_trace_clock_local(); p = rcu_dereference_check(rcu_torture_current, @@ -1394,6 +1565,7 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) } if (p->rtort_mbtest == 0) atomic_inc(&n_rcu_torture_mberror); + rcu_torture_reader_do_mbchk(myid, p, trsp); rtrsp = rcutorture_loop_extend(&readstate, trsp, rtrsp); preempt_disable(); pipe_count = READ_ONCE(p->rtort_pipe_count); @@ -1415,6 +1587,13 @@ static bool rcu_torture_one_read(struct torture_random_state *trsp) } __this_cpu_inc(rcu_torture_batch[completed]); preempt_enable(); + if (cur_ops->get_gp_state && cur_ops->poll_gp_state) + WARN_ONCE(cur_ops->poll_gp_state(cookie), + "%s: Cookie check 3 failed %s(%d) %lu->%lu\n", + __func__, + rcu_torture_writer_state_getname(), + rcu_torture_writer_state, + cookie, cur_ops->get_gp_state()); rcutorture_one_extend(&readstate, 0, trsp, rtrsp); WARN_ON_ONCE(readstate & RCUTORTURE_RDR_MASK); // This next splat is expected behavior if leakpointer, especially @@ -1443,7 +1622,7 @@ static DEFINE_TORTURE_RANDOM_PERCPU(rcu_torture_timer_rand); static void rcu_torture_timer(struct timer_list *unused) { atomic_long_inc(&n_rcu_torture_timers); - (void)rcu_torture_one_read(this_cpu_ptr(&rcu_torture_timer_rand)); + (void)rcu_torture_one_read(this_cpu_ptr(&rcu_torture_timer_rand), -1); /* Test call_rcu() invocation from interrupt handler. */ if (cur_ops->call) { @@ -1479,13 +1658,13 @@ rcu_torture_reader(void *arg) if (!timer_pending(&t)) mod_timer(&t, jiffies + 1); } - if (!rcu_torture_one_read(&rand) && !torture_must_stop()) + if (!rcu_torture_one_read(&rand, myid) && !torture_must_stop()) schedule_timeout_interruptible(HZ); if (time_after(jiffies, lastsleep) && !torture_must_stop()) { - schedule_timeout_interruptible(1); + torture_hrtimeout_us(500, 1000, &rand); lastsleep = jiffies + 10; } - while (num_online_cpus() < mynumonline && !torture_must_stop()) + while (torture_num_online_cpus() < mynumonline && !torture_must_stop()) schedule_timeout_interruptible(HZ / 5); stutter_wait("rcu_torture_reader"); } while (!torture_must_stop()); @@ -1498,6 +1677,53 @@ rcu_torture_reader(void *arg) return 0; } +/* + * Randomly Toggle CPUs' callback-offload state. This uses hrtimers to + * increase race probabilities and fuzzes the interval between toggling. 
+ */ +static int rcu_nocb_toggle(void *arg) +{ + int cpu; + int maxcpu = -1; + int oldnice = task_nice(current); + long r; + DEFINE_TORTURE_RANDOM(rand); + ktime_t toggle_delay; + unsigned long toggle_fuzz; + ktime_t toggle_interval = ms_to_ktime(nocbs_toggle); + + VERBOSE_TOROUT_STRING("rcu_nocb_toggle task started"); + while (!rcu_inkernel_boot_has_ended()) + schedule_timeout_interruptible(HZ / 10); + for_each_online_cpu(cpu) + maxcpu = cpu; + WARN_ON(maxcpu < 0); + if (toggle_interval > ULONG_MAX) + toggle_fuzz = ULONG_MAX >> 3; + else + toggle_fuzz = toggle_interval >> 3; + if (toggle_fuzz <= 0) + toggle_fuzz = NSEC_PER_USEC; + do { + r = torture_random(&rand); + cpu = (r >> 4) % (maxcpu + 1); + if (r & 0x1) { + rcu_nocb_cpu_offload(cpu); + atomic_long_inc(&n_nocb_offload); + } else { + rcu_nocb_cpu_deoffload(cpu); + atomic_long_inc(&n_nocb_deoffload); + } + toggle_delay = torture_random(&rand) % toggle_fuzz + toggle_interval; + set_current_state(TASK_INTERRUPTIBLE); + schedule_hrtimeout(&toggle_delay, HRTIMER_MODE_REL); + if (stutter_wait("rcu_nocb_toggle")) + sched_set_normal(current, oldnice); + } while (!torture_must_stop()); + torture_kthread_stopping("rcu_nocb_toggle"); + return 0; +} + /* * Print torture statistics. Caller must ensure that there is only * one call to this function at a given time!!! This is normally @@ -1539,8 +1765,9 @@ rcu_torture_stats_print(void) atomic_read(&n_rcu_torture_alloc), atomic_read(&n_rcu_torture_alloc_fail), atomic_read(&n_rcu_torture_free)); - pr_cont("rtmbe: %d rtbe: %ld rtbke: %ld rtbre: %ld ", + pr_cont("rtmbe: %d rtmbkf: %d/%d rtbe: %ld rtbke: %ld rtbre: %ld ", atomic_read(&n_rcu_torture_mberror), + atomic_read(&n_rcu_torture_mbchk_fail), atomic_read(&n_rcu_torture_mbchk_tries), n_rcu_torture_barrier_error, n_rcu_torture_boost_ktrerror, n_rcu_torture_boost_rterror); @@ -1553,16 +1780,20 @@ rcu_torture_stats_print(void) data_race(n_barrier_successes), data_race(n_barrier_attempts), data_race(n_rcu_torture_barrier_error)); - pr_cont("read-exits: %ld\n", data_race(n_read_exits)); + pr_cont("read-exits: %ld ", data_race(n_read_exits)); // Statistic. + pr_cont("nocb-toggles: %ld:%ld\n", + atomic_long_read(&n_nocb_offload), atomic_long_read(&n_nocb_deoffload)); pr_alert("%s%s ", torture_type, TORTURE_FLAG); if (atomic_read(&n_rcu_torture_mberror) || + atomic_read(&n_rcu_torture_mbchk_fail) || n_rcu_torture_barrier_error || n_rcu_torture_boost_ktrerror || n_rcu_torture_boost_rterror || n_rcu_torture_boost_failure || i > 1) { pr_cont("%s", "!!! 
"); atomic_inc(&n_rcu_torture_error); WARN_ON_ONCE(atomic_read(&n_rcu_torture_mberror)); + WARN_ON_ONCE(atomic_read(&n_rcu_torture_mbchk_fail)); WARN_ON_ONCE(n_rcu_torture_barrier_error); // rcu_barrier() WARN_ON_ONCE(n_rcu_torture_boost_ktrerror); // no boost kthread WARN_ON_ONCE(n_rcu_torture_boost_rterror); // can't set RT prio @@ -1647,7 +1878,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag) "stall_cpu_block=%d " "n_barrier_cbs=%d " "onoff_interval=%d onoff_holdoff=%d " - "read_exit_delay=%d read_exit_burst=%d\n", + "read_exit_delay=%d read_exit_burst=%d " + "nocbs_nthreads=%d nocbs_toggle=%d\n", torture_type, tag, nrealreaders, nfakewriters, stat_interval, verbose, test_no_idle_hz, shuffle_interval, stutter, irqreader, fqs_duration, fqs_holdoff, fqs_stutter, @@ -1657,7 +1889,8 @@ rcu_torture_print_module_parms(struct rcu_torture_ops *cur_ops, const char *tag) stall_cpu_block, n_barrier_cbs, onoff_interval, onoff_holdoff, - read_exit_delay, read_exit_burst); + read_exit_delay, read_exit_burst, + nocbs_nthreads, nocbs_toggle); } static int rcutorture_booster_cleanup(unsigned int cpu) @@ -2392,7 +2625,7 @@ static int rcu_torture_read_exit_child(void *trsp_in) // Minimize time between reading and exiting. while (!kthread_should_stop()) schedule_timeout_uninterruptible(1); - (void)rcu_torture_one_read(trsp); + (void)rcu_torture_one_read(trsp, -1); return 0; } @@ -2500,6 +2733,13 @@ rcu_torture_cleanup(void) torture_stop_kthread(rcu_torture_stall, stall_task); torture_stop_kthread(rcu_torture_writer, writer_task); + if (nocb_tasks) { + for (i = 0; i < nrealnocbers; i++) + torture_stop_kthread(rcu_nocb_toggle, nocb_tasks[i]); + kfree(nocb_tasks); + nocb_tasks = NULL; + } + if (reader_tasks) { for (i = 0; i < nrealreaders; i++) torture_stop_kthread(rcu_torture_reader, @@ -2507,6 +2747,8 @@ rcu_torture_cleanup(void) kfree(reader_tasks); reader_tasks = NULL; } + kfree(rcu_torture_reader_mbchk); + rcu_torture_reader_mbchk = NULL; if (fakewriter_tasks) { for (i = 0; i < nfakewriters; i++) @@ -2604,6 +2846,7 @@ static void rcu_test_debug_objects(void) #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD struct rcu_head rh1; struct rcu_head rh2; + struct rcu_head *rhp = kmalloc(sizeof(*rhp), GFP_KERNEL); init_rcu_head_on_stack(&rh1); init_rcu_head_on_stack(&rh2); @@ -2616,6 +2859,10 @@ static void rcu_test_debug_objects(void) local_irq_disable(); /* Make it harder to start a new grace period. */ call_rcu(&rh2, rcu_torture_leak_cb); call_rcu(&rh2, rcu_torture_err_cb); /* Duplicate callback. */ + if (rhp) { + call_rcu(rhp, rcu_torture_leak_cb); + call_rcu(rhp, rcu_torture_err_cb); /* Another duplicate callback. */ + } local_irq_enable(); rcu_read_unlock(); preempt_enable(); @@ -2710,6 +2957,8 @@ rcu_torture_init(void) atomic_set(&n_rcu_torture_alloc_fail, 0); atomic_set(&n_rcu_torture_free, 0); atomic_set(&n_rcu_torture_mberror, 0); + atomic_set(&n_rcu_torture_mbchk_fail, 0); + atomic_set(&n_rcu_torture_mbchk_tries, 0); atomic_set(&n_rcu_torture_error, 0); n_rcu_torture_barrier_error = 0; n_rcu_torture_boost_ktrerror = 0; @@ -2729,6 +2978,7 @@ rcu_torture_init(void) /* Start up the kthreads. 
*/ + rcu_torture_write_types(); firsterr = torture_create_kthread(rcu_torture_writer, NULL, writer_task); if (firsterr) @@ -2751,17 +3001,40 @@ rcu_torture_init(void) } reader_tasks = kcalloc(nrealreaders, sizeof(reader_tasks[0]), GFP_KERNEL); - if (reader_tasks == NULL) { + rcu_torture_reader_mbchk = kcalloc(nrealreaders, sizeof(*rcu_torture_reader_mbchk), + GFP_KERNEL); + if (!reader_tasks || !rcu_torture_reader_mbchk) { VERBOSE_TOROUT_ERRSTRING("out of memory"); firsterr = -ENOMEM; goto unwind; } for (i = 0; i < nrealreaders; i++) { + rcu_torture_reader_mbchk[i].rtc_chkrdr = -1; firsterr = torture_create_kthread(rcu_torture_reader, (void *)i, reader_tasks[i]); if (firsterr) goto unwind; } + nrealnocbers = nocbs_nthreads; + if (WARN_ON(nrealnocbers < 0)) + nrealnocbers = 1; + if (WARN_ON(nocbs_toggle < 0)) + nocbs_toggle = HZ; + if (nrealnocbers > 0) { + nocb_tasks = kcalloc(nrealnocbers, sizeof(nocb_tasks[0]), GFP_KERNEL); + if (nocb_tasks == NULL) { + VERBOSE_TOROUT_ERRSTRING("out of memory"); + firsterr = -ENOMEM; + goto unwind; + } + } else { + nocb_tasks = NULL; + } + for (i = 0; i < nrealnocbers; i++) { + firsterr = torture_create_kthread(rcu_nocb_toggle, NULL, nocb_tasks[i]); + if (firsterr) + goto unwind; + } if (stat_interval > 0) { firsterr = torture_create_kthread(rcu_torture_stats, NULL, stats_task); diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c index 23ff36a66f97..02dd9767b559 100644 --- a/kernel/rcu/refscale.c +++ b/kernel/rcu/refscale.c @@ -46,6 +46,18 @@ #define VERBOSE_SCALEOUT(s, x...) \ do { if (verbose) pr_alert("%s" SCALE_FLAG s, scale_type, ## x); } while (0) +static atomic_t verbose_batch_ctr; + +#define VERBOSE_SCALEOUT_BATCH(s, x...) \ +do { \ + if (verbose && \ + (verbose_batched <= 0 || \ + !(atomic_inc_return(&verbose_batch_ctr) % verbose_batched))) { \ + schedule_timeout_uninterruptible(1); \ + pr_alert("%s" SCALE_FLAG s, scale_type, ## x); \ + } \ +} while (0) + #define VERBOSE_SCALEOUT_ERRSTRING(s, x...) \ do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! " s, scale_type, ## x); } while (0) @@ -57,6 +69,7 @@ module_param(scale_type, charp, 0444); MODULE_PARM_DESC(scale_type, "Type of test (rcu, srcu, refcnt, rwsem, rwlock."); torture_param(int, verbose, 0, "Enable verbose debugging printk()s"); +torture_param(int, verbose_batched, 0, "Batch verbose debugging printk()s"); // Wait until there are multiple CPUs before starting test. torture_param(int, holdoff, IS_BUILTIN(CONFIG_RCU_REF_SCALE_TEST) ? 10 : 0, @@ -368,14 +381,14 @@ ref_scale_reader(void *arg) u64 start; s64 duration; - VERBOSE_SCALEOUT("ref_scale_reader %ld: task started", me); + VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: task started", me); set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids)); set_user_nice(current, MAX_NICE); atomic_inc(&n_init); if (holdoff) schedule_timeout_interruptible(holdoff * HZ); repeat: - VERBOSE_SCALEOUT("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id()); + VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id()); // Wait for signal that this reader can start. 
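The several "Cookie check" WARN_ONCE() calls added above all test one invariant: a grace-period cookie collected before a full grace period must report completion afterward. A minimal sketch of that invariant, not part of the patch, using the SRCU flavor of the polled API that appears later in this series; "my_srcu" and the checking function are hypothetical:

    #include <linux/srcu.h>

    DEFINE_SRCU(my_srcu);	/* Hypothetical srcu_struct for illustration. */

    static void cookie_invariant_check(void)
    {
    	unsigned long cookie;

    	cookie = get_state_synchronize_srcu(&my_srcu);
    	synchronize_srcu(&my_srcu);	/* A full SRCU grace period elapses. */
    	/* The cookie's grace period must now have completed. */
    	WARN_ON_ONCE(!poll_state_synchronize_srcu(&my_srcu, cookie));
    }

The reader-side "Cookie check 3" tests the converse: a cookie collected inside a read-side critical section must not report completion while that critical section is still running.
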
diff --git a/kernel/rcu/refscale.c b/kernel/rcu/refscale.c
index 23ff36a66f97..02dd9767b559 100644
--- a/kernel/rcu/refscale.c
+++ b/kernel/rcu/refscale.c
@@ -46,6 +46,18 @@
 #define VERBOSE_SCALEOUT(s, x...) \
 	do { if (verbose) pr_alert("%s" SCALE_FLAG s, scale_type, ## x); } while (0)

+static atomic_t verbose_batch_ctr;
+
+#define VERBOSE_SCALEOUT_BATCH(s, x...)					\
+do {									\
+	if (verbose &&							\
+	    (verbose_batched <= 0 ||					\
+	     !(atomic_inc_return(&verbose_batch_ctr) % verbose_batched))) { \
+		schedule_timeout_uninterruptible(1);			\
+		pr_alert("%s" SCALE_FLAG s, scale_type, ## x);		\
+	}								\
+} while (0)
+
 #define VERBOSE_SCALEOUT_ERRSTRING(s, x...) \
 	do { if (verbose) pr_alert("%s" SCALE_FLAG "!!! " s, scale_type, ## x); } while (0)

@@ -57,6 +69,7 @@ module_param(scale_type, charp, 0444);
 MODULE_PARM_DESC(scale_type, "Type of test (rcu, srcu, refcnt, rwsem, rwlock.)");

 torture_param(int, verbose, 0, "Enable verbose debugging printk()s");
+torture_param(int, verbose_batched, 0, "Batch verbose debugging printk()s");

 // Wait until there are multiple CPUs before starting test.
 torture_param(int, holdoff, IS_BUILTIN(CONFIG_RCU_REF_SCALE_TEST) ? 10 : 0,
@@ -368,14 +381,14 @@ ref_scale_reader(void *arg)
 	u64 start;
 	s64 duration;

-	VERBOSE_SCALEOUT("ref_scale_reader %ld: task started", me);
+	VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: task started", me);
 	set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
 	set_user_nice(current, MAX_NICE);
 	atomic_inc(&n_init);
 	if (holdoff)
 		schedule_timeout_interruptible(holdoff * HZ);
repeat:
-	VERBOSE_SCALEOUT("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id());
+	VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: waiting to start next experiment on cpu %d", me, smp_processor_id());

 	// Wait for signal that this reader can start.
 	wait_event(rt->wq, (atomic_read(&nreaders_exp) && smp_load_acquire(&rt->start_reader)) ||
@@ -392,7 +405,7 @@ repeat:
 	while (atomic_read_acquire(&n_started))
 		cpu_relax();

-	VERBOSE_SCALEOUT("ref_scale_reader %ld: experiment %d started", me, exp_idx);
+	VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: experiment %d started", me, exp_idx);


 	// To reduce noise, do an initial cache-warming invocation, check
@@ -421,8 +434,8 @@ repeat:
 	if (atomic_dec_and_test(&nreaders_exp))
 		wake_up(&main_wq);

-	VERBOSE_SCALEOUT("ref_scale_reader %ld: experiment %d ended, (readers remaining=%d)",
-			 me, exp_idx, atomic_read(&nreaders_exp));
+	VERBOSE_SCALEOUT_BATCH("ref_scale_reader %ld: experiment %d ended, (readers remaining=%d)",
+			       me, exp_idx, atomic_read(&nreaders_exp));

 	if (!torture_must_stop())
 		goto repeat;
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 6208c1dae5c9..26344dc6483b 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -34,6 +34,7 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp)
 	ssp->srcu_gp_running = false;
 	ssp->srcu_gp_waiting = false;
 	ssp->srcu_idx = 0;
+	ssp->srcu_idx_max = 0;
 	INIT_WORK(&ssp->srcu_work, srcu_drive_gp);
 	INIT_LIST_HEAD(&ssp->srcu_work.entry);
 	return 0;
@@ -84,6 +85,8 @@ void cleanup_srcu_struct(struct srcu_struct *ssp)
 	WARN_ON(ssp->srcu_gp_waiting);
 	WARN_ON(ssp->srcu_cb_head);
 	WARN_ON(&ssp->srcu_cb_head != ssp->srcu_cb_tail);
+	WARN_ON(ssp->srcu_idx != ssp->srcu_idx_max);
+	WARN_ON(ssp->srcu_idx & 0x1);
 }
 EXPORT_SYMBOL_GPL(cleanup_srcu_struct);

@@ -114,7 +117,7 @@ void srcu_drive_gp(struct work_struct *wp)
 	struct srcu_struct *ssp;

 	ssp = container_of(wp, struct srcu_struct, srcu_work);
-	if (ssp->srcu_gp_running || !READ_ONCE(ssp->srcu_cb_head))
+	if (ssp->srcu_gp_running || USHORT_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
 		return; /* Already running or nothing to do. */

 	/* Remove recently arrived callbacks and wait for readers. */
@@ -124,11 +127,12 @@ void srcu_drive_gp(struct work_struct *wp)
 	ssp->srcu_cb_head = NULL;
 	ssp->srcu_cb_tail = &ssp->srcu_cb_head;
 	local_irq_enable();
-	idx = ssp->srcu_idx;
-	WRITE_ONCE(ssp->srcu_idx, !ssp->srcu_idx);
+	idx = (ssp->srcu_idx & 0x2) / 2;
+	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
 	WRITE_ONCE(ssp->srcu_gp_waiting, true);  /* srcu_read_unlock() wakes! */
 	swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx]));
 	WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
+	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);

 	/* Invoke the callbacks we removed above. */
 	while (lh) {
@@ -146,11 +150,27 @@ void srcu_drive_gp(struct work_struct *wp)
 	 * straighten that out.
 	 */
 	WRITE_ONCE(ssp->srcu_gp_running, false);
-	if (READ_ONCE(ssp->srcu_cb_head))
+	if (USHORT_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max)))
 		schedule_work(&ssp->srcu_work);
 }
 EXPORT_SYMBOL_GPL(srcu_drive_gp);

+static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
+{
+	unsigned short cookie;
+
+	cookie = get_state_synchronize_srcu(ssp);
+	if (USHORT_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie))
+		return;
+	WRITE_ONCE(ssp->srcu_idx_max, cookie);
+	if (!READ_ONCE(ssp->srcu_gp_running)) {
+		if (likely(srcu_init_done))
+			schedule_work(&ssp->srcu_work);
+		else if (list_empty(&ssp->srcu_work.entry))
+			list_add(&ssp->srcu_work.entry, &srcu_boot_list);
+	}
+}
+
 /*
  * Enqueue an SRCU callback on the specified srcu_struct structure,
  * initiating grace-period processing if it is not already running.
@@ -166,12 +186,7 @@ void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
 	*ssp->srcu_cb_tail = rhp;
 	ssp->srcu_cb_tail = &rhp->next;
 	local_irq_restore(flags);
-	if (!READ_ONCE(ssp->srcu_gp_running)) {
-		if (likely(srcu_init_done))
-			schedule_work(&ssp->srcu_work);
-		else if (list_empty(&ssp->srcu_work.entry))
-			list_add(&ssp->srcu_work.entry, &srcu_boot_list);
-	}
+	srcu_gp_start_if_needed(ssp);
 }
 EXPORT_SYMBOL_GPL(call_srcu);

@@ -190,6 +205,48 @@ void synchronize_srcu(struct srcu_struct *ssp)
 }
 EXPORT_SYMBOL_GPL(synchronize_srcu);

+/*
+ * get_state_synchronize_srcu - Provide an end-of-grace-period cookie
+ */
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
+{
+	unsigned long ret;
+
+	barrier();
+	ret = (READ_ONCE(ssp->srcu_idx) + 3) & ~0x1;
+	barrier();
+	return ret & USHRT_MAX;
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
+
+/*
+ * start_poll_synchronize_srcu - Provide cookie and start grace period
+ *
+ * The difference between this and get_state_synchronize_srcu() is that
+ * this function ensures that poll_state_synchronize_srcu() will
+ * eventually return true.
+ */
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
+{
+	unsigned long ret = get_state_synchronize_srcu(ssp);
+
+	srcu_gp_start_if_needed(ssp);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
+
+/*
+ * poll_state_synchronize_srcu - Has cookie's grace period ended?
+ */
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
+{
+	bool ret = USHORT_CMP_GE(READ_ONCE(ssp->srcu_idx), cookie);
+
+	barrier();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(poll_state_synchronize_srcu);
+
 /* Lockdep diagnostics.  */
 void __init rcu_scheduler_starting(void)
 {
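For readers unfamiliar with the Tiny SRCU counter scheme, the following sketch (not part of the patch) works through the cookie arithmetic used by the get_state_synchronize_srcu() just above. The ->srcu_idx counter advances by two per grace period and is odd only while the reader scan is in progress, so adding 3 and clearing the bottom bit rounds the cookie up to the counter value at the end of the first grace period guaranteed to start after the call:

    /*
     * Worked example, under the assumptions stated above:
     *
     *   ->srcu_idx == 4 (even, no GP running):
     *       cookie = (4 + 3) & ~0x1 = 6,
     *       so the very next grace period (which ends at idx == 6)
     *       satisfies the cookie.
     *
     *   ->srcu_idx == 5 (odd, reader scan in flight):
     *       cookie = (5 + 3) & ~0x1 = 8,
     *       so the in-flight grace period (ending at idx == 6) is not
     *       sufficient; one more full grace period (ending at idx == 8)
     *       is required, because the current scan might already have
     *       passed this CPU's readers.
     *
     * USHORT_CMP_GE(->srcu_idx, cookie) in poll_state_synchronize_srcu()
     * then tolerates 16-bit wraparound of the counter.
     */
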
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 0f23d20d485a..e26547b34ad3 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -807,6 +807,46 @@ static void srcu_leak_callback(struct rcu_head *rhp)
 {
 }

+/*
+ * Start an SRCU grace period, and also queue the callback if non-NULL.
+ */
+static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
+					     struct rcu_head *rhp, bool do_norm)
+{
+	unsigned long flags;
+	int idx;
+	bool needexp = false;
+	bool needgp = false;
+	unsigned long s;
+	struct srcu_data *sdp;
+
+	check_init_srcu_struct(ssp);
+	idx = srcu_read_lock(ssp);
+	sdp = raw_cpu_ptr(ssp->sda);
+	spin_lock_irqsave_rcu_node(sdp, flags);
+	if (rhp)
+		rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
+	rcu_segcblist_advance(&sdp->srcu_cblist,
+			      rcu_seq_current(&ssp->srcu_gp_seq));
+	s = rcu_seq_snap(&ssp->srcu_gp_seq);
+	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
+	if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
+		sdp->srcu_gp_seq_needed = s;
+		needgp = true;
+	}
+	if (!do_norm && ULONG_CMP_LT(sdp->srcu_gp_seq_needed_exp, s)) {
+		sdp->srcu_gp_seq_needed_exp = s;
+		needexp = true;
+	}
+	spin_unlock_irqrestore_rcu_node(sdp, flags);
+	if (needgp)
+		srcu_funnel_gp_start(ssp, sdp, s, do_norm);
+	else if (needexp)
+		srcu_funnel_exp_start(ssp, sdp->mynode, s);
+	srcu_read_unlock(ssp, idx);
+	return s;
+}
+
 /*
  * Enqueue an SRCU callback on the srcu_data structure associated with
  * the current CPU and the specified srcu_struct structure, initiating
@@ -838,14 +878,6 @@ static void srcu_leak_callback(struct rcu_head *rhp)
 static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
 			rcu_callback_t func, bool do_norm)
 {
-	unsigned long flags;
-	int idx;
-	bool needexp = false;
-	bool needgp = false;
-	unsigned long s;
-	struct srcu_data *sdp;
-
-	check_init_srcu_struct(ssp);
 	if (debug_rcu_head_queue(rhp)) {
 		/* Probable double call_srcu(), so leak the callback. */
 		WRITE_ONCE(rhp->func, srcu_leak_callback);
@@ -853,28 +885,7 @@ static void __call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
 		return;
 	}
 	rhp->func = func;
-	idx = srcu_read_lock(ssp);
-	sdp = raw_cpu_ptr(ssp->sda);
-	spin_lock_irqsave_rcu_node(sdp, flags);
-	rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
-	rcu_segcblist_advance(&sdp->srcu_cblist,
-			      rcu_seq_current(&ssp->srcu_gp_seq));
-	s = rcu_seq_snap(&ssp->srcu_gp_seq);
-	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
-	if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
-		sdp->srcu_gp_seq_needed = s;
-		needgp = true;
-	}
-	if (!do_norm && ULONG_CMP_LT(sdp->srcu_gp_seq_needed_exp, s)) {
-		sdp->srcu_gp_seq_needed_exp = s;
-		needexp = true;
-	}
-	spin_unlock_irqrestore_rcu_node(sdp, flags);
-	if (needgp)
-		srcu_funnel_gp_start(ssp, sdp, s, do_norm);
-	else if (needexp)
-		srcu_funnel_exp_start(ssp, sdp->mynode, s);
-	srcu_read_unlock(ssp, idx);
+	(void)srcu_gp_start_if_needed(ssp, rhp, do_norm);
 }

 /**
@@ -1003,6 +1014,77 @@ void synchronize_srcu(struct srcu_struct *ssp)
 }
 EXPORT_SYMBOL_GPL(synchronize_srcu);

+/**
+ * get_state_synchronize_srcu - Provide an end-of-grace-period cookie
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  It is the caller's responsibility
+ * to make sure that grace period happens, for example, by invoking
+ * call_srcu() after return from get_state_synchronize_srcu().
+ */
+unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
+{
+	// Any prior manipulation of SRCU-protected data must happen
+	// before the load from ->srcu_gp_seq.
+	smp_mb();
+	return rcu_seq_snap(&ssp->srcu_gp_seq);
+}
+EXPORT_SYMBOL_GPL(get_state_synchronize_srcu);
+
+/**
+ * start_poll_synchronize_srcu - Provide cookie and start grace period
+ * @ssp: srcu_struct to provide cookie for.
+ *
+ * This function returns a cookie that can be passed to
+ * poll_state_synchronize_srcu(), which will return true if a full grace
+ * period has elapsed in the meantime.  Unlike get_state_synchronize_srcu(),
+ * this function also ensures that any needed SRCU grace period will be
+ * started.  This convenience does come at a cost in terms of CPU overhead.
+ */
+unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
+{
+	return srcu_gp_start_if_needed(ssp, NULL, true);
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_srcu);
+
+/**
+ * poll_state_synchronize_srcu - Has cookie's grace period ended?
+ * @ssp: srcu_struct to provide cookie for.
+ * @cookie: Return value from get_state_synchronize_srcu() or start_poll_synchronize_srcu().
+ *
+ * This function takes the cookie that was returned from either
+ * get_state_synchronize_srcu() or start_poll_synchronize_srcu(), and
+ * returns @true if an SRCU grace period elapsed since the time that the
+ * cookie was created.
+ *
+ * Because cookies are finite in size, wrapping/overflow is possible.
+ * This is more pronounced on 32-bit systems where cookies are 32 bits,
+ * where in theory wrapping could happen in about 14 hours assuming
+ * 25-microsecond expedited SRCU grace periods.  However, a more likely
+ * overflow lower bound is on the order of 24 days in the case of
+ * one-millisecond SRCU grace periods.  Of course, wrapping in a 64-bit
+ * system requires geologic timespans, as in more than seven million years
+ * even for expedited SRCU grace periods.
+ *
+ * Wrapping/overflow is much more of an issue for CONFIG_SMP=n systems
+ * that also have CONFIG_PREEMPTION=n, which selects Tiny SRCU.  This uses
+ * a 16-bit cookie, which rcutorture routinely wraps in a matter of a
+ * few minutes.  If this proves to be a problem, this counter will be
+ * expanded to the same size as for Tree SRCU.
+ */
+bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
+{
+	if (!rcu_seq_done(&ssp->srcu_gp_seq, cookie))
+		return false;
+	// Ensure that the end of the SRCU grace period happens before
+	// any subsequent code that the caller might execute.
+	smp_mb(); // ^^^
+	return true;
+}
+EXPORT_SYMBOL_GPL(poll_state_synchronize_srcu);
+
 /*
  * Callback function for srcu_barrier() use.
  */
@@ -1160,6 +1242,7 @@ static void srcu_advance_state(struct srcu_struct *ssp)
  */
 static void srcu_invoke_callbacks(struct work_struct *work)
 {
+	long len;
 	bool more;
 	struct rcu_cblist ready_cbs;
 	struct rcu_head *rhp;
@@ -1182,6 +1265,7 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 	/* We are on the job!  Extract and invoke ready callbacks. */
 	sdp->srcu_cblist_invoking = true;
 	rcu_segcblist_extract_done_cbs(&sdp->srcu_cblist, &ready_cbs);
+	len = ready_cbs.len;
 	spin_unlock_irq_rcu_node(sdp);
 	rhp = rcu_cblist_dequeue(&ready_cbs);
 	for (; rhp != NULL; rhp = rcu_cblist_dequeue(&ready_cbs)) {
@@ -1190,13 +1274,14 @@ static void srcu_invoke_callbacks(struct work_struct *work)
 		rhp->func(rhp);
 		local_bh_enable();
 	}
+	WARN_ON_ONCE(ready_cbs.len);

 	/*
 	 * Update counts, accelerate new callbacks, and if needed,
 	 * schedule another round of callback invocation.
 	 */
 	spin_lock_irq_rcu_node(sdp);
-	rcu_segcblist_insert_count(&sdp->srcu_cblist, &ready_cbs);
+	rcu_segcblist_add_len(&sdp->srcu_cblist, -len);
 	(void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
 				       rcu_seq_snap(&ssp->srcu_gp_seq));
 	sdp->srcu_cblist_invoking = false;
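The kernel-doc above leaves the intended usage pattern implicit. Here is a minimal usage sketch, not part of the patch: "struct foo", "foo_stale_list", and both helpers are hypothetical, and locking of the stale list is assumed to be handled elsewhere. An updater stamps each retired object with a cookie from start_poll_synchronize_srcu(), and a later reclaim pass frees only objects whose grace period has already elapsed, so the common case needs neither synchronize_srcu() nor call_srcu():

    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/srcu.h>

    /* Hypothetical object retired from some SRCU-protected structure. */
    struct foo {
    	struct list_head list;
    	unsigned long gp_cookie;
    };

    static LIST_HEAD(foo_stale_list);	/* Hypothetical; caller provides locking. */

    static void foo_retire(struct srcu_struct *ssp, struct foo *fp)
    {
    	/* Start a grace period (if needed) and remember its cookie. */
    	fp->gp_cookie = start_poll_synchronize_srcu(ssp);
    	list_add(&fp->list, &foo_stale_list);
    }

    static void foo_reclaim(struct srcu_struct *ssp)
    {
    	struct foo *fp, *tmp;

    	list_for_each_entry_safe(fp, tmp, &foo_stale_list, list) {
    		if (!poll_state_synchronize_srcu(ssp, fp->gp_cookie))
    			continue;	/* Readers might still hold references. */
    		list_del(&fp->list);
    		kfree(fp);
    	}
    }

If the reclaim pass runs long after objects are retired, poll_state_synchronize_srcu() will usually return true on the first try, which is the point of the polling interfaces.
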
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 35bdcfd84d42..af7c19439f4e 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -241,7 +241,7 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 	}
 }

-/* Spawn RCU-tasks grace-period kthread, e.g., at core_initcall() time. */
+/* Spawn RCU-tasks grace-period kthread. */
 static void __init rcu_spawn_tasks_kthread_generic(struct rcu_tasks *rtp)
 {
 	struct task_struct *t;
@@ -564,7 +564,6 @@ static int __init rcu_spawn_tasks_kthread(void)
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks);
 	return 0;
 }
-core_initcall(rcu_spawn_tasks_kthread);

 #if !defined(CONFIG_TINY_RCU)
 void show_rcu_tasks_classic_gp_kthread(void)
@@ -692,7 +691,6 @@ static int __init rcu_spawn_tasks_rude_kthread(void)
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks_rude);
 	return 0;
 }
-core_initcall(rcu_spawn_tasks_rude_kthread);

 #if !defined(CONFIG_TINY_RCU)
 void show_rcu_tasks_rude_gp_kthread(void)
@@ -968,6 +966,11 @@ static void rcu_tasks_trace_pregp_step(void)
 static void rcu_tasks_trace_pertask(struct task_struct *t,
 				    struct list_head *hop)
 {
+	// During early boot when there is only the one boot CPU, there
+	// is no idle task for the other CPUs.  Just return.
+	if (unlikely(t == NULL))
+		return;
+
 	WRITE_ONCE(t->trc_reader_special.b.need_qs, false);
 	WRITE_ONCE(t->trc_reader_checked, false);
 	t->trc_ipi_to_cpu = -1;
@@ -1193,7 +1196,6 @@ static int __init rcu_spawn_tasks_trace_kthread(void)
 	rcu_spawn_tasks_kthread_generic(&rcu_tasks_trace);
 	return 0;
 }
-core_initcall(rcu_spawn_tasks_trace_kthread);

 #if !defined(CONFIG_TINY_RCU)
 void show_rcu_tasks_trace_gp_kthread(void)
@@ -1222,6 +1224,100 @@ void show_rcu_tasks_gp_kthreads(void)
 }
 #endif /* #ifndef CONFIG_TINY_RCU */

+#ifdef CONFIG_PROVE_RCU
+struct rcu_tasks_test_desc {
+	struct rcu_head rh;
+	const char *name;
+	bool notrun;
+};
+
+static struct rcu_tasks_test_desc tests[] = {
+	{
+		.name = "call_rcu_tasks()",
+		/* If not defined, the test is skipped. */
+		.notrun = !IS_ENABLED(CONFIG_TASKS_RCU),
+	},
+	{
+		.name = "call_rcu_tasks_rude()",
+		/* If not defined, the test is skipped. */
+		.notrun = !IS_ENABLED(CONFIG_TASKS_RUDE_RCU),
+	},
+	{
+		.name = "call_rcu_tasks_trace()",
+		/* If not defined, the test is skipped. */
+		.notrun = !IS_ENABLED(CONFIG_TASKS_TRACE_RCU)
+	}
+};
+
+static void test_rcu_tasks_callback(struct rcu_head *rhp)
+{
+	struct rcu_tasks_test_desc *rttd =
+		container_of(rhp, struct rcu_tasks_test_desc, rh);
+
+	pr_info("Callback from %s invoked.\n", rttd->name);
+
+	rttd->notrun = true;
+}
+
+static void rcu_tasks_initiate_self_tests(void)
+{
+	pr_info("Running RCU-tasks wait API self tests\n");
+#ifdef CONFIG_TASKS_RCU
+	synchronize_rcu_tasks();
+	call_rcu_tasks(&tests[0].rh, test_rcu_tasks_callback);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+	synchronize_rcu_tasks_rude();
+	call_rcu_tasks_rude(&tests[1].rh, test_rcu_tasks_callback);
+#endif
+
+#ifdef CONFIG_TASKS_TRACE_RCU
+	synchronize_rcu_tasks_trace();
+	call_rcu_tasks_trace(&tests[2].rh, test_rcu_tasks_callback);
+#endif
+}
+
+static int rcu_tasks_verify_self_tests(void)
+{
+	int ret = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tests); i++) {
+		if (!tests[i].notrun) {		// still hanging.
+			pr_err("%s has failed.\n", tests[i].name);
+			ret = -1;
+		}
+	}
+
+	if (ret)
+		WARN_ON(1);
+
+	return ret;
+}
+late_initcall(rcu_tasks_verify_self_tests);
+#else /* #ifdef CONFIG_PROVE_RCU */
+static void rcu_tasks_initiate_self_tests(void) { }
+#endif /* #else #ifdef CONFIG_PROVE_RCU */
+
+void __init rcu_init_tasks_generic(void)
+{
+#ifdef CONFIG_TASKS_RCU
+	rcu_spawn_tasks_kthread();
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+	rcu_spawn_tasks_rude_kthread();
+#endif
+
+#ifdef CONFIG_TASKS_TRACE_RCU
+	rcu_spawn_tasks_trace_kthread();
+#endif
+
+	// Run the self-tests.
+	rcu_tasks_initiate_self_tests();
+}
+
 #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
 static inline void rcu_tasks_bootup_oddness(void) {}
 void show_rcu_tasks_gp_kthreads(void) {}
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 40e5e3dd253e..0f4a6a3c057b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -83,6 +83,9 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
 	.dynticks_nesting = 1,
 	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
 	.dynticks = ATOMIC_INIT(RCU_DYNTICK_CTRL_CTR),
+#ifdef CONFIG_RCU_NOCB_CPU
+	.cblist.flags = SEGCBLIST_SOFTIRQ_ONLY,
+#endif
 };
 static struct rcu_state rcu_state = {
 	.level = { &rcu_state.node[0] },
@@ -100,8 +103,10 @@ static struct rcu_state rcu_state = {
 static bool dump_tree;
 module_param(dump_tree, bool, 0444);
 /* By default, use RCU_SOFTIRQ instead of rcuc kthreads. */
-static bool use_softirq = true;
+static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT);
+#ifndef CONFIG_PREEMPT_RT
 module_param(use_softirq, bool, 0444);
+#endif
 /* Control rcu_node-tree auto-balancing at boot time. */
 static bool rcu_fanout_exact;
 module_param(rcu_fanout_exact, bool, 0444);
@@ -1495,6 +1500,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	if (!rcu_segcblist_pend_cbs(&rdp->cblist))
 		return false;

+	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPreAcc"));
+
 	/*
 	 * Callbacks are often registered with incomplete grace-period
 	 * information.  Something about the fact that getting exact
@@ -1515,6 +1522,8 @@ static bool rcu_accelerate_cbs(struct rcu_node *rnp, struct rcu_data *rdp)
 	else
 		trace_rcu_grace_period(rcu_state.name, gp_seq_req, TPS("AccReadyCB"));

+	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbPostAcc"));
+
 	return ret;
 }

@@ -1765,7 +1774,7 @@ static bool rcu_gp_init(void)
 	 * go offline later.  Please also refer to "Hotplug CPU" section
 	 * of RCU's Requirements documentation.
 	 */
-	rcu_state.gp_state = RCU_GP_ONOFF;
+	WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
 	rcu_for_each_leaf_node(rnp) {
 		smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
 		firstseq = READ_ONCE(rnp->ofl_seq);
@@ -1831,7 +1840,7 @@ static bool rcu_gp_init(void)
 	 * The grace period cannot complete until the initialization
 	 * process finishes, because this kthread handles both.
 	 */
-	rcu_state.gp_state = RCU_GP_INIT;
+	WRITE_ONCE(rcu_state.gp_state, RCU_GP_INIT);
 	rcu_for_each_node_breadth_first(rnp) {
 		rcu_gp_slow(gp_init_delay);
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
@@ -1930,17 +1939,22 @@ static void rcu_gp_fqs_loop(void)
 	ret = 0;
 	for (;;) {
 		if (!ret) {
-			rcu_state.jiffies_force_qs = jiffies + j;
+			WRITE_ONCE(rcu_state.jiffies_force_qs, jiffies + j);
+			/*
+			 * jiffies_force_qs before RCU_GP_WAIT_FQS state
+			 * update; required for stall checks.
+			 */
+			smp_wmb();
 			WRITE_ONCE(rcu_state.jiffies_kick_kthreads,
 				   jiffies + (j ? 3 * j : 2));
 		}
 		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("fqswait"));
-		rcu_state.gp_state = RCU_GP_WAIT_FQS;
+		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_FQS);
 		ret = swait_event_idle_timeout_exclusive(
 				rcu_state.gp_wq, rcu_gp_fqs_check_wake(&gf), j);
 		rcu_gp_torture_wait();
-		rcu_state.gp_state = RCU_GP_DOING_FQS;
+		WRITE_ONCE(rcu_state.gp_state, RCU_GP_DOING_FQS);
 		/* Locking provides needed memory barriers. */
 		/* If grace period done, leave loop. */
 		if (!READ_ONCE(rnp->qsmask) &&
@@ -2054,7 +2068,7 @@ static void rcu_gp_cleanup(void)
 	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("end"));
 	rcu_seq_end(&rcu_state.gp_seq);
 	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
-	rcu_state.gp_state = RCU_GP_IDLE;
+	WRITE_ONCE(rcu_state.gp_state, RCU_GP_IDLE);
 	/* Check for GP requests since above loop. */
 	rdp = this_cpu_ptr(&rcu_data);
 	if (!needgp && ULONG_CMP_LT(rnp->gp_seq, rnp->gp_seq_needed)) {
@@ -2093,12 +2107,12 @@ static int __noreturn rcu_gp_kthread(void *unused)
 	for (;;) {
 		trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
 				       TPS("reqwait"));
-		rcu_state.gp_state = RCU_GP_WAIT_GPS;
+		WRITE_ONCE(rcu_state.gp_state, RCU_GP_WAIT_GPS);
 		swait_event_idle_exclusive(rcu_state.gp_wq,
 				 READ_ONCE(rcu_state.gp_flags) &
 				 RCU_GP_FLAG_INIT);
 		rcu_gp_torture_wait();
-		rcu_state.gp_state = RCU_GP_DONE_GPS;
+		WRITE_ONCE(rcu_state.gp_state, RCU_GP_DONE_GPS);
 		/* Locking provides needed memory barrier. */
 		if (rcu_gp_init())
 			break;
@@ -2113,9 +2127,9 @@ static int __noreturn rcu_gp_kthread(void *unused)
 		rcu_gp_fqs_loop();

 		/* Handle grace-period end. */
-		rcu_state.gp_state = RCU_GP_CLEANUP;
+		WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANUP);
 		rcu_gp_cleanup();
-		rcu_state.gp_state = RCU_GP_CLEANED;
+		WRITE_ONCE(rcu_state.gp_state, RCU_GP_CLEANED);
 	}
 }

@@ -2430,11 +2444,12 @@ int rcutree_dead_cpu(unsigned int cpu)
 static void rcu_do_batch(struct rcu_data *rdp)
 {
 	int div;
+	bool __maybe_unused empty;
 	unsigned long flags;
 	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
-	long bl, count;
+	long bl, count = 0;
 	long pending, tlimit = 0;

 	/* If no callbacks are ready, just return. */
@@ -2471,14 +2486,18 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	rcu_segcblist_extract_done_cbs(&rdp->cblist, &rcl);
 	if (offloaded)
 		rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
+
+	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCbDequeued"));
 	rcu_nocb_unlock_irqrestore(rdp, flags);

 	/* Invoke callbacks. */
 	tick_dep_set_task(current, TICK_DEP_BIT_RCU);
 	rhp = rcu_cblist_dequeue(&rcl);
+
 	for (; rhp; rhp = rcu_cblist_dequeue(&rcl)) {
 		rcu_callback_t f;

+		count++;
 		debug_rcu_head_unqueue(rhp);

 		rcu_lock_acquire(&rcu_callback_map);
@@ -2492,21 +2511,19 @@ static void rcu_do_batch(struct rcu_data *rdp)

 		/*
 		 * Stop only if limit reached and CPU has something to do.
-		 * Note: The rcl structure counts down from zero.
 		 */
-		if (-rcl.len >= bl && !offloaded &&
+		if (count >= bl && !offloaded &&
 		    (need_resched() ||
 		     (!is_idle_task(current) && !rcu_is_callbacks_kthread())))
 			break;
 		if (unlikely(tlimit)) {
 			/* only call local_clock() every 32 callbacks */
-			if (likely((-rcl.len & 31) || local_clock() < tlimit))
+			if (likely((count & 31) || local_clock() < tlimit))
 				continue;
 			/* Exceeded the time limit, so leave. */
 			break;
 		}
-		if (offloaded) {
-			WARN_ON_ONCE(in_serving_softirq());
+		if (!in_serving_softirq()) {
 			local_bh_enable();
 			lockdep_assert_irqs_enabled();
 			cond_resched_tasks_rcu_qs();
@@ -2517,15 +2534,13 @@ static void rcu_do_batch(struct rcu_data *rdp)

 	local_irq_save(flags);
 	rcu_nocb_lock(rdp);
-	count = -rcl.len;
 	rdp->n_cbs_invoked += count;
 	trace_rcu_batch_end(rcu_state.name, count, !!rcl.head, need_resched(),
 			    is_idle_task(current), rcu_is_callbacks_kthread());

 	/* Update counts and requeue any remaining callbacks. */
 	rcu_segcblist_insert_done_cbs(&rdp->cblist, &rcl);
-	smp_mb(); /* List handling before counting for rcu_barrier(). */
-	rcu_segcblist_insert_count(&rdp->cblist, &rcl);
+	rcu_segcblist_add_len(&rdp->cblist, -count);

 	/* Reinstate batch limit if we have worked down the excess. */
 	count = rcu_segcblist_n_cbs(&rdp->cblist);
@@ -2543,9 +2558,12 @@ static void rcu_do_batch(struct rcu_data *rdp)
 	 * The following usually indicates a double call_rcu().  To track
 	 * this down, try building with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y.
 	 */
-	WARN_ON_ONCE(count == 0 && !rcu_segcblist_empty(&rdp->cblist));
+	empty = rcu_segcblist_empty(&rdp->cblist);
+	WARN_ON_ONCE(count == 0 && !empty);
 	WARN_ON_ONCE(!IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
-		     count != 0 && rcu_segcblist_empty(&rdp->cblist));
+		     count != 0 && empty);
+	WARN_ON_ONCE(count == 0 && rcu_segcblist_n_segment_cbs(&rdp->cblist) != 0);
+	WARN_ON_ONCE(!empty && rcu_segcblist_n_segment_cbs(&rdp->cblist) == 0);

 	rcu_nocb_unlock_irqrestore(rdp, flags);

@@ -2566,6 +2584,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
 void rcu_sched_clock_irq(int user)
 {
 	trace_rcu_utilization(TPS("Start scheduler-tick"));
+	lockdep_assert_irqs_disabled();
 	raw_cpu_inc(rcu_data.ticks_this_gp);
 	/* The load-acquire pairs with the store-release setting to true. */
 	if (smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) {
@@ -2579,6 +2598,7 @@ void rcu_sched_clock_irq(int user)
 	rcu_flavor_sched_clock_irq(user);
 	if (rcu_pending(user))
 		invoke_rcu_core();
+	lockdep_assert_irqs_disabled();
 	trace_rcu_utilization(TPS("End scheduler-tick"));
 }

@@ -2688,7 +2708,7 @@ static __latent_entropy void rcu_core(void)
 	unsigned long flags;
 	struct rcu_data *rdp = raw_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;
-	const bool offloaded = rcu_segcblist_is_offloaded(&rdp->cblist);
+	const bool do_batch = !rcu_segcblist_completely_offloaded(&rdp->cblist);

 	if (cpu_is_offline(smp_processor_id()))
 		return;
@@ -2708,17 +2728,17 @@ static __latent_entropy void rcu_core(void)

 	/* No grace period and unregistered callbacks? */
 	if (!rcu_gp_in_progress() &&
-	    rcu_segcblist_is_enabled(&rdp->cblist) && !offloaded) {
-		local_irq_save(flags);
+	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
+		rcu_nocb_lock_irqsave(rdp, flags);
 		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 			rcu_accelerate_cbs_unlocked(rnp, rdp);
-		local_irq_restore(flags);
+		rcu_nocb_unlock_irqrestore(rdp, flags);
 	}

 	rcu_check_gp_start_stall(rnp, rdp, rcu_jiffies_till_stall_check());

 	/* If there are callbacks ready, invoke them. */
-	if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist) &&
+	if (do_batch && rcu_segcblist_ready_cbs(&rdp->cblist) &&
 	    likely(READ_ONCE(rcu_scheduler_fully_active)))
 		rcu_do_batch(rdp);

@@ -2941,6 +2961,7 @@ static void check_cb_ovld(struct rcu_data *rdp)
 static void
 __call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
+	static atomic_t doublefrees;
 	unsigned long flags;
 	struct rcu_data *rdp;
 	bool was_alldone;
@@ -2954,8 +2975,10 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		 * Use rcu:rcu_callback trace event to find the previous
 		 * time callback was passed to __call_rcu().
 		 */
-		WARN_ONCE(1, "__call_rcu(): Double-freed CB %p->%pS()!!!\n",
-			  head, head->func);
+		if (atomic_inc_return(&doublefrees) < 4) {
+			pr_err("%s(): Double-freed CB %p->%pS()!!!  ", __func__, head, head->func);
+			mem_dump_obj(head);
+		}
 		WRITE_ONCE(head->func, rcu_leak_callback);
 		return;
 	}
@@ -2989,6 +3012,8 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
 		trace_rcu_callback(rcu_state.name, head,
 				   rcu_segcblist_n_cbs(&rdp->cblist));

+	trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
+
 	/* Go handle any RCU core processing required. */
 	if (unlikely(rcu_segcblist_is_offloaded(&rdp->cblist))) {
 		__call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
@@ -3498,6 +3523,7 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		goto unlock_return;
 	}

+	kasan_record_aux_stack(ptr);
 	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
 	if (!success) {
 		run_page_cache_worker(krcp);
@@ -3747,6 +3773,8 @@ static int rcu_pending(int user)
 	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 	struct rcu_node *rnp = rdp->mynode;

+	lockdep_assert_irqs_disabled();
+
 	/* Check for CPU stalls, if enabled. */
 	check_cpu_stall(rdp);

@@ -4001,12 +4029,18 @@ int rcutree_prepare_cpu(unsigned int cpu)
 	rdp->qlen_last_fqs_check = 0;
 	rdp->n_force_qs_snap = rcu_state.n_force_qs;
 	rdp->blimit = blimit;
-	if (rcu_segcblist_empty(&rdp->cblist) && /* No early-boot CBs? */
-	    !rcu_segcblist_is_offloaded(&rdp->cblist))
-		rcu_segcblist_init(&rdp->cblist);  /* Re-enable callbacks. */
 	rdp->dynticks_nesting = 1;	/* CPU not up, no tearing. */
 	rcu_dynticks_eqs_online();
 	raw_spin_unlock_rcu_node(rnp);		/* irqs remain disabled. */
+	/*
+	 * Lock in case the CB/GP kthreads are still around handling
+	 * old callbacks (longer term we should flush all callbacks
+	 * before completing CPU offline)
+	 */
+	rcu_nocb_lock(rdp);
+	if (rcu_segcblist_empty(&rdp->cblist)) /* No early-boot CBs? */
+		rcu_segcblist_init(&rdp->cblist);  /* Re-enable callbacks. */
+	rcu_nocb_unlock(rdp);

 	/*
 	 * Add CPU to leaf rcu_node pending-online bitmask.  Any needed
@@ -4159,6 +4193,9 @@ void rcu_report_dead(unsigned int cpu)
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */

+	// Do any dangling deferred wakeups.
+	do_nocb_deferred_wakeup(rdp);
+
 	/* QS for any half-done expedited grace period. */
 	preempt_disable();
 	rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
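The doublefrees/mem_dump_obj() hunk above changes how RCU reports a duplicate call_rcu(): instead of a bare WARN_ONCE(), the first few occurrences also dump the enclosing memory object, which usually identifies the offending allocation. As a sketch of the bug class being diagnosed (not part of the patch; "struct foo" and both functions are hypothetical):

    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    struct foo {			/* Hypothetical RCU-protected object. */
    	int data;
    	struct rcu_head rh;
    };

    static void foo_release(struct rcu_head *rhp)
    {
    	kfree(container_of(rhp, struct foo, rh));
    }

    static void foo_buggy_free(struct foo *fp)
    {
    	call_rcu(&fp->rh, foo_release);
    	call_rcu(&fp->rh, foo_release);	/* Bug: same rcu_head queued twice. */
    }

With CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, the second call_rcu() now triggers the new "Double-freed CB" message followed by mem_dump_obj() output for the containing slab object.
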
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7708ed161f4a..5d359b9f9fec 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -201,6 +201,7 @@ struct rcu_data {
 	/* 5) Callback offloading. */
 #ifdef CONFIG_RCU_NOCB_CPU
 	struct swait_queue_head nocb_cb_wq; /* For nocb kthreads to sleep on. */
+	struct swait_queue_head nocb_state_wq; /* For offloading state changes */
 	struct task_struct *nocb_gp_kthread;
 	raw_spinlock_t nocb_lock;	/* Guard following pair of fields. */
 	atomic_t nocb_lock_contended;	/* Contention experienced. */
@@ -256,6 +257,7 @@ struct rcu_data {
 };

 /* Values for nocb_defer_wakeup field in struct rcu_data. */
+#define RCU_NOCB_WAKE_OFF	-1
 #define RCU_NOCB_WAKE_NOT	0
 #define RCU_NOCB_WAKE		1
 #define RCU_NOCB_WAKE_FORCE	2
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 8760b6ead770..6c6ff06d4ae6 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -545,7 +545,7 @@ static void synchronize_rcu_expedited_wait(void)
 			data_race(rnp_root->expmask),
 			".T"[!!data_race(rnp_root->exp_tasks)]);
 		if (ndetected) {
-			pr_err("blocking rcu_node structures:");
+			pr_err("blocking rcu_node structures (internal RCU debug):");
 			rcu_for_each_node_breadth_first(rnp) {
 				if (rnp == rnp_root)
 					continue; /* printed unconditionally */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce0a1d6..231a0c6cf03c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -682,6 +682,7 @@ static void rcu_flavor_sched_clock_irq(int user)
 {
 	struct task_struct *t = current;

+	lockdep_assert_irqs_disabled();
 	if (user || rcu_is_cpu_rrupt_from_idle()) {
 		rcu_note_voluntary_context_switch(current);
 	}
@@ -1665,6 +1666,8 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
 static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
 			       const char *reason)
 {
+	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_OFF)
+		return;
 	if (rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT)
 		mod_timer(&rdp->nocb_timer, jiffies + 1);
 	if (rdp->nocb_defer_wakeup < waketype)
@@ -1928,6 +1931,52 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
 	__call_rcu_nocb_wake(rdp, true, flags);
 }

+/*
+ * Check if we ignore this rdp.
+ *
+ * We check that without holding the nocb lock but
+ * we make sure not to miss a freshly offloaded rdp
+ * with the current ordering:
+ *
+ *  rdp_offload_toggle()        nocb_gp_enabled_cb()
+ * -------------------------   ----------------------------
+ *    WRITE flags                 LOCK nocb_gp_lock
+ *    LOCK nocb_gp_lock           READ/WRITE nocb_gp_sleep
+ *    READ/WRITE nocb_gp_sleep    UNLOCK nocb_gp_lock
+ *    UNLOCK nocb_gp_lock         READ flags
+ */
+static inline bool nocb_gp_enabled_cb(struct rcu_data *rdp)
+{
+	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_GP;
+
+	return rcu_segcblist_test_flags(&rdp->cblist, flags);
+}
+
+static inline bool nocb_gp_update_state(struct rcu_data *rdp, bool *needwake_state)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+
+	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP)) {
+			rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_GP);
+			if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+				*needwake_state = true;
+		}
+		return true;
+	}
+
+	/*
+	 * De-offloading. Clear our flag and notify the de-offload worker.
+	 * We will ignore this rdp until it gets re-offloaded.
+	 */
+	WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
+	rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_GP);
+	if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB))
+		*needwake_state = true;
+	return false;
+}
+
+
 /*
  * No-CBs GP kthreads come here to wait for additional callbacks to show up
  * or for grace periods to end.
@@ -1956,8 +2005,18 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 	 */
 	WARN_ON_ONCE(my_rdp->nocb_gp_rdp != my_rdp);
 	for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
+		bool needwake_state = false;
+
+		if (!nocb_gp_enabled_cb(rdp))
+			continue;
 		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
 		rcu_nocb_lock_irqsave(rdp, flags);
+		if (!nocb_gp_update_state(rdp, &needwake_state)) {
+			rcu_nocb_unlock_irqrestore(rdp, flags);
+			if (needwake_state)
+				swake_up_one(&rdp->nocb_state_wq);
+			continue;
+		}
 		bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		if (bypass_ncbs &&
 		    (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
@@ -1967,6 +2026,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 			bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
 		} else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
 			rcu_nocb_unlock_irqrestore(rdp, flags);
+			if (needwake_state)
+				swake_up_one(&rdp->nocb_state_wq);
 			continue; /* No callbacks here, try next. */
 		}
 		if (bypass_ncbs) {
@@ -2018,6 +2079,8 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
 		}
 		if (needwake_gp)
 			rcu_gp_kthread_wake();
+		if (needwake_state)
+			swake_up_one(&rdp->nocb_state_wq);
 	}

 	my_rdp->nocb_gp_bypass = bypass;
@@ -2081,14 +2144,27 @@ static int rcu_nocb_gp_kthread(void *arg)
 	return 0;
 }

+static inline bool nocb_cb_can_run(struct rcu_data *rdp)
+{
+	u8 flags = SEGCBLIST_OFFLOADED | SEGCBLIST_KTHREAD_CB;
+	return rcu_segcblist_test_flags(&rdp->cblist, flags);
+}
+
+static inline bool nocb_cb_wait_cond(struct rcu_data *rdp)
+{
+	return nocb_cb_can_run(rdp) && !READ_ONCE(rdp->nocb_cb_sleep);
+}
+
 /*
  * Invoke any ready callbacks from the corresponding no-CBs CPU,
  * then, if there are no more, wait for more to appear.
  */
 static void nocb_cb_wait(struct rcu_data *rdp)
 {
+	struct rcu_segcblist *cblist = &rdp->cblist;
 	unsigned long cur_gp_seq;
 	unsigned long flags;
+	bool needwake_state = false;
 	bool needwake_gp = false;
 	struct rcu_node *rnp = rdp->mynode;

@@ -2100,32 +2176,55 @@ static void nocb_cb_wait(struct rcu_data *rdp)
 	local_bh_enable();
 	lockdep_assert_irqs_enabled();
 	rcu_nocb_lock_irqsave(rdp, flags);
-	if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
+	if (rcu_segcblist_nextgp(cblist, &cur_gp_seq) &&
 	    rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
 	    raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
 		needwake_gp = rcu_advance_cbs(rdp->mynode, rdp);
 		raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
 	}
-	if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
-		rcu_nocb_unlock_irqrestore(rdp, flags);
-		if (needwake_gp)
-			rcu_gp_kthread_wake();
-		return;
+
+	WRITE_ONCE(rdp->nocb_cb_sleep, true);
+
+	if (rcu_segcblist_test_flags(cblist, SEGCBLIST_OFFLOADED)) {
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB)) {
+			rcu_segcblist_set_flags(cblist, SEGCBLIST_KTHREAD_CB);
+			if (rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP))
+				needwake_state = true;
+		}
+		if (rcu_segcblist_ready_cbs(cblist))
+			WRITE_ONCE(rdp->nocb_cb_sleep, false);
+	} else {
+		/*
+		 * De-offloading. Clear our flag and notify the de-offload worker.
+		 * We won't touch the callbacks and keep sleeping until we
+		 * get re-offloaded.
+		 */
+		WARN_ON_ONCE(!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB));
+		rcu_segcblist_clear_flags(cblist, SEGCBLIST_KTHREAD_CB);
+		if (!rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP))
+			needwake_state = true;
 	}

-	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
-	WRITE_ONCE(rdp->nocb_cb_sleep, true);
+	if (rdp->nocb_cb_sleep)
+		trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
+
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 	if (needwake_gp)
 		rcu_gp_kthread_wake();
-	swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
-				 !READ_ONCE(rdp->nocb_cb_sleep));
-	if (!smp_load_acquire(&rdp->nocb_cb_sleep)) { /* VVV */
-		/* ^^^ Ensure CB invocation follows _sleep test. */
-		return;
-	}
-	WARN_ON(signal_pending(current));
-	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+
+	if (needwake_state)
+		swake_up_one(&rdp->nocb_state_wq);
+
+	do {
+		swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
+						    nocb_cb_wait_cond(rdp));
+
+		// VVV Ensure CB invocation follows _sleep test.
+		if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
+			WARN_ON(signal_pending(current));
+			trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
+		}
+	} while (!nocb_cb_can_run(rdp));
 }

 /*
@@ -2148,7 +2247,7 @@ static int rcu_nocb_cb_kthread(void *arg)
 /* Is a deferred wakeup of rcu_nocb_kthread() required? */
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp)
 {
-	return READ_ONCE(rdp->nocb_defer_wakeup);
+	return READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT;
 }

 /* Do a deferred wakeup of rcu_nocb_kthread(). */
@@ -2187,6 +2286,195 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
 		do_nocb_deferred_wakeup_common(rdp);
 }

+static int rdp_offload_toggle(struct rcu_data *rdp,
+			      bool offload, unsigned long flags)
+	__releases(rdp->nocb_lock)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
+	bool wake_gp = false;
+
+	rcu_segcblist_offload(cblist, offload);
+
+	if (rdp->nocb_cb_sleep)
+		rdp->nocb_cb_sleep = false;
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+
+	/*
+	 * Ignore former value of nocb_cb_sleep and force wake up as it could
+	 * have been spuriously set to false already.
+	 */
+	swake_up_one(&rdp->nocb_cb_wq);
+
+	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
+	if (rdp_gp->nocb_gp_sleep) {
+		rdp_gp->nocb_gp_sleep = false;
+		wake_gp = true;
+	}
+	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
+
+	if (wake_gp)
+		wake_up_process(rdp_gp->nocb_gp_kthread);
+
+	return 0;
+}
+
+static int __rcu_nocb_rdp_deoffload(struct rcu_data *rdp)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	unsigned long flags;
+	int ret;
+
+	pr_info("De-offloading %d\n", rdp->cpu);
+
+	rcu_nocb_lock_irqsave(rdp, flags);
+	/*
+	 * If there is still pending offloaded work, an offline
+	 * CPU won't help much handling it.
+	 */
+	if (cpu_is_offline(rdp->cpu) && !rcu_segcblist_empty(&rdp->cblist)) {
+		rcu_nocb_unlock_irqrestore(rdp, flags);
+		return -EBUSY;
+	}
+
+	ret = rdp_offload_toggle(rdp, false, flags);
+	swait_event_exclusive(rdp->nocb_state_wq,
+			      !rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB |
+							SEGCBLIST_KTHREAD_GP));
+	rcu_nocb_lock_irqsave(rdp, flags);
+	/* Make sure nocb timer won't stay around */
+	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_OFF);
+	rcu_nocb_unlock_irqrestore(rdp, flags);
+	del_timer_sync(&rdp->nocb_timer);
+
+	/*
+	 * Flush bypass. While IRQs are disabled and once we set
+	 * SEGCBLIST_SOFTIRQ_ONLY, no callback is supposed to be
+	 * enqueued on bypass.
+	 */
+	rcu_nocb_lock_irqsave(rdp, flags);
+	rcu_nocb_flush_bypass(rdp, NULL, jiffies);
+	rcu_segcblist_set_flags(cblist, SEGCBLIST_SOFTIRQ_ONLY);
+	/*
+	 * With SEGCBLIST_SOFTIRQ_ONLY, we can't use
+	 * rcu_nocb_unlock_irqrestore() anymore. Theoretically we
+	 * could set SEGCBLIST_SOFTIRQ_ONLY with cb unlocked and IRQs
+	 * disabled now, but let's be paranoid.
+	 */
+	raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
+
+	return ret;
+}
+
+static long rcu_nocb_rdp_deoffload(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	return __rcu_nocb_rdp_deoffload(rdp);
+}
+
+int rcu_nocb_cpu_deoffload(int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	int ret = 0;
+
+	if (rdp == rdp->nocb_gp_rdp) {
+		pr_info("Can't deoffload an rdp GP leader (yet)\n");
+		return -EINVAL;
+	}
+	mutex_lock(&rcu_state.barrier_mutex);
+	cpus_read_lock();
+	if (rcu_segcblist_is_offloaded(&rdp->cblist)) {
+		if (cpu_online(cpu))
+			ret = work_on_cpu(cpu, rcu_nocb_rdp_deoffload, rdp);
+		else
+			ret = __rcu_nocb_rdp_deoffload(rdp);
+		if (!ret)
+			cpumask_clear_cpu(cpu, rcu_nocb_mask);
+	}
+	cpus_read_unlock();
+	mutex_unlock(&rcu_state.barrier_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rcu_nocb_cpu_deoffload);
+
+static int __rcu_nocb_rdp_offload(struct rcu_data *rdp)
+{
+	struct rcu_segcblist *cblist = &rdp->cblist;
+	unsigned long flags;
+	int ret;
+
+	/*
+	 * For now we only support re-offload, i.e., the rdp must have been
+	 * offloaded on boot first.
+	 */
+	if (!rdp->nocb_gp_rdp)
+		return -EINVAL;
+
+	pr_info("Offloading %d\n", rdp->cpu);
+	/*
+	 * Can't use rcu_nocb_lock_irqsave() while we are in
+	 * SEGCBLIST_SOFTIRQ_ONLY mode.
+	 */
+	raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
+	/* Re-enable nocb timer */
+	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+
+	/*
+	 * We didn't take the nocb lock while working on the
+	 * rdp->cblist in SEGCBLIST_SOFTIRQ_ONLY mode.
+	 * All modifications previously done on rdp->cblist must
+	 * therefore be visible remotely to the nocb kthreads
+	 * upon wake-up after reading the cblist flags.
+	 *
+	 * The layout against nocb_lock enforces that ordering:
+	 *
+	 *  __rcu_nocb_rdp_offload()   nocb_cb_wait()/nocb_gp_wait()
+	 * -------------------------   ----------------------------
+	 *      WRITE callbacks           rcu_nocb_lock()
+	 *      rcu_nocb_lock()           READ flags
+	 *      WRITE flags               READ callbacks
+	 *      rcu_nocb_unlock()         rcu_nocb_unlock()
+	 */
+	ret = rdp_offload_toggle(rdp, true, flags);
+	swait_event_exclusive(rdp->nocb_state_wq,
+			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_CB) &&
+			      rcu_segcblist_test_flags(cblist, SEGCBLIST_KTHREAD_GP));
+
+	return ret;
+}
+
+static long rcu_nocb_rdp_offload(void *arg)
+{
+	struct rcu_data *rdp = arg;
+
+	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
+	return __rcu_nocb_rdp_offload(rdp);
+}
+
+int rcu_nocb_cpu_offload(int cpu)
+{
+	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+	int ret = 0;
+
+	mutex_lock(&rcu_state.barrier_mutex);
+	cpus_read_lock();
+	if (!rcu_segcblist_is_offloaded(&rdp->cblist)) {
+		if (cpu_online(cpu))
+			ret = work_on_cpu(cpu, rcu_nocb_rdp_offload, rdp);
+		else
+			ret = __rcu_nocb_rdp_offload(rdp);
+		if (!ret)
+			cpumask_set_cpu(cpu, rcu_nocb_mask);
+	}
+	cpus_read_unlock();
+	mutex_unlock(&rcu_state.barrier_mutex);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(rcu_nocb_cpu_offload);
+
 void __init rcu_init_nohz(void)
 {
 	int cpu;
@@ -2229,7 +2517,9 @@ void __init rcu_init_nohz(void)
 		rdp = per_cpu_ptr(&rcu_data, cpu);
 		if (rcu_segcblist_empty(&rdp->cblist))
 			rcu_segcblist_init(&rdp->cblist);
-		rcu_segcblist_offload(&rdp->cblist);
+		rcu_segcblist_offload(&rdp->cblist, true);
+		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_KTHREAD_CB);
+		rcu_segcblist_set_flags(&rdp->cblist, SEGCBLIST_KTHREAD_GP);
 	}
 	rcu_organize_nocb_kthreads();
 }
@@ -2239,6 +2529,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
 {
 	init_swait_queue_head(&rdp->nocb_cb_wq);
 	init_swait_queue_head(&rdp->nocb_gp_wq);
+	init_swait_queue_head(&rdp->nocb_state_wq);
 	raw_spin_lock_init(&rdp->nocb_lock);
 	raw_spin_lock_init(&rdp->nocb_bypass_lock);
 	raw_spin_lock_init(&rdp->nocb_gp_lock);
@@ -2381,6 +2672,19 @@ void rcu_bind_current_to_nocb(void)
 }
 EXPORT_SYMBOL_GPL(rcu_bind_current_to_nocb);

+// The ->on_cpu field is available only in CONFIG_SMP=y, so...
+#ifdef CONFIG_SMP
+static char *show_rcu_should_be_on_cpu(struct task_struct *tsp)
+{
+	return tsp && tsp->state == TASK_RUNNING && !tsp->on_cpu ? "!" : "";
+}
+#else // #ifdef CONFIG_SMP
+static char *show_rcu_should_be_on_cpu(struct task_struct *tsp)
+{
+	return "";
+}
+#endif // #else #ifdef CONFIG_SMP
+
 /*
  * Dump out nocb grace-period kthread state for the specified rcu_data
  * structure.
@@ -2389,7 +2693,7 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
 {
 	struct rcu_node *rnp = rdp->mynode;

-	pr_info("nocb GP %d %c%c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu\n",
+	pr_info("nocb GP %d %c%c%c%c%c%c %c[%c%c] %c%c:%ld rnp %d:%d %lu %c CPU %d%s\n",
 		rdp->cpu,
 		"kK"[!!rdp->nocb_gp_kthread],
 		"lL"[raw_spin_is_locked(&rdp->nocb_gp_lock)],
@@ -2403,12 +2707,17 @@ static void show_rcu_nocb_gp_state(struct rcu_data *rdp)
 		".B"[!!rdp->nocb_gp_bypass],
 		".G"[!!rdp->nocb_gp_gp],
 		(long)rdp->nocb_gp_seq,
-		rnp->grplo, rnp->grphi, READ_ONCE(rdp->nocb_gp_loops));
+		rnp->grplo, rnp->grphi, READ_ONCE(rdp->nocb_gp_loops),
+		rdp->nocb_gp_kthread ? task_state_to_char(rdp->nocb_gp_kthread) : '.',
+		rdp->nocb_cb_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
+		show_rcu_should_be_on_cpu(rdp->nocb_cb_kthread));
 }

 /* Dump out nocb kthread state for the specified rcu_data structure. */
 static void show_rcu_nocb_state(struct rcu_data *rdp)
 {
+	char bufw[20];
+	char bufr[20];
 	struct rcu_segcblist *rsclp = &rdp->cblist;
 	bool waslocked;
 	bool wastimer;
@@ -2417,8 +2726,11 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
 	if (rdp->nocb_gp_rdp == rdp)
 		show_rcu_nocb_gp_state(rdp);

-	pr_info("   CB %d->%d %c%c%c%c%c%c F%ld L%ld C%d %c%c%c%c%c q%ld\n",
+	sprintf(bufw, "%ld", rsclp->gp_seq[RCU_WAIT_TAIL]);
+	sprintf(bufr, "%ld", rsclp->gp_seq[RCU_NEXT_READY_TAIL]);
+	pr_info("   CB %d^%d->%d %c%c%c%c%c%c F%ld L%ld C%d %c%c%s%c%s%c%c q%ld %c CPU %d%s\n",
 		rdp->cpu, rdp->nocb_gp_rdp->cpu,
+		rdp->nocb_next_cb_rdp ? rdp->nocb_next_cb_rdp->cpu : -1,
 		"kK"[!!rdp->nocb_cb_kthread],
 		"bB"[raw_spin_is_locked(&rdp->nocb_bypass_lock)],
 		"cC"[!!atomic_read(&rdp->nocb_lock_contended)],
@@ -2429,11 +2741,16 @@ static void show_rcu_nocb_state(struct rcu_data *rdp)
 		jiffies - rdp->nocb_nobypass_last,
 		rdp->nocb_nobypass_count,
 		".D"[rcu_segcblist_ready_cbs(rsclp)],
-		".W"[!rcu_segcblist_restempty(rsclp, RCU_DONE_TAIL)],
-		".R"[!rcu_segcblist_restempty(rsclp, RCU_WAIT_TAIL)],
-		".N"[!rcu_segcblist_restempty(rsclp, RCU_NEXT_READY_TAIL)],
+		".W"[!rcu_segcblist_segempty(rsclp, RCU_WAIT_TAIL)],
+		rcu_segcblist_segempty(rsclp, RCU_WAIT_TAIL) ? "" : bufw,
+		".R"[!rcu_segcblist_segempty(rsclp, RCU_NEXT_READY_TAIL)],
+		rcu_segcblist_segempty(rsclp, RCU_NEXT_READY_TAIL) ? "" : bufr,
+		".N"[!rcu_segcblist_segempty(rsclp, RCU_NEXT_TAIL)],
 		".B"[!!rcu_cblist_n_cbs(&rdp->nocb_bypass)],
-		rcu_segcblist_n_cbs(&rdp->cblist));
+		rcu_segcblist_n_cbs(&rdp->cblist),
+		rdp->nocb_cb_kthread ? task_state_to_char(rdp->nocb_cb_kthread) : '.',
+		rdp->nocb_cb_kthread ? (int)task_cpu(rdp->nocb_gp_kthread) : -1,
+		show_rcu_should_be_on_cpu(rdp->nocb_cb_kthread));

 	/* It is OK for GP kthreads to have GP state. */
 	if (rdp->nocb_gp_rdp == rdp)
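The rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() entry points above are exported, so modules can drive the toggling directly; rcutorture's rcu_nocb_toggle() kthread earlier in this patch is one such caller. A minimal usage sketch, not part of the patch, assuming the target CPU was offloaded at boot via the rcu_nocbs= parameter (the patch supports only re-offload for now):

    /* Hypothetical management code; both calls may sleep. */
    static void toggle_cpu5_offloading(void)
    {
    	int ret;

    	ret = rcu_nocb_cpu_deoffload(5);  /* Back to softirq/rcuc processing. */
    	if (ret)
    		pr_warn("De-offload of CPU 5 failed: %d\n", ret);

    	ret = rcu_nocb_cpu_offload(5);	  /* Hand callbacks back to nocb kthreads. */
    	if (ret)
    		pr_warn("Offload of CPU 5 failed: %d\n", ret);
    }

Both functions serialize against rcu_barrier() and CPU hotplug via rcu_state.barrier_mutex and cpus_read_lock(), and return -EBUSY or -EINVAL when the transition cannot be made.
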
g%ld f%#x %s(%d) ->state=%#lx ->cpu=%d\n", rcu_state.name, j, (long)rcu_seq_current(&rcu_state.gp_seq), data_race(rcu_state.gp_flags), gp_state_getname(rcu_state.gp_state), rcu_state.gp_state, - gpk ? gpk->state : ~0, gpk ? task_cpu(gpk) : -1); + gpk ? gpk->state : ~0, cpu); if (gpk) { pr_err("\tUnless %s kthread gets sufficient CPU time, OOM is now expected behavior.\n", rcu_state.name); pr_err("RCU grace-period kthread stack dump:\n"); sched_show_task(gpk); + if (cpu >= 0) { + if (cpu_is_offline(cpu)) { + pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu); + } else { + pr_err("Stack dump where RCU GP kthread last ran:\n"); + if (!trigger_single_cpu_backtrace(cpu)) + dump_cpu_task(cpu); + } + } wake_up_process(gpk); } } } +/* Complain about missing wakeups from expired fqs wait timer */ +static void rcu_check_gp_kthread_expired_fqs_timer(void) +{ + struct task_struct *gpk = rcu_state.gp_kthread; + short gp_state; + unsigned long jiffies_fqs; + int cpu; + + /* + * Order reads of .gp_state and .jiffies_force_qs. + * Matching smp_wmb() is present in rcu_gp_fqs_loop(). + */ + gp_state = smp_load_acquire(&rcu_state.gp_state); + jiffies_fqs = READ_ONCE(rcu_state.jiffies_force_qs); + + if (gp_state == RCU_GP_WAIT_FQS && + time_after(jiffies, jiffies_fqs + RCU_STALL_MIGHT_MIN) && + gpk && !READ_ONCE(gpk->on_rq)) { + cpu = task_cpu(gpk); + pr_err("%s kthread timer wakeup didn't happen for %ld jiffies! g%ld f%#x %s(%d) ->state=%#lx\n", + rcu_state.name, (jiffies - jiffies_fqs), + (long)rcu_seq_current(&rcu_state.gp_seq), + data_race(rcu_state.gp_flags), + gp_state_getname(RCU_GP_WAIT_FQS), RCU_GP_WAIT_FQS, + gpk->state); + pr_err("\tPossible timer handling issue on cpu=%d timer-softirq=%u\n", + cpu, kstat_softirqs_cpu(TIMER_SOFTIRQ, cpu)); + } +} + static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) { int cpu; @@ -478,6 +524,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) struct rcu_node *rnp; long totqlen = 0; + lockdep_assert_irqs_disabled(); + /* Kick and suppress, if so configured. */ rcu_stall_kick_kthreads(); if (rcu_stall_is_suppressed()) @@ -499,6 +547,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) } } ndetected += rcu_print_task_stall(rnp, flags); // Releases rnp->lock. + lockdep_assert_irqs_disabled(); } for_each_possible_cpu(cpu) @@ -529,6 +578,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps) WRITE_ONCE(rcu_state.jiffies_stall, jiffies + 3 * rcu_jiffies_till_stall_check() + 3); + rcu_check_gp_kthread_expired_fqs_timer(); rcu_check_gp_kthread_starvation(); panic_on_rcu_stall(); @@ -544,6 +594,8 @@ static void print_cpu_stall(unsigned long gps) struct rcu_node *rnp = rcu_get_root(); long totqlen = 0; + lockdep_assert_irqs_disabled(); + /* Kick and suppress, if so configured. 
*/ rcu_stall_kick_kthreads(); if (rcu_stall_is_suppressed()) @@ -564,6 +616,7 @@ static void print_cpu_stall(unsigned long gps) jiffies - gps, (long)rcu_seq_current(&rcu_state.gp_seq), totqlen); + rcu_check_gp_kthread_expired_fqs_timer(); rcu_check_gp_kthread_starvation(); rcu_dump_cpu_stacks(); @@ -598,6 +651,7 @@ static void check_cpu_stall(struct rcu_data *rdp) unsigned long js; struct rcu_node *rnp; + lockdep_assert_irqs_disabled(); if ((rcu_stall_is_suppressed() && !READ_ONCE(rcu_kick_kthreads)) || !rcu_gp_in_progress()) return; diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 39334d2d2b37..b95ae86c40a7 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -56,8 +56,10 @@ #ifndef CONFIG_TINY_RCU module_param(rcu_expedited, int, 0); module_param(rcu_normal, int, 0); -static int rcu_normal_after_boot; +static int rcu_normal_after_boot = IS_ENABLED(CONFIG_PREEMPT_RT); +#ifndef CONFIG_PREEMPT_RT module_param(rcu_normal_after_boot, int, 0); +#endif #endif /* #ifndef CONFIG_TINY_RCU */ #ifdef CONFIG_DEBUG_LOCK_ALLOC diff --git a/kernel/scftorture.c b/kernel/scftorture.c index d55a9f8cda3d..2377cbb32474 100644 --- a/kernel/scftorture.c +++ b/kernel/scftorture.c @@ -398,6 +398,7 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra static int scftorture_invoker(void *arg) { int cpu; + int curcpu; DEFINE_TORTURE_RANDOM(rand); struct scf_statistics *scfp = (struct scf_statistics *)arg; bool was_offline = false; @@ -412,7 +413,10 @@ static int scftorture_invoker(void *arg) VERBOSE_SCFTORTOUT("scftorture_invoker %d: Waiting for all SCF torturers from cpu %d", scfp->cpu, smp_processor_id()); // Make sure that the CPU is affinitized appropriately during testing. - WARN_ON_ONCE(smp_processor_id() != scfp->cpu); + curcpu = smp_processor_id(); + WARN_ONCE(curcpu != scfp->cpu % nr_cpu_ids, + "%s: Wanted CPU %d, running on %d, nr_cpu_ids = %d\n", + __func__, scfp->cpu, curcpu, nr_cpu_ids); if (!atomic_dec_return(&n_started)) while (atomic_read_acquire(&n_started)) { diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 15d2562118d1..a75c608839c4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3464,7 +3464,7 @@ out: /** * try_invoke_on_locked_down_task - Invoke a function on task in fixed state - * @p: Process for which the function is to be invoked. + * @p: Process for which the function is to be invoked, can be @current. * @func: Function to invoke. * @arg: Argument to function. 
* @@ -3482,12 +3482,11 @@ out: */ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct task_struct *t, void *arg), void *arg) { - bool ret = false; struct rq_flags rf; + bool ret = false; struct rq *rq; - lockdep_assert_irqs_enabled(); - raw_spin_lock_irq(&p->pi_lock); + raw_spin_lock_irqsave(&p->pi_lock, rf.flags); if (p->on_rq) { rq = __task_rq_lock(p, &rf); if (task_rq(p) == rq) @@ -3504,7 +3503,7 @@ bool try_invoke_on_locked_down_task(struct task_struct *p, bool (*func)(struct t ret = func(p, arg); } } - raw_spin_unlock_irq(&p->pi_lock); + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags); return ret; } diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 8dbc008f8942..f475f1a027c8 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1237,6 +1237,20 @@ int try_to_del_timer_sync(struct timer_list *timer) } EXPORT_SYMBOL(try_to_del_timer_sync); +bool timer_curr_running(struct timer_list *timer) +{ + int i; + + for (i = 0; i < NR_BASES; i++) { + struct timer_base *base = this_cpu_ptr(&timer_bases[i]); + + if (base->running_timer == timer) + return true; + } + + return false; +} + #ifdef CONFIG_PREEMPT_RT static __init void timer_base_init_expiry_lock(struct timer_base *base) { diff --git a/kernel/torture.c b/kernel/torture.c index 8562ac18d2eb..01e336f1e5b2 100644 --- a/kernel/torture.c +++ b/kernel/torture.c @@ -48,6 +48,12 @@ module_param(disable_onoff_at_boot, bool, 0444); static bool ftrace_dump_at_shutdown; module_param(ftrace_dump_at_shutdown, bool, 0444); +static int verbose_sleep_frequency; +module_param(verbose_sleep_frequency, int, 0444); + +static int verbose_sleep_duration = 1; +module_param(verbose_sleep_duration, int, 0444); + static char *torture_type; static int verbose; @@ -58,6 +64,95 @@ static int verbose; static int fullstop = FULLSTOP_RMMOD; static DEFINE_MUTEX(fullstop_mutex); +static atomic_t verbose_sleep_counter; + +/* + * Sleep if needed from VERBOSE_TOROUT*(). + */ +void verbose_torout_sleep(void) +{ + if (verbose_sleep_frequency > 0 && + verbose_sleep_duration > 0 && + !(atomic_inc_return(&verbose_sleep_counter) % verbose_sleep_frequency)) + schedule_timeout_uninterruptible(verbose_sleep_duration); +} +EXPORT_SYMBOL_GPL(verbose_torout_sleep); + +/* + * Schedule a high-resolution-timer sleep in nanoseconds, with a 32-bit + * nanosecond random fuzz. This function and its friends desynchronize + * testing from the timer wheel. + */ +int torture_hrtimeout_ns(ktime_t baset_ns, u32 fuzzt_ns, struct torture_random_state *trsp) +{ + ktime_t hto = baset_ns; + + if (trsp) + hto += (torture_random(trsp) >> 3) % fuzzt_ns; + set_current_state(TASK_UNINTERRUPTIBLE); + return schedule_hrtimeout(&hto, HRTIMER_MODE_REL); +} +EXPORT_SYMBOL_GPL(torture_hrtimeout_ns); + +/* + * Schedule a high-resolution-timer sleep in microseconds, with a 32-bit + * nanosecond (not microsecond!) random fuzz. + */ +int torture_hrtimeout_us(u32 baset_us, u32 fuzzt_ns, struct torture_random_state *trsp) +{ + ktime_t baset_ns = baset_us * NSEC_PER_USEC; + + return torture_hrtimeout_ns(baset_ns, fuzzt_ns, trsp); +} +EXPORT_SYMBOL_GPL(torture_hrtimeout_us); + +/* + * Schedule a high-resolution-timer sleep in milliseconds, with a 32-bit + * microsecond (not millisecond!) random fuzz. 
+ */ +int torture_hrtimeout_ms(u32 baset_ms, u32 fuzzt_us, struct torture_random_state *trsp) +{ + ktime_t baset_ns = baset_ms * NSEC_PER_MSEC; + u32 fuzzt_ns; + + if ((u32)~0U / NSEC_PER_USEC < fuzzt_us) + fuzzt_ns = (u32)~0U; + else + fuzzt_ns = fuzzt_us * NSEC_PER_USEC; + return torture_hrtimeout_ns(baset_ns, fuzzt_ns, trsp); +} +EXPORT_SYMBOL_GPL(torture_hrtimeout_ms); + +/* + * Schedule a high-resolution-timer sleep in jiffies, with an + * implied one-jiffy random fuzz. This is intended to replace calls to + * schedule_timeout_interruptible() and friends. + */ +int torture_hrtimeout_jiffies(u32 baset_j, struct torture_random_state *trsp) +{ + ktime_t baset_ns = jiffies_to_nsecs(baset_j); + + return torture_hrtimeout_ns(baset_ns, jiffies_to_nsecs(1), trsp); +} +EXPORT_SYMBOL_GPL(torture_hrtimeout_jiffies); + +/* + * Schedule a high-resolution-timer sleep in milliseconds, with a 32-bit + * millisecond (not second!) random fuzz. + */ +int torture_hrtimeout_s(u32 baset_s, u32 fuzzt_ms, struct torture_random_state *trsp) +{ + ktime_t baset_ns = baset_s * NSEC_PER_SEC; + u32 fuzzt_ns; + + if ((u32)~0U / NSEC_PER_MSEC < fuzzt_ms) + fuzzt_ns = (u32)~0U; + else + fuzzt_ns = fuzzt_ms * NSEC_PER_MSEC; + return torture_hrtimeout_ns(baset_ns, fuzzt_ns, trsp); +} +EXPORT_SYMBOL_GPL(torture_hrtimeout_s); + #ifdef CONFIG_HOTPLUG_CPU /* @@ -80,6 +175,19 @@ static unsigned long sum_online; static int min_online = -1; static int max_online; +static int torture_online_cpus = NR_CPUS; + +/* + * Some torture testing leverages confusion as to the number of online + * CPUs. This function returns the torture-testing view of this number, + * which allows torture tests to load-balance appropriately. + */ +int torture_num_online_cpus(void) +{ + return READ_ONCE(torture_online_cpus); +} +EXPORT_SYMBOL_GPL(torture_num_online_cpus); + /* * Attempt to take a CPU offline. Return false if the CPU is already * offline or if it is not subject to CPU-hotplug operations. The @@ -134,6 +242,8 @@ bool torture_offline(int cpu, long *n_offl_attempts, long *n_offl_successes, *min_offl = delta; if (*max_offl < delta) *max_offl = delta; + WRITE_ONCE(torture_online_cpus, torture_online_cpus - 1); + WARN_ON_ONCE(torture_online_cpus <= 0); } return true; @@ -190,12 +300,33 @@ bool torture_online(int cpu, long *n_onl_attempts, long *n_onl_successes, *min_onl = delta; if (*max_onl < delta) *max_onl = delta; + WRITE_ONCE(torture_online_cpus, torture_online_cpus + 1); } return true; } EXPORT_SYMBOL_GPL(torture_online); +/* + * Get everything online at the beginning and ends of tests. + */ +static void torture_online_all(char *phase) +{ + int cpu; + int ret; + + for_each_possible_cpu(cpu) { + if (cpu_online(cpu)) + continue; + ret = add_cpu(cpu); + if (ret && verbose) { + pr_alert("%s" TORTURE_FLAG + "%s: %s online %d: errno %d\n", + __func__, phase, torture_type, cpu, ret); + } + } +} + /* * Execute random CPU-hotplug operations at the interval specified * by the onoff_interval. 
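The torture_hrtimeout_*() helpers above all funnel into torture_hrtimeout_ns(), each coarser-grained wrapper converting its fuzz to nanoseconds and clamping the 32-bit value so the multiplication cannot wrap. A standalone userspace sketch of that layering (names illustrative, not kernel code; the kernel versions take a torture_random_state and apply fuzz only when one is supplied):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NSEC_PER_USEC 1000U
#define NSEC_PER_MSEC 1000000U

/* Sleep for base_ns plus a random fuzz in [0, fuzz_ns). */
static int sketch_hrtimeout_ns(uint64_t base_ns, uint32_t fuzz_ns)
{
	struct timespec ts;
	uint64_t total = base_ns;

	if (fuzz_ns)	/* guard the modulus; zero fuzz means no fuzz */
		total += (uint64_t)rand() % fuzz_ns;
	ts.tv_sec = total / 1000000000ULL;
	ts.tv_nsec = total % 1000000000ULL;
	return nanosleep(&ts, NULL);
}

/* Milliseconds base, microseconds fuzz, clamped as in the patch. */
static int sketch_hrtimeout_ms(uint32_t base_ms, uint32_t fuzz_us)
{
	uint32_t fuzz_ns;

	if ((uint32_t)~0U / NSEC_PER_USEC < fuzz_us)
		fuzz_ns = (uint32_t)~0U;	/* fuzz_us * 1000 would overflow */
	else
		fuzz_ns = fuzz_us * NSEC_PER_USEC;
	return sketch_hrtimeout_ns((uint64_t)base_ms * NSEC_PER_MSEC, fuzz_ns);
}

int main(void)
{
	printf("sleeping ~5ms with up to 1ms of fuzz\n");
	return sketch_hrtimeout_ms(5, 1000);
}

The per-call random fuzz is the point of the family: it keeps torture-test sleeps from lining up with timer-wheel boundaries, which is why these helpers replace schedule_timeout_interruptible() and friends.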
@@ -206,25 +337,12 @@ torture_onoff(void *arg) int cpu; int maxcpu = -1; DEFINE_TORTURE_RANDOM(rand); - int ret; VERBOSE_TOROUT_STRING("torture_onoff task started"); for_each_online_cpu(cpu) maxcpu = cpu; WARN_ON(maxcpu < 0); - if (!IS_MODULE(CONFIG_TORTURE_TEST)) { - for_each_possible_cpu(cpu) { - if (cpu_online(cpu)) - continue; - ret = add_cpu(cpu); - if (ret && verbose) { - pr_alert("%s" TORTURE_FLAG - "%s: Initial online %d: errno %d\n", - __func__, torture_type, cpu, ret); - } - } - } - + torture_online_all("Initial"); if (maxcpu == 0) { VERBOSE_TOROUT_STRING("Only one CPU, so CPU-hotplug testing is disabled"); goto stop; @@ -252,6 +370,7 @@ torture_onoff(void *arg) stop: torture_kthread_stopping("torture_onoff"); + torture_online_all("Final"); return 0; } @@ -602,7 +721,6 @@ static int stutter_gap; */ bool stutter_wait(const char *title) { - ktime_t delay; unsigned int i = 0; bool ret = false; int spt; @@ -618,11 +736,8 @@ bool stutter_wait(const char *title) schedule_timeout_interruptible(1); } else if (spt == 2) { while (READ_ONCE(stutter_pause_test)) { - if (!(i++ & 0xffff)) { - set_current_state(TASK_INTERRUPTIBLE); - delay = 10 * NSEC_PER_USEC; - schedule_hrtimeout(&delay, HRTIMER_MODE_REL); - } + if (!(i++ & 0xffff)) + torture_hrtimeout_us(10, 0, NULL); cond_resched(); } } else { @@ -640,7 +755,6 @@ EXPORT_SYMBOL_GPL(stutter_wait); */ static int torture_stutter(void *arg) { - ktime_t delay; DEFINE_TORTURE_RANDOM(rand); int wtime; @@ -651,20 +765,15 @@ static int torture_stutter(void *arg) if (stutter > 2) { WRITE_ONCE(stutter_pause_test, 1); wtime = stutter - 3; - delay = ktime_divns(NSEC_PER_SEC * wtime, HZ); - delay += (torture_random(&rand) >> 3) % NSEC_PER_MSEC; - set_current_state(TASK_INTERRUPTIBLE); - schedule_hrtimeout(&delay, HRTIMER_MODE_REL); + torture_hrtimeout_jiffies(wtime, &rand); wtime = 2; } WRITE_ONCE(stutter_pause_test, 2); - delay = ktime_divns(NSEC_PER_SEC * wtime, HZ); - set_current_state(TASK_INTERRUPTIBLE); - schedule_hrtimeout(&delay, HRTIMER_MODE_REL); + torture_hrtimeout_jiffies(wtime, NULL); } WRITE_ONCE(stutter_pause_test, 0); if (!torture_must_stop()) - schedule_timeout_interruptible(stutter_gap); + torture_hrtimeout_jiffies(stutter_gap, NULL); torture_shutdown_absorb("torture_stutter"); } while (!torture_must_stop()); torture_kthread_stopping("torture_stutter"); diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index e59eda07305e..a1071cdefb5a 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -5,6 +5,7 @@ #include #include #include +#include #include /* @@ -168,6 +169,7 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu) struct percpu_ref_data, rcu); struct percpu_ref *ref = data->ref; unsigned long __percpu *percpu_count = percpu_count_ptr(ref); + static atomic_t underflows; unsigned long count = 0; int cpu; @@ -191,9 +193,13 @@ static void percpu_ref_switch_to_atomic_rcu(struct rcu_head *rcu) */ atomic_long_add((long)count - PERCPU_COUNT_BIAS, &data->count); - WARN_ONCE(atomic_long_read(&data->count) <= 0, - "percpu ref (%ps) <= 0 (%ld) after switching to atomic", - data->release, atomic_long_read(&data->count)); + if (WARN_ONCE(atomic_long_read(&data->count) <= 0, + "percpu ref (%ps) <= 0 (%ld) after switching to atomic", + data->release, atomic_long_read(&data->count)) && + atomic_inc_return(&underflows) < 4) { + pr_err("%s(): percpu_ref underflow", __func__); + mem_dump_obj(data); + } /* @ref is viewed as dead on all CPUs, send out switch confirmation */ percpu_ref_call_confirm_rcu(rcu); diff --git 
a/mm/slab.c b/mm/slab.c index d7c8da9319c7..dcc55e78f353 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3635,6 +3635,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags, EXPORT_SYMBOL(__kmalloc_node_track_caller); #endif /* CONFIG_NUMA */ +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page) +{ + struct kmem_cache *cachep; + unsigned int objnr; + void *objp; + + kpp->kp_ptr = object; + kpp->kp_page = page; + cachep = page->slab_cache; + kpp->kp_slab_cache = cachep; + objp = object - obj_offset(cachep); + kpp->kp_data_offset = obj_offset(cachep); + page = virt_to_head_page(objp); + objnr = obj_to_index(cachep, page, objp); + objp = index_to_obj(cachep, page, objnr); + kpp->kp_objp = objp; + if (DEBUG && cachep->flags & SLAB_STORE_USER) + kpp->kp_ret = *dbg_userword(cachep, objp); +} + /** * __do_kmalloc - allocate memory * @size: how many bytes of memory are required. diff --git a/mm/slab.h b/mm/slab.h index 1a756a359fa8..ecad9b57bc44 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -615,4 +615,16 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c) return false; } +#define KS_ADDRS_COUNT 16 +struct kmem_obj_info { + void *kp_ptr; + struct page *kp_page; + void *kp_objp; + unsigned long kp_data_offset; + struct kmem_cache *kp_slab_cache; + void *kp_ret; + void *kp_stack[KS_ADDRS_COUNT]; +}; +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page); + #endif /* MM_SLAB_H */ diff --git a/mm/slab_common.c b/mm/slab_common.c index e981c80d216c..adbace4256ef 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -537,6 +537,81 @@ bool slab_is_available(void) return slab_state >= UP; } +/** + * kmem_valid_obj - does the pointer reference a valid slab object? + * @object: pointer to query. + * + * Return: %true if the pointer is to a not-yet-freed object from + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer + * is to an already-freed object, and %false otherwise. + */ +bool kmem_valid_obj(void *object) +{ + struct page *page; + + /* Some arches consider ZERO_SIZE_PTR to be a valid address. */ + if (object < (void *)PAGE_SIZE || !virt_addr_valid(object)) + return false; + page = virt_to_head_page(object); + return PageSlab(page); +} + +/** + * kmem_dump_obj - Print available slab provenance information + * @object: slab object for which to find provenance information. + * + * This function uses pr_cont(), so that the caller is expected to have + * printed out whatever preamble is appropriate. The provenance information + * depends on the type of object and on how much debugging is enabled. + * For a slab-cache object, the fact that it is a slab object is printed, + * and, if available, the slab name, return address, and stack trace from + * the allocation of that object. + * + * This function will splat if passed a pointer to a non-slab object. + * If you are not sure what type of object you have, you should instead + * use mem_dump_obj(). + */ +void kmem_dump_obj(void *object) +{ + char *cp = IS_ENABLED(CONFIG_MMU) ? 
"" : "/vmalloc"; + int i; + struct page *page; + unsigned long ptroffset; + struct kmem_obj_info kp = { }; + + if (WARN_ON_ONCE(!virt_addr_valid(object))) + return; + page = virt_to_head_page(object); + if (WARN_ON_ONCE(!PageSlab(page))) { + pr_cont(" non-slab memory.\n"); + return; + } + kmem_obj_info(&kp, object, page); + if (kp.kp_slab_cache) + pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name); + else + pr_cont(" slab%s", cp); + if (kp.kp_objp) + pr_cont(" start %px", kp.kp_objp); + if (kp.kp_data_offset) + pr_cont(" data offset %lu", kp.kp_data_offset); + if (kp.kp_objp) { + ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset; + pr_cont(" pointer offset %lu", ptroffset); + } + if (kp.kp_slab_cache && kp.kp_slab_cache->usersize) + pr_cont(" size %u", kp.kp_slab_cache->usersize); + if (kp.kp_ret) + pr_cont(" allocated at %pS\n", kp.kp_ret); + else + pr_cont("\n"); + for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) { + if (!kp.kp_stack[i]) + break; + pr_info(" %pS\n", kp.kp_stack[i]); + } +} + #ifndef CONFIG_SLOB /* Create a cache during boot when no slab services are available yet */ void __init create_boot_cache(struct kmem_cache *s, const char *name, diff --git a/mm/slob.c b/mm/slob.c index 8d4bfa46247f..ef87ada8705d 100644 --- a/mm/slob.c +++ b/mm/slob.c @@ -461,6 +461,12 @@ out: spin_unlock_irqrestore(&slob_lock, flags); } +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page) +{ + kpp->kp_ptr = object; + kpp->kp_page = page; +} + /* * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend. */ diff --git a/mm/slub.c b/mm/slub.c index 0c8b43a5b3b0..3c1a84316fd7 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -3919,6 +3919,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s) return 0; } +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page) +{ + void *base; + int __maybe_unused i; + unsigned int objnr; + void *objp; + void *objp0; + struct kmem_cache *s = page->slab_cache; + struct track __maybe_unused *trackp; + + kpp->kp_ptr = object; + kpp->kp_page = page; + kpp->kp_slab_cache = s; + base = page_address(page); + objp0 = kasan_reset_tag(object); +#ifdef CONFIG_SLUB_DEBUG + objp = restore_red_left(s, objp0); +#else + objp = objp0; +#endif + objnr = obj_to_index(s, page, objp); + kpp->kp_data_offset = (unsigned long)((char *)objp0 - (char *)objp); + objp = base + s->size * objnr; + kpp->kp_objp = objp; + if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) || + !(s->flags & SLAB_STORE_USER)) + return; +#ifdef CONFIG_SLUB_DEBUG + trackp = get_track(s, objp, TRACK_ALLOC); + kpp->kp_ret = (void *)trackp->addr; +#ifdef CONFIG_STACKTRACE + for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) { + kpp->kp_stack[i] = (void *)trackp->addrs[i]; + if (!kpp->kp_stack[i]) + break; + } +#endif +#endif +} + /******************************************************************** * Kmalloc subsystem *******************************************************************/ diff --git a/mm/util.c b/mm/util.c index 8c9b7d1e7c49..54870226cea6 100644 --- a/mm/util.c +++ b/mm/util.c @@ -982,3 +982,34 @@ int __weak memcmp_pages(struct page *page1, struct page *page2) kunmap_atomic(addr1); return ret; } + +/** + * mem_dump_obj - Print available provenance information + * @object: object for which to find provenance information. + * + * This function uses pr_cont(), so that the caller is expected to have + * printed out whatever preamble is appropriate. 
The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For example, for a slab-cache object, the slab name is printed, and,
+ * if available, the return address and stack trace from the allocation
+ * of that object.
+ */
+void mem_dump_obj(void *object)
+{
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	if (vmalloc_dump_obj(object))
+		return;
+	if (!virt_addr_valid(object)) {
+		if (object == NULL)
+			pr_cont(" NULL pointer.\n");
+		else if (object == ZERO_SIZE_PTR)
+			pr_cont(" zero-size pointer.\n");
+		else
+			pr_cont(" non-paged memory.\n");
+		return;
+	}
+	pr_cont(" non-slab/vmalloc memory.\n");
+}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4d88fe5a277a..e3229ff627ea 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3448,6 +3448,19 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 }
 #endif	/* CONFIG_SMP */
 
+bool vmalloc_dump_obj(void *object)
+{
+	struct vm_struct *vm;
+	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+
+	vm = find_vm_area(objp);
+	if (!vm)
+		return false;
+	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
+		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+	return true;
+}
+
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
 	__acquires(&vmap_purge_lock)
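One plausible usage sketch (a hypothetical demo module, not part of this series): because mem_dump_obj() prints with pr_cont(), the caller supplies the preamble, exactly as the percpu_ref underflow hook above does:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

static int __init mem_dump_obj_demo_init(void)
{
	void *kp = kmalloc(64, GFP_KERNEL);	/* slab-backed pointer */
	void *vp = vmalloc(PAGE_SIZE);		/* vmalloc-backed pointer */

	/* Each mem_dump_obj() call continues the preceding preamble line. */
	pr_err("kmalloc pointer:");
	mem_dump_obj(kp);
	pr_err("vmalloc pointer:");
	mem_dump_obj(vp);
	pr_err("NULL pointer:");
	mem_dump_obj(NULL);	/* takes the " NULL pointer." branch above */

	kfree(kp);
	vfree(vp);
	return 0;
}

static void __exit mem_dump_obj_demo_exit(void)
{
}

module_init(mem_dump_obj_demo_init);
module_exit(mem_dump_obj_demo_exit);
MODULE_LICENSE("GPL");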
diff --git a/tools/testing/selftests/rcutorture/bin/config2csv.sh b/tools/testing/selftests/rcutorture/bin/config2csv.sh
new file mode 100755
index 000000000000..d5a16631b16e
--- /dev/null
+++ b/tools/testing/selftests/rcutorture/bin/config2csv.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0+
+#
+# Create a spreadsheet from torture-test Kconfig options and kernel boot
+# parameters. Run this in the directory containing the scenario files.
+#
+# Usage: config2csv path.csv [ "scenario1 scenario2 ..." ]
+#
+# By default, this script will take the list of scenarios from the CFLIST
+# file in that directory, otherwise it will consider only the scenarios
+# specified on the command line. It will examine each scenario's file
+# and also its .boot file, if present, and create a column in the .csv
+# output file. Note that "CFLIST" is a synonym for all the scenarios in the
+# CFLIST file, which allows easy comparison of those scenarios with selected
+# scenarios such as BUSTED that are normally omitted from CFLIST files.
+
+csvout=${1}
+if test -z "$csvout"
+then
+	echo "Need .csv output file as first argument."
+	exit 1
+fi
+shift
+defaultconfigs="`tr '\012' ' ' < CFLIST`"
+if test "$#" -eq 0
+then
+	scenariosarg=$defaultconfigs
+else
+	scenariosarg=$*
+fi
+scenarios="`echo $scenariosarg | sed -e "s/\<CFLIST\>/$defaultconfigs/g"`"
+
+T=/tmp/config2latex.sh.$$
+trap 'rm -rf $T' 0
+mkdir $T
+
+cat << '---EOF---' >> $T/p.awk
+END {
+---EOF---
+for i in $scenarios
+do
+	echo '	s["'$i'"] = 1;' >> $T/p.awk
+	grep -v '^#' < $i | grep -v '^ *$' > $T/p
+	if test -r $i.boot
+	then
+		tr -s ' ' '\012' < $i.boot | grep -v '^#' >> $T/p
+	fi
+	sed -e 's/^[^=]*$/&=?/' < $T/p |
+	sed -e 's/^\([^=]*\)=\(.*\)$/\tp["\1:'"$i"'"] = "\2";\n\tc["\1"] = 1;/' >> $T/p.awk
+done
+cat << '---EOF---' >> $T/p.awk
+	ns = asorti(s, ss);
+	nc = asorti(c, cs);
+	for (j = 1; j <= ns; j++)
+		printf ",\"%s\"", ss[j];
+	printf "\n";
+	for (i = 1; i <= nc; i++) {
+		printf "\"%s\"", cs[i];
+		for (j = 1; j <= ns; j++) {
+			printf ",\"%s\"", p[cs[i] ":" ss[j]];
+		}
+		printf "\n";
+	}
+}
+---EOF---
+awk -f $T/p.awk < /dev/null > $T/p.csv
+cp $T/p.csv $csvout
diff --git a/tools/testing/selftests/rcutorture/bin/console-badness.sh b/tools/testing/selftests/rcutorture/bin/console-badness.sh
index 80ae7f08b363..e6a132df6172 100755
--- a/tools/testing/selftests/rcutorture/bin/console-badness.sh
+++ b/tools/testing/selftests/rcutorture/bin/console-badness.sh
@@ -14,4 +14,5 @@ egrep 'Badness|WARNING:|Warn|BUG|===========|Call Trace:|Oops:|detected stalls o
 grep -v 'ODEBUG: ' |
 grep -v 'This means that this is a DEBUG kernel and it is' |
 grep -v 'Warning: unable to open an initial console' |
+grep -v 'Warning: Failed to add ttynull console. No stdin, stdout, and stderr.*the init process!' |
 grep -v 'NOHZ tick-stop error: Non-RCU local softirq work is pending, handler'
diff --git a/tools/testing/selftests/rcutorture/bin/functions.sh b/tools/testing/selftests/rcutorture/bin/functions.sh
index 82663495fb38..c35ba24f994c 100644
--- a/tools/testing/selftests/rcutorture/bin/functions.sh
+++ b/tools/testing/selftests/rcutorture/bin/functions.sh
@@ -108,6 +108,39 @@ configfrag_hotplug_cpu () {
 	grep -q '^CONFIG_HOTPLUG_CPU=y$' "$1"
 }
 
+# get_starttime
+#
+# Returns a cookie identifying the current time.
+get_starttime () {
+	awk 'BEGIN { print systime() }' < /dev/null
+}
+
+# get_starttime_duration starttime
+#
+# Given the return value from get_starttime, compute a human-readable
+# string denoting the time since get_starttime.
+get_starttime_duration () {
+	awk -v starttime=$1 '
+	BEGIN {
+		ts = systime() - starttime;
+		tm = int(ts / 60);
+		th = int(ts / 3600);
+		td = int(ts / 86400);
+		d = td;
+		h = th - td * 24;
+		m = tm - th * 60;
+		s = ts - tm * 60;
+		if (d >= 1)
+			printf "%dd %d:%02d:%02d\n", d, h, m, s
+		else if (h >= 1)
+			printf "%d:%02d:%02d\n", h, m, s
+		else if (m >= 1)
+			printf "%d:%02d.0\n", m, s
+		else
+			print s " seconds"
+	}' < /dev/null
+}
+
 # identify_boot_image qemu-cmd
 #
 # Returns the relative path to the kernel build image. This will be
@@ -170,6 +203,7 @@ identify_qemu () {
 # and the TORTURE_QEMU_INTERACTIVE environment variable.
 identify_qemu_append () {
 	echo debug_boot_weak_hash
+	echo panic=-1
 	local console=ttyS0
 	case "$1" in
 	qemu-system-x86_64|qemu-system-i386)
@@ -232,7 +266,7 @@ identify_qemu_args () {
 
 # Returns the number of virtual CPUs available to the aggregate of the
 # guest OSes.
identify_qemu_vcpus () { - lscpu | grep '^CPU(s):' | sed -e 's/CPU(s)://' -e 's/[ ]*//g' + getconf _NPROCESSORS_ONLN } # print_bug diff --git a/tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh b/tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh index 6f50722f251f..0670841122d8 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh @@ -39,12 +39,14 @@ done if test -n "$files" then $editor $files + editorret=1 else echo No build errors. fi if grep -q -e "--buildonly" < ${rundir}/log then echo Build-only run, no console logs to check. + exit $editorret fi # Find console logs with errors @@ -62,5 +64,10 @@ then exit 1 else echo No errors in console logs. - exit 0 + if test -n "$editorret" + then + exit $editorret + else + exit 0 + fi fi diff --git a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh index 840a4679a0d7..47cf4db10896 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-recheck.sh @@ -87,15 +87,16 @@ do fi done EDITOR=echo kvm-find-errors.sh "${@: -1}" > $T 2>&1 -ret=$? builderrors="`tr ' ' '\012' < $T | grep -c '/Make.out.diags'`" if test "$builderrors" -gt 0 then echo $builderrors runs with build errors. + ret=1 fi runerrors="`tr ' ' '\012' < $T | grep -c '/console.log.diags'`" if test "$runerrors" -gt 0 then echo $runerrors runs with runtime errors. + ret=2 fi exit $ret diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh index 3cd03d01857c..536d103ef166 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh @@ -125,7 +125,6 @@ seconds=$4 qemu_args=$5 boot_args=$6 -kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` if test -z "$TORTURE_BUILDONLY" then echo ' ---' `date`: Starting kernel @@ -158,6 +157,8 @@ then boot_args="$boot_args $TORTURE_BOOT_GDB_ARG" fi echo $QEMU $qemu_args -m $TORTURE_QEMU_MEM -kernel $KERNEL -append \"$qemu_append $boot_args\" $TORTURE_QEMU_GDB_ARG > $resdir/qemu-cmd +echo "# TORTURE_SHUTDOWN_GRACE=$TORTURE_SHUTDOWN_GRACE" >> $resdir/qemu-cmd +echo "# seconds=$seconds" >> $resdir/qemu-cmd if test -n "$TORTURE_BUILDONLY" then @@ -174,6 +175,7 @@ echo 'echo $! > $resdir/qemu_pid' >> $T/qemu-cmd echo "NOTE: $QEMU either did not run or was interactive" > $resdir/console.log # Attempt to run qemu +kstarttime=`gawk 'BEGIN { print systime() }' < /dev/null` ( . $T/qemu-cmd; wait `cat $resdir/qemu_pid`; echo $? > $resdir/qemu-retval ) & commandcompleted=0 if test -z "$TORTURE_KCONFIG_GDB_ARG" @@ -209,7 +211,7 @@ do if test -n "$TORTURE_KCONFIG_GDB_ARG" then : - elif test $kruntime -ge $seconds || test -f "$TORTURE_STOPFILE" + elif test $kruntime -ge $seconds || test -f "$resdir/../STOP.1" then break; fi @@ -252,16 +254,16 @@ then fi if test $commandcompleted -eq 0 -a -n "$qemu_pid" then - if ! test -f "$TORTURE_STOPFILE" + if ! 
test -f "$resdir/../STOP.1" then echo Grace period for qemu job at pid $qemu_pid fi oldline="`tail $resdir/console.log`" while : do - if test -f "$TORTURE_STOPFILE" + if test -f "$resdir/../STOP.1" then - echo "PID $qemu_pid killed due to run STOP request" >> $resdir/Warnings 2>&1 + echo "PID $qemu_pid killed due to run STOP.1 request" >> $resdir/Warnings 2>&1 kill -KILL $qemu_pid break fi diff --git a/tools/testing/selftests/rcutorture/bin/kvm.sh b/tools/testing/selftests/rcutorture/bin/kvm.sh index 45d07b7b69f5..8d3c99b35e06 100755 --- a/tools/testing/selftests/rcutorture/bin/kvm.sh +++ b/tools/testing/selftests/rcutorture/bin/kvm.sh @@ -47,6 +47,9 @@ cpus=0 ds=`date +%Y.%m.%d-%H.%M.%S` jitter="-1" +startdate="`date`" +starttime="`get_starttime`" + usage () { echo "Usage: $scriptname optional arguments:" echo " --allcpus" @@ -57,7 +60,7 @@ usage () { echo " --cpus N" echo " --datestamp string" echo " --defconfig string" - echo " --dryrun sched|script" + echo " --dryrun batches|sched|script" echo " --duration minutes | s | h | d" echo " --gdb" echo " --help" @@ -85,7 +88,7 @@ do ;; --bootargs|--bootarg) checkarg --bootargs "(list of kernel boot arguments)" "$#" "$2" '.*' '^--' - TORTURE_BOOTARGS="$2" + TORTURE_BOOTARGS="$TORTURE_BOOTARGS $2" shift ;; --bootimage) @@ -97,8 +100,8 @@ do TORTURE_BUILDONLY=1 ;; --configs|--config) - checkarg --configs "(list of config files)" "$#" "$2" '^[^/]*$' '^--' - configs="$2" + checkarg --configs "(list of config files)" "$#" "$2" '^[^/]\+$' '^--' + configs="$configs $2" shift ;; --cpus) @@ -113,7 +116,7 @@ do shift ;; --datestamp) - checkarg --datestamp "(relative pathname)" "$#" "$2" '^[^/]*$' '^--' + checkarg --datestamp "(relative pathname)" "$#" "$2" '^[a-zA-Z0-9._-/]*$' '^--' ds=$2 shift ;; @@ -123,7 +126,7 @@ do shift ;; --dryrun) - checkarg --dryrun "sched|script" $# "$2" 'sched\|script' '^--' + checkarg --dryrun "batches|sched|script" $# "$2" 'batches\|sched\|script' '^--' dryrun=$2 shift ;; @@ -162,18 +165,18 @@ do ;; --kconfig|--kconfigs) checkarg --kconfig "(Kconfig options)" $# "$2" '^CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\( CONFIG_[A-Z0-9_]\+=\([ynm]\|[0-9]\+\)\)*$' '^error$' - TORTURE_KCONFIG_ARG="$2" + TORTURE_KCONFIG_ARG="`echo "$TORTURE_KCONFIG_ARG $2" | sed -e 's/^ *//' -e 's/ *$//'`" shift ;; --kasan) TORTURE_KCONFIG_KASAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KASAN=y"; export TORTURE_KCONFIG_KASAN_ARG ;; --kcsan) - TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y"; export TORTURE_KCONFIG_KCSAN_ARG + TORTURE_KCONFIG_KCSAN_ARG="CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_INTERRUPT_WATCHER=y CONFIG_KCSAN_VERBOSE=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y"; export TORTURE_KCONFIG_KCSAN_ARG ;; --kmake-arg|--kmake-args) checkarg --kmake-arg "(kernel make arguments)" $# "$2" '.*' '^error$' - TORTURE_KMAKE_ARG="$2" + TORTURE_KMAKE_ARG="`echo "$TORTURE_KMAKE_ARG $2" | sed -e 's/^ *//' -e 's/ *$//'`" shift ;; --mac) @@ -191,7 +194,7 @@ do ;; --qemu-args|--qemu-arg) checkarg --qemu-args "(qemu arguments)" $# "$2" '^-' '^error' - TORTURE_QEMU_ARG="$2" + TORTURE_QEMU_ARG="`echo "$TORTURE_QEMU_ARG $2" | sed -e 's/^ *//' -e 's/ *$//'`" shift ;; --qemu-cmd) @@ -232,7 +235,7 @@ do shift done -if test -z "$TORTURE_INITRD" || 
tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+if test -n "$dryrun" || test -z "$TORTURE_INITRD" || tools/testing/selftests/rcutorture/bin/mkinitrd.sh
 then
 	:
 else
@@ -283,19 +286,34 @@
 	then
 		exit 1
 	fi
 fi
-for CF1 in $configs_derep
+echo 'BEGIN {' > $T/cfgcpu.awk
+for CF1 in `echo $configs_derep | tr -s ' ' '\012' | sort -u`
 do
 	if test -f "$CONFIGFRAG/$CF1"
 	then
-		cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$CF1`
+		if echo "$TORTURE_KCONFIG_ARG" | grep -q '\<CONFIG_NR_CPUS='
+		then
+			echo "$TORTURE_KCONFIG_ARG" | tr -s ' ' '\012' | grep '^CONFIG_NR_CPUS=' > $T/KCONFIG_ARG
+			cpu_count=`configNR_CPUS.sh $T/KCONFIG_ARG`
+		else
+			cpu_count=`configNR_CPUS.sh $CONFIGFRAG/$CF1`
+		fi
 		cpu_count=`configfrag_boot_cpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"`
 		cpu_count=`configfrag_boot_maxcpus "$TORTURE_BOOTARGS" "$CONFIGFRAG/$CF1" "$cpu_count"`
-		echo $CF1 $cpu_count >> $T/cfgcpu
+		echo 'scenariocpu["'"$CF1"'"] = '"$cpu_count"';' >> $T/cfgcpu.awk
 	else
 		echo "The --configs file $CF1 does not exist, terminating."
 		exit 1
 	fi
 done
+cat << '___EOF___' >> $T/cfgcpu.awk
+}
+{
+	for (i = 1; i <= NF; i++)
+		print $i, scenariocpu[$i];
+}
+___EOF___
+echo $configs_derep | awk -f $T/cfgcpu.awk > $T/cfgcpu
 sort -k2nr $T/cfgcpu -T="$T" > $T/cfgcpu.sort
 
 # Use a greedy bin-packing algorithm, sorting the list accordingly.
@@ -315,11 +333,10 @@ END {
 	batch = 0;
 	nc = -1;
 
-	# Each pass through the following loop creates on test batch
-	# that can be executed concurrently given ncpus. Note that a
-	# given test that requires more than the available CPUs will run in
-	# their own batch. Such tests just have to make do with what
-	# is available.
+	# Each pass through the following loop creates one test batch that
+	# can be executed concurrently given ncpus. Note that a given test
+	# that requires more than the available CPUs will run in its own
+	# batch. Such tests just have to make do with what is available.
 	while (nc != ncpus) {
 		batch++;
 		nc = ncpus;
@@ -375,9 +392,9 @@
 if ! test -e $resdir
 then
 	mkdir -p "$resdir" || :
 fi
-mkdir $resdir/$ds
+mkdir -p $resdir/$ds
 TORTURE_RESDIR="$resdir/$ds"; export TORTURE_RESDIR
-TORTURE_STOPFILE="$resdir/$ds/STOP"; export TORTURE_STOPFILE
+TORTURE_STOPFILE="$resdir/$ds/STOP.1"; export TORTURE_STOPFILE
 echo Results directory: $resdir/$ds
 echo $scriptname $args
 touch $resdir/$ds/log
@@ -517,14 +534,19 @@ END {
 	dump(first, i, batchnum);
 }' >> $T/script
 
-cat << ___EOF___ >> $T/script
-echo
-echo
-echo " --- `date` Test summary:"
-echo Results directory: $resdir/$ds
-kcsan-collapse.sh $resdir/$ds
-kvm-recheck.sh $resdir/$ds
+cat << '___EOF___' >> $T/script
+echo | tee -a $TORTURE_RESDIR/log
+echo | tee -a $TORTURE_RESDIR/log
+echo " --- `date` Test summary:" | tee -a $TORTURE_RESDIR/log
 ___EOF___
+cat << ___EOF___ >> $T/script
+echo Results directory: $resdir/$ds | tee -a $resdir/$ds/log
+kcsan-collapse.sh $resdir/$ds | tee -a $resdir/$ds/log
+kvm-recheck.sh $resdir/$ds > $T/kvm-recheck.sh.out 2>&1
+___EOF___
+echo 'ret=$?' >> $T/script
+echo "cat $T/kvm-recheck.sh.out | tee -a $resdir/$ds/log" >> $T/script
+echo 'exit $ret' >> $T/script
 
 if test "$dryrun" = script
 then
@@ -533,13 +555,34 @@
 elif test "$dryrun" = sched
 then
 	# Extract the test run schedule from the script.
-	egrep 'Start batch|Starting build\.' $T/script |
-	grep -v ">>" |
+	egrep 'Start batch|Starting build\.' $T/script | grep -v ">>" |
 		sed -e 's/:.*$//' -e 's/^echo //'
+	nbuilds="`grep 'Starting build\.' $T/script |
+		grep -v ">>" | sed -e 's/:.*$//' -e 's/^echo //' |
+		awk '{ print $1 }' | grep -v '\.'
| wc -l`" + echo Total number of builds: $nbuilds + nbatches="`grep 'Start batch' $T/script | grep -v ">>" | wc -l`" + echo Total number of batches: $nbatches exit 0 +elif test "$dryrun" = batches +then + # Extract the tests and their batches from the script. + egrep 'Start batch|Starting build\.' $T/script | grep -v ">>" | + sed -e 's/:.*$//' -e 's/^echo //' -e 's/-ovf//' | + awk ' + /^----Start/ { + batchno = $3; + next; + } + { + print batchno, $1, $2 + }' else # Not a dryrun, so run the script. - sh $T/script + bash $T/script + ret=$? + echo " --- Done at `date` (`get_starttime_duration $starttime`) exitcode $ret" | tee -a $resdir/$ds/log + exit $ret fi # Tracing: trace_event=rcu:rcu_grace_period,rcu:rcu_future_grace_period,rcu:rcu_grace_period_init,rcu:rcu_nocb_wake,rcu:rcu_preempt_task,rcu:rcu_unlock_preempted_task,rcu:rcu_quiescent_state_report,rcu:rcu_fqs,rcu:rcu_callback,rcu:rcu_kfree_callback,rcu:rcu_batch_start,rcu:rcu_invoke_callback,rcu:rcu_invoke_kfree_callback,rcu:rcu_batch_end,rcu:rcu_torture_read,rcu:rcu_barrier diff --git a/tools/testing/selftests/rcutorture/bin/parse-build.sh b/tools/testing/selftests/rcutorture/bin/parse-build.sh index 09155c15ea65..9313e5065ae9 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-build.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-build.sh @@ -21,7 +21,7 @@ mkdir $T . functions.sh -if grep -q CC < $F || test -n "$TORTURE_TRUST_MAKE" +if grep -q CC < $F || test -n "$TORTURE_TRUST_MAKE" || grep -qe --trust-make < `dirname $F`/../log then : else diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh index 263b1be50008..9f624bd53c27 100755 --- a/tools/testing/selftests/rcutorture/bin/parse-console.sh +++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh @@ -128,7 +128,7 @@ then then summary="$summary Badness: $n_badness" fi - n_warn=`grep -v 'Warning: unable to open an initial console' $file | egrep -c 'WARNING:|Warn'` + n_warn=`grep -v 'Warning: unable to open an initial console' $file | grep -v 'Warning: Failed to add ttynull console. No stdin, stdout, and stderr for the init process' | egrep -c 'WARNING:|Warn'` if test "$n_warn" -ne 0 then summary="$summary Warnings: $n_warn" diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh new file mode 100755 index 000000000000..ad7525b7ac29 --- /dev/null +++ b/tools/testing/selftests/rcutorture/bin/torture.sh @@ -0,0 +1,442 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0+ +# +# Run a series of torture tests, intended for overnight or +# longer timeframes, and also for large systems. +# +# Usage: torture.sh [ options ] +# +# Copyright (C) 2020 Facebook, Inc. +# +# Authors: Paul E. McKenney + +scriptname=$0 +args="$*" + +KVM="`pwd`/tools/testing/selftests/rcutorture"; export KVM +PATH=${KVM}/bin:$PATH; export PATH +. functions.sh + +TORTURE_ALLOTED_CPUS="`identify_qemu_vcpus`" +MAKE_ALLOTED_CPUS=$((TORTURE_ALLOTED_CPUS*2)) +HALF_ALLOTED_CPUS=$((TORTURE_ALLOTED_CPUS/2)) +if test "$HALF_ALLOTED_CPUS" -lt 1 +then + HALF_ALLOTED_CPUS=1 +fi +VERBOSE_BATCH_CPUS=$((TORTURE_ALLOTED_CPUS/16)) +if test "$VERBOSE_BATCH_CPUS" -lt 2 +then + VERBOSE_BATCH_CPUS=0 +fi + +# Configurations/scenarios. +configs_rcutorture= +configs_locktorture= +configs_scftorture= +kcsan_kmake_args= + +# Default compression, duration, and apportionment. 
+compress_kasan_vmlinux="`identify_qemu_vcpus`" +duration_base=10 +duration_rcutorture_frac=7 +duration_locktorture_frac=1 +duration_scftorture_frac=2 + +# "yes" or "no" parameters +do_allmodconfig=yes +do_rcutorture=yes +do_locktorture=yes +do_scftorture=yes +do_rcuscale=yes +do_refscale=yes +do_kvfree=yes +do_kasan=yes +do_kcsan=no + +# doyesno - Helper function for yes/no arguments +function doyesno () { + if test "$1" = "$2" + then + echo yes + else + echo no + fi +} + +usage () { + echo "Usage: $scriptname optional arguments:" + echo " --compress-kasan-vmlinux concurrency" + echo " --configs-rcutorture \"config-file list w/ repeat factor (3*TINY01)\"" + echo " --configs-locktorture \"config-file list w/ repeat factor (10*LOCK01)\"" + echo " --configs-scftorture \"config-file list w/ repeat factor (2*CFLIST)\"" + echo " --doall" + echo " --doallmodconfig / --do-no-allmodconfig" + echo " --do-kasan / --do-no-kasan" + echo " --do-kcsan / --do-no-kcsan" + echo " --do-kvfree / --do-no-kvfree" + echo " --do-locktorture / --do-no-locktorture" + echo " --do-none" + echo " --do-rcuscale / --do-no-rcuscale" + echo " --do-rcutorture / --do-no-rcutorture" + echo " --do-refscale / --do-no-refscale" + echo " --do-scftorture / --do-no-scftorture" + echo " --duration [ | h | d ]" + echo " --kcsan-kmake-arg kernel-make-arguments" + exit 1 +} + +while test $# -gt 0 +do + case "$1" in + --compress-kasan-vmlinux) + checkarg --compress-kasan-vmlinux "(concurrency level)" $# "$2" '^[0-9][0-9]*$' '^error' + compress_kasan_vmlinux=$2 + shift + ;; + --config-rcutorture|--configs-rcutorture) + checkarg --configs-rcutorture "(list of config files)" "$#" "$2" '^[^/]\+$' '^--' + configs_rcutorture="$configs_rcutorture $2" + shift + ;; + --config-locktorture|--configs-locktorture) + checkarg --configs-locktorture "(list of config files)" "$#" "$2" '^[^/]\+$' '^--' + configs_locktorture="$configs_locktorture $2" + shift + ;; + --config-scftorture|--configs-scftorture) + checkarg --configs-scftorture "(list of config files)" "$#" "$2" '^[^/]\+$' '^--' + configs_scftorture="$configs_scftorture $2" + shift + ;; + --doall) + do_allmodconfig=yes + do_rcutorture=yes + do_locktorture=yes + do_scftorture=yes + do_rcuscale=yes + do_refscale=yes + do_kvfree=yes + do_kasan=yes + do_kcsan=yes + ;; + --do-allmodconfig|--do-no-allmodconfig) + do_allmodconfig=`doyesno "$1" --do-allmodconfig` + ;; + --do-kasan|--do-no-kasan) + do_kasan=`doyesno "$1" --do-kasan` + ;; + --do-kcsan|--do-no-kcsan) + do_kcsan=`doyesno "$1" --do-kcsan` + ;; + --do-kvfree|--do-no-kvfree) + do_kvfree=`doyesno "$1" --do-kvfree` + ;; + --do-locktorture|--do-no-locktorture) + do_locktorture=`doyesno "$1" --do-locktorture` + ;; + --do-none) + do_allmodconfig=no + do_rcutorture=no + do_locktorture=no + do_scftorture=no + do_rcuscale=no + do_refscale=no + do_kvfree=no + do_kasan=no + do_kcsan=no + ;; + --do-rcuscale|--do-no-rcuscale) + do_rcuscale=`doyesno "$1" --do-rcuscale` + ;; + --do-rcutorture|--do-no-rcutorture) + do_rcutorture=`doyesno "$1" --do-rcutorture` + ;; + --do-refscale|--do-no-refscale) + do_refscale=`doyesno "$1" --do-refscale` + ;; + --do-scftorture|--do-no-scftorture) + do_scftorture=`doyesno "$1" --do-scftorture` + ;; + --duration) + checkarg --duration "(minutes)" $# "$2" '^[0-9][0-9]*\(m\|h\|d\|\)$' '^error' + mult=1 + if echo "$2" | grep -q 'm$' + then + mult=1 + elif echo "$2" | grep -q 'h$' + then + mult=60 + elif echo "$2" | grep -q 'd$' + then + mult=1440 + fi + ts=`echo $2 | sed -e 's/[smhd]$//'` + duration_base=$(($ts*mult)) + shift 
+ ;; + --kcsan-kmake-arg|--kcsan-kmake-args) + checkarg --kcsan-kmake-arg "(kernel make arguments)" $# "$2" '.*' '^error$' + kcsan_kmake_args="`echo "$kcsan_kmake_args $2" | sed -e 's/^ *//' -e 's/ *$//'`" + shift + ;; + *) + echo Unknown argument $1 + usage + ;; + esac + shift +done + +ds="`date +%Y.%m.%d-%H.%M.%S`-torture" +startdate="`date`" +starttime="`get_starttime`" + +T=/tmp/torture.sh.$$ +trap 'rm -rf $T' 0 2 +mkdir $T + +echo " --- " $scriptname $args | tee -a $T/log +echo " --- Results directory: " $ds | tee -a $T/log + +# Calculate rcutorture defaults and apportion time +if test -z "$configs_rcutorture" +then + configs_rcutorture=CFLIST +fi +duration_rcutorture=$((duration_base*duration_rcutorture_frac/10)) +if test "$duration_rcutorture" -eq 0 +then + echo " --- Zero time for rcutorture, disabling" | tee -a $T/log + do_rcutorture=no +fi + +# Calculate locktorture defaults and apportion time +if test -z "$configs_locktorture" +then + configs_locktorture=CFLIST +fi +duration_locktorture=$((duration_base*duration_locktorture_frac/10)) +if test "$duration_locktorture" -eq 0 +then + echo " --- Zero time for locktorture, disabling" | tee -a $T/log + do_locktorture=no +fi + +# Calculate scftorture defaults and apportion time +if test -z "$configs_scftorture" +then + configs_scftorture=CFLIST +fi +duration_scftorture=$((duration_base*duration_scftorture_frac/10)) +if test "$duration_scftorture" -eq 0 +then + echo " --- Zero time for scftorture, disabling" | tee -a $T/log + do_scftorture=no +fi + +touch $T/failures +touch $T/successes + +# torture_one - Does a single kvm.sh run. +# +# Usage: +# torture_bootargs="[ kernel boot arguments ]" +# torture_one flavor [ kvm.sh arguments ] +# +# Note that "flavor" is an arbitrary string. Supply --torture if needed. +# Note that quoting is problematic. So on the command line, pass multiple +# values with multiple kvm.sh argument instances. +function torture_one { + local cur_bootargs= + local boottag= + + echo " --- $curflavor:" Start `date` | tee -a $T/log + if test -n "$torture_bootargs" + then + boottag="--bootargs" + cur_bootargs="$torture_bootargs" + fi + "$@" $boottag "$cur_bootargs" --datestamp "$ds/results-$curflavor" > $T/$curflavor.out 2>&1 + retcode=$? + resdir="`grep '^Results directory: ' $T/$curflavor.out | tail -1 | sed -e 's/^Results directory: //'`" + if test -z "$resdir" + then + cat $T/$curflavor.out | tee -a $T/log + echo retcode=$retcode | tee -a $T/log + fi + if test "$retcode" == 0 + then + echo "$curflavor($retcode)" $resdir >> $T/successes + else + echo "$curflavor($retcode)" $resdir >> $T/failures + fi +} + +# torture_set - Does a set of tortures with and without KASAN and KCSAN. +# +# Usage: +# torture_bootargs="[ kernel boot arguments ]" +# torture_set flavor [ kvm.sh arguments ] +# +# Note that "flavor" is an arbitrary string. Supply --torture if needed. +# Note that quoting is problematic. So on the command line, pass multiple +# values with multiple kvm.sh argument instances. 
+function torture_set { + local cur_kcsan_kmake_args= + local kcsan_kmake_tag= + local flavor=$1 + shift + curflavor=$flavor + torture_one "$@" + if test "$do_kasan" = "yes" + then + curflavor=${flavor}-kasan + torture_one "$@" --kasan + fi + if test "$do_kcsan" = "yes" + then + curflavor=${flavor}-kcsan + if test -n "$kcsan_kmake_args" + then + kcsan_kmake_tag="--kmake-args" + cur_kcsan_kmake_args="$kcsan_kmake_args" + fi + torture_one $* --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" $kcsan_kmake_tag $cur_kcsan_kmake_args --kcsan + fi +} + +# make allmodconfig +if test "$do_allmodconfig" = "yes" +then + echo " --- allmodconfig:" Start `date` | tee -a $T/log + amcdir="tools/testing/selftests/rcutorture/res/$ds/allmodconfig" + mkdir -p "$amcdir" + echo " --- make clean" > "$amcdir/Make.out" 2>&1 + make -j$MAKE_ALLOTED_CPUS clean >> "$amcdir/Make.out" 2>&1 + echo " --- make allmodconfig" >> "$amcdir/Make.out" 2>&1 + make -j$MAKE_ALLOTED_CPUS allmodconfig >> "$amcdir/Make.out" 2>&1 + echo " --- make " >> "$amcdir/Make.out" 2>&1 + make -j$MAKE_ALLOTED_CPUS >> "$amcdir/Make.out" 2>&1 + retcode="$?" + echo $retcode > "$amcdir/Make.exitcode" + if test "$retcode" == 0 + then + echo "allmodconfig($retcode)" $amcdir >> $T/successes + else + echo "allmodconfig($retcode)" $amcdir >> $T/failures + fi +fi + +# --torture rcu +if test "$do_rcutorture" = "yes" +then + torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000" + torture_set "rcutorture" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "$configs_rcutorture" --trust-make +fi + +if test "$do_locktorture" = "yes" +then + torture_bootargs="torture.disable_onoff_at_boot" + torture_set "locktorture" tools/testing/selftests/rcutorture/bin/kvm.sh --torture lock --allcpus --duration "$duration_locktorture" --configs "$configs_locktorture" --trust-make +fi + +if test "$do_scftorture" = "yes" +then + torture_bootargs="scftorture.nthreads=$HALF_ALLOTED_CPUS torture.disable_onoff_at_boot" + torture_set "scftorture" tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration "$duration_scftorture" --configs "$configs_scftorture" --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --trust-make +fi + +if test "$do_refscale" = yes +then + primlist="`grep '\.name[ ]*=' kernel/rcu/refscale.c | sed -e 's/^[^"]*"//' -e 's/".*$//'`" +else + primlist= +fi +for prim in $primlist +do + torture_bootargs="refscale.scale_type="$prim" refscale.nreaders=$HALF_ALLOTED_CPUS refscale.loops=10000 refscale.holdoff=20 torture.disable_onoff_at_boot" + torture_set "refscale-$prim" tools/testing/selftests/rcutorture/bin/kvm.sh --torture refscale --allcpus --duration 5 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --bootargs "verbose_batched=$VERBOSE_BATCH_CPUS torture.verbose_sleep_frequency=8 torture.verbose_sleep_duration=$VERBOSE_BATCH_CPUS" --trust-make +done + +if test "$do_rcuscale" = yes +then + primlist="`grep '\.name[ ]*=' kernel/rcu/rcuscale.c | sed -e 's/^[^"]*"//' -e 's/".*$//'`" +else + primlist= +fi +for prim in $primlist +do + torture_bootargs="rcuscale.scale_type="$prim" rcuscale.nwriters=$HALF_ALLOTED_CPUS rcuscale.holdoff=20 torture.disable_onoff_at_boot" + torture_set "rcuscale-$prim" tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 5 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --trust-make +done + +if test "$do_kvfree" = "yes" +then + 
torture_bootargs="rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot" + torture_set "rcuscale-kvfree" tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 10 --kconfig "CONFIG_NR_CPUS=$HALF_ALLOTED_CPUS" --trust-make +fi + +echo " --- " $scriptname $args +echo " --- " Done `date` | tee -a $T/log +ret=0 +nsuccesses=0 +echo SUCCESSES: | tee -a $T/log +if test -s "$T/successes" +then + cat "$T/successes" | tee -a $T/log + nsuccesses="`wc -l "$T/successes" | awk '{ print $1 }'`" +fi +nfailures=0 +echo FAILURES: | tee -a $T/log +if test -s "$T/failures" +then + cat "$T/failures" | tee -a $T/log + nfailures="`wc -l "$T/failures" | awk '{ print $1 }'`" + ret=2 +fi +echo Started at $startdate, ended at `date`, duration `get_starttime_duration $starttime`. | tee -a $T/log +echo Summary: Successes: $nsuccesses Failures: $nfailures. | tee -a $T/log +tdir="`cat $T/successes $T/failures | head -1 | awk '{ print $NF }' | sed -e 's,/[^/]\+/*$,,'`" +if test -n "$tdir" && test $compress_kasan_vmlinux -gt 0 +then + # KASAN vmlinux files can approach 1GB in size, so compress them. + echo Looking for KASAN files to compress: `date` > "$tdir/log-xz" 2>&1 + find "$tdir" -type d -name '*-kasan' -print > $T/xz-todo + ncompresses=0 + batchno=1 + if test -s $T/xz-todo + then + echo Size before compressing: `du -sh $tdir | awk '{ print $1 }'` `date` 2>&1 | tee -a "$tdir/log-xz" | tee -a $T/log + for i in `cat $T/xz-todo` + do + echo Compressing vmlinux files in ${i}: `date` >> "$tdir/log-xz" 2>&1 + for j in $i/*/vmlinux + do + xz "$j" >> "$tdir/log-xz" 2>&1 & + ncompresses=$((ncompresses+1)) + if test $ncompresses -ge $compress_kasan_vmlinux + then + echo Waiting for batch $batchno of $ncompresses compressions `date` | tee -a "$tdir/log-xz" | tee -a $T/log + wait + ncompresses=0 + batchno=$((batchno+1)) + fi + done + done + if test $ncompresses -gt 0 + then + echo Waiting for final batch $batchno of $ncompresses compressions `date` | tee -a "$tdir/log-xz" | tee -a $T/log + fi + wait + echo Size after compressing: `du -sh $tdir | awk '{ print $1 }'` `date` 2>&1 | tee -a "$tdir/log-xz" | tee -a $T/log + echo Total duration `get_starttime_duration $starttime`. 
| tee -a $T/log + else + echo No compression needed: `date` >> "$tdir/log-xz" 2>&1 + fi +fi +if test -n "$tdir" +then + cp $T/log "$tdir" +fi +exit $ret diff --git a/tools/testing/selftests/rcutorture/configs/rcu/RUDE01.boot b/tools/testing/selftests/rcutorture/configs/rcu/RUDE01.boot index 9363708c9075..932a0799eb08 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/RUDE01.boot +++ b/tools/testing/selftests/rcutorture/configs/rcu/RUDE01.boot @@ -1 +1,2 @@ rcutorture.torture_type=tasks-rude +rcutree.use_softirq=0 diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot index cd2a188eeb6d..22cdeced98ea 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot +++ b/tools/testing/selftests/rcutorture/configs/rcu/TASKS01.boot @@ -1 +1,2 @@ rcutorture.torture_type=tasks +rcutree.use_softirq=0 diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot index d6da9a61d44a..40af3df0f397 100644 --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot @@ -2,5 +2,7 @@ maxcpus=8 nr_cpus=43 rcutree.gp_preinit_delay=3 rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3 -rcu_nocbs=0 +rcu_nocbs=0-1,3-7 +rcutorture.nocbs_nthreads=8 +rcutorture.nocbs_toggle=1000 rcutorture.fwd_progress=0
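Finally, for orientation, the new torture.sh above is intended to be run from the top of a kernel source tree; an illustrative overnight invocation using only flags documented in the script's usage text might look like:

$ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10h \
	--do-kcsan --configs-rcutorture "CFLIST TREE01" \
	--compress-kasan-vmlinux 4

On completion it prints SUCCESSES and FAILURES summaries, copies its log into the datestamped results directory, and returns a nonzero exit status if any flavor failed.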