linux/kernel
Eric Dumazet 8af088710d posix-timers: RCU conversion
Ben Nagy reported a scalability problem with KVM/QEMU that hit very hard
a single spinlock (idr_lock) in posix-timers code, on its 48 core
machine.

Even on a 16 cpu machine (2x4x2), a single test can show 98% of cpu time
used in ticket_spin_lock, from lock_timer

Ref: http://www.spinics.net/lists/kvm/msg51526.html

Switching to RCU is quite easy, IDR being already RCU ready. idr_lock
should be locked only for an insert/delete, not a lookup.

Benchmark on a 2x4x2 machine, 16 processes calling timer_gettime().

Before :

real    1m18.669s
user    0m1.346s
sys     1m17.180s

After :

real    0m3.296s
user    0m1.366s
sys     0m1.926s

Reported-by: Ben Nagy <ben@iagu.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Ben Nagy <ben@iagu.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24 12:10:51 +02:00
..
debug Fix common misspellings 2011-03-31 11:26:23 -03:00
events Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 18:14:34 -07:00
gcov Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 2011-03-20 18:14:55 -07:00
irq Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:30:15 -07:00
power PM: Allow drivers to allocate memory from .prepare() callbacks safely 2011-05-17 23:26:00 +02:00
time hrtimers: Avoid touching inactive timer bases 2011-05-23 13:59:54 +02:00
trace Merge branches 'sched-core-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:41:22 -07:00
.gitignore
acct.c
async.c
audit_tree.c Fix common misspellings 2011-03-31 11:26:23 -03:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
audit.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
audit.h audit: make functions static 2010-10-30 01:42:19 -04:00
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c audit: acquire creds selectively to reduce atomic op overhead 2011-04-27 15:11:03 +02:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Cache user_ns in struct cred 2011-05-13 11:45:33 -07:00
cgroup_freezer.c cgroup_freezer: update_freezer_state() does incorrect state transitions 2010-10-27 18:03:08 -07:00
cgroup.c cgroup,rcu: convert call_rcu(__free_css_id_cb) to kfree_rcu() 2011-05-07 22:50:47 -07:00
compat.c signal: introduce do_sigtimedwait() to factor out compat/native code 2011-04-28 13:01:38 +02:00
configs.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
cpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cpuset.c sched: Dynamic sched_domain::level 2011-04-11 14:09:32 +02:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c Cache user_ns in struct cred 2011-05-13 11:45:33 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc 2011-05-20 13:33:21 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c sched: Remove unused parameters from sched_fork() and wake_up_new_task() 2011-05-12 09:36:37 +02:00
freezer.c Freezer: Use SMP barriers 2011-05-17 23:19:17 +02:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
futex.c futex: Set FLAGS_HAS_TIMEOUT during futex_wait restart setup 2011-04-15 16:34:32 +02:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimers: Reorder clock bases 2011-05-23 13:59:54 +02:00
hung_task.c watchdog, hung_task_timeout: Add Kconfig configurable default 2011-04-28 09:13:17 +02:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt
kexec.c PM: Remove sysdev suspend, resume and shutdown operations 2011-05-11 21:37:15 +02:00
kfifo.c kfifo: fix scatterlist usage 2010-10-01 10:50:58 -07:00
kmod.c kmod: always provide usermodehelper_disable() 2011-05-17 23:19:18 +02:00
kprobes.c Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2011-01-07 17:02:58 -08:00
ksysfs.c kernel/ksysfs.c: expose file_caps_enabled in sysfs 2011-04-19 16:45:51 -07:00
kthread.c Fix common misspellings 2011-03-31 11:26:23 -03:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
lockdep.c lockdep: Remove cmpxchg to update nr_chain_hlocks 2011-04-22 11:06:59 +02:00
Makefile hw breakpoints: Move to kernel/events/ 2011-05-03 15:26:43 +02:00
module.c Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 2011-05-23 12:49:28 -07:00
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c
ns_cgroup.c cgroup: notify ns_cgroup deprecated 2010-10-27 18:03:09 -07:00
nsproxy.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c move x86 specific oops=panic to generic code 2011-03-22 17:44:11 -07:00
params.c params.c: Use new strtobool function to process boolean inputs 2011-05-19 16:55:28 +09:30
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pid.c next_pidmap: fix overflow condition 2011-04-18 10:35:30 -07:00
pm_qos_params.c treewide: remove extra semicolons 2011-04-10 17:01:05 +02:00
posix-cpu-timers.c hrtimers: Avoid touching inactive timer bases 2011-05-23 13:59:54 +02:00
posix-timers.c posix-timers: RCU conversion 2011-05-24 12:10:51 +02:00
printk.c printk: allow setting DEFAULT_MESSAGE_LEVEL via Kconfig 2011-03-22 17:44:13 -07:00
profile.c profile: Use vzalloc() rather than vmalloc() & memset() 2010-10-31 09:47:26 +01:00
ptrace.c Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc 2011-05-20 13:33:21 -07:00
range.c kernel/range.c: fix clean_sort_range() for the case of full array 2010-11-12 07:55:31 -08:00
rcupdate.c rcu: Use WARN_ON_ONCE for DEBUG_OBJECTS_RCU_HEAD warnings 2011-05-05 23:16:57 -07:00
rcutiny_plugin.h rcu: Converge TINY_RCU expedited and normal boosting 2011-05-05 23:16:58 -07:00
rcutiny.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
rcutorture.c rcu: mark rcutorture boosting callback as being on-stack 2011-05-05 23:16:57 -07:00
rcutree_plugin.h Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-19 23:25:29 +02:00
rcutree_trace.c Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-19 23:25:29 +02:00
rcutree.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
rcutree.h Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" 2011-05-19 23:25:29 +02:00
relay.c Clean up relay_alloc_page_array() slightly by using vzalloc rather than vmalloc and memset 2010-11-05 08:21:34 -07:00
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resources: add arch hook for preventing allocation in reserved areas 2010-12-17 10:01:09 -08:00
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex.h
rwsem.c
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched, autogroup: Stop going ahead if autogroup is disabled 2011-02-23 11:33:59 +01:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Introduce SCHED_POWER_SCALE to scale cpu_power calculations 2011-05-20 14:16:50 +02:00
sched_features.h sched: Move the second half of ttwu() to the remote cpu 2011-04-14 08:52:41 +02:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c Merge branches 'sched-core-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-19 17:41:22 -07:00
sched_stats.h sched_stat: Update sched_info_queue/dequeue() code comments 2010-10-24 13:29:01 +02:00
sched_stoptask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-05-23 12:53:48 -07:00
seccomp.c
semaphore.c
signal.c Merge branch 'ptrace' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc 2011-05-20 13:33:21 -07:00
smp.c smp: move smp setup functions to kernel/smp.c 2011-03-22 17:44:11 -07:00
softirq.c rcu: move TREE_RCU from softirq to kthread 2011-05-05 23:16:54 -07:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c
stop_machine.c kthread: use kthread_create_on_node() 2011-03-22 17:44:01 -07:00
sys_ni.c ipc: Add missing sys_ni entries for ipc/compat.c functions 2011-05-20 13:53:02 -07:00
sys.c Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2011-05-19 18:24:11 -07:00
sysctl_binary.c open-style analog of vfs_path_lookup() 2011-03-14 09:15:28 -04:00
sysctl_check.c sysctl_check: drop dead code 2011-03-23 19:46:51 -07:00
sysctl.c watchdog: Disable watchdog when thresh is zero 2011-05-23 11:58:59 +02:00
taskstats.c taskstats: use appropriate printk priority level 2011-03-23 19:47:14 -07:00
test_kprobes.c kprobes: Fix selftest to clear flags field for reusing probes 2010-10-14 08:55:27 +02:00
time.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
timeconst.pl
timer.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
tracepoint.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
tsacct.c taskstats: use real microsecond granularity for CPU times 2010-10-27 18:03:17 -07:00
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
utsname_sysctl.c
utsname.c userns: allow sethostname in a container 2011-03-23 19:47:03 -07:00
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c kernel/watchdog.c: Use proper ANSI C prototypes 2011-05-23 21:07:40 -07:00
workqueue_sched.h
workqueue.c workqueue: fix deadlock in worker_maybe_bind_and_lock() 2011-04-29 18:08:37 +02:00