linux/kernel
Frederic Weisbecker db2c4c7791 lockdep: Move lock events under lockdep recursion protection
There are RCU read-side critical sections in the path where we submit
a trace event. The rcu_read_(un)lock() calls there trigger lock events
themselves, which creates recursive events.

One pair in do_perf_sw_event:

__lock_acquire
      |
      |--96.11%-- lock_acquire
      |          |
      |          |--27.21%-- do_perf_sw_event
      |          |          perf_tp_event
      |          |          |
      |          |          |--49.62%-- ftrace_profile_lock_release
      |          |          |          lock_release
      |          |          |          |
      |          |          |          |--33.85%-- _raw_spin_unlock

Another pair in perf_output_begin/end:

__lock_acquire
      |--23.40%-- perf_output_begin
      |          |          __perf_event_overflow
      |          |          perf_swevent_overflow
      |          |          perf_swevent_add
      |          |          perf_swevent_ctx_event
      |          |          do_perf_sw_event
      |          |          perf_tp_event
      |          |          |
      |          |          |--55.37%-- ftrace_profile_lock_acquire
      |          |          |          lock_acquire
      |          |          |          |
      |          |          |          |--37.31%-- _raw_spin_lock

The problem is not so much the trace recursion itself, since we already
have recursion protection for that (though recursing is always wasteful).
The problem is that the trace events are outside the lockdep recursion
protection, so each lockdep event triggers a lock trace, and each RCU
read-side section on that trace path triggers two more lockdep events.
The recursive lock trace events are not emitted, thanks to the trace
recursion protection, so the recursion stops there, but lockdep still
analyses these new events.

To sum up, for each lockdep event we have:

        lock_*()
             |
             trace lock_acquire
                  |
                  ----- rcu_read_lock()
                  |          |
                  |          lock_acquire()
                  |          |
                  |          trace_lock_acquire() (stopped)
                  |          |
                  |          lockdep analyze
                  |
                  ----- rcu_read_unlock()
                             |
                             lock_release()
                             |
                             trace_lock_release() (stopped)
                             |
                             lockdep analyze

And the above happens twice, as there are two RCU read-side sections
on the path that submits an event.

This patch fixes it by moving the lock trace events under the lockdep
recursion protection.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
2010-03-10 14:26:07 +01:00
gcov microblaze: Enable GCOV_PROFILE_ALL 2009-09-21 14:29:21 +02:00
irq sparseirq: Use radix_tree instead of ptrs array 2010-02-17 17:27:20 -08:00
power mm/pm: force GFP_NOIO during suspend/hibernation and resume 2010-03-06 11:26:26 -08:00
time Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-01 08:48:25 -08:00
trace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-03-03 07:34:18 -08:00
.gitignore
acct.c bsdacct: fix uid/gid misreporting 2009-12-15 08:53:10 -08:00
async.c async: Fix lack of boot-time console due to insufficient synchronization 2009-06-08 12:31:53 -07:00
audit_tree.c new helper: iterate_mounts() 2010-03-03 14:07:57 -05:00
audit_watch.c Audit: reorganize struct audit_watch to save 8 bytes 2009-09-24 03:50:25 -04:00
audit.c Audit: send signal info if selinux is disabled 2009-09-24 03:50:26 -04:00
audit.h Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
auditfilter.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditsc.c Lose the first argument of audit_inode_child() 2010-02-08 14:38:36 -05:00
backtracetest.c
bounds.c kbuild: move bounds.h to include/generated 2009-12-12 13:08:14 +01:00
capability.c capabilities: Use RCU to protect task lookup in sys_capget 2009-12-10 09:42:48 +11:00
cgroup_freezer.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
cgroup.c sched, cgroups: Fix module export 2010-02-25 12:02:13 +01:00
compat.c
configs.c
cpu.c kernel/cpu.c: delete deprecated definition in cpu_up() 2010-03-06 11:26:28 -08:00
cpuset.c sched: Fix balance vs hotplug race 2009-12-06 21:10:56 +01:00
cred-internals.h
cred.c kernel/cred.c: use kmem_cache_free 2010-02-03 10:21:57 +11:00
delayacct.c headers: taskstats_kern.h trim 2009-09-18 09:48:52 -07:00
dma.c
early_res.c early_res: Need to save the allocation name in drop_range_partial() 2010-03-01 23:23:02 -08:00
elfcore.c elf coredump: add extended numbering support 2010-03-06 11:26:46 -08:00
exec_domain.c
exit.c kernel/exit.c: fix shadows sparse warning 2010-03-06 11:26:32 -08:00
extable.c
fork.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
freezer.c sched: fix nr_uninterruptible accounting of frozen tasks really 2009-07-18 14:19:53 +02:00
futex_compat.c futex: Protect pid lookup in compat code with RCU 2009-12-09 14:22:14 +01:00
futex.c futex: Handle futex value corruption gracefully 2010-02-03 15:13:22 +01:00
groups.c groups: move code to kernel/groups.c 2009-06-16 19:47:48 -07:00
hrtimer.c hrtimers: Convert to raw_spinlocks 2009-12-14 23:55:34 +01:00
hung_task.c softlockup: Fix hung_task_check_count sysctl 2009-11-27 06:21:57 +01:00
hw_breakpoint.c Merge branch 'perf/core' into perf/urgent 2010-03-04 11:47:52 +01:00
itimer.c itimers: Fix racy writes to cpu_itimer fields 2009-11-18 16:32:12 +01:00
kallsyms.c hw-breakpoints: Fix broken hw-breakpoint sample module 2009-11-10 11:23:29 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
Kconfig.preempt
kexec.c percpu: add __percpu sparse annotations to core kernel subsystems 2010-02-17 11:17:38 +09:00
kfifo.c kfifo: Don't use integer as NULL pointer 2010-02-16 15:11:08 -08:00
kgdb.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-02-04 16:07:41 -08:00
kmod.c kmod: fix resource leak in call_usermodehelper_pipe() 2010-01-11 09:34:04 -08:00
kprobes.c kprobes: Jump optimization sysctl interface 2010-02-25 17:49:25 +01:00
ksysfs.c sched: Remove USER_SCHED 2010-01-21 13:40:18 +01:00
kthread.c kthread, sched: Remove reference to kthread_create_on_cpu 2010-02-09 11:47:39 +01:00
latencytop.c
lockdep_internals.h lockdep: BFS cleanup 2009-07-24 10:53:29 +02:00
lockdep_proc.c seq_file: constify seq_operations 2009-09-23 07:39:29 -07:00
lockdep_states.h
lockdep.c lockdep: Move lock events under lockdep recursion protection 2010-03-10 14:26:07 +01:00
Makefile elf coredump: replace ELF_CORE_EXTRA_* macros by functions 2010-03-06 11:26:45 -08:00
module.c sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on module dynamic attributes 2010-03-07 17:04:51 -08:00
mutex-debug.c headers: remove sched.h from interrupt.h 2009-10-11 11:20:58 -07:00
mutex-debug.h locking: Implement new raw_spinlock 2009-12-14 23:55:32 +01:00
mutex.c mutex: Better control mutex adaptive spinning config 2009-12-03 11:50:11 +01:00
mutex.h
notifier.c sched: Use lockdep-based checking on rcu_dereference() 2010-02-25 10:34:26 +01:00
ns_cgroup.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
nsproxy.c nsproxy: extract create_nsproxy() 2009-06-18 13:03:56 -07:00
padata.c padata: Allocate the cpumask for the padata instance 2010-03-04 13:30:22 +08:00
panic.c panic: fix panic_timeout accuracy when running on a hypervisor 2010-03-06 11:26:33 -08:00
params.c sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on dynamic attributes 2010-03-07 17:04:51 -08:00
perf_event.c perf: Provide better condition for event rotation 2010-03-10 13:22:36 +01:00
pid_namespace.c pidns: deny CLONE_PARENT|CLONE_NEWPID combination 2009-09-24 07:21:04 -07:00
pid.c kernel/pid.c: update comment on find_task_by_pid_ns 2010-03-06 11:26:33 -08:00
pm_qos_params.c pm_qos: clean up racy global "name" variable 2009-10-14 15:31:10 +02:00
posix-cpu-timers.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
posix-timers.c posix-timers.c: Don't export local functions 2010-02-05 14:54:10 +01:00
printk.c printk: avoid warning when CONFIG_PRINTK is disabled 2010-03-06 11:26:33 -08:00
profile.c kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file 2009-09-20 20:15:40 +02:00
ptrace.c ptrace: Fix ptrace_regset() comments and diagnose errors specifically 2010-02-23 13:45:26 -08:00
range.c x86: Change range end to start+size 2010-02-10 17:47:17 -08:00
rcupdate.c rcu: Export rcu_scheduler_active 2010-02-26 08:20:46 +01:00
rcutiny.c rcu: Eliminate unneeded function wrapping 2009-11-22 18:58:16 +01:00
rcutorture.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2010-03-03 07:34:18 -08:00
rcutree_plugin.h rcu: Fix accelerated GPs for last non-dynticked CPU 2010-02-27 09:53:53 +01:00
rcutree_trace.c rcu: Stop overflowing signed integers 2010-02-25 10:34:57 +01:00
rcutree.c rcu: Fix accelerated grace periods for last non-dynticked CPU 2010-02-27 09:53:52 +01:00
rcutree.h rcu: Fix accelerated grace periods for last non-dynticked CPU 2010-02-27 09:53:52 +01:00
relay.c splice: comparing unsigned int < 0 2010-03-06 11:26:32 -08:00
res_counter.c memcg: some modification to softlimit under hierarchical memory reclaim. 2009-10-01 16:11:13 -07:00
resource.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-03 09:11:02 -08:00
rtmutex_common.h
rtmutex-debug.c sched: Convert pi_lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutes: Convert rtmutex.lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex.h
rwsem.c
sched_clock.c sched: Fix cpu_clock() in NMIs, on !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK 2009-12-15 09:04:36 +01:00
sched_cpupri.c bitops: rename for_each_bit() to for_each_set_bit() 2010-03-06 11:26:23 -08:00
sched_cpupri.h sched: Convert cpupri lock to raw_spinlock 2009-12-14 23:55:33 +01:00
sched_debug.c sched: Convert rq->lock to raw_spinlock 2009-12-14 23:55:33 +01:00
sched_fair.c sched: Fix SCHED_MC regression caused by change in sched cpu_power 2010-02-26 15:45:13 +01:00
sched_features.h sched: Discard some old bits 2009-12-09 10:03:07 +01:00
sched_idletask.c sched: Remove the sched_class load_balance methods 2010-01-21 13:40:09 +01:00
sched_rt.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
sched_stats.h
sched.c sysdev: Pass attribute in sysdev_class attributes show/store 2010-03-07 17:04:47 -08:00
seccomp.c
semaphore.c
signal.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
slow-work-debugfs.c SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
slow-work.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6 2009-12-08 07:38:50 -08:00
slow-work.h SLOW_WORK: Move slow_work's proc file to debugfs 2009-12-01 08:20:31 -08:00
smp.c generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data 2010-01-18 09:02:59 +01:00
softirq.c hrtimer, softirq: Fix hrtimer->softirq trampoline 2010-02-03 18:17:40 +01:00
softlockup.c softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume 2010-02-01 08:22:32 +01:00
spinlock.c locking: Cleanup the name space completely 2009-12-14 23:55:33 +01:00
srcu.c rcu: Introduce lockdep-based checking to RCU read-side primitives 2010-02-25 09:40:59 +01:00
stacktrace.c
stop_machine.c percpu: add __percpu sparse annotations to core kernel subsystems 2010-02-17 11:17:38 +09:00
sys_ni.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 2009-12-08 07:55:01 -08:00
sys.c kernel core: use helpers for rlimits 2010-03-06 11:26:33 -08:00
sysctl_binary.c Switch may_open() and break_lease() to passing O_... 2010-03-03 13:00:21 -05:00
sysctl_check.c ipv4 05/05: add sysctl to accept packets with local source addresses 2009-12-03 12:14:38 -08:00
sysctl.c Merge branch 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-03-05 10:50:22 -08:00
taskstats.c const: struct nla_policy 2010-02-18 14:30:18 -08:00
test_kprobes.c
time.c Revert "time: Remove xtime_cache" 2009-12-22 14:10:37 -08:00
timeconst.pl
timer.c perf: Fix perf_event_do_pending() fallback callsite 2010-01-21 13:40:39 +01:00
tracepoint.c trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
tsacct.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
uid16.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
up.c
user_namespace.c
user-return-notifier.c core: Clean up user return notifers use of per_cpu 2009-12-02 10:22:59 +01:00
user.c sched: Remove USER_SCHED 2010-01-21 13:40:18 +01:00
utsname_sysctl.c sysctl kernel: Remove binary sysctl logic 2009-11-12 02:04:55 -08:00
utsname.c utsns: extract creeate_uts_ns() 2009-06-18 13:03:55 -07:00
wait.c locking, sched: Give waitqueue spinlocks their own lockdep classes 2009-08-10 14:43:09 +02:00
workqueue.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2009-12-10 09:35:44 -08:00