linux/kernel/bpf
Song Liu c50eb518e2 bpf: Use separate lockdep class for each hashtab
If a hashtab is accessed in both NMI and non-NMI contexts, it may cause
deadlock in bucket->lock. LOCKDEP NMI warning highlighted this issue:

./test_progs -t stacktrace

[   74.828970]
[   74.828971] ================================
[   74.828972] WARNING: inconsistent lock state
[   74.828973] 5.9.0-rc8+ #275 Not tainted
[   74.828974] --------------------------------
[   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
[   74.828977] ffffc90000ee96b0 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x271/0x5a0
[   74.828981] {INITIAL USE} state was registered at:
[   74.828982]   lock_acquire+0x137/0x510
[   74.828983]   _raw_spin_lock_irqsave+0x43/0x90
[   74.828984]   htab_map_update_elem+0x271/0x5a0
[   74.828984]   0xffffffffa0040b34
[   74.828985]   trace_call_bpf+0x159/0x310
[   74.828986]   perf_trace_run_bpf_submit+0x5f/0xd0
[   74.828987]   perf_trace_urandom_read+0x1be/0x220
[   74.828988]   urandom_read_nowarn.isra.0+0x26f/0x380
[   74.828989]   vfs_read+0xf8/0x280
[   74.828989]   ksys_read+0xc9/0x160
[   74.828990]   do_syscall_64+0x33/0x40
[   74.828991]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   74.828992] irq event stamp: 1766
[   74.828993] hardirqs last  enabled at (1765): [<ffffffff82800ace>] asm_exc_page_fault+0x1e/0x30
[   74.828994] hardirqs last disabled at (1766): [<ffffffff8267df87>] irqentry_enter+0x37/0x60
[   74.828995] softirqs last  enabled at (856): [<ffffffff81043e7c>] fpu__clear+0xac/0x120
[   74.828996] softirqs last disabled at (854): [<ffffffff81043df0>] fpu__clear+0x20/0x120
[   74.828997]
[   74.828998] other info that might help us debug this:
[   74.828999]  Possible unsafe locking scenario:
[   74.828999]
[   74.829000]        CPU0
[   74.829001]        ----
[   74.829001]   lock(&htab->buckets[i].raw_lock);
[   74.829003]   <Interrupt>
[   74.829004]     lock(&htab->buckets[i].raw_lock);
[   74.829006]
[   74.829006]  *** DEADLOCK ***
[   74.829007]
[   74.829008] 1 lock held by taskset/1174:
[   74.829008]  #0: ffff8883ec3fd020 (&cpuctx_lock){-...}-{2:2}, at: perf_event_task_tick+0x101/0x650
[   74.829012]
[   74.829013] stack backtrace:
[   74.829014] CPU: 0 PID: 1174 Comm: taskset Not tainted 5.9.0-rc8+ #275
[   74.829015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   74.829016] Call Trace:
[   74.829016]  <NMI>
[   74.829017]  dump_stack+0x9a/0xd0
[   74.829018]  lock_acquire+0x461/0x510
[   74.829019]  ? lock_release+0x6b0/0x6b0
[   74.829020]  ? stack_map_get_build_id_offset+0x45e/0x800
[   74.829021]  ? htab_map_update_elem+0x271/0x5a0
[   74.829022]  ? rcu_read_lock_held_common+0x1a/0x50
[   74.829022]  ? rcu_read_lock_held+0x5f/0xb0
[   74.829023]  _raw_spin_lock_irqsave+0x43/0x90
[   74.829024]  ? htab_map_update_elem+0x271/0x5a0
[   74.829025]  htab_map_update_elem+0x271/0x5a0
[   74.829026]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x9c/0xe88
[   74.829027]  bpf_overflow_handler+0x127/0x320
[   74.829028]  ? perf_event_text_poke_output+0x4d0/0x4d0
[   74.829029]  ? sched_clock_cpu+0x18/0x130
[   74.829030]  __perf_event_overflow+0xae/0x190
[   74.829030]  handle_pmi_common+0x34c/0x470
[   74.829031]  ? intel_pmu_save_and_restart+0x90/0x90
[   74.829032]  ? lock_acquire+0x3f8/0x510
[   74.829033]  ? lock_release+0x6b0/0x6b0
[   74.829034]  intel_pmu_handle_irq+0x11e/0x240
[   74.829034]  perf_event_nmi_handler+0x40/0x60
[   74.829035]  nmi_handle+0x110/0x360
[   74.829036]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829037]  default_do_nmi+0x6b/0x170
[   74.829038]  exc_nmi+0x106/0x130
[   74.829038]  end_repeat_nmi+0x16/0x55
[   74.829039] RIP: 0010:__intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829042] Code: 2f 1f 03 48 8d bb b8 0c 00 00 e8 29 09 41 00 48 ...
[   74.829043] RSP: 0000:ffff8880a604fc90 EFLAGS: 00000002
[   74.829044] RAX: 000000070000000f RBX: ffff8883ec2195a0 RCX: 000000000000038f
[   74.829045] RDX: 0000000000000007 RSI: ffffffff82e72c20 RDI: ffff8883ec21a258
[   74.829046] RBP: 000000070000000f R08: ffffffff8101b013 R09: fffffbfff0a7982d
[   74.829047] R10: ffffffff853cc167 R11: fffffbfff0a7982c R12: 0000000000000000
[   74.829049] R13: ffff8883ec3f0af0 R14: ffff8883ec3fd120 R15: ffff8883e9c92098
[   74.829049]  ? intel_pmu_lbr_enable_all+0x43/0x240
[   74.829050]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829051]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
[   74.829052]  </NMI>
[   74.829053]  perf_event_task_tick+0x48d/0x650
[   74.829054]  scheduler_tick+0x129/0x210
[   74.829054]  update_process_times+0x37/0x70
[   74.829055]  tick_sched_handle.isra.0+0x35/0x90
[   74.829056]  tick_sched_timer+0x8f/0xb0
[   74.829057]  __hrtimer_run_queues+0x364/0x7d0
[   74.829058]  ? tick_sched_do_timer+0xa0/0xa0
[   74.829058]  ? enqueue_hrtimer+0x1e0/0x1e0
[   74.829059]  ? recalibrate_cpu_khz+0x10/0x10
[   74.829060]  ? ktime_get_update_offsets_now+0x1a3/0x360
[   74.829061]  hrtimer_interrupt+0x1bb/0x360
[   74.829062]  ? rcu_read_lock_sched_held+0xa1/0xd0
[   74.829063]  __sysvec_apic_timer_interrupt+0xed/0x3d0
[   74.829064]  sysvec_apic_timer_interrupt+0x3f/0xd0
[   74.829064]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
[   74.829065]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[   74.829066] RIP: 0033:0x7fba18d579b4
[   74.829068] Code: 74 54 44 0f b6 4a 04 41 83 e1 0f 41 80 f9 ...
[   74.829069] RSP: 002b:00007ffc9ba69570 EFLAGS: 00000206
[   74.829071] RAX: 00007fba192084c0 RBX: 00007fba18c24d28 RCX: 00000000000007a4
[   74.829072] RDX: 00007fba18c30488 RSI: 0000000000000000 RDI: 000000000000037b
[   74.829073] RBP: 00007fba18ca5760 R08: 00007fba18c248fc R09: 00007fba18c94c30
[   74.829074] R10: 000000000000002f R11: 0000000000073c30 R12: 00007ffc9ba695e0
[   74.829075] R13: 00000000000003f3 R14: 00007fba18c21ac8 R15: 00000000000058d6

However, such warning should not apply across multiple hashtabs. The
system will not deadlock if one hashtab is used in NMI, while another
hashtab is used in non-NMI.

Use separate lockdep class for each hashtab, so that we don't get this
false alert.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20201029071925.3103400-2-songliubraving@fb.com
2020-10-30 13:03:29 -07:00
..
preload bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach 2020-09-29 13:09:23 -07:00
arraymap.c bpf: Allow for map-in-map with dynamic inner array map entries 2020-10-11 10:21:04 -07:00
bpf_inode_storage.c bpf: Allow specifying a BTF ID per argument in function protos 2020-09-21 15:00:40 -07:00
bpf_iter.c bpf: Permit cond_resched for some iterators 2020-10-28 14:54:31 -07:00
bpf_local_storage.c bpf: Use hlist_add_head_rcu when linking to local_storage 2020-09-19 01:12:35 +02:00
bpf_lru_list.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
bpf_lru_list.h bpf: Fix a typo "inacitve" -> "inactive" 2020-04-06 21:54:10 +02:00
bpf_lsm.c bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON 2020-09-25 13:58:01 -07:00
bpf_struct_ops_types.h bpf: tcp: Support tcp_congestion_ops in bpf 2020-01-09 08:46:18 -08:00
bpf_struct_ops.c bpf: Move btf_resolve_size into __btf_resolve_size 2020-08-25 15:37:41 -07:00
btf.c bpf: Introduce bpf_per_cpu_ptr() 2020-10-02 15:00:49 -07:00
cgroup.c Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-10-22 09:59:21 -07:00
core.c networking changes for the 5.10 merge window 2020-10-15 18:42:13 -07:00
cpumap.c bpf, cpumap: Remove rcpu pointer from cpu_map_build_skb signature 2020-09-28 23:30:42 +02:00
devmap.c bpf: {cpu,dev}map: Change various functions return type from int to void 2020-09-01 15:45:58 +02:00
disasm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
disasm.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
dispatcher.c bpf: Remove bpf_image tree 2020-03-13 12:49:52 -07:00
hashtab.c bpf: Use separate lockdep class for each hashtab 2020-10-30 13:03:29 -07:00
helpers.c bpf: Introducte bpf_this_cpu_ptr() 2020-10-02 15:00:49 -07:00
inode.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
local_storage.c bpf/local_storage: Fix build without CONFIG_CGROUP 2020-07-25 20:16:36 -07:00
lpm_trie.c bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
Makefile bpf: Implement bpf_local_storage for inodes 2020-08-25 15:00:04 -07:00
map_in_map.c bpf: Relax max_entries check for most of the inner map types 2020-08-28 15:41:30 +02:00
map_in_map.h bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
map_iter.c bpf: Implement link_query callbacks in map element iterators 2020-08-21 14:01:39 -07:00
net_namespace.c bpf: Add support for forced LINK_DETACH command 2020-08-01 20:38:28 -07:00
offload.c bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill 2020-02-17 16:53:49 +01:00
percpu_freelist.c bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
percpu_freelist.h bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
prog_iter.c bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
queue_stack_maps.c bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
reuseport_array.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
ringbuf.c bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
stackmap.c bpf: Allow specifying a BTF ID per argument in function protos 2020-09-21 15:00:40 -07:00
syscall.c bpf: Remove unneeded break 2020-10-19 20:40:21 +02:00
sysfs_btf.c bpf: Fix sysfs export of empty BTF section 2020-09-21 21:50:24 +02:00
task_iter.c bpf: Permit cond_resched for some iterators 2020-10-28 14:54:31 -07:00
tnum.c bpf: Verifier, do explicit ALU32 bounds tracking 2020-03-30 14:59:53 -07:00
trampoline.c bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach 2020-09-29 13:09:23 -07:00
verifier.c bpf: Enforce id generation for all may-be-null register type 2020-10-19 15:57:42 -07:00