linux

John Fastabend e9db4ef6bf bpf: sockhash fix omitted bucket lock in sock_close

First, sk_callback_lock was being used to protect both the sock callback hooks and the psock->maps list. This got overly convoluted after the addition of sockhash (in sockmap it made some sense because maps and callbacks were tightly coupled), so let's split out a specific lock for maps and only use the callback lock for its intended purpose. This fixes a couple of cases where we failed to take the maps lock when it was in fact needed. It also makes the code easier to follow, because the locking can now sit closer to the code it serializes.

Next, in sock_hash_delete_elem() the pattern was as follows:

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after dereferencing the hash table, which we can't do unless we hold the bucket lock. Once we have the bucket lock and the psock element, the element is deleted from the hashmap to ensure any other path doing a lookup will fail. Finally, the refcnt is decremented and, if it reaches zero, the psock is destroyed.

In parallel with the above (or with freeing the map), a TCP close event may trigger tcp_close(), which at the moment omits the bucket lock altogether (oops!). That flow looks like this:

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and as demonstrated by syzbot, this is broken because multiple threads can delete entries via hlist_del_rcu() concurrently.

To fix this we might be tempted to wrap the hlist operation in a bucket lock, but that would create a lock inversion problem.
In summary, to follow the locking rules the psock's maps list needs the sk_callback_lock (after this patch, maps_lock), but we need the bucket lock to do the hlist_del_rcu(). To resolve the lock inversion, pop the head of the maps list repeatedly and remove the reference until none are left. If a delete happens in parallel from the BPF API, that is OK as well, because it performs a similar action: look up the sock in the map/hash, delete it from the map/hash, and decrement the refcnt. We check for this case before destroying the psock to ensure we don't have two threads tearing down the same psock. The new logic is as follows:

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with map lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     // only get here if element was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally, for all of the above to work, add the missing locking around map operations described above. Then add RCU annotations and use rcu_dereference()/rcu_assign_pointer() to manage values relying on RCU, so that the object is not freed by sock_hash_free() while it is still being referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: `8111038444` ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2018-07-01 01:21:32 +02:00
arraymap.c	bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info	2018-05-23 12:03:32 +02:00
bpf_lru_list.c	bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4	2017-04-17 13:55:52 -04:00
bpf_lru_list.h	bpf: Only set node->ref = 1 if it has not been set	2017-09-01 09:57:39 -07:00
btf.c	treewide: kvzalloc() -> kvcalloc()	2018-06-12 16:19:22 -07:00
cgroup.c	bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF	2018-06-26 11:28:38 +02:00
core.c	bpf: undo prog rejection on read-only lock failure	2018-06-29 10:47:35 -07:00
cpumap.c	xdp: introduce xdp_return_frame_rx_napi	2018-05-24 18:36:15 -07:00
devmap.c	xdp: Fix handling of devmap in generic XDP	2018-06-15 23:47:15 +02:00
disasm.c	bpf: Remove struct bpf_verifier_env argument from print_bpf_insn	2018-03-23 17:38:57 +01:00
disasm.h	bpf: Remove struct bpf_verifier_env argument from print_bpf_insn	2018-03-23 17:38:57 +01:00
hashtab.c	bpf: avoid retpoline for lookup/update/delete calls on maps	2018-06-03 07:45:37 -07:00
helpers.c	bpf: implement bpf_get_current_cgroup_id() helper	2018-06-03 18:22:41 -07:00
inode.c	bpf: implement dummy fops for bpf objects	2018-06-08 10:58:48 -07:00
lpm_trie.c	treewide: kmalloc() -> kmalloc_array()	2018-06-12 16:19:22 -07:00
Makefile	bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP	2018-05-03 15:55:24 -07:00
map_in_map.c	bpf: Add syscall lookup support for fd array and htab	2017-06-29 13:13:25 -04:00
map_in_map.h	bpf: Add syscall lookup support for fd array and htab	2017-06-29 13:13:25 -04:00
offload.c	bpf: offload: allow offloaded programs to use perf event arrays	2018-05-04 23:41:03 +02:00
percpu_freelist.c	bpf: fix lockdep splat	2017-11-15 19:46:32 +09:00
percpu_freelist.h	bpf: introduce percpu_freelist	2016-03-08 15:28:31 -05:00
sockmap.c	bpf: sockhash fix omitted bucket lock in sock_close	2018-07-01 01:21:32 +02:00
stackmap.c	bpf: avoid -Wmaybe-uninitialized warning	2018-05-28 17:40:59 +02:00
syscall.c	bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF	2018-06-26 11:28:38 +02:00
tnum.c	bpf/verifier: improve register value range tracking with ARSH	2018-04-29 08:45:53 -07:00
verifier.c	treewide: Use array_size() in vzalloc()	2018-06-12 16:19:22 -07:00
xskmap.c	xsk: clean up SPDX headers	2018-05-18 16:07:02 +02:00