linux

Filipe Manana ac05ca913e Btrfs: fix race between using extent maps and merging them

We have a few cases where we allow an extent map that is in an extent map
tree to be merged with other extents in the tree. Such cases include the
unpinning of an extent after the respective ordered extent completed or
after logging an extent during a fast fsync. This can lead to subtle and
dangerous problems because when doing the merge some other task might be
using the same extent map and as a consequence see an inconsistent state of
the extent map - for example it sees the new length but has seen the old
start offset.

With luck this triggers a BUG_ON(), and not some silent bug, such as the
following one in __do_readpage():

  $ cat -n fs/btrfs/extent_io.c
  3061  static int __do_readpage(struct extent_io_tree *tree,
  3062                           struct page *page,
  (...)
  3127                  em = __get_extent_map(inode, page, pg_offset, cur,
  3128                                        end - cur + 1, get_extent, em_cached);
  3129                  if (IS_ERR_OR_NULL(em)) {
  3130                          SetPageError(page);
  3131                          unlock_extent(tree, cur, end);
  3132                          break;
  3133                  }
  3134                  extent_offset = cur - em->start;
  3135                  BUG_ON(extent_map_end(em) <= cur);
  (...)

Consider the following example scenario, where we end up hitting the
BUG_ON() in __do_readpage().

We have an inode with a size of 8KiB and 2 extent maps:

  extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on
            disk by a previous transaction

  extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
            persisted but writeback started for it already. The extent map
            is pinned since there's writeback and an ordered extent in
            progress, so it can not be merged with extent map A yet

The following sequence of steps leads to the BUG_ON():

1) The ordered extent for extent B completes, the respective page gets its
   writeback bit cleared and the extent map is unpinned, at that point it
   is not yet merged with extent map A because it's in the list of modified
   extents;

2) Due to memory pressure, or some other reason, the MM subsystem releases
   the page corresponding to extent B - btrfs_releasepage() is called and
   returns 1, meaning the page can be released as it's not dirty, not under
   writeback anymore and the extent range is not locked in the inode's
   iotree. However the extent map is not released, either because we are
   not in a context that allows memory allocations to block or because the
   inode's size is smaller than 16MiB - in this case our inode has a size
   of 8KiB;

3) Task B needs to read extent B and ends up at __do_readpage() through the
   btrfs_readpage() callback. At __do_readpage() it gets a reference to
   extent map B;

4) Task A, doing a fast fsync, calls clear_em_logging() against extent map B
   while holding the write lock on the inode's extent map tree - this
   results in try_merge_map() being called and since it's possible to merge
   extent map B with extent map A now (the extent map B was removed from
   the list of modified extents), the merging begins - it sets extent map
   B's start offset to 0 (was 4KiB), but before it increments the map's
   length to 8KiB (4KiB + 4KiB), task B is at:

   BUG_ON(extent_map_end(em) <= cur);

   The call to extent_map_end() sees the extent map has a start of 0
   and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
   the BUG_ON() is triggered.

So it's dangerous to modify an extent map that is in the tree, because some
other task might have got a reference to it before and still be using it,
and needs to see a consistent map while using it. Generally this is very
rare since most paths that look up and use extent maps also have the file
range locked in the inode's iotree. The fsync path is pretty much the only
exception where we don't do it to avoid serialization with concurrent
reads.

Fix this by not allowing an extent map to be merged if it's being used
by tasks other than the one attempting to merge the extent map (when the
reference count of the extent map is greater than 2).

Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2020-02-12 17:16:46 +01:00
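
The race and the fix described in the commit message can be illustrated with a small user-space sketch. This is not the kernel code: `struct em_sketch`, `em_end()`, `try_merge_sketch()` and `demo_scenario()` are hypothetical names invented for this example, the reference count is a plain `int` rather than the kernel's reference-counting type, and no locking is modelled. It only shows the arithmetic of the torn state (new start, old length) and a merge guard that assumes two references are always present, presumably one for the tree and one for the task doing the merge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extent map (illustration only). */
struct em_sketch {
    uint64_t start; /* file offset in bytes */
    uint64_t len;   /* length in bytes */
    int refs;       /* reference count (plain int in this sketch) */
};

/* End offset of the map, mirroring what extent_map_end() computes. */
static uint64_t em_end(const struct em_sketch *em)
{
    return em->start + em->len;
}

/*
 * Merge 'prev' into 'em' when 'prev' ends exactly where 'em' starts.
 * Assumed guard, following the commit message: a count greater than 2
 * means some other task still uses 'em', so the map is left untouched.
 */
static bool try_merge_sketch(struct em_sketch *em, const struct em_sketch *prev)
{
    if (em->refs > 2)
        return false;
    if (em_end(prev) != em->start)
        return false;
    em->start = prev->start;
    em->len += prev->len;
    return true;
}

/* Walks through the 8KiB example from the commit message. */
static void demo_scenario(void)
{
    const uint64_t kib = 1024;
    struct em_sketch a = { 0, 4 * kib, 1 };
    struct em_sketch b = { 4 * kib, 4 * kib, 2 };
    uint64_t cur = 4 * kib; /* offset the reader is working on */

    /* Torn state a reader could observe mid-merge: new start, old length. */
    struct em_sketch torn = { a.start, b.len, b.refs };
    assert(em_end(&torn) <= cur); /* the BUG_ON() condition holds */

    /* A reader took a reference to B: the guarded merge is refused. */
    b.refs = 3;
    assert(!try_merge_sketch(&b, &a));
    assert(em_end(&b) > cur); /* B is unchanged, so the reader is safe */

    /* The reader dropped its reference: the merge can proceed. */
    b.refs = 2;
    assert(try_merge_sketch(&b, &a));
    assert(b.start == 0 && b.len == 8 * kib);
}
```

Calling `demo_scenario()` from a `main()` function runs the three cases in order. The design point is that the merge is skipped rather than delayed: the map simply stays unmerged while another task holds a reference.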
..
9p	9p pull request for inclusion in 5.4	2019-09-27 15:10:34 -07:00
adfs	Merge branch 'work.adfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 11:33:22 -07:00
affs	affs: fix a memory leak in affs_remount	2019-11-18 14:26:43 +01:00
afs	Merge branch 'dhowells' (patches from DavidH)	2020-01-14 09:56:31 -08:00
autofs	Merge branch 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-12-05 17:11:48 -08:00
befs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
bfs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
btrfs	Btrfs: fix race between using extent maps and merging them	2020-02-12 17:16:46 +01:00
cachefiles	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36	2019-05-24 17:27:11 +02:00
ceph	ceph: add more debug info when decoding mdsmap	2019-12-09 20:55:10 +01:00
cifs	cifs: Optimize readdir on reparse points	2019-12-23 09:04:44 -06:00
coda	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
configfs	configfs: calculate the depth of parent item	2019-11-06 18:36:01 +01:00
cramfs	cramfs: fix usage on non-MTD device	2019-11-23 21:44:49 -05:00
crypto	treewide: Use sizeof_field() macro	2019-12-09 10:36:44 -08:00
debugfs	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-12-06 09:06:58 -08:00
devpts	devpts_pty_kill(): don't bother with d_delete()	2019-09-03 09:30:56 -04:00
dlm	dlm for 5.3	2019-07-12 17:37:53 -07:00
ecryptfs	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
efivarfs	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
efs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
erofs	Changes since last update:	2019-12-11 12:25:32 -08:00
exportfs	race in exportfs_decode_fh()	2019-11-11 09:21:59 -05:00
ext2	\n	2019-11-30 11:16:07 -08:00
ext4	Ext4 bug fixes (including a regression fix) for 5.5	2019-12-22 10:41:48 -08:00
f2fs	compat_ioctl: remove most of fs/compat_ioctl.c	2019-12-01 13:46:15 -08:00
fat	compat_ioctl: move drivers to compat_ptr_ioctl	2019-10-23 17:23:43 +02:00
freevxfs	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
fscache	Revert "Merge tag 'keys-acl-20190703' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs"	2019-07-10 18:43:43 -07:00
fuse	fuse: fix fuse_send_readpages() in the syncronous read case	2020-01-16 11:09:36 +01:00
gfs2	GFS2 changes for this merge window:	2019-12-05 13:20:11 -08:00
hfs	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
hfsplus	fs/hfsplus/xattr.c: replace strncpy with memcpy	2019-07-16 19:23:23 -07:00
hostfs	This pull request contains the following changes for UML:	2019-05-12 17:52:13 -04:00
hpfs	fs: compat_ioctl: move FITRIM emulation into file systems	2019-10-23 17:23:46 +02:00
hugetlbfs	mm/hugetlbfs: fix for_each_hstate() loop in init_hugetlbfs_fs()	2020-01-03 10:39:08 -08:00
iomap	iomap: stop using ioend after it's been freed in iomap_finish_ioend()	2019-12-05 07:41:16 -08:00
isofs	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
jbd2	This merge window saw the the following new featuers added to ext4:	2019-11-30 10:53:02 -08:00
jffs2	Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()"	2019-11-29 11:29:58 +01:00
jfs	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
kernfs	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-12-06 09:06:58 -08:00
lockd	NFSv4.1: Don't rebind to the same source port when reconnecting to the server	2019-11-03 21:28:45 -05:00
minix	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
nfs	reimplement path_mountpoint() with less magic	2020-01-15 01:36:06 -05:00
nfs_common	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
nfsd	This is a relatively quiet cycle for nfsd, mainly various bugfixes.	2019-12-07 16:56:00 -08:00
nilfs2	fs: compat_ioctl: move FITRIM emulation into file systems	2019-10-23 17:23:46 +02:00
nls	treewide: Add SPDX license identifier - Makefile/Kconfig	2019-05-21 10:50:46 +02:00
notify	fs: call fsnotify_sb_delete after evict_inodes	2019-12-18 00:03:01 -05:00
ntfs	ntfs: remove (un)?likely() from IS_ERR() conditions	2019-09-26 10:10:44 -07:00
ocfs2	ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less	2020-01-04 13:55:09 -08:00
omfs	fs: omfs: Initialize filesystem timestamp ranges	2019-08-30 08:11:25 -07:00
openpromfs	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
orangefs	orangefs: posix open permission checking...	2019-12-04 08:52:55 -05:00
overlayfs	overlayfs fixes for 5.5-rc2	2019-12-14 11:13:54 -08:00
proc	sched/cputime, proc/stat: Fix incorrect guest nice cpustat value	2019-12-11 07:09:58 +01:00
pstore	pstore/ram: Regularize prz label allocation lifetime	2020-01-08 17:05:45 -08:00
qnx4	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
qnx6	fs: Fill in max and min timestamps in superblock	2019-08-30 07:27:17 -07:00
quota	fs: avoid softlockups in s_inodes iterators	2019-12-18 00:03:01 -05:00
ramfs	vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API	2019-09-12 21:05:34 -04:00
reiserfs	reiserfs: replace open-coded atomic_dec_and_mutex_lock()	2019-11-05 12:25:22 +01:00
romfs	Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-09-19 10:06:57 -07:00
squashfs	Merge branch 'work.mount2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-09-19 10:06:57 -07:00
sysfs	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
sysv	fs: sysv: Initialize filesystem timestamp ranges	2019-08-30 07:27:18 -07:00
tracefs	tracing: Do not create tracefs files if tracefs lockdown is in effect	2019-10-12 20:49:07 -04:00
ubifs	ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps	2019-11-17 22:22:54 +01:00
udf	fs-udf: Delete an unnecessary check before brelse()	2019-09-04 18:19:43 +02:00
ufs	y2038: add inode timestamp clamping	2019-09-19 09:42:37 -07:00
unicode	unicode: make array 'token' static const, makes object smaller	2019-09-17 11:48:24 -04:00
verity	treewide: Use sizeof_field() macro	2019-12-09 10:36:44 -08:00
xfs	xfs: Make the symbol 'xfs_rtalloc_log_count' static	2019-12-20 08:07:31 -08:00
aio.c	y2038: syscall implementation cleanups	2019-12-01 14:00:59 -08:00
anon_inodes.c	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
attr.c	timestamp_truncate: Replace users of timespec64_trunc	2019-08-30 07:27:17 -07:00
bad_inode.c
binfmt_aout.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_elf_fdpic.c	y2038: elfcore: Use __kernel_old_timeval for process times	2019-11-15 14:38:29 +01:00
binfmt_elf.c	fs/binfmt_elf.c: extract elf_read() function	2019-12-04 19:44:13 -08:00
binfmt_em86.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
binfmt_flat.c	fs/binfmt_flat.c: remove set but not used variable 'inode'	2019-07-16 19:23:22 -07:00
binfmt_misc.c	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
binfmt_script.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
block_dev.c	block: don't send uevent for empty disk when not invalidating	2019-12-02 18:49:30 -07:00
buffer.c	fs: move guard_bio_eod() after bio_set_op_attrs	2020-01-09 08:16:12 -07:00
char_dev.c	chardev: Avoid potential use-after-free in 'chrdev_open()'	2020-01-06 20:10:26 +01:00
compat_binfmt_elf.c	y2038: elfcore: Use __kernel_old_timeval for process times	2019-11-15 14:38:29 +01:00
compat_ioctl.c	New code for 5.5:	2019-12-02 14:46:22 -08:00
compat.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500	2019-06-19 17:09:55 +02:00
coredump.c	coredump: split pipe command whitespace before expanding template	2019-08-03 07:02:01 -07:00
d_path.c	[PATCH] fix d_absolute_path() interplay with fsmount()	2019-08-30 19:31:09 -04:00
dax.c	New code for 5.5:	2019-11-30 10:44:49 -08:00
dcache.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-12-08 11:08:28 -08:00
dcookies.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
direct-io.c	fs/direct-io.c: include fs/internal.h for missing prototype	2020-01-04 13:55:09 -08:00
drop_caches.c	fs: avoid softlockups in s_inodes iterators	2019-12-18 00:03:01 -05:00
eventfd.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
eventpoll.c	fs/epoll: remove unnecessary wakeups of nested epoll	2019-12-04 19:44:13 -08:00
exec.c	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2019-12-03 12:20:25 -08:00
fcntl.c	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-12-08 11:08:28 -08:00
fhandle.c	fs/handle.c - fix up kerneldoc	2019-08-07 21:51:47 -04:00
file_table.c	vfs: Export flush_delayed_fput for use by knfsd.	2019-08-19 11:00:39 -04:00
file.c	Revert "fs: remove ksys_dup()"	2020-01-02 16:15:33 -08:00
filesystems.c	vfs: Implement a filesystem superblock creation/configuration context	2019-02-28 03:29:26 -05:00
fs_context.c	vfs: subtype handling moved to fuse	2019-09-06 21:28:49 +02:00
fs_parser.c	vfs: Make fs_parse() handle fs_param_is_fd-type params better	2019-09-12 21:06:14 -04:00
fs_pin.c	switch the remnants of releasing the mountpoint away from fs_pin	2019-07-16 22:52:37 -04:00
fs_struct.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
fs_types.c	fs: common implementation of file type	2019-01-21 17:48:13 +01:00
fs-writeback.c	cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead	2019-11-08 13:37:24 -07:00
fsopen.c	Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2019-07-19 10:42:02 -07:00
inode.c	fs: avoid softlockups in s_inodes iterators	2019-12-18 00:03:01 -05:00
internal.h	fs: move guard_bio_eod() after bio_set_op_attrs	2020-01-09 08:16:12 -07:00
io_uring.c	io_uring: only allow submit from owning task	2020-01-16 21:43:24 -07:00
io-wq.c	io-wq: cancel work if we fail getting a mm reference	2020-01-14 22:06:11 -07:00
io-wq.h	io-wq: re-add io_wq_current_is_worker()	2019-12-17 19:57:20 -07:00
ioctl.c	New code for 5.5:	2019-12-02 14:46:22 -08:00
Kconfig	io-wq: small threadpool implementation for io_uring	2019-10-29 12:43:00 -06:00
Kconfig.binfmt	binfmt_flat: make support for old format binaries optional	2019-06-24 09:16:47 +10:00
libfs.c	fs/libfs.c: fix kernel-doc warning	2019-10-14 15:04:01 -07:00
locks.c	locks: print unsigned ino in /proc/locks	2019-12-29 09:00:58 -05:00
Makefile	io-wq: small threadpool implementation for io_uring	2019-10-29 12:43:00 -06:00
mbcache.c	treewide: Add SPDX license identifier for more missed files	2019-05-21 10:50:45 +02:00
mount.h	switch the remnants of releasing the mountpoint away from fs_pin	2019-07-16 22:52:37 -04:00
mpage.c	fs: move guard_bio_eod() after bio_set_op_attrs	2020-01-09 08:16:12 -07:00
namei.c	fix autofs regression caused by follow_managed() changes	2020-01-15 01:36:46 -05:00
namespace.c	fs/namespace.c: make to_mnt_ns() static	2020-01-04 13:55:09 -08:00
no-block.c	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152	2019-05-30 11:26:32 -07:00
nsfs.c	fs/nsfs.c: include headers for missing declarations	2020-01-04 13:55:09 -08:00
open.c	Revert "vfs: properly and reliably lock f_pos in fdget_pos()"	2019-11-26 11:34:06 -08:00
pipe.c	pipe: fix empty pipe check in pipe_write()	2019-12-22 09:47:47 -08:00
pnode.c	fs/namespace: fix unprivileged mount propagation	2019-06-17 17:36:09 -04:00
pnode.h	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209	2019-05-30 11:29:53 -07:00
posix_acl.c	fs/posix_acl.c: fix kernel-doc warnings	2020-01-04 13:55:09 -08:00
proc_namespace.c	vfs: subtype handling moved to fuse	2019-09-06 21:28:49 +02:00
read_write.c	vfs: fix page locking deadlocks when deduping files	2019-08-16 18:43:24 -07:00
readdir.c	filldir[64]: remove WARN_ON_ONCE() for bad directory entries	2019-10-18 18:41:16 -04:00
select.c	y2038: syscalls: change remaining timeval to __kernel_old_timeval	2019-11-15 14:38:29 +01:00
seq_file.c	seq_file: fix problem when seeking mid-record	2019-08-13 16:06:52 -07:00
signalfd.c	fs: mark expected switch fall-throughs	2019-04-08 18:21:02 -05:00
splice.c	pipe: remove 'waiting_writers' merging logic	2019-12-07 13:21:01 -08:00
stack.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00
stat.c	fs: move generic stat response attr handling to vfs_getattr_nosec	2019-02-01 01:55:45 -05:00
statfs.c	vfs: Fix EOVERFLOW testing in put_compat_statfs64	2019-10-03 14:21:35 -07:00
super.c	fs: call fsnotify_sb_delete after evict_inodes	2019-12-18 00:03:01 -05:00
sync.c	fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback	2019-05-14 09:47:50 -07:00
timerfd.c	y2038: timerfd: Use timespec64 internally	2019-11-15 14:38:30 +01:00
userfaultfd.c	Merge branch 'akpm' (patches from Andrew)	2019-12-01 20:36:41 -08:00
utimes.c	y2038: syscalls: change remaining timeval to __kernel_old_timeval	2019-11-15 14:38:29 +01:00
xattr.c	treewide: Add SPDX license identifier for missed files	2019-05-21 10:50:45 +02:00