linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-27 06:31:52 +00:00

Andrea Arcangeli 15a77c6fe4 userfaultfd: fix SIGBUS resulting from false rwsem wakeups

With >=32 CPUs the userfaultfd selftest triggered a graceful but unexpected SIGBUS because VM_FAULT_RETRY was returned by handle_userfault() even though the UFFDIO_COPY had not completed.

This seems to be caused by rwsem waking the thread blocked in handle_userfault(), and we can't run up_read() before the wait_event sequence is complete.

Keeping the wait_event sequence identical to the first one would require running userfaultfd_must_wait() again to know whether the loop should be repeated, and it would also require retaking the rwsem and revalidating the whole vma status.

It seems simpler to wait for the targeted wakeup, so that if false wakeups materialize we still wait for our specific wakeup event, unless of course there are signals or the uffd was released.

Debug code collecting the stack trace of the wakeup showed this:

  $ ./userfaultfd 100 99999
  nr_pages: 25600, nr_pages_per_cpu: 800
  bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2
  bounces: 99997, mode: rnd ver poll, Bus error (core dumped)
    save_stack_trace+0x2b/0x50
    try_to_wake_up+0x2a6/0x580
    wake_up_q+0x32/0x70
    rwsem_wake+0xe0/0x120
    call_rwsem_wake+0x1b/0x30
    up_write+0x3b/0x40
    vm_mmap_pgoff+0x9c/0xc0
    SyS_mmap_pgoff+0x1a9/0x240
    SyS_mmap+0x22/0x30
    entry_SYSCALL_64_fastpath+0x1f/0xbd
    0xffffffffffffffff
    FAULT_FLAG_ALLOW_RETRY missing 70
  CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G W 4.8.0+ #30
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  Call Trace:
    dump_stack+0xb8/0x112
    handle_userfault+0x572/0x650
    handle_mm_fault+0x12cb/0x1520
    __do_page_fault+0x175/0x500
    trace_do_page_fault+0x61/0x270
    do_async_page_fault+0x19/0x90
    async_page_fault+0x25/0x30

This always happens when the main userfault selftest thread is running clone() while glibc runs either mprotect or mmap (both taking mmap_sem down_write()) to allocate the thread stack of the background threads. Meanwhile the locking/userfault threads already run at full throttle and are susceptible to false wakeups that may cause handle_userfault() to return earlier than expected, which results in a graceful SIGBUS at the next attempt.

This was reproduced only with >=32 CPUs because the loop that starts the threads with clone() is too quick with fewer CPUs. With 32 CPUs there's already significant activity on ~32 locking and userfault threads when the last background threads are started with clone().

This >=32 CPUs SMP race condition is likely reproducible only with the selftest because of the much heavier userfault load it generates compared to real apps.

We'll have to allow "one more" VM_FAULT_RETRY for the WP support. A patch floating around that provides it also hid this problem, but in reality it only succeeds at hiding the problem.

False wakeups could still happen the second time handle_userfault() is invoked, even if the race is so rare that getting false wakeups twice in a row is impossible to reproduce. This full fix is needed for correctness; the only alternative would be to allow VM_FAULT_RETRY to be returned infinitely. With this fix the WP support can stick to a strict "one more" VM_FAULT_RETRY logic (no need to return it an unbounded number of times to avoid the SIGBUS).

Link: http://lkml.kernel.org/r/20170111005535.13832-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Shubham Kumar Sharma <shubham.kumar.sharma@oracle.com>
Tested-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michael Rapoport <RAPOPORT@il.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
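The commit's fix replaces a wait that could end on any wakeup with one that keeps sleeping until its own targeted wakeup arrives. The code below is a minimal userspace sketch of that pattern, not the kernel patch itself, which uses kernel wait-queue and scheduler primitives rather than POSIX threads. A waiter sleeps on a condition variable and loops until a per-waiter `woken` flag is set. It leaves early only when a `released` flag (standing in for the uffd being released) or an `interrupted` flag (standing in for a pending signal) is set. Every name here (`struct fault_waiter`, `wait_for_targeted_wakeup()`, `false_wakeup()`, `targeted_wakeup()`, `run_demo()`) is a hypothetical illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-fault waiter (illustrative names, not kernel API). */
struct fault_waiter {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool woken;       /* set only by the targeted wakeup */
    bool released;    /* hypothetical stand-in for "the uffd was released" */
    bool interrupted; /* hypothetical stand-in for "a signal is pending" */
};

enum wait_result { WAIT_WOKEN, WAIT_RELEASED, WAIT_INTERRUPTED };

/* Sleep until the targeted wakeup arrives. Any other return from
 * pthread_cond_wait() (spurious, or a broadcast that did not resolve
 * this fault) just re-checks the flags and goes back to sleep. */
static enum wait_result wait_for_targeted_wakeup(struct fault_waiter *w)
{
    enum wait_result res;

    pthread_mutex_lock(&w->lock);
    while (!w->woken && !w->released && !w->interrupted)
        pthread_cond_wait(&w->cond, &w->lock);
    res = w->woken ? WAIT_WOKEN
        : w->released ? WAIT_RELEASED
        : WAIT_INTERRUPTED;
    pthread_mutex_unlock(&w->lock);
    return res;
}

/* Models a false wakeup: wakes the waiter without resolving its fault. */
static void false_wakeup(struct fault_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

/* Models the targeted wakeup: mark the fault resolved, then wake. */
static void targeted_wakeup(struct fault_waiter *w)
{
    pthread_mutex_lock(&w->lock);
    w->woken = true;
    pthread_cond_broadcast(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

struct demo {
    struct fault_waiter w;
    bool returned;            /* guarded by w.lock */
    enum wait_result result;
};

static void *waiter_thread(void *arg)
{
    struct demo *d = arg;
    enum wait_result r = wait_for_targeted_wakeup(&d->w);

    pthread_mutex_lock(&d->w.lock);
    d->result = r;
    d->returned = true;
    pthread_mutex_unlock(&d->w.lock);
    return NULL;
}

/* Sends n false wakeups, then the targeted one. Returns true if the
 * waiter stayed blocked through every false wakeup and then reported
 * WAIT_WOKEN. */
bool run_demo(int n_false_wakeups)
{
    struct demo d = { .returned = false, .result = WAIT_INTERRUPTED };
    pthread_t tid;
    bool stayed_blocked;
    int rc;

    pthread_mutex_init(&d.w.lock, NULL);
    pthread_cond_init(&d.w.cond, NULL);

    rc = pthread_create(&tid, NULL, waiter_thread, &d);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < n_false_wakeups; i++)
        false_wakeup(&d.w);

    /* Whatever the scheduling, the waiter cannot have returned yet:
     * none of woken/released/interrupted has been set. */
    pthread_mutex_lock(&d.w.lock);
    stayed_blocked = !d.returned;
    pthread_mutex_unlock(&d.w.lock);

    targeted_wakeup(&d.w);
    pthread_join(tid, NULL);

    pthread_cond_destroy(&d.w.cond);
    pthread_mutex_destroy(&d.w.lock);
    return stayed_blocked && d.result == WAIT_WOKEN;
}
```

For example, `run_demo(5)` delivers five wakeups that leave `woken` false before the real one, so it returns `true`. The `while` loop carries the fix. A waiter that returned after a single wakeup, as a bare `pthread_cond_wait()` call would, could report success even though nothing had resolved its fault. That is roughly analogous to handle_userfault() returning VM_FAULT_RETRY before the UFFDIO_COPY completed. Re-checking a cheap per-waiter flag is also much simpler than re-running the must-wait check and revalidating state after every wakeup, which is the trade-off the commit message describes.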
9p	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
adfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
affs	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
afs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
autofs4	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs	2016-12-17 19:16:12 -08:00
befs	befs: add NFS export support	2016-12-22 11:25:24 +00:00
bfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
btrfs	Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs	2017-01-13 17:40:22 -08:00
cachefiles	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
ceph	ceph: fix bad endianness handling in parse_reply_info_extra	2017-01-18 17:58:45 +01:00
cifs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
coda	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
configfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
cramfs
crypto	fscrypt: fix renaming and linking special files	2016-12-31 00:47:05 -05:00
debugfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
devpts	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
dlm	ktime: Get rid of ktime_equal()	2016-12-25 17:21:23 +01:00
ecryptfs	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
efivarfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
efs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
exofs	exofs: don't mess with simple_write_{begin,end}	2016-12-10 14:25:19 -05:00
exportfs	exportfs: be careful to only return expected errors.	2016-10-06 09:07:44 -04:00
ext2	dax: fix build warnings with FS_DAX and !FS_IOMAP	2017-01-24 16:26:14 -08:00
ext4	dax: fix build warnings with FS_DAX and !FS_IOMAP	2017-01-24 16:26:14 -08:00
f2fs	block: Rename blk_queue_zone_size and bdev_zone_size	2017-01-12 07:58:32 -07:00
fat	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
freevxfs
fscache
fuse	fuse: fix time_to_jiffies nsec sanity check	2017-01-13 17:20:47 +01:00
gfs2	ktime: Cleanup ktime_set() usage	2016-12-25 17:21:22 +01:00
hfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
hfsplus	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
hostfs	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
hpfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
hugetlbfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
isofs	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block	2016-12-13 10:19:16 -08:00
jbd2	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
jffs2	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
jfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
kernfs	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs	2016-12-17 19:16:12 -08:00
lockd	netns: make struct pernet_operations::id unsigned int	2016-11-18 10:59:15 -05:00
minix	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
ncpfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
nfs	NFSv4: Fix client recovery when server reboots multiple times	2017-01-13 13:31:32 -05:00
nfs_common	netns: make struct pernet_operations::id unsigned int	2016-11-18 10:59:15 -05:00
nfsd	nfsd: fix supported attributes for acl & labels	2017-01-12 15:55:51 -05:00
nilfs2	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs	2016-12-17 19:16:12 -08:00
nls
notify	Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit	2017-01-05 23:06:06 -08:00
ntfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
ocfs2	ocfs2: fix crash caused by stale lvb with fsdlm plugin	2017-01-10 18:31:54 -08:00
omfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
openpromfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
orangefs	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs	2016-12-17 19:16:12 -08:00
overlayfs	ovl: fix possible use after free on redirect dir lookup	2017-01-18 15:19:54 +01:00
proc	sysctl: Drop reference added by grab_header in proc_sys_readdir	2017-01-10 13:34:57 +13:00
pstore	Improvements and fixes to pstore subsystem:	2016-12-13 09:16:11 -08:00
qnx4
qnx6
quota	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs	2016-12-19 08:23:53 -08:00
ramfs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
reiserfs	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs	2016-12-17 19:16:12 -08:00
romfs
squashfs	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs	2016-12-17 19:16:12 -08:00
sysfs	Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2016-10-14 12:18:50 -07:00
sysv	vfs: remove ".readlink = generic_readlink" assignments	2016-12-09 16:45:04 +01:00
tracefs	fs: Replace CURRENT_TIME with current_time() for inode timestamps	2016-09-27 21:06:21 -04:00
ubifs	ubifs: Fix journal replay wrt. xattr nodes	2017-01-17 14:35:58 +01:00
udf	block,fs: untangle fs.h and blk_types.h	2016-11-01 09:43:26 -06:00
ufs	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
xfs	xfs: fix xfs_mode_to_ftype() prototype	2017-01-18 12:39:21 -08:00
aio.c	aio: fix lock dep warning	2017-01-14 19:31:40 -05:00
anon_inodes.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
attr.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
bad_inode.c	bad_inode: add missing i_op initializers	2016-12-09 11:57:43 +01:00
binfmt_aout.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
binfmt_elf_fdpic.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
binfmt_elf.c	coredump: Ensure proper size of sparse core files	2017-01-14 19:32:40 -05:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c	fs: Replace current_fs_time() with current_time()	2016-09-27 21:06:22 -04:00
binfmt_script.c
block_dev.c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	2017-01-04 09:03:37 -08:00
buffer.c	clean_bdev_aliases: Prevent cleaning blocks that are not in block range	2017-01-02 09:35:14 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
compat.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
coredump.c	coredump: Ensure proper size of sparse core files	2017-01-14 19:32:40 -05:00
dax.c	dax: fix build warnings with FS_DAX and !FS_IOMAP	2017-01-24 16:26:14 -08:00
dcache.c	mnt: Protect the mountpoint hashtable with mount_lock	2017-01-10 13:34:43 +13:00
dcookies.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
direct-io.c	do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned	2017-01-10 13:29:54 -07:00
drop_caches.c
eventfd.c
eventpoll.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
exec.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
fcntl.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
fhandle.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
file_table.c	constify alloc_file()	2016-12-05 19:01:16 -05:00
file.c	fs/file: more unsigned file descriptors	2016-09-27 18:47:38 -04:00
filesystems.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
fs_pin.c
fs_struct.c
fs-writeback.c	fs/fs-writeback.c: remove redundant if check	2016-12-12 18:55:08 -08:00
inode.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-10-10 20:16:43 -07:00
internal.h	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2016-12-17 18:44:00 -08:00
ioctl.c	vfs: call vfs_clone_file_range() under freeze protection	2016-12-16 11:02:54 +01:00
iomap.c	xfs: updates for 4.10-rc1	2016-12-14 21:35:31 -08:00
Kconfig	dax: fix build warnings with FS_DAX and !FS_IOMAP	2017-01-24 16:26:14 -08:00
Kconfig.binfmt	docs: fix locations of several documents that got moved	2016-10-24 08:12:35 -02:00
libfs.c	libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount	2017-01-10 13:34:55 +13:00
locks.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
Makefile	logfs: remove from tree	2016-12-14 23:48:11 -05:00
mbcache.c	mbcache: document that "find" functions only return reusable entries	2016-12-03 15:55:01 -05:00
mount.h	vfs: add path_is_mountpoint() helper	2016-12-03 20:51:35 -05:00
mpage.c	fs: Add helper to clean bdev aliases under a bh and use it	2016-11-04 14:34:47 -06:00
namei.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
namespace.c	mnt: Protect the mountpoint hashtable with mount_lock	2017-01-10 13:34:43 +13:00
no-block.c
nsfs.c	net: add an ioctl to get a socket network namespace	2016-10-31 10:56:36 -04:00
open.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
pipe.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
pnode.c	reorganize do_make_slave()	2016-12-16 16:30:49 -05:00
pnode.h	mnt: Add a per mount namespace limit on the number of mounts	2016-09-30 12:46:48 -05:00
posix_acl.c	tmpfs: clear S_ISGID when setting posix ACLs	2017-01-10 01:29:48 -05:00
proc_namespace.c
read_write.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
readdir.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
select.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
seq_file.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
signalfd.c
splice.c	splice: reinstate SIGPIPE/EPIPE handling	2016-12-21 10:59:34 -08:00
stack.c
stat.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
statfs.c	vfs: misc struct path constification	2016-12-05 19:03:49 -05:00
super.c	quota: Remove dqonoff_mutex	2016-11-30 08:38:07 +01:00
sync.c
timerfd.c	ktime: Cleanup ktime_set() usage	2016-12-25 17:21:22 +01:00
userfaultfd.c	userfaultfd: fix SIGBUS resulting from false rwsem wakeups	2017-01-24 16:26:14 -08:00
utimes.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00
xattr.c	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	2016-12-24 11:46:01 -08:00