linux/fs
Jason Baron ae10b2b4eb epoll: optimize EPOLL_CTL_DEL using rcu
Nathan Zimmer found that once we get over 10+ cpus, the scalability of
SPECjbb falls over due to the contention on the global 'epmutex', which is
taken in on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations.

Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path
by using rcu to guard against any concurrent traversals.

Patch #2 remove the 'epmutex' lock from EPOLL_CTL_ADD operations for
simple topologies.  IE when adding a link from an epoll file descriptor to
a wakeup source, where the epoll file descriptor is not nested.

This patch (of 2):

Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by
converting the file->f_ep_links list into an rcu one.  In this way, we can
traverse the epoll network on the add path in parallel with deletes.
Since deletes can't create loops or worse wakeup paths, this is safe.

This patch in combination with the patch "epoll: Do not take global 'epmutex'
for simple topologies", shows a dramatic performance improvement in
scalability for SPECjbb.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Nathan Zimmer <nzimmer@sgi.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Nelson Elhage <nelhage@nelhage.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
CC: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-13 12:09:25 +09:00
..
9p FS-Cache: Provide the ability to enable/disable cookies 2013-09-27 18:40:25 +01:00
adfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
affs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
afs Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into linux-next 2013-10-28 19:36:46 -04:00
autofs4 autofs4: close the races around autofs4_notify_daemon() 2013-09-16 19:16:38 -04:00
befs [readdir] convert befs 2013-06-29 12:56:55 +04:00
bfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2013-10-18 16:46:21 -07:00
cachefiles FS-Cache: Provide the ability to enable/disable cookies 2013-09-27 18:40:25 +01:00
ceph FS-Cache: Provide the ability to enable/disable cookies 2013-09-27 18:40:25 +01:00
cifs Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 2013-11-08 06:01:47 +09:00
coda helper for reading ->d_count 2013-07-05 18:59:33 +04:00
configfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-14 11:42:26 -07:00
cramfs cramfs: mark as obsolete 2013-11-13 12:09:12 +09:00
debugfs debugfs: use list_next_entry() in debugfs_remove_recursive() 2013-11-13 12:09:24 +09:00
devpts fs: Limit sys_mount to only request filesystem modules (Part 2). 2013-03-07 01:08:55 -08:00
dlm dlm: remove signal blocking 2013-08-12 15:22:43 -05:00
ecryptfs eCryptfs: fix 32 bit corruption issue 2013-10-24 12:36:30 -07:00
efivarfs efivarfs: we can use simple_lookup() now 2013-07-14 17:48:35 +04:00
efs efs: iget_locked() doesn't return an ERR_PTR() 2013-08-24 12:10:22 -04:00
exofs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
exportfs exportfs: don't assume that ->iterate() won't feed us too long entries 2013-09-07 19:54:55 -04:00
ext2 truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
ext3 ext[34]: fix double put in tmpfile 2013-10-15 12:14:06 -04:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-10-16 17:18:18 -07:00
f2fs f2fs: optimize gc for better performance 2013-09-05 13:50:32 +09:00
fat truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
freevxfs [readdir] convert freevxfs 2013-06-29 12:56:53 +04:00
fscache FS-Cache: Provide the ability to enable/disable cookies 2013-09-27 18:40:25 +01:00
fuse fuse: no RCU mode in fuse_access() 2013-10-01 16:41:23 +02:00
gfs2 The main feature of interest this time is quota updates. There are 2013-11-11 07:11:00 +09:00
hfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
hfsplus truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
hostfs um: hostfs: Fix writeback 2013-09-07 10:38:29 +02:00
hpfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
hppfs clean up scary strncpy(dst, src, strlen(src)) uses 2013-07-03 16:07:41 -07:00
hugetlbfs cope with potentially long ->d_dname() output for shmem/hugetlb 2013-08-24 12:10:17 -04:00
isofs isofs: Refuse RW mount of the filesystem instead of making it RO 2013-07-31 22:14:50 +02:00
jbd jbd: use a single printk for jbd_debug() 2013-08-09 10:49:00 +02:00
jbd2 jbd2: Fix endian mixing problems in the checksumming code 2013-08-28 14:59:58 -04:00
jffs2 [readdir] convert jffs2 2013-06-29 12:56:47 +04:00
jfs Just a patch to fix an oops in an error path. 2013-10-22 09:01:11 +01:00
lockd LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs 2013-08-05 15:03:46 -04:00
logfs Lots of bug fixes, cleanups and optimizations. In the bug fixes 2013-07-02 09:39:34 -07:00
minix fs/minix: Drop dependency on H8300 2013-09-16 18:20:25 -07:00
ncpfs ncpfs: fix error return code in ncp_parse_options() 2013-07-09 10:33:25 -07:00
nfs NFSv4.2: Remove redundant checks in nfs_setsecurity+nfs4_label_init_security 2013-11-04 16:42:52 -05:00
nfs_common
nfsd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-09-12 15:01:38 -07:00
nilfs2 nilfs2: fix issue with race condition of competition between segments for dirty blocks 2013-09-30 14:31:02 -07:00
nls
notify fsnotify: update comments concerning locking scheme 2013-07-09 10:33:20 -07:00
ntfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
ocfs2 ocfs2: simplify ocfs2_invalidatepage() and ocfs2_releasepage() 2013-11-13 12:09:02 +09:00
omfs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
openpromfs [readdir] convert openpromfs 2013-06-29 12:56:32 +04:00
proc mm: factor commit limit calculation 2013-11-13 12:09:11 +09:00
pstore pstore: Remove the messages related to compression failure 2013-09-16 09:28:29 -07:00
qnx4 [readdir] convert qnx4 2013-06-29 12:56:38 +04:00
qnx6 [readdir] convert qnx6 2013-06-29 12:56:39 +04:00
quota fs: convert fs shrinkers to new scan/count API 2013-09-10 18:56:31 -04:00
ramfs initmpfs: move rootfs code from fs/ramfs/ to init/ 2013-09-11 15:59:37 -07:00
reiserfs reiserfs: fix race with flush_used_journal_lists and flush_journal_list 2013-09-24 11:24:21 +02:00
romfs [readdir] convert romfs 2013-06-29 12:56:29 +04:00
squashfs Squashfs: add corruption check for type in squashfs_readdir() 2013-09-06 04:57:54 +01:00
sysfs Revert "sysfs: drop kobj_ns_type handling" 2013-11-07 20:47:28 +09:00
sysv sysv: Add forgotten superblock lock init for v7 fs 2013-09-29 22:02:02 -04:00
ubifs Just one patch which fixes the power-cut recovery testing mode. 2013-09-16 15:36:55 -04:00
udf udf: Fortify LVID loading 2013-09-24 11:23:33 +02:00
ufs truncate: drop 'oldsize' truncate_pagecache() parameter 2013-09-12 15:38:02 -07:00
xfs writeback: do not sync data dirtied after sync start 2013-11-13 12:09:07 +09:00
aio.c aio: fix use-after-free in aio_migratepage 2013-09-26 20:34:51 -04:00
anon_inodes.c fs/anon_inode: Introduce a new lib function anon_inode_getfile_private() 2013-07-16 09:32:17 -04:00
attr.c
bad_inode.c [readdir] ->readdir() is gone 2013-06-29 12:57:04 +04:00
binfmt_aout.c mm: remove free_area_cache 2013-07-10 18:11:34 -07:00
binfmt_elf_fdpic.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2013-05-02 10:16:16 -07:00
binfmt_elf.c fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing 2013-09-30 14:31:01 -07:00
binfmt_em86.c
binfmt_flat.c new helper: read_code() 2013-04-29 15:40:23 -04:00
binfmt_misc.c binfmt_misc: reuse string_unescape_inplace() 2013-04-30 17:04:03 -07:00
binfmt_script.c
binfmt_som.c
bio-integrity.c Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-block 2013-09-22 15:00:11 -07:00
bio.c block: Fix bio_copy_data() 2013-09-24 14:41:42 -07:00
block_dev.c a trivial writeback fix 2013-09-13 23:06:40 -04:00
buffer.c fs: buffer: move allocation failure loop into the allocator 2013-10-16 21:35:53 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c compat.c: LOOP_CLR_FD is taken care of in loop.c itself... 2013-06-29 12:46:44 +04:00
compat.c [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
coredump.c coredump: add new %P variable in core_pattern 2013-09-11 15:59:01 -07:00
coredump.h
dcache.c vfs: decrapify dput(), fix cache behavior under normal load 2013-10-31 15:43:02 -07:00
dcookies.c
direct-io.c direct-io: Use return from cmpxchg to decide of assignment happened 2013-09-09 10:47:42 -07:00
drop_caches.c shrinker: add node awareness 2013-09-10 18:56:31 -04:00
eventfd.c
eventpoll.c epoll: optimize EPOLL_CTL_DEL using rcu 2013-11-13 12:09:25 +09:00
exec.c sched/numa: Call task_numa_free() from do_execve() 2013-10-09 14:48:00 +02:00
fcntl.c vfs: add missing check for __O_TMPFILE in fcntl_init() 2013-08-05 18:25:32 +04:00
fhandle.c
file_table.c nfsd regression since delayed fput() 2013-10-20 08:44:39 -04:00
file.c don't bother with deferred freeing of fdtables 2013-05-01 17:31:42 -04:00
filesystems.c
fs_struct.c
fs-writeback.c writeback: do not sync data dirtied after sync start 2013-11-13 12:09:07 +09:00
generic_acl.c
inode.c fs: convert inode and dentry shrinking to be node aware 2013-09-10 18:56:31 -04:00
internal.h fs: convert inode and dentry shrinking to be node aware 2013-09-10 18:56:31 -04:00
ioctl.c
ioprio.c
Kconfig efivarfs: Move to fs/efivarfs 2013-04-17 13:25:09 +01:00
Kconfig.binfmt fs: make binfmt support for #! scripts modular and removable 2013-04-30 17:04:04 -07:00
libfs.c make simple_lookup() usable for filesystems that set ->s_d_op 2013-07-14 17:43:25 +04:00
locks.c locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock 2013-07-08 13:36:42 +04:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
mbcache.c fs: convert fs shrinkers to new scan/count API 2013-09-10 18:56:31 -04:00
mount.h get rid of full-hash scan on detaching vfsmounts 2013-04-09 14:12:52 -04:00
mpage.c
namei.c fs/namei.c: fix new kernel-doc warning 2013-10-22 12:02:40 +01:00
namespace.c initmpfs: move rootfs code from fs/ramfs/ to init/ 2013-09-11 15:59:37 -07:00
no-block.c
open.c vfs: improve i_op->atomic_open() documentation 2013-09-16 19:17:24 -04:00
pipe.c aio: don't include aio.h in sched.h 2013-05-07 20:16:25 -07:00
pnode.c vfs: Fix invalid ida_remove() call 2013-05-31 15:16:33 -04:00
pnode.h vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces 2013-08-26 18:42:15 -07:00
posix_acl.c
proc_namespace.c
read_write.c aio: Kill aio_rw_vect_retry() 2013-07-30 11:53:12 -04:00
readdir.c [readdir] constify ->actor 2013-06-29 12:57:05 +04:00
select.c Revert "select: use freezable blocking call" 2013-10-30 15:28:35 +01:00
seq_file.c seq_file: always update file->f_pos in seq_lseek() 2013-10-25 10:46:40 -04:00
signalfd.c
splice.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-03 09:10:19 -07:00
stack.c
stat.c quota: provide interface for readding allocated space into reserved space 2013-08-17 09:32:32 -04:00
statfs.c vfs: allow O_PATH file descriptors for fstatfs() 2013-10-12 13:12:31 -07:00
super.c fs/super.c: fix lru_list leak for real 2013-10-01 13:11:21 -04:00
sync.c writeback: do not sync data dirtied after sync start 2013-11-13 12:09:07 +09:00
timerfd.c timerfd: Add alarm timers 2013-05-29 12:57:34 -07:00
utimes.c
xattr_acl.c
xattr.c