linux

Nick Piggin 6416ccb789 fs: scale files_lock	2010-08-18 08:35:48 -04:00

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variable in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
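
The scheme the commit message describes can be illustrated with a small userspace sketch. This is not the kernel's code: every name below (file_node, percpu_files, sb_files, NR_CPUS and the functions) is hypothetical, a C11 atomic_flag spinlock stands in for the lglock, and the CPU index is passed in by the caller rather than read from the running CPU. It only shows the three ideas: adds touch one per-list lock, each file records which list it is on so a different CPU can remove it, and a snapshot has to take every lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical fixed CPU count for this sketch */

/* Stand-in for struct file: list linkage plus the CPU whose list holds it. */
struct file_node {
	struct file_node *prev, *next;
	int list_cpu; /* index of the per-CPU list we are on, -1 if none */
};

/* One CPU's list, its lock, and removal counters (updated under the lock). */
struct percpu_files {
	atomic_flag lock;
	struct file_node head; /* sentinel of a circular list */
	unsigned long removals, cpu_hits;
};

/* Stand-in for the per-superblock state. */
struct sb_files {
	struct percpu_files cpu[NR_CPUS];
};

static void list_lock(struct percpu_files *l)
{
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void list_unlock(struct percpu_files *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

void sb_files_init(struct sb_files *sb)
{
	for (int i = 0; i < NR_CPUS; i++) {
		struct percpu_files *l = &sb->cpu[i];

		atomic_flag_clear(&l->lock);
		l->head.prev = l->head.next = &l->head;
		l->head.list_cpu = i;
		l->removals = l->cpu_hits = 0;
	}
}

/* Fast path: add to the caller's own list, touching only that list's lock. */
void file_add(struct sb_files *sb, struct file_node *f, int this_cpu)
{
	struct percpu_files *l = &sb->cpu[this_cpu];

	list_lock(l);
	f->list_cpu = this_cpu;
	f->prev = &l->head;
	f->next = l->head.next;
	l->head.next->prev = f;
	l->head.next = f;
	list_unlock(l);
}

/*
 * Removal may run on a different CPU than the add, so lock the list that
 * the file itself records. Assumes only one caller ever removes a given file.
 */
void file_del(struct sb_files *sb, struct file_node *f, int this_cpu)
{
	struct percpu_files *l;

	assert(f->list_cpu >= 0 && f->list_cpu < NR_CPUS);
	l = &sb->cpu[f->list_cpu];
	list_lock(l);
	f->prev->next = f->next;
	f->next->prev = f->prev;
	l->removals++;
	if (f->list_cpu == this_cpu)
		l->cpu_hits++; /* removed by the CPU that added it */
	f->list_cpu = -1;
	list_unlock(l);
}

/* Slow path: take every per-CPU lock to get a consistent snapshot. */
size_t sb_files_count(struct sb_files *sb)
{
	size_t n = 0;

	for (int i = 0; i < NR_CPUS; i++)
		list_lock(&sb->cpu[i]);
	for (int i = 0; i < NR_CPUS; i++)
		for (struct file_node *f = sb->cpu[i].head.next;
		     f != &sb->cpu[i].head; f = f->next)
			n++;
	for (int i = 0; i < NR_CPUS; i++)
		list_unlock(&sb->cpu[i]);
	return n;
}
```

In this sketch, summing cpu_hits and removals across the per-CPU lists gives the same kind of ratio as the cpu-hits percentages quoted in the commit message. If a single CPU does all the adds, every removal lands on that CPU's lock, which is the worst case the message describes as no worse than the old single lock.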
..
9p	v9fs: fixup for inode_setattr being removed	2010-08-11 00:08:00 -04:00
adfs	check ATTR_SIZE contraints in inode_change_ok	2010-08-09 16:47:39 -04:00
affs	AFFS: wait for sb synchronization when needed	2010-08-09 16:48:51 -04:00
afs	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	2010-08-13 10:37:30 -07:00
autofs	autofs/autofs4: Move compat_ioctl handling into fs	2010-08-09 00:13:34 +02:00
autofs4	autofs4: remove unneeded null check in try_to_fill_dentry()	2010-08-11 08:59:06 -07:00
befs	fix typos concerning "initiali[zs]e"	2010-06-16 18:05:05 +02:00
bfs	BFS: clean up the superblock usage	2010-08-09 16:48:53 -04:00
btrfs	Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block	2010-08-10 15:22:42 -07:00
cachefiles	Add a dummy printk function for the maintenance of unused printks	2010-08-12 09:51:35 -07:00
ceph	ceph: generalize mon requests, add pool op support	2010-08-10 14:41:25 -07:00
cifs	cifs: update README to include details about 'fsc' option	2010-08-11 17:11:28 +00:00
coda	Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block	2010-08-10 15:22:42 -07:00
configfs	fix setattr error handling in sysfs, configfs	2010-06-04 17:16:29 -04:00
cramfs	cramfs: only unlock new inodes	2010-08-18 01:01:33 -04:00
debugfs	Add x64 support to debugfs	2010-05-19 22:41:57 -04:00
devpts	Simplify devpts_get_sb() failure exits	2010-05-21 18:31:12 -04:00
dlm	fs/dlm: Drop unnecessary null test	2010-08-05 14:23:45 -05:00
ecryptfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6	2010-08-10 12:14:39 -07:00
efs
exofs	Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd	2010-08-11 09:19:43 -07:00
exportfs
ext2	mbcache: Remove unused features	2010-08-09 16:48:45 -04:00
ext3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
ext4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
fat	remove SWRITE* I/O types	2010-08-18 01:09:01 -04:00
freevxfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
fscache	Add a dummy printk function for the maintenance of unused printks	2010-08-12 09:51:35 -07:00
fuse	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
gfs2	Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block	2010-08-10 15:22:42 -07:00
hfs	convert remaining ->clear_inode() to ->evict_inode()	2010-08-09 16:48:37 -04:00
hfsplus	convert remaining ->clear_inode() to ->evict_inode()	2010-08-09 16:48:37 -04:00
hostfs	hostfs ->follow_link() braino	2010-08-18 06:21:10 -04:00
hpfs	switch hpfs to ->evict_inode()	2010-08-09 16:48:17 -04:00
hppfs	switch hppfs to ->evict_inode()	2010-08-09 16:48:16 -04:00
hugetlbfs	new helper: end_writeback()	2010-08-09 16:47:49 -04:00
isofs	isofs: Fix lseek() to position beyond 4 GB	2010-08-11 00:29:47 -04:00
jbd	remove SWRITE* I/O types	2010-08-18 01:09:01 -04:00
jbd2	remove SWRITE* I/O types	2010-08-18 01:09:01 -04:00
jffs2	Merge git://git.infradead.org/mtd-2.6	2010-08-10 11:49:21 -07:00
jfs	jfs: don't allow os2 xattr namespace overlap with others	2010-08-10 15:33:09 -07:00
lockd	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
logfs	logfs: kill BKL	2010-08-14 00:24:24 +02:00
minix	switch minix to ->evict_inode(), fix write_inode/delete_inode race	2010-08-09 16:47:53 -04:00
ncpfs	Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing	2010-08-10 13:58:28 -07:00
nfs	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	2010-08-13 10:37:30 -07:00
nfs_common	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
nfsd	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify	2010-08-10 11:39:13 -07:00
nilfs2	kill BH_Ordered flag	2010-08-18 01:09:00 -04:00
nls
notify	Revert "fsnotify: store struct file not struct path"	2010-08-12 14:23:04 -07:00
ntfs	convert remaining ->clear_inode() to ->evict_inode()	2010-08-09 16:48:37 -04:00
ocfs2	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2	2010-08-13 10:43:50 -07:00
omfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bcopeland/omfs	2010-08-10 11:47:36 -07:00
openpromfs
partitions	[S390] partitions: fix build error in ibm partition detection code	2010-08-13 10:06:55 +02:00
proc	mm: fix up some user-visible effects of the stack guard page	2010-08-15 11:35:52 -07:00
qnx4	get rid of cont_write_begin_newtrunc	2010-08-09 16:47:31 -04:00
quota	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
ramfs	check ATTR_SIZE contraints in inode_change_ok	2010-08-09 16:47:39 -04:00
reiserfs	remove SWRITE* I/O types	2010-08-18 01:09:01 -04:00
romfs
smbfs	switch smbfs to evict_inode()	2010-08-09 16:48:00 -04:00
squashfs	Squashfs: fix checkpatch.pl warnings	2010-08-08 22:29:33 +00:00
sysfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
sysv	fs/sysv/super.c: add support for non-PDP11 v7 filesystems	2010-08-11 08:59:23 -07:00
ubifs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
udf	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
ufs	remove SWRITE* I/O types	2010-08-18 01:09:01 -04:00
xfs	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6	2010-08-10 11:26:52 -07:00
aio.c	aio: fix wrong subsystem comments	2010-08-05 13:21:23 -07:00
anon_inodes.c	Revert "anon_inode: set S_IFREG on the anon_inode"	2010-05-27 22:03:05 -04:00
attr.c	check ATTR_SIZE contraints in inode_change_ok	2010-08-09 16:47:39 -04:00
bad_inode.c	bkl: Remove locked .ioctl file operation	2010-08-14 00:24:24 +02:00
binfmt_aout.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
binfmt_elf_fdpic.c	binfmt_elf_fdpic: Fix clear_user() error handling	2010-06-01 08:11:06 -07:00
binfmt_elf.c	coredump: pass mm->flags as a coredump parameter for consistency	2010-03-06 11:26:46 -08:00
binfmt_em86.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
binfmt_flat.c	flat: tweak default stack alignment	2010-06-29 15:29:31 -07:00
binfmt_misc.c	convert remaining ->clear_inode() to ->evict_inode()	2010-08-09 16:48:37 -04:00
binfmt_script.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
binfmt_som.c
bio-integrity.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
bio.c	block: unify flags for struct bio and struct request	2010-08-07 18:20:39 +02:00
block_dev.c	blkdev: cgroup whitelist permission fix	2010-08-11 08:59:18 -07:00
buffer.c	remove SWRITE* I/O types	2010-08-18 01:09:01 -04:00
char_dev.c	Fix init ordering of /dev/console vs callers of modprobe	2010-08-06 09:17:02 -07:00
compat_binfmt_elf.c	elf coredump: replace ELF_CORE_EXTRA_* macros by functions	2010-03-06 11:26:45 -08:00
compat_ioctl.c	bkl: Remove locked .ioctl file operation	2010-08-14 00:24:24 +02:00
compat.c	Mark arguments to certain syscalls as being const	2010-08-13 16:53:13 -07:00
dcache.c	fs: remove extra lookup in __lookup_hash	2010-08-18 08:35:47 -04:00
dcookies.c
direct-io.c	sort out blockdev_direct_IO variants	2010-08-09 16:47:29 -04:00
drop_caches.c	simplify checks for I_CLEAR/I_FREEING	2010-08-09 16:47:44 -04:00
eventfd.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
eventpoll.c	sched, wait: Use wrapper functions	2010-05-11 17:43:58 +02:00
exec.c	fs: fs_struct rwlock to spinlock	2010-08-18 08:35:46 -04:00
fcntl.c	vfs: O_* bit numbers uniqueness check	2010-08-11 08:59:02 -07:00
fifo.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
file_table.c	fs: scale files_lock	2010-08-18 08:35:48 -04:00
file.c	vfs: use kmalloc() to allocate fdmem if possible	2010-08-11 08:59:02 -07:00
filesystems.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
fs_struct.c	fs: fs_struct rwlock to spinlock	2010-08-18 08:35:46 -04:00
fs-writeback.c	mm: fix writeback_in_progress()	2010-08-12 08:43:30 -07:00
generic_acl.c	vfs: update ctime when changing the file's permission by setfacl	2010-08-18 01:04:22 -04:00
inode.c	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify	2010-08-10 11:39:13 -07:00
internal.h	tty: fix fu_list abuse	2010-08-18 08:35:47 -04:00
ioctl.c	bkl: Remove locked .ioctl file operation	2010-08-14 00:24:24 +02:00
ioprio.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
Kconfig	fs/Kconfig: Fix typo Userpace -> Userspace	2010-07-20 17:30:22 +02:00
Kconfig.binfmt
libfs.c	check ATTR_SIZE contraints in inode_change_ok	2010-08-09 16:47:39 -04:00
locks.c	Merge branch 'for-next' into for-linus	2010-03-08 16:55:37 +01:00
Makefile	Take statfs variants to fs/statfs.c	2010-05-21 18:31:17 -04:00
mbcache.c	mbcache: Limit the maximum number of cache entries	2010-08-18 06:24:41 -04:00
mpage.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
namei.c	fs: remove extra lookup in __lookup_hash	2010-08-18 08:35:47 -04:00
namespace.c	vfs: remove unused MNT_STRICTATIME	2010-08-11 00:29:47 -04:00
nfsctl.c	Switch may_open() and break_lease() to passing O_...	2010-03-03 13:00:21 -05:00
no-block.c
open.c	fs: cleanup files_lock locking	2010-08-18 08:35:47 -04:00
pipe.c	pipe: fix check in "set size" fcntl	2010-06-10 19:08:34 +02:00
pnode.c	Kill CL_PROPAGATION, sanitize fs/pnode.c:get_source()	2010-03-03 13:00:22 -05:00
pnode.h	VFS: Clean up shared mount flag propagation	2010-03-03 14:07:55 -05:00
posix_acl.c
read_write.c	fsnotify: pass a file instead of an inode to open, read, and write	2010-07-28 09:58:32 -04:00
read_write.h
readdir.c	vfs: fix warning: 'dirent' is used uninitialized in this function	2010-08-09 20:45:05 -07:00
select.c	Add generic sys_old_select()	2010-03-12 15:52:32 -08:00
seq_file.c	seq_file: fix new kernel-doc warnings	2010-03-07 15:48:26 -08:00
signalfd.c	signalfd: fill in ssi_int for posix timers and message queues	2010-08-11 08:59:20 -07:00
splice.c	splice: fix misuse of SPLICE_F_NONBLOCK	2010-08-07 18:52:56 +02:00
stack.c
stat.c	Mark arguments to certain syscalls as being const	2010-08-13 16:53:13 -07:00
statfs.c	add f_flags to struct statfs(64)	2010-08-09 16:48:44 -04:00
super.c	fs: scale files_lock	2010-08-18 08:35:48 -04:00
sync.c	get rid of file_fsync()	2010-08-09 16:47:43 -04:00
timerfd.c	fs/timerfd.c: make use of wait_event_interruptible_locked_irq()	2010-05-20 13:21:42 -07:00
utimes.c	Mark arguments to certain syscalls as being const	2010-08-13 16:53:13 -07:00
xattr_acl.c	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h	2010-03-30 22:02:32 +09:00
xattr.c	fs: xattr_handler table should be const	2010-05-21 18:31:18 -04:00