linux/fs
J. Bruce Fields 3d330dc175 dcache: return -ESTALE not -EBUSY on distributed fs race
On a distributed filesystem it's possible for lookup to discover that a
directory it just found is already cached elsewhere in the directory
heirarchy.  The dcache won't let us keep the directory in both places,
so we have to move the dentry to the new location from the place we
previously had it cached.

If the parent has changed, then this requires all the same locks as we'd
need to do a cross-directory rename.  But we're already in lookup
holding one parent's i_mutex, so it's too late to acquire those locks in
the right order.

The (unreliable) solution in __d_unalias is to trylock() the required
locks and return -EBUSY if it fails.

I see no particular reason for returning -EBUSY, and -ESTALE is already
the result of some other lookup races on NFS.  I think -ESTALE is the
more helpful error return.  It also allows us to take advantage of the
logic Jeff Layton added in c6a9428401 "vfs: fix renameat to retry on
ESTALE errors" and ancestors, which hopefully resolves some of these
errors before they're returned to userspace.

I can reproduce these cases using NFS with:

	ssh root@$client '
		mount -olookupcache=pos '$server':'$export' /mnt/
		mkdir /mnt/TO
		mkdir /mnt/DIR
		touch /mnt/DIR/test.txt
		while true; do
			strace -e open cat /mnt/DIR/test.txt 2>&1 | grep EBUSY
		done
	'
	ssh root@$server '
		while true; do
			mv $export/DIR $export/TO/DIR
			mv $export/TO/DIR $export/DIR
		done
	'

It also helps to add some other concurrent use of the directory on the
client (e.g., "ls /mnt/TO").  And you can replace the server-side mv's
by client-side mv's that are repeatedly killed.  (If the client is
interrupted while waiting for the RENAME response then it's left with a
dentry that has to go under one parent or the other, but it doesn't yet
know which.)

Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:24:33 -04:00
..
9p VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
adfs
affs fs/affs/super.c: fix switch indentation 2015-02-17 14:34:53 -08:00
afs Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block 2015-02-12 13:50:21 -08:00
autofs4 autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation 2015-02-22 11:43:34 -05:00
befs fs/befs/linuxvfs.c: remove unnecessary casting 2015-02-17 14:34:50 -08:00
bfs
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2015-03-21 10:53:37 -07:00
cachefiles Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions 2015-02-22 11:38:41 -05:00
ceph Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-22 17:42:14 -08:00
cifs Revert "locks: keep a count of locks on the flctx lists" 2015-02-16 14:32:03 -05:00
coda VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
configfs configfs: Fix potential NULL d_inode dereference 2015-02-20 04:56:43 -05:00
cramfs
debugfs debugfs: leave freeing a symlink body until inode eviction 2015-02-22 11:38:43 -05:00
devpts
dlm netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
ecryptfs eCryptfs: don't pass fs-specific ioctl commands through 2015-03-03 02:03:56 -06:00
efivarfs * Move efivarfs from the misc filesystem section to pseudo filesystem, 2015-01-29 19:16:40 +01:00
efs
exofs vfs: remove get_xip_mem 2015-02-16 17:56:03 -08:00
exportfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
ext2 ext2: get rid of most mentions of XIP in ext2 2015-02-16 17:56:04 -08:00
ext3 ext3: destroy sbi mutexes in put_super 2015-01-05 11:13:55 +01:00
ext4 Ext4 bug fixes for 3.20. We also reserved code points for encryption 2015-02-22 18:05:13 -08:00
f2fs Merge tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs 2015-02-12 19:28:50 -08:00
fat fs: fat: use MSDOS_SB macro to get msdos_sb_info 2015-02-17 14:34:51 -08:00
freevxfs
fscache
fuse fuse: explicitly set /dev/fuse file's private_data 2015-03-19 15:29:22 +01:00
gfs2 VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
hfs fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp 2014-12-10 17:41:16 -08:00
hfsplus VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
hostfs
hpfs
hppfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
hugetlbfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
isofs isofs: Fix bug in the way to check if the year is a leap year 2015-01-07 09:51:49 +01:00
jbd jbd: Deletion of an unnecessary check before the function call "iput" 2014-11-18 10:15:29 +01:00
jbd2 jbd2: complain about descriptor block checksum errors 2015-01-19 15:59:58 -05:00
jffs2 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-22 17:42:14 -08:00
jfs Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
kernfs kernfs: handle poll correctly on 'direct_read' files. 2015-03-16 21:51:20 +01:00
lockd Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux 2015-02-12 10:39:41 -08:00
logfs
minix
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 14:56:45 -08:00
nfs NFSv4.1: Clear the old state by our client id before establishing a new lease 2015-03-03 21:52:30 -05:00
nfs_common
nfsd Subject: nfsd: don't recursively call nfsd4_cb_layout_fail 2015-03-19 15:49:27 -04:00
nilfs2 nilfs2: fix deadlock of segment constructor during recovery 2015-03-12 18:46:08 -07:00
nls
notify fanotify: fix event filtering with FAN_ONDIR set 2015-03-12 18:46:08 -07:00
ntfs NTFS: Version 2.1.32 - Update file write from aio_write to write_iter. 2015-04-11 22:24:33 -04:00
ocfs2 ocfs2: make append_dio an incompat feature 2015-03-12 18:46:07 -07:00
omfs
openpromfs
overlayfs ovl: upper fs should not be R/O 2015-03-18 10:29:48 +01:00
proc pagemap: do not leak physical addresses to non-privileged userspace 2015-03-17 09:31:30 -07:00
pstore pstore: Fix sprintf format specifier in pstore_dump() 2015-01-16 16:01:29 -08:00
qnx4
qnx6
quota Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2015-02-10 15:52:38 -08:00
ramfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
reiserfs VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
romfs fs: remove mapping->backing_dev_info 2015-01-20 14:03:05 -07:00
squashfs Squashfs: Add LZ4 compression configuration option 2014-11-27 18:48:44 +00:00
sysfs driver core patches for 3.20-rc1 2015-02-15 11:11:47 -08:00
sysv
ubifs Merge branch 'for-linus-v3.20' of git://git.infradead.org/linux-ubifs 2015-02-15 10:11:39 -08:00
udf udf: remove bool assignment to 0/1 2015-02-05 16:34:25 +01:00
ufs fs/ufs/super.c: fix potential race condition 2015-02-17 14:34:51 -08:00
xfs xfs: cancel failed transaction in xfs_fs_commit_blocks() 2015-02-24 10:15:18 +11:00
aio.c fs/aio.c: Remove duplicate function name in pr_debug messages 2015-02-20 04:56:44 -05:00
anon_inodes.c
attr.c
bad_inode.c don't bother with most of the bad_file_ops methods 2015-02-20 04:03:58 -05:00
binfmt_aout.c assorted conversions to %p[dD] 2014-11-19 13:01:20 -05:00
binfmt_elf_fdpic.c
binfmt_elf.c x86, mm/ASLR: Fix stack randomization on 64-bit systems 2015-02-19 12:21:36 +01:00
binfmt_em86.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
binfmt_flat.c
binfmt_misc.c unfuck binfmt_misc.c (broken by commit e6084d4) 2014-12-17 08:27:14 -05:00
binfmt_script.c syscalls: implement execveat() system call 2014-12-13 12:42:51 -08:00
block_dev.c Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-block 2015-02-12 14:13:23 -08:00
buffer.c
char_dev.c fs: introduce f_op->mmap_capabilities for nommu mmap support 2015-01-20 14:02:58 -07:00
compat_binfmt_elf.c
compat_ioctl.c
compat.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
coredump.c coredump: Fix typo in comment 2015-02-20 04:56:44 -05:00
dax.c dax: add dax_zero_page_range 2015-02-16 17:56:04 -08:00
dcache.c dcache: return -ESTALE not -EBUSY on distributed fs race 2015-04-11 22:24:33 -04:00
dcookies.c
direct-io.c
drop_caches.c vmscan: per memory cgroup slab shrinkers 2015-02-12 18:54:09 -08:00
eventfd.c eventfd: don't take the spinlock in eventfd_poll 2015-02-17 14:34:52 -08:00
eventpoll.c epoll: optimize setting task running after blocking 2015-02-13 21:21:40 -08:00
exec.c fs: create proper filename objects using getname_kernel() 2015-01-23 00:22:20 -05:00
fcntl.c vfs: renumber FMODE_NONOTIFY and add to uniqueness check 2015-01-08 15:10:52 -08:00
fhandle.c
file_table.c
file.c fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) 2014-12-10 17:41:10 -08:00
filesystems.c
fs_pin.c switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
fs_struct.c
fs-writeback.c trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
inode.c Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 16:12:34 -08:00
internal.h trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
ioctl.c fsioctl.c: make generic_block_fiemap() signal-tolerant 2015-02-10 14:30:30 -08:00
Kconfig dax: does not work correctly with virtual aliasing caches 2015-02-16 17:56:04 -08:00
Kconfig.binfmt fs/binfmt_som: Drop kernel support for HP-UX SOM binaries 2015-02-17 16:29:36 +01:00
libfs.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
locks.c locks: fix generic_delete_lease tracepoint to use victim pointer 2015-03-14 09:45:35 -04:00
Makefile Merge branch 'parisc-3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2015-02-17 14:25:58 -08:00
mbcache.c
mount.h switch the IO-triggering parts of umount to fs_pin 2015-01-25 23:17:29 -05:00
mpage.c
namei.c remove incorrect comment in lookup_one_len() 2015-04-11 22:24:30 -04:00
namespace.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
no-block.c
nsfs.c take the targets of /proc/*/ns/* symlinks to separate fs 2014-12-10 21:30:20 -05:00
open.c drop bogus check in file_open_root() 2015-04-11 22:24:32 -04:00
pipe.c
pnode.c mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. 2014-12-02 10:46:50 -06:00
pnode.h
posix_acl.c VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) 2015-02-22 11:38:41 -05:00
proc_namespace.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
read_write.c Merge branch 'iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-02-17 15:48:33 -08:00
readdir.c vfs: make first argument of dir_context.actor typed 2014-10-31 17:48:54 -04:00
select.c all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
seq_file.c bitmap, cpumask, nodemask: remove dedicated formatting functions 2015-02-13 21:21:39 -08:00
signalfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
splice.c fs: add vfs_iter_{read,write} helpers 2015-01-29 00:13:13 -05:00
stack.c
stat.c switch security_inode_getattr() to struct path * 2015-04-11 22:24:32 -04:00
statfs.c
super.c trylock_super(): replacement for grab_super_passive() 2015-02-22 11:38:42 -05:00
sync.c vfs: add support for a lazytime mount option 2015-02-05 02:45:00 -05:00
timerfd.c fs: Convert show_fdinfo functions to void 2014-11-05 14:13:23 -05:00
utimes.c
xattr.c new helper: audit_file() 2014-11-19 13:01:26 -05:00