linux/fs
Brian Foster 5d11fb4b9a xfs: rework zero range to prevent invalid i_size updates
The zero range operation is analogous to fallocate with the exception of
converting the range to zeroes. E.g., it attempts to allocate zeroed
blocks over the range specified by the caller. The XFS implementation
kills all delalloc blocks currently over the aligned range, converts the
range to allocated zero blocks (unwritten extents) and handles the
partial pages at the ends of the range by sending writes through the
pagecache.

The current implementation suffers from several problems associated with
inode size. If the aligned range covers an extending I/O, said I/O is
discarded and an inode size update from a previous write never makes it
to disk. Further, if an unaligned zero range extends beyond eof, the
page write induced for the partial end page can itself increase the
inode size, even if the zero range request is not supposed to update
i_size (via KEEP_SIZE, similar to an fallocate beyond EOF).

The latter behavior not only incorrectly increases the inode size, but
can lead to stray delalloc blocks on the inode. Typically, post-eof
preallocation blocks are either truncated on release or inode eviction
or explicitly written to by xfs_zero_eof() on natural file size
extension. If the inode size increases due to zero range, however,
associated blocks leak into the address space having never been
converted or mapped to pagecache pages. A direct I/O to such an
uncovered range cannot convert the extent via writeback and will BUG().
For example:

$ xfs_io -fc "pwrite 0 128k" -c "fzero -k 1m 54321" <file>
...
$ xfs_io -d -c "pread 128k 128k" <file>
<BUG>

If the entire delalloc extent happens to not have page coverage
whatsoever (e.g., delalloc conversion couldn't find a large enough free
space extent), even a full file writeback won't convert what's left of
the extent and we'll assert on inode eviction.

Rework xfs_zero_file_space() to avoid buffered I/O for partial pages.
Use the existing hole punch and prealloc mechanisms as primitives for
zero range. This implementation is not efficient nor ideal as we
writeback dirty data over the range and remove existing extents rather
than convert to unwrittern. The former writeback, however, is currently
the only mechanism available to ensure consistency between pagecache and
extent state. Even a pagecache truncate/delalloc punch prior to hole
punch has lead to inconsistencies due to racing with writeback.

This provides a consistent, correct implementation of zero range that
survives fsstress/fsx testing without assert failures. The
implementation can be optimized from this point forward once the
fundamental issue of pagecache and delalloc extent state consistency is
addressed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-10-30 10:35:11 +11:00
..
9p 9p: switch to %p[dD] 2014-10-09 02:39:04 -04:00
adfs adfs: add __printf verification, fix format/argument mismatches 2014-08-08 15:57:24 -07:00
affs fs/affs: remove redundant sys_tz declarations 2014-10-14 02:18:22 +02:00
afs Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
autofs4 autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode. 2014-10-14 02:18:16 +02:00
befs fs/befs/btree.c: remove typedef befs_btree_node 2014-10-14 02:18:20 +02:00
bfs fs/bfs: use bfs prefix for dump_imap 2014-08-08 15:57:24 -07:00
btrfs vfs: export check_sticky() 2014-10-24 00:14:36 +02:00
cachefiles FS-Cache fixes 2014-10-14 08:40:15 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-10-15 06:46:01 +02:00
cifs [CIFS] Remove obsolete comment 2014-10-17 17:17:12 -05:00
coda fs/coda: use linux/uaccess.h 2014-08-08 15:57:20 -07:00
configfs fs/configfs: use pr_fmt 2014-06-04 16:53:53 -07:00
cramfs fs/cramfs/inode.c: use linux/uaccess.h 2014-08-08 15:57:25 -07:00
debugfs fs: debugfs: remove trailing whitespace 2014-07-09 16:58:21 -07:00
devpts fs/devpts/inode.c: convert printk to pr_foo() 2014-06-06 16:08:14 -07:00
dlm dlm: fix missing endian conversion of rcom_status flags 2014-10-14 15:11:48 -05:00
ecryptfs fs: limit filesystem stacking depth 2014-10-24 00:14:39 +02:00
efivarfs fs/efivarfs/super.c: use static const for dentry_operations 2014-06-04 16:54:14 -07:00
efs fs/efs/namei.c: return is not a function 2014-08-08 15:57:18 -07:00
exofs Boaz Harrosh - Fix broken email address 2014-10-19 20:22:32 +03:00
exportfs fs/exportfs/expfs.c: kernel-doc warning fixes 2014-06-04 16:54:14 -07:00
ext2 percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-10-11 08:02:31 -04:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-26 11:19:18 -07:00
f2fs f2fs: support volatile operations for transient data 2014-10-07 11:54:41 -07:00
fat fat: remove redundant sys_tz declaration 2014-10-14 02:18:20 +02:00
freevxfs
fscache fs/fscache/object-list.c: use __seq_open_private() 2014-10-13 17:52:21 +01:00
fuse vfs: Make d_invalidate return void 2014-10-09 02:38:57 -04:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
hfs fs/hfs/hfs_fs.h: remove redundant sys_tz declaration 2014-10-14 02:18:20 +02:00
hfsplus Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-06-12 10:30:18 -07:00
hostfs hostfs: support rename flags 2014-08-07 14:40:09 -04:00
hpfs fs/hpfs/dnode.c: fix suspect code indent 2014-08-08 15:57:22 -07:00
hppfs
hugetlbfs fs/hugetlbfs/inode.c: remove null test before kfree 2014-06-04 16:54:11 -07:00
isofs isofs: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
jbd jbd/jbd2: use non-movable memory for the jbd superblock 2014-09-04 22:36:35 -04:00
jbd2 jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list 2014-09-18 00:58:12 -04:00
jffs2 [jffs2] kill wbuf_queued/wbuf_dwork_lock 2014-10-09 02:39:01 -04:00
jfs Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
kernfs vfs: Remove unnecessary calls of check_submounts_and_drop 2014-10-09 02:38:56 -04:00
lockd File locking related changes for v3.18 (pile #1) 2014-10-11 13:21:34 -04:00
logfs fs/logfs/readwrite.c: kernel-doc warning fixes 2014-08-06 18:01:12 -07:00
minix minix zmap block counts calculation fix 2014-08-08 15:57:20 -07:00
ncpfs fs/ncpfs/dir.c: remove redundant sys_tz declaration 2014-10-14 02:18:16 +02:00
nfs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2014-10-21 12:53:45 -07:00
nfs_common lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
nfsd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
nilfs2 nilfs2: improve the performance of fdatasync() 2014-10-14 02:18:20 +02:00
nls
notify File locking related changes for v3.18 (pile #1) 2014-10-11 13:21:34 -04:00
ntfs NTFS: Bump version to 2.1.31. 2014-10-16 12:53:35 +01:00
ocfs2 ocfs2: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
omfs FS/OMFS: block number sanity check during fill_super operation 2014-10-14 02:18:22 +02:00
openpromfs
overlayfs overlayfs: embed middle into overlay_readdir_data 2014-10-24 20:25:23 -04:00
proc mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared 2014-10-14 02:18:28 +02:00
pstore pstore: Fix duplicate {console,ftrace}-efi entries 2014-10-15 13:51:33 -07:00
qnx4
qnx6 fs/qnx6: update debugging to current functions 2014-08-08 15:57:26 -07:00
quota percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
ramfs fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:18 -07:00
reiserfs fs/reiserfs/journal.c: fix sparse context imbalance warning 2014-10-14 02:18:20 +02:00
romfs fs/romfs/super.c: add blank line after declarations 2014-08-08 15:57:25 -07:00
squashfs fs/squashfs/super.c: logging cleanup 2014-08-06 18:01:13 -07:00
sysfs kernfs: move the last knowledge of sysfs out from kernfs 2014-06-03 08:11:18 -07:00
sysv
ubifs UBIFS: Fix trivial typo in power_cut_emulated() 2014-09-30 09:29:44 +03:00
udf udf: Fix loading of special inodes 2014-10-09 13:06:14 +02:00
ufs fs/ufs/balloc.c: remove unused variable 2014-10-14 02:18:20 +02:00
xfs xfs: rework zero range to prevent invalid i_size updates 2014-10-30 10:35:11 +11:00
aio.c percpu_ref: add PERCPU_REF_INIT_* flags 2014-09-24 13:31:50 -04:00
anon_inodes.c
attr.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-10 13:57:22 -07:00
bad_inode.c bad_inode: add ->rename2() 2014-08-07 14:40:09 -04:00
binfmt_aout.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf_fdpic.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make old_reloc() static 2014-06-04 16:54:21 -07:00
binfmt_misc.c binfmt_misc: work around gcc-4.9 warning 2014-10-14 02:18:16 +02:00
binfmt_script.c
binfmt_som.c
block_dev.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
buffer.c A large number of cleanups and bug fixes, with some (minor) journal 2014-10-20 09:50:11 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c Bluetooth: Move HCI socket definitions into its own header file 2014-07-11 13:53:04 +03:00
compat.c vfs: move getname() from callers to do_mount() 2014-10-09 02:39:16 -04:00
coredump.c coredump: add %i/%I in core_pattern to report the tid of the crashed thread 2014-10-14 02:18:21 +02:00
dcache.c fix inode leaks on d_splice_alias() failure exits 2014-10-23 22:30:18 -04:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c fs: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
eventfd.c
eventpoll.c eventpoll: fix uninitialized variable in epoll_ctl 2014-09-10 15:42:12 -07:00
exec.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
fcntl.c security: make security_file_set_fowner, f_setown and __f_setown void return 2014-09-09 16:01:36 -04:00
fhandle.c
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
file.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 15:44:12 +02:00
filesystems.c
fs_pin.c make fs/{namespace,super}.c forget about acct.h 2014-08-07 14:40:09 -04:00
fs_struct.c
fs-writeback.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
inode.c mm: allow drivers to prevent new writable mappings 2014-08-08 15:57:31 -07:00
internal.h vfs: export __inode_permission() to modules 2014-10-24 00:14:35 +02:00
ioctl.c
Kconfig overlay filesystem 2014-10-24 00:14:38 +02:00
Kconfig.binfmt
libfs.c locks: plumb a "priv" pointer into the setlease routines 2014-10-07 14:06:12 -04:00
locks.c locks: flock_make_lock should return a struct file_lock (or PTR_ERR) 2014-10-07 14:06:13 -04:00
Makefile overlay filesystem 2014-10-24 00:14:38 +02:00
mbcache.c fs/mbcache: replace __builtin_log2() with ilog2() 2014-06-25 22:08:29 -04:00
mount.h vfs: Add a function to lazily unmount all mounts from any dentry. 2014-10-09 02:38:55 -04:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c vfs: add RENAME_WHITEOUT 2014-10-24 00:14:37 +02:00
namespace.c vfs: introduce clone_private_mount() 2014-10-24 00:14:36 +02:00
no-block.c
open.c vfs: add i_op->dentry_open() 2014-10-24 00:14:35 +02:00
pipe.c
pnode.c get rid of propagate_umount() mistakenly treating slaves as busy. 2014-08-30 18:31:41 -04:00
pnode.h
posix_acl.c
proc_namespace.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
read_write.c cachefiles_write_page(): switch to __kernel_write() 2014-10-09 02:39:05 -04:00
readdir.c fanotify: create FAN_ACCESS event for readdir 2014-06-04 16:53:52 -07:00
select.c
seq_file.c fs/seq_file: fallback to vmalloc allocation 2014-07-03 09:21:54 -07:00
signalfd.c
splice.c vfs: export do_splice_direct() to modules 2014-10-24 00:14:35 +02:00
stack.c fs: fix comment for 'CONFIG_LBADF' 2014-08-26 09:35:56 +02:00
stat.c
statfs.c
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
sync.c Export sync_filesystem() for modular ->remount_fs() use 2014-09-05 08:16:21 -07:00
timerfd.c timerfd: Remove an always true check 2014-08-27 11:17:48 +02:00
utimes.c
xattr.c vfs: Deduplicate code shared by xattr system calls operating on paths 2014-10-12 17:09:10 -04:00