linux/fs/xfs
Brian Foster 3136e8bb30 xfs: always drain dio before extending aio write submission
XFS supports and typically allows concurrent asynchronous direct I/O
submission to a single file. One exception to the rule is that
file-extending dio writes that start beyond the current EOF (i.e.,
writes that potentially create a hole at EOF) require exclusive I/O
access to the file. This is because such writes must zero any
pre-existing blocks beyond EOF that are exposed by virtue of now
residing within EOF as a result of the write about to be submitted.
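
As a rough illustration of which writes fall under this exception, the
following user-space sketch computes the range that would need zeroing
for a write starting at a given offset. The names zero_range and
eof_zero_range() are hypothetical and are not kernel interfaces, and the
model works on bytes whereas the real code deals in filesystem blocks.

```c
#include <stdbool.h>

/* Hypothetical description of the gap between old EOF and a new write. */
struct zero_range {
	bool		needed;	/* true if the write starts beyond EOF */
	long long	start;	/* first byte to zero (the old EOF) */
	long long	len;	/* number of bytes between old EOF and the write */
};

/*
 * Model only: a write starting strictly beyond the current size leaves a
 * gap [isize, pos) that ends up inside EOF once the write completes, so
 * that gap must be zeroed. A write starting at or before EOF leaves no gap.
 */
static struct zero_range eof_zero_range(long long isize, long long pos)
{
	struct zero_range zr = { false, 0, 0 };

	if (pos > isize) {
		zr.needed = true;
		zr.start = isize;
		zr.len = pos - isize;
	}
	return zr;
}
```

The result is only meaningful if isize is stable when it is sampled,
which is why such writes need exclusive access.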

Before EOF zeroing can occur, the current file i_size must be stabilized
to avoid data corruption. In this scenario, XFS upgrades the iolock to
exclude any further I/O submission, waits on in-flight I/O to complete
to ensure i_size is up to date (i_size is updated on dio write
completion) and restarts the various checks against the state of the
file. The problem is that this protection sequence is triggered only
when the iolock is currently held shared. While this is true for async
dio in most cases, the caller may upgrade the lock in advance for
reasons unrelated to EOF zeroing. For example, the
iolock is always acquired exclusively if the start offset is not block
aligned. This means that even though the iolock is already held
exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
can occur based on an unstable i_size.

This problem has been reproduced as guest data corruption in virtual
machines with file-backed qcow2 virtual disks hosted on an XFS
filesystem. The virtual disks must be configured with aio=native mode
and must not be truncated out to the maximum file size (as some virt
managers will do).

Update xfs_file_aio_write_checks() to unconditionally drain in-flight
dio before EOF zeroing can occur. Rather than trigger the wait based on
iolock state, use a new flag and upgrade the iolock when necessary. Note
that this results in a full restart of the inode checks even when the
iolock was already held exclusive, when technically it is only required
to recheck i_size. This should be a rare enough occurrence that it is
preferable to keep the code simple rather than create an alternate
restart jump target.
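
The control flow described above can be modelled in user space roughly
as follows. This is a simplified sketch, not the kernel change itself:
model_inode, model_dio_wait(), model_write_checks() and the "drained"
flag name are all hypothetical stand-ins, and locking is reduced to an
enum value rather than a real lock.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel types or functions. */
enum iolock_mode { IOLOCK_SHARED, IOLOCK_EXCL };

struct model_inode {
	long long	isize;		/* current in-core file size */
	long long	pending_isize;	/* size an in-flight dio write sets on completion */
	int		inflight_dio;	/* number of in-flight dio writes */
};

struct check_result {
	int			drains;		/* times in-flight dio was waited on */
	int			restarts;	/* times the checks were restarted */
	long long		zero_from;	/* start of EOF zeroing, -1 if none */
	enum iolock_mode	lock;		/* lock mode held on return */
};

/* Model of waiting for in-flight dio: completion is what updates i_size. */
static void model_dio_wait(struct model_inode *ip)
{
	if (ip->inflight_dio > 0) {
		ip->inflight_dio = 0;
		if (ip->pending_isize > ip->isize)
			ip->isize = ip->pending_isize;
	}
}

static struct check_result model_write_checks(struct model_inode *ip,
					      long long pos,
					      enum iolock_mode lock)
{
	struct check_result r = { 0, 0, -1, lock };
	bool drained = false;	/* assumed name for the new flag */

restart:
	if (pos > ip->isize) {
		if (!drained) {
			/* Upgrade only when necessary ... */
			if (r.lock == IOLOCK_SHARED)
				r.lock = IOLOCK_EXCL;
			/* ... but drain regardless of how the lock was held. */
			model_dio_wait(ip);
			r.drains++;
			drained = true;
			r.restarts++;
			goto restart;
		}
		/* i_size is stable now, so zeroing from it is safe. */
		r.zero_from = ip->isize;
	}
	return r;
}
```

In this model, a write at offset 12288 that arrives already holding the
lock exclusive, against a file of size 4096 with one in-flight write that
will extend it to 16384, drains first and then finds that no zeroing is
needed. Gating the drain on the lock being held shared would instead have
zeroed from the stale size of 4096.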

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-10-12 16:02:05 +11:00
libxfs Merge branch 'xfs-misc-fixes-for-4.3-4' into for-next 2015-09-01 10:30:11 +10:00
Kconfig
kmem.c xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
kmem.h xfs: change kmem_free to use generic kvfree() 2015-02-02 09:54:18 +11:00
Makefile libxfs: add xfs_bit.c 2015-07-29 11:52:08 +10:00
mrlock.h
uuid.c
uuid.h
xfs_acl.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_acl.h xfs: move acl structures to xfs_format.h 2014-11-28 14:24:37 +11:00
xfs_aops.c xfs: add missing ilock around dio write last extent alignment 2015-10-12 15:34:20 +11:00
xfs_aops.h xfs: add DAX file operations support 2015-06-04 09:18:53 +10:00
xfs_attr_inactive.c Merge branch 'xfs-misc-fixes-for-4.2-3' into for-next 2015-06-23 08:49:01 +10:00
xfs_attr_list.c xfs: pass attr geometry to attr leaf header conversion functions 2015-04-13 11:26:02 +10:00
xfs_attr.h
xfs_bmap_util.c xfs: add missing bmap cancel calls in error paths 2015-08-19 10:01:40 +10:00
xfs_bmap_util.h xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate 2015-03-25 15:08:56 +11:00
xfs_buf_item.c Merge branch 'xfs-misc-fixes-for-4.3-3' into for-next 2015-08-25 10:13:35 +10:00
xfs_buf_item.h xfs: fix non-debug build warnings 2015-08-25 10:05:13 +10:00
xfs_buf.c xfs: updates for 4.3-rc1 2015-09-07 13:28:32 -07:00
xfs_buf.h dax: move DAX-related functions to a new header 2015-09-08 15:35:28 -07:00
xfs_dir2_readdir.c xfs: stop holding ILOCK over filldir callbacks 2015-08-19 10:33:00 +10:00
xfs_discard.c xfs: pass mp to XFS_WANT_CORRUPTED_GOTO 2015-02-23 22:39:08 +11:00
xfs_discard.h
xfs_dquot_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_dquot_item.h
xfs_dquot.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_dquot.h xfs: fix implicit bool to int conversion 2015-01-09 10:48:58 +11:00
xfs_error.c xfs: remove inst_t 2015-06-22 09:44:02 +10:00
xfs_error.h xfs: remove inst_t 2015-06-22 09:44:02 +10:00
xfs_export.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_export.h
xfs_extent_busy.c xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_extent_busy.h
xfs_extfree_item.c xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_extfree_item.h xfs: fix efi/efd error handling to avoid fs shutdown hangs 2015-08-19 09:51:16 +10:00
xfs_file.c xfs: always drain dio before extending aio write submission 2015-10-12 16:02:05 +11:00
xfs_filestream.c xfs: clean up XFS_MIN_FREELIST macros 2015-06-22 10:13:30 +10:00
xfs_filestream.h
xfs_fsops.c xfs: growfs not aware of sb_meta_uuid 2015-08-19 10:31:41 +10:00
xfs_fsops.h
xfs_globals.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_icache.c xfs: add mssing inode cache attempts counter increment 2015-08-28 14:50:56 +10:00
xfs_icache.h xfs: merge xfs_ag.h into xfs_format.h 2014-11-28 14:25:04 +11:00
xfs_icreate_item.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_icreate_item.h
xfs_inode_item.c xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_inode_item.h
xfs_inode.c Merge branch 'xfs-misc-fixes-for-4.3-3' into for-next 2015-08-25 10:13:35 +10:00
xfs_inode.h xfs: clean up inode lockdep annotations 2015-08-19 10:32:49 +10:00
xfs_ioctl32.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs_ioctl32.h xfs: compat_xfs_bstat does not have forkoff 2014-10-02 09:17:58 +10:00
xfs_ioctl.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_ioctl.h
xfs_iomap.c xfs: add missing ilock around dio write last extent alignment 2015-10-12 15:34:20 +11:00
xfs_iomap.h xfs: pass a 64-bit count argument to xfs_iomap_write_unwritten 2015-01-09 10:48:12 +11:00
xfs_iops.c xfs: fix error gotos in xfs_setattr_nonsize 2015-08-28 14:51:10 +10:00
xfs_iops.h xfs: inodes are new until the dentry cache is set up 2015-02-23 22:38:08 +11:00
xfs_itable.c xfs: fix btree cursor error cleanups 2015-08-19 10:00:53 +10:00
xfs_itable.h xfs: bulkstat chunk formatting cursor is broken 2014-11-07 08:30:30 +11:00
xfs_linux.h xfs: remove xfs_caddr_t 2015-06-22 09:45:10 +10:00
xfs_log_cil.c xfs: close xc_cil list_empty() races with cil commit sequence 2015-07-29 11:51:01 +10:00
xfs_log_priv.h xfs: don't leave EFIs on AIL on mount failure 2015-08-19 09:58:36 +10:00
xfs_log_recover.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_log.c Merge branch 'xfs-efi-rework' into for-next 2015-08-19 10:10:47 +10:00
xfs_log.h xfs: don't leave EFIs on AIL on mount failure 2015-08-19 09:58:36 +10:00
xfs_message.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_message.h
xfs_mount.c xfs: clean up root inode properly on mount failure 2015-08-19 10:00:28 +10:00
xfs_mount.h Merge branch 'xfs-dax-support' into for-next 2015-06-04 13:01:49 +10:00
xfs_mru_cache.c xfs: xfs_mru_cache_insert() should use GFP_NOFS 2015-03-25 14:57:53 +11:00
xfs_mru_cache.h
xfs_pnfs.c xfs: add missing ilock around dio write last extent alignment 2015-10-12 15:34:20 +11:00
xfs_pnfs.h xfs: unlock i_mutex in xfs_break_layouts 2015-04-13 11:38:29 +10:00
xfs_qm_bhv.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_qm_syscalls.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_qm.c xfs: saner xfs_trans_commit interface 2015-06-04 13:48:08 +10:00
xfs_qm.h xfs: Convert to using ->get_state callback 2015-03-04 16:06:36 +01:00
xfs_quota.h xfs: fix quota block reservation leak when tp allocates and frees blocks 2015-06-01 07:15:37 +10:00
xfs_quotaops.c xfs: Add support for Q_SETINFO 2015-03-04 16:06:38 +01:00
xfs_rtalloc.c xfs: add missing bmap cancel calls in error paths 2015-08-19 10:01:40 +10:00
xfs_rtalloc.h xfs: combine xfs_rtmodify_summary and xfs_rtget_summary 2014-09-09 11:58:42 +10:00
xfs_stats.c
xfs_stats.h
xfs_super.c xfs: updates for 4.3-rc1 2015-09-07 13:28:32 -07:00
xfs_super.h xfs: Remove icsb infrastructure 2015-02-23 21:22:31 +11:00
xfs_symlink.c Merge branch 'xfs-misc-fixes-for-4.3-2' into for-next 2015-08-20 09:28:45 +10:00
xfs_symlink.h
xfs_sysctl.c xfs: remove deprecated sysctls 2015-01-09 10:47:43 +11:00
xfs_sysctl.h xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.c xfs: export log_recovery_delay to delay mount time log recovery 2014-09-09 11:56:13 +10:00
xfs_sysfs.h xfs: add debug sysfs attribute set 2014-09-09 11:52:42 +10:00
xfs_trace.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trace.h xfs: huge page fault support 2015-09-08 15:35:28 -07:00
xfs_trans_ail.c xfs: remove __psint_t and __psunsigned_t 2015-06-22 09:43:32 +10:00
xfs_trans_buf.c xfs: only trace buffer items if they exist 2015-02-10 09:23:40 +11:00
xfs_trans_dquot.c xfs: Clean up xfs_trans_dup_dqinfo 2015-06-01 10:50:00 +10:00
xfs_trans_extfree.c xfs: ensure EFD trans aborts on log recovery extent free failure 2015-08-19 09:51:43 +10:00
xfs_trans_inode.c xfs: move most of xfs_sb.h to xfs_format.h 2014-11-28 14:27:09 +11:00
xfs_trans_priv.h xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_trans.c xfs: return committed status from xfs_trans_roll() 2015-08-19 09:50:13 +10:00
xfs_trans.h xfs: ensure EFD trans aborts on log recovery extent free failure 2015-08-19 09:51:43 +10:00
xfs_xattr.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
xfs.h