linux/fs/xfs
Darrick J. Wong 5eda430000 xfs: mark speculative prealloc CoW fork extents unwritten
Christoph Hellwig pointed out that there's a potentially nasty race when
performing simultaneous nearby directio cow writes:

"Thread 1 writes a range from B to c

"                    B --------- C
                           p

"a little later thread 2 writes from A to B

"        A --------- B
               p

[editor's note: the 'p' denote cowextsize boundaries, which I added to
make this more clear]

"but the code preallocates beyond B into the range where thread
"1 has just written, but ->end_io hasn't been called yet.
"But once ->end_io is called thread 2 has already allocated
"up to the extent size hint into the write range of thread 1,
"so the end_io handler will splice the unintialized blocks from
"that preallocation back into the file right after B."

We can avoid this race by ensuring that thread 1 cannot accidentally
remap the blocks that thread 2 allocated (as part of speculative
preallocation) as part of t2's write preparation in t1's end_io handler.
The way we make this happen is by taking advantage of the unwritten
extent flag as an intermediate step.

Recall that when we begin the process of writing data to shared blocks,
we create a delayed allocation extent in the CoW fork:

D: --RRRRRRSSSRRRRRRRR---
C: ------DDDDDDD---------

When a thread prepares to CoW some dirty data out to disk, it will now
convert the delalloc reservation into an /unwritten/ allocated extent in
the cow fork.  The da conversion code tries to opportunistically
allocate as much of a (speculatively prealloc'd) extent as possible, so
we may end up allocating a larger extent than we're actually writing
out:

D: --RRRRRRSSSRRRRRRRR---
U: ------UUUUUUU---------

Next, we convert only the part of the extent that we're actively
planning to write to normal (i.e. not unwritten) status:

D: --RRRRRRSSSRRRRRRRR---
U: ------UURRUUU---------

If the write succeeds, the end_cow function will now scan the relevant
range of the CoW fork for real extents and remap only the real extents
into the data fork:

D: --RRRRRRRRSRRRRRRRR---
U: ------UU--UUU---------

This ensures that we never obliterate valid data fork extents with
unwritten blocks from the CoW fork.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-02-02 15:14:02 -08:00
..
libxfs xfs: allow unwritten extents in the CoW fork 2017-02-02 15:14:01 -08:00
Kconfig xfs: implement iomap based buffered write path 2016-06-21 09:53:44 +10:00
kmem.c xfs: improve kmem_realloc 2016-04-06 09:47:01 +10:00
kmem.h xfs: improve kmem_realloc 2016-04-06 09:47:01 +10:00
Makefile xfs: introduce the CoW fork 2016-10-04 18:06:40 -07:00
mrlock.h
uuid.c
uuid.h
xfs_acl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-10-10 20:16:43 -07:00
xfs_acl.h xfs: Change how listxattr generates synthetic attributes 2015-12-06 21:34:16 -05:00
xfs_aops.c xfs: mark speculative prealloc CoW fork extents unwritten 2017-02-02 15:14:02 -08:00
xfs_aops.h xfs: use iomap_dio_rw 2016-11-30 14:37:15 +11:00
xfs_attr_inactive.c xfs: make several functions static 2016-06-01 17:38:15 +10:00
xfs_attr_list.c xfs: several xattr functions can be void 2016-12-05 12:32:14 +11:00
xfs_attr.h xfs: several xattr functions can be void 2016-12-05 12:32:14 +11:00
xfs_bmap_item.c xfs: when replaying bmap operations, don't let unlinked inodes get reaped 2016-10-04 11:05:44 -07:00
xfs_bmap_item.h xfs: log bmap intent items 2016-10-04 11:05:44 -07:00
xfs_bmap_util.c xfs: remove unused full argument from bmap 2017-01-30 16:32:25 -08:00
xfs_bmap_util.h xfs: remove unused full argument from bmap 2017-01-30 16:32:25 -08:00
xfs_buf_item.c xfs: normalize "infinite" retries in error configs 2016-09-14 07:51:30 +10:00
xfs_buf_item.h xfs: fix non-debug build warnings 2015-08-25 10:05:13 +10:00
xfs_buf.c xfs: clear _XBF_PAGES from buffers when readahead page 2017-01-25 20:24:57 -08:00
xfs_buf.h xfs: use rhashtable to track buffer cache 2016-12-07 17:36:36 +11:00
xfs_dir2_readdir.c xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_discard.c xfs: rmap btree requires more reserved free space 2016-08-03 11:38:24 +10:00
xfs_discard.h
xfs_dquot_item.c xfs: allocate log vector buffers outside CIL context lock 2016-07-22 09:52:35 +10:00
xfs_dquot_item.h
xfs_dquot.c xfs: don't wrap ID in xfs_dq_get_next_id 2017-01-17 11:43:38 -08:00
xfs_dquot.h
xfs_error.c Merge branch 'xfs-4.8-misc-fixes-3' into for-next 2016-07-20 11:51:08 +10:00
xfs_error.h xfs: simulate per-AG reservations being critically low 2016-10-05 16:26:31 -07:00
xfs_export.c xfs: abstract block export operations from nfsd layouts 2016-07-15 15:31:29 -04:00
xfs_export.h
xfs_extent_busy.c xfs: remote attribute blocks aren't really userdata 2016-09-26 08:21:28 +10:00
xfs_extent_busy.h
xfs_extfree_item.c xfs: remove unnecessary parentheses from log redo item recovery functions 2016-08-03 12:29:32 +10:00
xfs_extfree_item.h xfs: refactor redo intent item processing 2016-08-03 11:23:49 +10:00
xfs_file.c xfs: fail _dir_open when readahead fails 2017-02-02 15:13:58 -08:00
xfs_filestream.c Merge branch 'xfs-4.9-log-recovery-fixes' into for-next 2016-10-03 09:56:28 +11:00
xfs_filestream.h
xfs_fsops.c xfs: remove boilerplate around xfs_btree_init_block 2017-01-30 16:32:24 -08:00
xfs_fsops.h xfs: preallocate blocks for worst-case btree expansion 2016-10-05 16:26:27 -07:00
xfs_globals.c xfs: garbage collect old cowextsz reservations 2016-10-05 16:26:28 -07:00
xfs_icache.c xfs: sync eofblocks scans under iolock are livelock prone 2017-01-30 16:32:25 -08:00
xfs_icache.h xfs: sync eofblocks scans under iolock are livelock prone 2017-01-30 16:32:25 -08:00
xfs_icreate_item.c fs: xfs: xfs_icreate_item: constify xfs_item_ops structure 2016-11-28 14:57:42 +11:00
xfs_icreate_item.h
xfs_inode_item.c xfs: provide helper for counting extents from if_bytes 2016-11-08 12:59:42 +11:00
xfs_inode_item.h xfs: remove timestamps from incore inode 2016-02-09 16:54:58 +11:00
xfs_inode.c xfs: pull up iolock from xfs_free_eofblocks() 2017-01-30 16:32:25 -08:00
xfs_inode.h xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_ioctl32.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xfs_ioctl32.h
xfs_ioctl.c xfs: remove unused full argument from bmap 2017-01-30 16:32:25 -08:00
xfs_ioctl.h xfs: don't pass ioflags around in the ioctl path 2016-07-20 11:29:35 +10:00
xfs_iomap.c xfs: mark speculative prealloc CoW fork extents unwritten 2017-02-02 15:14:02 -08:00
xfs_iomap.h iomap: constify struct iomap_ops 2017-01-30 16:32:25 -08:00
xfs_iops.c xfs: sanity check inode mode when creating new dentry 2017-01-17 11:42:22 -08:00
xfs_iops.h xfs: Propagate dentry down to inode_change_ok() 2016-09-22 10:56:19 +02:00
xfs_itable.c xfs: create a separate cow extent size hint for the allocator 2016-10-05 16:26:26 -07:00
xfs_itable.h
xfs_linux.h xfs: make the ASSERT() condition likely 2017-01-17 11:41:41 -08:00
xfs_log_cil.c xfs: allocate log vector buffers outside CIL context lock 2016-07-22 09:52:35 +10:00
xfs_log_priv.h xfs: rework log recovery to submit buffers on LSN boundaries 2016-09-26 08:22:16 +10:00
xfs_log_recover.c Merge branch 'xfs-4.10-misc-fixes-3' into for-next 2016-12-07 17:42:30 +11:00
xfs_log.c xfs: don't print warnings when xfs_log_force fails 2017-01-09 13:45:01 -08:00
xfs_log.h xfs: remove unused struct declarations 2017-01-30 16:32:25 -08:00
xfs_message.c xfs: more info from kmem deadlocks and high-level error msgs 2015-10-12 16:04:45 +11:00
xfs_message.h
xfs_mount.c xfs: use rhashtable to track buffer cache 2016-12-07 17:36:36 +11:00
xfs_mount.h xfs: use per-AG reservations for the finobt 2017-01-25 07:49:35 -08:00
xfs_mru_cache.c xfs: xfs_mru_cache_insert() should use GFP_NOFS 2015-03-25 14:57:53 +11:00
xfs_mru_cache.h
xfs_ondisk.h xfs: define the on-disk refcount btree format 2016-10-03 09:11:18 -07:00
xfs_pnfs.c xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_pnfs.h xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_qm_bhv.c
xfs_qm_syscalls.c xfs: better xfs_trans_alloc interface 2016-04-06 09:19:55 +10:00
xfs_qm.c xfs: prevent quotacheck from overloading inode lru 2017-01-27 09:32:30 -08:00
xfs_qm.h xfs: Split default quota limits by quota type 2016-02-08 11:27:55 +11:00
xfs_quota.h xfs: fix quota block reservation leak when tp allocates and frees blocks 2015-06-01 07:15:37 +10:00
xfs_quotaops.c xfs: wire up Q_XGETNEXTQUOTA / get_nextdqblk 2016-02-08 11:27:38 +11:00
xfs_refcount_item.c xfs: fix double-cleanup when CUI recovery fails 2017-01-03 18:39:32 -08:00
xfs_refcount_item.h xfs: log refcount intent items 2016-10-03 09:11:21 -07:00
xfs_reflink.c xfs: mark speculative prealloc CoW fork extents unwritten 2017-02-02 15:14:02 -08:00
xfs_reflink.h xfs: mark speculative prealloc CoW fork extents unwritten 2017-02-02 15:14:02 -08:00
xfs_rmap_item.c xfs: convert unwritten status of reverse mappings for shared files 2016-10-05 16:26:29 -07:00
xfs_rmap_item.h xfs: convert RUI log formats to use variable length arrays 2016-09-19 10:24:27 +10:00
xfs_rtalloc.c xfs: rename flist/free_list to dfops 2016-08-03 11:19:29 +10:00
xfs_rtalloc.h xfs: make several functions static 2016-06-01 17:38:15 +10:00
xfs_stats.c xfs: make xfs btree stats less huge 2016-12-05 14:38:58 +11:00
xfs_stats.h xfs: make xfs btree stats less huge 2016-12-05 14:38:58 +11:00
xfs_super.c xfs: deprecate barrier/nobarrier mount option 2016-12-09 16:49:54 +11:00
xfs_super.h xfs: quiesce the filesystem after recovery on readonly mount 2016-09-26 08:21:44 +10:00
xfs_symlink.c xfs: remove i_iolock and use i_rwsem in the VFS inode instead 2016-11-30 14:33:25 +11:00
xfs_symlink.h
xfs_sysctl.c xfs: garbage collect old cowextsz reservations 2016-10-05 16:26:28 -07:00
xfs_sysctl.h xfs: garbage collect old cowextsz reservations 2016-10-05 16:26:28 -07:00
xfs_sysfs.c xfs: fix max_retries _show and _store functions 2017-01-03 20:34:17 -08:00
xfs_sysfs.h xfs: configurable error behavior via sysfs 2016-05-18 10:58:51 +10:00
xfs_trace.c xfs: rework xfs_bmap_free callers to use xfs_defer_ops 2016-08-03 11:15:38 +10:00
xfs_trace.h xfs: mark speculative prealloc CoW fork extents unwritten 2017-02-02 15:14:02 -08:00
xfs_trans_ail.c xfs: Make xfsaild freezeable again 2016-02-08 14:59:07 +11:00
xfs_trans_bmap.c xfs: implement deferred bmbt map/unmap operations 2016-10-04 11:05:44 -07:00
xfs_trans_buf.c xfs: remove XBF_STALE flag wrapper macros 2016-02-10 15:01:11 +11:00
xfs_trans_dquot.c xfs: Split default quota limits by quota type 2016-02-08 11:27:55 +11:00
xfs_trans_extfree.c xfs: set up per-AG free space reservations 2016-09-19 10:30:52 +10:00
xfs_trans_inode.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
xfs_trans_priv.h xfs: add helper to conditionally remove items from the AIL 2015-08-19 10:01:08 +10:00
xfs_trans_refcount.c xfs: connect refcount adjust functions to upper layers 2016-10-03 09:11:22 -07:00
xfs_trans_rmap.c xfs: add shared rmap map/unmap/convert log item types 2016-10-05 16:26:29 -07:00
xfs_trans.c Merge branch 'xfs-4.9-reflink-prep' into for-next 2016-10-03 09:52:31 +11:00
xfs_trans.h xfs: remove unused struct declarations 2017-01-30 16:32:25 -08:00
xfs_xattr.c xfs: several xattr functions can be void 2016-12-05 12:32:14 +11:00
xfs.h