linux/fs/btrfs
Filipe Manana 669249eea3 Btrfs: fix fsync race leading to invalid data after log replay
When the fsync callback (btrfs_sync_file) starts, it first waits for
the writeback of any dirty pages to start and finish without holding
the inode's mutex (to reduce contention). After this it acquires the
inode's mutex and repeats that process via btrfs_wait_ordered_range
only if we're doing a full sync (BTRFS_INODE_NEEDS_FULL_SYNC flag
is set on the inode).

This is not safe for a non full sync - we need to start and wait for
writeback to finish for any pages that might have been made dirty
before acquiring the inode's mutex and after that first step mentioned
before. Why this is needed is explained by the following comment added
to btrfs_sync_file:

  "Right before acquiring the inode's mutex, we might have new
   writes dirtying pages, which won't immediately start the
   respective ordered operations - that is done through the
   fill_delalloc callbacks invoked from the writepage and
   writepages address space operations. So make sure we start
   all ordered operations before starting to log our inode. Not
   doing this means that while logging the inode, writeback
   could start and invoke writepage/writepages, which would call
   the fill_delalloc callbacks (cow_file_range,
   submit_compressed_extents). These callbacks add first an
   extent map to the modified list of extents and then create
   the respective ordered operation, which means in
   tree-log.c:btrfs_log_inode() we might capture all existing
   ordered operations (with btrfs_get_logged_extents()) before
   the fill_delalloc callback adds its ordered operation, and by
   the time we visit the modified list of extent maps (with
   btrfs_log_changed_extents()), we see and process the extent
   map they created. We then use the extent map to construct a
   file extent item for logging without waiting for the
   respective ordered operation to finish - this file extent
   item points to a disk location that might not have yet been
   written to, containing random data - so after a crash a log
   replay will make our inode have file extent items that point
   to disk locations containing invalid data, as we returned
   success to userspace without waiting for the respective
   ordered operation to finish, because it wasn't captured by
   btrfs_get_logged_extents()."

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-19 06:57:50 -07:00
..
tests Btrfs: improve free space cache management and space allocation 2014-09-17 13:38:13 -07:00
acl.c btrfs: remove useless ACL check 2014-06-09 17:20:42 -07:00
async-thread.c Btrfs: implement repair function when direct read fails 2014-09-17 13:39:01 -07:00
async-thread.h Btrfs: implement repair function when direct read fails 2014-09-17 13:39:01 -07:00
backref.c Btrfs: make fiemap not blow when you have lots of snapshots 2014-09-17 13:38:24 -07:00
backref.h Btrfs: make fiemap not blow when you have lots of snapshots 2014-09-17 13:38:24 -07:00
btrfs_inode.h Btrfs: implement repair function when direct read fails 2014-09-17 13:39:01 -07:00
check-integrity.c Btrfs: fix wrong disk size when writing super blocks 2014-09-17 13:38:33 -07:00
check-integrity.h block: submit_bio_wait() conversions 2013-11-24 16:33:41 -07:00
compression.c btrfs: use DIV_ROUND_UP instead of open-coded variants 2014-09-17 13:37:17 -07:00
compression.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
ctree.c Btrfs: make btrfs_search_forward return with nodes unlocked 2014-09-17 13:38:02 -07:00
ctree.h Btrfs: implement repair function when direct read fails 2014-09-17 13:39:01 -07:00
delayed-inode.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
delayed-inode.h Btrfs: introduce the delayed inode ref deletion for the single link inode 2014-01-28 13:20:09 -08:00
delayed-ref.c Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
delayed-ref.h Btrfs: rework qgroup accounting 2014-06-09 17:20:48 -07:00
dev-replace.c Btrfs: make the logic of source device removing more clear 2014-09-17 13:38:46 -07:00
dev-replace.h Btrfs: add new sources for device replace code 2012-12-12 17:15:41 -05:00
dir-item.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
disk-io.c Btrfs: implement repair function when direct read fails 2014-09-17 13:39:01 -07:00
disk-io.h Btrfs: implement repair function when direct read fails 2014-09-17 13:39:01 -07:00
export.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
export.h
extent_io.c Btrfs: cleanup the read failure record after write or when the inode is freeing 2014-09-17 13:39:02 -07:00
extent_io.h Btrfs: cleanup the read failure record after write or when the inode is freeing 2014-09-17 13:39:02 -07:00
extent_map.c Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent_map.h Btrfs: fix NULL pointer crash when running balance and scrub concurrently 2014-06-19 14:20:55 -07:00
extent-tree.c Btrfs: Fix misuse of chunk mutex 2014-09-17 13:38:42 -07:00
file-item.c Btrfs: load checksum data once when submitting a direct read io 2014-09-17 13:38:50 -07:00
file.c Btrfs: fix fsync race leading to invalid data after log replay 2014-09-19 06:57:50 -07:00
free-space-cache.c Btrfs: improve free space cache management and space allocation 2014-09-17 13:38:13 -07:00
free-space-cache.h Btrfs: remove path arg from btrfs_truncate_free_space_cache 2013-11-11 21:51:33 -05:00
hash.c btrfs: use PTR_ERR_OR_ZERO 2014-09-17 13:37:29 -07:00
hash.h Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
inode-item.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
inode-map.c btrfs: cleanup ino cache members of btrfs_root 2014-09-17 13:37:09 -07:00
inode-map.h
inode.c btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map 2014-09-18 07:14:46 -07:00
ioctl.c Btrfs: fix unprotected device's variants on 32bits machine 2014-09-17 13:38:38 -07:00
Kconfig Btrfs: fix btrfs boot when compiled as built-in 2014-01-28 13:20:31 -08:00
locking.c Btrfs: fix deadlocks with trylock on tree nodes 2014-06-19 14:19:55 -07:00
locking.h Btrfs: remove btrfs_try_spin_lock 2013-03-14 14:57:10 -04:00
lzo.c btrfs: use DIV_ROUND_UP instead of open-coded variants 2014-09-17 13:37:17 -07:00
Makefile Btrfs: add sanity tests for new qgroup accounting code 2014-06-09 17:20:49 -07:00
math.h Btrfs: cleanup duplicated division functions 2012-12-11 13:31:30 -05:00
ordered-data.c Btrfs: fix task hang under heavy compressed write 2014-08-24 07:17:02 -07:00
ordered-data.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
orphan.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
print-tree.c btrfs: use nodesize everywhere, kill leafsize 2014-09-17 13:37:14 -07:00
print-tree.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
props.c Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
props.h Btrfs: add support for inode properties 2014-01-28 13:20:24 -08:00
qgroup.c btrfs: don't go readonly on existing qgroup items 2014-09-17 13:38:19 -07:00
qgroup.h btrfs: qgroup: account shared subtrees during snapshot delete 2014-08-15 07:43:14 -07:00
raid56.c btrfs: use DIV_ROUND_UP instead of open-coded variants 2014-09-17 13:37:17 -07:00
raid56.h Btrfs: RAID5 and RAID6 2013-02-01 14:24:23 -05:00
rcu-string.h
reada.c btrfs: use nodesize everywhere, kill leafsize 2014-09-17 13:37:14 -07:00
relocation.c btrfs: use nodesize everywhere, kill leafsize 2014-09-17 13:37:14 -07:00
root-tree.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
scrub.c Btrfs: modify clean_io_failure and make it suit direct io 2014-09-17 13:38:59 -07:00
send.c Btrfs: send, lower mem requirements for processing xattrs 2014-09-17 13:38:16 -07:00
send.h btrfs: make static code static & remove dead code 2013-05-06 15:55:23 -04:00
struct-funcs.c Btrfs: rewrite BTRFS_SETGET_FUNCS 2012-07-23 16:28:06 -04:00
super.c Btrfs: fix unprotected device list access when getting the fs information 2014-09-17 13:38:41 -07:00
sysfs.c btrfs: sysfs label interface should check for read only FS 2014-09-17 13:38:01 -07:00
sysfs.h btrfs: code optimize: BTRFS_ATTR_RW could set the mode 2014-09-17 13:37:59 -07:00
transaction.c Btrfs: fix wrong device bytes_used in the super block 2014-09-17 13:38:34 -07:00
transaction.h btrfs: disable strict file flushes for renames and truncates 2014-08-15 07:43:42 -07:00
tree-defrag.c Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root 2014-06-09 17:20:40 -07:00
tree-log.c Btrfs: fix directory recovery from fsync log 2014-09-17 13:38:27 -07:00
tree-log.h Btrfs: fix fsync data loss after a ranged fsync 2014-09-08 13:56:43 -07:00
ulist.c Btrfs: do not export ulist functions 2014-01-29 07:06:27 -08:00
ulist.h Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch 2014-08-15 07:43:19 -07:00
uuid-tree.c Btrfs: make btrfs_search_forward return with nodes unlocked 2014-09-17 13:38:02 -07:00
volumes.c Btrfs: Set real mirror number for read operation on RAID0/5/6 2014-09-17 13:39:00 -07:00
volumes.h Btrfs: do file data check by sub-bio's self 2014-09-17 13:38:53 -07:00
xattr.c btrfs: kill the key type accessor helpers 2014-09-17 13:37:12 -07:00
xattr.h btrfs: use generic posix ACL infrastructure 2014-01-25 23:58:18 -05:00
zlib.c btrfs compression: merge inflate and deflate z_streams 2014-09-17 13:37:33 -07:00