linux/fs/btrfs
Liu Bo 18e83ac75b Btrfs: fix unexpected EEXIST from btrfs_get_extent
This fixes a corner case caused by a race between a dio write and a
concurrent dio read/write.

Here is how the race could happen.

Suppose that no extent map has been loaded into memory yet.
There is a file extent [0, 32K) and two jobs are running concurrently
against it: t1 is doing a dio write to [8K, 32K) and t2 is doing a dio
read from [0, 4K) or [4K, 8K).

t1 goes ahead of t2 and splits em [0, 32K) into em [0, 8K) and
[8K, 32K).

------------------------------------------------------
             t1                                t2
      btrfs_get_blocks_direct()         btrfs_get_blocks_direct()
       -> btrfs_get_extent()              -> btrfs_get_extent()
           -> lookup_extent_mapping()
           -> add_extent_mapping()            -> lookup_extent_mapping()
              # load [0, 32K)
       -> btrfs_new_extent_direct()
           -> btrfs_drop_extent_cache()
              # split [0, 32K) and
              # drop [8K, 32K)
           -> add_extent_mapping()
              # add [8K, 32K)
                                              -> add_extent_mapping()
                                                 # handle -EEXIST when adding
                                                 # [0, 32K)
------------------------------------------------------
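The timeline above can be replayed with a toy model. This is a minimal
user-space sketch, not the kernel code: struct toy_em, struct toy_tree and
the toy_* helpers are hypothetical names, and a flat array stands in for
the real extent map tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_K 1024ULL
#define TOY_MAX_EMS 8

/* Hypothetical half-open range [start, start + len), standing in for an em. */
struct toy_em {
	uint64_t start;
	uint64_t len;
};

struct toy_tree {
	struct toy_em ems[TOY_MAX_EMS];
	size_t nr;
};

static uint64_t toy_end(const struct toy_em *em)
{
	return em->start + em->len;
}

/* Add a range; fail with -EEXIST if it overlaps anything already cached. */
static int toy_add(struct toy_tree *tree, uint64_t start, uint64_t len)
{
	assert(len > 0);
	for (size_t i = 0; i < tree->nr; i++) {
		const struct toy_em *cur = &tree->ems[i];

		if (start < toy_end(cur) && cur->start < start + len)
			return -EEXIST;
	}
	if (tree->nr == TOY_MAX_EMS)
		return -ENOMEM;
	tree->ems[tree->nr++] = (struct toy_em){ start, len };
	return 0;
}

/* Drop everything from start onward, keeping the front piece of a straddler. */
static void toy_drop_from(struct toy_tree *tree, uint64_t start)
{
	size_t kept = 0;

	for (size_t i = 0; i < tree->nr; i++) {
		struct toy_em cur = tree->ems[i];

		if (cur.start >= start)
			continue;			/* entirely dropped */
		if (toy_end(&cur) > start)
			cur.len = start - cur.start;	/* keep the front piece */
		tree->ems[kept++] = cur;
	}
	tree->nr = kept;
}

/* Replay the interleaving; returns what t2's add sees. */
static int toy_replay(struct toy_tree *tree)
{
	tree->nr = 0;
	if (toy_add(tree, 0, 32 * TOY_K))		/* t1: load [0, 32K) */
		return -EINVAL;
	toy_drop_from(tree, 8 * TOY_K);			/* t1: leaves [0, 8K) */
	if (toy_add(tree, 8 * TOY_K, 24 * TOY_K))	/* t1: add [8K, 32K) */
		return -EINVAL;
	return toy_add(tree, 0, 32 * TOY_K);		/* t2: stale [0, 32K) */
}
```

In this model toy_replay() leaves [0, 8K) and [8K, 32K) in the array, and
the stale [0, 32K) that t2 built from its earlier lookup no longer fits.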
How t2 (dio read/write) runs into -EEXIST:

a) add_extent_mapping() returns -EEXIST when adding em [0, 32k),

b) search_extent_mapping() then returns [0, 8k) as the existing em;
   although start == existing->start, em is [0, 32k), so
   extent_map_end(em) > extent_map_end(existing), i.e. 32k > 8k,

c) then it goes through merge_extent_mapping(), which tries to add a
   zero-length em [8k, 8k) and gets -EEXIST because [8k, 32k) is already
   in the tree,

d) so btrfs_get_extent() ends up returning -EEXIST to dio read/write,
   which confuses applications.
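
Step c) can be reproduced with plain arithmetic. The sketch below is a
simplified user-space model, assuming the merge trims the candidate em so
that it starts where the previous em ends and stops where the next em
begins; struct toy_range and toy_clamp() are hypothetical names, not
kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open range [start, start + len). */
struct toy_range {
	uint64_t start;
	uint64_t len;
};

static uint64_t toy_range_end(struct toy_range r)
{
	return r.start + r.len;
}

/*
 * Trim em so that it fits between prev and next. A NULL neighbor means
 * there is nothing on that side. The neighbors are assumed not to overlap
 * each other or to extend past the end of em.
 */
static struct toy_range toy_clamp(struct toy_range em,
				  const struct toy_range *prev,
				  const struct toy_range *next)
{
	uint64_t start = prev ? toy_range_end(*prev) : em.start;
	uint64_t end = next ? next->start : toy_range_end(em);

	if (start < em.start)
		start = em.start;
	if (end > toy_range_end(em))
		end = toy_range_end(em);
	assert(start <= end);
	return (struct toy_range){ start, end - start };
}
```

With em = [0, 32k), prev = [0, 8k) and next = [8k, 32k), both bounds land
on 8k, so the trimmed range is empty: there is no gap left to fill.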

All the possible situations are listed below, where start is the offset
being looked up:
1) start < existing->start

            +-----------+em+-----------+
+--prev---+ |     +-------------+      |
|         | |     |             |      |
+---------+ +     +---+existing++      ++
                +
                |
                +
             start

2) start == existing->start

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
            |
            |
            +
         start

3) start > existing->start && start < (existing->start + existing->len)

      +------------em------------+
      |     +-------------+      |
      |     |             |      |
      +     +----existing-+      +
               |
               |
               +
             start

4) start >= (existing->start + existing->len)

+-----------+em+-----------+
|     +-------------+      | +--next---+
|     |             |      | |         |
+     +---+existing++      + +---------+
                      +
                      |
                      +
                   start

As we can see, if start is within the existing em (front inclusive),
i.e. cases 2 and 3, the existing em should be returned as is. Otherwise
we try our best to merge the candidate em with sibling ems to form a
larger em, in order to reduce the total number of ems.
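
The rule above boils down to one containment test. A minimal sketch,
assuming half-open ranges; struct span, span_contains() and on_eexist()
are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open range [start, start + len). */
struct span {
	uint64_t start;
	uint64_t len;
};

enum eexist_action {
	USE_EXISTING,		/* cases 2 and 3 */
	MERGE_WITH_SIBLINGS,	/* cases 1 and 4 */
};

/* Front inclusive, end exclusive. */
static bool span_contains(struct span s, uint64_t pos)
{
	return pos >= s.start && pos < s.start + s.len;
}

/* Decide what to do after adding the candidate em failed with -EEXIST. */
static enum eexist_action on_eexist(struct span existing, uint64_t start)
{
	assert(existing.len > 0);
	return span_contains(existing, start) ? USE_EXISTING
					      : MERGE_WITH_SIBLINGS;
}
```

For an existing em of [8k, 16k), start values of 8k and 12k pick
USE_EXISTING, while 4k and 16k pick MERGE_WITH_SIBLINGS.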

Reported-by: David Vallender <david.vallender@landmark.co.uk>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22 16:08:21 +01:00
tests btrfs: Remove redundant FLAG_VACANCY 2018-01-22 16:08:14 +01:00
acl.c btrfs: preserve i_mode if __btrfs_set_acl() fails 2017-08-21 17:47:42 +02:00
async-thread.c Btrfs: fix confusing worker helper info in stacktrace 2017-10-30 12:27:57 +01:00
async-thread.h btrfs: constify tracepoint arguments 2017-08-16 14:19:53 +02:00
backref.c btrfs: make function update_share_count static 2018-01-22 16:08:14 +01:00
backref.h btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents 2017-11-01 20:45:34 +01:00
btrfs_inode.h btrfs: make the delalloc block rsv per inode 2017-11-01 20:45:35 +01:00
check-integrity.c btrfs: Fix bug for misused dev_t when lookup in dev state hash table. 2017-11-01 20:45:36 +01:00
check-integrity.h
compression.c btrfs: Remove redundant bio_get/set calls in compressed read/write paths 2018-01-22 16:08:19 +01:00
compression.h btrfs: compression: add helper for type to string conversion 2018-01-22 16:08:16 +01:00
ctree.c btrfs: Improve btrfs_search_slot description 2018-01-22 16:08:19 +01:00
ctree.h Btrfs: remove unused wait in btrfs_stripe_hash 2018-01-22 16:08:19 +01:00
dedupe.h
delayed-inode.c btrfs: Move checks from btrfs_wq_run_delayed_node to btrfs_balance_delayed_items 2018-01-22 16:08:11 +01:00
delayed-inode.h
delayed-ref.c Btrfs: add __init macro to btrfs init functions 2018-01-22 16:08:11 +01:00
delayed-ref.h Btrfs: add __init macro to btrfs init functions 2018-01-22 16:08:11 +01:00
dev-replace.c btrfs: cleanup device states define BTRFS_DEV_STATE_REPLACE_TGT 2018-01-22 16:08:15 +01:00
dev-replace.h
dir-item.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
disk-io.c btrfs: fail mount when sb flag is not in BTRFS_SUPER_FLAG_SUPP 2018-01-22 16:08:21 +01:00
disk-io.h btrfs: sink get_extent parameter to read_extent_buffer_pages 2018-01-22 16:08:13 +01:00
export.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
export.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent_io.c btrfs: sink unlock_extent parameter gfp_flags 2018-01-22 16:08:19 +01:00
extent_io.h btrfs: sink unlock_extent parameter gfp_flags 2018-01-22 16:08:19 +01:00
extent_map.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
extent_map.h btrfs: Remove redundant FLAG_VACANCY 2018-01-22 16:08:14 +01:00
extent-tree.c btrfs: Make btrfs_inode_rsv_release static 2018-01-22 16:08:21 +01:00
file-item.c Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-07-05 16:41:23 -07:00
file.c Btrfs: fix space leak after fallocate and zero range operations 2018-01-22 16:08:20 +01:00
free-space-cache.c btrfs: sink unlock_extent parameter gfp_flags 2018-01-22 16:08:19 +01:00
free-space-cache.h
free-space-tree.c btrfs: Clean up unused variables in free-space-tree.c 2017-10-30 12:27:59 +01:00
free-space-tree.h btrfs: expose internal free space tree routine only if sanity tests are enabled 2017-08-18 16:36:29 +02:00
hash.c crypto: Work around deallocated stack frame reference gcc bug on sparc. 2017-06-08 17:36:03 +08:00
hash.h
inode-item.c
inode-map.c Btrfs: rework outstanding_extents 2017-11-01 20:45:35 +01:00
inode-map.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
inode.c Btrfs: fix unexpected EEXIST from btrfs_get_extent 2018-01-22 16:08:21 +01:00
ioctl.c btrfs: use correct string length in DEV_INFO ioctl 2018-01-22 16:08:21 +01:00
Kconfig Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
locking.c
locking.h
lzo.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
Makefile Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-11-14 13:35:29 -08:00
math.h
ordered-data.c Btrfs: rework outstanding_extents 2017-11-01 20:45:35 +01:00
ordered-data.h btrfs: fix integer overflow in calc_reclaim_items_nr 2017-06-29 20:17:02 +02:00
orphan.c
print-tree.c Btrfs: add one more sanity check for shared ref type 2017-08-21 17:47:43 +02:00
print-tree.h btrfs: get fs_info from eb in btrfs_print_tree, remove argument 2017-08-16 16:12:03 +02:00
props.c btrfs: prop: use common helper for type to string conversion 2018-01-22 16:08:16 +01:00
props.h
qgroup.c btrfs: sink gfp parameter to clear_extent_bit 2018-01-22 16:08:12 +01:00
qgroup.h btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges 2017-06-29 20:17:02 +02:00
raid56.c Btrfs: raid56: fix race between merge_bio and rbio_orig_end_io 2018-01-22 16:08:21 +01:00
raid56.h
rcu-string.h
reada.c btrfs: remove unused member err from reada_extent 2017-06-19 18:25:59 +02:00
ref-verify.c btrfs: ref-verify: Remove unused parameter from walk_up_tree() to kill warning 2018-01-22 16:08:13 +01:00
ref-verify.h Btrfs: add a extent ref verify tool 2017-10-30 12:28:00 +01:00
relocation.c Btrfs: fix reported number of inode blocks after buffered append writes 2017-11-15 17:27:46 +01:00
root-tree.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
scrub.c btrfs: rename btrfs_device::scrub_device to scrub_ctx 2018-01-22 16:08:20 +01:00
send.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
send.h btrfs: fix send ioctl on 32bit with 64bit kernel 2017-10-30 12:27:59 +01:00
struct-funcs.c btrfs: struct-funcs, constify readers 2017-08-16 14:19:53 +02:00
super.c btrfS: collapse btrfs_handle_error() into __btrfs_handle_fs_error() 2018-01-22 16:08:20 +01:00
sysfs.c Btrfs: add __init macro to btrfs init functions 2018-01-22 16:08:11 +01:00
sysfs.h Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux 2017-11-14 13:35:29 -08:00
transaction.c btrfs: simplify mutex unlocking code in btrfs_commit_transaction 2018-01-22 16:08:20 +01:00
transaction.h btrfs: reorder btrfs_transaction members for better packing 2018-01-22 16:08:14 +01:00
tree-checker.c btrfs: tree-check: reduce stack consumption in check_dir_item 2018-01-22 16:08:21 +01:00
tree-checker.h btrfs: tree-checker: Fix false panic for sanity test 2017-11-28 14:59:09 +01:00
tree-defrag.c
tree-log.c btrfs: btrfs_inode_log_parent should use defined inode_only values. 2018-01-22 16:08:14 +01:00
tree-log.h
ulist.c
ulist.h
uuid-tree.c
volumes.c btrfs: Remove unused readahead spinlock 2018-01-22 16:08:21 +01:00
volumes.h btrfs: Remove unused readahead spinlock 2018-01-22 16:08:21 +01:00
xattr.c btrfs: Cleanup existing name_len checks 2018-01-22 16:08:12 +01:00
xattr.h
zlib.c btrfs: allow to set compression level for zlib 2017-11-01 20:45:29 +01:00
zstd.c btrfs: move some zstd work data from stack to workspace 2018-01-22 16:08:14 +01:00