Commit Graph

252 Commits

Author SHA1 Message Date
Marcos Paulo de Souza
f8ee80de7b btrfs: remove unneeded return variable in btrfs_lookup_file_extent
We can return from btrfs_search_slot directly which also shows that it
follows the same return value convention.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-23 13:19:01 +02:00
David Sterba
1a9fd4172d btrfs: fix typos in comments
Fix typos that have snuck in since the last round. Found by codespell.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-22 14:11:57 +02:00
Filipe Manana
ea7036de0d btrfs: fix fsync failure and transaction abort after writes to prealloc extents
When doing a series of partial writes to different ranges of preallocated
extents with transaction commits and fsyncs in between, we can end up with
a checksum items in a log tree. This causes an fsync to fail with -EIO and
abort the transaction, turning the filesystem to RO mode, when syncing the
log.

For this to happen, we need to have a full fsync of a file following one
or more fast fsyncs.

The following example reproduces the problem and explains how it happens:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt

  # Create our test file with 2 preallocated extents. Leave a 1M hole
  # between them to ensure that we get two file extent items that will
  # never be merged into a single one. The extents are contiguous on disk,
  # which will later result in the checksums for their data to be merged
  # into a single checksum item in the csums btree.
  #
  $ xfs_io -f \
           -c "falloc 0 1M" \
           -c "falloc 3M 3M" \
           /mnt/foobar

  # Now write to the second extent and leave only 1M of it as unwritten,
  # which corresponds to the file range [4M, 5M[.
  #
  # Then fsync the file to flush delalloc and to clear full sync flag from
  # the inode, so that a future fsync will use the fast code path.
  #
  # After the writeback triggered by the fsync we have 3 file extent items
  # that point to the second extent we previously allocated:
  #
  # 1) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
  #    file range [3M, 4M[
  #
  # 2) One file extent item of type BTRFS_FILE_EXTENT_PREALLOC that covers
  #    the file range [4M, 5M[
  #
  # 3) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the
  #    file range [5M, 6M[
  #
  # All these file extent items have a generation of 6, which is the ID of
  # the transaction where they were created. The split of the original file
  # extent item is done at btrfs_mark_extent_written() when ordered extents
  # complete for the file ranges [3M, 4M[ and [5M, 6M[.
  #
  $ xfs_io -c "pwrite -S 0xab 3M 1M" \
           -c "pwrite -S 0xef 5M 1M" \
           -c "fsync" \
           /mnt/foobar

  # Commit the current transaction. This wipes out the log tree created by
  # the previous fsync.
  sync

  # Now write to the unwritten range of the second extent we allocated,
  # corresponding to the file range [4M, 5M[, and fsync the file, which
  # triggers the fast fsync code path.
  #
  # The fast fsync code path sees that there is a new extent map covering
  # the file range [4M, 5M[ and therefore it will log a checksum item
  # covering the range [1M, 2M[ of the second extent we allocated.
  #
  # Also, after the fsync finishes we no longer have the 3 file extent
  # items that pointed to 3 sections of the second extent we allocated.
  # Instead we end up with a single file extent item pointing to the whole
  # extent, with a type of BTRFS_FILE_EXTENT_REG and a generation of 7 (the
  # current transaction ID). This is due to the file extent item merging we
  # do when completing ordered extents into ranges that point to unwritten
  # (preallocated) extents. This merging is done at
  # btrfs_mark_extent_written().
  #
  $ xfs_io -c "pwrite -S 0xcd 4M 1M" \
           -c "fsync" \
           /mnt/foobar

  # Now do some write to our file outside the range of the second extent
  # that we allocated with fallocate() and truncate the file size from 6M
  # down to 5M.
  #
  # The truncate operation sets the full sync runtime flag on the inode,
  # forcing the next fsync to use the slow code path. It also changes the
  # length of the second file extent item so that it represents the file
  # range [3M, 5M[ and not the range [3M, 6M[ anymore.
  #
  # Finally fsync the file. Since this is a fsync that triggers the slow
  # code path, it will remove all items associated to the inode from the
  # log tree and then it will scan for file extent items in the
  # fs/subvolume tree that have a generation matching the current
  # transaction ID, which is 7. This means it will log 2 file extent
  # items:
  #
  # 1) One for the first extent we allocated, covering the file range
  #    [0, 1M[
  #
  # 2) Another for the first 2M of the second extent we allocated,
  #    covering the file range [3M, 5M[
  #
  # When logging the first file extent item we log a single checksum item
  # that has all the checksums for the entire extent.
  #
  # When logging the second file extent item, we also lookup for the
  # checksums that are associated with the range [0, 2M[ of the second
  # extent we allocated (file range [3M, 5M[), and then we log them with
  # btrfs_csum_file_blocks(). However that results in ending up with a log
  # that has two checksum items with ranges that overlap:
  #
  # 1) One for the range [1M, 2M[ of the second extent we allocated,
  #    corresponding to the file range [4M, 5M[, which we logged in the
  #    previous fsync that used the fast code path;
  #
  # 2) One for the ranges [0, 1M[ and [0, 2M[ of the first and second
  #    extents, respectively, corresponding to the files ranges [0, 1M[
  #    and [3M, 5M[. This one was added during this last fsync that uses
  #    the slow code path and overlaps with the previous one logged by
  #    the previous fast fsync.
  #
  # This happens because when logging the checksums for the second
  # extent, we notice they start at an offset that matches the end of the
  # checksums item that we logged for the first extent, and because both
  # extents are contiguous on disk, btrfs_csum_file_blocks() decides to
  # extend that existing checksums item and append the checksums for the
  # second extent to this item. The end result is we end up with two
  # checksum items in the log tree that have overlapping ranges, as
  # listed before, resulting in the fsync to fail with -EIO and aborting
  # the transaction, turning the filesystem into RO mode.
  #
  $ xfs_io -c "pwrite -S 0xff 0 1M" \
           -c "truncate 5M" \
           -c "fsync" \
           /mnt/foobar
  fsync: Input/output error

After running the example, dmesg/syslog shows the tree checker complained
about the checksum items with overlapping ranges and we aborted the
transaction:

  $ dmesg
  (...)
  [756289.557487] BTRFS critical (device sdc): corrupt leaf: root=18446744073709551610 block=30720000 slot=5, csum end range (16777216) goes beyond the start range (15728640) of the next csum item
  [756289.560583] BTRFS info (device sdc): leaf 30720000 gen 7 total ptrs 7 free space 11677 owner 18446744073709551610
  [756289.562435] BTRFS info (device sdc): refs 2 lock_owner 0 current 2303929
  [756289.563654] 	item 0 key (257 1 0) itemoff 16123 itemsize 160
  [756289.564649] 		inode generation 6 size 5242880 mode 100600
  [756289.565636] 	item 1 key (257 12 256) itemoff 16107 itemsize 16
  [756289.566694] 	item 2 key (257 108 0) itemoff 16054 itemsize 53
  [756289.567725] 		extent data disk bytenr 13631488 nr 1048576
  [756289.568697] 		extent data offset 0 nr 1048576 ram 1048576
  [756289.569689] 	item 3 key (257 108 1048576) itemoff 16001 itemsize 53
  [756289.570682] 		extent data disk bytenr 0 nr 0
  [756289.571363] 		extent data offset 0 nr 2097152 ram 2097152
  [756289.572213] 	item 4 key (257 108 3145728) itemoff 15948 itemsize 53
  [756289.573246] 		extent data disk bytenr 14680064 nr 3145728
  [756289.574121] 		extent data offset 0 nr 2097152 ram 3145728
  [756289.574993] 	item 5 key (18446744073709551606 128 13631488) itemoff 12876 itemsize 3072
  [756289.576113] 	item 6 key (18446744073709551606 128 15728640) itemoff 11852 itemsize 1024
  [756289.577286] BTRFS error (device sdc): block=30720000 write time tree block corruption detected
  [756289.578644] ------------[ cut here ]------------
  [756289.579376] WARNING: CPU: 0 PID: 2303929 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs]
  [756289.580857] Modules linked in: btrfs dm_zero dm_dust loop dm_snapshot (...)
  [756289.591534] CPU: 0 PID: 2303929 Comm: xfs_io Tainted: G        W         5.12.0-rc8-btrfs-next-87 #1
  [756289.592580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [756289.594161] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs]
  [756289.595122] Code: 5d c3 e8 76 60 (...)
  [756289.597509] RSP: 0018:ffffb51b416cb898 EFLAGS: 00010282
  [756289.598142] RAX: 0000000000000000 RBX: fffff02b8a365bc0 RCX: 0000000000000000
  [756289.598970] RDX: 0000000000000000 RSI: ffffffffa9112421 RDI: 00000000ffffffff
  [756289.599798] RBP: ffffa06500880000 R08: 0000000000000000 R09: 0000000000000000
  [756289.600619] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
  [756289.601456] R13: ffffa0652b1d8980 R14: ffffa06500880000 R15: 0000000000000000
  [756289.602278] FS:  00007f08b23c9800(0000) GS:ffffa0682be00000(0000) knlGS:0000000000000000
  [756289.603217] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [756289.603892] CR2: 00005652f32d0138 CR3: 000000025d616003 CR4: 0000000000370ef0
  [756289.604725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [756289.605563] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [756289.606400] Call Trace:
  [756289.606704]  btree_csum_one_bio+0x244/0x2b0 [btrfs]
  [756289.607313]  btrfs_submit_metadata_bio+0xb7/0x100 [btrfs]
  [756289.608040]  submit_one_bio+0x61/0x70 [btrfs]
  [756289.608587]  btree_write_cache_pages+0x587/0x610 [btrfs]
  [756289.609258]  ? free_debug_processing+0x1d5/0x240
  [756289.609812]  ? __module_address+0x28/0xf0
  [756289.610298]  ? lock_acquire+0x1a0/0x3e0
  [756289.610754]  ? lock_acquired+0x19f/0x430
  [756289.611220]  ? lock_acquire+0x1a0/0x3e0
  [756289.611675]  do_writepages+0x43/0xf0
  [756289.612101]  ? __filemap_fdatawrite_range+0xa4/0x100
  [756289.612800]  __filemap_fdatawrite_range+0xc5/0x100
  [756289.613393]  btrfs_write_marked_extents+0x68/0x160 [btrfs]
  [756289.614085]  btrfs_sync_log+0x21c/0xf20 [btrfs]
  [756289.614661]  ? finish_wait+0x90/0x90
  [756289.615096]  ? __mutex_unlock_slowpath+0x45/0x2a0
  [756289.615661]  ? btrfs_log_inode_parent+0x3c9/0xdc0 [btrfs]
  [756289.616338]  ? lock_acquire+0x1a0/0x3e0
  [756289.616801]  ? lock_acquired+0x19f/0x430
  [756289.617284]  ? lock_acquire+0x1a0/0x3e0
  [756289.617750]  ? lock_release+0x214/0x470
  [756289.618221]  ? lock_acquired+0x19f/0x430
  [756289.618704]  ? dput+0x20/0x4a0
  [756289.619079]  ? dput+0x20/0x4a0
  [756289.619452]  ? lockref_put_or_lock+0x9/0x30
  [756289.619969]  ? lock_release+0x214/0x470
  [756289.620445]  ? lock_release+0x214/0x470
  [756289.620924]  ? lock_release+0x214/0x470
  [756289.621415]  btrfs_sync_file+0x46a/0x5b0 [btrfs]
  [756289.621982]  do_fsync+0x38/0x70
  [756289.622395]  __x64_sys_fsync+0x10/0x20
  [756289.622907]  do_syscall_64+0x33/0x80
  [756289.623438]  entry_SYSCALL_64_after_hwframe+0x44/0xae
  [756289.624063] RIP: 0033:0x7f08b27fbb7b
  [756289.624588] Code: 0f 05 48 3d 00 (...)
  [756289.626760] RSP: 002b:00007ffe2583f940 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
  [756289.627639] RAX: ffffffffffffffda RBX: 00005652f32cd0f0 RCX: 00007f08b27fbb7b
  [756289.628464] RDX: 00005652f32cbca0 RSI: 00005652f32cd110 RDI: 0000000000000003
  [756289.629323] RBP: 00005652f32cd110 R08: 0000000000000000 R09: 00007f08b28c4be0
  [756289.630172] R10: fffffffffffff39a R11: 0000000000000293 R12: 0000000000000001
  [756289.631007] R13: 00005652f32cd0f0 R14: 0000000000000001 R15: 00005652f32cc480
  [756289.631819] irq event stamp: 0
  [756289.632188] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  [756289.632911] hardirqs last disabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0
  [756289.633893] softirqs last  enabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0
  [756289.634871] softirqs last disabled at (0): [<0000000000000000>] 0x0
  [756289.635606] ---[ end trace 0a039fdc16ff3fef ]---
  [756289.636179] BTRFS: error (device sdc) in btrfs_sync_log:3136: errno=-5 IO failure
  [756289.637082] BTRFS info (device sdc): forced readonly

Having checksum items covering ranges that overlap is dangerous as in some
cases it can lead to having extent ranges for which we miss checksums
after log replay or getting the wrong checksum item. There were some fixes
in the past for bugs that resulted in this problem, and were explained and
fixed by the following commits:

  27b9a8122f ("Btrfs: fix csum tree corruption, duplicate and outdated checksums")
  b84b8390d6 ("Btrfs: fix file read corruption after extent cloning and fsync")
  40e046acbd ("Btrfs: fix missing data checksums after replaying a log tree")
  e289f03ea7 ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents")

Fix the issue by making btrfs_csum_file_blocks() taking into account the
start offset of the next checksum item when it decides to extend an
existing checksum item, so that it never extends the checksum to end at a
range that goes beyond the start range of the next checksum item.

When we can not access the next checksum item without releasing the path,
simply drop the optimization of extending the previous checksum item and
fallback to inserting a new checksum item - this happens rarely and the
optimization is not significant enough for a log tree in order to justify
the extra complexity, as it would only save a few bytes (the size of a
struct btrfs_item) of leaf space.

This behaviour is only needed when inserting into a log tree because
for the regular checksums tree we never have a case where we try to
insert a range of checksums that overlap with a range that was previously
inserted.

A test case for fstests will follow soon.

Reported-by: Philipp Fent <fent@in.tum.de>
Link: https://lore.kernel.org/linux-btrfs/93c4600e-5263-5cba-adf0-6f47526e7561@in.tum.de/
CC: stable@vger.kernel.org # 5.4+
Tested-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:31:36 +02:00
Josef Bacik
b86652be7c btrfs: fix error handling in btrfs_del_csums
Error injection stress would sometimes fail with checksums on disk that
did not have a corresponding extent.  This occurred because the pattern
in btrfs_del_csums was

	while (1) {
		ret = btrfs_search_slot();
		if (ret < 0)
			break;
	}
	ret = 0;
out:
	btrfs_free_path(path);
	return ret;

If we got an error from btrfs_search_slot we'd clear the error because
we were breaking instead of goto out.  Instead of using goto out, simply
handle the cases where we may leave a random value in ret, and get rid
of the

	ret = 0;
out:

pattern and simply allow break to have the proper error reporting.  With
this fix we properly abort the transaction and do not commit thinking we
successfully deleted the csum.

Reviewed-by: Qu Wenruo <wqu@suse.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 23:30:49 +02:00
Johannes Thumshirn
cea628008f btrfs: remove duplicated in_range() macro
The in_range() macro is defined twice in btrfs' source, once in ctree.h
and once in misc.h.

Remove the definition in ctree.h and include misc.h in the files depending
on it.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-04-19 17:25:16 +02:00
Nikolay Borisov
ca4207ae13 btrfs: fix function description formats in file-item.c
This fixes following W=1 warnings:

fs/btrfs/file-item.c:27: warning: Cannot understand  * @inode:  the inode we want to update the disk_i_size for
 on line 27 - I thought it was a doc line
fs/btrfs/file-item.c:65: warning: Cannot understand  * @inode - the inode we're modifying
 on line 65 - I thought it was a doc line
fs/btrfs/file-item.c:91: warning: Cannot understand  * @inode - the inode we're modifying
 on line 91 - I thought it was a doc line

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-08 22:58:53 +01:00
ethanwu
9a66497156 btrfs: correctly calculate item size used when item key collision happens
Item key collision is allowed for some item types, like dir item and
inode refs, but the overall item size is limited by the nodesize.

item size(ins_len) passed from btrfs_insert_empty_items to
btrfs_search_slot already contains size of btrfs_item.

When btrfs_search_slot reaches leaf, we'll see if we need to split leaf.
The check incorrectly reports that split leaf is required, because
it treats the space required by the newly inserted item as
btrfs_item + item data. But in item key collision case, only item data
is actually needed, the newly inserted item could merge into the existing
one. No new btrfs_item will be inserted.

And split_leaf return EOVERFLOW from following code:

  if (extend && data_size + btrfs_item_size_nr(l, slot) +
      sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info))
      return -EOVERFLOW;

In most cases, when callers receive EOVERFLOW, they either return
this error or handle in different ways. For example, in normal dir item
creation the userspace will get errno EOVERFLOW; in inode ref case
INODE_EXTREF is used instead.

However, this is not the case for rename. To avoid the unrecoverable
situation in rename, btrfs_check_dir_item_collision is called in
early phase of rename. In this function, when item key collision is
detected leaf space is checked:

  data_size = sizeof(*di) + name_len;
  if (data_size + btrfs_item_size_nr(leaf, slot) +
      sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info))

the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here
refers to existing item size, the condition here correctly calculates
the needed size for collision case rather than the wrong case above.

The consequence of inconsistent condition check between
btrfs_check_dir_item_collision and btrfs_search_slot when item key
collision happens is that we might pass check here but fail
later at btrfs_search_slot. Rename fails and volume is forced readonly

  [436149.586170] ------------[ cut here ]------------
  [436149.586173] BTRFS: Transaction aborted (error -75)
  [436149.586196] WARNING: CPU: 0 PID: 16733 at fs/btrfs/inode.c:9870 btrfs_rename2+0x1938/0x1b70 [btrfs]
  [436149.586227] CPU: 0 PID: 16733 Comm: python Tainted: G      D           4.18.0-rc5+ #1
  [436149.586228] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
  [436149.586238] RIP: 0010:btrfs_rename2+0x1938/0x1b70 [btrfs]
  [436149.586254] RSP: 0018:ffffa327043a7ce0 EFLAGS: 00010286
  [436149.586255] RAX: 0000000000000000 RBX: ffff8d8a17d13340 RCX: 0000000000000006
  [436149.586256] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d8a7fc164b0
  [436149.586257] RBP: ffffa327043a7da0 R08: 0000000000000560 R09: 7265282064657472
  [436149.586258] R10: 0000000000000000 R11: 6361736e61725420 R12: ffff8d8a0d4c8b08
  [436149.586258] R13: ffff8d8a17d13340 R14: ffff8d8a33e0a540 R15: 00000000000001fe
  [436149.586260] FS:  00007fa313933740(0000) GS:ffff8d8a7fc00000(0000) knlGS:0000000000000000
  [436149.586261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [436149.586262] CR2: 000055d8d9c9a720 CR3: 000000007aae0003 CR4: 00000000003606f0
  [436149.586295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [436149.586296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [436149.586296] Call Trace:
  [436149.586311]  vfs_rename+0x383/0x920
  [436149.586313]  ? vfs_rename+0x383/0x920
  [436149.586315]  do_renameat2+0x4ca/0x590
  [436149.586317]  __x64_sys_rename+0x20/0x30
  [436149.586324]  do_syscall_64+0x5a/0x120
  [436149.586330]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [436149.586332] RIP: 0033:0x7fa3133b1d37
  [436149.586348] RSP: 002b:00007fffd3e43908 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
  [436149.586349] RAX: ffffffffffffffda RBX: 00007fa3133b1d30 RCX: 00007fa3133b1d37
  [436149.586350] RDX: 000055d8da06b5e0 RSI: 000055d8da225d60 RDI: 000055d8da2c4da0
  [436149.586351] RBP: 000055d8da2252f0 R08: 00007fa313782000 R09: 00000000000177e0
  [436149.586351] R10: 000055d8da010680 R11: 0000000000000246 R12: 00007fa313840b00

Thanks to Hans van Kranenburg for information about crc32 hash collision
tools, I was able to reproduce the dir item collision with following
python script.
https://github.com/wutzuchieh/misc_tools/blob/master/crc32_forge.py Run
it under a btrfs volume will trigger the abort transaction.  It simply
creates files and rename them to forged names that leads to
hash collision.

There are two ways to fix this. One is to simply revert the patch
878f2d2cb3 ("Btrfs: fix max dir item size calculation") to make the
condition consistent although that patch is correct about the size.

The other way is to handle the leaf space check correctly when
collision happens. I prefer the second one since it correct leaf
space check in collision case. This fix will not account
sizeof(struct btrfs_item) when the item already exists.
There are two places where ins_len doesn't contain
sizeof(struct btrfs_item), however.

  1. extent-tree.c: lookup_inline_extent_backref
  2. file-item.c: btrfs_csum_file_blocks

to make the logic of btrfs_search_slot more clear, we add a flag
search_for_extension in btrfs_path.

This flag indicates that ins_len passed to btrfs_search_slot doesn't
contain sizeof(struct btrfs_item). When key exists, btrfs_search_slot
will use the actual size needed to calculate the required leaf space.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: ethanwu <ethanwu@synology.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-18 14:50:00 +01:00
Qu Wenruo
6275193ef1 btrfs: refactor btrfs_lookup_bio_sums to handle out-of-order bvecs
Refactor btrfs_lookup_bio_sums() by:

- Remove the @file_offset parameter
  There are two factors making the @file_offset parameter useless:

  * For csum lookup in csum tree, file offset makes no sense
    We only need disk_bytenr, which is unrelated to file_offset

  * page_offset (file offset) of each bvec is not contiguous.
    Pages can be added to the same bio as long as their on-disk bytenr
    is contiguous, meaning we could have pages at different file offsets
    in the same bio.

  Thus passing file_offset makes no sense any more.
  The only user of file_offset is for data reloc inode, we will use
  a new function, search_file_offset_in_bio(), to handle it.

- Extract the csum tree lookup into search_csum_tree()
  The new function will handle the csum search in csum tree.
  The return value is the same as btrfs_find_ordered_sum(), returning
  the number of found sectors which have checksum.

- Change how we do the main loop
  The only needed info from bio is:
  * the on-disk bytenr
  * the length

  After extracting the above info, we can do the search without bio
  at all, which makes the main loop much simpler:

	for (cur_disk_bytenr = orig_disk_bytenr;
	     cur_disk_bytenr < orig_disk_bytenr + orig_len;
	     cur_disk_bytenr += count * sectorsize) {

		/* Lookup csum tree */
		count = search_csum_tree(fs_info, path, cur_disk_bytenr,
					 search_len, csum_dst);
		if (!count) {
			/* Csum hole handling */
		}
	}

- Use single variable as the source to calculate all other offsets
  Instead of all different type of variables, we use only one main
  variable, cur_disk_bytenr, which represents the current disk bytenr.

  All involved values can be calculated from that variable, and
  all those variable will only be visible in the inner loop.

The above refactoring makes btrfs_lookup_bio_sums() way more robust than
it used to be, especially related to the file offset lookup.  Now
file_offset lookup is only related to data reloc inode, otherwise we
don't need to bother file_offset at all.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09 19:16:11 +01:00
Qu Wenruo
9e46458a7c btrfs: remove btrfs_find_ordered_sum call from btrfs_lookup_bio_sums
The function btrfs_lookup_bio_sums() is only called for read bios.
While btrfs_find_ordered_sum() is to search ordered extent sums, which
is only for write path.

This means to read a page we either:

- Submit read bio if it's not uptodate
  This means we only need to search csum tree for checksums.

- The page is already uptodate
  It can be marked uptodate for previous read, or being marked dirty.
  As we always mark page uptodate for dirty page.
  In that case, we don't need to submit read bio at all, thus no need
  to search any checksums.

Remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums().
And since btrfs_lookup_bio_sums() is the only caller for
btrfs_find_ordered_sum(), also remove the implementation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09 19:16:10 +01:00
David Sterba
1201b58b67 btrfs: drop casts of bio bi_sector
Since commit 72deb455b5 ("block: remove CONFIG_LBDAF") (5.2) the
sector_t type is u64 on all arches and configs so we don't need to
typecast it.  It used to be unsigned long and the result of sector size
shifts were not guaranteed to fit in the type.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09 19:16:05 +01:00
Qu Wenruo
35478d053a btrfs: use nodesize to determine if we need readahead in btrfs_lookup_bio_sums
In btrfs_lookup_bio_sums() if the bio is pretty large, we want to
start readahead in the csum tree.

However the threshold is an immediate number, (PAGE_SIZE * 8), from the
initial btrfs merge.

The meaning of the value is pretty hard to guess, especially when the
immediate number is from the times when 4K sectorsize was the default
and only CRC32C was supported.

For the most common btrfs setup, CRC32 csum and 4K sectorsize,
it means just 32K read would kick readahead, while the csum itself is
only 32 bytes in size.

Now let's be more reasonable by taking both csum size and node size into
consideration.

If the csum size for the bio is larger than one leaf, then we kick the
readahead.  This means for current default btrfs, the threshold will be
16M.

This change should not change performance observably, thus this is
mostly a readability enhancement.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:54:14 +01:00
Nikolay Borisov
76aea53796 btrfs: make btrfs_inode_safe_disk_i_size_write take btrfs_inode
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:54:10 +01:00
Josef Bacik
b9729ce014 btrfs: locking: rip out path->leave_spinning
We no longer distinguish between blocking and spinning, so rip out all
this code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:54:02 +01:00
David Sterba
713cebfb98 btrfs: remove unnecessary local variables for checksum size
Remove local variable that is then used just once and replace it with
fs_info::csum_size.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:54:00 +01:00
David Sterba
223486c27b btrfs: switch cached fs_info::csum_size from u16 to u32
The fs_info value is 32bit, switch also the local u16 variables. This
leads to a better assembly code generated due to movzwl.

This simple change will shave some bytes on x86_64 and release config:

   text    data     bss     dec     hex filename
1090000   17980   14912 1122892  11224c pre/btrfs.ko
1089794   17980   14912 1122686  11217e post/btrfs.ko

DELTA: -206

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:53:59 +01:00
David Sterba
55fc29bed8 btrfs: use cached value of fs_info::csum_size everywhere
btrfs_get_16 shows up in the system performance profiles (helper to read
16bit values from on-disk structures). This is partially because of the
checksum size that's frequently read along with data reads/writes, other
u16 uses are from item size or directory entries.

Replace all calls to btrfs_super_csum_size by the cached value from
fs_info.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:53:59 +01:00
David Sterba
265fdfa6ce btrfs: replace s_blocksize_bits with fs_info::sectorsize_bits
The value of super_block::s_blocksize_bits is the same as
fs_info::sectorsize_bits, but we don't need to do the extra dereferences
in many functions and storing the bits as u32 (in fs_info) generates
shorter assembly.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:53:58 +01:00
David Sterba
ab108d992b btrfs: use precalculated sectorsize_bits from fs_info
We do a lot of calculations where we divide or multiply by sectorsize.
We also know and make sure that sectorsize is a power of two, so this
means all divisions can be turned to shifts and avoid eg. expensive
u64/u32 divisions.

The type is u32 as it's more register friendly on x86_64 compared to u8
and the resulting assembly is smaller (movzbl vs movl).

There's also superblock s_blocksize_bits but it's usually one more
pointer dereference farther than fs_info.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:53:57 +01:00
Josef Bacik
42437a6386 btrfs: introduce mount option rescue=ignorebadroots
In the face of extent root corruption, or any other core fs wide root
corruption we will fail to mount the file system.  This makes recovery
kind of a pain, because you need to fall back to userspace tools to
scrape off data.  Instead provide a mechanism to gracefully handle bad
roots, so we can at least mount read-only and possibly recover data from
the file system.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:53:41 +01:00
Josef Bacik
334c16d82c btrfs: push the NODATASUM check into btrfs_lookup_bio_sums
When we move to being able to handle NULL csum_roots it'll be cleaner to
just check in btrfs_lookup_bio_sums instead of at all of the caller
locations, so push the NODATASUM check into it as well so it's unified.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08 15:53:39 +01:00
Nikolay Borisov
3c5641a83a btrfs: make btrfs_find_ordered_sum take btrfs_inode
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:12:19 +02:00
Nikolay Borisov
bd242a08a6 btrfs: make btrfs_csum_one_bio takae btrfs_inode
Will enable converting btrfs_submit_compressed_write to btrfs_inode more
easily.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:26 +02:00
Nikolay Borisov
c350437269 btrfs: make btrfs_lookup_ordered_extent take btrfs_inode
It doesn't use the generic vfs inode for anything use btrfs_inode
directly.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:25 +02:00
Filipe Manana
918cdf4423 btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
The label 'fail_unlock' is pointless, all it does is to jump to the label
'out', so just remove it.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:37 +02:00
Filipe Manana
7e4a3f7ed5 btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
We are currently treating any non-zero return value from btrfs_next_leaf()
the same way, by going to the code that inserts a new checksum item in the
tree. However if btrfs_next_leaf() returns an error (a value < 0), we
should just stop and return the error, and not behave as if nothing has
happened, since in that case we do not have a way to know if there is a
next leaf or we are currently at the last leaf already.

So fix that by returning the error from btrfs_next_leaf().

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:37 +02:00
Filipe Manana
cc14600c15 btrfs: make checksum item extension more efficient
When we want to add checksums into the checksums tree, or a log tree, we
try whenever possible to extend existing checksum items, as this helps
reduce amount of metadata space used, since adding a new item uses extra
metadata space for a btrfs_item structure (25 bytes).

However we have two inefficiencies in the current approach:

1) After finding a checksum item that covers a range with an end offset
   that matches the start offset of the checksum range we want to insert,
   we release the search path populated by btrfs_lookup_csum() and then
   do another COW search on tree with the goal of getting additional
   space for at least one checksum. Doing this path release and then
   searching again is a waste of time because very often the leaf already
   has enough free space for at least one more checksum;

2) After the COW search that guarantees we get free space in the leaf for
   at least one more checksum, we end up not doing the extension of the
   previous checksum item, and fallback to insertion of a new checksum
   item, if the leaf doesn't have an amount of free space larger then the
   space required for 2 checksums plus one btrfs_item structure - this is
   pointless for two reasons:

   a) We want to extend an existing item, so we don't need to account for
      a btrfs_item structure (25 bytes);

   b) We made the COW search with an insertion size for 1 single checksum,
      so if the leaf ends up with a free space amount smaller then 2
      checksums plus the size of a btrfs_item structure, we give up on the
      extension of the existing item and jump to the 'insert' label, where
      we end up releasing the path and then doing yet another search to
      insert a new checksum item for a single checksum.

Fix these inefficiencies by doing the following:

- For case 1), before releasing the path just check if the leaf already
  has enough space for at least 1 more checksum, and if it does, jump
  directly to the item extension code, with releasing our current path,
  which was already COWed by btrfs_lookup_csum();

- For case 2), fix the logic so that for item extension we require only
  that the leaf has enough free space for 1 checksum, and not a minimum
  of 2 checksums plus space for a btrfs_item structure.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:37 +02:00
Eric Biggers
fd08001f17 btrfs: use crypto_shash_digest() instead of open coding
Use crypto_shash_digest() instead of crypto_shash_init() +
crypto_shash_update() + crypto_shash_final().  This is more efficient.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:27 +02:00
Omar Sandoval
fb30f4707d btrfs: clarify btrfs_lookup_bio_sums documentation
Fix a couple of issues in the btrfs_lookup_bio_sums documentation:

* The bio doesn't need to be a btrfs_io_bio if dst was provided. Move
  the declaration in the code to make that clear, too.
* dst must be large enough to hold nblocks * csum_size, not just
  csum_size.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:25 +02:00
Filipe Manana
a5eeb3d17b btrfs: add helper to get the end offset of a file extent item
Getting the end offset for a file extent item requires a bit of code since
the extent can be either inline or regular/prealloc. There are some places
all over the code base that open code this logic and in another patch
later in this series it will be needed again. Therefore encapsulate this
logic in a helper function and use it.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:56 +01:00
Josef Bacik
41a2ee75aa btrfs: introduce per-inode file extent tree
In order to keep track of where we have file extents on disk, and thus
where it is safe to adjust the i_size to, we need to have a tree in
place to keep track of the contiguous areas we have file extents for.

Add helpers to use this tree, as it's not required for NO_HOLES file
systems.  We will use this by setting DIRTY for areas we know we have
file extent item's set, and clearing it when we remove file extent items
for truncation.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:24 +01:00
David Sterba
4babad1019 btrfs: safely advance counter when looking up bio csums
Dan's smatch tool reports

  fs/btrfs/file-item.c:295 btrfs_lookup_bio_sums()
  warn: should this be 'count == -1'

which points to the while (count--) loop. With count == 0 the check
itself could decrement it to -1. There's a WARN_ON a few lines below
that has never been seen in practice though.

It turns out that the value of page_bytes_left matches the count (by
sectorsize multiples). The loop never reaches the state where count
would go to -1, because page_bytes_left == 0 is found first and this
breaks out.

For clarity, use only plain check on count (and only for positive
value), decrement safely inside the loop. Any other discrepancy after
the whole bio list processing should be reported by the exising
WARN_ON_ONCE as well.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:41:01 +01:00
Omar Sandoval
bffe633e00 btrfs: make btrfs_ordered_extent naming consistent with btrfs_file_extent_item
ordered->start, ordered->len, and ordered->disk_len correspond to
fi->disk_bytenr, fi->num_bytes, and fi->disk_num_bytes, respectively.
It's confusing to translate between the two naming schemes. Since a
btrfs_ordered_extent is basically a pending btrfs_file_extent_item,
let's make the former use the naming from the latter.

Note that I didn't touch the names in tracepoints just in case there are
scripts depending on the current naming.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:54 +01:00
Omar Sandoval
db72e47f79 btrfs: get rid of at_offset parameter to btrfs_lookup_bio_sums()
We can encode this in the offset parameter: -1 means use the page
offsets, anything else is a valid offset.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:54 +01:00
Omar Sandoval
e62958fce9 btrfs: get rid of trivial __btrfs_lookup_bio_sums() wrappers
Currently, we have two wrappers for __btrfs_lookup_bio_sums():
btrfs_lookup_bio_sums_dio(), which is used for direct I/O, and
btrfs_lookup_bio_sums(), which is used everywhere else. The only
difference is that the _dio variant looks up csums starting at the given
offset instead of using the page index, which isn't actually direct
I/O-specific. Let's clean up the signature and return value of
__btrfs_lookup_bio_sums(), rename it to btrfs_lookup_bio_sums(), and get
rid of the trivial helpers.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:53 +01:00
Filipe Manana
40e046acbd Btrfs: fix missing data checksums after replaying a log tree
When logging a file that has shared extents (reflinked with other files or
with itself), we can end up logging multiple checksum items that cover
overlapping ranges. This confuses the search for checksums at log replay
time causing some checksums to never be added to the fs/subvolume tree.

Consider the following example of a file that shares the same extent at
offsets 0 and 256Kb:

   [ bytenr 13893632, offset 64Kb, len 64Kb  ]
   0                                         64Kb

   [ bytenr 13631488, offset 64Kb, len 192Kb ]
   64Kb                                      256Kb

   [ bytenr 13893632, offset 0, len 256Kb    ]
   256Kb                                     512Kb

When logging the inode, at tree-log.c:copy_items(), when processing the
file extent item at offset 0, we log a checksum item covering the range
13959168 to 14024704, which corresponds to 13893632 + 64Kb and 13893632 +
64Kb + 64Kb, respectively.

Later when processing the extent item at offset 256K, we log the checksums
for the range from 13893632 to 14155776 (which corresponds to 13893632 +
256Kb). These checksums get merged with the checksum item for the range
from 13631488 to 13893632 (13631488 + 256Kb), logged by a previous fsync.
So after this we get the two following checksum items in the log tree:

   (...)
   item 6 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 3095 itemsize 512
           range start 13631488 end 14155776 length 524288
   item 7 key (EXTENT_CSUM EXTENT_CSUM 13959168) itemoff 3031 itemsize 64
           range start 13959168 end 14024704 length 65536

The first one covers the range from the second one, they overlap.

So far this does not cause a problem after replaying the log, because
when replaying the file extent item for offset 256K, we copy all the
checksums for the extent 13893632 from the log tree to the fs/subvolume
tree, since searching for an checksum item for bytenr 13893632 leaves us
at the first checksum item, which covers the whole range of the extent.

However if we write 64Kb to file offset 256Kb for example, we will
not be able to find and copy the checksums for the last 128Kb of the
extent at bytenr 13893632, referenced by the file range 384Kb to 512Kb.

After writing 64Kb into file offset 256Kb we get the following extent
layout for our file:

   [ bytenr 13893632, offset 64K, len 64Kb   ]
   0                                         64Kb

   [ bytenr 13631488, offset 64Kb, len 192Kb ]
   64Kb                                      256Kb

   [ bytenr 14155776, offset 0, len 64Kb     ]
   256Kb                                     320Kb

   [ bytenr 13893632, offset 64Kb, len 192Kb ]
   320Kb                                     512Kb

After fsync'ing the file, if we have a power failure and then mount
the filesystem to replay the log, the following happens:

1) When replaying the file extent item for file offset 320Kb, we
   lookup for the checksums for the extent range from 13959168
   (13893632 + 64Kb) to 14155776 (13893632 + 256Kb), through a call
   to btrfs_lookup_csums_range();

2) btrfs_lookup_csums_range() finds the checksum item that starts
   precisely at offset 13959168 (item 7 in the log tree, shown before);

3) However that checksum item only covers 64Kb of data, and not 192Kb
   of data;

4) As a result only the checksums for the first 64Kb of data referenced
   by the file extent item are found and copied to the fs/subvolume tree.
   The remaining 128Kb of data, file range 384Kb to 512Kb, doesn't get
   the corresponding data checksums found and copied to the fs/subvolume
   tree.

5) After replaying the log userspace will not be able to read the file
   range from 384Kb to 512Kb, because the checksums are missing and
   resulting in an -EIO error.

The following steps reproduce this scenario:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt/sdc

  $ xfs_io -f -c "pwrite -S 0xa3 0 256K" /mnt/sdc/foobar
  $ xfs_io -c "fsync" /mnt/sdc/foobar
  $ xfs_io -c "pwrite -S 0xc7 256K 256K" /mnt/sdc/foobar

  $ xfs_io -c "reflink /mnt/sdc/foobar 320K 0 64K" /mnt/sdc/foobar
  $ xfs_io -c "fsync" /mnt/sdc/foobar

  $ xfs_io -c "pwrite -S 0xe5 256K 64K" /mnt/sdc/foobar
  $ xfs_io -c "fsync" /mnt/sdc/foobar

  <power failure>

  $ mount /dev/sdc /mnt/sdc
  $ md5sum /mnt/sdc/foobar
  md5sum: /mnt/sdc/foobar: Input/output error

  $ dmesg | tail
  [165305.003464] BTRFS info (device sdc): no csum found for inode 257 start 401408
  [165305.004014] BTRFS info (device sdc): no csum found for inode 257 start 405504
  [165305.004559] BTRFS info (device sdc): no csum found for inode 257 start 409600
  [165305.005101] BTRFS info (device sdc): no csum found for inode 257 start 413696
  [165305.005627] BTRFS info (device sdc): no csum found for inode 257 start 417792
  [165305.006134] BTRFS info (device sdc): no csum found for inode 257 start 421888
  [165305.006625] BTRFS info (device sdc): no csum found for inode 257 start 425984
  [165305.007278] BTRFS info (device sdc): no csum found for inode 257 start 430080
  [165305.008248] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
  [165305.009550] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1

Fix this simply by deleting first any checksums, from the log tree, for the
range of the extent we are logging at copy_items(). This ensures we do not
get checksum items in the log tree that have overlapping ranges.

This is a long time issue that has been present since we have the clone
(and deduplication) ioctl, and can happen both when an extent is shared
between different files and within the same file.

A test case for fstests follows soon.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-12-13 14:09:24 +01:00
David Sterba
a019e9e197 btrfs: remove extent_map::bdev
We can now remove the bdev from extent_map. Previous patches made sure
that bio_set_dev is correctly in all places and that we don't need to
grab it from latest_bdev or pass it around inside the extent map.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 23:43:44 +01:00
Johannes Thumshirn
d5178578bc btrfs: directly call into crypto framework for checksumming
Currently btrfs_csum_data() relied on the crc32c() wrapper around the
crypto framework for calculating the CRCs.

As we have our own crypto_shash structure in the fs_info now, we can
directly call into the crypto framework without going trough the wrapper.

This way we can even remove the btrfs_csum_data() and btrfs_csum_final()
wrappers.

The module dependency on crc32c is preserved via MODULE_SOFTDEP("pre:
crc32c"), which was previously provided by LIBCRC32C config option doing
the same.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:35:02 +02:00
Johannes Thumshirn
10fe6ca80d btrfs: don't assume compressed_bio sums to be 4 bytes
BTRFS has the implicit assumption that a checksum in compressed_bio is 4
bytes. While this is true for CRC32C, it is not for any other checksum.

Change the data type to be a byte array and adjust loop index calculation
accordingly.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:35:01 +02:00
Johannes Thumshirn
1e25a2e3ca btrfs: don't assume ordered sums to be 4 bytes
BTRFS has the implicit assumption that a checksum in btrfs_orderd_sums
is 4 bytes. While this is true for CRC32C, it is not for any other
checksum.

Change the data type to be a byte array and adjust loop index
calculation accordingly.

This includes moving the adjustment of 'index' by 'ins_size' in
btrfs_csum_file_blocks() before dividing 'ins_size' by the checksum
size, because before this patch the 'sums' member of 'struct
btrfs_ordered_sum' was 4 Bytes in size and afterwards it is only one
byte.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:35:00 +02:00
Nikolay Borisov
51d470aeaa btrfs: Document btrfs_csum_one_bio
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:52 +02:00
David Sterba
c71dd88007 btrfs: remove unused parameter fs_info from btrfs_extend_item
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:50 +02:00
David Sterba
78ac4f9e5a btrfs: remove unused parameter fs_info from btrfs_truncate_item
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:50 +02:00
Nikolay Borisov
f9756261c2 btrfs: Remove redundant inode argument from btrfs_add_ordered_sum
Ordered csums are keyed off of a btrfs_ordered_extent, which already has
a reference to the inode. This implies that an explicit inode argument
is redundant. So remove it.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:40 +02:00
David Sterba
e902baac65 btrfs: get fs_info from eb in btrfs_leaf_free_space
We can read fs_info from extent buffer and can drop it from the
parameters.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:30 +02:00
Johannes Thumshirn
443c8e2a83 btrfs: reduce kmap_atomic time for checksumming
Since commit c40a3d38af ("Btrfs: Compute and look up csums based on
sectorsized blocks") we do a kmap_atomic() on the contents of a bvec.
The code before c40a3d38af had the kmap region just around the
checksumming too.

kmap_atomic() in turn does a preempt_disable() and pagefault_disable(),
so we shouldn't map the data for too long. Reduce the time the bvec's
page is mapped to when we actually need it.

Performance wise it doesn't seem to make a huge difference with a 2 vcpu VM
on a /dev/zram device:

       vanilla      patched      delta
write  17.4MiB/s    17.8MiB/s	+0.4MiB/s (+2%)
read   40.6MiB/s    41.5MiB/s   +0.9MiB/s (+2%)

The following fio job profile was used in the comparision:

[global]
ioengine=libaio
direct=1
sync=1
norandommap
time_based
runtime=10m
size=100m
group_reporting
numjobs=2

[test]
filename=/mnt/test/fio
rw=randrw
rwmixread=70

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29 19:02:19 +02:00
Nikolay Borisov
a3d46aea46 btrfs: Switch memory allocations in async csum calculation path to kvmalloc
Recent multi-page biovec rework allowed creation of bios that can span
large regions - up to 128 megabytes in the case of btrfs. OTOH btrfs'
submission path currently allocates a contiguous array to store the
checksums for every bio submitted. This means we can request up to
(128mb / BTRFS_SECTOR_SIZE) * 4 bytes + 32bytes of memory from kmalloc.
On busy systems with possibly fragmented memory said kmalloc can fail
which will trigger BUG_ON due to improper error handling IO submission
context in btrfs.

Until error handling is improved or bios in btrfs limited to a more
manageable size (e.g. 1m) let's use kvmalloc to fallback to vmalloc for
such large allocations. There is no hard requirement that the memory
allocated for checksums during IO submission has to be contiguous, but
this is a simple fix that does not require several non-contiguous
allocations.

For small writes this is unlikely to have any visible effect since
kmalloc will still satisfy allocation requests as usual. For larger
requests the code will just fallback to vmalloc.

We've performed evaluation on several workload types and there was no
significant difference kmalloc vs kvmalloc.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-25 14:17:38 +02:00
David Sterba
b3a0dd50c3 btrfs: replace btrfs_io_bio::end_io with a simple helper
The end_io callback implemented as btrfs_io_bio_endio_readpage only
calls kfree. Also the callback is set only in case the csum buffer is
allocated and not pointing to the inline buffer. We can use that
information to drop the indirection and call a helper that will free the
csums only in the right case.

This shrinks struct btrfs_io_bio by 8 bytes.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:40 +01:00
David Sterba
31fecccbd7 btrfs: remove redundant csum buffer in btrfs_io_bio
The io_bio tracks checksums and has an inline buffer or an allocated
one. And there's a third member that points to the right one, but we
don't need to use an extra pointer for that. Let btrfs_io_bio::csum
point to the right buffer and check that the inline buffer is not
accidentally freed.

This shrinks struct btrfs_io_bio by 8 bytes.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-12-17 14:51:40 +01:00
David Sterba
3ffbd68c48 btrfs: simplify pointer chasing of local fs_info variables
Functions that get btrfs inode can simply reach the fs_info by
dereferencing the root and this looks a bit more straightforward
compared to the btrfs_sb(...) indirection.

If the transaction handle is available and not NULL it's used instead.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:43 +02:00
Qu Wenruo
e41ca58974 btrfs: Get rid of the confusing btrfs_file_extent_inline_len
We used to call btrfs_file_extent_inline_len() to get the uncompressed
data size of an inlined extent.

However this function is hiding evil, for compressed extent, it has no
choice but to directly read out ram_bytes from btrfs_file_extent_item.
While for uncompressed extent, it uses item size to calculate the real
data size, and ignoring ram_bytes completely.

In fact, for corrupted ram_bytes, due to above behavior kernel
btrfs_print_leaf() can't even print correct ram_bytes to expose the bug.

Since we have the tree-checker to verify all EXTENT_DATA, such mismatch
can be detected pretty easily, thus we can trust ram_bytes without the
evil btrfs_file_extent_inline_len().

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-08-06 13:12:38 +02:00
David Sterba
c1d7c514f7 btrfs: replace GPL boilerplate by SPDX -- sources
Remove GPL boilerplate text (long, short, one-line) and keep the rest,
ie. personal, company or original source copyright statements. Add the
SPDX header.

Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12 16:29:51 +02:00
Linus Torvalds
8c27cb3566 Merge branch 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "The core updates improve error handling (mostly related to bios), with
  the usual incremental work on the GFP_NOFS (mis)use removal,
  refactoring or cleanups. Except the two top patches, all have been in
  for-next for an extensive amount of time.

  User visible changes:

   - statx support

   - quota override tunable

   - improved compression thresholds

   - obsoleted mount option alloc_start

  Core updates:

   - bio-related updates:
       - faster bio cloning
       - no allocation failures
       - preallocated flush bios

   - more kvzalloc use, memalloc_nofs protections, GFP_NOFS updates

   - prep work for btree_inode removal

   - dir-item validation

   - qgoup fixes and updates

   - cleanups:
       - removed unused struct members, unused code, refactoring
       - argument refactoring (fs_info/root, caller -> callee sink)
       - SEARCH_TREE ioctl docs"

* 'for-4.13-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (115 commits)
  btrfs: Remove false alert when fiemap range is smaller than on-disk extent
  btrfs: Don't clear SGID when inheriting ACLs
  btrfs: fix integer overflow in calc_reclaim_items_nr
  btrfs: scrub: fix target device intialization while setting up scrub context
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered write and quotas being enabled
  btrfs: qgroup: Return actually freed bytes for qgroup release or free data
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Add quick exit for non-fs extents
  Btrfs: rework delayed ref total_bytes_pinned accounting
  Btrfs: return old and new total ref mods when adding delayed refs
  Btrfs: always account pinned bytes when dropping a tree block ref
  Btrfs: update total_bytes_pinned when pinning down extents
  Btrfs: make BUG_ON() in add_pinned_bytes() an ASSERT()
  Btrfs: make add_pinned_bytes() take an s64 num_bytes instead of u64
  btrfs: fix validation of XATTR_ITEM dir items
  btrfs: Verify dir_item in iterate_object_props
  btrfs: Check name_len before in btrfs_del_root_ref
  btrfs: Check name_len before reading btrfs_get_name
  ...
2017-07-05 16:41:23 -07:00
Liu Bo
17347cec15 Btrfs: change how we iterate bios in endio
Since dio submit has used bio_clone_fast, the submitted bio may not have a
reliable bi_vcnt, for the bio vector iterations in checksum related
functions, bio->bi_iter is not modified yet and it's safe to use
bio_for_each_segment, while for those bio vector iterations in dio read's
endio, we now save a copy of bvec_iter in struct btrfs_io_bio when cloning
bios and use the helper __bio_for_each_segment with the saved bvec_iter to
access each bvec.

Also for dio reads which don't get split, we also need to save a copy of
bio iterator in btrfs_bio_clone to let __bio_for_each_segments to access
each bvec in dio read's endio.  Note that it doesn't affect other calls of
btrfs_bio_clone() because they don't need to use this iterator.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-06-19 18:25:59 +02:00
Christoph Hellwig
4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Chris Mason
e9f467d028 Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11 2017-02-28 14:35:09 -08:00
Nikolay Borisov
9cdc512410 btrfs: Make btrfs_extent_item_to_extent_map take btrfs_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 11:30:11 +01:00
Nikolay Borisov
70ddc553b5 btrfs: make btrfs_is_free_space_inode take btrfs_inode
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-28 11:30:06 +01:00
Filipe Manana
6f546216e9 Btrfs: bulk delete checksum items in the same leaf
Very often we have the checksums for an extent spread in multiple items
in the checksums tree, and currently the algorithm to delete them starts
by looking for them one by one and then deleting them one by one, which
is not optimal since each deletion involves shifting all the other items
in the leaf and when the leaf reaches some low threshold, to move items
off the leaf into its left and right neighbor leafs. Also, after each
item deletion we release our search path and start a new search for other
checksums items.

So optimize this by deleting in bulk all the items in the same leaf that
contain checksums for the extent being freed.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2017-02-24 00:36:55 +00:00
Nikolay Borisov
4a0cc7ca6c btrfs: Make btrfs_ino take a struct btrfs_inode
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
Seraphime Kirkovski
50d0446e68 Btrfs: code cleanup min/max -> min_t/max_t
This cleans up the cases where the min/max macros were used with a cast
rather than using directly min_t/max_t.

Signed-off-by: Seraphime Kirkovski <kirkseraph@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:50 +01:00
Jeff Mahoney
2ff7e61e0d btrfs: take an fs_info directly when the root is not used otherwise
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer.  Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
0b246afa62 btrfs: root->fs_info cleanup, add fs_info convenience variables
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney
da17066c40 btrfs: pull node/sector/stripe sizes out of root and into fs_info
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney
5b4aacefb8 btrfs: call functions that overwrite their root parameter with fs_info
There are 11 functions that accept a root parameter and immediately
overwrite it.  We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Christoph Hellwig
4989d277eb btrfs: refactor __btrfs_lookup_bio_sums to use bio_for_each_segment_all
Rework the loop a little bit to use the generic bio_for_each_segment_all
helper for iterating over the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:20 +01:00
Christoph Hellwig
6cd7ce4935 btrfs: don't access the bio directly in btrfs_csum_one_bio
Use bio_for_each_segment_all to iterate over the segments instead.
This requires a bit of reshuffling so that we only lookup up the ordered
item once inside the bio_for_each_segment_all loop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:20 +01:00
David Sterba
b159fa2808 btrfs: remove constant parameter to memset_extent_buffer and rename it
The only memset we do is to 0, so sink the parameter to the function and
simplify all calls. Rename the function to reflect the behaviour.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:17 +01:00
Chris Mason
42049bf60d Btrfs: fix __MAX_CSUM_ITEMS
Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on
machines where the pagesize >= metadata blocksize.

This just reverts the relevant hunks to bring the old math back.

Signed-off-by: Chris Mason <clm@fb.com>
2016-08-03 14:08:37 -07:00
Jeff Mahoney
66642832f0 btrfs: btrfs_abort_transaction, drop root parameter
__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer.  We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:26 +02:00
Jeff Mahoney
14a1e067b4 btrfs: introduce BTRFS_MAX_ITEM_SIZE
We use BTRFS_LEAF_DATA_SIZE - sizeof(struct btrfs_item) in
several places.  This introduces a BTRFS_MAX_ITEM_SIZE macro to do the
same.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:24 +02:00
Nikolay Borisov
aee133afcd btrfs: Ratelimit "no csum found" info message
Recently during a crash it became apparent that this particular message
can be printed so many times that it causes the softlockup detector to
trigger. Fix it by ratelimiting it.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:52:25 +02:00
David Sterba
ceeb0ae7bf btrfs: sink gfp parameter to set_extent_bits
All callers pass GFP_NOFS.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-29 11:01:47 +02:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Chris Mason
389f239c53 btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums
Commit c40a3d38af (Btrfs: Compute and look up csums based on
sectorsized blocks) changes around how we walk the bios while looking up
crcs.  There's an inner loop that is jumping to the next bvec based on
sectors and before it derefs the next bvec, it needs to make sure we're
still in the bio.

In this case, the outer loop would have decided to stop moving forward
too, and the bvec deref is never actually used for anything.  But
CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio.

Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
2016-03-21 07:25:44 -07:00
Anand Jain
ebb8765b2d btrfs: move btrfs_compression_type to compression.h
So that its better organized.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-11 17:12:46 +01:00
Chandan Rajendra
c40a3d38af Btrfs: Compute and look up csums based on sectorsized blocks
Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works on machines
where sectorsize == PAGE_SIZE. This patch makes the checksum computation and
look up code to work with sectorsize units.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-02-01 19:23:47 +01:00
David Sterba
e4058b54d1 btrfs: cleanup, use enum values for btrfs_path reada
Replace the integers by enums for better readability. The value 2 does
not have any meaning since a717531942
"Btrfs: do less aggressive btree readahead" (2009-01-22).

Signed-off-by: David Sterba <dsterba@suse.com>
2016-01-07 15:01:15 +01:00
Chris Mason
fc4c3c872f Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1
Signed-off-by: Chris Mason <clm@fb.com>

Conflicts:
	fs/btrfs/disk-io.c
2015-03-25 10:52:48 -07:00
David Sterba
31e818fe73 btrfs: cleanup, use kmalloc_array/kcalloc array helpers
Convert kmalloc(nr * size, ..) to kmalloc_array that does additional
overflow checks, the zeroing variant is kcalloc.

Signed-off-by: David Sterba <dsterba@suse.cz>
2015-03-03 17:23:58 +01:00
Daniel Dressler
b7a0365ec7 Btrfs: ctree: reduce args where only fs_info used
This patch is part of a larger project to cleanup btrfs's internal usage
of struct btrfs_root. Many functions take btrfs_root only to grab a
pointer to fs_info.

This causes programmers to ponder which root can be passed. Since only
the fs_info is read affected functions can accept any root, except this
is only obvious upon inspection.

This patch reduces the specificty of such functions to accept the
fs_info directly.

This patch does not address the two functions in ctree.c (insert_ptr,
and split_item) which only use root for BUG_ONs in ctree.c

This patch affects the following functions:
  1) fixup_low_keys
  2) btrfs_set_item_key_safe

Signed-off-by: Daniel Dressler <danieru.dressler@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
2015-02-16 18:48:43 +01:00
Chris Mason
6e5aafb274 Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error cleanup
If we hit any errors in btrfs_lookup_csums_range, we'll loop through all
the csums we allocate and free them.  But the code was using list_entry
incorrectly, and ended up trying to free the on-stack list_head instead.

This bug came from commit 0678b6185

btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range()

Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Erik Berg <btrfs@slipsprogrammoer.no>
cc: stable@vger.kernel.org # 3.3 or newer
2014-11-04 06:59:04 -08:00
Miao Xie
23ea8e5a07 Btrfs: load checksum data once when submitting a direct read io
The current code would load checksum data for several times when we split
a whole direct read io because of the limit of the raid stripe, it would
make us search the csum tree for several times. In fact, it just wasted time,
and made the contention of the csum tree root be more serious. This patch
improves this problem by loading the data at once.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:38:50 -07:00
Satoru Takeuchi
d1b00a4711 btrfs: use IS_ALIGNED() for assertion in btrfs_lookup_csums_range() for simplicity
btrfs_lookup_csums_range() uses ALIGN() to check if "start"
and "end + 1" are aligned to "root->sectorsize". It's better to
replace these with IS_ALIGNED() for simplicity.

Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:54 -07:00
David Sterba
962a298f35 btrfs: kill the key type accessor helpers
btrfs_set_key_type and btrfs_key_type are used inconsistently along with
open coded variants. Other members of btrfs_key are accessed directly
without any helpers anyway.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-09-17 13:37:12 -07:00
Filipe Manana
27b9a8122f Btrfs: fix csum tree corruption, duplicate and outdated checksums
Under rare circumstances we can end up leaving 2 versions of a checksum
for the same file extent range.

The reason for this is that after calling btrfs_next_leaf we process
slot 0 of the leaf it returns, instead of processing the slot set in
path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
btrfs_next_leaf() releases the path and before it searches for the next
leaf, another task might cause a split of the next leaf, which migrates
some of its keys to the leaf we were processing before calling
btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
same leaf but with path->slots[0] having a slot number corresponding
to the first new key it got, that is, a slot number that didn't exist
before calling btrfs_next_leaf(), as the leaf now has more keys than
it had before. So we must really process the returned leaf starting at
path->slots[0] always, as it isn't always 0, and the key at slot 0 can
have an offset much lower than our search offset/bytenr.

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

  Leaf N:

    slot = 0                           slot = btrfs_header_nritems() - 1
  |-------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
  |-------------------------------------------------------------------|

  Leaf N + 1:

      slot = 0                          slot = btrfs_header_nritems() - 1
  |--------------------------------------------------------------------|
  | [(CSUM CSUM 40161280), size 32] ... [((CSUM CSUM 40615936), size 8 |
  |--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
find the next highest key, which releases the current path and then searches
for that next key. However after releasing the path and before finding that
next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
btrfs_next_leaf() will returns us a path again with leaf N but with the slot
pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
is then:

    slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
  |----------------------------------------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
  |----------------------------------------------------------------------------------------------------|

And incorrecly using slot 0, makes us set next_offset to 39239680 and we jump
into the "insert:" label, which will set tmp to:

    tmp = min((sums->len - total_bytes) >> blocksize_bits,
        (next_offset - file_key.offset) >> blocksize_bits) =
    min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
    min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and

   ins_size = csum_size * tmp = 4 * 4 = 16 bytes.

In other words, we insert a new csum item in the tree with key
(CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
because the item with key (CSUM CSUM 40161280) (the one that was moved from
leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
bytes of our data and won't get those old checksums removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
and breaks the logical rule:

   Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get
the checksum of any of the blocks with logical offset of 40161280, 40165376
or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-08-15 07:43:40 -07:00
Filipe Manana
7ffbb598a0 Btrfs: make fsync work after cloning into a file
When cloning into a file, we were correctly replacing the extent
items in the target range and removing the extent maps. However
we weren't replacing the extent maps with new ones that point to
the new extents - as a consequence, an incremental fsync (when the
inode doesn't have the full sync flag) was a NOOP, since it relies
on the existence of extent maps in the modified list of the inode's
extent map tree, which was empty. Therefore add new extent maps to
reflect the target clone range.

A test case for xfstests follows.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:21:16 -07:00
Filipe Manana
35045bf2fd Btrfs: don't access non-existent key when csum tree is empty
When the csum tree is empty, our leaf (path->nodes[0]) has a number
of items equal to 0 and since btrfs_header_nritems() returns an
unsigned integer (and so is our local nritems variable) the following
comparison always evaluates to false:

     if (path->slots[0] >= nritems - 1) {

As the casting rules lead to:

     if ((u32)0 >= (u32)4294967295) {

This makes us access key at slot paths->slots[0] + 1 (1) of the empty leaf
some lines below:

    btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
    if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
        found_key.type != BTRFS_EXTENT_CSUM_KEY) {
		found_next = 1;
		goto insert;
    }

So just don't access such non-existent slot and don't set found_next to 1
when the tree is empty. It's very unlikely we'll get a random key with the
objectid and type values above, which is where we could go into trouble.

If nritems is 0, just set found_next to 1 anyway as it will make us insert
a csum item covering our whole extent (or the whole leaf) when the tree is
empty.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:44 -07:00
Liu Bo
d2cbf2a260 Btrfs: do not increment on bio_index one by one
'bio_index' is just a index, it's really not necessary to do increment
one by one.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:25 -07:00
Linus Torvalds
e7651b819e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is a pretty big pull, and most of these changes have been
  floating in btrfs-next for a long time.  Filipe's properties work is a
  cool building block for inheriting attributes like compression down on
  a per inode basis.

  Jeff Mahoney kicked in code to export filesystem info into sysfs.

  Otherwise, lots of performance improvements, cleanups and bug fixes.

  Looks like there are still a few other small pending incrementals, but
  I wanted to get the bulk of this in first"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (149 commits)
  Btrfs: fix spin_unlock in check_ref_cleanup
  Btrfs: setup inode location during btrfs_init_inode_locked
  Btrfs: don't use ram_bytes for uncompressed inline items
  Btrfs: fix btrfs_search_slot_for_read backwards iteration
  Btrfs: do not export ulist functions
  Btrfs: rework ulist with list+rb_tree
  Btrfs: fix memory leaks on walking backrefs failure
  Btrfs: fix send file hole detection leading to data corruption
  Btrfs: add a reschedule point in btrfs_find_all_roots()
  Btrfs: make send's file extent item search more efficient
  Btrfs: fix to catch all errors when resolving indirect ref
  Btrfs: fix protection between walking backrefs and root deletion
  btrfs: fix warning while merging two adjacent extents
  Btrfs: fix infinite path build loops in incremental send
  btrfs: undo sysfs when open_ctree() fails
  Btrfs: fix snprintf usage by send's gen_unique_name
  btrfs: fix defrag 32-bit integer overflow
  btrfs: sysfs: list the NO_HOLES feature
  btrfs: sysfs: don't show reserved incompat feature
  btrfs: call permission checks earlier in ioctls and return EPERM
  ...
2014-01-30 20:08:20 -08:00
Frank Holton
efe120a067 Btrfs: convert printk to btrfs_ and fix BTRFS prefix
Convert all applicable cases of printk and pr_* to the btrfs_* macros.

Fix all uses of the BTRFS prefix.

Signed-off-by: Frank Holton <fholton@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-01-28 13:20:05 -08:00
Kent Overstreet
4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23 22:33:47 -08:00
Dulshani Gunawardhana
fae7f21cec btrfs: Use WARN_ON()'s return value in place of WARN_ON(1)
Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source
code that outputs a more descriptive warnings. Also fix the styling
warning of redundant braces that came up as a result of this fix.

Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11 22:11:53 -05:00
Josef Bacik
4277a9c3b3 Btrfs: add an assert to btrfs_lookup_csums_range for alignment
I was hitting weird issues when trying to remove hole extents and it turned out
it was because I was sending non-aligned offsets down to
btrfs_lookup_csums_range.  So add an assert for this in case somebody trips over
this in the future.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-11-11 21:58:45 -05:00
Geert Uytterhoeven
c1c9ff7c94 Btrfs: Remove superfluous casts from u64 to unsigned long long
u64 is "unsigned long long" on all architectures now, so there's no need to
cast it when formatting it using the "ll" length modifier.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:16:08 -04:00
Miao Xie
facc8a2247 Btrfs: don't cache the csum value into the extent state tree
Before applying this patch, we cached the csum value into the extent state
tree when reading some data from the disk, this operation increased the lock
contention of the state tree.

Now, we just store the csum value into the bio structure or other unshared
structure, so we can reduce the lock contention.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-01 08:04:33 -04:00
Miao Xie
f51a4a1826 Btrfs: remove btrfs_sector_sum structure
Using the structure btrfs_sector_sum to keep the checksum value is
unnecessary, because the extents that btrfs_sector_sum points to are
continuous, we can find out the expected checksums by btrfs_ordered_sum's
bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
removing bytenr, there is only one member in the structure, so it makes
no sense to keep the structure, just remove it, and use a u32 array to
store the checksum value.

By this change, we don't use the while loop to get the checksums one by
one. Now, we can get several checksum value at one time, it improved the
performance by ~74% on my SSD (31MB/s -> 54MB/s).

test command:
 # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-07-02 11:50:47 -04:00
Eric Sandeen
48a3b6366f btrfs: make static code static & remove dead code
Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:55:23 -04:00
Tsutomu Itoh
4b90c68015 Btrfs: remove unused argument of btrfs_extend_item()
Argument 'trans' is not used in btrfs_extend_item().

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:53 -04:00
Tsutomu Itoh
afe5fea72b Btrfs: cleanup of function where fixup_low_keys() is called
If argument 'trans' is unnecessary in the function where
fixup_low_keys() is called, 'trans' is deleted.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:52 -04:00
Miao Xie
e4100d987b Btrfs: improve the performance of the csums lookup
It is very likely that there are several blocks in bio, it is very
inefficient if we get their csums one by one. This patch improves
this problem by getting the csums in batch.

According to the result of the following test, the execute time of
__btrfs_lookup_bio_sums() is down by ~28%(300us -> 217us).

 # dd if=<mnt>/file of=/dev/null bs=1M count=1024

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2013-05-06 15:54:35 -04:00