682 Commits

Author SHA1 Message Date
Josef Bacik
5705674081 btrfs: do async reclaim for data reservations
Now that we have the data ticketing stuff in place, move normal data
reservations to use an async reclaim helper to satisfy tickets.  Before
we could have multiple tasks race in and both allocate chunks, resulting
in more data chunks than we would necessarily need.  Serializing these
allocations and making a single thread responsible for flushing will
only allocate chunks as needed, as well as cut down on transaction
commits and other flush related activities.

Priority reservations will still work as they have before, simply
trying to allocate a chunk until they can make their reservation.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:06:54 +02:00
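
A minimal sketch of the ticketing idea described in the commit above, assuming a much simplified model: reservations queue a ticket, and a single reclaim worker satisfies tickets in order, allocating a chunk only when free space is insufficient. The class and field names are hypothetical and do not correspond to kernel structures.

```python
from collections import deque


class DataSpace:
    """Toy model of data space with ticketed reservations (hypothetical)."""

    def __init__(self, free_bytes, chunk_size):
        self.free_bytes = free_bytes
        self.chunk_size = chunk_size
        self.chunks_allocated = 0
        self.tickets = deque()

    def reserve(self, nbytes):
        """Queue a ticket instead of allocating a chunk directly."""
        ticket = {"bytes": nbytes, "granted": False}
        self.tickets.append(ticket)
        return ticket

    def run_reclaim(self):
        """Single worker: serve tickets in order, allocating only as needed."""
        while self.tickets:
            ticket = self.tickets[0]
            while self.free_bytes < ticket["bytes"]:
                # Stand-in for a chunk allocation.
                self.free_bytes += self.chunk_size
                self.chunks_allocated += 1
            self.free_bytes -= ticket["bytes"]
            ticket["granted"] = True
            self.tickets.popleft()
```

With two queued reservations of 60 and 30 bytes and a chunk size of 100, the worker allocates one chunk; two tasks racing to allocate independently could each have allocated their own.
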
Marcos Paulo de Souza
282dd7d771 btrfs: reset compression level for lzo on remount
Currently a user can set mount "-o compress" which will set the
compression algorithm to zlib, and use the default compress level for
zlib (3):

  relatime,compress=zlib:3,space_cache

If the user remounts the fs using "-o compress=lzo", then the old
compress_level is used:

  relatime,compress=lzo:3,space_cache

But lzo does not expose any tunable compression level. The same happens
if we set any compress argument with different level, also with zstd.

Fix this by resetting the compress_level when compress=lzo is
specified.  With the fix applied, lzo is shown without compress level:

  relatime,compress=lzo,space_cache

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-19 18:39:12 +02:00
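
The level reset described in the commit above can be sketched as follows. This is an illustration only, not the kernel's option parser: the function names are hypothetical, level 0 is used here to mean "no level", and only zlib (default level 3, as stated in the message) and lzo are handled.

```python
def parse_compress(value):
    """Parse the value of a 'compress=' option into (algorithm, level)."""
    algo, _, level_text = value.partition(":")
    if algo == "lzo":
        # lzo exposes no tunable level, so any earlier level is dropped.
        return ("lzo", 0)
    if algo == "zlib":
        return ("zlib", int(level_text) if level_text else 3)
    raise ValueError(f"algorithm not covered by this sketch: {algo}")


def show_compress(state):
    """Format the option the way it would appear in the mount options."""
    algo, level = state
    return f"compress={algo}:{level}" if level else f"compress={algo}"
```
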
Josef Bacik
faa008899a btrfs: make sure SB_I_VERSION doesn't get unset by remount
There's some inconsistency around SB_I_VERSION handling with mount and
remount.  Since we don't really want it to be off ever just work around
this by making sure we don't get the flag cleared on remount.

There's a tiny cpu cost of setting the bit, otherwise all changes to
i_version also change some of the times (ctime/mtime) so the inode needs
to be synced. We wouldn't save anything by disabling it.

Reported-by: Eric Sandeen <sandeen@redhat.com>
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add perf impact analysis ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-10 18:58:35 +02:00
Josef Bacik
3ef3959b29 btrfs: don't show full path of bind mounts in subvol=
Chris Murphy reported a problem where rpm ostree will bind mount a bunch
of things for whatever voodoo it's doing.  But when it does this
/proc/mounts shows something like

  /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
  /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0

Despite subvolid=256 being subvol=/foo.  This is because we're just
spitting out the dentry of the mount point, which in the case of bind
mounts is the source path for the mountpoint.  Instead we should spit
out the path to the actual subvol.  Fix this by looking up the name for
the subvolid we have mounted.  With this fix the same test looks like
this

  /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
  /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0

Reported-by: Chris Murphy <chris@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-10 18:58:26 +02:00
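
The difference between the two outputs in the commit above can be sketched as a choice of where the subvol= path comes from. The lookup table and function name below are hypothetical stand-ins for the kernel's subvolume name resolution.

```python
# Hypothetical subvolume table: subvolid -> path of the subvolume itself.
SUBVOL_PATHS = {256: "/foo"}


def subvol_option(subvolid, mount_dentry_path, use_lookup=True):
    """Build the subvolid=/subvol= part of a /proc/mounts line."""
    if use_lookup:
        # After the fix: resolve the name from the mounted subvolid.
        path = SUBVOL_PATHS[subvolid]
    else:
        # Before the fix: print the dentry of the mount, which for a
        # bind mount is the bind source path.
        path = mount_dentry_path
    return f"subvolid={subvolid},subvol={path}"
```
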
David Sterba
27942c9971 btrfs: fix messages after changing compression level by remount
Reported by Forza on IRC that remounting with compression options does
not reflect the change in level, or at least it does not appear to do so
according to the messages:

  mount -o compress=zstd:1 /dev/sda /mnt
  mount -o remount,compress=zstd:15 /mnt

does not print the change to the level to syslog:

  [   41.366060] BTRFS info (device vda): use zstd compression, level 1
  [   41.368254] BTRFS info (device vda): disk space caching is enabled
  [   41.390429] BTRFS info (device vda): disk space caching is enabled

What really happens is that the message is lost but the level is actually
changed.

There's another weird output, if compression is reset to 'no':

  [   45.413776] BTRFS info (device vda): use no compression, level 4

To fix that, save the previous compression level and print the message
in that case too and use separate message for 'no' compression.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: David Sterba <dsterba@suse.com>
2020-08-10 18:58:16 +02:00
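
A rough sketch of the message logic described in the commit above, assuming the compression state is modeled as an (algorithm, level) pair with None standing for no compression. The function name is hypothetical; the message texts follow the log lines quoted in the commit.

```python
def compression_message(old, new):
    """Return the info message for a remount, or None if nothing changed."""
    if old == new:
        return None
    algo, level = new
    if algo is None:
        # Separate message: a level makes no sense without compression.
        return "use no compression"
    # Printed when either the algorithm or the level changed.
    return f"use {algo} compression, level {level}"
```
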
Johannes Thumshirn
88c4703f00 btrfs: open-code remount flag setting in btrfs_remount
When we're (re)mounting a btrfs filesystem we set the
BTRFS_FS_STATE_REMOUNTING state in fs_info to serialize against async
reclaim or defrags.

This flag is set in btrfs_remount_prepare() called by btrfs_remount().
As btrfs_remount_prepare() does nothing but setting this flag and
doesn't have a second caller, we can just open-code the flag setting in
btrfs_remount().

Do the same for the clearing of the flag by moving it out of
btrfs_remount_cleanup() into btrfs_remount(), to be symmetrical.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:47 +02:00
Josef Bacik
5913139343 btrfs: document special case error codes for fs errors
We've had some discussions about what to do in certain scenarios for
error codes, specifically EUCLEAN and EROFS.  Document these near the
error handling code so it's clear what their intentions are.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:46 +02:00
Anand Jain
4faf55b038 btrfs: don't traverse into the seed devices in show_devname
->show_devname currently shows the lowest devid in the list. As the seed
devices have the lowest devid in the sprouted filesystem, userland tools
such as findmnt end up seeing the seed device instead of the device from
the read-writable sprouted filesystem, as shown below.

 mount /dev/sda /btrfs
 mount: /btrfs: WARNING: device write-protected, mounted read-only.

 findmnt --output SOURCE,TARGET,UUID /btrfs
 SOURCE   TARGET UUID
 /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111

 btrfs dev add -f /dev/sdb /btrfs

 umount /btrfs
 mount /dev/sdb /btrfs

 findmnt --output SOURCE,TARGET,UUID /btrfs
 SOURCE   TARGET UUID
 /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111

All sprouts from a single seed will show the same seed device and the
same fsid. That's confusing.
This is causing problems in our prototype as there isn't any reference
to the sprout file-system(s) which is being used for actual read and
write.

This was added in the patch which implemented the show_devname in btrfs
commit 9c5085c147 ("Btrfs: implement ->show_devname").
I tried to look for any particular reason that we need to show the seed
device, there isn't any.

So instead, do not traverse through the seed devices, just show the
lowest devid in the sprouted fsid.

After the patch:

 mount /dev/sda /btrfs
 mount: /btrfs: WARNING: device write-protected, mounted read-only.

 findmnt --output SOURCE,TARGET,UUID /btrfs
 SOURCE   TARGET UUID
 /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111

 btrfs dev add -f /dev/sdb /btrfs
 mount -o rw,remount /dev/sdb /btrfs

 findmnt --output SOURCE,TARGET,UUID /btrfs
 SOURCE   TARGET UUID
 /dev/sdb /btrfs 595ca0e6-b82e-46b5-b9e2-c72a6928be48

 mount /dev/sda /btrfs1
 mount: /btrfs1: WARNING: device write-protected, mounted read-only.

 btrfs dev add -f /dev/sdc /btrfs1

 findmnt --output SOURCE,TARGET,UUID /btrfs1
 SOURCE   TARGET  UUID
 /dev/sdc /btrfs1 ca1dbb7a-8446-4f95-853c-a20f3f82bdbb

 cat /proc/self/mounts | grep btrfs
 /dev/sdb /btrfs btrfs rw,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0
 /dev/sdc /btrfs1 btrfs ro,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0

Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
CC: stable@vger.kernel.org # 4.19+
Tested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:44 +02:00
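
The selection rule in the commit above can be sketched as picking the lowest devid with or without the seed devices in the candidate list. The Device type, the devid values and the traverse_seeds flag are assumptions made for illustration.

```python
from typing import NamedTuple


class Device(NamedTuple):
    devid: int
    path: str


def show_devname(devices, seed_devices, traverse_seeds=False):
    """Return the path of the lowest devid among the candidate devices."""
    candidates = list(devices)
    if traverse_seeds:
        # Old behavior: seed devices are considered too, and they win
        # because they carry the lowest devids.
        candidates += list(seed_devices)
    return min(candidates, key=lambda dev: dev.devid).path
```
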
David Sterba
b90a4ab6ba btrfs: remove deprecated mount option subvolrootid
The option subvolrootid used to be a workaround for mounting subvolumes
and ineffective since 5e2a4b25da ("btrfs: deprecate subvolrootid mount
option"). We have subvol= that works and we don't need to keep the
cruft, let's remove it.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:39 +02:00
David Sterba
d801e7a355 btrfs: remove deprecated mount option alloc_start
The mount option alloc_start has no effect since 0d0c71b317 ("btrfs:
obsolete and remove mount option alloc_start") which has details why
it's been deprecated. We can remove it.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:39 +02:00
David Sterba
b547a88ea5 btrfs: start deprecation of mount option inode_cache
Estimated time of removal of the functionality is 5.11, the option will
be still parsed but will have no effect.

Reasons for deprecation and removal:

- very poor naming choice of the mount option, it's supposed to cache
  and reuse the inode _numbers_, but it sounds like some generic cache for
  inodes

- the only known usecase where this option would make sense is on a
  32bit architecture where inode numbers in one subvolume would be
  exhausted due to 32bit inode::i_ino

- the cache is stored on disk, consumes space, needs to be loaded and
  written back

- new inode number allocation is slower due to lookups into the cache
  (compared to a simple increment which is the default)

- uses the free-space-cache code that is going to be deprecated as well
  in the future

Known problems:

- since 2011, returning EEXIST when there's not enough space in a page
  to store all checksums, see commit 4b9465cb9e ("Btrfs: add mount -o
  inode_cache")

Remaining issues:

- if the option was enabled, new inodes created, the option disabled
  again, the cache is still stored on the devices and there's currently
  no way to remove it

Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:28 +02:00
Qu Wenruo
74ef00185e btrfs: introduce "rescue=" mount option
This patch introduces a new "rescue=" mount option group for all mount
options for data recovery.

Different rescue sub options are separated by ':'. E.g.
"ro,rescue=nologreplay:usebackuproot".

The original plan was to use ';', but ';' needs to be escaped/quoted,
or it will be interpreted by bash, similar to '|'.

And obviously, the user can specify rescue options one by one like:
"ro,rescue=nologreplay,rescue=usebackuproot".

The following mount options are converted to "rescue=", old mount
options are deprecated but still available for compatibility purpose:

- usebackuproot
  Now it's "rescue=usebackuproot"

- nologreplay
  Now it's "rescue=nologreplay"

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:22 +02:00
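
The three equivalent spellings described in the commit above can be sketched with a small parser. This is an illustration under simplified assumptions (options are split on ',' only, and only the two sub options named in the message are known); the function name is hypothetical.

```python
RESCUE_OPTIONS = {"usebackuproot", "nologreplay"}


def parse_mount_options(option_string):
    """Split a mount option string into (rescue sub options, other options)."""
    rescue = set()
    others = []
    for opt in option_string.split(","):
        if opt.startswith("rescue="):
            # Sub options inside one rescue= group are separated by ':'.
            for sub in opt[len("rescue="):].split(":"):
                if sub not in RESCUE_OPTIONS:
                    raise ValueError(f"unknown rescue option: {sub}")
                rescue.add(sub)
        elif opt in RESCUE_OPTIONS:
            # Deprecated standalone spelling, kept for compatibility.
            rescue.add(opt)
        else:
            others.append(opt)
    return rescue, others
```
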
Marcos Paulo de Souza
c730ae0c6b btrfs: convert comments to fallthrough annotations
Convert fall through comments to the pseudo-keyword which is now the
preferred way.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-02 10:18:30 +02:00
David Sterba
0202e83fda btrfs: simplify iget helpers
The inode lookup starting at btrfs_iget takes the full location key,
while only the objectid is used to match the inode, because the lookup
happens inside the given root thus the inode number is unique.
The entire location key is properly set up in btrfs_init_locked_inode.

Simplify the helpers and pass only inode number, renaming it to 'ino'
instead of 'objectid'. This allows to remove temporary variables key,
saving some stack space.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:37 +02:00
David Sterba
56e9357a1e btrfs: simplify root lookup by id
The main function to look up a root by its id btrfs_get_fs_root takes the
whole key, while only using the objectid. The value of offset is preset
to (u64)-1 but not actually used until btrfs_find_root that does the
actual search.

Switch btrfs_get_fs_root to use only objectid and remove all local
variables that existed just for the lookup. The actual key for search is
set up in btrfs_get_fs_root, reusing another key variable.

Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:36 +02:00
David Sterba
fb8521caa8 btrfs: add more codes to decoder table
I've grepped logs for 'errno=.*unknown' and found -95, -117 and -122,
now added to the table. The wording is adjusted so it makes sense in
context of filesystem.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:28 +02:00
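
A sketch of what a decoder table with the three added codes might look like. The errno names in the comments are the Linux values for 95, 117 and 122; the message strings are illustrative wording, not quoted from the kernel source.

```python
# Keyed by raw negative errno, kept sorted by value.
ERROR_MESSAGES = {
    -122: "Quota exceeded",           # -EDQUOT
    -117: "Filesystem corrupted",     # -EUCLEAN
    -95: "Operation not supported",   # -EOPNOTSUPP
}


def decode_error(errno_value):
    """Return a readable string, or 'unknown' for codes not in the table."""
    return ERROR_MESSAGES.get(errno_value, "unknown")
```
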
David Sterba
d54f814434 btrfs: sort error decoder entries
Add the raw errnos and sort them accordingly.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-05-25 11:25:28 +02:00
David Sterba
7e8f19e50e btrfs: adjust message level for unrecognized mount option
An unrecognized option is a failure that should get user/administrator
attention, the info level is often below what gets logged, so make it
error.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:45 +01:00
Marcos Paulo de Souza
c0c907a47d btrfs: export helpers for subvolume name/id resolution
The functions will be used outside of export.c and super.c to allow
resolving subvolume name from a given id, eg. for subvolume deletion by
id ioctl.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ split from the next patch ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:42 +01:00
David Sterba
bf31f87f71 btrfs: add wrapper for transaction abort predicate
The status of aborted transaction can change between calls and it needs
to be accessed by READ_ONCE. Add a helper that also wraps the unlikely
hint.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:34 +01:00
Josef Bacik
0024652895 btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root
We are now using these for all roots, rename them to btrfs_put_root()
and btrfs_grab_root().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:33 +01:00
Josef Bacik
8260edba67 btrfs: make the init of static elements in fs_info separate
In adding things like eb leak checking and root leak checking there were
a lot of weird corner cases that come from the fact that

  1) We do not init the fs_info until we get to open_ctree time in the
     normal case and

  2) The test infrastructure half-init's the fs_info for things that it
     needs.

This makes it really annoying to make changes because you have to add
init in two different places, have special cases for testing fs_info's
that may not have certain things initialized, and cases for fs_info's
that didn't make it to open_ctree and thus are not fully set up.

Fix this by extracting out the non-allocating init of the fs info into
its own public function and use that to make sure we're all getting
consistent views of an allocated fs_info.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:33 +01:00
Josef Bacik
bc44d7c4b2 btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root
Now that all callers of btrfs_get_fs_root are subsequently calling
btrfs_grab_fs_root and handling dropping the ref when they are done
appropriately, go ahead and push btrfs_grab_fs_root up into
btrfs_get_fs_root.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:32 +01:00
Josef Bacik
0d4b046301 btrfs: export and rename free_fs_info
We're going to start freeing roots and doing other complicated things in
free_fs_info, so we need to move it to disk-io.c and export it in order
to use things like btrfs_put_fs_root().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:32 +01:00
Josef Bacik
5168489a07 btrfs: hold a ref on the root in get_subvol_name_from_objectid
We look up the name of a subvol which means we'll cross into different
roots.  Hold a ref while we're doing the lookups in the fs_root we're
searching.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:31 +01:00
Josef Bacik
3619c94f07 btrfs: open code btrfs_read_fs_root_no_name
All this does is call btrfs_get_fs_root() with check_ref == true.  Just
use btrfs_get_fs_root() so we don't have a bunch of different helpers
that do the same thing.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:26 +01:00
Su Yue
cfe953c824 btrfs: update the comment of btrfs_control_ioctl()
Btrfsctl was removed in 2012, now the function btrfs_control_ioctl()
is only used for device ioctls. So update the comment.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-03-23 17:01:23 +01:00
David Sterba
10a3a3edc5 btrfs: log message when rw remount is attempted with unclean tree-log
A remount to a read-write filesystem is not safe when there's tree-log
to be replayed. Files that could be opened until now might be affected
by the changes in the tree-log.

A regular mount is needed to replay the log so the filesystem presents
the consistent view with the pending changes included.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 17:17:00 +01:00
Josef Bacik
d55966c427 btrfs: do not zero f_bavail if we have available space
There was some logic added a while ago to clear out f_bavail in statfs()
if we did not have enough free metadata space to satisfy our global
reserve.  This was incorrect at the time, however didn't really pose a
problem for normal file systems because we would often allocate chunks
if we got this low on free metadata space, and thus wouldn't really hit
this case unless we were actually full.

Fast forward to today and now we are much better about not allocating
metadata chunks all of the time.  Couple this with d792b0f197 ("btrfs:
always reserve our entire size for the global reserve") which now means
we'll easily have a larger global reserve than our free space, we are
now more likely to trip over this while still having plenty of space.

Fix this by skipping this logic if the global rsv's space_info is not
full.  space_info->full is 0 unless we've attempted to allocate a chunk
for that space_info and that has failed.  If this happens then the space
for the global reserve is definitely sacred and we need to report
b_avail == 0, but before then we can just use our calculated b_avail.

Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Fixes: ca8a51b3a9 ("btrfs: statfs: report zero available if metadata are exhausted")
CC: stable@vger.kernel.org # 4.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-By: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-02 18:49:32 +01:00
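
The decision described in the commit above, reduced to a sketch. The parameter names are hypothetical, and details of the real statfs code (such as mixed block groups or any threshold applied to free metadata) are deliberately left out.

```python
def reported_bavail(calculated_bavail, free_metadata, global_reserve_size,
                    space_info_full):
    """Return the value to report as f_bavail."""
    # Only report zero when a chunk allocation for the global reserve's
    # space_info has already failed (full) and free metadata cannot
    # cover the global reserve.
    if space_info_full and free_metadata < global_reserve_size:
        return 0
    return calculated_bavail
```
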
Dennis Zhou
b0643e59cf btrfs: add the beginning of async discard, discard workqueue
When discard is enabled, every time a pinned extent is released back to
the block_group's free space cache, a discard is issued for the extent.
This is an overeager approach when it comes to discarding and helping
the SSD maintain enough free space to prevent severe garbage collection
situations.

This adds the beginning of async discard. Instead of issuing a discard
prior to returning it to the free space, it is just marked as untrimmed.
The block_group is then added to a LRU which then feeds into a workqueue
to issue discards at a much slower rate. Full discarding of unused block
groups is still done and will be addressed in a future patch of the
series.

For now, we don't persist the discard state of extents and bitmaps.
Therefore, our failure recovery mode will be to consider extents
untrimmed. This lets us handle failure and unmounting as one and the
same.

On a number of Facebook webservers, I collected data every minute
accounting the time we spent in btrfs_finish_extent_commit() (col. 1)
and in btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit()
is where we discard extents synchronously before returning them to the
free space cache.

discard=sync:
                 p99 total per minute       p99 total per minute
      Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
    ---------------------------------------------------------------
     Drive A  |           434           |          1170
     Drive B  |           880           |          2330
     Drive C  |          2943           |          3920
     Drive D  |          4763           |          5701

discard=async:
                 p99 total per minute       p99 total per minute
      Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
    --------------------------------------------------------------
     Drive A  |           134           |           956
     Drive B  |            64           |          1972
     Drive C  |            59           |          1032
     Drive D  |            62           |          1200

While it's not great that the stats are cumulative over 1m, all of these
servers are running the same workload and the delta between the two is
substantial. We are spending significantly less time in
btrfs_finish_extent_commit() which is responsible for discarding.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:57 +01:00
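
A toy model of the deferred discard flow described in the commit above: released extents are only recorded as untrimmed, block groups sit in an LRU, and a worker issues one discard at a time. The class, its methods and the callback are hypothetical; rate limiting and persistence are not modeled.

```python
from collections import OrderedDict


class DiscardQueue:
    """LRU of block groups that still hold untrimmed extents (sketch)."""

    def __init__(self):
        self._lru = OrderedDict()  # block group -> list of untrimmed extents

    def release_extent(self, block_group, extent):
        """Record an extent as untrimmed instead of discarding it now."""
        self._lru.setdefault(block_group, []).append(extent)
        self._lru.move_to_end(block_group)  # most recently used goes last

    def discard_one(self, issue_discard):
        """Worker step: discard one extent from the least recently used group."""
        if not self._lru:
            return False
        block_group, extents = next(iter(self._lru.items()))
        issue_discard(block_group, extents.pop(0))
        if not extents:
            del self._lru[block_group]
        return True
```
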
Dennis Zhou
46b27f5059 btrfs: rename DISCARD mount option to DISCARD_SYNC
This series introduces async discard which will use the flag
DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as it is
synchronously done in transaction commit.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-01-20 16:40:57 +01:00
Linus Torvalds
0da522107e compat_ioctl: remove most of fs/compat_ioctl.c
As part of the cleanup of some remaining y2038 issues, I came to
 fs/compat_ioctl.c, which still has a couple of commands that need support
 for time64_t.
 
 In completely unrelated work, I spent time on cleaning up parts of this
 file in the past, moving things out into drivers instead.
 
 After Al Viro reviewed an earlier version of this series and did a lot
 more of that cleanup, I decided to try to completely eliminate the rest
 of it and move it all into drivers.
 
 This series incorporates some of Al's work and many patches of my own,
 but in the end stops short of actually removing the last part, which is
 the scsi ioctl handlers. I have patches for those as well, but they need
 more testing or possibly a rewrite.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
 "As part of the cleanup of some remaining y2038 issues, I came to
  fs/compat_ioctl.c, which still has a couple of commands that need
  support for time64_t.

  In completely unrelated work, I spent time on cleaning up parts of
  this file in the past, moving things out into drivers instead.

  After Al Viro reviewed an earlier version of this series and did a lot
  more of that cleanup, I decided to try to completely eliminate the
  rest of it and move it all into drivers.

  This series incorporates some of Al's work and many patches of my own,
  but in the end stops short of actually removing the last part, which
  is the scsi ioctl handlers. I have patches for those as well, but they
  need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
  scsi: sd: enable compat ioctls for sed-opal
  pktcdvd: add compat_ioctl handler
  compat_ioctl: move SG_GET_REQUEST_TABLE handling
  compat_ioctl: ppp: move simple commands into ppp_generic.c
  compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
  compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
  compat_ioctl: unify copy-in of ppp filters
  tty: handle compat PPP ioctls
  compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
  compat_ioctl: handle SIOCOUTQNSD
  af_unix: add compat_ioctl support
  compat_ioctl: reimplement SG_IO handling
  compat_ioctl: move WDIOC handling into wdt drivers
  fs: compat_ioctl: move FITRIM emulation into file systems
  gfs2: add compat_ioctl support
  compat_ioctl: remove unused convert_in_user macro
  compat_ioctl: remove last RAID handling code
  compat_ioctl: remove /dev/raw ioctl translation
  compat_ioctl: remove PCI ioctl translation
  compat_ioctl: remove joystick ioctl translation
  ...
2019-12-01 13:46:15 -08:00
David Sterba
8d6fac0087 btrfs: add support for 4-copy replication (raid1c4)
Add new block group profile to store 4 copies in a similar way that
current RAID1 does.  The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the 2-
and 3-copy RAID1.

The minimum number of devices is 4, the maximum number of devices/chunks
that can be lost/damaged is 3. There is no comparable traditional RAID
level, the profile is added for future needs to accompany triple-parity
and beyond.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:49 +01:00
David Sterba
47e6f7423b btrfs: add support for 3-copy replication (raid1c3)
Add new block group profile to store 3 copies in a similar way that
current RAID1 does. The profile attributes and constraints are defined
in the raid table and used by the same code that already handles the
2-copy RAID1.

The minimum number of devices is 3, the maximum number of devices/chunks
that can be lost/damaged is 2. Like RAID6 but with 33% space
utilization.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:49 +01:00
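
The figures quoted in the two replication commits above follow from the copy count alone, as this sketch shows. The dictionary layout and function name are assumptions for illustration and do not mirror the kernel's raid table.

```python
# Number of copies per profile; raid1 is the existing 2-copy profile.
COPIES = {"raid1": 2, "raid1c3": 3, "raid1c4": 4}


def profile_attributes(name):
    """Derive device and space figures from the number of copies."""
    copies = COPIES[name]
    return {
        "copies": copies,
        "devs_min": copies,                # one device per copy
        "tolerated_failures": copies - 1,  # one copy must survive
        "space_utilization": 1 / copies,   # usable fraction of raw space
    }
```

For raid1c3 this gives a utilization of one third, the 33% mentioned in the message; for raid1c4 it is one quarter.
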
Johannes Thumshirn
f5389f330d btrfs: remove cached space_info in btrfs_statfs()
In btrfs_statfs() we cache fs_info::space_info in a local variable only
to use it once in a list_for_each_rcu() statement.

Not only is the local variable unnecessary it even makes the code harder
to follow as it's not clear which list it is iterating.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:45 +01:00
David Sterba
352ae07b59 btrfs: add blake2b to checksumming algorithms
Add blake2b (with 256 bit digest) to the list of possible checksumming
algorithms used by BTRFS.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:44 +01:00
Johannes Thumshirn
3831bf0094 btrfs: add sha256 to checksumming algorithm
Add sha256 to the list of possible checksumming algorithms used by BTRFS.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:43 +01:00
Johannes Thumshirn
3951e7f050 btrfs: add xxhash64 to checksumming algorithms
Add xxhash64 to the list of possible checksumming algorithms used by
BTRFS.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 17:51:43 +01:00
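
For a feel of the digest sizes involved in the checksum commits above, here is a userspace sketch using Python's hashlib. It only covers sha256 and blake2b with a 256-bit digest; xxhash64 is not in the Python standard library and is left out. The function name is hypothetical and this is not how the kernel computes checksums.

```python
import hashlib


def checksum(algorithm, data):
    """Return the raw digest of data for the given algorithm name."""
    if algorithm == "sha256":
        return hashlib.sha256(data).digest()
    if algorithm == "blake2b":
        # 32 bytes = 256 bits, matching the digest size named in the commit.
        return hashlib.blake2b(data, digest_size=32).digest()
    raise ValueError(f"algorithm not covered by this sketch: {algorithm}")
```
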
Chris Mason
ba8a9d0795 Btrfs: delete the entire async bio submission framework
Now that we're not using btrfs_schedule_bio() anymore, delete all the
code that supported it.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:53 +01:00
David Sterba
4143cb8b6f btrfs: add const function attribute
For some reason the attribute is called __attribute_const__ and not
__const, marks functions that have no observable effects on program
state, IOW not reading pointers, just the arguments and calculating a
value. Allows the compiler to do some optimizations, based on
-Wsuggest-attribute=const . The effects are rather small, though, about
60 bytes decrease of btrfs.ko.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:52 +01:00
David Sterba
b105e92755 btrfs: add __cold attribute to more functions
The attribute can mark functions supposed to be called rarely if at all
and the text can be moved to sections far from the other code. The
attribute has been added to several functions already, this patch is
based on hints given by gcc -Wsuggest-attribute=cold.

The net effect of this patch is decrease of btrfs.ko by 1000-1300,
depending on the config options.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:52 +01:00
David Sterba
4c66e0d424 btrfs: drop unused parameter is_new from btrfs_iget
The parameter is now always set to NULL and could be dropped. The last
user was get_default_root but that got reworked in 05dbe6837b ("Btrfs:
unify subvol= and subvolid= mounting") and the parameter became unused.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:52 +01:00
Josef Bacik
6f0d04f8e7 btrfs: separate out the extent io init function
We are moving extent_io_tree into its own file, so separate out the
extent_state init stuff from extent_io_tree_init().

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:47 +01:00
Arnd Bergmann
1832f2d8ff compat_ioctl: move more drivers to compat_ptr_ioctl
The .ioctl and .compat_ioctl file operations have the same prototype so
they can both point to the same function, which works great almost all
the time when all the commands are compatible.

One exception is the s390 architecture, where a compat pointer is only
31 bit wide, and converting it into a 64-bit pointer requires calling
compat_ptr(). Most drivers here will never run in s390, but since we now
have a generic helper for it, it's easy enough to use it consistently.

I double-checked all these drivers to ensure that all ioctl arguments
are used as pointers or are ignored, but are not interpreted as integer
values.

Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-23 17:23:44 +02:00
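
The s390 exception mentioned in the commit above comes down to masking: a 31-bit compat pointer has its top bit cleared when widened, while other architectures keep all 32 bits. The sketch below models that arithmetic on plain integers; the function names are hypothetical.

```python
def compat_ptr_s390(uptr):
    """Widen a 31-bit compat pointer: the top bit is not part of the address."""
    return uptr & 0x7FFFFFFF


def compat_ptr_generic(uptr):
    """Widen a 32-bit compat pointer unchanged (zero extension)."""
    return uptr & 0xFFFFFFFF
```
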
David Sterba
89439109bc btrfs: move sysfs declarations out of ctree.h
As the header for sysfs code already exists, use it to clean up ctree.h.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:06 +02:00
Josef Bacik
aac0023c21 btrfs: move basic block_group definitions to their own header
This is prep work for moving all of the block group cache code into its
own file.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor comment updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:03 +02:00
Nikolay Borisov
559ca6ea69 btrfs: Refactor btrfs_calc_avail_data_space
Simplify the code by removing variables that don't bring any real value
as well as simplifying the checks when building the candidate list of
devices. No functional changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-09 14:59:02 +02:00
Josef Bacik
8719aaae8d btrfs: move space_info to space-info.h
Migrate the struct definition and the one helper that's in ctree.h into
space-info.h

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02 12:30:51 +02:00
David Sterba
e1ea2beee2 btrfs: use raid_attr for minimum stripe count in btrfs_calc_avail_data_space
Minimum stripe count matches the minimum devices required for a given
profile. The open coded assignments match the raid_attr table.

What's changed here is the meaning for RAID5/6. Previously their
min_stripes would be 1, while newly it's devs_min. This however should be
the same as before because it's not possible to create filesystem on
fewer devices than the raid_attr table allows.

There's no adjustment regarding the parity stripes (like
calc_data_stripes does), because we're interested in overall space that
would fit on the devices.

Missing devices make no difference for the whole calculation, we have
the size stored in the structures.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02 12:30:51 +02:00
David Sterba
4f080f5711 btrfs: use raid_attr to adjust minimal stripe size in btrfs_calc_avail_data_space
Special case for DUP can be replaced by lookup to the attribute table,
where the dev_stripes is the right coefficient.

Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-02 12:30:51 +02:00
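
The table lookups described in the two raid_attr commits above can be sketched as follows. The table is a small illustrative subset, and the stripe length of 64 KiB is an assumption of this sketch; only the roles of devs_min and dev_stripes are taken from the messages.

```python
STRIPE_LEN = 64 * 1024  # assumed stripe length in bytes

# Illustrative subset of a raid attribute table.
RAID_ATTR = {
    "single": {"devs_min": 1, "dev_stripes": 1},
    "dup": {"devs_min": 1, "dev_stripes": 2},
    "raid1": {"devs_min": 2, "dev_stripes": 1},
}


def stripe_limits(profile):
    """Return (min_stripes, min_stripe_size) for a profile."""
    attr = RAID_ATTR[profile]
    # Minimum stripe count matches the minimum number of devices.
    min_stripes = attr["devs_min"]
    # dev_stripes is the coefficient that replaces the DUP special case.
    min_stripe_size = attr["dev_stripes"] * STRIPE_LEN
    return min_stripes, min_stripe_size
```
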