This updates bcachefs to use the new mount API:
- Update the file_system_type to use the new init_fs_context()
function.
- Define the new fs_context_operations functions.
- No longer register bch2_mount() and bch2_remount(); these are now
called via the new fs_context functions.
- Define a new helper type, bch2_opts_parse that includes a struct
bch_opts and additionally a printbuf used to save options that can't
be parsed until after the FS is opened. This enables us to parse as
many options as possible prior to opening the filesystem while saving
those options that need the open FS for later parsing.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces a new error code, option_needs_open_fs, which is used to
indicate that an attempt was made to parse a mount option prior to
opening a filesystem, when that mount option requires an open filesystem
in order to be validated.
Returning this error results in bch2_parse_one_mount_opt() saving that
option for later parsing, after the filesystem is opened.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Mount options that take the name of a device that may be part of a
filesystem, for example "metadata_target", cannot be validated until
after the filesystem has been opened. However, an attempt to parse those
options may be made prior to the filesystem being opened.
This change adds a printbuf parameter to bch2_parse_mount_opts() which
will be used to save those mount options, when they are supplied prior
to the FS being opened, so that they can be parsed later.
This functionality is not currently needed, but will be used after
bcachefs starts using the new mount API to parse mount options. This is
because using the new mount API, we will process mount options prior to
opening the FS, but the new API doesn't provide a convenient way to
"replay" mount option parsing. So we save these options ourselves to
accomplish this.
This change also splits out the code to parse a single option into
bch2_parse_one_mount_opt(), which will be useful when using the new
mount API which deals with a single mount option at a time.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
New on disk format version for bch_alloc->stripe_sectors and
BCH_DATA_unstriped - accounting for unstriped data in stripe buckets.
Upgrade/downgrade requires regenerating alloc info - but only if erasure
coding is in use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new pseudo data type, to track buckets that are members of a
stripe, but have unstriped data in them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a separate counter to bch_alloc_v4 for amount of striped data; this
lets us separately track striped and unstriped data in a bucket, which
lets us see when erasure coding has failed to update extents with stripe
pointers, and also find buckets to continue updating if we crash mid way
through creating a new stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Consolidate duplicated checks for extents/dirents/xattrs - these keys
should all have a corresponding inode of the correct type.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The output of mount options such as "metadata_target" in `/proc/mounts`
uses the full path to the device.
mount(8) from util-linux uses the output from `/proc/mounts` to pass
existing mount options when performing a remount, so bcachefs should
accept as input the same form that it prints as output.
Without this change:
$ mount -t bcachefs -o metadata_target=vdb /dev/vdb /mnt
$ strace mount -o remount /mnt
...
fsconfig(4, FSCONFIG_SET_STRING, "metadata_target", "/dev/vdb", 0) = -1 EINVAL (Invalid argument)
...
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When "read_only" is exposed as a mount option, it is redundant with the
standard option "ro" and gives users multiple ways to specify that a
bcachefs filesystem should be mounted read-only. This presents the risk
of having inconsistent options specified.
This can be seen when remounting a read-only filesystem in read-write
mode, using mount(8) from util-linux. Because mount(8) parses the
existing mount options from `/proc/mounts` and applies them when
remounting, it can end up applying both "read_only" and "rw":
$ mount img -o ro /mnt
$ strace mount -o remount,rw /mnt
...
fsconfig(4, FSCONFIG_SET_FLAG, "read_only", NULL, 0) = 0
fsconfig(4, FSCONFIG_SET_FLAG, "rw", NULL, 0) = 0
...
Making "read_only" no longer a mount option means this edge case cannot
occur.
Fixes: 62719cf33c ("bcachefs: Fix nochanges/read_only interaction")
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A subsequent change will remove "read_only" as a mount option in favor
of the standard option "ro", meaning the userspace fsck command cannot
pass it to the fsck ioctl. Instead, in offline fsck, set "read_only"
kernel-side without trying to parse it as a mount option.
For compatibility with versions of the "bcachefs fsck" command that try
to pass the "read_only" mount opt, remove it from the mount options
string prior to parsing when it is present.
Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Use try_cmpxchg() family of functions instead of
cmpxchg (*ptr, old, new) == old. x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg).
Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Using filemap_read() can reduce unnecessary code execution
for non IOCB_DIRECT paths.
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Before patch:
```
#cat btrees/inodes/keys
u64s 17 type inode_v3 0:4096:U32_MAX len 0 ver 0: mode=40755
flags= (16300000)
bi_size=0
```
After patch:
```
#cat btrees/inodes/keys
u64s 17 type inode_v3 0:4096:U32_MAX len 0 ver 0:
mode=40755
flags=(16300000)
bi_size=0
```
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
ei_update_lock is largely vestigal and will probably be removed, but
we're not ready for that just yet.
this fixes some lockdep splats with the new lockdep support for btree
node locks; they're harmless, since we were taking ei_update_lock before
actually locking any btree nodes, but "any btree nodes locked" are now
tracked at the btree_trans level.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a pretty printer so the btree reserve cache can be seen in sysfs; as
it pins open_buckets we need it for tracking down open_buckets issues.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
same as in io_write.c, if we're waiting on the allocator for an
excessive amount of time, print what's going on
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_root_lock is for the root keys in btree_root, not the pointers to
the nodes themselves; this fixes a lock ordering issue between
btree_root_lock and btree node locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This reverts commit 86d81ec5f5.
This wasn't tested with memcg enabled, it immediately hits a null ptr
deref in list_lru_add().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
After commit 230e9fc286 ("slab: add SLAB_ACCOUNT flag"), we need to mark
the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9 ("kmemcg:
account for certain kmem allocations to memcg")
Signed-off-by: Youling Tang <tangyouling@kylinos.cn>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't have a way to flush a timer that's executing the callback, and
this is simple and limited enough in scope that we can just use the lock
instead.
Needed for the next patch that adds direct wakeups from the allocator to
copygc, where we're now more frequently calling io_timer_del() on an
expiring timer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
fragmentation_lru derives from dirty_sectors, and wasn't being checked.
Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We need to make sure we're not missing any fragmenation entries in the
LRU BTREE after repairing ALLOC BTREE
Also, use the new bch2_btree_write_buffer_maybe_flush() helper; this was
only working without it before since bucket invalidation (usually)
wasn't happening while fsck was running.
Co-developed-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new helper for checking references to write buffer btrees, where
we need a flush before we definitively know we have an inconsistency.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_READ_NODECODE mode - used by the move paths - really wants to use
only the original rbio, but the retry path really wants to clone - oof.
Make sure to copy the crc of the pointer we read from back to the
original rbio, or we'll see spurious checksum errors later.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On a new filesystem or device we have to allocate the journal with a
bump allocator, because allocation info isn't ready yet - but when
hot-adding a device that doesn't have a journal, we don't want to use
that path.
Reported-by: syzbot+24a867cb90d8315cccff@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>