Commit Graph

81816 Commits

Author SHA1 Message Date
Jan Kara
028f6055c9 udf: Fix uninitialized array access for some pathnames
For filenames that begin with . and are between 2 and 5 characters long,
UDF charset conversion code would read uninitialized memory in the
output buffer. The only practical impact is that the name may be prepended a
"unification hash" when it is not actually needed but still it is good
to fix this.

Reported-by: syzbot+cd311b1e43cc25f90d18@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000e2638a05fe9dc8f9@google.com
Signed-off-by: Jan Kara <jack@suse.cz>
2023-06-21 11:53:06 +02:00
Jan Kara
404615d7f1 ext2: Drop fragment support
Ext2 has fields in superblock reserved for subblock allocation support.
However that never landed. Drop the many years dead code.

Reported-by: syzbot+af5e10f73dbff48f70af@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
2023-06-13 12:37:31 +02:00
Ye Bin
d6a95db3c7 quota: fix warning in dqgrab()
There's issue as follows when do fault injection:
WARNING: CPU: 1 PID: 14870 at include/linux/quotaops.h:51 dquot_disable+0x13b7/0x18c0
Modules linked in:
CPU: 1 PID: 14870 Comm: fsconfig Not tainted 6.3.0-next-20230505-00006-g5107a9c821af-dirty #541
RIP: 0010:dquot_disable+0x13b7/0x18c0
RSP: 0018:ffffc9000acc79e0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88825e41b980
RDX: 0000000000000000 RSI: ffff88825e41b980 RDI: 0000000000000002
RBP: ffff888179f68000 R08: ffffffff82087ca7 R09: 0000000000000000
R10: 0000000000000001 R11: ffffed102f3ed026 R12: ffff888179f68130
R13: ffff888179f68110 R14: dffffc0000000000 R15: ffff888179f68118
FS:  00007f450a073740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe96f2efd8 CR3: 000000025c8ad000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 dquot_load_quota_sb+0xd53/0x1060
 dquot_resume+0x172/0x230
 ext4_reconfigure+0x1dc6/0x27b0
 reconfigure_super+0x515/0xa90
 __x64_sys_fsconfig+0xb19/0xd20
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue may happens as follows:
ProcessA              ProcessB                    ProcessC
sys_fsconfig
  vfs_fsconfig_locked
   reconfigure_super
     ext4_remount
      dquot_suspend -> suspend all type quota

                 sys_fsconfig
                  vfs_fsconfig_locked
                    reconfigure_super
                     ext4_remount
                      dquot_resume
                       ret = dquot_load_quota_sb
                        add_dquot_ref
                                           do_open  -> open file O_RDWR
                                            vfs_open
                                             do_dentry_open
                                              get_write_access
                                               atomic_inc_unless_negative(&inode->i_writecount)
                                              ext4_file_open
                                               dquot_file_open
                                                dquot_initialize
                                                  __dquot_initialize
                                                   dqget
						    atomic_inc(&dquot->dq_count);

                          __dquot_initialize
                           __dquot_initialize
                            dqget
                             if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
                               ext4_acquire_dquot
			        -> Return error DQ_ACTIVE_B flag isn't set
                         dquot_disable
			  invalidate_dquots
			   if (atomic_read(&dquot->dq_count))
	                    dqgrab
			     WARN_ON_ONCE(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags))
	                      -> Trigger warning

In the above scenario, 'dquot->dq_flags' has no DQ_ACTIVE_B is normal when
dqgrab().
To solve above issue just replace the dqgrab() use in invalidate_dquots() with
atomic_inc(&dquot->dq_count).

Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230605140731.2427629-3-yebin10@huawei.com>
2023-06-05 16:50:30 +02:00
Jan Kara
6a4e336379 quota: Properly disable quotas when add_dquot_ref() fails
When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want
to disable quotas we are trying to enable. However dquot_disable() call
was passed just the flags we are enabling so in case flags ==
DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL
instead of properly disabling quotas. Fix the problem by always passing
DQUOT_LIMITS_ENABLED | DQUOT_USAGE_ENABLED to dquot_disable() in this
case.

Reported-and-tested-by: Ye Bin <yebin10@huawei.com>
Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230605140731.2427629-2-yebin10@huawei.com>
2023-06-05 16:50:30 +02:00
Bagas Sanjaya
aac2fa2013 fs: udf: udftime: Replace LGPL boilerplate with SPDX identifier
Replace license boilerplate in udftime.c with SPDX identifier for
LGPL-2.0.

Cc: Paul Eggert <eggert@twinsun.com>
Cc: Richard Fontana <rfontana@redhat.com>
Cc: Pali Rohár <pali@kernel.org>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230522005434.22133-3-bagasdotme@gmail.com>
2023-05-30 15:40:20 +02:00
Bagas Sanjaya
5ce345541e fs: udf: Replace GPL 2.0 boilerplate license notice with SPDX identifier
The notice refers to full GPL 2.0 text on now defunct MIT FTP site [1].
Replace it with appropriate SPDX license identifier.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pali Rohár <pali@kernel.org>
Link: https://web.archive.org/web/20020809115410/ftp://prep.ai.mit.edu/pub/gnu/GPL [1]
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230522005434.22133-2-bagasdotme@gmail.com>
2023-05-30 15:39:13 +02:00
Jan Kara
576215cffd fs: Drop wait_unfrozen wait queue
wait_unfrozen waitqueue is used only in quota code to wait for
filesystem to become unfrozen. In that place we can just use
sb_start_write() - sb_end_write() pair to achieve the same. So just
remove the waitqueue.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Message-Id: <20230525141710.7595-1-jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-30 15:36:40 +02:00
Al Viro
b8b9e8b35d ext2_find_entry()/ext2_dotdot(): callers don't need page_addr anymore
... and that's how it should've been done in the first place

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29 11:03:31 +02:00
Al Viro
dae42837ba ext2_{set_link,delete_entry}(): don't bother with page_addr
ext2_set_link() simply doesn't use it anymore and ext2_delete_entry()
can easily obtain it from the directory entry pointer...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29 11:03:28 +02:00
Al Viro
91f646fb97 ext2_put_page(): accept any pointer within the page
eliminates the need to keep the pointer to the first byte within
the page if we are guaranteed to have pointers to some byte
in the same page at hand.

Don't backport without commit 88d7b12068 ("highmem: round down the
address passed to kunmap_flush_on_unmap()").

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29 11:03:25 +02:00
Al Viro
46022375ab ext2_get_page(): saner type
We need to pass to caller both the page reference and pointer to the
first byte in the now-mapped page.  The former always has the same type,
the latter varies from caller to caller.  So make it
	void *ext2_get_page(...., struct page **page)
rather than
	struct page *ext2_get_page(..., void **page_addr)
and avoid the casts...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29 11:03:22 +02:00
Al Viro
8600839269 ext2: use offset_in_page() instead of open-coding it as subtraction
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29 11:03:19 +02:00
Al Viro
8f1dca19b1 ext2_rename(): set_link and delete_entry may fail
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2023-05-29 11:03:06 +02:00
Ritesh Harjani (IBM)
6e335cd789 ext2: Add direct-io trace points
This patch adds the trace point to ext2 direct-io apis
in fs/ext2/file.c

Here is how the output looks like

        a.out-467865 [006]  6758.170968: ext2_dio_write_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT|WRITE aio 1 ret 0
        a.out-467865 [006]  6758.171061: ext2_dio_write_end:   dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 0 flags DIRECT|WRITE aio 1 ret -529
kworker/3:153-444162 [003]  6758.171252: ext2_dio_write_endio: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT|WRITE aio 1 ret 0
        a.out-468222 [001]  6761.628924: ext2_dio_read_begin:  dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 1 ret 0
        a.out-468222 [001]  6761.629063: ext2_dio_read_end:    dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 0 flags DIRECT aio 1 ret -529
        a.out-468428 [005]  6763.937454: ext2_dio_write_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0
        a.out-468428 [005]  6763.937829: ext2_dio_write_endio: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0
        a.out-468428 [005]  6763.937847: ext2_dio_write_end:   dev 7:12 ino 0xe isize 0x1000 pos 0x1000 len 0 flags DIRECT aio 0 ret 4096
        a.out-468609 [000]  6765.702878: ext2_dio_read_begin:  dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0
        a.out-468609 [000]  6765.703243: ext2_dio_read_end:    dev 7:12 ino 0xe isize 0x1000 pos 0x1000 len 0 flags DIRECT aio 0 ret 4096

Reported-and-tested-by: Disha Goel <disgoel@linux.ibm.com>
[Need to add CFLAGS_trace for fixing unable to find trace file problem]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <b8b0897fa2b273a448d7b4ba7317357ac73c08bc.1682069716.git.ritesh.list@gmail.com>
2023-05-16 11:32:42 +02:00
Ritesh Harjani (IBM)
fb5de4358e ext2: Move direct-io to use iomap
This patch converts ext2 direct-io path to iomap interface.
- This also takes care of DIO_SKIP_HOLES part in which we return -ENOTBLK
  from ext2_iomap_begin(), in case if the write is done on a hole.
- This fallbacks to buffered-io in case of DIO_SKIP_HOLES or in case of
  a partial write or if any error is detected in ext2_iomap_end().
  We try to return -ENOTBLK in such cases.
- For any unaligned or extending DIO writes, we pass
  IOMAP_DIO_FORCE_WAIT flag to ensure synchronous writes.
- For extending writes we set IOMAP_F_DIRTY in ext2_iomap_begin because
  otherwise with dsync writes on devices that support FUA, generic_write_sync
  won't be called and we might miss inode metadata updates.
- Since ext2 already now uses _nolock vartiant of sync write. Hence
  there is no inode lock problem with iomap in this patch.
- ext2_iomap_ops are now being shared by DIO, DAX & fiemap path

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <610b672a52f2a7ff6dc550fd14d0f995806232a5.1682069716.git.ritesh.list@gmail.com>
2023-05-16 11:32:42 +02:00
Ritesh Harjani (IBM)
d053070425 ext2: Use generic_buffers_fsync() implementation
Next patch converts ext2 to use iomap interface for DIO.
iomap layer can call generic_write_sync() -> ext2_fsync() from
iomap_dio_complete while still holding the inode_lock().

Now writeback from other paths doesn't need inode_lock().
It seems there is also no need of an inode_lock() for
sync_mapping_buffers(). It uses it's own mapping->private_lock
for it's buffer list handling.
Hence this patch is in preparation to move ext2 to iomap.
This uses generic_buffers_fsync() which does not take any inode_lock()
in ext2_fsync().

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <76d206a464574ff91db25bc9e43479b51ca7e307.1682069716.git.ritesh.list@gmail.com>
2023-05-16 11:32:42 +02:00
Ritesh Harjani (IBM)
5b5b4ff8f9 ext4: Use generic_buffers_fsync_noflush() implementation
ext4 when got converted to iomap for dio, it copied __generic_file_fsync
implementation to avoid taking inode_lock in order to avoid any deadlock
(since iomap takes an inode_lock while calling generic_write_sync()).

The previous patch already added generic_buffers_fsync*() which does not
take any inode_lock(). Hence kill the redundant code and use
generic_buffers_fsync_noflush() function instead.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <b43d4bb4403061ed86510c9587673e30a461ba14.1682069716.git.ritesh.list@gmail.com>
2023-05-16 11:32:42 +02:00
Ritesh Harjani (IBM)
31b2ebc092 fs/buffer.c: Add generic_buffers_fsync*() implementation
Some of the higher layers like iomap takes inode_lock() when calling
generic_write_sync().
Also writeback already happens from other paths without inode lock,
so it's difficult to say that we really need sync_mapping_buffers() to
take any inode locking here. Having said that, let's add
generic_buffers_fsync/_noflush() implementation in buffer.c with no
inode_lock/unlock() for now so that filesystems like ext2 and
ext4's nojournal mode can use it.

Ext4 when got converted to iomap for direct-io already copied it's own
variant of __generic_file_fsync() without lock.

This patch adds generic_buffers_fsync()
& generic_buffers_fsync_noflush() implementations for use in filesystems
like ext2 & ext4 respectively.

Later we can review other filesystems as well to see if we can make
generic_buffers_fsync/_noflush() which does not take any inode_lock() as
the default path.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <d573408ac8408627d23a3d2d166e748c172c4c9e.1682069716.git.ritesh.list@gmail.com>
2023-05-16 11:32:42 +02:00
Ritesh Harjani (IBM)
fcced95b6b ext2/dax: Fix ext2_setsize when len is page aligned
PAGE_ALIGN(x) macro gives the next highest value which is multiple of
pagesize. But if x is already page aligned then it simply returns x.
So, if x passed is 0 in dax_zero_range() function, that means the
length gets passed as 0 to ->iomap_begin().

In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on
here in ext2_get_blocks().
	BUG_ON(maxblocks == 0);

Instead we should be calling dax_truncate_page() here which takes
care of it. i.e. it only calls dax_zero_range if the offset is not
page/block aligned.

This can be easily triggered with following on fsdax mounted pmem
device.

dd if=/dev/zero of=file count=1 bs=512
truncate -s 0 file

[79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff)
[93.793207] ------------[ cut here ]------------
[93.795102] kernel BUG at fs/ext2/inode.c:637!
[93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 #139
[93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610
<...>
[93.835298] Call Trace:
[93.836253]  <TASK>
[93.837103]  ? lock_acquire+0xf8/0x110
[93.838479]  ? d_lookup+0x69/0xd0
[93.839779]  ext2_iomap_begin+0xa7/0x1c0
[93.841154]  iomap_iter+0xc7/0x150
[93.842425]  dax_zero_range+0x6e/0xa0
[93.843813]  ext2_setsize+0x176/0x1b0
[93.845164]  ext2_setattr+0x151/0x200
[93.846467]  notify_change+0x341/0x4e0
[93.847805]  ? lock_acquire+0xf8/0x110
[93.849143]  ? do_truncate+0x74/0xe0
[93.850452]  ? do_truncate+0x84/0xe0
[93.851739]  do_truncate+0x84/0xe0
[93.852974]  do_sys_ftruncate+0x2b4/0x2f0
[93.854404]  do_syscall_64+0x3f/0x90
[93.855789]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

CC: stable@vger.kernel.org
Fixes: 2aa3048e03 ("iomap: switch iomap_zero_range to use iomap_iter")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com>
2023-05-16 11:32:42 +02:00
Linus Torvalds
bb7c241fae Some ext4 bug fixes (mostly to address Syzbot reports) for v6.4-rc2.
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmRgCfAACgkQ8vlZVpUN
 gaOaOgf5AbFUBsjb95Aq2Y6SKvlyO2xFd2OqJXu6+bGaJScQ8qeoW2byihN4vD/e
 i5V5vivpk764k1uOUe9fq5BlkaTuvFJI8d81eEnJC3LW4s7r6Gv586dwbE5lr0Bq
 cZKCVMYdgwz3admGtPXrN0CVgg+Y/wHb1ZmGtt2nAqZfNqYfpX0waDyGr6JebhkO
 04VE8QQCvMkO6oOIR9ZfbJmVm5vrGqQVLW4T0hXVTj9r3gUu/61qAkt2XYAu5tKJ
 ENIoMv2ix0asAgFSbcIzY6YnCzSY9hiV/K6Twtusf63r22T+r6+LXBqUe+8hMx4E
 Vh8L+5wkeNkCXD8HwnHizPx5r0nLqw==
 =ouFA
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Some ext4 bug fixes (mostly to address Syzbot reports)"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: bail out of ext4_xattr_ibody_get() fails for any reason
  ext4: add bounds checking in get_max_inline_xattr_value_size()
  ext4: add indication of ro vs r/w mounts in the mount message
  ext4: fix deadlock when converting an inline directory in nojournal mode
  ext4: improve error recovery code paths in __ext4_remount()
  ext4: improve error handling from ext4_dirhash()
  ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
  ext4: check iomap type only if ext4_iomap_begin() does not fail
  ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
  ext4: fix data races when using cached status extents
  ext4: avoid deadlock in fs reclaim with page writeback
  ext4: fix invalid free tracking in ext4_xattr_move_to_block()
  ext4: remove a BUG_ON in ext4_mb_release_group_pa()
  ext4: allow ext4_get_group_info() to fail
  ext4: fix lockdep warning when enabling MMP
  ext4: fix WARNING in mb_find_extent
2023-05-13 17:45:39 -07:00
Theodore Ts'o
2a534e1d0d ext4: bail out of ext4_xattr_ibody_get() fails for any reason
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any
reason, it's best if we just fail as opposed to stumbling on,
especially if the failure is EFSCORRUPTED.

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
2220eaf909 ext4: add bounds checking in get_max_inline_xattr_value_size()
Normally the extended attributes in the inode body would have been
checked when the inode is first opened, but if someone is writing to
the block device while the file system is mounted, it's possible for
the inode table to get corrupted.  Add bounds checking to avoid
reading beyond the end of allocated memory if this happens.

Reported-by: syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
6dcc98fbc4 ext4: add indication of ro vs r/w mounts in the mount message
Whether the file system is mounted read-only or read/write is more
important than the quota mode, which we are already printing.  Add the
ro vs r/w indication since this can be helpful in debugging problems
from the console log.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
f4ce24f54d ext4: fix deadlock when converting an inline directory in nojournal mode
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock
by calling ext4_handle_dirty_dirblock() when it already has taken the
directory lock.  There is a similar self-deadlock in
ext4_incvert_inline_data_nolock() for data files which we'll fix at
the same time.

A simple reproducer demonstrating the problem:

    mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64
    mount -t ext4 -o dirsync /dev/vdc /vdc
    cd /vdc
    mkdir file0
    cd file0
    touch file0
    touch file1
    attr -s BurnSpaceInEA -V abcde .
    touch supercalifragilisticexpialidocious

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu
Reported-by: syzbot+91dccab7c64e2850a4e5@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
4c0b4818b1 ext4: improve error recovery code paths in __ext4_remount()
If there are failures while changing the mount options in
__ext4_remount(), we need to restore the old mount options.

This commit fixes two problem.  The first is there is a chance that we
will free the old quota file names before a potential failure leading
to a use-after-free.  The second problem addressed in this commit is
if there is a failed read/write to read-only transition, if the quota
has already been suspended, we need to renable quota handling.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
4b3cb1d108 ext4: improve error handling from ext4_dirhash()
The ext4_dirhash() will *almost* never fail, especially when the hash
tree feature was first introduced.  However, with the addition of
support of encrypted, casefolded file names, that function can most
certainly fail today.

So make sure the callers of ext4_dirhash() properly check for
failures, and reflect the errors back up to their callers.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+394aa8a792cb99dbc837@syzkaller.appspotmail.com
Reported-by: syzbot+344aaa8697ebd232bfc8@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Theodore Ts'o
a44be64bbe ext4: don't clear SB_RDONLY when remounting r/w until quota is re-enabled
When a file system currently mounted read/only is remounted
read/write, if we clear the SB_RDONLY flag too early, before the quota
is initialized, and there is another process/thread constantly
attempting to create a directory, it's possible to trigger the

	WARN_ON_ONCE(dquot_initialize_needed(inode));

in ext4_xattr_block_set(), with the following stack trace:

   WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680
   RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141
   Call Trace:
    ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458
    ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44
    security_inode_init_security+0x2df/0x3f0 security/security.c:1147
    __ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324
    ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992
    vfs_mkdir+0x29d/0x450 fs/namei.c:4038
    do_mkdirat+0x264/0x520 fs/namei.c:4061
    __do_sys_mkdirat fs/namei.c:4076 [inline]
    __se_sys_mkdirat fs/namei.c:4074 [inline]
    __x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu
Reported-by: syzbot+6385d7d3065524c5ca6d@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:05 -04:00
Baokun Li
fa83c34e3e ext4: check iomap type only if ext4_iomap_begin() does not fail
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.

Cc: stable@kernel.org
Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Tudor Ambarus
4f04351888 ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
When modifying the block device while it is mounted by the filesystem,
syzbot reported the following:

BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 crc16+0x206/0x280 lib/crc16.c:58
 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
 ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
 ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
 ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
 ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
 notify_change+0xe50/0x1100 fs/attr.c:482
 do_truncate+0x200/0x2f0 fs/open.c:65
 handle_truncate fs/namei.c:3216 [inline]
 do_open fs/namei.c:3561 [inline]
 path_openat+0x272b/0x2dd0 fs/namei.c:3714
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_creat fs/open.c:1402 [inline]
 __se_sys_creat fs/open.c:1396 [inline]
 __x64_sys_creat+0x11f/0x160 fs/open.c:1396
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000

Replace
	le16_to_cpu(sbi->s_es->s_desc_size)
with
	sbi->s_desc_size

It reduces ext4's compiled text size, and makes the code more efficient
(we remove an extra indirect reference and a potential byte
swap on big endian systems), and there is no downside. It also avoids the
potential KASAN / syzkaller failure, as a bonus.

Reported-by: syzbot+fc51227e7100c9294894@syzkaller.appspotmail.com
Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42
Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3
Link: https://lore.kernel.org/all/000000000000ece18705f3b20934@google.com/
Fixes: 717d50e497 ("Ext4: Uninitialized Block Groups")
Cc: stable@vger.kernel.org
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://lore.kernel.org/r/20230504121525.3275886-1-tudor.ambarus@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Jan Kara
492888df0c ext4: fix data races when using cached status extents
When using cached extent stored in extent status tree in tree->cache_es
another process holding ei->i_es_lock for reading can be racing with us
setting new value of tree->cache_es. If the compiler would decide to
refetch tree->cache_es at an unfortunate moment, it could result in a
bogus in_range() check. Fix the possible race by using READ_ONCE() when
using tree->cache_es only under ei->i_es_lock for reading.

Cc: stable@kernel.org
Reported-by: syzbot+4a03518df1e31b537066@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000d3b33905fa0fd4a6@google.com
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230504125524.10802-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Jan Kara
00d873c17e ext4: avoid deadlock in fs reclaim with page writeback
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format. This
lock can however cause a deadlock like:

CPU0                            CPU1

ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                ext4_change_inode_journal_flag()
                                  percpu_down_write(sbi->s_writepages_rwsem);
                                    - blocks, all readers block from now on
  ext4_do_writepages()
    ext4_init_io_end()
      kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
        fs_reclaim frees dentry...
          dentry_unlink_inode()
            iput() - last ref =>
              iput_final() - inode dirty =>
                write_inode_now()...
                  ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                    and blocks forever

Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.

Reported-by: syzbot+6898da502aef574c5f8a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com
Fixes: c8585c6fca ("ext4: fix races between changing inode journal mode and ext4_writepages")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Theodore Ts'o
b87c7cdf2b ext4: fix invalid free tracking in ext4_xattr_move_to_block()
In ext4_xattr_move_to_block(), the value of the extended attribute
which we need to move to an external block may be allocated by
kvmalloc() if the value is stored in an external inode.  So at the end
of the function the code tried to check if this was the case by
testing entry->e_value_inum.

However, at this point, the pointer to the xattr entry is no longer
valid, because it was removed from the original location where it had
been stored.  So we could end up calling kvfree() on a pointer which
was not allocated by kvmalloc(); or we could also potentially leak
memory by not freeing the buffer when it should be freed.  Fix this by
storing whether it should be freed in a separate variable.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430160426.581366-1-tytso@mit.edu
Link: https://syzkaller.appspot.com/bug?id=5c2aee8256e30b55ccf57312c16d88417adbd5e1
Link: https://syzkaller.appspot.com/bug?id=41a6b5d4917c0412eb3b3c3c604965bed7d7420b
Reported-by: syzbot+64b645917ce07d89bde5@syzkaller.appspotmail.com
Reported-by: syzbot+0d042627c4f2ad332195@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Theodore Ts'o
463808f237 ext4: remove a BUG_ON in ext4_mb_release_group_pa()
If a malicious fuzzer overwrites the ext4 superblock while it is
mounted such that the s_first_data_block is set to a very large
number, the calculation of the block group can underflow, and trigger
a BUG_ON check.  Change this to be an ext4_warning so that we don't
crash the kernel.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430154311.579720-3-tytso@mit.edu
Reported-by: syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-05-13 18:05:04 -04:00
Theodore Ts'o
5354b2af34 ext4: allow ext4_get_group_info() to fail
Previously, ext4_get_group_info() would treat an invalid group number
as BUG(), since in theory it should never happen.  However, if a
malicious attaker (or fuzzer) modifies the superblock via the block
device while it is the file system is mounted, it is possible for
s_first_data_block to get set to a very large number.  In that case,
when calculating the block group of some block number (such as the
starting block of a preallocation region), could result in an
underflow and very large block group number.  Then the BUG_ON check in
ext4_get_group_info() would fire, resutling in a denial of service
attack that can be triggered by root or someone with write access to
the block device.

For a quality of implementation perspective, it's best that even if
the system administrator does something that they shouldn't, that it
will not trigger a BUG.  So instead of BUG'ing, ext4_get_group_info()
will call ext4_error and return NULL.  We also add fallback code in
all of the callers of ext4_get_group_info() that it might NULL.

Also, since ext4_get_group_info() was already borderline to be an
inline function, un-inline it.  The results in a next reduction of the
compiled text size of ext4 by roughly 2k.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230430154311.579720-2-tytso@mit.edu
Reported-by: syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2023-05-13 18:02:46 -04:00
Linus Torvalds
76c7f8873a for-6.4-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmRebDIACgkQxWXV+ddt
 WDu3vA//RNyRGjEz0HgfhTc1119DXJLwK6j544waYLrzRcMtBK4xKByiaFkAA4tL
 PQidGX+nAQPm+pZl0jcK30cBMObik5GXJwoSOZGl7/ectx4O7aFfXqiSfwPTyqZU
 3fTavoqoJxbxJCVbifcXOPNhsUxMlEGYJmA3CVRsllLviXY+3HMpX2ZpWZ7vch+N
 MLENNBfUo1HVdWaxOYfQif/qT5iR9G7D8dBjX9DUK0kVwrbwBB0rolJy4fPrY6z5
 gBLED9Ks3FBgyU3mYq4qrfPmbfF8mPiaU0+1j+B46vw3PdPtIwjIForR+91GsZ1v
 iHojbykf6VWTQV+gO78mgv4O4vRtn3C+UJaGxLL86OMOaiQQHFYdSETn9arPmoho
 p1wCBidI82tvfIOGYXgrTGorLN27hhyPJinHe/2Bqo+1wUL8/J8mwCWunIox7a8z
 rxO5QhDIDFX7gamsvYjkW3tBkYuGiGvBjx+Ic2cBHTkVp9wSPL9PCvqNNru2qexA
 t0BpAL9DxvN+T1xO1thC3qsm2Ogx0QEmgdDfRglbEVASnRZKZZsJEMO90FzFbkFg
 vLbs0KnT7yS7mTwq4NklDrgHZ0eiiJLZVCb8bR8xkzVW+ADrUmZuDM8WOcCgJAUp
 fUoMmFsJZi5zsdAOygDWr1bBHorLV5szrY0bSB5L2eHwJjYZ6KE=
 =uWUN
 -----END PGP SIGNATURE-----

Merge tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs fixes from David Sterba:

 - fix incorrect number of bitmap entries for space cache if loading is
   interrupted by some error

 - fix backref walking, this breaks a mode of LOGICAL_INO_V2 ioctl that
   is used in deduplication tools

 - zoned mode fixes:
      - properly finish zone reserved for relocation
      - correctly calculate super block zone end on ZNS
      - properly initialize new extent buffer for redirty

 - make mount option clear_cache work with block-group-tree, to rebuild
   free-space-tree instead of temporarily disabling it that would lead
   to a forced read-only mount

 - fix alignment check for offset when printing extent item

* tag 'for-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: make clear_cache mount option to rebuild FST without disabling it
  btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
  btrfs: zoned: fix full zone super block reading on ZNS
  btrfs: zoned: zone finish data relocation BG with last IO
  btrfs: fix backref walking not returning all inode refs
  btrfs: fix space cache inconsistency after error loading it from disk
  btrfs: print-tree: parent bytenr must be aligned to sector size
2023-05-12 17:10:32 -05:00
Linus Torvalds
fd88f147cb 6 smb3 client fixes, most also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmRd3pcACgkQiiy9cAdy
 T1FrEAwAj6M0DnI13diLdjZVs9WAVRIiu5Fk1l8oA6bco74UepbJjZmTf8K8G0pM
 fQDTXZWDa5g11tbYuXFb4Ue9Dv7hu4RuUthWZCxoDnKcvH00VtAtYplE1LJstWBh
 mZwHm8iEBRPs2oB1YZ7KnQ8kFSjs2VgwV2PQAI24QDP0QWTT4jtw/nPqLxKLxyef
 Rr9YkOK0LZPhhGEAZ5ABwkPH97U5CUMMwRTqhGDO/mm91FZ8sD+vAsqM7GdLU73w
 wGHp3M1RKBVBubNCgNT0uRMlnlqBcvoThqhYgJ6PYnS4uE23a2uLDNI8vj9LlMRo
 csoYdM+Kltpvki8hFIWiFxfHFfECHD6h/QOpRMSv9InU0ly6o7zwnl9VtuWcRyrM
 vyz3xnlI6wG6TCy1HGezq3HSaEbEbVpwLCxyG/xFiAHV/VdkmjmRUb0RJV4X3X8M
 /KUT+1wj0MAMmVUTIpIHuotRikScAPnP/NDMorIsgEUp/2TFh2m+FQZ15o401sGv
 96G8v+9I
 =ax5r
 -----END PGP SIGNATURE-----

Merge tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:

 - fix for copy_file_range bug for very large files that are multiples
   of rsize

 - do not ignore "isolated transport" flag if set on share

 - set rasize default better

 - three fixes related to shutdown and freezing (fixes 4 xfstests, and
   closes deferred handles faster in some places that were missed)

* tag '6.4-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: release leases for deferred close handles when freezing
  smb3: fix problem remounting a share after shutdown
  SMB3: force unmount was failing to close deferred close files
  smb3: improve parallel reads of large files
  do not reuse connection if share marked as isolated
  cifs: fix pcchunk length type in smb2_copychunk_range
2023-05-12 17:01:36 -05:00
Linus Torvalds
df8c2d13e2 vfs/v6.4-rc1/pipe
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZF5YpQAKCRCRxhvAZXjc
 oqZpAQCuXA/XDf4Cyhfk87C0gbzy1Akjf3Xpia8q1mWtvN4I+AD+PE54krWgxGRi
 mu5Ae2X95tQ1v+MUAcmWOMdITK8HewE=
 =8+GL
 -----END PGP SIGNATURE-----

Merge tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fix from Christian Brauner:
 "During the pipe nonblock rework the check for both O_NONBLOCK and
  IOCB_NOWAIT was dropped. Both checks need to be performed to ensure
  that files without O_NONBLOCK but IOCB_NOWAIT don't block when writing
  to or reading from a pipe.

  This just contains the fix adding the check for IOCB_NOWAIT back in"

* tag 'vfs/v6.4-rc1/pipe' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
  pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
2023-05-12 16:56:09 -05:00
Jens Axboe
c04fe8e32f
pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
Pipe reads or writes need to enable nonblocking attempts, if either
O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being
passed in. The latter isn't currently true, ensure we check for both
before waiting on data or space.

Fixes: afed6271f5 ("pipe: set FMODE_NOWAIT on pipes")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <e5946d67-4e5e-b056-ba80-656bab12d9f6@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-12 17:17:27 +02:00
Linus Torvalds
849a4f0973 xfs: bug fixes for 6.4-rc2
o fixes for inode garbage collection shutdown racing with work queue
   updates
 o ensure inodegc workers run on the CPU they are supposed to
 o disable counter scrubbing until we can exclusively freeze the
   filesystem from the kernel
 o Regression fixes for new allocation related bugs
 o a couple of minor cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmRcSIsUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h2Y8xAAxtsTdOx71XtDuNyfBOiqzZgTCq6b
 6LsckJIDQa1AXjUNq9G3zWcUcWBcRWcw+CWbkqjqQ9W47K/ijLuoKnjRsQ+5B4DU
 TBUctVq+/Zk2lBlb6HKuKdzqDGnIFWGVKVd7u8KlowqnXuzUeQ0vFkT7ZHTepUKG
 P+midgGNVT4+tykq7oH0H8WxoTyNPZhKiAUcZjneBgA60IAoQWHA2iUt+SKpbrkL
 1HyK+/edVMTXiDXtyHfXmDaH9Pgy6NCpw3TNkPDhuL1UDpLhg/zgT39rFZGBsAUt
 gaDM3wN5jBrot/mvJE3rH9bdZhkcf+NQKPx/1DDg3DL8plS/1/LUC4cImdolBJ3w
 RNmgJv1lK+AlE4MUJ/bUDlEpHUmwAjnnsxBXwEvnYNfj+9V6/mDB+HqKiY7/XxVK
 vF77s6z+CWvefdnZavJ4/72pVVJNkcDYCYmvh/donRP6vtnwZyzocFUeBeNMInV1
 /s3WMrF9hwmJqAClKG7p1fnszWp658yFIuw/TXVs+NrjTtQgXwMpl2cEYYvUZEJN
 Trq2p0xH/JSwcnOPSPJO6WHb8UPoqrM6lgGFaJVWJx1AWt1i1CFLf5eA5X+XisDV
 AJKgpqlnDg02bBMQ0tMFGZUaNx/1S1mwtxcZsyEFTutpUNxqJKDaMohpxxrWb0WC
 ppSqDvyJN4wtlFI=
 =qok2
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs bug fixes from Dave Chinner:
 "Largely minor bug fixes and cleanups, th emost important of which are
  probably the fixes for regressions in the extent allocation code:

   - fixes for inode garbage collection shutdown racing with work queue
     updates

   - ensure inodegc workers run on the CPU they are supposed to

   - disable counter scrubbing until we can exclusively freeze the
     filesystem from the kernel

   - regression fixes for new allocation related bugs

   - a couple of minor cleanups"

* tag 'xfs-6.4-rc1-fixes' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix xfs_inodegc_stop racing with mod_delayed_work
  xfs: disable reaping in fscounters scrub
  xfs: check that per-cpu inodegc workers actually run on that cpu
  xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
  xfs: fix negative array access in xfs_getbmap
  xfs: don't allocate into the data fork for an unshare request
  xfs: flush dirty data and drain directios before scrubbing cow fork
  xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
  xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
2023-05-11 16:51:11 -05:00
Steve French
d39fc592ef cifs: release leases for deferred close handles when freezing
We should not be caching closed files when freeze is invoked on an fs
(so we can release resources more gracefully).

Fixes xfstests generic/068 generic/390 generic/491

Reviewed-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-10 17:48:30 -05:00
Linus Torvalds
d295b66a7b \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmRcFLAACgkQnJ2qBz9k
 QNmSuQf+P+hFOHVpCb83S4TrNsfPC9M7Pv/M1Rw926p1gESNLez1ElAx2B35OIBZ
 y4ijqrSHgXbCsC3PvbxR9YfBPJZwIhtmKwjPC+9KlhAFB536Bvt/nt34fjRCWJiS
 nABnexegkirk7EQufAGVZQ1gtm1p0qSCA1B/l5AFl4KeLIin31U/08F2eE1OAo3K
 TmxhXzlC+JRXjf/X/X8mzJ8nz/hY5039BDvs7EpVdOsZcLFRQcCUygUCrOdV94uI
 1wyzMoGNaiIsZHpHPf3NtphUoE9IFpZQJGXMTR4ehBgFYkkOHYxsjW+wn8YOmaks
 qzRRXb9w0WdgAXChDGu1xYXV6Wpjiw==
 =kqSe
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull inotify fix from Jan Kara:
 "A fix for possibly reporting invalid watch descriptor with inotify
  event"

* tag 'fsnotify_for_v6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  inotify: Avoid reporting event with invalid wd
2023-05-10 17:07:42 -05:00
Linus Torvalds
2a78769da3 gfs2 fix
- Fix a NULL pointer dereference when mounting corrupted filesystems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmRbtSYUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrQSg/6A7xD25DpU1CSL18CEKMxnViXPNLP
 IYiTtkByCsRkgQKR4KMtVgPD/lEyWbhpuZJzSF4mZkslpnngjL/o+Rj4Sds9GvPL
 +66CSJAqjpJUjgGDvAoiJcZYOejRyqZLCHzKF54h0oJ3k9JCd3xN86EX8vKGoc2e
 F/As+E1Wnf0PP4jiERDyFhhg95MNxFuNVJ/JJ7mwtWpumx+60SpGzzUp575AHcoe
 4zHkVHlj6FAz2G8l8NADYAc0wMuhQ05g19psHzRDr2bB9zHBy9kWzXN0MYrhRvuY
 HuQ3CNDbXCJPXFjrNEmLV0f7Y8yCk/YNrYPbo4zV+AW0l7NzesqzrwkP/qgNfySD
 pnQJ/mQ76P/h728fln0pYvO8GnFUFoKKoCR27jH4F/Mntr4hgR4FZNDhvK6BMQNc
 PpipJcXWT8AUY45Sg3TLKPEhjp37K+abiMKfyTEFQRvLT83O6WyCUx2HYpZI7l87
 /aTIsidlYkb2ZSeLeXEfBplRXwF7NQNvvRJCGQKHAN7UbTB1o2RVpC5bn6UIghb1
 eQQZObs4AUfmzHthF2OaeJy08AiFidbOVLmbpZuj2iOWbJzmjJea8YT/GdntxhMT
 swmkHJqdmZEYoSKKvDmemT6l7qtOwijg7sTQ/d2D77lk8QPPIHdUw6g1OVvMqoNE
 oESpIA4O2fshlpU=
 =/s3O
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix a NULL pointer dereference when mounting corrupted filesystems

* tag 'gfs2-v6.3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Don't deref jdesc in evict
2023-05-10 16:48:35 -05:00
Bob Peterson
504a10d9e4 gfs2: Don't deref jdesc in evict
On corrupt gfs2 file systems the evict code can try to reference the
journal descriptor structure, jdesc, after it has been freed and set to
NULL. The sequence of events is:

init_journal()
...
fail_jindex:
   gfs2_jindex_free(sdp); <------frees journals, sets jdesc = NULL
      if (gfs2_holder_initialized(&ji_gh))
         gfs2_glock_dq_uninit(&ji_gh);
fail:
   iput(sdp->sd_jindex); <--references jdesc in evict_linked_inode
      evict()
         gfs2_evict_inode()
            evict_linked_inode()
               ret = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
<------references the now freed/zeroed sd_jdesc pointer.

The call to gfs2_trans_begin is done because the truncate_inode_pages
call can cause gfs2 events that require a transaction, such as removing
journaled data (jdata) blocks from the journal.

This patch fixes the problem by adding a check for sdp->sd_jdesc to
function gfs2_evict_inode. In theory, this should only happen to corrupt
gfs2 file systems, when gfs2 detects the problem, reports it, then tries
to evict all the system inodes it has read in up to that point.

Reported-by: Yang Lan <lanyang0908@gmail.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-05-10 17:15:18 +02:00
Qu Wenruo
1d6a4fc857 btrfs: make clear_cache mount option to rebuild FST without disabling it
Previously clear_cache mount option would simply disable free-space-tree
feature temporarily then re-enable it to rebuild the whole free space
tree.

But this is problematic for block-group-tree feature, as we have an
artificial dependency on free-space-tree feature.

If we go the existing method, after clearing the free-space-tree
feature, we would flip the filesystem to read-only mode, as we detect a
super block write with block-group-tree but no free-space-tree feature.

This patch would change the behavior by properly rebuilding the free
space tree without disabling this feature, thus allowing clear_cache
mount option to work with block group tree.

Now we can mount a filesystem with block-group-tree feature and
clear_mount option:

  $ mkfs.btrfs  -O block-group-tree /dev/test/scratch1  -f
  $ sudo mount /dev/test/scratch1 /mnt/btrfs -o clear_cache
  $ sudo dmesg -t | head -n 5
  BTRFS info (device dm-1): force clearing of disk cache
  BTRFS info (device dm-1): using free space tree
  BTRFS info (device dm-1): auto enabling async discard
  BTRFS info (device dm-1): rebuilding free space tree
  BTRFS info (device dm-1): checking UUID tree

CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:51:27 +02:00
Christoph Hellwig
c83b56d1dd btrfs: zero the buffer before marking it dirty in btrfs_redirty_list_add
btrfs_redirty_list_add zeroes the buffer data and sets the
EXTENT_BUFFER_NO_CHECK to make sure writeback is fine with a bogus
header.  But it does that after already marking the buffer dirty, which
means that writeback could already be looking at the buffer.

Switch the order of operations around so that the buffer is only marked
dirty when we're ready to write it.

Fixes: d3575156f6 ("btrfs: zoned: redirty released extent buffers")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:50:29 +02:00
Naohiro Aota
02ca9e6fb5 btrfs: zoned: fix full zone super block reading on ZNS
When both of the superblock zones are full, we need to check which
superblock is newer. The calculation of last superblock position is wrong
as it does not consider zone_capacity and uses the length.

Fixes: 9658b72ef3 ("btrfs: zoned: locate superblock position using zone capacity")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:50:22 +02:00
Naohiro Aota
f84353c7c2 btrfs: zoned: zone finish data relocation BG with last IO
For data block groups, we zone finish a zone (or, just deactivate it) when
seeing the last IO in btrfs_finish_ordered_io(). That is only called for
IOs using ZONE_APPEND, but we use a regular WRITE command for data
relocation IOs. Detect it and call btrfs_zone_finish_endio() properly.

Fixes: be1a1d7a5d ("btrfs: zoned: finish fully written block group")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-10 14:50:12 +02:00
Filipe Manana
0cad8f14d7 btrfs: fix backref walking not returning all inode refs
When using the logical to ino ioctl v2, if the flag to ignore offsets of
file extent items (BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET) is given, the
backref walking code ends up not returning references for all file offsets
of an inode that point to the given logical bytenr. This happens since
kernel 6.2, commit 6ce6ba5344 ("btrfs: use a single argument for extent
offset in backref walking functions") because:

1) It mistakenly skipped the search for file extent items in a leaf that
   point to the target extent if that flag is given. Instead it should
   only skip the filtering done by check_extent_in_eb() - that is, it
   should not avoid the calls to that function (or find_extent_in_eb(),
   which uses it).

2) It was also not building a list of inode extent elements (struct
   extent_inode_elem) if we have multiple inode references for an extent
   when the ignore offset flag is given to the logical to ino ioctl - it
   would leave a single element, only the last one that was found.

These stem from the confusing old interface for backref walking functions
where we had an extent item offset argument that was a pointer to a u64
and another boolean argument that indicated if the offset should be
ignored, but the pointer could be NULL. That NULL case is used by
relocation, qgroup extent accounting and fiemap, simply to avoid building
the inode extent list for each reference, as it's not necessary for those
use cases and therefore avoids memory allocations and some computations.

Fix this by adding a boolean argument to the backref walk context
structure to indicate that the inode extent list should not be built,
make relocation set that argument to true and fix the backref walking
logic to skip the calls to check_extent_in_eb() and find_extent_in_eb()
only if this new argument is true, instead of 'ignore_extent_item_pos'
being true.

A test case for fstests will be added soon, to provide cover not only
for these cases but to the logical to ino ioctl in general as well, as
currently we do not have a test case for it.

Reported-by: Vladimir Panteleev <git@vladimir.panteleev.md>
Link: https://lore.kernel.org/linux-btrfs/CAHhfkvwo=nmzrJSqZ2qMfF-rZB-ab6ahHnCD_sq9h4o8v+M7QQ@mail.gmail.com/
Fixes: 6ce6ba5344 ("btrfs: use a single argument for extent offset in backref walking functions")
CC: stable@vger.kernel.org # 6.2+
Tested-by: Vladimir Panteleev <git@vladimir.panteleev.md>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-09 22:09:11 +02:00
Filipe Manana
0004ff15ea btrfs: fix space cache inconsistency after error loading it from disk
When loading a free space cache from disk, at __load_free_space_cache(),
if we fail to insert a bitmap entry, we still increment the number of
total bitmaps in the btrfs_free_space_ctl structure, which is incorrect
since we failed to add the bitmap entry. On error we then empty the
cache by calling __btrfs_remove_free_space_cache(), which will result
in getting the total bitmaps counter set to 1.

A failure to load a free space cache is not critical, so if a failure
happens we just rebuild the cache by scanning the extent tree, which
happens at block-group.c:caching_thread(). Yet the failure will result
in having the total bitmaps of the btrfs_free_space_ctl always bigger
by 1 then the number of bitmap entries we have. So fix this by having
the total bitmaps counter be incremented only if we successfully added
the bitmap entry.

Fixes: a67509c300 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache")
Reviewed-by: Anand Jain <anand.jain@oracle.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-09 22:08:05 +02:00
Anastasia Belova
c87f318e6f btrfs: print-tree: parent bytenr must be aligned to sector size
Check nodesize to sectorsize in alignment check in print_extent_item.
The comment states that and this is correct, similar check is done
elsewhere in the functions.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: ea57788eb7 ("btrfs: require only sector size alignment for parent eb bytenr")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anastasia Belova <abelova@astralinux.ru>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-09 22:07:40 +02:00