Commit Graph

75771 Commits

Author SHA1 Message Date
Linus Torvalds
279b83c673 fs.fixes.v5.18-rc4
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYmF+/wAKCRCRxhvAZXjc
 otmDAP47jPBjTS+gdnMy8fP6ymZu2+So3gex8N777x23mTvQ5AEAy9s4tKMb5UA5
 rwQa8vRgkUBAyiB9yjloNOdN65X1cAE=
 =WLsw
 -----END PGP SIGNATURE-----

Merge tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull mount_setattr fix from Christian Brauner:
 "The recent cleanup in e257039f0f ("mount_setattr(): clean the
  control flow and calling conventions") switched the mount attribute
  codepaths from do-while to for loops as they are more idiomatic when
  walking mounts.

  However, we did originally choose do-while constructs because if we
  request a mount or mount tree to be made read-only we need to hold
  writers in the following way: The mount attribute code will grab
  lock_mount_hash() and then call mnt_hold_writers() which will
  _unconditionally_ set MNT_WRITE_HOLD on the mount.

  Any callers that need write access have to call mnt_want_write(). They
  will immediately see that MNT_WRITE_HOLD is set on the mount and the
  caller will then either spin (on non-preempt-rt) or wait on
  lock_mount_hash() (on preempt-rt).

  The fact that MNT_WRITE_HOLD is set unconditionally means that once
  mnt_hold_writers() returns we need to _always_ pair it with
  mnt_unhold_writers() in both the failure and success paths.

  The do-while constructs did take care of this. But Al's change to a
  for loop in the failure path stops on the first mount we failed to
  change mount attributes _without_ going into the loop to call
  mnt_unhold_writers().

  This in turn means that once we failed to make a mount read-only via
  mount_setattr() - i.e. there are already writers on that mount - we
  will block any writers indefinitely. Fix this by ensuring that the for
  loop always unsets MNT_WRITE_HOLD including the first mount we failed
  to change to read-only. Also sprinkle a few comments into the cleanup
  code to remind people about what is happening including myself. After
  all, I didn't catch it during review.

  This is only relevant on mainline and was reported by syzbot. Details
  about the syzbot reports are all in the commit message"

* tag 'fs.fixes.v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: unset MNT_WRITE_HOLD on failure
2022-04-22 13:17:19 -07:00
Christophe Leroy
5f24d5a579 mm, hugetlb: allow for "high" userspace addresses
This is a fix for commit f6795053da ("mm: mmap: Allow for "high"
userspace addresses") for hugetlb.

This patch adds support for "high" userspace addresses that are
optionally supported on the system and have to be requested via a hint
mechanism ("high" addr parameter to mmap).

Architectures such as powerpc and x86 achieve this by making changes to
their architectural versions of hugetlb_get_unmapped_area() function.
However, arm64 uses the generic version of that function.

So take into account arch_get_mmap_base() and arch_get_mmap_end() in
hugetlb_get_unmapped_area().  To allow that, move those two macros out
of mm/mmap.c into include/linux/sched/mm.h

If these macros are not defined in architectural code then they default
to (TASK_SIZE) and (base) so should not introduce any behavioural
changes to architectures that do not define them.

For the time being, only ARM64 is affected by this change.

Catalin (ARM64) said
 "We should have fixed hugetlb_get_unmapped_area() as well when we added
  support for 52-bit VA. The reason for commit f6795053da was to
  prevent normal mmap() from returning addresses above 48-bit by default
  as some user-space had hard assumptions about this.

  It's a slight ABI change if you do this for hugetlb_get_unmapped_area()
  but I doubt anyone would notice. It's more likely that the current
  behaviour would cause issues, so I'd rather have them consistent.

  Basically when arm64 gained support for 52-bit addresses we did not
  want user-space calling mmap() to suddenly get such high addresses,
  otherwise we could have inadvertently broken some programs (similar
  behaviour to x86 here). Hence we added commit f6795053da. But we
  missed hugetlbfs which could still get such high mmap() addresses. So
  in theory that's a potential regression that should have bee addressed
  at the same time as commit f6795053da (and before arm64 enabled
  52-bit addresses)"

Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu
Fixes: f6795053da ("mm: mmap: Allow for "high" userspace addresses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>	[5.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-21 20:01:09 -07:00
Jaegeuk Kim
4d8ec91208 f2fs: should not truncate blocks during roll-forward recovery
If the file preallocated blocks and fsync'ed, we should not truncate them during
roll-forward recovery which will recover i_size correctly back.

Fixes: d4dd19ec1e ("f2fs: do not expose unwritten blocks to user by DIO")
Cc: <stable@vger.kernel.org> # 5.17+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-21 18:57:09 -07:00
Ye Bin
23e3d7f706 jbd2: fix a potential race while discarding reserved buffers after an abort
we got issue as follows:
[   72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
[   72.826847] EXT4-fs (sda): Remounting filesystem read-only
fallocate: fallocate failed: Read-only file system
[   74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
[   74.793597] ------------[ cut here ]------------
[   74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
[   74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150
[   74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
[   74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202
[   74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90
[   74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20
[   74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90
[   74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00
[   74.807301] FS:  0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000
[   74.808338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0
[   74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   74.811897] Call Trace:
[   74.812241]  <TASK>
[   74.812566]  __jbd2_journal_refile_buffer+0x12f/0x180
[   74.813246]  jbd2_journal_refile_buffer+0x4c/0xa0
[   74.813869]  jbd2_journal_commit_transaction.cold+0xa1/0x148
[   74.817550]  kjournald2+0xf8/0x3e0
[   74.819056]  kthread+0x153/0x1c0
[   74.819963]  ret_from_fork+0x22/0x30

Above issue may happen as follows:
        write                   truncate                   kjournald2
generic_perform_write
 ext4_write_begin
  ext4_walk_page_buffers
   do_journal_get_write_access ->add BJ_Reserved list
 ext4_journalled_write_end
  ext4_walk_page_buffers
   write_end_fn
    ext4_handle_dirty_metadata
                ***************JBD2 ABORT**************
     jbd2_journal_dirty_metadata
 -> return -EROFS, jh in reserved_list
                                                   jbd2_journal_commit_transaction
                                                    while (commit_transaction->t_reserved_list)
                                                      jh = commit_transaction->t_reserved_list;
                        truncate_pagecache_range
                         do_invalidatepage
			  ext4_journalled_invalidatepage
			   jbd2_journal_invalidatepage
			    journal_unmap_buffer
			     __dispose_buffer
			      __jbd2_journal_unfile_buffer
			       jbd2_journal_put_journal_head ->put last ref_count
			        __journal_remove_journal_head
				 bh->b_private = NULL;
				 jh->b_bh = NULL;
				                      jbd2_journal_refile_buffer(journal, jh);
							bh = jh2bh(jh);
							->bh is NULL, later will trigger null-ptr-deref
				 journal_free_journal_head(jh);

After commit 96f1e09745, we no longer hold the j_state_lock while
iterating over the list of reserved handles in
jbd2_journal_commit_transaction().  This potentially allows the
journal_head to be freed by journal_unmap_buffer while the commit
codepath is also trying to free the BJ_Reserved buffers.  Keeping
j_state_lock held while trying extends hold time of the lock
minimally, and solves this issue.

Fixes: 96f1e0974575("jbd2: avoid long hold times of j_state_lock while committing a transaction")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-21 14:21:30 -04:00
Christian Brauner
0014edaedf
fs: unset MNT_WRITE_HOLD on failure
After mnt_hold_writers() has been called we will always have set MNT_WRITE_HOLD
and consequently we always need to pair mnt_hold_writers() with
mnt_unhold_writers(). After the recent cleanup in [1] where Al switched from a
do-while to a for loop the cleanup currently fails to unset MNT_WRITE_HOLD for
the first mount that was changed. Fix this and make sure that the first mount
will be cleaned up and add some comments to make it more obvious.

Link: https://lore.kernel.org/lkml/0000000000007cc21d05dd0432b8@google.com
Link: https://lore.kernel.org/lkml/00000000000080e10e05dd043247@google.com
Link: https://lore.kernel.org/r/20220420131925.2464685-1-brauner@kernel.org
Fixes: e257039f0f ("mount_setattr(): clean the control flow and calling conventions") [1]
Cc: Hillf Danton <hdanton@sina.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+10a16d1c43580983f6a2@syzkaller.appspotmail.com
Reported-by: syzbot+306090cfa3294f0bbfb3@syzkaller.appspotmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-04-21 17:57:37 +02:00
Naohiro Aota
5f0addf7b8 btrfs: zoned: use dedicated lock for data relocation
Currently, we use btrfs_inode_{lock,unlock}() to grant an exclusive
writeback of the relocation data inode in
btrfs_zoned_data_reloc_{lock,unlock}(). However, that can cause a deadlock
in the following path.

Thread A takes btrfs_inode_lock() and waits for metadata reservation by
e.g, waiting for writeback:

prealloc_file_extent_cluster()
  - btrfs_inode_lock(&inode->vfs_inode, 0);
  - btrfs_prealloc_file_range()
  ...
    - btrfs_replace_file_extents()
      - btrfs_start_transaction
      ...
        - btrfs_reserve_metadata_bytes()

Thread B (e.g, doing a writeback work) needs to wait for the inode lock to
continue writeback process:

do_writepages
  - btrfs_writepages
    - extent_writpages
      - btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
        - btrfs_inode_lock()

The deadlock is caused by relying on the vfs_inode's lock. By using it, we
introduced unnecessary exclusion of writeback and
btrfs_prealloc_file_range(). Also, the lock at this point is useless as we
don't have any dirty pages in the inode yet.

Introduce fs_info->zoned_data_reloc_io_lock and use it for the exclusive
writeback.

Fixes: 35156d8527 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
CC: stable@vger.kernel.org # 5.16.x: 869f4cdc73: btrfs: zoned: encapsulate inode locking for zoned relocation
CC: stable@vger.kernel.org # 5.16.x
CC: stable@vger.kernel.org # 5.17
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-21 16:06:24 +02:00
Filipe Manana
a692e13d87 btrfs: fix assertion failure during scrub due to block group reallocation
During a scrub, or device replace, we can race with block group removal
and allocation and trigger the following assertion failure:

[7526.385524] assertion failed: cache->start == chunk_offset, in fs/btrfs/scrub.c:3817
[7526.387351] ------------[ cut here ]------------
[7526.387373] kernel BUG at fs/btrfs/ctree.h:3599!
[7526.388001] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
[7526.388970] CPU: 2 PID: 1158150 Comm: btrfs Not tainted 5.17.0-rc8-btrfs-next-114 #4
[7526.390279] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[7526.392430] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[7526.393520] Code: f3 48 c7 c7 20 (...)
[7526.396926] RSP: 0018:ffffb9154176bc40 EFLAGS: 00010246
[7526.397690] RAX: 0000000000000048 RBX: ffffa0db8a910000 RCX: 0000000000000000
[7526.398732] RDX: 0000000000000000 RSI: ffffffff9d7239a2 RDI: 00000000ffffffff
[7526.399766] RBP: ffffa0db8a911e10 R08: ffffffffa71a3ca0 R09: 0000000000000001
[7526.400793] R10: 0000000000000001 R11: 0000000000000000 R12: ffffa0db4b170800
[7526.401839] R13: 00000003494b0000 R14: ffffa0db7c55b488 R15: ffffa0db8b19a000
[7526.402874] FS:  00007f6c99c40640(0000) GS:ffffa0de6d200000(0000) knlGS:0000000000000000
[7526.404038] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7526.405040] CR2: 00007f31b0882160 CR3: 000000014b38c004 CR4: 0000000000370ee0
[7526.406112] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[7526.407148] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[7526.408169] Call Trace:
[7526.408529]  <TASK>
[7526.408839]  scrub_enumerate_chunks.cold+0x11/0x79 [btrfs]
[7526.409690]  ? do_wait_intr_irq+0xb0/0xb0
[7526.410276]  btrfs_scrub_dev+0x226/0x620 [btrfs]
[7526.410995]  ? preempt_count_add+0x49/0xa0
[7526.411592]  btrfs_ioctl+0x1ab5/0x36d0 [btrfs]
[7526.412278]  ? __fget_files+0xc9/0x1b0
[7526.412825]  ? kvm_sched_clock_read+0x14/0x40
[7526.413459]  ? lock_release+0x155/0x4a0
[7526.414022]  ? __x64_sys_ioctl+0x83/0xb0
[7526.414601]  __x64_sys_ioctl+0x83/0xb0
[7526.415150]  do_syscall_64+0x3b/0xc0
[7526.415675]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[7526.416408] RIP: 0033:0x7f6c99d34397
[7526.416931] Code: 3c 1c e8 1c ff (...)
[7526.419641] RSP: 002b:00007f6c99c3fca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[7526.420735] RAX: ffffffffffffffda RBX: 00005624e1e007b0 RCX: 00007f6c99d34397
[7526.421779] RDX: 00005624e1e007b0 RSI: 00000000c400941b RDI: 0000000000000003
[7526.422820] RBP: 0000000000000000 R08: 00007f6c99c40640 R09: 0000000000000000
[7526.423906] R10: 00007f6c99c40640 R11: 0000000000000246 R12: 00007fff746755de
[7526.424924] R13: 00007fff746755df R14: 0000000000000000 R15: 00007f6c99c40640
[7526.425950]  </TASK>

That assertion is relatively new, introduced with commit d04fbe19ae
("btrfs: scrub: cleanup the argument list of scrub_chunk()").

The block group we get at scrub_enumerate_chunks() can actually have a
start address that is smaller then the chunk offset we extracted from a
device extent item we got from the commit root of the device tree.
This is very rare, but it can happen due to a race with block group
removal and allocation. For example, the following steps show how this
can happen:

1) We are at transaction T, and we have the following blocks groups,
   sorted by their logical start address:

   [ bg A, start address A, length 1G (data) ]
   [ bg B, start address B, length 1G (data) ]
   (...)
   [ bg W, start address W, length 1G (data) ]

     --> logical address space hole of 256M,
         there used to be a 256M metadata block group here

   [ bg Y, start address Y, length 256M (metadata) ]

      --> Y matches W's end offset + 256M

   Block group Y is the block group with the highest logical address in
   the whole filesystem;

2) Block group Y is deleted and its extent mapping is removed by the call
   to remove_extent_mapping() made from btrfs_remove_block_group().

   So after this point, the last element of the mapping red black tree,
   its rightmost node, is the mapping for block group W;

3) While still at transaction T, a new data block group is allocated,
   with a length of 1G. When creating the block group we do a call to
   find_next_chunk(), which returns the logical start address for the
   new block group. This calls returns X, which corresponds to the
   end offset of the last block group, the rightmost node in the mapping
   red black tree (fs_info->mapping_tree), plus one.

   So we get a new block group that starts at logical address X and with
   a length of 1G. It spans over the whole logical range of the old block
   group Y, that was previously removed in the same transaction.

   However the device extent allocated to block group X is not the same
   device extent that was used by block group Y, and it also does not
   overlap that extent, which must be always the case because we allocate
   extents by searching through the commit root of the device tree
   (otherwise it could corrupt a filesystem after a power failure or
   an unclean shutdown in general), so the extent allocator is behaving
   as expected;

4) We have a task running scrub, currently at scrub_enumerate_chunks().
   There it searches for device extent items in the device tree, using
   its commit root. It finds a device extent item that was used by
   block group Y, and it extracts the value Y from that item into the
   local variable 'chunk_offset', using btrfs_dev_extent_chunk_offset();

   It then calls btrfs_lookup_block_group() to find block group for
   the logical address Y - since there's currently no block group that
   starts at that logical address, it returns block group X, because
   its range contains Y.

   This results in triggering the assertion:

      ASSERT(cache->start == chunk_offset);

   right before calling scrub_chunk(), as cache->start is X and
   chunk_offset is Y.

This is more likely to happen of filesystems not larger than 50G, because
for these filesystems we use a 256M size for metadata block groups and
a 1G size for data block groups, while for filesystems larger than 50G,
we use a 1G size for both data and metadata block groups (except for
zoned filesystems). It could also happen on any filesystem size due to
the fact that system block groups are always smaller (32M) than both
data and metadata block groups, but these are not frequently deleted, so
much less likely to trigger the race.

So make scrub skip any block group with a start offset that is less than
the value we expect, as that means it's a new block group that was created
in the current transaction. It's pointless to continue and try to scrub
its extents, because scrub searches for extents using the commit root, so
it won't find any. For a device replace, skip it as well for the same
reasons, and we don't need to worry about the possibility of extents of
the new block group not being to the new device, because we have the write
duplication setup done through btrfs_map_block().

Fixes: d04fbe19ae ("btrfs: scrub: cleanup the argument list of scrub_chunk()")
CC: stable@vger.kernel.org # 5.17
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-21 16:06:19 +02:00
Ronnie Sahlberg
f5d0f921ea cifs: destage any unwritten data to the server before calling copychunk_write
because the copychunk_write might cover a region of the file that has not yet
been sent to the server and thus fail.

A simple way to reproduce this is:
truncate -s 0 /mnt/testfile; strace -f -o x -ttT xfs_io -i -f -c 'pwrite 0k 128k' -c 'fcollapse 16k 24k' /mnt/testfile

the issue is that the 'pwrite 0k 128k' becomes rearranged on the wire with
the 'fcollapse 16k 24k' due to write-back caching.

fcollapse is implemented in cifs.ko as a SMB2 IOCTL(COPYCHUNK_WRITE) call
and it will fail serverside since the file is still 0b in size serverside
until the writes have been destaged.
To avoid this we must ensure that we destage any unwritten data to the
server before calling COPYCHUNK_WRITE.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1997373
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-20 22:54:54 -05:00
Paulo Alcantara
cd70a3e898 cifs: use correct lock type in cifs_reconnect()
TCP_Server_Info::origin_fullpath and TCP_Server_Info::leaf_fullpath
are protected by refpath_lock mutex and not cifs_tcp_ses_lock
spinlock.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-20 22:54:39 -05:00
Paulo Alcantara
41f10081a9 cifs: fix NULL ptr dereference in refresh_mounts()
Either mount(2) or automount might not have server->origin_fullpath
set yet while refresh_cache_worker() is attempting to refresh DFS
referrals.  Add missing NULL check and locking around it.

This fixes bellow crash:

[ 1070.276835] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
[ 1070.277676] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 1070.278219] CPU: 1 PID: 8506 Comm: kworker/u8:1 Not tainted 5.18.0-rc3 #10
[ 1070.278701] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[ 1070.279495] Workqueue: cifs-dfscache refresh_cache_worker [cifs]
[ 1070.280044] RIP: 0010:strcasecmp+0x34/0x150
[ 1070.280359] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44
[ 1070.281729] RSP: 0018:ffffc90008367958 EFLAGS: 00010246
[ 1070.282114] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
[ 1070.282691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1070.283273] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27
[ 1070.283857] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000
[ 1070.284436] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000
[ 1070.284990] FS:  0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000
[ 1070.285625] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1070.286100] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0
[ 1070.286683] Call Trace:
[ 1070.286890]  <TASK>
[ 1070.287070]  refresh_cache_worker+0x895/0xd20 [cifs]
[ 1070.287475]  ? __refresh_tcon.isra.0+0xfb0/0xfb0 [cifs]
[ 1070.287905]  ? __lock_acquire+0xcd1/0x6960
[ 1070.288247]  ? is_dynamic_key+0x1a0/0x1a0
[ 1070.288591]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 1070.289012]  ? lock_downgrade+0x6f0/0x6f0
[ 1070.289318]  process_one_work+0x7bd/0x12d0
[ 1070.289637]  ? worker_thread+0x160/0xec0
[ 1070.289970]  ? pwq_dec_nr_in_flight+0x230/0x230
[ 1070.290318]  ? _raw_spin_lock_irq+0x5e/0x90
[ 1070.290619]  worker_thread+0x5ac/0xec0
[ 1070.290891]  ? process_one_work+0x12d0/0x12d0
[ 1070.291199]  kthread+0x2a5/0x350
[ 1070.291430]  ? kthread_complete_and_exit+0x20/0x20
[ 1070.291770]  ret_from_fork+0x22/0x30
[ 1070.292050]  </TASK>
[ 1070.292223] Modules linked in: bpfilter cifs cifs_arc4 cifs_md4
[ 1070.292765] ---[ end trace 0000000000000000 ]---
[ 1070.293108] RIP: 0010:strcasecmp+0x34/0x150
[ 1070.293471] Code: 00 00 00 fc ff df 41 54 55 48 89 fd 53 48 83 ec 10 eb 03 4c 89 fe 48 89 ef 48 83 c5 01 48 89 f8 48 89 fa 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 08 84 c0 0f 85 bc 00 00 00 0f b6 45 ff 44
[ 1070.297718] RSP: 0018:ffffc90008367958 EFLAGS: 00010246
[ 1070.298622] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 0000000000000000
[ 1070.299428] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1070.300296] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff873eda27
[ 1070.301204] R10: ffffc900083679a0 R11: 0000000000000001 R12: ffff88812624c000
[ 1070.301932] R13: dffffc0000000000 R14: ffff88810e6e9a88 R15: ffff888119bb9000
[ 1070.302645] FS:  0000000000000000(0000) GS:ffff888151200000(0000) knlGS:0000000000000000
[ 1070.303462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1070.304131] CR2: 0000561a4d922418 CR3: 000000010aecc000 CR4: 0000000000350ee0
[ 1070.305004] Kernel panic - not syncing: Fatal exception
[ 1070.305711] Kernel Offset: disabled
[ 1070.305971] ---[ end Kernel panic - not syncing: Fatal exception ]---

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-20 22:54:17 -05:00
Damien Le Moal
1da18a296f zonefs: Fix management of open zones
The mount option "explicit_open" manages the device open zone
resources to ensure that if an application opens a sequential file for
writing, the file zone can always be written by explicitly opening
the zone and accounting for that state with the s_open_zones counter.

However, if some zones are already open when mounting, the device open
zone resource usage status will be larger than the initial s_open_zones
value of 0. Ensure that this inconsistency does not happen by closing
any sequential zone that is open when mounting.

Furthermore, with ZNS drives, closing an explicitly open zone that has
not been written will change the zone state to "closed", that is, the
zone will remain in an active state. Since this can then cause failures
of explicit open operations on other zones if the drive active zone
resources are exceeded, we need to make sure that the zone is not
active anymore by resetting it instead of closing it. To address this,
zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
into a REQ_OP_ZONE_RESET for sequential zones that have not been
written.

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
2022-04-21 08:39:20 +09:00
Damien Le Moal
694852ead2 zonefs: Clear inode information flags on inode creation
Ensure that the i_flags field of struct zonefs_inode_info is cleared to
0 when initializing a zone file inode, avoiding seeing the flag
ZONEFS_ZONE_OPEN being incorrectly set.

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
2022-04-21 08:39:10 +09:00
Dave Chinner
9a5280b312 xfs: reorder iunlink remove operation in xfs_ifree
The O_TMPFILE creation implementation creates a specific order of
operations for inode allocation/freeing and unlinked list
modification. Currently both are serialised by the AGI, so the order
doesn't strictly matter as long as the are both in the same
transaction.

However, if we want to move the unlinked list insertions largely out
from under the AGI lock, then we have to be concerned about the
order in which we do unlinked list modification operations.
O_TMPFILE creation tells us this order is inode allocation/free,
then unlinked list modification.

Change xfs_ifree() to use this same ordering on unlinked list
removal. This way we always guarantee that when we enter the
iunlinked list removal code from this path, we already have the AGI
locked and we don't have to worry about lock nesting AGI reads
inside unlink list locks because it's already locked and attached to
the transaction.

We can do this safely as the inode freeing and unlinked list removal
are done in the same transaction and hence are atomic operations
with respect to log recovery.


Reported-by: Frank Hofmann <fhofmann@cloudflare.com>
Fixes: 298f7bec50 ("xfs: pin inode backing buffer to the inode log item")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 08:45:16 +10:00
Dave Chinner
b9b3fe152e xfs: convert buffer flags to unsigned.
5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
fields to be unsigned. This manifests as a compiler error such as:

/kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk'
  TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d "
  ^
/kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags'
     __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
     ^
/kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED'
  { XBF_UNMAPPED,  "UNMAPPED" }
    ^
/kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS'
     __print_flags(__entry->flags, "|", XFS_BUF_FLAGS),
                                        ^
/kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class':
/kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant
 #define XBF_UNMAPPED  (1 << 31)/* do not map the buffer */

as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an
unsigned long for the flag. Since this results in the value of
XBF_UNMAPPED causing a signed integer overflow, the result is
technically undefined behavior, which gcc-5 does not accept as an
integer constant.

This is based on a patch from Arnd Bergman <arnd@arndb.de>.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-04-21 08:44:59 +10:00
Linus Torvalds
10c5f102e2 Changes since last update:
- Fix use-after-free of the on-stack z_erofs_decompressqueue;
 
  - Fix sysfs documentation Sphinx warnings.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYl2A0REceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBEe7AQCh7aNhYEncBnoHvrB276HCP0xmMwmc0gPq
 6UeSWgarpAEArgfgLyo8tFgS+gvXbhB8/P1FqEMwaRZVLWathXID3QU=
 =GFAS
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "One patch to fix a use-after-free race related to the on-stack
  z_erofs_decompressqueue, which happens very rarely but needs to be
  fixed properly soon.

  The other patch fixes some sysfs Sphinx warnings"

* tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors
  erofs: fix use-after-free of on-stack io[]
2022-04-20 12:35:20 -07:00
Linus Torvalds
906f904097 Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"
This reverts commit 5a519c8fe4.

It turns out that making the pipe almost arbitrarily large has some
rather unexpected downsides.  The kernel test robot reports a kernel
warning that is due to pipe->max_usage now growing to the point where
the iter_file_splice_write() buffer allocation can no longer be
satisfied as a slab allocation, and the

        int nbufs = pipe->max_usage;
        struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
                                        GFP_KERNEL);

code sequence there will now always fail as a result.

That code could be modified to use kvcalloc() too, but I feel very
uncomfortable making those kinds of changes for a very niche use case
that really should have other options than make these kinds of
fundamental changes to pipe behavior.

Maybe the CRIU process dumping should be multi-threaded, and use
multiple pipes and multiple cores, rather than try to use one larger
pipe to minimize splice() calls.

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-20 12:07:53 -07:00
Jaegeuk Kim
27275f181c f2fs: fix wrong condition check when failing metapage read
This patch fixes wrong initialization.

Fixes: 50c63009f6 ("f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20 11:16:43 -07:00
Jaegeuk Kim
0adc2ab0e8 f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders
Let's attach io_flags to bio only, so that we can merge IOs given original
io_flags only.

Fixes: 64bf0eef01 ("f2fs: pass the bio operation to bio_alloc_bioset")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20 11:16:43 -07:00
Jaegeuk Kim
930e260763 f2fs: remove obsolete whint_mode
This patch removes obsolete whint_mode.

Fixes: 41d36a9f3e ("fs: remove kiocb.ki_hint")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-20 11:16:43 -07:00
Christian Brauner
705191b03d fs: fix acl translation
Last cycle we extended the idmapped mounts infrastructure to support
idmapped mounts of idmapped filesystems (No such filesystem yet exist.).
Since then, the meaning of an idmapped mount is a mount whose idmapping
is different from the filesystems idmapping.

While doing that work we missed to adapt the acl translation helpers.
They still assume that checking for the identity mapping is enough.  But
they need to use the no_idmapping() helper instead.

Note, POSIX ACLs are always translated right at the userspace-kernel
boundary using the caller's current idmapping and the initial idmapping.
The order depends on whether we're coming from or going to userspace.
The filesystem's idmapping doesn't matter at the border.

Consequently, if a non-idmapped mount is passed we need to make sure to
always pass the initial idmapping as the mount's idmapping and not the
filesystem idmapping.  Since it's irrelevant here it would yield invalid
ids and prevent setting acls for filesystems that are mountable in a
userns and support posix acls (tmpfs and fuse).

I verified the regression reported in [1] and verified that this patch
fixes it.  A regression test will be added to xfstests in parallel.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1]
Fixes: bd303368b7 ("fs: support mapped mounts of mapped filesystems")
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <stable@vger.kernel.org> # 5.17
Cc: <regressions@lists.linux.dev>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-19 10:19:02 -07:00
Christoph Hellwig
0fdf977d45 btrfs: fix direct I/O writes for split bios on zoned devices
When a bio is split in btrfs_submit_direct, dip->file_offset contains
the file offset for the first bio.  But this means the start value used
in btrfs_end_dio_bio to record the write location for zone devices is
incorrect for subsequent bios.

CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:45:04 +02:00
Christoph Hellwig
00d825258b btrfs: fix direct I/O read repair for split bios
When a bio is split in btrfs_submit_direct, dip->file_offset contains
the file offset for the first bio.  But this means the start value used
in btrfs_check_read_dio_bio is incorrect for subsequent bios.  Add
a file_offset field to struct btrfs_bio to pass along the correct offset.

Given that check_data_csum only uses start of an error message this
means problems with this miscalculation will only show up when I/O fails
or checksums mismatch.

The logic was removed in f4f39fc5dc ("btrfs: remove btrfs_bio::logical
member") but we need it due to the bio splitting.

CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:44:56 +02:00
Christoph Hellwig
50f1cff3d8 btrfs: fix and document the zoned device choice in alloc_new_bio
Zone Append bios only need a valid block device in struct bio, but
not the device in the btrfs_bio.  Use the information from
btrfs_zoned_get_device to set up bi_bdev and fix zoned writes on
multi-device file system with non-homogeneous capabilities and remove
the pointless btrfs_bio.device assignment.

Add big fat comments explaining what is going on here.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:44:49 +02:00
Filipe Manana
50ff57888d btrfs: fix leaked plug after failure syncing log on zoned filesystems
On a zoned filesystem, if we fail to allocate the root node for the log
root tree while syncing the log, we end up returning without finishing
the IO plug we started before, resulting in leaking resources as we
have started writeback for extent buffers of a log tree before. That
allocation failure, which typically is either -ENOMEM or -ENOSPC, is not
fatal and the fsync can safely fallback to a full transaction commit.

So release the IO plug if we fail to allocate the extent buffer for the
root of the log root tree when syncing the log on a zoned filesystem.

Fixes: 3ddebf27fc ("btrfs: zoned: reorder log node allocation on zoned filesystem")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-19 15:44:17 +02:00
Haowen Bai
9339faac6d cifs: Use kzalloc instead of kmalloc/memset
Use kzalloc rather than duplicating its implementation, which
makes code simple and easy to understand.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-18 10:22:57 -05:00
Pavel Begunkov
c0713540f6 io_uring: fix leaks on IOPOLL and CQE_SKIP
If all completed requests in io_do_iopoll() were marked with
REQ_F_CQE_SKIP, we'll not only skip CQE posting but also
io_free_batch_list() leaking memory and resources.

Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially
return the value greater than the real one, but iopolling will deal with
it and the userspace will re-iopoll if needed. In anyway, I don't think
there are many use cases for REQ_F_CQE_SKIP + IOPOLL.

Fixes: 83a13a4181 ("io_uring: tweak iopoll CQE_SKIP event counting")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5072fc8693fbfd595f89e5d4305bfcfd5d2f0a64.1650186611.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 06:54:11 -06:00
Jens Axboe
323b190ba2 io_uring: free iovec if file assignment fails
We just return failure in this case, but we need to release the iovec
first. If we're doing IO with more than FAST_IOV segments, then the
iovec is allocated and must be freed.

Reported-by: syzbot+96b43810dfe9c3bb95ed@syzkaller.appspotmail.com
Fixes: 584b0180f0 ("io_uring: move read/write file prep state into actual opcode handler")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-16 21:14:00 -06:00
Linus Torvalds
59250f8a7f Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 patches.

  Subsystems affected by this patch series: MAINTAINERS, binfmt, and
  mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction,
  hugetlb, vmalloc, and kmemleak)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
  mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
  revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
  hugetlb: do not demote poisoned hugetlb pages
  mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
  mm: fix unexpected zeroed page mapping with zram swap
  mm, page_alloc: fix build_zonerefs_node()
  mm, kfence: support kmem_dump_obj() for KFENCE objects
  kasan: fix hw tags enablement when KUNIT tests are disabled
  irq_work: use kasan_record_aux_stack_noalloc() record callstack
  mm/secretmem: fix panic when growing a memfd_secret
  tmpfs: fix regressions from wider use of ZERO_PAGE
  MAINTAINERS: Broadcom internal lists aren't maintainers
2022-04-15 15:57:18 -07:00
Andrew Morton
aeb7923733 revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
Despite Mike's attempted fix (925346c129), regressions reports
continue:

  https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/
  https://bugzilla.kernel.org/show_bug.cgi?id=215720
  https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info

So revert this patch.

Fixes: 9630f0d60f ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE")
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
Andrew Morton
354e923df0 revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
Commit 925346c129 ("fs/binfmt_elf: fix PT_LOAD p_align values for
loaders") was an attempt to fix regressions due to 9630f0d60f
("fs/binfmt_elf: use PT_LOAD p_align values for static PIE").

But regressionss continue to be reported:

  https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/
  https://bugzilla.kernel.org/show_bug.cgi?id=215720
  https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info

This patch reverts the fix, so the original can also be reverted.

Fixes: 925346c129 ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders")
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-15 14:49:56 -07:00
Linus Torvalds
0647b9cc7f io_uring-5.18-2022-04-14
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJY2BIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpielEACDqz5oShRkOHGAaqo0jbOy7C9I0hwUXS9Y
 QeIH7umrSUBgDXLIq3jbsgK3d0nroaYHTU8aa64XnyB0lGouBTiw+I8FZfxNlW7w
 HI+AUZn/m0wvTuKcw//cSX2CP2pcVPfmao+JskU6kyrsoj0nkQNx2NNHXbVQ3cFC
 DKlZE7ZulvzfM+xC0aJxIYUWLECzqgZvicn+mqeqVQX9QJ3k/637GTnVu83QSpIC
 0Hw/isuGuaK+0nurwc4Rx9ZojItVYyPPt3a+8ImtGaJlhyeg9bHLMJZbBLvrlqjd
 AS2iVPaQOhFfxt8qe7ETHpIUmkBQZIckavsCCO7sfFVKtGRNA0kQkAYLjXLDLP8T
 1DXn8VQHsGHHcBd2vTZURng32AniOCzQkshLGF8s/yYuoHp7JODDhwu4xO5HC5eN
 rD7SNDcW9mYlEmVtCuoeKCrRknHL+x3ZTlAPTXr3DgjtgB7phBUmLqZU7riwy7vs
 0oGD/uXf5jT7Ujw2fZNF7LFetQVJkCi92p1+IGBO0hXQShZ5IsCITk95V2d+2cVs
 J1r7ZkXiXiWHkHDQQfuKNeVoUXRe10a4+xOeOCMrwuOgKpKIyoN7ofb6Lp4SRE+j
 f6PvGXvfwIRz9McJwg5IWCIEVASysn4s17bgnaMKXpRCgf67Z4HMTcBO1PHRORBD
 ssBXiVxOhA==
 =aTrS
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Ensure we check and -EINVAL any use of reserved or struct padding.

   Although we generally always do that, it's missed in two spots for
   resource updates, one for the ring fd registration from this merge
   window, and one for the extended arg. Make sure we have all of them
   handled. (Dylan)

 - A few fixes for the deferred file assignment (me, Pavel)

 - Add a feature flag for the deferred file assignment so apps can tell
   we handle it correctly (me)

 - Fix a small perf regression with the current file position fix in
   this merge window (me)

* tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block:
  io_uring: abort file assignment prior to assigning creds
  io_uring: fix poll error reporting
  io_uring: fix poll file assign deadlock
  io_uring: use right issue_flags for splice/tee
  io_uring: verify pad field is 0 in io_get_ext_arg
  io_uring: verify resv is 0 in ringfd register/unregister
  io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
  io_uring: move io_uring_rsrc_update2 validation
  io_uring: fix assign file locking issue
  io_uring: stop using io_wq_work as an fd placeholder
  io_uring: move apoll->events cache
  io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
  io_uring: flag the fact that linked file assignment is sane
2022-04-15 11:33:20 -07:00
Hongyu Jin
60b3005011 erofs: fix use-after-free of on-stack io[]
The root cause is the race as follows:
Thread #1                              Thread #2(irq ctx)

z_erofs_runqueue()
  struct z_erofs_decompressqueue io_A[];
  submit bio A
  z_erofs_decompress_kickoff(,,1)
                                       z_erofs_decompressqueue_endio(bio A)
                                       z_erofs_decompress_kickoff(,,-1)
                                       spin_lock_irqsave()
                                       atomic_add_return()
  io_wait_event()	-> pending_bios is already 0
  [end of function]
                                       wake_up_locked(io_A[]) // crash

Referenced backtrace in kernel 5.4:

[   10.129422] Unable to handle kernel paging request at virtual address eb0454a4
[   10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G        WC O      5.4.147-ab09225 #1
[   11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48)
[   11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0)
[   11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c)
[   11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0)
[   11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc)
[   11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c)
[   11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc)
[   11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c)
[   11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138)
[   11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0)
[   11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4)
[   11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0)

Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-04-15 23:51:43 +08:00
Theodore Ts'o
eb7054212e ext4: update the cached overhead value in the superblock
If we (re-)calculate the file system overhead amount and it's
different from the on-disk s_overhead_clusters value, update the
on-disk version since this can take potentially quite a while on
bigalloc file systems.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-04-14 22:39:00 -04:00
Jens Axboe
701521403c io_uring: abort file assignment prior to assigning creds
We need to either restore creds properly if we fail on the file
assignment, or just do the file assignment first instead. Let's do
the latter as it's simpler, should make no difference here for
file assignment.

Link: https://lore.kernel.org/lkml/000000000000a7edb305dca75a50@google.com/
Reported-by: syzbot+60c52ca98513a8760a91@syzkaller.appspotmail.com
Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-14 20:23:40 -06:00
Theodore Ts'o
85d825dbf4 ext4: force overhead calculation if the s_overhead_cluster makes no sense
If the file system does not use bigalloc, calculating the overhead is
cheap, so force the recalculation of the overhead so we don't have to
trust the precalculated overhead in the superblock.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-04-14 22:05:47 -04:00
Namjae Jeon
02655a70b7 ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION
Currently ksmbd is using ->f_bsize from vfs_statfs() as sector size.
If fat/exfat is a local share, ->f_bsize is a cluster size that is too
large to be used as a sector size. Sector sizes larger than 4K cause
problem occurs when mounting an iso file through windows client.

The error message can be obtained using Mount-DiskImage command,
 the error is:
"Mount-DiskImage : The sector size of the physical disk on which the
virtual disk resides is not supported."

This patch reports fixed 4KB sector size if ->s_blocksize is bigger
than 4KB.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-14 20:56:13 -05:00
Namjae Jeon
8510a043d3 ksmbd: increment reference count of parent fp
Add missing increment reference count of parent fp in
ksmbd_lookup_fd_inode().

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-14 20:56:13 -05:00
Namjae Jeon
50f500b7f6 ksmbd: remove filename in ksmbd_file
If the filename is change by underlying rename the server, fp->filename
and real filename can be different. This patch remove the uses of
fp->filename in ksmbd and replace it with d_path().

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-14 20:56:13 -05:00
Theodore Ts'o
10b01ee92d ext4: fix overhead calculation to account for the reserved gdt blocks
The kernel calculation was underestimating the overhead by not taking
into account the reserved gdt blocks.  With this change, the overhead
calculated by the kernel matches the overhead calculation in mke2fs.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-04-14 21:31:27 -04:00
Linus Torvalds
62345e4828 5 fixes to cifs client, 1 for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJYRzcACgkQiiy9cAdy
 T1EECAwAuCUHV174bwd5XQBeP7aouw6upKPF9jbXXJpQy/NKxwFAGpiKaEiGAGjb
 dA1M+wdBZfeIQQW7pCuFWuWvtagJChEdnAKUamnX+G+l2iWCxsJz64nisc27u6CR
 z/n6YfIT8OPVQLhfU9xgyXmmmDSSJy6YFNgL3Bh+poR/8764HhEkCgeV3P3kD4Vo
 uvj9p6M4K1Ddh4v914QgDhLzk1UsR++zHCE9IuXrxNDgzrQ5Kh5VFn52vaEaNz7g
 hIr790gALsGNnLrapiaD3L2alrGwr3GO3WT5pqAdCgzTWq5bngBBQYl0qSWnSHs+
 OhnFFBtZ1efRX2bp6SeR8r2z5hNj4YiLJCo/cjoJBIU7W4YDOdLo+UVm/sW9xKgA
 tbw5lYWNDMpHygkb+/uSnw807y1cz/3Ip9QbN/ESQP6b9hg/woAXkjgiMVDcew0P
 nNcQEz0LZHUAzidIAHlByTo17QNQdHOhhZYQuA48dH4KJJrSEWXmiG9n9Qp4Q6cX
 6aueD1Lw
 =q5N5
 -----END PGP SIGNATURE-----

Merge tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:

 - two fixes related to unmount

 - symlink overflow fix

 - minor netfs fix

 - improved tracing for crediting (flow control)

* tag '5.18-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: verify that tcon is valid before dereference in cifs_kill_sb
  cifs: potential buffer overflow in handling symlinks
  cifs: Split the smb3_add_credits tracepoint
  cifs: release cached dentries only if mount is complete
  cifs: Check the IOCB_DIRECT flag, not O_DIRECT
2022-04-14 16:00:36 -07:00
NeilBrown
b3d4650d82 VFS: filename_create(): fix incorrect intent.
When asked to create a path ending '/', but which is not to be a
directory (LOOKUP_DIRECTORY not set), filename_create() will never try
to create the file.  If it doesn't exist, -ENOENT is reported.

However, it still passes LOOKUP_CREATE|LOOKUP_EXCL to the filesystems
->lookup() function, even though there is no intent to create.  This is
misleading and can cause incorrect behaviour.

If you try

   ln -s foo /path/dir/

where 'dir' is a directory on an NFS filesystem which is not currently
known in the dcache, this will fail with ENOENT.

But as the name is not in the dcache, nfs_lookup gets called with
LOOKUP_CREATE|LOOKUP_EXCL and so it returns NULL without performing any
lookup, with the expectation that a subsequent call to create the target
will be made, and the lookup can be combined with the creation.  In the
case with a trailing '/' and no LOOKUP_DIRECTORY, that call is never
made.  Instead filename_create() sees that the dentry is not (yet)
positive and returns -ENOENT - even though the directory actually
exists.

So only set LOOKUP_CREATE|LOOKUP_EXCL if there really is an intent to
create, and use the absence of these flags to decide if -ENOENT should
be returned.

Note that filename_parentat() is only interested in LOOKUP_REVAL, so we
split that out and store it in 'reval_flag'.  __lookup_hash() then gets
reval_flag combined with whatever create flags were determined to be
needed.

Reviewed-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-14 15:53:43 -07:00
Linus Torvalds
722985e2f6 for-5.18-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmJUfvwACgkQxWXV+ddt
 WDtiuA//csj0CrJq7wyRvgDkbPtCCMyDtL4zfmjP4s++NWaMDOKTxBU8msuGUJgR
 Xribel2zWqlFiOzptd9sGEfxfKfz1p5Rm/gtFj57WVSGV7YtiAGyFGuzn/vpCrgq
 NP5LFY2z5N36VxDXUPKHzvdqczO5W2n9KdaysJCr6FpCz+vVrplFiT5U+X175Sgg
 zWS/XrPIHYbtaEFdb3rUKol6riu7vXW3MxEA9di8K4Xo9TJrp0NtBoGZDsWFQxf7
 vfVwtYsQPsACJxw+MjBcVQ5fNXO5iATL1JfBb9ltN589xouja7bDCb40Fm2gEwWB
 IUatnCq/4MN2S2NdPtEcXQ52W9svT/87z6ZblefSEiQqvQBBJHN131RTM/s8LBG4
 fkHcGV6PsiutSIFycrID0bpXr1Mhvg2aMjxdriLPBtYopaMPh+ivK6LPYoE5MggQ
 /rshWfjiWJhPKXPsng+H7UGbViOiYeG0kUBuaqFx4ARnESpN1gF2dJRYvXYFL/8a
 Q0wmLr1tf3M82VAaAFNOl/BVk8blutCSHcJDLDKxcl3DhVVlY5J8onEPXjneJRkk
 rRB3zoxLlptgfW75CPNyFrpicPdAXzGCoccienIUoEdHSX/W5rNA4L6XpVE5Hv94
 dWR1aVAbCUcuhY1QfBU7H2iYw0RHzqDO3+IRfqJuYsLGciijLbs=
 =APMY
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more code and warning fixes.

  There's one feature ioctl removal patch slated for 5.18 that did not
  make it to the main pull request. It's just a one-liner and the ioctl
  has a v2 that's in use for a long time, no point to postpone it to
  5.19.

  Late update:

   - remove balance v1 ioctl, superseded by v2 in 2012

  Fixes:

   - add back cgroup attribution for compressed writes

   - add super block write start/end annotations to asynchronous balance

   - fix root reference count on an error handling path

   - in zoned mode, activate zone at the chunk allocation time to avoid
     ENOSPC due to timing issues

   - fix delayed allocation accounting for direct IO

  Warning fixes:

   - simplify assertion condition in zoned check

   - remove an unused variable"

* tag 'for-5.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix btrfs_submit_compressed_write cgroup attribution
  btrfs: fix root ref counts in error handling in btrfs_get_root_ref
  btrfs: zoned: activate block group only for extent allocation
  btrfs: return allocated block group from do_chunk_alloc()
  btrfs: mark resumed async balance as writing
  btrfs: remove support of balance v1 ioctl
  btrfs: release correct delalloc amount in direct IO write path
  btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
  btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
2022-04-14 10:58:27 -07:00
Linus Torvalds
ec9c57a732 fscache fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmJW6J0ACgkQ+7dXa6fL
 C2sXZw//Z5if0BzqI0SjZ7JeozZ4ADgAKYrT0WvYRg7TPfJBThZsa9NEaQfueXBp
 0DlZpM/ToNQXMYV/jRlDjzooPstKt45IdS4wMewTtXZAXDuZNDygOIRTY5S3zKHY
 wW4ZuZ/3AtNejY8YAnxFtykcbVfxMzYkQ+cgX79fd/+ZLgpYImtk4k3fS53pyRhv
 Q6bT7zxIOIGzaPVda/lu4t1vdTXNTNI2b/9lYOPEXMC6OFmHQ94dnqsyW6AKUD6K
 oYldKUFIFcv5CeiH8sTXFrkckNqgkUln5OXdkFLUjRPHE8Yyq57ZyX7ilp4yFdfS
 HE8VpDaDfk11hVpFLngkPu5RJKsRbad5CQ+Jl8+xiEn0fbuR4q7LFvqSVoykh/x5
 hh5veuSc/KvfpRIijNlCff4gry9gnAhHlLbt8xFbpNWPaYFTVdkkuDtoB8Kv/KoV
 HmxyNyQEW2eL7i7JXfSI4EjNPI85RVhVXmccbJJoKHQRvSVCZuWBlR7NUkykzuXg
 CQbnO/SeqilpOpbFvhTUXdOXNQEHbtcKp9yO9hd/f67E7gNKiDOewOmItMt4HfFx
 pjKv48tLR1cnZtL8pNf7nSMKuGe0bofDu46rnra0Z4mcz4d4y9mkIrdoIVT/heae
 E3Ut/1GJ0MkETl+nYXdV9IxTMHWIVGyzt7HyJTziXj03dfVquqA=
 =fUQK
 -----END PGP SIGNATURE-----

Merge tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull fscache fixes from David Howells:
 "Here's a collection of fscache and cachefiles fixes and misc small
  cleanups. The two main fixes are:

   - Add a missing unmark of the inode in-use mark in an error path.

   - Fix a KASAN slab-out-of-bounds error when setting the xattr on a
     cachefiles volume due to the wrong length being given to memcpy().

  In addition, there's the removal of an unused parameter, removal of an
  unused Kconfig option, conditionalising a bit of procfs-related stuff
  and some doc fixes"

* tag 'fscache-fixes-20220413' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  fscache: remove FSCACHE_OLD_API Kconfig option
  fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
  fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
  fscache: Remove the cookie parameter from fscache_clear_page_bits()
  docs: filesystems: caching/backend-api.rst: fix an object withdrawn API
  docs: filesystems: caching/backend-api.rst: correct two relinquish APIs use
  cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
  cachefiles: unmark inode in use in error path
2022-04-14 10:51:20 -07:00
Ronnie Sahlberg
8b6c58458e cifs: verify that tcon is valid before dereference in cifs_kill_sb
On umount, cifs_sb->tlink_tree might contain entries that do not represent
a valid tcon.
Check the tcon for error before we dereference it.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-14 00:07:36 -05:00
Harshit Mogalapalli
64c4a37ac0 cifs: potential buffer overflow in handling symlinks
Smatch printed a warning:
	arch/x86/crypto/poly1305_glue.c:198 poly1305_update_arch() error:
	__memcpy() 'dctx->buf' too small (16 vs u32max)

It's caused because Smatch marks 'link_len' as untrusted since it comes
from sscanf(). Add a check to ensure that 'link_len' is not larger than
the size of the 'link_str' buffer.

Fixes: c69c1b6eae ("cifs: implement CIFSParseMFSymlink()")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-13 12:00:49 -05:00
Pavel Begunkov
7179c3ce3d io_uring: fix poll error reporting
We should not return an error code in req->result in
io_poll_check_events(), because it may get mangled and returned as
success. Just return the error code directly, the callers will fail the
request or proceed accordingly.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5f03514ee33324dc811fb93df84aee0f695fb044.1649862516.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-13 10:25:37 -06:00
Pavel Begunkov
cce64ef013 io_uring: fix poll file assign deadlock
We pass "unlocked" into io_assign_file() in io_poll_check_events(),
which can lead to double locking.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2476d4ae46554324b599ee4055447b105f20a75a.1649862516.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-13 10:25:37 -06:00
Pavel Begunkov
e941976659 io_uring: use right issue_flags for splice/tee
Pass right issue_flags into into io_file_get_fixed() instead of
IO_URING_F_UNLOCKED. It's probably not a problem at the moment but let's
do it safer.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7d242daa9df5d776907686977cd29fbceb4a2d8d.1649862516.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-13 10:25:36 -06:00
Tadeusz Struk
2da376228a ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul,
and offset 0x1000000ul, which, when added together exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function
the 'end' parameter needs to be one block after the last block to be
removed. In the case when the BUG is triggered it points to the last
block. Modify the ext4_punch_hole() function and add constraint that
caps the length to satisfy the one before laster block requirement.

LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331
LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000

Fixes: a4bb6b64e3 ("ext4: enable "punch hole" functionality")
Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-04-12 22:24:35 -04:00
Ye Bin
c186f0887f ext4: fix use-after-free in ext4_search_dir
We got issue as follows:
EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
==================================================================
BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline]
BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline]
BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331

CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:83 [inline]
 dump_stack+0x144/0x187 lib/dump_stack.c:124
 print_address_description+0x7d/0x630 mm/kasan/report.c:387
 __kasan_report+0x132/0x190 mm/kasan/report.c:547
 kasan_report+0x47/0x60 mm/kasan/report.c:564
 ext4_search_dir fs/ext4/namei.c:1394 [inline]
 search_dirblock fs/ext4/namei.c:1199 [inline]
 __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
 ext4_lookup_entry fs/ext4/namei.c:1622 [inline]
 ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690
 __lookup_hash+0xc5/0x190 fs/namei.c:1451
 do_rmdir+0x19e/0x310 fs/namei.c:3760
 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x445e59
Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000
R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3
flags: 0x200000000000000()
raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000
raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================

ext4_search_dir:
  ...
  de = (struct ext4_dir_entry_2 *)search_buf;
  dlimit = search_buf + buf_size;
  while ((char *) de < dlimit) {
  ...
    if ((char *) de + de->name_len <= dlimit &&
	 ext4_match(dir, fname, de)) {
	    ...
    }
  ...
    de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
    if (de_len <= 0)
      return -1;
    offset += de_len;
    de = (struct ext4_dir_entry_2 *) ((char *) de + de_len);
  }

Assume:
de=0xffff8881317c2fff
dlimit=0x0xffff8881317c3000

If read 'de->name_len' which address is 0xffff8881317c3005, obviously is
out of range, then will trigger use-after-free.
To solve this issue, 'dlimit' must reserve 8 bytes, as we will read
'de->name_len' to judge if '(char *) de + de->name_len' out of range.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-04-12 22:24:22 -04:00
Ye Bin
b98535d091 ext4: fix bug_on in start_this_handle during umount filesystem
We got issue as follows:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:389!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197
Workqueue: events flush_stashed_error_work
RIP: 0010:start_this_handle+0x41c/0x1160
RSP: 0018:ffff888106b47c20 EFLAGS: 00010202
RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac
RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050
RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a
R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000
FS:  0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 jbd2__journal_start+0x38a/0x790
 jbd2_journal_start+0x19/0x20
 flush_stashed_error_work+0x110/0x2b3
 process_one_work+0x688/0x1080
 worker_thread+0x8b/0xc50
 kthread+0x26f/0x310
 ret_from_fork+0x22/0x30
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---

Above issue may happen as follows:
      umount            read procfs            error_work
ext4_put_super
  flush_work(&sbi->s_error_work);

                      ext4_mb_seq_groups_show
	                ext4_mb_load_buddy_gfp
			  ext4_mb_init_group
			    ext4_mb_init_cache
	                      ext4_read_block_bitmap_nowait
			        ext4_validate_block_bitmap
				  ext4_error
			            ext4_handle_error
			              schedule_work(&EXT4_SB(sb)->s_error_work);

  ext4_unregister_sysfs(sb);
  jbd2_journal_destroy(sbi->s_journal);
    journal_kill_thread
      journal->j_flags |= JBD2_UNMOUNT;

                                          flush_stashed_error_work
				            jbd2_journal_start
					      start_this_handle
					        BUG_ON(journal->j_flags & JBD2_UNMOUNT);

To solve this issue, we call 'ext4_unregister_sysfs() before flushing
s_error_work in ext4_put_super().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-12 22:22:27 -04:00
Ye Bin
a2b0b205d1 ext4: fix symlink file size not match to file content
We got issue as follows:
[home]# fsck.ext4  -fn  ram0yb
e2fsck 1.45.6 (20-Mar-2020)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
Clear? no
Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
Fix? no

As the symlink file size does not match the file content. If the writeback
of the symlink data block failed, ext4_finish_bio() handles the end of IO.
However this function fails to mark the buffer with BH_write_io_error and
so when unmount does journal checkpoint it cannot detect the writeback
error and will cleanup the journal. Thus we've lost the correct data in the
journal area. To solve this issue, mark the buffer as BH_write_io_error in
ext4_finish_bio().

Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-04-12 22:22:14 -04:00
Darrick J. Wong
ad5cd4f4ee ext4: fix fallocate to use file_modified to update permissions consistently
Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files.  This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero, collapse, insert range).  Because the call can be used to change
file contents, we should treat it like we do any other modification to a
file -- update the mtime, and drop set[ug]id privileges/capabilities.

The VFS function file_modified() does all this for us if pass it a
locked inode, so let's make fallocate drop permissions correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2022-04-12 22:22:02 -04:00
Linus Torvalds
c1488c9751 NFSD bug fixes for 5.18-rc:
- Fix a write performance regression
 - Fix crashes during request deferral on RDMA transports
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmJVnIgACgkQM2qzM29m
 f5cdQA//SS2pbiAcjePL6zuVrd/eeALWRJIdCj8fP9eQ4dfi7OM0xQE3qHyLDC3n
 2fjciUSnoE+lIYiRXD7Ii+GqYThSrrxlR7aCT+vrIAL/ksnzTeUiUSuEihPOOYjn
 ZEz9vkP4wlCTFh4SdeBoETEjksvRU3ql37l4S04nxxgidpY/NkF5wZH5iUlV8Z14
 Q8OxJJk1tFJARPz6RqGrRkMws24NhYzeuXQktVA+AOoKjUeQSp4bN1ZSCJv3eX/E
 EjAPDFMYVdenp8OY9RJoP3Xpxb6e1mv5flXYQa7YpJUVH3AccMJ8aKuFadjGfGNS
 2tEzoipJR0fmxkxpDX1g0Bzm8fAIc467l4tskVzUxLnqNS/FjXv2P85h8p9U/R06
 BWbF5CxRu89h1tZ/ZIh4lfw/Ro94XvYaYCDeu9V1TndP+QroNDDY9Ypl13+uy6uS
 zLwEEo5nMUbc7FVmT7UidHqeEukwbNzmXEXYrgQD5hRaT9L+85R0L5/Y+gi+ODDK
 7SKu1Bomi3WN7WKzjvZspHMivVT6JNy/ngHlKSYkWYl/dJzMy5I4Z4sNRVCHKoKX
 17SpKHfKkZAt5oS54dZ40O1PhICZTMclAB7Vb8bs/yggrHTsqwqf21v8FTJW79K9
 MjnYigEWgwi62GC5WdZr5LqAkqRHi2K2rLnFwVSLtZvpb2sxPdM=
 =3AeS
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a write performance regression

 - Fix crashes during request deferral on RDMA transports

* tag 'nfsd-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix the svc_deferred_event trace class
  SUNRPC: Fix NFSD's request deferral on RDMA transports
  nfsd: Clean up nfsd_file_put()
  nfsd: Fix a write performance regression
  SUNRPC: Return true/false (not 1/0) from bool functions
2022-04-12 14:23:19 -10:00
Mikulas Patocka
932aba1e16 stat: fix inconsistency between struct stat and struct compat_stat
struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit
st_dev and st_rdev; struct compat_stat (defined in
arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by
a 16-bit padding.

This patch fixes struct compat_stat to match struct stat.

[ Historical note: the old x86 'struct stat' did have that 16-bit field
  that the compat layer had kept around, but it was changes back in 2003
  by "struct stat - support larger dev_t":

    https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d

  and back in those days, the x86_64 port was still new, and separate
  from the i386 code, and had already picked up the old version with a
  16-bit st_dev field ]

Note that we can't change compat_dev_t because it is used by
compat_loop_info.

Also, if the st_dev and st_rdev values are 32-bit, we don't have to use
old_valid_dev to test if the value fits into them.  This fixes
-EOVERFLOW on filesystems that are on NVMe because NVMe uses the major
number 259.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-12 13:35:08 -10:00
Dylan Yudaken
d2347b9695 io_uring: verify pad field is 0 in io_get_ext_arg
Ensure that only 0 is passed for pad here.

Fixes: c73ebb685f ("io_uring: add timeout support for io_uring_enter()")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-12 10:46:54 -06:00
Dylan Yudaken
6fb53cf8ff io_uring: verify resv is 0 in ringfd register/unregister
Only allow resv field to be 0 in struct io_uring_rsrc_update user
arguments.

Fixes: e7a6c00dc7 ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-12 10:46:54 -06:00
Dylan Yudaken
d8a3ba9c14 io_uring: verify that resv2 is 0 in io_uring_rsrc_update2
Verify that the user does not pass in anything but 0 for this field.

Fixes: 992da01aa9 ("io_uring: change registration/upd/rsrc tagging ABI")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-12 10:46:54 -06:00
Dylan Yudaken
565c5e616e io_uring: move io_uring_rsrc_update2 validation
Move validation to be more consistently straight after
copy_from_user. This is already done in io_register_rsrc_update and so
this removes that redundant check.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220412163042.2788062-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-12 10:46:54 -06:00
Pavel Begunkov
0f8da75b51 io_uring: fix assign file locking issue
io-wq work cancellation path can't take uring_lock as how it's done on
file assignment, we have to handle IO_WQ_WORK_CANCEL first, this fixes
encountered hangs.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0d9b9f37841645518503f6a207e509d14a286aba.1649773463.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-12 08:39:49 -06:00
Jens Axboe
82733d168c io_uring: stop using io_wq_work as an fd placeholder
There are two reasons why this isn't the best idea:

- It's an odd area to grab a bit of storage space, hence it's an odd area
  to grab storage from.
- It puts the 3rd io_kiocb cacheline into the hot path, where normal hot
  path just needs the first two.

Use 'cflags' for joint fd/cflags storage. We only need fd until we
successfully issue, and we only need cflags once a request is done and is
completed.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-11 17:06:20 -06:00
Jens Axboe
2804ecd8d3 io_uring: move apoll->events cache
In preparation for fixing a regression with pulling in an extra cacheline
for IO that doesn't usually touch the last cacheline of the io_kiocb,
move the cached location of apoll->events to space shared with some other
completion data. Like cflags, this isn't used until after the request
has been completed, so we can piggy back on top of comp_list.

Fixes: 81459350d5 ("io_uring: cache req->apoll->events in req->cflags")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-11 17:06:13 -06:00
Jens Axboe
6f83ab22ad io_uring: io_kiocb_update_pos() should not touch file for non -1 offset
-1 tells use to use the current position, but we check if the file is
a stream regardless of that. Fix up io_kiocb_update_pos() to only
dip into file if we need to. This is both more efficient and also drops
12 bytes of text on aarch64 and 64 bytes on x86-64.

Fixes: b4aec40015 ("io_uring: do not recalculate ppos unnecessarily")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-11 16:38:21 -06:00
Jens Axboe
c4212f3eb8 io_uring: flag the fact that linked file assignment is sane
Give applications a way to tell if the kernel supports sane linked files,
as in files being assigned at the right time to be able to reliably
do <open file direct into slot X><read file from slot X> while using
IOSQE_IO_LINK to order them.

Not really a bug fix, but flag it as such so that it gets pulled in with
backports of the deferred file assignment.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-10 19:08:18 -06:00
Linus Torvalds
33563138ac Driver core changes for 5.18-rc2
Here are 2 small driver core changes for 5.18-rc2.
 
 They are the final bits in the removal of the default_attrs field in
 struct kobj_type.  I had to wait until after 5.18-rc1 for all of the
 changes to do this came in through different development trees, and then
 one new user snuck in.  So this series has 2 changes:
 	- removal of the default_attrs field in the powerpc/pseries/vas
 	  code.  Change has been acked by the PPC maintainers to come
 	  through this tree
 	- removal of default_attrs from struct kobj_type now that all
 	  in-kernel users are removed.  This cleans up the kobject code
 	  a little bit and removes some duplicated functionality that
 	  confused people (now there is only one way to do default
 	  groups.)
 
 All of these have been in linux-next for all of this week with no
 reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYlLRHg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yn+9gCfXN0OvKmw5QD55z8YGp/jIycK0ToAnifJ/OX+
 sU2V8ZQfNbV8xw7iXfc2
 =L+Uc
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here are two small driver core changes for 5.18-rc2.

  They are the final bits in the removal of the default_attrs field in
  struct kobj_type. I had to wait until after 5.18-rc1 for all of the
  changes to do this came in through different development trees, and
  then one new user snuck in. So this series has two changes:

   - removal of the default_attrs field in the powerpc/pseries/vas code.

     The change has been acked by the PPC maintainers to come through
     this tree

   - removal of default_attrs from struct kobj_type now that all
     in-kernel users are removed.

     This cleans up the kobject code a little bit and removes some
     duplicated functionality that confused people (now there is only
     one way to do default groups)

  Both of these have been in linux-next for all of this week with no
  reported problems"

* tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kobject: kobj_type: remove default_attrs
  powerpc/pseries/vas: use default_groups in kobj_type
2022-04-10 09:55:09 -10:00
Linus Torvalds
4d6f9f2475 io_uring-5.18-2022-04-08
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJQoEAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpnKLEACIEBUwB70b/pXP9sV5LzPwwXiyBvamS/9w
 AIH1iuFxSvrAGU2n7yMnl2s81yvsFQEu8LACYynsmoXtRl8fHAcUYWchscI5Diub
 nbjuBM1iovoypCo2elbbRRHoHi9Og863oTMWtMTvhBaqrmVGU2wIZFfGp6qL96tH
 mMmR8fLRhiiLXHnm9xLh48/ETGCGCoTNPKsP4dhUHwe70plwwg+0rE2aASddwbvm
 FayoYH3JufbTPT6plwJWGjfATlkrsqxHRMqwNHLqPB7d8AN3sb7OtA/QnVCMfVys
 OLJ7wWQB5mxOsX7rJ6TzPctwqRK2Quo5prSi0kcTzp5VKfSGmIahAGDKasExfDaw
 Y3PREQ2ZLBu2+/j0wOALGTyQQS2i8hU6fa43s7zsAe/qxrpW8j8HMquz4gDveuBC
 tKbvtQXjiuozU0S5fcB+QbR2R0BzfPmmlOeUcdyM4SeOEB6c9Ak+gn8x3MgCYaUT
 i4QulhX8Icsg1gwAlMxybrRc2xVgmfeGxOVhzudVOIpkhcxXgE/Oei2NnNmS4sw8
 LMKg2aVSkZ+OdjGy00CTll0019nXL66fkqUdAyNH4lb719fnrG4/GB7CRWhzTcuF
 xUPvQuGN27sqwJeaprmNG3uMp4C+LC3YjrkX9nUF0NPj0p3kPW5BWwozy5ddShdm
 KJYdSc8jVg==
 =7g1t
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A bit bigger than usual post merge window, largely due to a revert and
  a fix of at what point files are assigned for requests.

  The latter fixing a linked request use case where a dependent link can
  rely on what file is assigned consistently.

  Summary:

   - 32-bit compat fix for IORING_REGISTER_IOWQ_AFF (Eugene)

   - File assignment fixes (me)

   - Revert of the NAPI poll addition from this merge window. The author
     isn't available right now to engage on this, so let's revert it and
     we can retry for the 5.19 release (me, Jakub)

   - Fix a timeout removal race (me)

   - File update and SCM fixes (Pavel)"

* tag 'io_uring-5.18-2022-04-08' of git://git.kernel.dk/linux-block:
  io_uring: fix race between timeout flush and removal
  io_uring: use nospec annotation for more indexes
  io_uring: zero tag on rsrc removal
  io_uring: don't touch scm_fp_list after queueing skb
  io_uring: nospec index for tags on files update
  io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
  Revert "io_uring: Add support for napi_busy_poll"
  io_uring: drop the old style inflight file tracking
  io_uring: defer file assignment
  io_uring: propagate issue_flags state down to file assignment
  io_uring: move read/write file prep state into actual opcode handler
  io_uring: defer splice/tee file validity check until command issue
  io_uring: don't check req->file in io_fsync_prep()
2022-04-08 18:50:14 -10:00
David Howells
1ddff77416 cifs: Split the smb3_add_credits tracepoint
Split the smb3_add_credits tracepoint to make it more obvious when looking
at the logs which line corresponds to what credit change.  Also add a
tracepoint for credit overflow when it's being added back.

Note that it might be better to add another field to the tracepoint for
the information rather than splitting it.  It would also be useful to store
the MID potentially, though that isn't available when the credits are first
obtained.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cifs@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-08 21:25:38 -05:00
Yue Hu
61132ceeda fscache: remove FSCACHE_OLD_API Kconfig option
Commit 01491a7565 ("fscache, cachefiles: Disable configuration") added
the FSCACHE_OLD_API configuration when rewritten. Now, it's not used any
more. Remove it.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006647.html # v1
2022-04-08 23:54:37 +01:00
Yue Hu
b3c958c20a fscache: Use wrapper fscache_set_cache_state() directly when relinquishing
We already have the wrapper function to set cache state.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
cc: linux-cachefs@redhat.com
Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006648.html # v1
2022-04-08 23:54:37 +01:00
Yue Hu
19517e5374 fscache: Move fscache_cookies_seq_ops specific code under CONFIG_PROC_FS
fscache_cookies_seq_ops is only used in proc.c that is compiled under
enabled CONFIG_PROC_FS, so move related code under this config. The
same case exsits in internal.h.

Also, make fscache_lru_cookie_timeout static due to no user outside
of cookie.c.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006649.html # v1
2022-04-08 23:54:37 +01:00
Yue Hu
2c547f2998 fscache: Remove the cookie parameter from fscache_clear_page_bits()
The cookie is not used at all, remove it and update the usage in io.c
and afs/write.c (which is the only user outside of fscache currently)
at the same time.

[DH: Amended the documentation also]

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006659.html
2022-04-08 23:54:37 +01:00
Dave Wysochanski
7b2f6c3066 cachefiles: Fix KASAN slab-out-of-bounds in cachefiles_set_volume_xattr
Use the actual length of volume coherency data when setting the
xattr to avoid the following KASAN report.

 BUG: KASAN: slab-out-of-bounds in cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
 Write of size 4 at addr ffff888101e02af4 by task kworker/6:0/1347

 CPU: 6 PID: 1347 Comm: kworker/6:0 Kdump: loaded Not tainted 5.18.0-rc1-nfs-fscache-netfs+ #13
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-4.fc34 04/01/2014
 Workqueue: events fscache_create_volume_work [fscache]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x45/0x5a
  print_report.cold+0x5e/0x5db
  ? __lock_text_start+0x8/0x8
  ? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
  kasan_report+0xab/0x120
  ? cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
  kasan_check_range+0xf5/0x1d0
  memcpy+0x39/0x60
  cachefiles_set_volume_xattr+0xa0/0x350 [cachefiles]
  cachefiles_acquire_volume+0x2be/0x500 [cachefiles]
  ? __cachefiles_free_volume+0x90/0x90 [cachefiles]
  fscache_create_volume_work+0x68/0x160 [fscache]
  process_one_work+0x3b7/0x6a0
  worker_thread+0x2c4/0x650
  ? process_one_work+0x6a0/0x6a0
  kthread+0x16c/0x1a0
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>

 Allocated by task 1347:
  kasan_save_stack+0x1e/0x40
  __kasan_kmalloc+0x81/0xa0
  cachefiles_set_volume_xattr+0x76/0x350 [cachefiles]
  cachefiles_acquire_volume+0x2be/0x500 [cachefiles]
  fscache_create_volume_work+0x68/0x160 [fscache]
  process_one_work+0x3b7/0x6a0
  worker_thread+0x2c4/0x650
  kthread+0x16c/0x1a0
  ret_from_fork+0x22/0x30

 The buggy address belongs to the object at ffff888101e02af0
 which belongs to the cache kmalloc-8 of size 8
 The buggy address is located 4 bytes inside of
 8-byte region [ffff888101e02af0, ffff888101e02af8)

 The buggy address belongs to the physical page:
 page:00000000a2292d70 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x101e02
 flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
 raw: 0017ffffc0000200 0000000000000000 dead000000000001 ffff888100042280
 raw: 0000000000000000 0000000080660066 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
 ffff888101e02980: fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc
 ffff888101e02a00: 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00
 >ffff888101e02a80: fc fc fc fc 00 fc fc fc fc 00 fc fc fc fc 04 fc
                                                            ^
 ffff888101e02b00: fc fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc
 ffff888101e02b80: fc fc 00 fc fc fc fc 00 fc fc fc fc 00 fc fc fc
 ==================================================================

Fixes: 413a4a6b0b "cachefiles: Fix volume coherency attribute"
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/20220405134649.6579-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20220405142810.8208-1-dwysocha@redhat.com/ # Incorrect v2
2022-04-08 23:32:40 +01:00
Jeffle Xu
ea5dc04612 cachefiles: unmark inode in use in error path
Unmark inode in use if error encountered. If the in-use flag leakage
occurs in cachefiles_open_file(), Cachefiles will complain "Inode
already in use" when later another cookie with the same index key is
looked up.

If the in-use flag leakage occurs in cachefiles_create_tmpfile(), though
the "Inode already in use" warning won't be triggered, fix the leakage
anyway.

Reported-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Fixes: 1f08c925e7 ("cachefiles: Implement backing file wrangling")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006615.html # v1
Link: https://listman.redhat.com/archives/linux-cachefs/2022-March/006618.html # v2
2022-04-08 23:32:30 +01:00
Jens Axboe
e677edbcab io_uring: fix race between timeout flush and removal
io_flush_timeouts() assumes the timeout isn't in progress of triggering
or being removed/canceled, so it unconditionally removes it from the
timeout list and attempts to cancel it.

Leave it on the list and let the normal timeout cancelation take care
of it.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-08 14:50:05 -06:00
Linus Torvalds
1a3b1bba7c NFS client bugfixes for Linux 5.18
Highlights include:
 
 Stable fixes:
 - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
 
 Bugfixes:
 - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the NFSv4.2
   xattr code.
 - Fix for open() using an file open mode of '3' in NFSv4
 - Replace readdir's use of xxhash() with hash_64()
 - Several patches to handle malloc() failure in SUNRPC
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJQb5IACgkQZwvnipYK
 APLVeRAAheQX422gIrjrWsQrZxJWmMDBBj9qwiV14XMs6g5KF/DtNbIJm1BGjA9Y
 aLgNVknSDcGHMmvdh5MCVF7R61sqkCNbQWP3Bx6OBP6ei3FEgQBPyujcFXAaZ7UK
 T4fBzfPoEYaxVkEG/L5BsAUnh6TvyCfNm1oqZZv2bv9M0U6UlXnRLWZN9I6cAtcw
 GI9HgnufWOxJWIqPd9dY45aF44Ru+aJ953eecPh0G83Reoc99EAU+PvJJD18Wl0H
 BZqM6ar5pNush50yVIjnPFXg+sM97dKGlLvYL11Uy7f8valSaBXPZLZQ/jG4Vx/a
 m/l8FVwBEV89BG7z6jKNju7ERbO+xbPgXP8lSwkj69fXOpuvzo/G6VAxS6ZJww12
 p6TJnhCMSEF7qrQc5mejA803dT4MiJWo4i2th482Ws/tRN5H+y9pDYLsvBPM8iHo
 sVkJBco04tBHEH3qKgDTu8EY7/shH+GVO3Wmtcjz47T2rbmQB7gJtfipHcLmV2qd
 Jy1tiz1T7rhGs7KNtWpu8210MY9iFhIt3rAvdGdeVmgnNjrmRex0Sxqx8vgKUAFE
 aQjXkkpwHA3rHWnuGPMKu5i48BPFQ8m4QgdVC9F6ylNT6RIbImqplQiwIYNIxYC+
 cBohUG1yYZRjFEMkXBdfK5cImkTLW7RkLCatRh5v8OU8xqpRVL8=
 =edfs
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Stable fixes:

   - SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()

  Bugfixes:

   - Fix an Oopsable condition due to SLAB_ACCOUNT setting in the
     NFSv4.2 xattr code.

   - Fix for open() using an file open mode of '3' in NFSv4

   - Replace readdir's use of xxhash() with hash_64()

   - Several patches to handle malloc() failure in SUNRPC"

* tag 'nfs-for-5.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()
  SUNRPC: svc_tcp_sendmsg() should handle errors from xdr_alloc_bvec()
  SUNRPC: Handle allocation failure in rpc_new_task()
  NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()
  NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget
  SUNRPC: Handle low memory situations in call_status()
  SUNRPC: Handle ENOMEM in call_transmit_status()
  NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation
  SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
  NFS: Replace readdir's use of xxhash() with hash_64()
  SUNRPC: handle malloc failure in ->request_prepare
  NFSv4: fix open failure with O_ACCMODE flag
  Revert "NFSv4: Handle the special Linux file open access mode"
2022-04-08 07:39:17 -10:00
Shyam Prasad N
d788e51636 cifs: release cached dentries only if mount is complete
During cifs_kill_sb, we first dput all the dentries that we have cached.
However this function can also get called for mount failures.
So dput the cached dentries only if the filesystem mount is complete.
i.e. cifs_sb->root is populated.

Fixes: 5e9c89d43f ("cifs: Grab a reference for the dentry of the cached directory during the lifetime of the cache")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-08 09:03:43 -05:00
David Howells
994fd530a5 cifs: Check the IOCB_DIRECT flag, not O_DIRECT
Use the IOCB_DIRECT indicator flag on the I/O context rather than checking to
see if the file was opened O_DIRECT.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-08 09:01:59 -05:00
Linus Torvalds
5a5dcfd1e8 4 fixes to cifs client, 1 for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJPLtgACgkQiiy9cAdy
 T1HDgAwApseyrSawmewfbpwPj5r0mloUn9aBk3DV5LIU/j5ZGyOApG5VpZS+Izkk
 xaWxRVtH7vWwVyTxOpIwKOXG8FA+1R7xGP/PJCpBye6IvnjVo/pTFR21XwO6NFGV
 AzCVqJZIcqvxBykUOJLXOl64IcTyTHzpgwGI++bGwklGat0H48A1YtbP0QTbhKw1
 kCRU+vXm8mecOqb+ZyG7cXDE+Muutg6KIvSE9qBshx5l0Qjuw7rE+SpMgfoi+ll9
 PyQyl/UTg5T+DHLkofQGHMmGyPgfrYtA6LShNXq7Z/KGja/SF4LEiEPPaSyLASAl
 LwFWPXFP8jro2daN0mPU4LUkV4XVmnshMgXgM2b0yueg0SEi9pnpDbGFt2E7JXdW
 ocGeH8ehEGNty/1Z6LoBMFQdlvNLCNoZRio8iKqMhZvtmR5ybrxRj/KyPldkYUUN
 hNdruZNQ/I3hJVSgYIzaF7SlrlM4/aaLUP4E8xUCyIEjlhEFbxFUQfWY0SxlttuA
 6IM/Y9DZ
 =wy77
 -----END PGP SIGNATURE-----

Merge tag '5.18-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:

 - reconnect fixes: one for DFS and one to avoid a reconnect race

 - small change to deal with upcoming behavior change of list iterators

* tag '5.18-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module number
  cifs: force new session setup and tcon for dfs
  cifs: remove check of list iterator against head past the loop body
  cifs: fix potential race with cifsd thread
2022-04-07 19:16:49 -10:00
Trond Myklebust
88dee0cc93 NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename()
Ensure the call to rpc_run_task() cannot fail by preallocating the
rpc_task.

Fixes: 910ad38697 ("NFS: Fix memory allocation in rpc_alloc_task()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:20:00 -04:00
Trond Myklebust
68b78dcdf9 NFSv4/pnfs: Handle RPC allocation errors in nfs4_proc_layoutget
If rpc_run_task() fails due to an allocation error, then bail out early.

Fixes: 910ad38697 ("NFS: Fix memory allocation in rpc_alloc_task()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:20:00 -04:00
Muchun Song
dcc7977c7f NFSv4.2: Fix missing removal of SLAB_ACCOUNT on kmem_cache allocation
The commit 5c60e89e71 ("NFSv4.2: Fix up an invalid combination of memory
allocation flags") has stripped GFP_KERNEL_ACCOUNT down to GFP_KERNEL,
however, it forgot to remove SLAB_ACCOUNT from kmem_cache allocation.
It means that memory is still limited by kmemcg.  This patch also fix a
NULL pointer reference issue [1] reported by NeilBrown.

Link: https://lore.kernel.org/all/164870069595.25542.17292003658915487357@noble.neil.brown.name/ [1]
Fixes: 5c60e89e71 ("NFSv4.2: Fix up an invalid combination of memory allocation flags")
Fixes: 5abc1e37af ("mm: list_lru: allocate list_lru_one only when needed")
Reported-by: NeilBrown <neilb@suse.de>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:20:00 -04:00
Trond Myklebust
f00432063d SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()
We must ensure that all sockets are closed before we call xprt_free()
and release the reference to the net namespace. The problem is that
calling fput() will defer closing the socket until delayed_fput() gets
called.
Let's fix the situation by allowing rpciod and the transport teardown
code (which runs on the system wq) to call __fput_sync(), and directly
close the socket.

Reported-by: Felix Fu <foyjog@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: a73881c96d ("SUNRPC: Fix an Oops in udp_poll()")
Cc: stable@vger.kernel.org # 5.1.x: 3be232f11a: SUNRPC: Prevent immediate close+reconnect
Cc: stable@vger.kernel.org # 5.1.x: 89f42494f9: SUNRPC: Don't call connect() more than once on a TCP socket
Cc: stable@vger.kernel.org # 5.1.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:19:47 -04:00
Trond Myklebust
830f1111d9 NFS: Replace readdir's use of xxhash() with hash_64()
Both xxhash() and hash_64() appear to give similarly low collision
rates with a standard linearly increasing readdir offset. They both give
similarly higher collision rates when applied to ext4's offsets.

So switch to using the standard hash_64().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-04-07 16:19:47 -04:00
Pavel Begunkov
4cdd158be9 io_uring: use nospec annotation for more indexes
There are still several places that using pre array_index_nospec()
indexes, fix them up.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b01ef5ee83f72ed35ad525912370b729f5d145f4.1649336342.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:47 -06:00
Pavel Begunkov
8f0a24801b io_uring: zero tag on rsrc removal
Automatically default rsrc tag in io_queue_rsrc_removal(), it's safer
than leaving it there and relying on the rest of the code to behave and
not use it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1cf262a50df17478ea25b22494dcc19f3a80301f.1649336342.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:47 -06:00
Pavel Begunkov
a07211e300 io_uring: don't touch scm_fp_list after queueing skb
It's safer to not touch scm_fp_list after we queued an skb to which it
was assigned, there might be races lurking if we screw subtle sync
guarantees on the io_uring side.

Fixes: 6b06314c47 ("io_uring: add file set registration")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:47 -06:00
Pavel Begunkov
34bb771841 io_uring: nospec index for tags on files update
Don't forget to array_index_nospec() for indexes before updating rsrc
tags in __io_sqe_files_update(), just use already safe and precalculated
index @i.

Fixes: c3bdad0271 ("io_uring: add generic rsrc update with tags")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:47 -06:00
Eugene Syromiatnikov
0f5e4b83b3 io_uring: implement compat handling for IORING_REGISTER_IOWQ_AFF
Similarly to the way it is done im mbind syscall.

Cc: stable@vger.kernel.org # 5.14
Fixes: fe76421d1d ("io_uring: allow user configurable IO thread CPU affinity")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:47 -06:00
Jens Axboe
cb31821673 Revert "io_uring: Add support for napi_busy_poll"
This reverts commit adc8682ec6.

There's some discussion on the API not being as good as it can be.
Rather than ship something and be stuck with it forever, let's revert
the NAPI support for now and work on getting something sorted out
for the next kernel release instead.

Link: https://lore.kernel.org/io-uring/b7bbc124-8502-0ee9-d4c8-7c41b4487264@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:47 -06:00
Jens Axboe
d5361233e9 io_uring: drop the old style inflight file tracking
io_uring tracks requests that are referencing an io_uring descriptor to
be able to cancel without worrying about loops in the references. Since
we now assign the file at execution time, the easier approach is to drop
a potentially problematic reference before we punt the request. This
eliminates the need to special case these types of files beyond just
marking them as such, and simplifies cancelation quite a bit.

This also fixes a recent issue where an async punted tee operation would
with the io_uring descriptor as the output file would crash when
attempting to get a reference to the file from the io-wq worker. We
could have worked around that, but this is the much cleaner fix.

Fixes: 6bf9c47a39 ("io_uring: defer file assignment")
Reported-by: syzbot+c4b9303500a21750b250@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:37 -06:00
Jens Axboe
6bf9c47a39 io_uring: defer file assignment
If an application uses direct open or accept, it knows in advance what
direct descriptor value it will get as it picks it itself. This allows
combined requests such as:

sqe = io_uring_get_sqe(ring);
io_uring_prep_openat_direct(sqe, ..., file_slot);
sqe->flags |= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS;

sqe = io_uring_get_sqe(ring);
io_uring_prep_read(sqe,file_slot, buf, buf_size, 0);
sqe->flags |= IOSQE_FIXED_FILE;

io_uring_submit(ring);

where we prepare both a file open and read, and only get a completion
event for the read when both have completed successfully.

Currently links are fully prepared before the head is issued, but that
fails if the dependent link needs a file assigned that isn't valid until
the head has completed.

Conversely, if the same chain is performed but the fixed file slot is
already valid, then we would be unexpectedly returning data from the
old file slot rather than the newly opened one. Make sure we're
consistent here.

Allow deferral of file setup, which makes this documented case work.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:28 -06:00
Jens Axboe
5106dd6e74 io_uring: propagate issue_flags state down to file assignment
We'll need this in a future patch, when we could be assigning the file
after the prep stage. While at it, get rid of the io_file_get() helper,
it just makes the code harder to read.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-07 11:17:24 -06:00
Dennis Zhou
acee08aaf6 btrfs: fix btrfs_submit_compressed_write cgroup attribution
This restores the logic from commit 46bcff2bfc ("btrfs: fix compressed
write bio blkcg attribution") which added cgroup attribution to btrfs
writeback. It also adds back the REQ_CGROUP_PUNT flag for these ios.

Fixes: 9150724048 ("btrfs: determine stripe boundary at bio allocation time in btrfs_submit_compressed_write")
CC: stable@vger.kernel.org # 5.16+
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:50:51 +02:00
Jia-Ju Bai
168a2f776b btrfs: fix root ref counts in error handling in btrfs_get_root_ref
In btrfs_get_root_ref(), when btrfs_insert_fs_root() fails,
btrfs_put_root() can happen for two reasons:

- the root already exists in the tree, in that case it returns the
  reference obtained in btrfs_lookup_fs_root()

- another error so the cleanup is done in the fail label

Calling btrfs_put_root() unconditionally would lead to double decrement
of the root reference possibly freeing it in the second case.

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Fixes: bc44d7c4b2 ("btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root")
CC: stable@vger.kernel.org # 5.10+
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:50:47 +02:00
Naohiro Aota
760e69c4c2 btrfs: zoned: activate block group only for extent allocation
In btrfs_make_block_group(), we activate the allocated block group,
expecting that the block group is soon used for allocation. However, the
chunk allocation from flush_space() context broke the assumption. There
can be a large time gap between the chunk allocation time and the extent
allocation time from the chunk.

Activating the empty block groups pre-allocated from flush_space()
context can exhaust the active zone counter of a device. Once we use all
the active zone counts for empty pre-allocated block groups, we cannot
activate new block group for the other things: metadata, tree-log, or
data relocation block group.  That failure results in a fake -ENOSPC.

This patch introduces CHUNK_ALLOC_FORCE_FOR_EXTENT to distinguish the
chunk allocation from find_free_extent(). Now, the new block group is
activated only in that context.

Fixes: eb66a010d5 ("btrfs: zoned: activate new block group")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:50:41 +02:00
Naohiro Aota
820c363bd5 btrfs: return allocated block group from do_chunk_alloc()
Return the allocated block group from do_chunk_alloc(). This is a
preparation patch for the next patch.

CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:50:39 +02:00
Naohiro Aota
a690e5f2db btrfs: mark resumed async balance as writing
When btrfs balance is interrupted with umount, the background balance
resumes on the next mount. There is a potential deadlock with FS freezing
here like as described in commit 26559780b953 ("btrfs: zoned: mark
relocation as writing"). Mark the process as sb_writing to avoid it.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:49:50 +02:00
Nikolay Borisov
d03ae0d3b6 btrfs: remove support of balance v1 ioctl
It was scheduled for removal in kernel v5.18 commit 6c405b2409
("btrfs: deprecate BTRFS_IOC_BALANCE ioctl") thus its time has come.

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:49:39 +02:00
Naohiro Aota
6d82ad13c4 btrfs: release correct delalloc amount in direct IO write path
Running generic/406 causes the following WARNING in btrfs_destroy_inode()
which tells there are outstanding extents left.

In btrfs_get_blocks_direct_write(), we reserve a temporary outstanding
extents with btrfs_delalloc_reserve_metadata() (or indirectly from
btrfs_delalloc_reserve_space(()). We then release the outstanding extents
with btrfs_delalloc_release_extents(). However, the "len" can be modified
in the COW case, which releases fewer outstanding extents than expected.

Fix it by calling btrfs_delalloc_release_extents() for the original length.

To reproduce the warning, the filesystem should be 1 GiB.  It's
triggering a short-write, due to not being able to allocate a large
extent and instead allocating a smaller one.

  WARNING: CPU: 0 PID: 757 at fs/btrfs/inode.c:8848 btrfs_destroy_inode+0x1e6/0x210 [btrfs]
  Modules linked in: btrfs blake2b_generic xor lzo_compress
  lzo_decompress raid6_pq zstd zstd_decompress zstd_compress xxhash zram
  zsmalloc
  CPU: 0 PID: 757 Comm: umount Not tainted 5.17.0-rc8+ #101
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014
  RIP: 0010:btrfs_destroy_inode+0x1e6/0x210 [btrfs]
  RSP: 0018:ffffc9000327bda8 EFLAGS: 00010206
  RAX: 0000000000000000 RBX: ffff888100548b78 RCX: 0000000000000000
  RDX: 0000000000026900 RSI: 0000000000000000 RDI: ffff888100548b78
  RBP: ffff888100548940 R08: 0000000000000000 R09: ffff88810b48aba8
  R10: 0000000000000001 R11: ffff8881004eb240 R12: ffff88810b48a800
  R13: ffff88810b48ec08 R14: ffff88810b48ed00 R15: ffff888100490c68
  FS:  00007f8549ea0b80(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f854a09e733 CR3: 000000010a2e9003 CR4: 0000000000370eb0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   destroy_inode+0x33/0x70
   dispose_list+0x43/0x60
   evict_inodes+0x161/0x1b0
   generic_shutdown_super+0x2d/0x110
   kill_anon_super+0xf/0x20
   btrfs_kill_super+0xd/0x20 [btrfs]
   deactivate_locked_super+0x27/0x90
   cleanup_mnt+0x12c/0x180
   task_work_run+0x54/0x80
   exit_to_user_mode_prepare+0x152/0x160
   syscall_exit_to_user_mode+0x12/0x30
   do_syscall_64+0x42/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   RIP: 0033:0x7f854a000fb7

Fixes: f0bfa76a11 ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range")
CC: stable@vger.kernel.org # 5.17
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:49:35 +02:00
Nathan Chancellor
6d4a6b515c btrfs: remove unused variable in btrfs_{start,write}_dirty_block_groups()
Clang's version of -Wunused-but-set-variable recently gained support for
unary operations, which reveals two unused variables:

  fs/btrfs/block-group.c:2949:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable]
          int num_started = 0;
              ^
  fs/btrfs/block-group.c:3116:6: error: variable 'num_started' set but not used [-Werror,-Wunused-but-set-variable]
          int num_started = 0;
              ^
  2 errors generated.

These variables appear to be unused from their introduction, so just
remove them to silence the warnings.

Fixes: c9dc4c6578 ("Btrfs: two stage dirty block group writeout")
Fixes: 1bbc621ef2 ("Btrfs: allow block group cache writeout outside critical section in commit")
CC: stable@vger.kernel.org # 5.4+
Link: https://github.com/ClangBuiltLinux/linux/issues/1614
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:49:15 +02:00
Haowen Bai
9435be734a btrfs: zoned: remove redundant condition in btrfs_run_delalloc_range
The logic !A || A && B is equivalent to !A || B. so we can
make code clear.

Note: though it's preferred to be in the more human readable form, there
have been repeated reports and patches as the expression is detected by
tools so apply it to reduce the load.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add note ]
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-06 00:49:09 +02:00
Linus Torvalds
ce4c854ee8 for-5.18-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmJLaJoACgkQxWXV+ddt
 WDvd3g/+KXpWfPOg4h7rCKiZJoE0/XnAaqHacGmIy/8+TqiFm7VEdG2uA4wEjqYq
 yeCr+2NhsWcGKOWARUPmP8sBmla98tE0+1abxQFYhGdsLMcgbI6EyW+j53E7++gY
 0j4ZY4s7c1c4lsyStNrS7kauDcbZ3MHWhG1VNdTyWCf6wal/5jU93SwdDSz81Int
 iD8y2nVQoHcWdyBbPA7t9dYL5cCGR425gRrSb3Yhvc3cI6KJLbPLDpt1AAnR6XVx
 W/ELKYMVBF70nTqp3ptBTfbmm83/AcrA1/Epoe2sEV+3gaejx1vkIYwcWlvgW//+
 e980zd0k/QSse8O8s66Z7c9QnLML+L/6xxK2vJObw85NFzEGkrmoZdCa7HyyI3MS
 pkI0ox9z4I4OEgzBaH8ZpUYgRxQlRnbDv58GrTs/aEBuNaHGQJfiCmh4sx75jGvR
 eXMnqmL/EIKqZqTsLfWVuMLC7ZgTG8V76VOJ3gDE4v5Yxi306bCLcdTS+0HYz5GA
 fkCisFy69jSbkMbOClTegCkYiRHxV6GjI19yPqj29OsnFpk+htnubOvgs3Q1link
 odpBavejmbVxffrYw92Qm7NRMEldZoSaMa39zFJ9Wgtwui7Oe66K3JFlrSnth5mD
 YowfkApQCzsSiXzitwLEHmfs7F1MVh2a0jY4hTz1XrizxZPWKHE=
 =XZaw
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - prevent deleting subvolume with active swapfile

 - fix qgroup reserve limit calculation overflow

 - remove device count in superblock and its item in one transaction so
   they cant't get out of sync

 - skip defragmenting an isolated sector, this could cause some extra IO

 - unify handling of mtime/permissions in hole punch with fallocate

 - zoned mode fixes:
     - remove assert checking for only single mode, we have the
       DUP mode implemented
     - fix potential lockdep warning while traversing devices
       when checking for zone activation

* tag 'for-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: prevent subvol with swapfile from being deleted
  btrfs: do not warn for free space inode in cow_file_range
  btrfs: avoid defragging extents whose next extents are not targets
  btrfs: fix fallocate to use file_modified to update permissions consistently
  btrfs: remove device item and update super block in the same transaction
  btrfs: fix qgroup reserve overflow the qgroup limit
  btrfs: zoned: remove left over ASSERT checking for single profile
  btrfs: zoned: traverse devices under chunk_mutex in btrfs_can_activate_zone
2022-04-05 08:59:37 -07:00
Greg Kroah-Hartman
cdb4f26a63 kobject: kobj_type: remove default_attrs
Now that all in-kernel users of default_attrs for the kobj_type are gone
and converted to properly use the default_groups pointer instead, it can
be safely removed.

There is one standard way to create sysfs files in a kobj_type, and not
two like before, causing confusion as to which should be used.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-05 15:39:19 +02:00
Steve French
7cd1cc415d cifs: update internal module number
To 2.36

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04 22:40:14 -05:00
Paulo Alcantara
fb39d30e22 cifs: force new session setup and tcon for dfs
Do not reuse existing sessions and tcons in DFS failover as it might
connect to different servers and shares.

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Cc: stable@vger.kernel.org
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04 22:39:21 -05:00
Jens Axboe
584b0180f0 io_uring: move read/write file prep state into actual opcode handler
In preparation for not necessarily having a file assigned at prep time,
defer any initialization associated with the file to when the opcode
handler is run.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-04 16:50:20 -06:00
Jens Axboe
a3e4bc23d5 io_uring: defer splice/tee file validity check until command issue
In preparation for not using the file at prep time, defer checking if this
file refers to a valid io_uring instance until issue time.

This also means we can get rid of the cleanup flag for splice and tee.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-04 16:50:20 -06:00
Jakob Koschel
00c796eecb cifs: remove check of list iterator against head past the loop body
When list_for_each_entry() completes the iteration over the whole list
without breaking the loop, the iterator value will be a bogus pointer
computed based on the head element.

While it is safe to use the pointer to determine if it was computed
based on the head element, either with list_entry_is_head() or
&pos->member == head, using the iterator variable after the loop should
be avoided.

In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04 12:01:22 -05:00
Paulo Alcantara
687127c81a cifs: fix potential race with cifsd thread
To avoid racing with demultiplex thread while it is handling data on
socket, use cifs_signal_cifsd_for_reconnect() helper for marking
current server to reconnect and let the demultiplex thread handle the
rest.

Fixes: dca65818c8 ("cifs: use a different reconnect helper for non-cifsd threads")
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-04-04 12:01:22 -05:00
Jens Axboe
ec858afda8 io_uring: don't check req->file in io_fsync_prep()
This is a leftover from the really old days where we weren't able to
track and error early if we need a file and it wasn't assigned. Kill
the check.

Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-03 17:07:54 -06:00
Linus Torvalds
09bb8856d4 Updates to Tracing:
- Rename the staging files to give them some meaning.
   Just stage1,stag2,etc, does not show what they are for
 
 - Check for NULL from allocation in bootconfig
 
 - Hold event mutex for dyn_event call in user events
 
 - Mark user events to broken (to work on the API)
 
 - Remove eBPF updates from user events
 
 - Remove user events from uapi header to keep it from being installed.
 
 - Move ftrace_graph_is_dead() into inline as it is called from hot paths
   and also convert it into a static branch.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYkmyIBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qutfAQD90gbUgFMFe2akF5sKhonF5T6mm0+w
 BsWqNlBEKBxmfwD+Krfpxql/PKp/gCufcIUUkYC4E6Wl9akf3eO1qQel1Ao=
 =ZTn1
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing updates from Steven Rostedt:

 - Rename the staging files to give them some meaning. Just
   stage1,stag2,etc, does not show what they are for

 - Check for NULL from allocation in bootconfig

 - Hold event mutex for dyn_event call in user events

 - Mark user events to broken (to work on the API)

 - Remove eBPF updates from user events

 - Remove user events from uapi header to keep it from being installed.

 - Move ftrace_graph_is_dead() into inline as it is called from hot
   paths and also convert it into a static branch.

* tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Move user_events.h temporarily out of include/uapi
  ftrace: Make ftrace_graph_is_dead() a static branch
  tracing: Set user_events to BROKEN
  tracing/user_events: Remove eBPF interfaces
  tracing/user_events: Hold event_mutex during dyn_event_add
  proc: bootconfig: Add null pointer check
  tracing: Rename the staging files for trace_events
2022-04-03 12:26:01 -07:00
Lv Ruyi
bed5b60bf6 proc: bootconfig: Add null pointer check
kzalloc is a memory allocation function which can return NULL when some
internal memory errors happen. It is safer to add null pointer check.

Link: https://lkml.kernel.org/r/20220329104004.2376879-1-lv.ruyi@zte.com.cn

Cc: stable@vger.kernel.org
Fixes: c1a3c36017 ("proc: bootconfig: Add /proc/bootconfig to show boot config list")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-04-02 08:40:09 -04:00
Linus Torvalds
88e6c02076 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "Assorted bits and pieces"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: drop needless assignment in aio_read()
  clean overflow checks in count_mounts() a bit
  seq_file: fix NULL pointer arithmetic warning
  uml/x86: use x86 load_unaligned_zeropad()
  asm/user.h: killed unused macros
  constify struct path argument of finish_automount()/do_add_mount()
  fs: Remove FIXME comment in generic_write_checks()
2022-04-01 19:57:03 -07:00
Linus Torvalds
a4251ab989 Fixes for 5.18-rc1:
- Fix a potential infinite loop in FIEMAP by fixing an off by one error
    when comparing the requested range against s_maxbytes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmJEitIACgkQ+H93GTRK
 tOsgIxAAhUTopZ+ZZs/tLHlWj5wI9zBNCyIHVsb46GR/EyxpMhuAf7w+3qKuEV80
 /oZwnpY82mYUtWimjdyuNsAusPwtX/YnuocVmeR3cMj5yynk4rxLX2zVbjUDvOFW
 BGIm2IXtNSsqYqIMiEs5M8WsAbDbJz3frdcfEQcFZ9qasLZl6lpwBg2pWuO65X1T
 UBNMbPcLS8I8OHuB9sVzgHnPlPZgj3VK1JJofzrXMG4+wbB5BpVO7suT/54N45/J
 LJohEH2uokGxvy9bYNeQT7ZLkxAlEQM8uxtjKa/G1f16IWIMqw/EwVTTDDpUekN0
 6qQx6J5tPKbP/rZi7a7abDDD0mFgq7M/xggjbeX2C8gsgPMWGN4GdEdKs9SWPWNt
 emR7o0Pv9kIMHX4m2FUSLOdOoTAsJq84ujKNC0zk2wwOfIZ0PgxiQihYHp1NcoXk
 FCGUiIs0PaNxlZ58nuGp0ran2fq/RtaG+6OJRNxOChwRpRRqp3htfc8RK+7mci4X
 cUdDON3JdExd1+dPg2hnIrb4Tcbfsj4QfniXZBMg135+6N5r0upgpC5F9iR+kGDK
 RN7Ma6snVvnPmRWwzaC6PDoTI71N4sbHIkuoYOzdA+BikKfrJXm8PLLKfMf2IPEM
 lw7rEeJxWzWBIiG6IyBf5ybfxM1pdS/QNvfcYfYVIk06XwnApmE=
 =Lx1P
 -----END PGP SIGNATURE-----

Merge tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull vfs fix from Darrick Wong:
 "The erofs developers felt that FIEMAP should handle ranged requests
  starting at s_maxbytes by returning EFBIG instead of passing the
  filesystem implementation a nonsense 0-byte request.

  Not sure why they keep tagging this 'iomap', but the VFS shouldn't be
  asking for information about ranges of a file that the filesystem
  already declared that it does not support.

   - Fix a potential infinite loop in FIEMAP by fixing an off by one
     error when comparing the requested range against s_maxbytes"

* tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs: fix an infinite loop in iomap_fiemap
2022-04-01 19:35:56 -07:00
Linus Torvalds
b32e3819a8 Bug fixes for 5.18:
- Fix an incorrect free space calculation in xfs_reserve_blocks that
   could lead to a request for free blocks that will never succeed.
 - Fix a hang in xfs_reserve_blocks caused by an infinite loop and the
   incorrect free space calculation.
 - Fix yet a third problem in xfs_reserve_blocks where multiple racing
   threads can overfill the reserve pool.
 - Fix an accounting error that lead to us reporting reserved space as
   "available".
 - Fix a race condition during abnormal fs shutdown that could cause UAF
   problems when memory reclaim and log shutdown try to clean up inodes.
 - Fix a bug where log shutdown can race with unmount to tear down the
   log, thereby causing UAF errors.
 - Disentangle log and filesystem shutdown to reduce confusion.
 - Fix some confusion in xfs_trans_commit such that a race between
   transaction commit and filesystem shutdown can cause unlogged dirty
   inode metadata to be committed, thereby corrupting the filesystem.
 - Remove a performance optimization in the log as it was discovered that
   certain storage hardware handle async log flushes so poorly as to
   cause serious performance regressions.  Recent restructuring of other
   parts of the logging code mean that no performance benefit is seen on
   hardware that handle it well.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmJEjg4ACgkQ+H93GTRK
 tOsUHRAAj5n65L1r80HSuayipWrwyuD2paa3cqtV76Y8n6CBck8CcnWjZ1t88NYO
 rvfRWKlkJAxkafc5dOiEQ4lm0AL2pHuAWeMrVu/EHvwzj9D+F/GXPrgWCJ1spsN/
 Osd8LgMrxrOaaHgPhENKGa4Ktc5dQoRfDD1IvbAyPEt2puLjoRm00STqUFMYuejR
 96yzreL8kLQdnErxKlQzo4ShsdckyBqA62AAQBIVr3B93+plefTXrWlp2HrblP11
 Upd/sdrIdVp/n104fAfaMT5pSamn3NGyV+8FaUjruBv/alC7pWrWrah6KuBl9omy
 ql8wvtO5KTQdESLx2wjYpC5odi9hQfJYDLCMN3B6gxXg26mcZVvOCfSNtrayPkYj
 ShfkT4mn+TFPFqG/NOgg8ebPp94fzXctKZ+ExVg1dGVAR9oz8QlUUmsqIdbq/4tx
 hrkGOKTa/oGBVoakHgGfbfY5Zz4yX5hVjGWN+l54YRKWHZwYDatRT/O4GkQEZqlU
 dgXsZFT0IZpz4WuTCan+VPJ85I+SKuoYsjh0n4rlGgfCcVK81uvtRB+Jn9rO0wHW
 Uzv1S6HrzblvBnUZGVt49z3co+APYwIRKY8mb+YHWmVNQqmZqyUj7KyFvxuTTkOm
 g0b8oK0/3hpC70v91aTFQmpA6R6cQAS2D4JsM4nXxjyluRWAphI=
 =6Vg8
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "This fixes multiple problems in the reserve pool sizing functions: an
  incorrect free space calculation, a pointless infinite loop, and even
  more braindamage that could result in the pool being overfilled. The
  pile of patches from Dave fix myriad races and UAF bugs in the log
  recovery code that much to our mutual surprise nobody's tripped over.
  Dave also fixed a performance optimization that had turned into a
  regression.

  Dave Chinner is taking over as XFS maintainer starting Sunday and
  lasting until 5.19-rc1 is tagged so that I can focus on starting a
  massive design review for the (feature complete after five years)
  online repair feature. From then on, he and I will be moving XFS to a
  co-maintainership model by trading duties every other release.

  NOTE: I hope very strongly that the other pieces of the (X)FS
  ecosystem (fstests and xfsprogs) will make similar changes to spread
  their maintenance load.

  Summary:

   - Fix an incorrect free space calculation in xfs_reserve_blocks that
     could lead to a request for free blocks that will never succeed.

   - Fix a hang in xfs_reserve_blocks caused by an infinite loop and the
     incorrect free space calculation.

   - Fix yet a third problem in xfs_reserve_blocks where multiple racing
     threads can overfill the reserve pool.

   - Fix an accounting error that lead to us reporting reserved space as
     "available".

   - Fix a race condition during abnormal fs shutdown that could cause
     UAF problems when memory reclaim and log shutdown try to clean up
     inodes.

   - Fix a bug where log shutdown can race with unmount to tear down the
     log, thereby causing UAF errors.

   - Disentangle log and filesystem shutdown to reduce confusion.

   - Fix some confusion in xfs_trans_commit such that a race between
     transaction commit and filesystem shutdown can cause unlogged dirty
     inode metadata to be committed, thereby corrupting the filesystem.

   - Remove a performance optimization in the log as it was discovered
     that certain storage hardware handle async log flushes so poorly as
     to cause serious performance regressions. Recent restructuring of
     other parts of the logging code mean that no performance benefit is
     seen on hardware that handle it well"

* tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: drop async cache flushes from CIL commits.
  xfs: shutdown during log recovery needs to mark the log shutdown
  xfs: xfs_trans_commit() path must check for log shutdown
  xfs: xfs_do_force_shutdown needs to block racing shutdowns
  xfs: log shutdown triggers should only shut down the log
  xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
  xfs: shutdown in intent recovery has non-intent items in the AIL
  xfs: aborting inodes on shutdown may need buffer lock
  xfs: don't report reserved bnobt space as available
  xfs: fix overfilling of reserve pool
  xfs: always succeed at setting the reserve pool size
  xfs: remove infinite loop when reserving free block pool
  xfs: don't include bnobt blocks when reserving free block pool
  xfs: document the XFS_ALLOC_AGFL_RESERVE constant
2022-04-01 19:30:44 -07:00
Linus Torvalds
3b1509f275 for-5.18/io_uring-2022-04-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJHUngQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpilREACSEJUap2IutYkj6S9EPkP0CMvOpUD66224
 somEuE/5da8m2CWANfeCngZG/Vx5O+6KNHhgJxzrzjEhSYQvfdE8IetGHa6fMWfe
 /2pYA4Yj/kuojKdfdzOQ3RRCouMR3+JoNv2+e01vt57xbEh3cHqOdE4YLW+g8vkW
 zy8k2V/xwnObAA8+Snh47t5X3biG417OBOtq2HQH5hQURWV9xrfBjT7u4cbkpSDr
 NBuqWdwJefisQWxGM+iMYdTWgTRuhm5wi/ISFmOQIwkelzecfKy3KtoP3kMoeyaP
 1P+L89Uqt+akfIl/fK0qvedico9rF0t/ptnisJR1qAvEo2cvPoOI/HUKjjS1I//z
 kOb34xJ9bPgIsGRV5OZb7SrC/rz5dvE8z3H4c8HlSeKMRSP7ZHpghCIeom2/fVp/
 85mxw0z8bmPRTDZs+X+/1ZjvolHg2TxrYU66HNJ5lcomfqHvADk38/nIIE3nXxx4
 7R03Ea/0LW9N7v1350IkpIbinwr1pVEINZSoqkdzEdv2te5zVvKtsunQGjrtZ4ir
 00ZdDpw4lexUITI9XMHEPeBmq70fCdw196dE9iVKpwh6aFh34/VNBvRSIIdDj6jY
 YbGgubnmaWjSe4/KkWMg1+durbfi7XAkQq0y4ZQ3czhuQxs1eNz0Zk5sInpFvOmZ
 KLM5G5W02Q==
 =jogi
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A little bit all over the map, some regression fixes for this merge
  window, and some general fixes that are stable bound. In detail:

   - Fix an SQPOLL memory ordering issue (Almog)

   - Accept fixes (Dylan)

   - Poll fixes (me)

   - Fixes for provided buffers and recycling (me)

   - Tweak to IORING_OP_MSG_RING command added in this merge window (me)

   - Memory leak fix (Pavel)

   - Misc fixes and tweaks (Pavel, me)"

* tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block:
  io_uring: defer msg-ring file validity check until command issue
  io_uring: fail links if msg-ring doesn't succeeed
  io_uring: fix memory leak of uid in files registration
  io_uring: fix put_kbuf without proper locking
  io_uring: fix invalid flags for io_put_kbuf()
  io_uring: improve req fields comments
  io_uring: enable EPOLLEXCLUSIVE for accept poll
  io_uring: improve task work cache utilization
  io_uring: fix async accept on O_NONBLOCK sockets
  io_uring: remove IORING_CQE_F_MSG
  io_uring: add flag for disabling provided buffer recycling
  io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly
  io_uring: don't recycle provided buffer if punted to async worker
  io_uring: fix assuming triggered poll waitqueue is the single poll
  io_uring: bump poll refs to full 31-bits
  io_uring: remove poll entry from list when canceling all
  io_uring: fix memory ordering when SQPOLL thread goes to sleep
  io_uring: ensure that fsnotify is always called
  io_uring: recycle provided before arming poll
2022-04-01 16:10:51 -07:00
Linus Torvalds
7a3ecddc57 six ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJGd6AACgkQiiy9cAdy
 T1EPMAwAhVkPm9ugSwXKWPWiRnIuEiIi9oZm8pLLnpsXW4/FJFhJv8h+8cbOUj5S
 P/aApIVX7NRMZg6xIGjrWXIE6eqCQ0iwO5V/50x5mGLUbZNMOS/tfcXEoE1VVZnf
 XQlaY53oVMVvaZWqx7tw9SPsMapjCIngAZ9pJOBJdUbmi+DcWKJuOFrYs7iw6oM8
 tz4yBKh9N8qR8Ubq0f6guMl1VcMqGd+fYC/n/eHKY4hXhQSMOAGCYy4dqpP4aITr
 DJLhMqnTtmB2bU0q17ktsk2bBo6/ENvrIyG+ozLL7LGj6BHvbeqdb1HDQgRcdFyp
 JuvUZVeENLIa2AX9aFnXyjB4AM6ccFl036f1HgbnIfspH7s+EucFEsHmiFMCoMkY
 f4CMdA5NGBXZhWu9/7CduwbvaBkVFD57ucjXRLReJKAsJVN9sk5Q0/n8uNttQ+Ao
 iMfOkeI4cdp6gXXITS8H9H1MUkqE/bac9cS3A/8pSuUjAjImF5D9ZuWsuYkInmg9
 M+ms9G4U
 =j4Oh
 -----END PGP SIGNATURE-----

Merge tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd updates from Steve French:

 - three cleanup fixes

 - shorten module load warning

 - two documentation fixes

* tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: replace usage of found with dedicated list iterator variable
  ksmbd: Remove a redundant zeroing of memory
  MAINTAINERS: ksmbd: switch Sergey to reviewer
  ksmbd: shorten experimental warning on loading the module
  ksmbd: use netif_is_bridge_port
  Documentation: ksmbd: update Feature Status table
2022-04-01 14:39:28 -07:00
Linus Torvalds
9a005bea4f 14 fixes to cifs client and to smbfs_common code
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmJGhDkACgkQiiy9cAdy
 T1EquQv/V05eD1EWZzW+Y5Q+cbYBPn8T3r6YqSw6hIvbgdF6W6U45UPyJ4ASHKvl
 +MvTPSJzEzSWKYfcDryUBsa7aAXaekPpxW6uZk7jMRuVIfkannTV9E+rZItwC/dS
 g8kDDjvcWrwN9iQUyVNX1JCybpq5YnwEIA5z0C8rpuCjDelNfK5DCaf02PweuRlY
 3pDlj8Jy4sY8mBvqzFiWheY6Xc3pbvheDIvHEieaZpAyPwF7r1hmwvMDkzbJfPjV
 Qrwcrwq2FahK4E98gJQZ5U0CeXvNPEHPcc8c4bAkRpnaa/v2oVSCW4FGjhA1Stp2
 0APC+AsjkY95DJ0GHerGfH5G0z6FAbRJjyXtt1NTkyKavEQZOqoQvi5yM/iXUEoA
 z+1bgN7s02IMLU15gLDilK6QObWtUwvNxuS19MQ80yFnqmjNNpSmRTfpwzDJQ6Lj
 B6Yml8tIvVPLtmuwehhljffMUv9lrdElDDjT50yTn/CTkQYUMBejitMGu8G4YwZI
 luAN1msJ
 =bNGL
 -----END PGP SIGNATURE-----

Merge tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

 - three fixes for big endian issues in how Persistent and Volatile file
   ids were stored

 - Various misc. fixes: including some for oops, 2 for ioctls, 1 for
   writeback

 - cleanup of how tcon (tree connection) status is tracked

 - Four changesets to move various duplicated protocol definitions
   (defined both in cifs.ko and ksmbd) into smbfs_common/smb2pdu.h

 - important performance improvement to use cached handles in some key
   compounding code paths (reduces numbers of opens/closes sent in some
   workloads)

 - fix to allow alternate DFS target to be used to retry on a failed i/o

* tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
  cifs: prevent bad output lengths in smb2_ioctl_query_info()
  smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common
  smb3: cleanup and clarify status of tree connections
  smb3: move defines for query info and query fsinfo to smbfs_common
  smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common
  [smb3] move more common protocol header definitions to smbfs_common
  cifs: fix incorrect use of list iterator after the loop
  ksmbd: store fids as opaque u64 integers
  cifs: fix bad fids sent over wire
  cifs: change smb2_query_info_compound to use a cached fid, if available
  cifs: convert the path to utf16 in smb2_query_info_compound
  cifs: writeback fix
  cifs: do not skip link targets when an I/O fails
2022-04-01 14:31:57 -07:00
Linus Torvalds
ec251f3e18 Description for this pull request:
- Add keep_last_dots mount option to allow access to paths with trailing dots.
 - Avoid repetitive volume dirty bit set/clear to improve storage life time.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmJGW7IWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCDp3D/49MeePwrzH8Aon+0MMWH2C23aZ
 UWkHQhedSmnscyb3xbaqT6UJHk3CbIdgMqJpJd1zwRs7U9o37zPQy8ma1DcOPWA9
 Bn1NUTdj/RWye+2t/6YT44xODsG5sRrzKeG2AcenivVxCW1+Xl9YBGk9J8XXIxm0
 IQTsswGuLfDdtGoHWXiSRPTatTxPsJospZER94nhGEiPxyOgVuLHUWxWxAUMXY8w
 RL9B/t7AlRFSk8HXW5aL5tNsd6OBeLybBN51lToPGPRq1qd9Aarlp0mlwMnaie2z
 1AoTXzExd1IBglfcKhppTIUqAioeIdTOkfSudaLPAoTc6PDjJAxEOQJx2NfBNimC
 JM209+MtJh5nTiNmpfGxT64vHkR70Kf1+Vdasv/iFfHQlhIbGJ+necIPHXUgA20Y
 0iZ9bksiYGNoHFc5wKQ1wBmMxsKNLVz0GYJJez/pw+JfwCGztiSlxQWDkFxj5qNW
 2L/9YRJLVpI4vmNTAFM3sHaDaYkXWfAtgWhlS0c95SVVUsvR4vu7UYTOKYlsc1mC
 T0ABV8TJIXQwlD6HzNGpU0OKD6/ddA81GrIdkoRuzT9KGJDGA1oz8ehHO+Hs796K
 ivRkARLIVwoQZAoZJR17tYnMd4GnSC/qr70HqCBFnFeNTOkl3fpzjEzxoIXL+A9t
 KbPtJlbxQao7DKLo6g==
 =PoqS
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Add keep_last_dots mount option to allow access to paths with
   trailing dots

 - Avoid repetitive volume dirty bit set/clear to improve storage life
   time

* tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: do not clear VolumeDirty in writeback
  exfat: allow access to paths with trailing dots
2022-04-01 14:20:24 -07:00
Linus Torvalds
cda4351252 Filesystem/VFS changes for 5.18, part two
- Remove ->readpages infrastructure
  - Remove AOP_FLAG_CONT_EXPAND
  - Move read_descriptor_t to networking code
  - Pass the iocb to generic_perform_write
  - Minor updates to iomap, btrfs, ext4, f2fs, ntfs
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmJHSY8ACgkQDpNsjXcp
 gj59lgf/UJsVQjF+emdQAHa9AkFtZAb7TNv5QKLHp935c/OXREvHaQ956FyVhrc1
 n3pH3VRLFjXFQ3QZpWtArMQbIPr77I9KNs75zX0i+mutP5ieYcQVJKsGPIamiJAf
 eNTBoVaTxCVcTL43xCvnflvAeumoKzwdxGj6Hkgln8wuQ9B9p8923nBZpy94ajqp
 6b6E1rtrJlpEioqar2vCNpdJhEeN/jN93BwIynQMt1snPrBWQRYqv5pL3puUh7Gx
 UgJkCC6XvsUsXXOCu7n22RUKnDGiUW7QN99fmrztwnmiQY4hYmK2AoVMG16riDb+
 WmxIXbhaTo5qJT0rlQipi5d61TSuTA==
 =gwgb
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache

Pull more filesystem folio updates from Matthew Wilcox:
 "A mixture of odd changes that didn't quite make it into the original
  pull and fixes for things that did. Also the readpages changes had to
  wait for the NFS tree to be pulled first.

   - Remove ->readpages infrastructure

   - Remove AOP_FLAG_CONT_EXPAND

   - Move read_descriptor_t to networking code

   - Pass the iocb to generic_perform_write

   - Minor updates to iomap, btrfs, ext4, f2fs, ntfs"

* tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache:
  btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()
  ntfs: Correct mark_ntfs_record_dirty() folio conversion
  f2fs: Get the superblock from the mapping instead of the page
  f2fs: Correct f2fs_dirty_data_folio() conversion
  ext4: Correct ext4_journalled_dirty_folio() conversion
  filemap: Remove AOP_FLAG_CONT_EXPAND
  fs: Pass an iocb to generic_perform_write()
  fs, net: Move read_descriptor_t to net.h
  fs: Remove read_actor_t
  iomap: Simplify is_partially_uptodate a little
  readahead: Update comments
  mm: remove the skip_page argument to read_pages
  mm: remove the pages argument to read_pages
  fs: Remove ->readpages address space operation
  readahead: Remove read_cache_pages()
2022-04-01 13:50:50 -07:00
Ryusuke Konishi
cdd81b313d nilfs2: get rid of nilfs_mapping_init()
After applying the lockdep warning fixes, nilfs_mapping_init() is no
longer used, so delete it.

Link: https://lkml.kernel.org/r/1647867427-30498-4-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hao Sun <sunhao.th@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01 11:46:09 -07:00
Ryusuke Konishi
6e211930f7 nilfs2: fix lockdep warnings during disk space reclamation
During disk space reclamation, nilfs2 still emits the following lockdep
warning due to page/folio operations on shadowed page caches that nilfs2
uses to get a snapshot of DAT file in memory:

  WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670
  ...
  RIP: 0010:__folio_mark_dirty+0x645/0x670
  ...
  Call Trace:
    filemap_dirty_folio+0x74/0xd0
    __set_page_dirty_nobuffers+0x85/0xb0
    nilfs_copy_dirty_pages+0x288/0x510 [nilfs2]
    nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2]
    nilfs_clean_segments+0xee/0x5d0 [nilfs2]
    nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2]
    nilfs_ioctl+0xc52/0xfb0 [nilfs2]
    __x64_sys_ioctl+0x11d/0x170

This fixes the remaining warning by using inode objects to hold those
page caches.

Link: https://lkml.kernel.org/r/1647867427-30498-3-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01 11:46:09 -07:00
Ryusuke Konishi
e897be17a4 nilfs2: fix lockdep warnings in page operations for btree nodes
Patch series "nilfs2 lockdep warning fixes".

The first two are to resolve the lockdep warning issue, and the last one
is the accompanying cleanup and low priority.

Based on your comment, this series solves the issue by separating inode
object as needed.  Since I was worried about the impact of the object
composition changes, I tested the series carefully not to cause
regressions especially for delicate functions such like disk space
reclamation and snapshots.

This patch (of 3):

If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at
inode_to_wb() during page/folio operations for btree nodes:

  WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline]
  WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline]
  WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
  Modules linked in:
  ...
  RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline]
  RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline]
  RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
  ...
  Call Trace:
    __set_page_dirty include/linux/pagemap.h:834 [inline]
    mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145
    nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline]
    nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085
    nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
    nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625
    nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009
    nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048
    nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline]
    nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline]
    nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036
    nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372
    nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline]
    nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563
    kthread+0x405/0x4f0 kernel/kthread.c:327
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

This is because nilfs2 uses two page caches for each inode and
inode->i_mapping never points to one of them, the btree node cache.

This causes inode_to_wb(inode) to refer to a different page cache than
the caller page/folio operations such like __folio_start_writeback(),
__folio_end_writeback(), or __folio_mark_dirty() acquired the lock.

This patch resolves the issue by allocating and using an additional
inode to hold the page cache of btree nodes.  The inode is attached
one-to-one to the traditional nilfs2 inode if it requires a block
mapping with b-tree.  This setup change is in memory only and does not
affect the disk format.

Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com
Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org
Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com
Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reported-by: David Hildenbrand <david@redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01 11:46:09 -07:00
Joseph Qi
de19433423 ocfs2: fix crash when mount with quota enabled
There is a reported crash when mounting ocfs2 with quota enabled.

  RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2]
  Call Trace:
    ocfs2_local_read_info+0xb9/0x6f0 [ocfs2]
    dquot_load_quota_sb+0x216/0x470
    dquot_load_quota_inode+0x85/0x100
    ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2]
    ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2]
    mount_bdev+0x185/0x1b0
    legacy_get_tree+0x27/0x40
    vfs_get_tree+0x25/0xb0
    path_mount+0x465/0xac0
    __x64_sys_mount+0x103/0x140

It is caused by when initializing dqi_gqlock, the corresponding dqi_type
and dqi_sb are not properly initialized.

This issue is introduced by commit 6c85c2c728, which wants to avoid
accessing uninitialized variables in error cases.  So make global quota
info properly initialized.

Link: https://lkml.kernel.org/r/20220323023644.40084-1-joseph.qi@linux.alibaba.com
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141
Fixes: 6c85c2c728 ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Dayvison <sathlerds@gmail.com>
Tested-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-01 11:46:09 -07:00
Matthew Wilcox (Oracle)
5a60542c61 btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()
While btrfs doesn't use large folios yet, this should have been changed
as part of the conversion from invalidatepage to invalidate_folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
c37731301d ntfs: Correct mark_ntfs_record_dirty() folio conversion
We've already done the work of block_dirty_folio() here, leaving
only the work that needs to be done by filemap_dirty_folio().
This was a misconversion where I misread __set_page_dirty_nobuffers()
as __set_page_dirty_buffers().

Fixes: e621900ad2 ("fs: Convert __set_page_dirty_buffers to block_dirty_folio")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
29c87793eb f2fs: Get the superblock from the mapping instead of the page
It's slightly more efficient to go directly from the mapping to the
superblock than to go from the page.  Now that these routines have
the mapping passed to them, there's no reason not to use it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
0fb5b2ebc0 f2fs: Correct f2fs_dirty_data_folio() conversion
I got the return value wrong.  Very little checks the return value
from set_page_dirty(), so nobody noticed during testing.

Fixes: 4f5e34f713 ("f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
0f25233663 ext4: Correct ext4_journalled_dirty_folio() conversion
This should use the new folio_buffers() instead of page_has_buffers().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
d7414ba14a filemap: Remove AOP_FLAG_CONT_EXPAND
This flag is no longer used, so remove it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
800ba29547 fs: Pass an iocb to generic_perform_write()
We can extract both the file pointer and the pos from the iocb.
This simplifies each caller as well as allowing generic_perform_write()
to see more of the iocb contents in the future.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:44 -04:00
Matthew Wilcox (Oracle)
2756c818e5 iomap: Simplify is_partially_uptodate a little
Remove the unnecessary variable 'len' and fix a comment to refer to
the folio instead of the page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 14:40:43 -04:00
Matthew Wilcox (Oracle)
704528d895 fs: Remove ->readpages address space operation
All filesystems have now been converted to use ->readahead, so
remove the ->readpages operation and fix all the comments that
used to refer to it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
2022-04-01 13:45:33 -04:00
Yuezhang Mo
a4a3d8c52d exfat: do not clear VolumeDirty in writeback
Before this commit, VolumeDirty will be cleared first in
writeback if 'dirsync' or 'sync' is not enabled. If the power
is suddenly cut off after cleaning VolumeDirty but other
updates are not written, the exFAT filesystem will not be able
to detect the power failure in the next mount.

And VolumeDirty will be set again but not cleared when updating
the parent directory. It means that BootSector will be written at
least once in each write-back, which will shorten the life of the
device.

Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-04-01 10:51:03 +09:00
Vasant Karasulli
9ec784bf77 exfat: allow access to paths with trailing dots
The Linux kernel exfat driver currently unconditionally strips
 trailing periods '.' from path components. This isdone intentionally,
 loosely following Windows behaviour and specifications
 which state:

  #exFAT
  The concatenated file name has the same set of illegal characters as
  other FAT-based file systems (see Table 31).

  #FAT
  ...
  Leading and trailing spaces in a long name are ignored.
  Leading and embedded periods are allowed in a name and are stored in
  the long name. Trailing periods are ignored.

Note: Leading and trailing space ' ' characters are currently retained
by Linux kernel exfat, in conflict with the above specification.
On Windows 10, trailing and leading space ' ' characters are stripped
from the filenames.
Some implementations, such as fuse-exfat, don't perform path trailer
removal. When mounting images which contain trailing-dot paths, these
paths are unreachable, e.g.:

  + mount.exfat-fuse /dev/zram0 /mnt/test/
  FUSE exfat 1.3.0
  + cd /mnt/test/
  + touch fuse_created_dots... '  fuse_created_spaces  '
  + ls -l
  total 0
  -rwxrwxrwx 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -rwxrwxrwx 1 root 0 0 Aug 18 09:45  fuse_created_dots...
  + cd /
  + umount /mnt/test/
  + mount -t exfat /dev/zram0 /mnt/test
  + cd /mnt/test
  + ls -l
  ls: cannot access 'fuse_created_dots...': No such file or directory
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -????????? ? ?    ? ?            ?  fuse_created_dots...
  + touch kexfat_created_dots... '  kexfat_created_spaces  '
  + ls -l
  ls: cannot access 'fuse_created_dots...': No such file or directory
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  fuse_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45 '  kexfat_created_spaces  '
  -????????? ? ?    ? ?            ?  fuse_created_dots...
  -rwxr-xr-x 1 root 0 0 Aug 18 09:45  kexfat_created_dots
  + cd /
  + umount /mnt/test/

This commit adds "keep_last_dots" mount option that controls whether or
not trailing periods '.' are stripped
from path components during file lookup or file creation.
This mount option can be used to access
paths with trailing periods and disallow creating files with names with
trailing periods. E.g. continuing from the previous example:

  + mount -t exfat -o keep_last_dots /dev/zram0 /mnt/test
  + cd /mnt/test
  + ls -l
  total 0
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32 '  fuse_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32 '  kexfat_created_spaces  '
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32  fuse_created_dots...
  -rwxr-xr-x 1 root 0 0 Aug 18 10:32  kexfat_created_dots

  + echo > kexfat_created_dots_again...
  sh: kexfat_created_dots_again...: Invalid argument

Link: https://bugzilla.suse.com/show_bug.cgi?id=1188964
Link: https://lore.kernel.org/linux-fsdevel/003b01d755e4$31fb0d80$95f12880$
@samsung.com/
Link: https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification
Suggested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Vasant Karasulli <vkarasulli@suse.de>
Co-developed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-04-01 10:51:02 +09:00
Linus Torvalds
a87a08e3bf This pull request contains fixes for JFFS2, UBI and UBIFS
JFFS2:
         - Fixes for various memory issues
 
 UBI:
         - Fix for a race condition in cdev ioctl handler
 
 UBIFS:
 	- Fixes for O_TMPFILE and whiteout handling
 	- Fixes for various memory issues
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmJGFFEWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wcV9D/9DCaOP5Zso6Vi2QlAM3pi8ScgG
 7PXCK/l+iEMqWKv+5l4fOZuGDF9w8mf7MyBnzV6auRKEJqbOW6d3mDs2bPNORuPN
 LKLSOiumqxwrSqmpEFjS9Q3MsVMOWtx0aN57R3d8oCen44a5gGhYyP8iQSCLc8nQ
 pJszRpWnBhoWqxnc0BxDHYOYQhcQ/ckWc5mn0cEd0oN1NYmEbrAo0SpwVOjX9r1B
 TnTwDUzcAVK/M/hEsvHbEOChgibD+cTKfkORz6BkGB/VR5AlduF73UYUt/np4WH4
 6ZTe6RXKp6gqad8pJq6T/aev2m0aM5OHxwd6JTC8qK5KSa0iXnoeungsKy0ouhYv
 8V4yPRWKP9q7WEPJsiTM9KKQRiljXM+C/GOsFLKmme7zw66uu2qPdc2k99XRJzYj
 kO42AuXZ0Wx5Xueb+ZNmbPG3brNJ5ku3EXrzy/iAF6zS+uvmv1ZG5ZraLQ8RNqRT
 f8vO3NbtVnpbB6tvpPHYpjlcXkAGjumOIy9cxpl/vR5D87lRrUVSZiGu3xcW8tYm
 FOohVnLZFk3vgt9odU+TJ6qBUzWW5LVQx90Rr4o0sR7HDG8pFvX3cl3Z98bqCWGw
 wHSqy3tmUYOUaiIHvf94qJmjfY9vNlUyrnjZcoK+hFCqJz0IbrXQlrmq7o7OOvyh
 2mnOksk0XcFd41gsuw==
 =ZU5j
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
 "JFFS2:
   - Fixes for various memory issues

  UBI:
   - Fix for a race condition in cdev ioctl handler

  UBIFS:
   - Fixes for O_TMPFILE and whiteout handling

   - Fixes for various memory issues"

* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: rename_whiteout: correct old_dir size computing
  jffs2: fix memory leak in jffs2_scan_medium
  jffs2: fix memory leak in jffs2_do_mount_fs
  jffs2: fix use-after-free in jffs2_clear_xattr_subsystem
  fs/jffs2: fix comments mentioning i_mutex
  ubi: fastmap: Return error code if memory allocation fails in add_aeb()
  ubifs: Fix to add refcount once page is set private
  ubifs: Fix read out-of-bounds in ubifs_wbuf_write_nolock()
  ubifs: setflags: Make dirtied_ino_d 8 bytes aligned
  ubifs: Rectify space amount budget for mkdir/tmpfile operations
  ubifs: Fix 'ui->dirty' race between do_tmpfile() and writeback work
  ubifs: Rename whiteout atomically
  ubifs: Add missing iput if do_tmpfile() failed in rename whiteout
  ubifs: Fix wrong number of inodes locked by ui_mutex in ubifs_inode comment
  ubifs: Fix deadlock in concurrent rename whiteout and inode writeback
  ubifs: rename_whiteout: Fix double free for whiteout_ui->data
  ubi: Fix race condition between ctrl_cdev_ioctl and ubi_cdev_ioctl
2022-03-31 16:09:41 -07:00
Linus Torvalds
3d198e42ce gfs2 fixes
* To avoid deadlocks, actively cancel dlm locking requests when we give
   up on them.  Further dlm operations on the same lock will return
   -EBUSY until the cancel has been completed, so in that case, wait and
   repeat.  (This is rare.)
 * Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode().
 * Some more fallout from the gfs2 mmap + page fault deadlock fixes
   (merge c03098d4b9).
 * Various other minor bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJGCAsUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrWcg//TEDazop2y7rGMFsMBXI7HPyBu4uD
 BwoclS5IfjoQbBTtkl7cWmQViMk8s3EFGxdEBorfGmMEq65I/krHi4JXG2GETdui
 ORoi8NH1sW9H2GJXmwtE2wYZlJBZtdntoBGdPXWFvt1hLajf6WGpy/CR1Wd4rYak
 8AHQxtd98OtsA6LAPlWl2UaXS4m7rhEt0Iy83mqWtbBOvZsULczuraazawnoQ/m4
 Wf5pvb+73hpwTVUkruH0+If+vi/HF0WVv1nZVyMwrSh3mpvkrsZSkbN0fd0veAhD
 b5XGI1dD5+YPxAOdwDKqnqy8/E3gRekybmpcd48BXoxF4EX/AlLX/Zn9qnrAhY6M
 qEbGzC2UqLIrPe/KjzQ8+0aKPCY5FB1VqoRMAHC/bj7mlmNgGtHxQUXdDmC4LIi6
 GOLpnueI1KtA7Hb4HCgX0BLxSqUEhUuGssBkNIqGet1cRwmM33pt1J4CG4TDLBt/
 VZiERnN3qktSlmukvd3oLSZso4fVbg7PyFTl8YMgiLDNfgcZI9RY5qwIJYrOaucr
 KTNfR6lAL2slFPIVcLwmgJt+axogk6GnCkfDVMX2VLJnMQYqJnDYn6fVG9jngSB+
 F4UBZ/alzhpel08r8xtxjADFJzA+weG1I2jnikSLKlgVN+uiQTBrhqyWdtxtEqFM
 31Nd7piiSVQEvrM=
 =xMYz
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:

 - To avoid deadlocks, actively cancel dlm locking requests when we give
   up on them.

   Further dlm operations on the same lock will return -EBUSY until the
   cancel has been completed, so in that case, wait and repeat. (This is
   rare.)

 - Lock inversion fixes in gfs2_inode_lookup() and gfs2_create_inode().

 - Some more fallout from the gfs2 mmap + page fault deadlock fixes
   (merged in commit c03098d4b9: "Merge tag 'gfs2-v5.15-rc5-mmap-fault'").

 - Various other minor bug fixes and cleanups.

* tag 'gfs2-v5.17-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Make sure FITRIM minlen is rounded up to fs block size
  gfs2: Make sure not to return short direct writes
  gfs2: Remove dead code in gfs2_file_read_iter
  gfs2: Fix gfs2_file_buffered_write endless loop workaround
  gfs2: Minor retry logic cleanup
  gfs2: Disable page faults during lockless buffered reads
  gfs2: Fix should_fault_in_pages() logic
  gfs2: Remove return value for gfs2_indirect_init
  gfs2: Initialize gh_error in gfs2_glock_nq
  gfs2: Make use of list_is_first
  gfs2: Switch lock order of inode and iopen glock
  gfs2: cancel timed-out glock requests
  gfs2: Expect -EBUSY after canceling dlm locking requests
  gfs2: gfs2_setattr_size error path fix
  gfs2: assign rgrp glock before compute_bitstructs
2022-03-31 15:57:50 -07:00
Linus Torvalds
f008b1d6e1 Netfs prep for write helpers
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmI1HOwACgkQ+7dXa6fL
 C2u9mA/+LUdXHqlvET/PAtFTg75bUPeOFGLnuDnYl1Ng2FCKMSodAohpbVtENxsK
 E/gTVS7uiVZFQgC+YmNA00z6eIQkAaDVyvKyEcUbKREBbUgONfJ/HLeaK/NvVKxx
 TY5gx/POdG6yHRQXL6JGBqSJUB8bZrGKwnJm8ebzeKOji9n7GSJBYiMlYBA7EAhs
 Aut/P7Y39ISHLw3y+y5czBeRoubljmTyznbP20xUZEzrRwhTpNwpJVzBGUZU635T
 93Sqcp//0U5LIdn6Pg6DUGHBMBTNDNJChb21ZoBusF/HHswXsOOnf/mcRUBSJUTI
 M1WSpNLk8PRBgajMdIymQpGU1sCZZzJ3krrSA3RcXdN6GPHwZg8kKjoroHsLDL6l
 igPbDSMJ5wfiwA2A2gXbY1CkAl3ik5ccb7ZqhTwS0WBk0vOnHmAsE9cs/bBo7Xii
 GTiWXEFOgtJiXANPMS2P9DiOS3ZQNf+wxotCYdkGPOXuX9wnIo1Kmy8XfujQ1bXf
 pJsEZKfeyROKrzyKWgqLI64/Kg5xNueoFQZfDpOlZYzF1uDstynADPUt0eQD706q
 jcuKaXLN3rn5gSPun5mWOYbRtXVgOLdFL/7zptMVJwFKBFguQENhjG4UMNZcjkVA
 3Mr0kGocsgoCSk1oDBkFlrw1wIsXxWbkRBL1Pww6kovivuGUwoo=
 =j0yx
 -----END PGP SIGNATURE-----

Merge tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull netfs updates from David Howells:
 "Netfs prep for write helpers.

  Having had a go at implementing write helpers and content encryption
  support in netfslib, it seems that the netfs_read_{,sub}request
  structs and the equivalent write request structs were almost the same
  and so should be merged, thereby requiring only one set of
  alloc/get/put functions and a common set of tracepoints.

  Merging the structs also has the advantage that if a bounce buffer is
  added to the request struct, a read operation can be performed to fill
  the bounce buffer, the contents of the buffer can be modified and then
  a write operation can be performed on it to send the data wherever it
  needs to go using the same request structure all the way through. The
  I/O handlers would then transparently perform any required crypto.
  This should make it easier to perform RMW cycles if needed.

  The potentially common functions and structs, however, by their names
  all proclaim themselves to be associated with the read side of things.

  The bulk of these changes alter this in the following ways:

   - Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request.

   - Rename some enums, members and flags to make them more appropriate.

   - Adjust some comments to match.

   - Drop "read"/"rreq" from the names of common functions. For
     instance, netfs_get_read_request() becomes netfs_get_request().

   - The ->init_rreq() and ->issue_op() methods become ->init_request()
     and ->issue_read(). I've kept the latter as a read-specific
     function and in another branch added an ->issue_write() method.

  The driver source is then reorganised into a number of files:

        fs/netfs/buffered_read.c        Create read reqs to the pagecache
        fs/netfs/io.c                   Dispatchers for read and write reqs
        fs/netfs/main.c                 Some general miscellaneous bits
        fs/netfs/objects.c              Alloc, get and put functions
        fs/netfs/stats.c                Optional procfs statistics.

  and future development can be fitted into this scheme, e.g.:

        fs/netfs/buffered_write.c       Modify the pagecache
        fs/netfs/buffered_flush.c       Writeback from the pagecache
        fs/netfs/direct_read.c          DIO read support
        fs/netfs/direct_write.c         DIO write support
        fs/netfs/unbuffered_write.c     Write modifications directly back

  Beyond the above changes, there are also some changes that affect how
  things work:

   - Make fscache_end_operation() generally available.

   - In the netfs tracing header, generate enums from the symbol ->
     string mapping tables rather than manually coding them.

   - Add a struct for filesystems that uses netfslib to put into their
     inode wrapper structs to hold extra state that netfslib is
     interested in, such as the fscache cookie. This allows netfslib
     functions to be set in filesystem operation tables and jumped to
     directly without having to have a filesystem wrapper.

   - Add a member to the struct added above to track the remote inode
     length as that may differ if local modifications are buffered. We
     may need to supply an appropriate EOF pointer when storing data (in
     AFS for example).

   - Pass extra information to netfs_alloc_request() so that the
     ->init_request() hook can access it and retain information to
     indicate the origin of the operation.

   - Make the ->init_request() hook return an error, thereby allowing a
     filesystem that isn't allowed to cache an inode (ceph or cifs, for
     example) to skip readahead.

   - Switch to using refcount_t for subrequests and add tracepoints to
     log refcount changes for the request and subrequest structs.

   - Add a function to consolidate dispatching a read request. Similar
     code is used in three places and another couple are likely to be
     added in the future"

Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/

* tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Maintain netfs_i_context::remote_i_size
  netfs: Keep track of the actual remote file size
  netfs: Split some core bits out into their own file
  netfs: Split fs/netfs/read_helper.c
  netfs: Rename read_helper.c to io.c
  netfs: Prepare to split read_helper.c
  netfs: Add a function to consolidate beginning a read
  netfs: Add a netfs inode context
  ceph: Make ceph_init_request() check caps on readahead
  netfs: Change ->init_request() to return an error code
  netfs: Refactor arguments for netfs_alloc_read_request
  netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines
  netfs: Trace refcounting on the netfs_io_subrequest struct
  netfs: Trace refcounting on the netfs_io_request struct
  netfs: Adjust the netfs_rreq tracepoint slightly
  netfs: Split netfs_io_* object handling out
  netfs: Finish off rename of netfs_read_request to netfs_io_request
  netfs: Rename netfs_read_*request to netfs_io_*request
  netfs: Generate enums from trace symbol mapping lists
  fscache: export fscache_end_operation()
2022-03-31 15:49:36 -07:00
Linus Torvalds
b8321ed4a4 Kbuild updates for v5.18
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
    additional flags to be passed to user-space programs.
 
  - Fix missing fflush() bugs in Kconfig and fixdep
 
  - Fix a minor bug in the comment format of the .config file
 
  - Make kallsyms ignore llvm's local labels, .L*
 
  - Fix UAPI compile-test for cross-compiling with Clang
 
  - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
    particular version of LLVm, and LLVM=<prefix> form for using custom
    LLVM in a particular directory path.
 
  - Clean up Makefiles
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs
 IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/
 p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj
 ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa
 rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z
 ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF
 nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW
 /YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX
 oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2
 p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD
 ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1
 +LJMEpdZO0dFvpF+
 =84rW
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
   additional flags to be passed to user-space programs.

 - Fix missing fflush() bugs in Kconfig and fixdep

 - Fix a minor bug in the comment format of the .config file

 - Make kallsyms ignore llvm's local labels, .L*

 - Fix UAPI compile-test for cross-compiling with Clang

 - Extend the LLVM= syntax to support LLVM=<suffix> form for using a
   particular version of LLVm, and LLVM=<prefix> form for using custom
   LLVM in a particular directory path.

 - Clean up Makefiles

* tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: Make $(LLVM) more flexible
  kbuild: add --target to correctly cross-compile UAPI headers with Clang
  fixdep: use fflush() and ferror() to ensure successful write to files
  arch: syscalls: simplify uapi/kapi directory creation
  usr/include: replace extra-y with always-y
  certs: simplify empty certs creation in certs/Makefile
  certs: include certs/signing_key.x509 unconditionally
  kallsyms: ignore all local labels prefixed by '.L'
  kconfig: fix missing '# end of' for empty menu
  kconfig: add fflush() before ferror() check
  kbuild: replace $(if A,A,B) with $(or A,B)
  kbuild: Add environment variables for userprogs flags
  kbuild: unify cmd_copy and cmd_shipped
2022-03-31 11:59:03 -07:00
Andrew Price
27ca8273fd gfs2: Make sure FITRIM minlen is rounded up to fs block size
Per fstrim(8) we must round up the minlen argument to the fs block size.
The current calculation doesn't take into account devices that have a
discard granularity and requested minlen less than 1 fs block, so the
value can get shifted away to zero in the translation to fs blocks.

The zero minlen passed to gfs2_rgrp_send_discards() then allows
sb_issue_discard() to be called with nr_sects == 0 which returns -EINVAL
and results in gfs2_rgrp_send_discards() returning -EIO.

Make sure minlen is never < 1 fs block by taking the max of the
requested minlen and the fs block size before comparing to the device's
discard granularity and shifting to fs blocks.

Fixes: 076f0faa76 ("GFS2: Fix FITRIM argument handling")
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-03-31 20:35:38 +02:00
Trond Myklebust
999397926a nfsd: Clean up nfsd_file_put()
Make it a little less racy, by removing the refcount_read() test. Then
remove the redundant 'is_hashed' variable.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-31 10:39:59 -04:00
Paulo Alcantara
d6f5e35845 cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
When calling smb2_ioctl_query_info() with invalid
smb_query_info::flags, a NULL ptr dereference is triggered when trying
to kfree() uninitialised rqst[n].rq_iov array.

This also fixes leaked paths that are created in SMB2_open_init()
which required SMB2_open_free() to properly free them.

Here is a small C reproducer that triggers it

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/ioctl.h>

	#define die(s) perror(s), exit(1)
	#define QUERY_INFO 0xc018cf07

	int main(int argc, char *argv[])
	{
		int fd;

		if (argc < 2)
			exit(1);
		fd = open(argv[1], O_RDONLY);
		if (fd == -1)
			die("open");
		if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1)
			die("ioctl");
		close(fd);
		return 0;
	}

	mount.cifs //srv/share /mnt -o ...
	gcc repro.c && ./a.out /mnt/f0

	[ 1832.124468] CIFS: VFS: \\w22-dc.zelda.test\test Invalid passthru query flags: 0x4
	[ 1832.125043] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
	[ 1832.125764] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	[ 1832.126241] CPU: 3 PID: 1133 Comm: a.out Not tainted 5.17.0-rc8 #2
	[ 1832.126630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
	[ 1832.127322] RIP: 0010:smb2_ioctl_query_info+0x7a3/0xe30 [cifs]
	[ 1832.127749] Code: 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 6c 05 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 74 24 28 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 cb 04 00 00 49 8b 3e e8 bb fc fa ff 48 89 da 48
	[ 1832.128911] RSP: 0018:ffffc90000957b08 EFLAGS: 00010256
	[ 1832.129243] RAX: dffffc0000000000 RBX: ffff888117e9b850 RCX: ffffffffa020580d
	[ 1832.129691] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a2c0
	[ 1832.130137] RBP: ffff888117e9b878 R08: 0000000000000001 R09: 0000000000000003
	[ 1832.130585] R10: fffffbfff4087458 R11: 0000000000000001 R12: ffff888117e9b800
	[ 1832.131037] R13: 00000000ffffffea R14: 0000000000000000 R15: ffff888117e9b8a8
	[ 1832.131485] FS:  00007fcee9900740(0000) GS:ffff888151a00000(0000) knlGS:0000000000000000
	[ 1832.131993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 1832.132354] CR2: 00007fcee9a1ef5e CR3: 0000000114cd2000 CR4: 0000000000350ee0
	[ 1832.132801] Call Trace:
	[ 1832.132962]  <TASK>
	[ 1832.133104]  ? smb2_query_reparse_tag+0x890/0x890 [cifs]
	[ 1832.133489]  ? cifs_mapchar+0x460/0x460 [cifs]
	[ 1832.133822]  ? rcu_read_lock_sched_held+0x3f/0x70
	[ 1832.134125]  ? cifs_strndup_to_utf16+0x15b/0x250 [cifs]
	[ 1832.134502]  ? lock_downgrade+0x6f0/0x6f0
	[ 1832.134760]  ? cifs_convert_path_to_utf16+0x198/0x220 [cifs]
	[ 1832.135170]  ? smb2_check_message+0x1080/0x1080 [cifs]
	[ 1832.135545]  cifs_ioctl+0x1577/0x3320 [cifs]
	[ 1832.135864]  ? lock_downgrade+0x6f0/0x6f0
	[ 1832.136125]  ? cifs_readdir+0x2e60/0x2e60 [cifs]
	[ 1832.136468]  ? rcu_read_lock_sched_held+0x3f/0x70
	[ 1832.136769]  ? __rseq_handle_notify_resume+0x80b/0xbe0
	[ 1832.137096]  ? __up_read+0x192/0x710
	[ 1832.137327]  ? __ia32_sys_rseq+0xf0/0xf0
	[ 1832.137578]  ? __x64_sys_openat+0x11f/0x1d0
	[ 1832.137850]  __x64_sys_ioctl+0x127/0x190
	[ 1832.138103]  do_syscall_64+0x3b/0x90
	[ 1832.138378]  entry_SYSCALL_64_after_hwframe+0x44/0xae
	[ 1832.138702] RIP: 0033:0x7fcee9a253df
	[ 1832.138937] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
	[ 1832.140107] RSP: 002b:00007ffeba94a8a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[ 1832.140606] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcee9a253df
	[ 1832.141058] RDX: 00007ffeba94a910 RSI: 00000000c018cf07 RDI: 0000000000000003
	[ 1832.141503] RBP: 00007ffeba94a930 R08: 00007fcee9b24db0 R09: 00007fcee9b45c4e
	[ 1832.141948] R10: 00007fcee9918d40 R11: 0000000000000246 R12: 00007ffeba94aa48
	[ 1832.142396] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007fcee9b78000
	[ 1832.142851]  </TASK>
	[ 1832.142994] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload [last unloaded: cifs]

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-31 09:39:58 -05:00
Paulo Alcantara
b92e358757 cifs: prevent bad output lengths in smb2_ioctl_query_info()
When calling smb2_ioctl_query_info() with
smb_query_info::flags=PASSTHRU_FSCTL and
smb_query_info::output_buffer_length=0, the following would return
0x10

	buffer = memdup_user(arg + sizeof(struct smb_query_info),
			     qi.output_buffer_length);
	if (IS_ERR(buffer)) {
		kfree(vars);
		return PTR_ERR(buffer);
	}

rather than a valid pointer thus making IS_ERR() check fail.  This
would then cause a NULL ptr deference in @buffer when accessing it
later in smb2_ioctl_query_ioctl().  While at it, prevent having a
@buffer smaller than 8 bytes to correctly handle SMB2_SET_INFO
FileEndOfFileInformation requests when
smb_query_info::flags=PASSTHRU_SET_INFO.

Here is a small C reproducer which triggers a NULL ptr in @buffer when
passing an invalid smb_query_info::flags

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/ioctl.h>

	#define die(s) perror(s), exit(1)
	#define QUERY_INFO 0xc018cf07

	int main(int argc, char *argv[])
	{
		int fd;

		if (argc < 2)
			exit(1);
		fd = open(argv[1], O_RDONLY);
		if (fd == -1)
			die("open");
		if (ioctl(fd, QUERY_INFO, (uint32_t[]) { 0, 0, 0, 4, 0, 0}) == -1)
			die("ioctl");
		close(fd);
		return 0;
	}

	mount.cifs //srv/share /mnt -o ...
	gcc repro.c && ./a.out /mnt/f0

	[  114.138620] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
	[  114.139310] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	[  114.139775] CPU: 2 PID: 995 Comm: a.out Not tainted 5.17.0-rc8 #1
	[  114.140148] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
	[  114.140818] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs]
	[  114.141221] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24
	[  114.142348] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256
	[  114.142692] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d
	[  114.143119] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380
	[  114.143544] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003
	[  114.143983] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288
	[  114.144424] R13: 00000000ffffffea R14: ffff888115503228 R15: 0000000000000000
	[  114.144852] FS:  00007f7aeabdf740(0000) GS:ffff888151600000(0000) knlGS:0000000000000000
	[  114.145338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  114.145692] CR2: 00007f7aeacfdf5e CR3: 000000012000e000 CR4: 0000000000350ee0
	[  114.146131] Call Trace:
	[  114.146291]  <TASK>
	[  114.146432]  ? smb2_query_reparse_tag+0x890/0x890 [cifs]
	[  114.146800]  ? cifs_mapchar+0x460/0x460 [cifs]
	[  114.147121]  ? rcu_read_lock_sched_held+0x3f/0x70
	[  114.147412]  ? cifs_strndup_to_utf16+0x15b/0x250 [cifs]
	[  114.147775]  ? dentry_path_raw+0xa6/0xf0
	[  114.148024]  ? cifs_convert_path_to_utf16+0x198/0x220 [cifs]
	[  114.148413]  ? smb2_check_message+0x1080/0x1080 [cifs]
	[  114.148766]  ? rcu_read_lock_sched_held+0x3f/0x70
	[  114.149065]  cifs_ioctl+0x1577/0x3320 [cifs]
	[  114.149371]  ? lock_downgrade+0x6f0/0x6f0
	[  114.149631]  ? cifs_readdir+0x2e60/0x2e60 [cifs]
	[  114.149956]  ? rcu_read_lock_sched_held+0x3f/0x70
	[  114.150250]  ? __rseq_handle_notify_resume+0x80b/0xbe0
	[  114.150562]  ? __up_read+0x192/0x710
	[  114.150791]  ? __ia32_sys_rseq+0xf0/0xf0
	[  114.151025]  ? __x64_sys_openat+0x11f/0x1d0
	[  114.151296]  __x64_sys_ioctl+0x127/0x190
	[  114.151549]  do_syscall_64+0x3b/0x90
	[  114.151768]  entry_SYSCALL_64_after_hwframe+0x44/0xae
	[  114.152079] RIP: 0033:0x7f7aead043df
	[  114.152306] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
	[  114.153431] RSP: 002b:00007ffc2e0c1f80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
	[  114.153890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7aead043df
	[  114.154315] RDX: 00007ffc2e0c1ff0 RSI: 00000000c018cf07 RDI: 0000000000000003
	[  114.154747] RBP: 00007ffc2e0c2010 R08: 00007f7aeae03db0 R09: 00007f7aeae24c4e
	[  114.155192] R10: 00007f7aeabf7d40 R11: 0000000000000246 R12: 00007ffc2e0c2128
	[  114.155642] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007f7aeae57000
	[  114.156071]  </TASK>
	[  114.156218] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload
	[  114.156608] ---[ end trace 0000000000000000 ]---
	[  114.156898] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs]
	[  114.157792] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24
	[  114.159293] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256
	[  114.159641] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d
	[  114.160093] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380
	[  114.160699] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003
	[  114.161196] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288
	[  114.155642] R13: 0000000000401176 R14: 0000000000403df8 R15: 00007f7aeae57000
	[  114.156071]  </TASK>
	[  114.156218] Modules linked in: cifs cifs_arc4 cifs_md4 bpf_preload
	[  114.156608] ---[ end trace 0000000000000000 ]---
	[  114.156898] RIP: 0010:smb2_ioctl_query_info+0x206/0x410 [cifs]
	[  114.157792] Code: 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 c8 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 7b 28 4c 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 49 8b 3f e8 58 02 fb ff 48 8b 14 24
	[  114.159293] RSP: 0018:ffffc90000b47b00 EFLAGS: 00010256
	[  114.159641] RAX: dffffc0000000000 RBX: ffff888115503200 RCX: ffffffffa020580d
	[  114.160093] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffffa043a380
	[  114.160699] RBP: ffff888115503278 R08: 0000000000000001 R09: 0000000000000003
	[  114.161196] R10: fffffbfff4087470 R11: 0000000000000001 R12: ffff888115503288
	[  114.161823] R13: 00000000ffffffea R14: ffff888115503228 R15: 0000000000000000
	[  114.162274] FS:  00007f7aeabdf740(0000) GS:ffff888151600000(0000) knlGS:0000000000000000
	[  114.162853] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[  114.163218] CR2: 00007f7aeacfdf5e CR3: 000000012000e000 CR4: 0000000000350ee0
	[  114.163691] Kernel panic - not syncing: Fatal exception
	[  114.164087] Kernel Offset: disabled
	[  114.164316] ---[ end Kernel panic - not syncing: Fatal exception ]---

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-31 09:39:58 -05:00
Trond Myklebust
6b8a94332e nfsd: Fix a write performance regression
The call to filemap_flush() in nfsd_file_put() is there to ensure that
we clear out any writes belonging to a NFSv3 client relatively quickly
and avoid situations where the file can't be evicted by the garbage
collector. It also ensures that we detect write errors quickly.

The problem is this causes a regression in performance for some
workloads.

So try to improve matters by deferring writeback until we're ready to
close the file, and need to detect errors so that we can force the
client to resend.

Tested-by: Jan Kara <jack@suse.cz>
Fixes: b6669305d3 ("nfsd: Reduce the number of calls to nfsd_file_gc()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/all/20220330103457.r4xrhy2d6nhtouzk@quack3.lan
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-03-31 10:39:57 -04:00
Steve French
c7803b05f7 smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common
Fix an endian bug in ksmbd for one remaining use of
Persistent/VolatileFid that unnecessarily converted it (it is an
opaque endian field that does not need to be and should not
be converted) in oplock_break for ksmbd, and move the definitions
for the oplock and lease break protocol requests and responses
to fs/smbfs_common/smb2pdu.h

Also move a few more definitions for various protocol requests
that were duplicated (in fs/cifs/smb2pdu.h and fs/ksmbd/smb2pdu.h)
into fs/smbfs_common/smb2pdu.h including:

- various ioctls and reparse structures
- validate negotiate request and response structs
- duplicate extents structs

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-31 09:38:53 -05:00
Guo Xuenan
49df342218 fs: fix an infinite loop in iomap_fiemap
when get fiemap starting from MAX_LFS_FILESIZE, (maxbytes - *len) < start
will always true , then *len set zero. because of start offset is beyond
file size, for erofs filesystem it will always return iomap.length with
zero,iomap iterate will enter infinite loop. it is necessary cover this
corner case to avoid this situation.

------------[ cut here ]------------
WARNING: CPU: 7 PID: 905 at fs/iomap/iter.c:35 iomap_iter+0x97f/0xc70
Modules linked in: xfs erofs
CPU: 7 PID: 905 Comm: iomap Tainted: G        W         5.17.0-rc8 #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:iomap_iter+0x97f/0xc70
Code: 85 a1 fc ff ff e8 71 be 9c ff 0f 1f 44 00 00 e9 92 fc ff ff e8 62 be 9c ff 0f 0b b8 fb ff ff ff e9 fc f8 ff ff e8 51 be 9c ff <0f> 0b e9 2b fc ff ff e8 45 be 9c ff 0f 0b e9 e1 fb ff ff e8 39 be
RSP: 0018:ffff888060a37ab0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888060a37bb0 RCX: 0000000000000000
RDX: ffff88807e19a900 RSI: ffffffff81a7da7f RDI: ffff888060a37be0
RBP: 7fffffffffffffff R08: 0000000000000000 R09: ffff888060a37c20
R10: ffff888060a37c67 R11: ffffed100c146f8c R12: 7fffffffffffffff
R13: 0000000000000000 R14: ffff888060a37bd8 R15: ffff888060a37c20
FS:  00007fd3cca01540(0000) GS:ffff888108780000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010820 CR3: 0000000054b92000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 iomap_fiemap+0x1c9/0x2f0
 erofs_fiemap+0x64/0x90 [erofs]
 do_vfs_ioctl+0x40d/0x12e0
 __x64_sys_ioctl+0xaa/0x1c0
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
---[ end trace 0000000000000000 ]---
watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [iomap:905]

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: fix some typos]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2022-03-30 09:49:28 -07:00
Jakob Koschel
edf5f0548f ksmbd: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-30 08:17:55 -05:00
Christophe JAILLET
56b401fb0c ksmbd: Remove a redundant zeroing of memory
fill_transform_hdr() has only one caller that already clears tr_buf (it is
kzalloc'ed).

So there is no need to clear it another time here.

Remove the superfluous memset() and add a comment to remind that the caller
must clear the buffer.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-30 08:17:53 -05:00
Steve French
adc3282140 ksmbd: shorten experimental warning on loading the module
ksmbd is continuing to improve.  Shorten the warning message
logged the first time it is loaded to:
   "The ksmbd server is experimental"

Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-03-30 08:14:59 -05:00
Linus Torvalds
d888c83fce fs: fix fd table size alignment properly
Jason Donenfeld reports that my commit 1c24a18639 ("fs: fd tables have
to be multiples of BITS_PER_LONG") doesn't work, and the reason is an
embarrassing brown-paper-bag bug.

Yes, we want to align the number of fds to BITS_PER_LONG, and yes, the
reason they might not be aligned is because the incoming 'max_fd'
argument might not be aligned.

But aligining the argument - while simple - will cause a "infinitely
big" maxfd (eg NR_OPEN_MAX) to just overflow to zero.  Which most
definitely isn't what we want either.

The obvious fix was always just to do the alignment last, but I had
moved it earlier just to make the patch smaller and the code look
simpler.  Duh.  It certainly made _me_ look simple.

Fixes: 1c24a18639 ("fs: fd tables have to be multiples of BITS_PER_LONG")
Reported-and-tested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Fedor Pchelkin <aissur0002@gmail.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-29 23:29:18 -07:00