Commit Graph

89212 Commits

Author SHA1 Message Date
Linus Torvalds
7521f258ea 21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
issues or aren't considered to be needed in earlier kernel versions.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZcfLvgAKCRDdBJ7gKXxA
 joCTAP4/XdBXA7Sj3GyjSAkYjg2U0quwX9oRhsx2Qy9duPDaLAD+NRl9XG14YSOB
 f/7OiTQoDfnwVgHAOVBHY/ylrcgZRQg=
 =2wdS
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
  issues or aren't considered to be needed in earlier kernel versions"

* tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  nilfs2: fix potential bug in end_buffer_async_write
  mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
  nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
  MAINTAINERS: Leo Yan has moved
  mm/zswap: don't return LRU_SKIP if we have dropped lru lock
  fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
  mailmap: switch email address for John Moon
  mm: zswap: fix objcg use-after-free in entry destruction
  mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
  arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
  selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
  mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
  mm: memcg: optimize parent iteration in memcg_rstat_updated()
  nilfs2: fix data corruption in dsync block recovery for small block sizes
  mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
  exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
  fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
  fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  getrusage: use sig->stats_lock rather than lock_task_sighand()
  getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
  ...
2024-02-10 15:28:07 -08:00
Linus Torvalds
5a7ec87063 Two ksmbd server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmXGtzoACgkQiiy9cAdy
 T1G9UQv/b1rOI+u7Cr5RDnO0O4sbL7bJ7pfJHEK0KKpat0BFtsrGZFRwDsuSDmkc
 BMIdeENnM1aoGjGEzvyGJmzUEZUcusy2zFdLBDiW1zPBb5D5HLRmr7fN02ZwPwj9
 5vnuvM5/Iql/dSMBjDcm7M5NuiVlp9+SmN27OqXbfc0e6xHxnzhwu6A3x3Ryaz/J
 0LzxNt++UUkZfK6FrePDdRyWlvBHsMy4RfTmjIO432bhNjsx90YHrPtKj2ph4xi5
 /92QuLJvSaYyj1IrZIV6v0UBJBKtnoGek8UJ7k3Mz/BkHBXvvZTR0MYL/tKW80eK
 Bfck2qcRVauLPseGRnn5GTkvF+itTb5RXksXzVSomveAzQ7TAle/qx7EL93QKCLC
 vPJLAXK00T0JvE0zyVxGuPWvl9iWBUwbR4uwUL4XNnJksIXsTYci7TZ0ELyAA7hJ
 bdn/4DyRTS5KXC60JwE9hcGXpjYstD6w8Jz+UseADsS+qE3zuX0UwnynNCQc0zjy
 iTboUnA1
 =exTN
 -----END PGP SIGNATURE-----

Merge tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes:

   - memory leak fix

   - a minor kernel-doc fix"

* tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
  ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
2024-02-10 07:53:41 -08:00
Linus Torvalds
ca00c700c5 Five smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmXGuQsACgkQiiy9cAdy
 T1FJHgwAgvfgXuEjLXzjcEUg7bZ7z76BDC4Qq0NJImkJ6GUm9eQssCF8xbOZlfb/
 bxUiZATGGbRso3cLKIcZtgBOkyKx8v9ZTVbjxpYJQqqliAvYNHXfi3TlMsjDwGlM
 Nd4kVboZmHJSMirR3O915Il4iOt36/RygDiLHqqE6jG+BM74I3fpOI+wtphUIEdG
 KHRczjbTlmKoZDH99Np/CYGYKiQOcFLOj7FetiYBW3AS1H2qSol5PDO0vOgvSDFq
 3QOIRN1Km5tRogHx/hgr993DYamvDHTI3GbSEwDT45zP1m13AHLFf7tPrPW9vNwQ
 1LIRqTFp7UYJElHGUZMYkPwY9ryqU1GHNekiV/JqUJyzvZ6wLC0mHJnY4Sh+xpdP
 wUyTCdZITZ6KPw+bQnITISkQvB3Z8lp6xX4OQZYusICLzAqfsGVI8qM2AMWM5Ywv
 9BEQb1cY6iolcQZ1nYl1/yJTVy9KZHK+N/f/8GCrF/7MDEJn2CKyIUT+Zz8f1K+F
 lS6Dm8XU
 =jOZH
 -----END PGP SIGNATURE-----

Merge tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - reconnect fix

 - multichannel channel selection fix

 - minor mount warning fix

 - reparse point fix

 - null pointer check improvement

* tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: clarify mount warning
  cifs: handle cases where multiple sessions share connection
  cifs: change tcon status when need_reconnect is set on it
  smb: client: set correct d_type for reparse points under DFS mounts
  smb3: add missing null server pointer check
2024-02-09 17:09:30 -08:00
Linus Torvalds
e1e3f530a1 Some fscrypt-related fixups (sparse reads are used only for encrypted
files) and two cap handling fixes from Xiubo and Rishabh.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmXGRJATHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi+/0B/4pEweAm2W0UUaaS59DecNySBFobwed
 m7bBDBGIAQ/I3duN46a13FzsGNclho967TeB0ig1jrQxnoo3HEMiXpZz5xfG9spe
 fyvrIk3R8cSqgd7YsyITnUjGGd2UBvZVrbWOCbWrKofSoflS6IjcGDQF7ZrgEsff
 0KkMaWHvO6poIU2mAToV//UkWUk6RrtAUNlSdjLpizXnUrrAQ+vUA3OU9SSp6Klf
 xmFaIiAiVZC6M8qFpXJtnIf8Ba7PrpW5InAXgCOkxDKciE9fLaPsIu0B3H9lUVKZ
 TJwjEJ0nB+akh0tRO5bZKyM8j0D3lhgxphJwNtUoYjQsV3m7LcGQV+Il
 =u953
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Some fscrypt-related fixups (sparse reads are used only for encrypted
  files) and two cap handling fixes from Xiubo and Rishabh"

* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
  ceph: always check dir caps asynchronously
  ceph: prevent use-after-free in encode_cap_msg()
  ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
  libceph: just wait for more data to be available on the socket
  libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
  libceph: fail sparse-read if the data length doesn't match
2024-02-09 17:05:02 -08:00
Linus Torvalds
a2343df3fb driver ntfs3 for linux 6.8
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmXGIMMACgkQqbAzH4Mk
 B7b07xAAw/VcHqPdhcVg2SttGy1D0rpjXvSK9Na3pulD83M3AptvjqXToP1xZHUP
 fpZ32rAWoeaJieTbS+xJUEyRj+VGN9iEgUMBtoaIYIv9ozmC7IU0xyDvJCPU07F1
 X2n+IXxpBsU/Y1rQiJIijzum+BQYgXgsifdwkZU50QQjQWWcFdMU9VPjN2Saw/Sx
 8gd1rzKVKIclErpReDyuZpTqDweM4BxiuwKhLodzlMtfO2MEqXxwFXnLQDX2xJLh
 zJPMepM/3mzHSBjKrHQ+xZHZCDuP393UUJK+sd9PaETR8xHR4ew9yqSu1Ajg/o64
 xoQ8rkpT9g6AZS+JNKtKN52rw5rn4ZCi/VZ61HgqiLGTxOkVnHpynmDr3IKzfgn+
 j7ZD33HteBJLxnR3YTi7fJA8DF9d0vHUv+HtH351WVibJn9DrzWzIkp6uDdaVfoa
 YvVE+ODynLVvpDKVTm4QOmIRnVMFDZwNo7C2sURy6nqQYf+ufYYRbe5btrvhSZ8k
 uazJLhLSLFCHiT6WlbmykntTo15sub/yIF5juVRcWthi2jWj0qII549jtSkZquQR
 pEVcitMTrr6RqwEB/B5nsz2azQ4m/+JgO0se1kWvxa+6erVV0wCdB7STW77zbRmx
 m8Xyr8Pf+ZxM+IhP4cpSxgcc5olhvUjcrkNBtilQc0vLqf535k0=
 =4Gkn
 -----END PGP SIGNATURE-----

Merge tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 fixes from Konstantin Komarov:
 "Fixed:
   - size update for compressed file
   - some logic errors, overflows
   - memory leak
   - some code was refactored

  Added:
   - implement super_operations::shutdown

  Improved:
   - alternative boot processing
   - reduced stack usage"

* tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
  fs/ntfs3: Slightly simplify ntfs_inode_printk()
  fs/ntfs3: Add ioctl operation for directories (FITRIM)
  fs/ntfs3: Fix oob in ntfs_listxattr
  fs/ntfs3: Fix an NULL dereference bug
  fs/ntfs3: Update inode->i_size after success write into compressed file
  fs/ntfs3: Fixed overflow check in mi_enum_attr()
  fs/ntfs3: Correct function is_rst_area_valid
  fs/ntfs3: Use i_size_read and i_size_write
  fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
  fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
  fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
  fs/ntfs3: Disable ATTR_LIST_ENTRY size check
  fs/ntfs3: Fix c/mtime typo
  fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
  fs/ntfs3: Add and fix comments
  fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
  fs/ntfs3: Implement super_operations::shutdown
  fs/ntfs3: Drop suid and sgid bits as a part of fpunch
  fs/ntfs3: Add file_modified
  fs/ntfs3: Correct use bh_read
  ...
2024-02-09 16:59:49 -08:00
Steve French
a5cc98eba2 smb3: clarify mount warning
When a user tries to use the "sec=krb5p" mount parameter to encrypt
data on connection to a server (when authenticating with Kerberos), we
indicate that it is not supported, but do not note the equivalent
recommended mount parameter ("sec=krb5,seal") which turns on encryption
for that mount (and uses Kerberos for auth).  Update the warning message.

Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-09 14:43:27 -06:00
Shyam Prasad N
a39c757bf0 cifs: handle cases where multiple sessions share connection
Based on our implementation of multichannel, it is entirely
possible that a server struct may not be found in any channel
of an SMB session.

In such cases, we should be prepared to move on and search for
the server struct in the next session.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-09 14:43:25 -06:00
Shyam Prasad N
c6e02eefd6 cifs: change tcon status when need_reconnect is set on it
When a tcon is marked for need_reconnect, the intention
is to have it reconnected.

This change adjusts tcon->status in cifs_tree_connect
when need_reconnect is set. Also, this change has a minor
correction in resetting need_reconnect on success. It makes
sure that it is done with tc_lock held.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-09 14:43:23 -06:00
Paulo Alcantara
55c7788c37 smb: client: set correct d_type for reparse points under DFS mounts
Send query dir requests with an info level of
SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than
SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own
inode numbers (e.g. noserverino) so that reparse tags still
can be parsed directly from the responses, but server won't
send UniqueId (server inode number)

Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-08 10:50:40 -06:00
Steve French
45be0882c5 smb3: add missing null server pointer check
Address static checker warning in cifs_ses_get_chan_index():
    warn: variable dereferenced before check 'server'
To be consistent, and reduce risk, we should add another check
for null server pointer.

Fixes: 88675b22d3 ("cifs: do not search for channel if server is terminating")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-08 10:50:40 -06:00
Ryusuke Konishi
5bc09b397c nilfs2: fix potential bug in end_buffer_async_write
According to a syzbot report, end_buffer_async_write(), which handles the
completion of block device writes, may detect abnormal condition of the
buffer async_write flag and cause a BUG_ON failure when using nilfs2.

Nilfs2 itself does not use end_buffer_async_write().  But, the async_write
flag is now used as a marker by commit 7f42ec3941 ("nilfs2: fix issue
with race condition of competition between segments for dirty blocks") as
a means of resolving double list insertion of dirty blocks in
nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the
resulting crash.

This modification is safe as long as it is used for file data and b-tree
node blocks where the page caches are independent.  However, it was
irrelevant and redundant to also introduce async_write for segment summary
and super root blocks that share buffers with the backing device.  This
led to the possibility that the BUG_ON check in end_buffer_async_write
would fail as described above, if independent writebacks of the backing
device occurred in parallel.

The use of async_write for segment summary buffers has already been
removed in a previous change.

Fix this issue by removing the manipulation of the async_write flag for
the remaining super root block buffer.

Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com
Fixes: 7f42ec3941 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:37 -08:00
Ryusuke Konishi
38296afe3c nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.

While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
creation and was trying to lock the folio.  Thus causing a deadlock.

In the first place, it is unexpected that folios/pages in the middle of
writeback will be updated and become dirty.  Nilfs2 adds a checksum to
verify the validity of the log being written and uses it for recovery at
mount, so data changes during writeback are suppressed.  Since this is
broken, an unclean shutdown could potentially cause recovery to fail.

Investigation revealed that the root cause is that the wait for writeback
completion in nilfs_page_mkwrite() is conditional, and if the backing
device does not require stable writes, data may be modified without
waiting.

Fix these issues by making nilfs_page_mkwrite() wait for writeback to
finish regardless of the stable write requirement of the backing device.

Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com
Fixes: 1d1d1a7672 ("mm: only enforce stable page writes if the backing device requires it")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:36 -08:00
Oscar Salvador
79d72c68c5 fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
When configuring a hugetlb filesystem via the fsconfig() syscall, there is
a possible NULL dereference in hugetlbfs_fill_super() caused by assigning
NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize
is non valid.

E.g: Taking the following steps:

     fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC);
     fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0);
     fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced
with NULL, losing its previous value, and we will print an error:

 ...
 ...
 case Opt_pagesize:
 ps = memparse(param->string, &rest);
 ctx->hstate = h;
 if (!ctx->hstate) {
         pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
         return -EINVAL;
 }
 return 0;
 ...
 ...

This is a problem because later on, we will dereference ctxt->hstate in
hugetlbfs_fill_super()

 ...
 ...
 sb->s_blocksize = huge_page_size(ctx->hstate);
 ...
 ...

Causing below Oops.

Fix this by replacing cxt->hstate value only when then pagesize is known
to be valid.

 kernel: hugetlbfs: Unsupported page size 0 MB
 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
 kernel: #PF: supervisor read access in kernel mode
 kernel: #PF: error_code(0x0000) - not-present page
 kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0
 kernel: Oops: 0000 [#1] PREEMPT SMP PTI
 kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G            E      6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f
 kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
 kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
 kernel: Call Trace:
 kernel:  <TASK>
 kernel:  ? __die_body+0x1a/0x60
 kernel:  ? page_fault_oops+0x16f/0x4a0
 kernel:  ? search_bpf_extables+0x65/0x70
 kernel:  ? fixup_exception+0x22/0x310
 kernel:  ? exc_page_fault+0x69/0x150
 kernel:  ? asm_exc_page_fault+0x22/0x30
 kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
 kernel:  ? hugetlbfs_fill_super+0xb4/0x1a0
 kernel:  ? hugetlbfs_fill_super+0x28/0x1a0
 kernel:  ? __pfx_hugetlbfs_fill_super+0x10/0x10
 kernel:  vfs_get_super+0x40/0xa0
 kernel:  ? __pfx_bpf_lsm_capable+0x10/0x10
 kernel:  vfs_get_tree+0x25/0xd0
 kernel:  vfs_cmd_create+0x64/0xe0
 kernel:  __x64_sys_fsconfig+0x395/0x410
 kernel:  do_syscall_64+0x80/0x160
 kernel:  ? syscall_exit_to_user_mode+0x82/0x240
 kernel:  ? do_syscall_64+0x8d/0x160
 kernel:  ? syscall_exit_to_user_mode+0x82/0x240
 kernel:  ? do_syscall_64+0x8d/0x160
 kernel:  ? exc_page_fault+0x69/0x150
 kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 kernel: RIP: 0033:0x7ffbc0cb87c9
 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48
 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af
 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9
 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000
 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000
 kernel:  </TASK>
 kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E)
 kernel:  mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E)
 kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1
 kernel: CR2: 0000000000000028
 kernel: ---[ end trace 0000000000000000 ]---
 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
 kernel: FS:  00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0

Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de
Fixes: 32021982a3 ("hugetlbfs: Convert to fs_context")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:36 -08:00
Ryusuke Konishi
67b8bcbaed nilfs2: fix data corruption in dsync block recovery for small block sizes
The helper function nilfs_recovery_copy_block() of
nilfs_recovery_dsync_blocks(), which recovers data from logs created by
data sync writes during a mount after an unclean shutdown, incorrectly
calculates the on-page offset when copying repair data to the file's page
cache.  In environments where the block size is smaller than the page
size, this flaw can cause data corruption and leak uninitialized memory
bytes during the recovery process.

Fix these issues by correcting this byte offset calculation on the page.

Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:34 -08:00
Oleg Nesterov
7601df8031 fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
lock_task_sighand() can trigger a hard lockup.  If NR_CPUS threads call
do_task_stat() at the same time and the process has NR_THREADS, it will
spin with irqs disabled O(NR_CPUS * NR_THREADS) time.

Change do_task_stat() to use sig->stats_lock to gather the statistics
outside of ->siglock protected section, in the likely case this code will
run lockless.

Link: https://lkml.kernel.org/r/20240123153357.GA21857@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:33 -08:00
Oleg Nesterov
60f92acb60 fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
Patch series "fs/proc: do_task_stat: use sig->stats_".

do_task_stat() has the same problem as getrusage() had before "getrusage:
use sig->stats_lock rather than lock_task_sighand()": a hard lockup.  If
NR_CPUS threads call lock_task_sighand() at the same time and the process
has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS *
NR_THREADS) time.


This patch (of 3):

thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.

Not only this removes for_each_thread() from the critical section with
irqs disabled, this removes another case when stats_lock is taken with
siglock held.  We want to remove this dependency, then we can change the
users of stats_lock to not disable irqs.

Link: https://lkml.kernel.org/r/20240123153313.GA21832@redhat.com
Link: https://lkml.kernel.org/r/20240123153355.GA21854@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:33 -08:00
Prakash Sangappa
e656c7a9e5 mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE
For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in
shmget() call.  If SHM_NORESERVE flags is specified then the hugetlb pages
are not reserved.  However when the shared memory is attached with the
shmat() call the hugetlb pages are getting reserved incorrectly for
SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug.

-------------------------------
Following test shows the issue.

$cat shmhtb.c

int main()
{
	int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE;
	int shmid;

	shmid = shmget(SKEY, SHMSZ, shmflags);
	if (shmid < 0)
	{
		printf("shmat: shmget() failed, %d\n", errno);
		return 1;
	}
	printf("After shmget()\n");
	system("cat /proc/meminfo | grep -i hugepages_");

	shmat(shmid, NULL, 0);
	printf("\nAfter shmat()\n");
	system("cat /proc/meminfo | grep -i hugepages_");

	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}

 #sysctl -w vm.nr_hugepages=20
 #./shmhtb

After shmget()
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        0
HugePages_Surp:        0

After shmat()
HugePages_Total:      20
HugePages_Free:       20
HugePages_Rsvd:        5 <--
HugePages_Surp:        0
--------------------------------

Fix is to ensure that hugetlb pages are not reserved for SHM_HUGETLB shared
memory in the shmat() call.

Link: https://lkml.kernel.org/r/1706040282-12388-1-git-send-email-prakash.sangappa@oracle.com
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-07 21:20:32 -08:00
Fedor Pchelkin
108a020c64 ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it
fails. Seems to be the caller's responsibility to clear the buffer in
error case.

Found by Linux Verification Center (linuxtesting.org).

Fixes: e2b76ab8b5 ("ksmbd: add support for read compound")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-07 20:23:37 -06:00
Yang Li
a12bc36032 ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
The ksmbd_extract_sharename() function lacked a complete kernel-doc
comment. This patch adds parameter descriptions and detailed function
behavior to improve code readability and maintainability.

Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-07 20:23:37 -06:00
Linus Torvalds
c8d80f83de nfsd-6.8 fixes:
- Address a deadlock regression in RELEASE_LOCKOWNER
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmXDkGoACgkQM2qzM29m
 f5cD1xAAsqKTmrb2ABdFZadYfVl6lKxtNWEp8S9eRS6eBU6bcNr5sxoTi+eflHuB
 a58TfUwj8ffVtmd0qGaWI8RpDuAYtxhJU6l3TQZCLheMlKvsP3u6lZDjk8CeYtIK
 9bVDZV7MOh9C/p01p/21P2B3OukgzzF8Lz/AFPCxCtoK5nnaDT5F+8pX8yfZU5x0
 bM1gBHzB80BoCdzlimN6QB8EfFkOgIF9Apnk53E676KmkOuADV+CIp4aQMsm8l3r
 6lJAdsOCuw5BgSDJWh1vGmFKubfhP841QbglDnnZcc+WTBhvEO0PR8Kv0Ugn5Xek
 8NkcnOlti+gjH0EKTHx5P4frV0BcWptLjCjVdruOvBszrZtEaX547Mp/d04GGO0U
 8gUEhen/RT7l1E2RM4qII9q5nuaekjI2Da2FGZNmK/j66OPFibiOBMRn3u/haTr5
 2axrrO9NOnfWcTQX08iKZygEUY7E4h3iImqkNw+c+avZ6SRiH548TRCrKdlBeuia
 Pj3QFRfu/9PQnnl9qnROXtAP70AKmX1iSLZ4s9+KqTzQVCW7tLbZpOn9D1hHCJ1q
 0JTVql5rSXCl5YpBsDiqr7qvZhVab/2+1X9wHkLPZd5aIBoUmOQOMebZCUxpu+TR
 houx3NBGYNixk0khEkHH3+c0HydvnxVU3xClUgLxjHgEdeS7R6s=
 =z3oC
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Address a deadlock regression in RELEASE_LOCKOWNER

* tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't take fi_lock in nfsd_break_deleg_cb()
2024-02-07 17:48:15 +00:00
Xiubo Li
07045648c0 ceph: always check dir caps asynchronously
The MDS will issue the 'Fr' caps for async dirop, while there is
buggy in kclient and it could miss releasing the async dirop caps,
which is 'Fsxr'. And then the MDS will complain with:

"[WRN] client.xxx isn't responding to mclientcaps(revoke) ..."

So when releasing the dirop async requests or when they fail we
should always make sure that being revoked caps could be released.

Link: https://tracker.ceph.com/issues/50223
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-07 14:58:02 +01:00
Rishabh Dave
cda4672da1 ceph: prevent use-after-free in encode_cap_msg()
In fs/ceph/caps.c, in encode_cap_msg(), "use after free" error was
caught by KASAN at this line - 'ceph_buffer_get(arg->xattr_buf);'. This
implies before the refcount could be increment here, it was freed.

In same file, in "handle_cap_grant()" refcount is decremented by this
line - 'ceph_buffer_put(ci->i_xattrs.blob);'. It appears that a race
occurred and resource was freed by the latter line before the former
line could increment it.

encode_cap_msg() is called by __send_cap() and __send_cap() is called by
ceph_check_caps() after calling __prep_cap(). __prep_cap() is where
arg->xattr_buf is assigned to ci->i_xattrs.blob. This is the spot where
the refcount must be increased to prevent "use after free" error.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/59259
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-07 14:58:02 +01:00
Xiubo Li
bbb20ea993 ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
The fscrypt code will use i_blkbits to setup ci_data_unit_bits when
allocating the new inode, but ceph will initiate i_blkbits ater when
filling the inode, which is too late. Since ci_data_unit_bits will only
be used by the fscrypt framework so initiating i_blkbits with
CEPH_FSCRYPT_BLOCK_SHIFT is safe.

Link: https://tracker.ceph.com/issues/64035
Fixes: 5b11888471 ("fscrypt: support crypto data unit size less than filesystem block size")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-02-07 14:43:29 +01:00
Linus Torvalds
6d280f4d76 for-6.8-rc3-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmXDNuAACgkQxWXV+ddt
 WDvBGg/9FuCJm/GkBxgeVKNxdF28fIzYkYHjSYzHSo5A5GFMNENHUDfXcSNjZjUM
 ZFWHCXENcnNa7pKONPaW5QIIQuecBPqcXK+lPJXlqFlC22CGSVD7MZ7/Fm7uKJ5W
 mhGGuq7NTuTN1MYm480WVa+5DkfVbFkPeZgWOVTQ0tXGxTEKU9pXvwmflx8rbmRG
 VPhT0iZO/KmkRSp91BwAJxitw8v76WG9JpGemiFcNOISCdE/HENxrxj8rE6beZoc
 g0Kx8YQDTlf119bdwlCdJkvRVEzjIEZIUE2g8J0oKzPE6CmY2a+8+Iv/S0nCCc6V
 2nFVHdhLnUH5oIuFEoo026tvu3tMKR1K30EAQyFslsjPE74Hye7MAjr8sEvAF7E/
 J4Sbn3NIILkKu1Ozn/RqhPh+XsSyU9tXeO1+BcdmrGY9vDGq18lVbruOrde14fqZ
 xHFJloXKsJCw7AcMzNfsa6arRQ7YGa8sGudMLpriUemUUn0MK8OdY6zCq20p43ON
 8eUigP3WHOdPfCJXNfgqlJdyjmYdHCWvn4wKpPDMQuU5rMyUloOJDqtR6fxVCatO
 0Pjg0zVyLu/CF6+vrL6wP4qT9sRj1Jy2YEh8fFe4fWc9+JOmQZYBm/Eyaw4oU0rg
 lOmqE1/TEgl0ra9IHvxcgJo5l7zx2dbHAgMEmScCgIwrLpkh14A=
 =nMOd
 -----END PGP SIGNATURE-----

Merge tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - two fixes preventing deletion and manual creation of subvolume qgroup

 - unify error code returned for unknown send flags

 - fix assertion during subvolume creation when anonymous device could
   be allocated by other thread (e.g. due to backref walk)

* tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not ASSERT() if the newly created subvolume already got read
  btrfs: forbid deleting live subvol qgroup
  btrfs: forbid creating subvol qgroups
  btrfs: send: return EOPNOTSUPP on unknown flags
2024-02-07 08:21:32 +00:00
Linus Torvalds
99bd3cb0d1 bcachefs fixes for v6.8-rc4
Two serious ones here that we'll want to backport to stable: a fix for a
 race in the thread_with_file code, and another locking fixup in the
 subvolume deletion path.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmXBeaYACgkQE6szbY3K
 bnb4iA//dByHSjorgEDU/8EZwq6vKSJ5F+M4JqEYTYHP3IG9Fci6dLwCbx/SBFFn
 NaYx1mPxSnVCjr1WJEikdUPFW9SAT5Qv/xqqbAKZZ8F6AW1tGM+kK9isV24tpbQx
 RYpYZaYObmyHUgd2Fm3hOUltLVkRB2y2wI1SQKbq+31Zk9rYEITIdhe7mXOsD40f
 Uo6KJGVATDC8gg6S3bG3ZNG7AUqOx8HErcsrErQCxJpwbbYWzpz+3QbVZNRaID04
 AGRrasJcjk6RZ1LM9jasjN+i4mPOrZBS4gyXHg3YcH7+Ft4uGz//afVBU8AcnaX+
 zMP12LwFRLZul1Q5lNVcI3t5lJLd2JlWiMssruZcY6tCFmyWBS8DOA5F1PJwy171
 8pEcKkNhjgs/LMcpc/GPL09jRjBXYI6yM14nm1HbRE9oMm3HamyZJ0ThSQn5m63x
 4wLpjlCfgBI96qPTXLUR4FQzqWe0TNI/pMkMLl0L4wcvqCQSxJ2UwTEjkaDF7hif
 G98XPG8G2wgDqWKXD8u/su57E/X1vj893Y11QsMHcp0XwmYRiebsP+kR6sKU8m44
 CggyC7CUEhfZM7XmmQFnhOXMMvF9f7BCS4MpvAUSuPzwYWRE8CpUpbBUjQFFGZ9w
 h6er30CjOhjkc0JOT416mNvXusO3h9P4x/pyWZ33iQ1fS58ckXU=
 =iour
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-02-05' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Two serious ones here that we'll want to backport to stable: a fix for
  a race in the thread_with_file code, and another locking fixup in the
  subvolume deletion path"

* tag 'bcachefs-2024-02-05' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: time_stats: Check for last_event == 0 when updating freq stats
  bcachefs: install fd later to avoid race with close
  bcachefs: unlock parent dir if entry is not found in subvolume deletion
  bcachefs: Fix build on parisc by avoiding __multi3()
2024-02-06 07:38:31 +00:00
NeilBrown
5ea9a7c5fe nfsd: don't take fi_lock in nfsd_break_deleg_cb()
A recent change to check_for_locks() changed it to take ->flc_lock while
holding ->fi_lock.  This creates a lock inversion (reported by lockdep)
because there is a case where ->fi_lock is taken while holding
->flc_lock.

->flc_lock is held across ->fl_lmops callbacks, and
nfsd_break_deleg_cb() is one of those and does take ->fi_lock.  However
it doesn't need to.

Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each
delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list
and so needed the lock.  Since then it doesn't walk the list and doesn't
need the lock.

Two actions are performed under the lock.  One is to call
nfsd_break_one_deleg which calls nfsd4_run_cb().  These doesn't act on
the nfs4_file at all, so don't need the lock.

The other is to set ->fi_had_conflict which is in the nfs4_file.
This field is only ever set here (except when initialised to false)
so there is no possible problem will multiple threads racing when
setting it.

The field is tested twice in nfs4_set_delegation().  The first test does
not hold a lock and is documented as an opportunistic optimisation, so
it doesn't impose any need to hold ->fi_lock while setting
->fi_had_conflict.

The second test in nfs4_set_delegation() *is* make under ->fi_lock, so
removing the locking when ->fi_had_conflict is set could make a change.
The change could only be interesting if ->fi_had_conflict tested as
false even though nfsd_break_one_deleg() ran before ->fi_lock was
unlocked.  i.e. while hash_delegation_locked() was running.
As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb()
there can be no importance to this interaction.

So this patch removes the locking from nfsd_break_one_deleg() and moves
the final test on ->fi_had_conflict out of the locked region to make it
clear that locking isn't important to the test.  It is still tested
*after* vfs_setlease() has succeeded.  This might be significant and as
vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called
under ->flc_lock this "after" is a true ordering provided by a spinlock.

Fixes: edcf972515 ("nfsd: fix RELEASE_LOCKOWNER")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-02-05 09:49:47 -05:00
Kent Overstreet
7b508b323b bcachefs: time_stats: Check for last_event == 0 when updating freq stats
This fixes spurious outliers in the frequency stats.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-05 01:16:31 -05:00
Mathias Krause
dd839f31d7 bcachefs: install fd later to avoid race with close
Calling fd_install() makes a file reachable for userland, including the
possibility to close the file descriptor, which leads to calling its
'release' hook. If that happens before the code had a chance to bump the
reference of the newly created task struct, the release callback will
call put_task_struct() too early, leading to the premature destruction
of the kernel thread.

Avoid that race by calling fd_install() later, after all the setup is
done.

Fixes: 1c6fdbd8f2 ("bcachefs: Initial commit")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-02-05 01:16:15 -05:00
Linus Torvalds
3f24fcdacd Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
and extent handling code.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmW/G4YACgkQ8vlZVpUN
 gaPTpwf/c/Fk27GV8ge9PQtR6gmir/lyw2qkvK3Z+12aEsblZRmyvElyZWjAuNQG
 bciQyltabIPOA4XxfsZOrdgYI42n0rTTFG7bmI0lr+BJM/HRw0tGGMN91FZla0FP
 EXv/AiHKCqlT5OFZbD+8n1TzfdOgWotjug1VLteXve3YKjkDgt5IQm/0Gx9hKBld
 IR8SrQlD/rYe+VPvaHz5G4u09Ne5pUE5fDj3xE23wxfU5cloVzuVRCSOGWUCTnCW
 T0v6sHeKrmiLC8tIOZkBjer4nXC0MOu0p5+geAjwOArc9VJ1Lh2eAkH+GgDOVprx
 ahdl2qmbIbacBYECIeQ/+1i78+O1yw==
 =CmYr
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
  and extent handling code"

* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
  ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
  ext4: make ext4_map_blocks() distinguish delalloc only extent
  ext4: add a hole extent entry in cache after punch
  ext4: correct the hole length returned by ext4_map_blocks()
  ext4: convert to exclusive lock while inserting delalloc extents
  ext4: refactor ext4_da_map_blocks()
  ext4: remove 'needed' in trace_ext4_discard_preallocations
  ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
  ext4: remove unused return value of ext4_mb_release_group_pa
  ext4: remove unused return value of ext4_mb_release_inode_pa
  ext4: remove unused return value of ext4_mb_release
  ext4: remove unused ext4_allocation_context::ac_groups_considered
  ext4: remove unneeded return value of ext4_mb_release_context
  ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
  ext4: remove unused return value of __mb_check_buddy
  ext4: mark the group block bitmap as corrupted before reporting an error
  ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
  ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
  ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
  ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
  ...
2024-02-04 07:33:01 +00:00
Linus Torvalds
9e28c7a23b five smb3 client fixes, mostly multichannel related
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmW+oQoACgkQiiy9cAdy
 T1E6GwwAk9PQo1N8h6AI0PWCPoiHps8YzpISTgBfchDs+sIvo1sZEVMzTmSlWm4F
 cON4nb3QBkD4aqWbCdb1tjby2VAb0aHuRbCQPAczW4+R7He8ELlGYow7IPBWmyI8
 CTQRirYrJuIePh1aT2UG4mykNyQzp5i5ycsdcWFiIGliXrTp0De0rvE/60KGLuJ/
 lnDZp7+FHytIfpwVzl4bGax4odQh14whxdQme4C9a8kVfYPQGKFyADbuxy0KmWmu
 8kkEcGpY/bPdqLE1tGzNoVci9cEdYK6yUzkc8dj2ddsZ7YDHPz0NtsdWlC517qt+
 ekDADHRQZiKwyJzMGHibK8E4WoP0+JjB4j6OTc369ICzgjuIutWCCIexbS0IV/po
 IuyRQTvLSTfYpE8eoQavAKaty6sDnJcP7FepPtH2k5yGr4+ILZV4ozoH+hqzn2/K
 sf/q9ZfkyfaEi2JkMD9TTv7k2EWLMntpxklbpg59VKHY5MGvlQqoRl9b+jzfgKEZ
 xS/myr3e
 =q5wf
 -----END PGP SIGNATURE-----

Merge tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Five smb3 client fixes, mostly multichannel related:

   - four multichannel fixes including fix for channel allocation when
     multiple inactive channels, fix for unneeded race in channel
     deallocation, correct redundant channel scaling, and redundant
     multichannel disabling scenarios

   - add warning if max compound requests reached"

* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: increase number of PDUs allowed in a compound request
  cifs: failure to add channel on iface should bump up weight
  cifs: do not search for channel if server is terminating
  cifs: avoid redundant calls to disable multichannel
  cifs: make sure that channel scaling is done only once
2024-02-04 07:26:19 +00:00
Linus Torvalds
fc86e5c990 Bug fixes for 6.8-rc3:
* Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
    attribute fork.
 
  * Remove conditional compilation of realtime geometry validator functions to
    prevent confusing error messages from being printed on the console during the
    mount operation.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZbo7FAAKCRAH7y4RirJu
 9Cn+APsFEbHA8YQpCSxGDM+Xelez9X7wroi6QkyOxRP6Lqq6ogD6A96uuV86TxkQ
 Hkse9IAKkFoLmyzohT9u7Bv46M/X4w8=
 =Ez8Z
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
   attribute fork

 - Remove conditional compilation of realtime geometry validator
   functions to prevent confusing error messages from being printed on
   the console during the mount operation

* tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove conditional building of rt geometry validator functions
  xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
2024-02-04 07:22:51 +00:00
Linus Torvalds
56897d5188 Tracing and eventfs fixes for v6.8:
- Fix the return code for ring_buffer_poll_wait()
   It was returing a -EINVAL instead of EPOLLERR.
 
 - Zero out the tracefs_inode so that all fields are initialized.
   The ti->private could have had stale data, but instead of
   just initializing it to NULL, clear out the entire structure
   when it is allocated.
 
 - Fix a crash in timerlat
   The hrtimer was initialized at read and not open, but is
   canceled at close. If the file was opened and never read
   the close will pass a NULL pointer to hrtime_cancel().
 
 - Rewrite of eventfs.
   Linus wrote a patch series to remove the dentry references in the
   eventfs_inode and to use ref counting and more of proper VFS
   interfaces to make it work.
 
 - Add warning to put_ei() if ei is not set to free. That means
   something is about to free it when it shouldn't.
 
 - Restructure the eventfs_inode to make it more compact, and remove
   the unused llist field.
 
 - Remove the fsnotify*() funtions for when the inodes were being created
   in the lookup code. It doesn't make sense to notify about creation
   just because something is being looked up.
 
 - The inode hard link count was not accurate. It was being updated
   when a file was looked up. The inodes of directories were updating
   their parent inode hard link count every time the inode was created.
   That means if memory reclaim cleaned a stale directory inode and
   the inode was lookup up again, it would increment the parent inode
   again as well. Al Viro said to just have all eventfs directories
   have a hard link count of 1. That tells user space not to trust it.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZb1l/RQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qk6jAQDmecDOnx+j/Rm5krbX/meVPYXFj2CU
 1wO7w1HBzopsBwEA5AjTKm9IGrl/eVG/+jViS165b+sJfwEcblHEFPWcIwo=
 =uUzb
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing and eventfs fixes from Steven Rostedt:

 - Fix the return code for ring_buffer_poll_wait()

   It was returing a -EINVAL instead of EPOLLERR.

 - Zero out the tracefs_inode so that all fields are initialized.

   The ti->private could have had stale data, but instead of just
   initializing it to NULL, clear out the entire structure when it is
   allocated.

 - Fix a crash in timerlat

   The hrtimer was initialized at read and not open, but is canceled at
   close. If the file was opened and never read the close will pass a
   NULL pointer to hrtime_cancel().

 - Rewrite of eventfs.

   Linus wrote a patch series to remove the dentry references in the
   eventfs_inode and to use ref counting and more of proper VFS
   interfaces to make it work.

 - Add warning to put_ei() if ei is not set to free. That means
   something is about to free it when it shouldn't.

 - Restructure the eventfs_inode to make it more compact, and remove the
   unused llist field.

 - Remove the fsnotify*() funtions for when the inodes were being
   created in the lookup code. It doesn't make sense to notify about
   creation just because something is being looked up.

 - The inode hard link count was not accurate.

   It was being updated when a file was looked up. The inodes of
   directories were updating their parent inode hard link count every
   time the inode was created. That means if memory reclaim cleaned a
   stale directory inode and the inode was lookup up again, it would
   increment the parent inode again as well. Al Viro said to just have
   all eventfs directories have a hard link count of 1. That tells user
   space not to trust it.

* tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Keep all directory links at 1
  eventfs: Remove fsnotify*() functions from lookup()
  eventfs: Restructure eventfs_inode structure to be more condensed
  eventfs: Warn if an eventfs_inode is freed without is_freed being set
  tracing/timerlat: Move hrtimer_init to timerlat_fd open()
  eventfs: Get rid of dentry pointers without refcounts
  eventfs: Clean up dentry ops and add revalidate function
  eventfs: Remove unused d_parent pointer field
  tracefs: dentry lookup crapectomy
  tracefs: Avoid using the ei->dentry pointer unnecessarily
  eventfs: Initialize the tracefs inode properly
  tracefs: Zero out the tracefs_inode when allocating it
  ring-buffer: Clean ring_buffer_poll_wait() error return
2024-02-02 15:32:58 -08:00
Linus Torvalds
6b89b6af45 gfs2: revert this broken commit
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmW9FuAUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpX2A//ZE1Tb/YHSqiVHvUSH0YFyKc50H24
 Q3KqNH7LHRKBBIyIUJxZ2QaYsmmGPQqaB3RDAAQoItYb6mXaJQR0qIAxfKmcaUec
 genFrJNuaICIq/U+cQeAypnRSvOhnMfsY9c/jFPCZI72bS8mMDcGC1vvJW+R1N1V
 c5VAGuI/W1hkYOaVba3CkQ7sKcV/P+qbCI6dHB7DuCMr8L0foftItz0NIQ7tL9VM
 PMRtcUSehgsiDSH7HthvmHvC2bfBXgjgXTde9f5aqMoBkJE/QuCl4W3FCChcAFLg
 bxy1Ke6z4SF6N70BiKj+enSBDUVljOZOd/5IZOUPd6BuDcKDK/vKX6Vva5xYjnJ2
 iZOBdGFoPVAmevt9iXPehMdoWcEZOZeHzBV82+k9My5wjWdLWp6ZtFQ/zhvt42YN
 Z0EWPIKqxN4xSpQFEjczlKA1IWPv6YJTJJBPaCS86rBduU/cMvSwVFGnk3XnES2o
 k4fyBF6UUYUe3tCD0E6NfBJCgHiq4V1ZjMMXiXKAUUe6L5zSIucjws27l3VZofFo
 abjE/wxR5ad+3T0rt0sx+gFD8aiCzqKYaHgHHYAbofUXiTpEt7gMNhtNGlzUudO9
 KpkAiWGHtYUlDP1qIWR8lHTdf8Vj6KpCrcuyPVDIjWb9im6rC+6fh63PE9ddrLfI
 CqjwSoFOXRGEjm0=
 =Qv1t
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 revert from Andreas Gruenbacher:
 "It turns out that the commit to use GL_NOBLOCK flag for non-blocking
  lookups has several issues, and not all of them have a simple fix"

* tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
2024-02-02 15:30:33 -08:00
Andreas Gruenbacher
e9f1e6bb55 Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
Commit "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" has several
issues, some of which are non-trivial to fix, so revert it for now:

  https://lore.kernel.org/gfs2/20240202050230.GA875515@ZenIV/T/

This reverts commit dd00aaeb34.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-02-02 17:21:44 +01:00
Zhang Yi
ec9d669eba ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
Since ext4_map_blocks() can recognize a delayed allocated only extent,
make ext4_set_iomap() can also recognize it, and remove the useless
separate check in ext4_iomap_begin_report().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01 23:59:21 -05:00
Zhang Yi
874eaba96d ext4: make ext4_map_blocks() distinguish delalloc only extent
Add a new map flag EXT4_MAP_DELAYED to indicate the mapping range is a
delayed allocated only (not unwritten) one, and making
ext4_map_blocks() can distinguish it, no longer mixing it with holes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01 23:59:21 -05:00
Zhang Yi
9f1118223a ext4: add a hole extent entry in cache after punch
In order to cache hole extents in the extent status tree and keep the
hole length as long as possible, re-add a hole entry to the cache just
after punching a hole.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01 23:59:21 -05:00
Zhang Yi
6430dea07e ext4: correct the hole length returned by ext4_map_blocks()
In ext4_map_blocks(), if we can't find a range of mapping in the
extents cache, we are calling ext4_ext_map_blocks() to search the real
path and ext4_ext_determine_hole() to determine the hole range. But if
the querying range was partially or completely overlaped by a delalloc
extent, we can't find it in the real extent path, so the returned hole
length could be incorrect.

Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc
extent, but it searches start from the expanded hole_start, doesn't
start from the querying range, so the delalloc extent found could not be
the one that overlaped the querying range, plus, it also didn't adjust
the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle
delalloc and insert adjusted hole extent in ext4_ext_determine_hole().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01 23:47:02 -05:00
Zhang Yi
acf795dc16 ext4: convert to exclusive lock while inserting delalloc extents
ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem
when inserting delalloc extents, it could be raced by another querying
path of ext4_map_blocks() without i_rwsem, .e.g buffered read path.
Suppose we buffered read a file containing just a hole, and without any
cached extents tree, then it is raced by another delayed buffered write
to the same area or the near area belongs to the same hole, and the new
delalloc extent could be overwritten to a hole extent.

 pread()                           pwrite()
  filemap_read_folio()
   ext4_mpage_readpages()
    ext4_map_blocks()
     down_read(i_data_sem)
     ext4_ext_determine_hole()
     //find hole
     ext4_ext_put_gap_in_cache()
      ext4_es_find_extent_range()
      //no delalloc extent
                                    ext4_da_map_blocks()
                                     down_read(i_data_sem)
                                     ext4_insert_delayed_block()
                                     //insert delalloc extent
      ext4_es_insert_extent()
      //overwrite delalloc extent to hole

This race could lead to inconsistent delalloc extents tree and
incorrect reserved space counter. Fix this by converting to hold
i_data_sem in exclusive mode when adding a new delalloc extent in
ext4_da_map_blocks().

Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01 23:47:02 -05:00
Zhang Yi
3fcc2b887a ext4: refactor ext4_da_map_blocks()
Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary
parameters and branches, no logic changes.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-02-01 23:47:02 -05:00
Linus Torvalds
49a4be2c84 Description for this pull request:
- Fix BUG in iov_iter_revert reported from syzbot.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmW7jWYWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCJvdEACPLj/xOrjTCzOPam6iKaf4NXGF
 8LlAtHv9nDwKIXoA3lqMFGkc74cx2JOk/rnuO4l2SE75eWdlTHjo27XKFyDoakcF
 kij+1bw7BUy4AwgANc4jIUTFYCx8mbzJF18eTuJdrs2h0+3rFHhcs9PpPmLfjDbZ
 BGY6Xhc/a/1/56lA6IMsioLHBiOn7FCVl+I+joj+I0etMfqKg3mkro2LJ2AaiCjv
 kvsiZ8A3HHgHM8X3JEOqqzokhTrbtQ6Thrit1NSH7e/JdjKSZvkAIv81otU6Fect
 vLQSz4mfRdfzAhwrrX+cTSLljkgv7OwD9AfeyI2Fhhu0lQjzjIFknTghZsIreQm4
 1YnAfKukWJ8AfzVmAMQVbqtWfP3WAs2gHYgoIibm8VIkjHEOmPHnK83cJCqFlq92
 OvQZsqZK2u3hXj2L3fl+1sQlSt+bcnpSE/vEDxpEwxVN3YifY9pJoyhP09RjH1GE
 /VnSHgX1laVeaNXinwlMKL1uTTVCjuosm4zFzByAQwwka6YCG4nLoHVZ6u2M26Bb
 a/tw9RTeGnlQiMy/rb5QdxKTBVCuZM8eMbcaVc5yNnD9P7yRqbRNaI0SnmBjOU7I
 V6bNqJUZwlPjQMeFJ0cUB5uaYopioA7lCjA0Hv8Eh+FX6v9g7oJvIXiGFPb8o4Pq
 adUVsyGvxbVBabdAjg==
 =CryD
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat fix from Namjae Jeon:

 - Fix BUG in iov_iter_revert reported from syzbot

* tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix zero the unwritten part for dio read
2024-02-01 11:45:53 -08:00
Paulo Alcantara
11d4d1dba3 smb: client: increase number of PDUs allowed in a compound request
With the introduction of SMB2_OP_QUERY_WSL_EA, the client may now send
5 commands in a single compound request in order to query xattrs from
potential WSL reparse points, which should be fine as we currently
allow up to 5 PDUs in a single compound request.  However, if
encryption is enabled (e.g. 'seal' mount option) or enforced by the
server, current MAX_COMPOUND(5) won't be enough as we require an extra
PDU for the transform header.

Fix this by increasing MAX_COMPOUND to 7 and, while we're at it, add
an WARN_ON_ONCE() and return -EIO instead of -ENOMEM in case we
attempt to send a compound request that couldn't include the extra
transform header.

Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01 12:15:51 -06:00
Shyam Prasad N
6aac002bcf cifs: failure to add channel on iface should bump up weight
After the interface selection policy change to do a weighted
round robin, each iface maintains a weight_fulfilled. When the
weight_fulfilled reaches the total weight for the iface, we know
that the weights can be reset and ifaces can be allocated from
scratch again.

During channel allocation failures on a particular channel,
weight_fulfilled is not incremented. If a few interfaces are
inactive, we could end up in a situation where the active
interfaces are all allocated for the total_weight, and inactive
ones are all that remain. This can cause a situation where
no more channels can be allocated further.

This change fixes it by increasing weight_fulfilled, even when
channel allocation failure happens. This could mean that if
there are temporary failures in channel allocation, the iface
weights may not strictly be adhered to. But that's still okay.

Fixes: a6d8fb54a5 ("cifs: distribute channels across interfaces based on speed")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01 12:13:05 -06:00
Shyam Prasad N
88675b22d3 cifs: do not search for channel if server is terminating
In order to scale down the channels, the following sequence
of operations happen:
1. server struct is marked for terminate
2. the channel is deallocated in the ses->chans array
3. at a later point the cifsd thread actually terminates the server

Between 2 and 3, there can be calls to find the channel for
a server struct. When that happens, there can be an ugly warning
that's logged. But this is expected.

So this change does two things:
1. in cifs_ses_get_chan_index, if server->terminate is set, return
2. always make sure server->terminate is set with chan_lock held

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01 12:12:17 -06:00
Shyam Prasad N
e77e15fa5e cifs: avoid redundant calls to disable multichannel
When the server reports query network interface info call
as unsupported following a tree connect, it means that
multichannel is unsupported, even if the server capabilities
report otherwise.

When this happens, cifs_chan_skip_or_disable is called to
disable multichannel on the client. However, we only need
to call this when multichannel is currently setup.

Fixes: f591062bdb ("cifs: handle servers that still advertise multichannel after disabling")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-02-01 12:10:57 -06:00
Steven Rostedt (Google)
ca185770db eventfs: Keep all directory links at 1
The directory link count in eventfs was somewhat bogus. It was only being
updated when a directory child was being looked up and not on creation.

One solution would be to update in get_attr() the link count by iterating
the ei->children list and then adding 2. But that could slow down simple
stat() calls, especially if it's done on all directories in eventfs.

Another solution would be to add a parent pointer to the eventfs_inode
and keep track of the number of sub directories it has on creation. But
this adds overhead for something not really worthwhile.

The solution decided upon is to keep all directory links in eventfs as 1.
This tells user space not to rely on the hard links of directories. Which
in this case it shouldn't.

Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis.org

Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Fixes: c1504e5102 ("eventfs: Implement eventfs dir creation functions")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01 11:53:53 -05:00
Steven Rostedt (Google)
12d823b31f eventfs: Remove fsnotify*() functions from lookup()
The dentries and inodes are created when referenced in the lookup code.
There's no reason to call fsnotify_*() functions when they are created by
a reference. It doesn't make any sense.

Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis.org

Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Fixes: a376007917 ("eventfs: Implement functions to create files and dirs when accessed");
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01 11:53:53 -05:00
Steven Rostedt (Google)
264424dfdd eventfs: Restructure eventfs_inode structure to be more condensed
Some of the eventfs_inode structure has holes in it. Rework the structure
to be a bit more condensed, and also remove the no longer used llist
field.

Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis.org

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01 11:53:52 -05:00
Steven Rostedt (Google)
5a49f99604 eventfs: Warn if an eventfs_inode is freed without is_freed being set
There should never be a case where an evenfs_inode is being freed without
is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would
mean there was one too many put_ei()s.

Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis.org

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01 11:53:52 -05:00
Linus Torvalds
43aa6f97c2 eventfs: Get rid of dentry pointers without refcounts
The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer.  That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.

There were two reasonms why eventfs tried to keep a pointer to a dentry:

 - the creation of a 'events' directory would actually have a stable
   dentry pointer that it created with tracefs_start_creating().

   And it needed that dentry when tearing it all down again in
   eventfs_remove_events_dir().

   This use is actually ok, because the special top-level events
   directory dentries are actually stable, not just a temporary cache of
   the eventfs data structures.

 - the 'eventfs_inode' (aka ei) needs to stay around as long as there
   are dentries that refer to it.

   It then used these dentry pointers as a replacement for doing
   reference counting: it would try to make sure that there was only
   ever one dentry associated with an event_inode, and keep a child
   dentry array around to see which dentries might still refer to the
   parent ei.

This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.

The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei.  As
does the directory dentries.  That makes the broken use case go away.

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e5102 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-02-01 10:31:17 -05:00