Commit Graph

94366 Commits

Author SHA1 Message Date
Linus Torvalds
63fa605041 Changes since last update:
- Make sure only regular inodes can be used for file-backed mounts;
 
  - Two minor codebase cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmcNHqARHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qo1XA/+MFbobJ4bWxJQKnouLlCiFQ5C1xEFbVn2
 HasHLfrMIcdz/n/S3Ib4Ayi+9W0zM2Ekq9EG+fuOBqjZP17+EOj3e7OPtVVPNwx0
 u2GbD9zNCliZg9PigCfPO+6oImt6l/Mytmx+7bELqbMywAy7JNCNesJuyycsTcja
 o1I3dNNUZdppilohXPIENTRLjBlOuGBaZdUXDih0LqB+Pb0jgXTP6JfD88h1MLFw
 xBbhqQ1A/GgyESfsMpZFn2xvFIocLBCIAdAehi9M1AiEwCTjGkTZ66WW3H6Es/Zp
 vcC9KjHJoGGCXxZf8mnoQHQo/WqQuNUPc2BVf9iExzCo0nwRArcTbAu5Bskqg0LF
 c+a7FrrxhODz8ioxOOiMUqG4b3/qGkzlk6w5a/t7IRrmFtmcXmPWZ14aI8qpy7o/
 CW3iPUoF/zEsmmFvOgJtHwy3g+bC8KhDvz3fqFIDSSMjSKjqb4cPYSe/L5MyhwED
 wmLgp1uYjEyR0uuqqUp93FEYIbHuO5HpPRT5crLczRIoYn7bXRhjNWLCTmzlLqrj
 yDAQsrngK99BQ7g0FTQ/OV9si/HRRGsusZmCkeCb6KnRNIvml4X9/WXKc1ioOFk/
 3MSaxlQlTXzCCctjVCDNn9GfD/yR1cXu2sUpGSEnP1ssLG4ARyXGVfoeSw7gJ4xn
 C5lm9SOmkzU=
 =0gFG
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "The main one fixes a syzbot issue due to the invalid inode type out of
  file-backed mounts. The others are minor cleanups without actual logic
  changes.

  Summary:

   - Make sure only regular inodes can be used for file-backed mounts

   - Two minor codebase cleanups"

* tag 'erofs-for-6.12-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: get rid of kaddr in `struct z_erofs_maprecorder`
  erofs: get rid of z_erofs_try_to_claim_pcluster()
  erofs: ensure regular inodes for file-backed mounts
2024-10-14 11:12:09 -07:00
Linus Torvalds
cfea70e835 two fixes for Windows symlink handling
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmcMBf4ACgkQiiy9cAdy
 T1Hf7Qv/f/TEXZisWIGshUpIerxOAWmN70bTw4sNID9ge8mVWwtVJBs57rlSjPTc
 97Jj95urqnKEAGk/KC8qntp5QCMBQAeBFILigZph2c7vqEXPQy0dpbDUEUFuRN2G
 mq0wn7IcJZcPJmhZGx9JJeteHk/24drJRSM+jyklwI2Rmev6Y6dlsv4JyMuvP7iI
 YuCdbN7rYXsRBkpnK5AbiWCRdxwQMiMuGsppNQyBVSZKkt/g+8R16Z6WKxSbkaZf
 XajVsywhlP5Bg9HRAk/YTPK4enKVi8ISp9qfS9EuinwM/VFzEnXnYrec/fiD0Ukg
 rEemM7iF/YQdQq/2q8gm5KpoOjnLbaew+Zb+OoWyXMK7RJygD79+uMHn3v1cdi7B
 BWCgbQQ7KiRi6rOo0Xzz8Rmw3L4+DHjTvIbh46jz90qQyuumR2hUSa7cPl2ATO4l
 lxA50Q8xPE1i0Cfob1w/XHlrfmWMyovtHSKDvaeOMclp/VAHDfS6nB0x/ngyY8UH
 ii2czaDd
 =uI8y
 -----END PGP SIGNATURE-----

Merge tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Two fixes for Windows symlink handling"

* tag '6.12-rc2-cifs-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix creating native symlinks pointing to current or parent directory
  cifs: Improve creating native symlinks pointing to directory
2024-10-13 10:52:39 -07:00
Linus Torvalds
6254d53727 NFS Client Bugfixes for Linux 6.12-rc
Localio Bugfixes:
   * Remove duplicated include in localio.c
   * Fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
   * Fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
   * Fix nfsd_file tracepoints to handle NULL rqstp pointers
 
 Other Bugfixes:
   * Fix program selection loop in svc_process_common
   * Fix integer overflow in decode_rc_list()
   * Prevent NULL-pointer dereference in nfs42_complete_copies()
   * Fix CB_RECALL performance issues when using a large number of delegations
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmcJjjQACgkQ18tUv7Cl
 QOvgJw/6A33s+pjyBVLIKT6oMCPkUJeQ4Rhg9Je0Qw/ji0eFkT4Eyd65kRz3T9M/
 qRrCfWaUd2dTYcbKQyhuGTlEfICZa9R4I0/Ztk9yvf9xcd1xFXKzTkFekGUVeHQA
 OcngDu9psFxhvyzKI8nAHs1ephX/T7TywvTKANMRbeRCYYvVkytAt9YeVMigYZa5
 dnchoUdGUdL6B6RXCU/Qhf0A1uYyA4hkk/FTBCPgv+kYx5pnjFq0y/yIIHDzCR3I
 +yE1ss3EpVTQgt2Ca/cmDyYXsa7G8G51U7cS5AeIoXfsf1EGtTujowWcBY4oqFEC
 ixx58fQe48AqwsP5XDZn8gnsuYH9snnw5rIB0IVqq55/a+XLMupHayyf/iziMV3s
 JWgT4gKDyFca2pT+bJ8iWweU+ecRYxKGnh2NydyBiqowogsHZm4uKh0vELvqqkBd
 RIjCyIiQVhYBII2jqpjRnxrqhGUT5XO99NQdQIGV0bUjCEP4YAjY4ChfEVcWXhnB
 ppyBP+r8N5O77NcVqsVQS26U0/jb9K30LyYl9VT43ank3d+VVtHA5ZqnUflWtwuc
 2XiGDvXW9mIvbVraWIZXUNVy39bzRclDf5bx4jeYLnKCMym81rkEIBOvBKQKZTrl
 v+1Nhaj+fSw+rFSUm0KPqms0UDiT0Ol7ltu84ifadYqubbSEbqU=
 =QBvR
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Localio Bugfixes:
   - remove duplicated include in localio.c
   - fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
   - fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
   - fix nfsd_file tracepoints to handle NULL rqstp pointers

  Other Bugfixes:
   - fix program selection loop in svc_process_common
   - fix integer overflow in decode_rc_list()
   - prevent NULL-pointer dereference in nfs42_complete_copies()
   - fix CB_RECALL performance issues when using a large number of
     delegations"

* tag 'nfs-for-6.12-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: remove revoked delegation from server's delegation list
  nfsd/localio: fix nfsd_file tracepoints to handle NULL rqstp
  nfs_common: fix Kconfig for NFS_COMMON_LOCALIO_SUPPORT
  nfs_common: fix race in NFS calls to nfsd_file_put_local() and nfsd_serv_put()
  NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
  SUNRPC: Fix integer overflow in decode_rc_list()
  sunrpc: fix prog selection loop in svc_process_common
  nfs: Remove duplicated include in localio.c
2024-10-11 15:37:15 -07:00
Gao Xiang
ae54567eaa erofs: get rid of kaddr in struct z_erofs_maprecorder
`kaddr` becomes useless after switching to metabuf.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241010235830.1535616-1-hsiangkao@linux.alibaba.com
2024-10-11 13:36:58 +08:00
Gao Xiang
2402082e53 erofs: get rid of z_erofs_try_to_claim_pcluster()
Just fold it into the caller for simplicity.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241010090420.405871-1-hsiangkao@linux.alibaba.com
2024-10-11 13:36:58 +08:00
Gao Xiang
416a8b2c02 erofs: ensure regular inodes for file-backed mounts
Only regular inodes are allowed for file-backed mounts, not directories
(as seen in the original syzbot case) or special inodes.

Also ensure that .read_folio() is implemented on the underlying fs
for the primary device.

Fixes: fb17675026 ("erofs: add file-backed mount support")
Reported-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/00000000000011bdde0622498ee3@google.com
Tested-by: syzbot+001306cd9c92ce0df23f@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240917130803.32418-1-hsiangkao@linux.alibaba.com
2024-10-11 13:36:41 +08:00
Linus Torvalds
eb952c47d1 for-6.12-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmcH4qAACgkQxWXV+ddt
 WDtDig//czntfO+iRvDERZWTIB6vdVExfLd3r3ZNYlO1pIvgCuvqx3iYva+0ZhGW
 8A+gcRax7cz0jCaxDp/+5lIRfdNxZH6/LwjZsDgU8Ly7himeRmwhtn2fCgNeiH/K
 bUl92+ZMo2vwqTKXYa3xF1g3Hz6cRXVW7gJrMwNhb1hpPTGx+lgYJU02m/Io/vjK
 1jcrZ84OEPIOY5uiAoDyO2hgsT/zVEeuuOiSTpKSzrghPbo0vmjLiYJ5T+CE5Uw3
 u3w7/Fqnw49NwucqtncvyFFDXY9EWNuQhowi3hqJgOYTInqwwJigIpQV0hDDwYxb
 ohGUGjazGfAEf/cy1jZXMbwCVgg8/Nj9x0eDKKhfs19VYUbMkEYQ8LKRTUlCeBwS
 H/2AmqpqHEEO+tPY3P+w6MVwkNho8JNpWPdP5OzJs7XrD067IViOjD06HPM/k5ci
 TU3zp9NYvgHVtmfZK1Aqsg9OYVhI1klVXejmlAzOLxejRPWXK/1hBw3kXbC6I+k1
 50l0Yh1dgEnclMI3yWsKoj8IYUAkh2eudt0pNsot4a5vICMY++NVS2eukdz5UcEz
 ix7hcpYcCcmzoOaelyEgmdAncWVGJT5w2Nzy85YaOp+Z1C65Ywb41utU+sSY+swB
 kZfwl9vrsfu754vX7UKBherCvvYo+Lnj3GeX8Oe+1LoT2BP0TPk=
 =lTqc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - update fstrim loop and add more cancellation points, fix reported
   delayed or blocked suspend if there's a huge chunk queued

 - fix error handling in recent qgroup xarray conversion

 - in zoned mode, fix warning printing device path without RCU
   protection

 - again fix invalid extent xarray state (6252690f7e), lost due to
   refactoring

* tag 'for-6.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
  btrfs: zoned: fix missing RCU locking in error message when loading zone info
  btrfs: fix missing error handling when adding delayed ref with qgroups enabled
  btrfs: add cancellation points to trim loops
  btrfs: split remaining space to discard in chunks
2024-10-10 10:02:59 -07:00
Linus Torvalds
5870963f6c nfsd-6.12 fixes:
- Fix NFSD bring-up / shutdown
 - Fix a UAF when releasing a stateid
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmcHx6IACgkQM2qzM29m
 f5ecExAAheSpSP9D+1n3hKeOlNfhY8FUzd41Arn4NYV+jIBJbtx94/FlSNOxA0mp
 Ovm1I8uAxy4TR8TLt7tsxfT7JDOStKwXFl3QlOUZT/+uyyJr7/q5R959R3oMiccR
 Rfrpj6j2yYWrI8qGDGHca4Vv2bDSxr4mzztWwDe0SHSsjwf4OAcv+XF5vcfZ/CJN
 Bxulb9WfNU8XvdFcRDHokMfk6jiY6/+FCTwX8ckvbVEG6gHT8+CRYSUJ05j0LJGo
 xKZV913NgzcuV7PH0vq6vExJE6+rEPt/ejDAT5FM5yeNe+WJ4RTDgsYyIr9iLbHF
 mWB9M4NnP+EZhejtOCbZ9RZjjKro09ilEPpqILuuGQPtcHSeWmhNbFz0kwLe+zYZ
 CdtjnPZhjB0ITWgZ1HCtoJ8k/ZcMa7iiM/kApMLGr9fVj8/BHHFzS95PK7K/Fqur
 FLdhvo6CzZCnRd16e2kqWsG7wO2lPWcz4NWTf9wxIG5GCunXoVCEnK1VfHvnldbH
 BIFXZ+ib5qnL2i3Qmz7bQxmfIp5ryZnNx1mF0OM8imR9K/rsnARd7JfQ99lpMy8D
 mD4coZVTMMk/Zg9zuH8k5GBzB2zXXqgngp4IJIxqrKR7/AsuSU3R7r+O9CWN91GQ
 GKpRtMn/rVUg81jxDr3qoKquyxONoyVrVXAKsj1PgUSQdjUJgqU=
 =Rud7
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix NFSD bring-up / shutdown

 - Fix a UAF when releasing a stateid

* tag 'nfsd-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: fix possible badness in FREE_STATEID
  nfsd: nfsd_destroy_serv() must call svc_destroy() even if nfsd_startup_net() failed
  NFSD: Mark filecache "down" if init fails
2024-10-10 09:52:49 -07:00
Linus Torvalds
825ec756af Bug fixes for 6.12-rc3
* A few small typo fixes
 * fstests xfs/538 DEBUG-only fix
 * Performance fix on blockgc on COW'ed files,
   by skipping trims on cowblock inodes currently
   opened for write
 * Prevent cowblocks to be freed under dirty pagecache
   during unshare
 * Update MAINTAINERS file to quote the new maintainer
 
 Signed-off-by: Carlos Maiolino <cem@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iJUEABMJAB0WIQQMHYkcUKcy4GgPe2RGdaER5QtfpgUCZwY6mgAKCRBGdaER5Qtf
 poE8AYCZzMJr9wMrs2RsWRnaEhMRJNZIPQmSKXgHAK3mV5AbXtdHRc8yGVNHf+mW
 Nh0fwAkBf1Ix0VJWkXOSFHZI9O2lLRsCogbNjFhwYF0MHZch2/mq1Wa4Tj1SDlfg
 Ny2PJBNHyA==
 =OkRo
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

 - A few small typo fixes

 - fstests xfs/538 DEBUG-only fix

 - Performance fix on blockgc on COW'ed files, by skipping trims on
   cowblock inodes currently opened for write

 - Prevent cowblocks to be freed under dirty pagecache during unshare

 - Update MAINTAINERS file to quote the new maintainer

* tag 'xfs-6.12-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix a typo
  xfs: don't free cowblocks from under dirty pagecache on unshare
  xfs: skip background cowblock trims on inodes open for write
  xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
  xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
  xfs: don't ifdef around the exact minlen allocations
  xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
  xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
  xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
  xfs: return bool from xfs_attr3_leaf_add
  xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
  xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
  xfs: scrub: convert comma to semicolon
  xfs: Remove empty declartion in header file
  MAINTAINERS: add Carlos Maiolino as XFS release manager
2024-10-10 09:45:45 -07:00
Linus Torvalds
d3d1556696 12 hotfixes, 5 of which are c:stable. All singletons, about half of which
are MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZwcILgAKCRDdBJ7gKXxA
 jjnMAQDRl+UfscRUeMippi7wnL3ee6MKyhhZVOhoxP24uB7yBwD/Ulq4oE+mLHml
 YTlK/wj5qTZIsdxGaBzM1yifqp3L7gU=
 =lFmJ
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "12 hotfixes, 5 of which are c:stable. All singletons, about half of
  which are MM"

* tag 'mm-hotfixes-stable-2024-10-09-15-46' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm: zswap: delete comments for "value" member of 'struct zswap_entry'.
  CREDITS: sort alphabetically by name
  secretmem: disable memfd_secret() if arch cannot set direct map
  .mailmap: update Fangrui's email
  mm/huge_memory: check pmd_special() only after pmd_present()
  resource, kunit: fix user-after-free in resource_test_region_intersects()
  fs/proc/kcore.c: allow translation of physical memory addresses
  selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
  device-dax: correct pgoff align in dax_set_mapping()
  kthread: unpark only parked kthread
  Revert "mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN"
  bcachefs: do not use PF_MEMALLOC_NORECLAIM
2024-10-09 16:01:40 -07:00
Alexander Gordeev
3d5854d75e fs/proc/kcore.c: allow translation of physical memory addresses
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead.  That leads to a wrong read.

Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks.  That
way an architecture can deal with specific physical memory ranges.

Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.

For other architectures the default callback is basically NOP.  It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.

Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09 12:47:19 -07:00
Michal Hocko
9897713fe1 bcachefs: do not use PF_MEMALLOC_NORECLAIM
Patch series "remove PF_MEMALLOC_NORECLAIM" v3.


This patch (of 2):

bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
inode to achieve GFP_NOWAIT semantic while holding locks. If this
allocation fails it will drop locks and use GFP_NOFS allocation context.

We would like to drop PF_MEMALLOC_NORECLAIM because it is really
dangerous to use if the caller doesn't control the full call chain with
this flag set. E.g. if any of the function down the chain needed
GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
cause unexpected failure.

While this is not the case in this particular case using the scoped gfp
semantic is not really needed bacause we can easily pus the allocation
context down the chain without too much clutter.

[akpm@linux-foundation.org: fix kerneldoc warnings]
Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org
Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz> # For vfs changes
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-09 12:47:18 -07:00
Dai Ngo
7ef6010806 NFS: remove revoked delegation from server's delegation list
After the delegation is returned to the NFS server remove it
from the server's delegations list to reduce the time it takes
to scan this list.

Network trace captured while running the below script shows the
time taken to service the CB_RECALL increases gradually due to
the overhead of traversing the delegation list in
nfs_delegation_find_inode_server.

The NFS server in this test is a Solaris server which issues
CB_RECALL when receiving the all-zero stateid in the SETATTR.

mount=/mnt/data
for i in $(seq 1 20)
do
   echo $i
   mkdir $mount/testtarfile$i
   time  tar -C $mount/testtarfile$i -xf 5000_files.tar
done

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-09 15:39:22 -04:00
Linus Torvalds
ff9d4099e6 unicode updates
* Patch to handle code-points with the Ignorable property as regular
 character instead of treating them as an empty string. (Me)
 
 Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQS3XO7QfvpFoONBhH1OwQgI3t8RJgUCZwbNtQAKCRBOwQgI3t8R
 JrlrAP4yCrZCp4YPlXO6oQGfS9RIeYpmcMzGmp1IAeqlzpB5qwD/YS53kiAzF4qV
 +eD2fl/O4qNhZcWqBZKSH4shZBbXJAg=
 =XCsY
 -----END PGP SIGNATURE-----

Merge tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

Pull unicode fix from Gabriel Krisman Bertazi:

 - Handle code-points with the Ignorable property as regular character
   instead of treating them as an empty string (me)

* tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: Don't special case ignorable code points
2024-10-09 12:22:02 -07:00
Gabriel Krisman Bertazi
5c26d2f1d3 unicode: Don't special case ignorable code points
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
2024-10-09 13:34:01 -04:00
Naohiro Aota
e761be2a07 btrfs: fix clear_dirty and writeback ordering in submit_one_sector()
This commit is a replay of commit 6252690f7e ("btrfs: fix invalid
mapping of extent xarray state"). We need to call
btrfs_folio_clear_dirty() before btrfs_set_range_writeback(), so that
xarray DIRTY tag is cleared.

With a refactoring commit 8189197425 ("btrfs: refactor
__extent_writepage_io() to do sector-by-sector submission"), it screwed
up and the order is reversed and causing the same hang. Fix the ordering
now in submit_one_sector().

Fixes: 8189197425 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-09 13:23:51 +02:00
Filipe Manana
fe4cd7ed12 btrfs: zoned: fix missing RCU locking in error message when loading zone info
At btrfs_load_zone_info() we have an error path that is dereferencing
the name of a device which is a RCU string but we are not holding a RCU
read lock, which is incorrect.

Fix this by using btrfs_err_in_rcu() instead of btrfs_err().

The problem is there since commit 08e11a3db0 ("btrfs: zoned: load zone's
allocation offset"), back then at btrfs_load_block_group_zone_info() but
then later on that code was factored out into the helper
btrfs_load_zone_info() by commit 09a46725cc ("btrfs: zoned: factor out
per-zone logic from btrfs_load_block_group_zone_info").

Fixes: 08e11a3db0 ("btrfs: zoned: load zone's allocation offset")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-09 13:23:51 +02:00
Andrew Kreimer
77bfe1b11e xfs: fix a typo
Fix a typo in comments.

Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-09 10:05:26 +02:00
Brian Foster
4390f019ad xfs: don't free cowblocks from under dirty pagecache on unshare
fallocate unshare mode explicitly breaks extent sharing. When a
command completes, it checks the data fork for any remaining shared
extents to determine whether the reflink inode flag and COW fork
preallocation can be removed. This logic doesn't consider in-core
pagecache and I/O state, however, which means we can unsafely remove
COW fork blocks that are still needed under certain conditions.

For example, consider the following command sequence:

xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
	-c "pwrite 0 32k" -c "funshare 0 1k" <file>

This allocates a data block at offset 0, shares it, and then
overwrites it with a larger buffered write. The overwrite triggers
COW fork preallocation, 32 blocks by default, which maps the entire
32k write to delalloc in the COW fork. All but the shared block at
offset 0 remains hole mapped in the data fork. The unshare command
redirties and flushes the folio at offset 0, removing the only
shared extent from the inode. Since the inode no longer maps shared
extents, unshare purges the COW fork before the remaining 28k may
have written back.

This leaves dirty pagecache backed by holes, which writeback quietly
skips, thus leaving clean, non-zeroed pagecache over holes in the
file. To verify, fiemap shows holes in the first 32k of the file and
reads return different data across a remount:

$ xfs_io -c "fiemap -v" <file>
<file>:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   ...
   1: [8..511]:        hole               504
   ...
$ xfs_io -c "pread -v 4k 8" <file>
00001000:  cd cd cd cd cd cd cd cd  ........
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 4k 8" <file>
00001000:  00 00 00 00 00 00 00 00  ........

To avoid this problem, make unshare follow the same rules used for
background cowblock scanning and never purge the COW fork for inodes
with dirty pagecache or in-flight I/O.

Fixes: 46afb0628b ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-09 10:05:10 +02:00
Linus Torvalds
5b7c893ed5 Changes for 6.12-rc3
New:
 	implement fallocate for compressed files;
 	add support for the compression attribute;
 	optimize large writes to sparse files.
 Fixed:
 	fix several potential deadlock scenarios;
 	fix various internal bugs detected by syzbot;
 	add checks before accessing NTFS structures during parsing;
 	correct the format of output messages.
 Refactored:
 	replace fsparam_flag_no with fsparam_flag in options parser;
 	remove unused functions and macros.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmcFRkgACgkQqbAzH4Mk
 B7aQdQ//VT6lzBaxSZ0dm3xs/ygQYE3ter1OczAMQxh70O5px+Z0li5WCXQQe31W
 I74MmAy2G7C+JdzMkSc4F7+pFF21DO+1WQjODl9rP7yoHiLteiLFkMI+P5tHuFFT
 VPupAiCDQgm6QCtW2wofp/WR0ooxpLzC/yQsILrhUXJHdEY21oHLy/MQSOhVnv2k
 VeFdWA70+YS+JEQ+YyZBSeGeMVMQG9VROA57dZ+eAy1vgBUJGcKhYp9v4eiWG1uT
 A8rnV81FiHu+49OrE+ZwStguHkxu30dvlHuOqKW0pLW9/XoKCFf9w0WqLOGrAxLs
 lKznVgQZ521+QW+lNURMHBZt/bqQxQQInuOOH7jwLyBh0tmKoZlIU9Vw4V3yrm3t
 9NELb0FR/FeH02O89nMHl0SeaxDJvgFHDnfoOShuHaffur61d4RjChxqB4QcibOl
 Ibvtns71Uo89ngPIXdeTJK+7XZQVmAZHBXXYL+Rwr84Yv0tyn2q8DbrqzmphiYlR
 wbLyVBGqtx0ZcN/geuFLiVt2W/HSMPocXqrxv791ALX6cryL+CZdolQP7vWjt4Vh
 rKoaRZYN8fdcfT+BIYy95Kz0k12T7ColOoMfxVwugxK9ypF6tH6BJBwYXjCW4MOv
 AhkszJcsISxMKidsbzZA9xESM6M3sLHcW0swjAwWWRhc5zbSe6E=
 =j71V
 -----END PGP SIGNATURE-----

Merge tag 'ntfs3_for_6.12' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:
"New:
   - implement fallocate for compressed files
   - add support for the compression attribute
   - optimize large writes to sparse files

 Fixes:
   - fix several potential deadlock scenarios
   - fix various internal bugs detected by syzbot
   - add checks before accessing NTFS structures during parsing
   - correct the format of output messages

  Refactoring:
   - replace fsparam_flag_no with fsparam_flag in options parser
   - remove unused functions and macros"

* tag 'ntfs3_for_6.12' of https://github.com/Paragon-Software-Group/linux-ntfs3: (25 commits)
  fs/ntfs3: Format output messages like others fs in kernel
  fs/ntfs3: Additional check in ntfs_file_release
  fs/ntfs3: Fix general protection fault in run_is_mapped_full
  fs/ntfs3: Sequential field availability check in mi_enum_attr()
  fs/ntfs3: Additional check in ni_clear()
  fs/ntfs3: Fix possible deadlock in mi_read
  ntfs3: Change to non-blocking allocation in ntfs_d_hash
  fs/ntfs3: Remove unused al_delete_le
  fs/ntfs3: Rename ntfs3_setattr into ntfs_setattr
  fs/ntfs3: Replace fsparam_flag_no -> fsparam_flag
  fs/ntfs3: Add support for the compression attribute
  fs/ntfs3: Implement fallocate for compressed files
  fs/ntfs3: Make checks in run_unpack more clear
  fs/ntfs3: Add rough attr alloc_size check
  fs/ntfs3: Stale inode instead of bad
  fs/ntfs3: Refactor enum_rstbl to suppress static checker
  fs/ntfs3: Fix sparse warning in ni_fiemap
  fs/ntfs3: Fix warning possible deadlock in ntfs_set_state
  fs/ntfs3: Fix sparse warning for bigendian
  fs/ntfs3: Separete common code for file_read/write iter/splice
  ...
2024-10-08 10:53:06 -07:00
Filipe Manana
6ef8fbce01 btrfs: fix missing error handling when adding delayed ref with qgroups enabled
When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
if we fail to insert the qgroup record we don't error out, we ignore it.
In fact we treat it as if there was no error and there was already an
existing record - we don't distinguish between the cases where
btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
existed and we can free the given record, and the case where it returns
a negative error value, meaning the insertion into the xarray that is
used to track records failed.

Effectively we end up ignoring that we are lacking qgroup record in the
dirty extents xarray, resulting in incorrect qgroup accounting.

Fix this by checking for errors and return them to the callers.

Fixes: 3cce39a8ca ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-07 23:22:30 +02:00
Luca Stefani
69313850dc btrfs: add cancellation points to trim loops
There are reports that system cannot suspend due to running trim because
the task responsible for trimming the device isn't able to finish in
time, especially since we have a free extent discarding phase, which can
trim a lot of unallocated space. There are no limits on the trim size
(unlike the block group part).

Since trime isn't a critical call it can be interrupted at any time,
in such cases we stop the trim, report the amount of discarded bytes and
return an error.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-07 23:21:56 +02:00
Luca Stefani
a99fcb0158 btrfs: split remaining space to discard in chunks
Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
mostly empty although we will do the split according to our super block
locations, the last super block ends at 256G, we can submit a huge
discard for the range [256G, 8T), causing a large delay.

Split the space left to discard based on BTRFS_MAX_DISCARD_CHUNK_SIZE in
preparation of introduction of cancellation points to trim. The value
of the chunk size is arbitrary, it can be higher or derived from actual
device capabilities but we can't easily read that using
bio_discard_limit().

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-07 23:21:53 +02:00
Brian Foster
90a71daaf7 xfs: skip background cowblock trims on inodes open for write
The background blockgc scanner runs on a 5m interval by default and
trims preallocation (post-eof and cow fork) from inodes that are
otherwise idle. Idle effectively means that iolock can be acquired
without blocking and that the inode has no dirty pagecache or I/O in
flight.

This simple mechanism and heuristic has worked fairly well for
post-eof speculative preallocations. Support for reflink and COW
fork preallocations came sometime later and plugged into the same
mechanism, with similar heuristics. Some recent testing has shown
that COW fork preallocation may be notably more sensitive to blockgc
processing than post-eof preallocation, however.

For example, consider an 8GB reflinked file with a COW extent size
hint of 1MB. A worst case fully randomized overwrite of this file
results in ~8k extents of an average size of ~1MB. If the same
workload is interrupted a couple times for blockgc processing
(assuming the file goes idle), the resulting extent count explodes
to over 100k extents with an average size <100kB. This is
significantly worse than ideal and essentially defeats the COW
extent size hint mechanism.

While this particular test is instrumented, it reflects a fairly
reasonable pattern in practice where random I/Os might spread out
over a large period of time with varying periods of (in)activity.
For example, consider a cloned disk image file for a VM or container
with long uptime and variable and bursty usage. A background blockgc
scan that races and processes the image file when it happens to be
clean and idle can have a significant effect on the future
fragmentation level of the file, even when still in use.

To help combat this, update the heuristic to skip cowblocks inodes
that are currently opened for write access during non-sync blockgc
scans. This allows COW fork preallocations to persist for as long as
possible unless otherwise needed for functional purposes (i.e. a
sync scan), the file is idle and closed, or the inode is being
evicted from cache. While here, update the comments to help
distinguish performance oriented heuristics from the logic that
exists to maintain functional correctness.

Suggested-by: Darrick Wong <djwong@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:12 +02:00
Christoph Hellwig
6aac770598 xfs: support lowmode allocations in xfs_bmap_exact_minlen_extent_alloc
Currently the debug-only xfs_bmap_exact_minlen_extent_alloc allocation
variant fails to drop into the lowmode last resort allocator, and
thus can sometimes fail allocations for which the caller has a
transaction block reservation.

Fix this by using xfs_bmap_btalloc_low_space to do the actual allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:12 +02:00
Christoph Hellwig
405ee87c69 xfs: call xfs_bmap_exact_minlen_extent_alloc from xfs_bmap_btalloc
xfs_bmap_exact_minlen_extent_alloc duplicates the args setup in
xfs_bmap_btalloc.  Switch to call it from xfs_bmap_btalloc after
doing the basic setup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Christoph Hellwig
b611fddc04 xfs: don't ifdef around the exact minlen allocations
Exact minlen allocations only exist as an error injection tool for debug
builds.  Currently this is implemented using ifdefs, which means the code
isn't even compiled for non-XFS_DEBUG builds.  Enhance the compile test
coverage by always building the code and use the compilers' dead code
elimination to remove it from the generated binary instead.

The only downside is that the alloc_minlen_only field is unconditionally
added to struct xfs_alloc_args now, but by moving it around and packing
it tightly this doesn't actually increase the size of the structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Christoph Hellwig
865469cd41 xfs: fold xfs_bmap_alloc_userdata into xfs_bmapi_allocate
Userdata and metadata allocations end up in the same allocation helpers.
Remove the separate xfs_bmap_alloc_userdata function to make this more
clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Christoph Hellwig
b3f4e84e2f xfs: distinguish extra split from real ENOSPC from xfs_attr_node_try_addname
Just like xfs_attr3_leaf_split, xfs_attr_node_try_addname can return
-ENOSPC both for an actual failure to allocate a disk block, but also
to signal the caller to convert the format of the attr fork.  Use magic
1 to ask for the conversion here as well.

Note that unlike the similar issue in xfs_attr3_leaf_split, this one was
only found by code review.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Christoph Hellwig
a5f73342ab xfs: distinguish extra split from real ENOSPC from xfs_attr3_leaf_split
xfs_attr3_leaf_split propagates the need for an extra btree split as
-ENOSPC to it's only caller, but the same return value can also be
returned from xfs_da_grow_inode when it fails to find free space.

Distinguish the two cases by returning 1 for the extra split case instead
of overloading -ENOSPC.

This can be triggered relatively easily with the pending realtime group
support and a file system with a lot of small zones that use metadata
space on the main device.  In this case every about 5-10th run of
xfs/538 runs into the following assert:

	ASSERT(oldblk->magic == XFS_ATTR_LEAF_MAGIC);

in xfs_attr3_leaf_split caused by an allocation failure.  Note that
the allocation failure is caused by another bug that will be fixed
subsequently, but this commit at least sorts out the error handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Christoph Hellwig
346c1d46d4 xfs: return bool from xfs_attr3_leaf_add
xfs_attr3_leaf_add only has two potential return values, indicating if the
entry could be added or not.  Replace the errno return with a bool so that
ENOSPC from it can't easily be confused with a real ENOSPC.

Remove the return value from the xfs_attr3_leaf_add_work helper entirely,
as it always return 0.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Christoph Hellwig
b1c649da15 xfs: merge xfs_attr_leaf_try_add into xfs_attr_leaf_addname
xfs_attr_leaf_try_add is only called by xfs_attr_leaf_addname, and
merging the two will simplify a following error handling fix.

To facilitate this move the remote block state save/restore helpers up in
the file so that they don't need forward declarations now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Uros Bizjak
20195d011c xfs: Use try_cmpxchg() in xlog_cil_insert_pcp_aggregate()
Use !try_cmpxchg instead of cmpxchg (*ptr, old, new) != old in
xlog_cil_insert_pcp_aggregate().  x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg.

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when
cmpxchg fails. There is no need to re-read the value in the loop.

Note that the value from *ptr should be read using READ_ONCE to
prevent the compiler from merging, refetching or reordering the read.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Yan Zhen
6148b77960 xfs: scrub: convert comma to semicolon
Replace a comma between expression statements by a semicolon.

Signed-off-by: Yan Zhen <yanzhen@vivo.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Zhang Zekun
f6225eebd7 xfs: Remove empty declartion in header file
The definition of xfs_attr_use_log_assist() has been removed since
commit d9c61ccb3b ("xfs: move xfs_attr_use_log_assist out of xfs_log.c").
So, Remove the empty declartion in header files.

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-10-07 08:00:11 +02:00
Pali Rohár
63271b7d56 cifs: Fix creating native symlinks pointing to current or parent directory
Calling 'ln -s . symlink' or 'ln -s .. symlink' creates symlink pointing to
some object name which ends with U+F029 unicode codepoint. This is because
trailing dot in the object name is replaced by non-ASCII unicode codepoint.

So Linux SMB client currently is not able to create native symlink pointing
to current or parent directory on Windows SMB server which can be read by
either on local Windows server or by any other SMB client which does not
implement compatible-reverse character replacement.

Fix this problem in cifsConvertToUTF16() function which is doing that
character replacement. Function comment already says that it does not need
to handle special cases '.' and '..', but after introduction of native
symlinks in reparse point form, this handling is needed.

Note that this change depends on the previous change
"cifs: Improve creating native symlinks pointing to directory".

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-06 22:57:12 -05:00
Pali Rohár
3eb4051253 cifs: Improve creating native symlinks pointing to directory
SMB protocol for native symlinks distinguish between symlink to directory
and symlink to file. These two symlink types cannot be exchanged, which
means that symlink of file type pointing to directory cannot be resolved at
all (and vice-versa).

Windows follows this rule for local filesystems (NTFS) and also for SMB.

Linux SMB client currenly creates all native symlinks of file type. Which
means that Windows (and some other SMB clients) cannot resolve symlinks
pointing to directory created by Linux SMB client.

As Linux system does not distinguish between directory and file symlinks,
its API does not provide enough information for Linux SMB client during
creating of native symlinks.

Add some heuristic into the Linux SMB client for choosing the correct
symlink type during symlink creation. Check if the symlink target location
ends with slash, or last path component is dot or dot-dot, and check if the
target location on SMB share exists and is a directory. If at least one
condition is truth then create a new SMB symlink of directory type.
Otherwise create it as file type symlink.

This change improves interoperability with Windows systems. Windows systems
would be able to resolve more SMB symlinks created by Linux SMB client
which points to existing directory.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-10-06 22:57:12 -05:00
Linus Torvalds
8f602276d3 bcachefs fixes for 6.12-rc2
A lot of little fixes, bigger ones include:
 
 - bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs
   changes, now fixed along with another lost wakeup
 - fragmentation LRU fixes; fsck now repairs successfully (this is the
   data structure copygc uses); along with some nice simplification.
 - Rework logged op error handling, so that if logged op replay errors
   (due to another filesystem error) we delete the logged op instead of
   going into an infinite loop)
 - Various small filesystem connectivitity repair fixes
 
 The final part of this patch series, fixing snapshots + unlinked file
 handling, is now out on the list - I'm giving that part of the series
 more time for user testing.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmcBhkIACgkQE6szbY3K
 bnYt8RAAqZo6RcN91sgz6xGsJkUvE6DS4Rtj1J4vlVAmuiIa5NUhRhqFnS6j8V9A
 AWZw63JwTizrglbLk4Z4knfiViT4GOeiKX4sttaJk7cLW7bxwCUddlho1G5Q7q0I
 PFurYevqG1ltcl5oZpD6LhZiqEhndQI3XnkpEvKsmoXy9TSB4KEqaU8Y+cewjq4q
 KCFuxTBhbmatxP9eTGuDhd6uWw5h0EVDGQyMitEcSutIaernGlSsBQ8gZ5n9dWSd
 lP91qFT5iypmCMo9Arf8Fq1YBvOpV6P91eq8YPa4A3sKDfzHn3CCzsSyjUiGK0RM
 Wcl+kNwqYJa7Fwtb7aGgTVhaMkqLzPTI+XYye3FXrXjJ6B0JKpl2QvvDoFhDxop9
 ZPb57QyRgRBtOvofvFz8fWQOr67n+HNvaMbeG1iwGvqm6/MrgdSLsN6OaRh80uAE
 5P0qX7rwTTOfJj5T6dKLxr3KuXKXNrM5AAIG0MjOMsha232+XUAZvofYNmqx7BMi
 juJvqZc9/GXrcXqdPTYDyBs4UXDkwHsKdr744ooZ64VNiIYFs6eTvXp7V0XuajYH
 ExLrEEjhO2UGPM5N9R9jw9AMsEhJstexgylHQsiiADtdi+jY4LKa/NZAJSJQQC+C
 QQyE3Q7ZCpzRPiGPkkpIY/D7IRoIHL2H+LhbXV/K3oMGdbA7hS4=
 =XnG4
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "A lot of little fixes, bigger ones include:

   - bcachefs's __wait_on_freeing_inode() was broken in rc1 due to vfs
     changes, now fixed along with another lost wakeup

   - fragmentation LRU fixes; fsck now repairs successfully (this is the
     data structure copygc uses); along with some nice simplification.

   - Rework logged op error handling, so that if logged op replay errors
     (due to another filesystem error) we delete the logged op instead
     of going into an infinite loop)

   - Various small filesystem connectivitity repair fixes"

* tag 'bcachefs-2024-10-05' of git://evilpiepirate.org/bcachefs:
  bcachefs: Rework logged op error handling
  bcachefs: Add warn param to subvol_get_snapshot, peek_inode
  bcachefs: Kill snapshot arg to fsck_write_inode()
  bcachefs: Check for unlinked, non-empty dirs in check_inode()
  bcachefs: Check for unlinked inodes with dirents
  bcachefs: Check for directories with no backpointers
  bcachefs: Kill alloc_v4.fragmentation_lru
  bcachefs: minor lru fsck fixes
  bcachefs: Mark more errors AUTOFIX
  bcachefs: Make sure we print error that causes fsck to bail out
  bcachefs: bkey errors are only AUTOFIX during read
  bcachefs: Create lost+found in correct snapshot
  bcachefs: Fix reattach_inode()
  bcachefs: Add missing wakeup to bch2_inode_hash_remove()
  bcachefs: Fix trans_commit disk accounting revert
  bcachefs: Fix bch2_inode_is_open() check
  bcachefs: Fix return type of dirent_points_to_inode_nowarn()
  bcachefs: Fix bad shift in bch2_read_flag_list()
2024-10-05 15:18:04 -07:00
Olga Kornievskaia
c88c150a46 nfsd: fix possible badness in FREE_STATEID
When multiple FREE_STATEIDs are sent for the same delegation stateid,
it can lead to a possible either use-after-free or counter refcount
underflow errors.

In nfsd4_free_stateid() under the client lock we find a delegation
stateid, however the code drops the lock before calling nfs4_put_stid(),
that allows another FREE_STATE to find the stateid again. The first one
will proceed to then free the stateid which leads to either
use-after-free or decrementing already zeroed counter.

Fixes: 3f29cc82a8 ("nfsd: split sc_status out of sc_type")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-10-05 15:44:25 -04:00
Linus Torvalds
fdd0a94dcf Fix some ext4 bug fixes and regressions relating to oneline resize and
fast commits.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmcAYaIACgkQ8vlZVpUN
 gaO8fQf+IOHRpyqVBXd2WwDJ05vBJAbbaKdhaI1r6gFPF1AzyIZ+lpCnEV9f9B2u
 IrDZaigb/CiJxO9mLRLrp0z/O9s6Z/+NjNdVFkoynJo2Crfr5mi07DXR9EQ7XkvT
 WYcT0IXg7xVswOzVosgBOUA76yMLJwtuTPp7YsAmCmp5EksMk1LSk1VzUEJcKRgJ
 BHSWSn2BqHxwMStB5L0eBnYtdaW8l1RLeWlTPdcPeCrD7UrYCXThvxfyz2KdztBz
 YfjcQEZhY6OD/uKdNFE6oQc54kL8RpqGDpF3YGMBixiZ8h99PpGUUz3bB5VLTfco
 WlxQwOD9dofPQ5Yh+s5icY8q67XXow==
 =Cnp+
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Fix some ext4 bugs and regressions relating to oneline resize and fast
  commits"

* tag 'ext4_for_linus-5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix off by one issue in alloc_flex_gd()
  ext4: mark fc as ineligible using an handle in ext4_xattr_set()
  ext4: use handle to mark fc as ineligible in __track_dentry_update()
2024-10-05 10:47:00 -07:00
Kent Overstreet
0f25eb4b60 bcachefs: Rework logged op error handling
Initially it was thought that we just wanted to ignore errors from
logged op replay, but it turns out we do need to catch -EROFS, or we'll
go into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
1f73cb4d34 bcachefs: Add warn param to subvol_get_snapshot, peek_inode
These shouldn't always be fatal errors - logged op resume, in
particular, and we want it as a parameter there.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
72350ee0ea bcachefs: Kill snapshot arg to fsck_write_inode()
It was initially believed that it would be better to be explicit about
the snapshot we're updating when writing inodes in fsck; however, it
turns out that passing around the snapshot separately is more error
prone and we're usually updating the inode in the same snapshow we read
it from.

This is different from normal filesystem paths, where we do the update
in the snapshot of the subvolume we're in.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
c9306a91c3 bcachefs: Check for unlinked, non-empty dirs in check_inode()
We want to check for this early so it can be reattached if necessary in
check_unreachable_inodes(); better than letting it be deleted and having
the children reattached, losing their filenames.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
c7da5ee2e5 bcachefs: Check for unlinked inodes with dirents
link count works differently in bcachefs - it's only nonzero for files
with multiple hardlinks, which means we can also avoid checking it
except for files that are known to have hardlinks.

That means we need a few different checks instead; in particular, we
don't want fsck to delet a file that has a dirent pointing to it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
1c6051bbd7 bcachefs: Check for directories with no backpointers
It's legal for regular files to have missing backpointers (due to
hardlinks), and fsck should automatically add them, but for directories
this is an error that should be flagged.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
260af1562e bcachefs: Kill alloc_v4.fragmentation_lru
The fragmentation_lru field hasn't been needed since we reworked the LRU
btrees to use the btree write buffer; previously it was used to resolve
collisions, but the revised LRU btree uses the backpointer (the bucket)
as part of the key.

It should have been deleted at the time of the LRU rework; since it
wasn't, that left places for bugs to hide, in check/repair.

This fixes LRU fsck on a filesystem image helpfully provided by a user
who disappeared before I could get his name for the reported-by.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:32 -04:00
Kent Overstreet
01bf5e3bd2 bcachefs: minor lru fsck fixes
check_lru_key() wasn't using write buffer updates for deleting bad lru
entries - dating from before the lru btree used the btree write buffer.

And when possibly flushing the btree write buffer (to make sure we're
seeing a real inconsistency), we need to be using the modern
bch2_btree_write_buffer_maybe_flush().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:31 -04:00
Kent Overstreet
1bea714c53 bcachefs: Mark more errors AUTOFIX
Errors are getting marked as AUTOFIX once they've been (re)-tested and
audited.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:31 -04:00
Kent Overstreet
492e24d760 bcachefs: Make sure we print error that causes fsck to bail out
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-10-04 20:25:31 -04:00