Commit Graph

90456 Commits

Author SHA1 Message Date
zhuxiaohui
bb66009958 bcachefs: add REQ_SYNC and REQ_IDLE in write dio
when writing file with direct_IO on bcachefs, then performance is
much lower than other fs due to write back throttle in block layer:

        wbt_wait+1
        __rq_qos_throttle+32
        blk_mq_submit_bio+394
        submit_bio_noacct_nocheck+649
        bch2_submit_wbio_replicas+538
        __bch2_write+2539
        bch2_direct_write+1663
        bch2_write_iter+318
        aio_write+355
        io_submit_one+1224
        __x64_sys_io_submit+169
        do_syscall_64+134
        entry_SYSCALL_64_after_hwframe+110

add set REQ_SYNC and REQ_IDLE in bio->bi_opf as standard dirct-io

Signed-off-by: zhuxiaohui <zhuxiaohui.400@bytedance.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
79032b0781 bcachefs: Improved topology repair checks
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(),
and replace them with an improved version,
bch2_btree_node_check_topology().

This checks that children of an interior node correctly span the full
range of the parent node with no overlaps.

Also, ensure that topology repairs at runtime are always a fatal error;
in particular, this adds a check in btree_iter_down() - if we don't find
a key while walking down the btree that's indicative of a topology error
and should be flagged as such, not a null ptr deref.

Some checks in btree_update_interior.c remaining BUG_ONS(), because we
already checked the node for topology errors when starting the update,
and the assertions indicate that we _just_ corrupted the btree node -
i.e. the problem can't be that existing on disk corruption, they
indicate an actual algorithmic bug.

In the future, we'll be annotating the fsck errors list with which
recovery pass corrects them; the open coded "run explicit recovery pass
or fatal error" in bch2_btree_node_check_topology() will in the future
be done for every fsck_err() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
40cb26233a bcachefs: Be careful about btree node splits during journal replay
Don't pick a pivot that's going to be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
048f47e83f bcachefs: btree_and_journal_iter now respects trans->journal_replay_not_finished
btree_and_journal_iter is now safe to use at runtime, not just during
recovery before journal keys have been freed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Hongbo Li
36f9ef109b bcachefs: fix trans->mem realloc in __bch2_trans_kmalloc
The old code doesn't consider the mem alloced from mempool when call
krealloc on trans->mem. Also in bch2_trans_put, using mempool_free to
free trans->mem by condition "trans->mem_bytes == BTREE_TRANS_MEM_MAX"
is inaccurate when trans->mem was allocated by krealloc function.
Instead, we use used_mempool stuff to record the situation, and realloc
or free the trans->mem in elegant way.

Also, after krealloc failed in __bch2_trans_kmalloc, the old data
should be copied to the new buffer when alloc from mempool_alloc.

Fixes: 31403dca5b ("bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
57339b24a0 bcachefs: Don't do extent merging before journal replay is finished
We don't normally do extent updates this early in recovery, but some of
the repair paths have to and when we do, we don't want to do anything
that requires the snapshots table.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
ec9cc18fc2 bcachefs: Add checks for invalid snapshot IDs
Previously, we assumed that keys were consistent with the snapshots
btree - but that's not correct as fsck may not have been run or may not
be complete.

This adds checks and error handling when using the in-memory snapshots
table (that mirrors the snapshots btree).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
63332394c7 bcachefs: Move snapshot table size to struct snapshot_table
We need to add bounds checking for snapshot table accesses - it turns
out there are cases where we do need to use the snapshots table before
fsck checks have completed (and indeed, fsck may not have been run).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
aa6e130e3c bcachefs: Add an assertion for trying to evict btree root
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:11 -04:00
Kent Overstreet
4bd02d3fb3 bcachefs: fix mount error path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:10 -04:00
Thomas Bertschinger
688d750d10 bcachefs: fix misplaced newline in __bch2_inode_unpacked_to_text()
before:

u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0:   mode=40700
  flags= (15300000)
  journal_seq=4
  bi_size=0
  bi_sectors=0

  bi_version=0bi_atime=227064388944
  ...

after:

u64s 18 type inode_v3 0:1879048192:U32_MAX len 0 ver 0:   mode=40700
  flags= (15300000)
  journal_seq=4
  bi_size=0
  bi_sectors=0
  bi_version=0
  bi_atime=227064388944
  ...

Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:10 -04:00
Kent Overstreet
8aad8e1f65 bcachefs: Fix journal pins in btree write buffer
btree write buffer flush has two phases
 - in natural key order, which is more efficient but may fail
 - then in journal order

The journal order flush was assuming that keys were still correctly
ordered by journal sequence number - but due to coalescing by the
previous phase, we need an additional sort.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:10 -04:00
Kent Overstreet
a5e3dce493 bcachefs: Fix assert in bch2_backpointer_invalid()
Backpointers that point to invalid devices are caught by fsck, not
.key_invalid; so .key_invalid needs to check for them instead of hitting
asserts.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31 20:36:10 -04:00
Linus Torvalds
7e40c2100c Kbuild fixes for v6.9
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
 
  - Fix unselectable choice entry in MIPS Kconfig, and forbid this
    structure
 
  - Remove unused include/asm-generic/export.h
 
  - Fix a NULL pointer dereference bug in modpost
 
  - Enable -Woverride-init warning consistently with W=1
 
  - Drop KCSAN flags from *.mod.c files
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmYJVK0VHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGf/sP/3GOk//cQGwPyWCgtCEUo6T4yyD7
 1m2TTR0JQk/lcohSFtYk0I20rhKRqU6yAMAERmyehI66D2QY7lhiYVc16ram5y04
 x0nWxd9IqerIlGJtaWePOvNqKdCw2EP9fS9NKz58rEDMGlsSf0Rd3NEdSsWoH8td
 dECtt8yCawENAMStb/rAfsnL6kn2JIhVMyqwo0RdQfiaVT5Zk6Qgpko0Oq0ncRP2
 qdNgHbvnJdKMy81FHSBAi0QEZOYvhFNX+E+6lFfWEsX6xT+wvXddCNQzJf/YV3Cw
 Klw1tGveV7UGzlZ4fsnFrv4V6g1KO2AD3342efdDo++ypBEBpImVODc+Rp0jE9Nk
 OgdOQRe2k9a5keH0LWY0ehvDbQlSbfNxk0wNtAfo5Kk5e41nHmHJBWCwGG+cXrjJ
 mPJjSrTpuNVSaGV0kt3EskHbDBeBmIIg+5QPbldmW2qcC88kWoavkyLD3WPFsg/a
 CAuR/HqH7MDfxzvsqTCjonlVcyDKX6aW66LrQ1NCtmphI4F8mdKp746CzGlziuIm
 gjYJL/UWVlx0VebMo8dwDpaHvez4/4s6xAJcyqtA+TS5HbrQWKQuwFkiv4iWQxNd
 MvyVdzgKhcMdoXhfFpUZ0LlFvHGefJ+Z6N1FQLoQJkTirt5aqRbEAjP0VXwQB4eH
 zYygkhvvtiH9/STu
 =tx+2
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Deduplicate Kconfig entries for CONFIG_CXL_PMU

 - Fix unselectable choice entry in MIPS Kconfig, and forbid this
   structure

 - Remove unused include/asm-generic/export.h

 - Fix a NULL pointer dereference bug in modpost

 - Enable -Woverride-init warning consistently with W=1

 - Drop KCSAN flags from *.mod.c files

* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: Fix typo HEIGTH to HEIGHT
  Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
  kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
  kbuild: make -Woverride-init warnings more consistent
  modpost: do not make find_tosym() return NULL
  export.h: remove include/asm-generic/export.h
  kconfig: do not reparent the menu inside a choice block
  MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
  cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
2024-03-31 11:23:51 -07:00
Arnd Bergmann
c40845e319 kbuild: make -Woverride-init warnings more consistent
The -Woverride-init warn about code that may be intentional or not,
but the inintentional ones tend to be real bugs, so there is a bit of
disagreement on whether this warning option should be enabled by default
and we have multiple settings in scripts/Makefile.extrawarn as well as
individual subsystems.

Older versions of clang only supported -Wno-initializer-overrides with
the same meaning as gcc's -Woverride-init, though all supported versions
now work with both. Because of this difference, an earlier cleanup of
mine accidentally turned the clang warning off for W=1 builds and only
left it on for W=2, while it's still enabled for gcc with W=1.

There is also one driver that only turns the warning off for newer
versions of gcc but not other compilers, and some but not all the
Makefiles still use a cc-disable-warning conditional that is no
longer needed with supported compilers here.

Address all of the above by removing the special cases for clang
and always turning the warning off unconditionally where it got
in the way, using the syntax that is supported by both compilers.

Fixes: 2cd3271b7a ("kbuild: avoid duplicate warning options")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-03-31 11:32:26 +09:00
Linus Torvalds
712e14250d Bug fixes for 6.9-rc2:
* Allow stripe unit/width value passed via mount option to be written over
    existing values in the super block.
  * Do not set current->journal_info to avoid its value from being miused by
    another filesystem context.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZgKa+AAKCRAH7y4RirJu
 9IL1APwPBMzSowijBI/rCD5BGlISn7mCRlZwvyXE1avmRmbQPAEApU5yRhBHWi62
 629azfSr1I5m678xM7WQKh6X3/VUDAg=
 =pqNH
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Allow stripe unit/width value passed via mount option to be written
   over existing values in the super block

 - Do not set current->journal_info to avoid its value from being miused
   by another filesystem context

* tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: don't use current->journal_info
  xfs: allow sunit mount option to repair bad primary sb stripe values
2024-03-30 13:51:58 -07:00
Linus Torvalds
091619baac 2 cifs.ko changesets
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmYG6B4ACgkQiiy9cAdy
 T1H8vAv/ULRl/PQreMq6FaXSeW0S5w7RiMeeZ1+314C7W/iLLbd8n6fmOshAS1nn
 d8vC6qx5ulTNRwFR7eYexolWgqG7bisKTiGzp+UZelo01LlPU90cJQvPZfaETVPR
 XpqRUwrQE/sU9n48PGvcGARqd2sz1WC3l/oeN1P0QXJ7xWKJJNcLqtzT0eeNgAto
 u/EKw9SZ5D20V1GrGYxHY57M9L0rRcndUoWZDn9s0NCkOGnibOOiTxpQV8sepJ35
 6CRKxLTatzAJahUaQ/C5uvM0cSfRcTQdp3u4B01+gCSPhP7mwIS0o5EDzw2lWk6W
 9zQz9de2PPw98Q0sbZn1fuherbLSKDXZIIUKEH+oZ2WXJfva5jIVBpp5EIR92VY4
 A86DznmRJC5br3q+FuBOc1NPyslV01l80ZmhWppeX6/YA8egkm4KWgFB/02n7Agh
 Ug2Fe8MNGVPiwmijXBswf9CSjn3ctAH9cRvqs0QXj3aPkduvu+jDVGwbO/zWdvlY
 c1gs/eF7
 =ai8C
 -----END PGP SIGNATURE-----

Merge tag '6.9-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Add missing trace point (noticed when debugging the recent mknod LSM
   regression)

 - fscache fix

* tag '6.9-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix duplicate fscache cookie warnings
  smb3: add trace event for mknod
2024-03-29 12:06:09 -07:00
Linus Torvalds
d8e8fbec00 nfsd-6.9 fixes:
- Address three recently introduced regressions
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmYFtQkACgkQM2qzM29m
 f5fSbA//S0YiPCU+bVwN+mK47vhai1Dzaw1kd2rzlfnvTCyBFVocfQU85bFV8uns
 ZMcHhO6jPjRg1o/qkCzeNf+n6AVGwvgOn1jC628jTP1wJOS8WPsZtdwz8vHX/xw5
 qnMwEUtfFPiDn85VyK6z6mJBWRcF/vStPc022Ie5ba5ydxD0rFmaeIc6zYX8csUe
 8tbHlulYtRR97mXjxgtXC/H0scvag07MA1fRKiZRNk0jwUdtcAsHktiBfWPV2KXO
 ew8saA+BLyBBdE01cq1CoMH/3wt/a5rpaiV5ePPtYkmlPVT5IEmSLZUwUVZAjMuu
 qlvKF9xrH7x8/7g/SwOJdH8po0IjiRdW9EKID+6PPqXz+YYKyTe7JqZtvEzt/4x9
 tGbIfyHxpUPPTZ6VedlXYW95uWG+lrKzur+zHQwsNf+U0d5Xila8euEKqvB4kDtk
 riHUlvs8O6WroeHcOQ5pK72l7x8gELdVqHP6n+E0t5+2VKp/Vcqf8WlWV0suoDfa
 D2EYGIzxn8NrNuivGESHfzafmHKKMn51UGrm1Sl0vF5w2LlqMfL4Yo4NIT+e69VR
 E3h6wIhK7Jor9px5/tBD3Y6o9k8wT9upjA+qsrOUhxyU3T9r+s88I64gvGyABTIM
 kZJHLJYROoNO6A5FSRySSH+bk/UU+wJ0s6avgHL1A2yYnyrQUzU=
 =YV5O
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Address three recently introduced regressions

* tag 'nfsd-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies
  SUNRPC: Revert 561141dd49
  nfsd: Fix error cleanup path in nfsd_rename()
2024-03-28 14:35:32 -07:00
Linus Torvalds
8d025e2092 Changes since last update:
- Add a new reviewer Sandeep Dhavale to build a healthier community;
 
  - Drop experimental warning for FSDAX.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmYE1ucRHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qoSnw//Q35Sc7fIQb6YfuOOQ2VQX7TVE4jTrLej
 +3hIQUz3BA8wRwrBYdeJLcvjda4y+0gNxlpw2ycNS6S4vxAiVEAuwPJYO6oAC0Il
 ETa6opc1vpTMeXgwwtbXC7ACnWas9EQC61Z4E8W5zeVcNqQZnZyInMJ9Rkjqs/iJ
 VjJH2wWR5MIgWJdEPqchPx/28nqbBOcztc7ARJqpujyZEvu+OBVIPv8P/7N0a5yG
 bpHkDrzoelBMkpktpuvrkv84ymyCDC7LH9mq+Wk6dY5wRFxa2BtoZVh6YcpcjrpR
 75GZVg3BN3Ph41JCwYHqxyRGpoLO11dSYi6DNDVngxOGtkTNRVGrJ1FYEapWV+o8
 1MnEbl0vZSHUkjrIFbfZTFSqpvW2XSfEOa3heNDFknmzT/ISobSGENULp9cggcYI
 jhS6wtVG4bl5bCiCKCZluByr8/J8TCQc/5t5f5bQLy2MWqlyjaSx82uWuDpzO1Hh
 +q+p+MB+ketMUwxaIUAuTNgzhFPFT4Na/ni9WqP7Ri3GJY6pdjDUUtrtoIBK4oPQ
 ajUWhPlOk5zMwLq9Jl4MiG1ostBI9P37ZerjsdaLDZYElGhTjwjPu/xlh7p26Inq
 Ufq3QaQH2wai+oAVS6Sli3dJfb399XVJhmT2WFMH+0DmQW6JzvsGTdUX2fMgv5sb
 I7dVfuceTs0=
 =+ZW9
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Add a new reviewer Sandeep Dhavale to build a healthier community

 - Drop experimental warning for FSDAX

* tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  MAINTAINERS: erofs: add myself as reviewer
  erofs: drop experimental warning for FSDAX
2024-03-27 20:24:09 -07:00
Linus Torvalds
4076fa1612 fs/9p: fix errors found in 6.9-rc1
Two of these fix syzbot reported issues, and
 the other fixes a unused variable in some configurations.
 
 Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmYEjyAACgkQiP/V+0pf
 /5jrDhAAkrEkFry9Zp7txMNG7fyRVRW8BPkvCUbORBdmLm01lswTnvHDT5pbSYcf
 Z3KxSpo0DvO2G3XEHQHQN24Sbx78Xg8XssXGFc0j1/hQbGpwIyXM/NTa3VVnfmwH
 lV2ysLa5zCR81k0hu8QzGe8DSOFsyE9oz3ABE7nCjmQsOI0zgyOvn7Uy1rwT5B11
 lBYN4rotl2Ie1wPoVTf8PNbeAWtuVR3FN9GTSvZuNDFbYjYEv0zFangxXIBFhn2g
 pUJ22C6CYenhLzTKBRVW/CcSxPVVS4jFok1nws07MmlXozrbklPYHuN6XQYFetLt
 wtnZxmUmy1u4pmpG2xPziUOGb7wdm6cDC1aIrcYPlbk/9U2iYHuVgQz0WOcEIbA3
 g5LrgCpUZO8UPxEvtFYDq9oUAhxhjf6BI7/rR0fNWp8JgJ0scz3d/e+Q/srWWr23
 Pej+10N5JijTEuD39BUnWy/yTz0WVBg1nP/C44buB0zSWJbxiPOJd0imSW6ipoIk
 Onr3FmhnRSiIfuUKEH/jI+QsqKzZTdBe9e3+SYBEg4KXvM3e1ltJgDs+BIdWbo00
 ih/jkZL7iX+uzkOdVR3o/1KagkwHTcU8P+4i8wELD7+CquQXvsedfqeTmz5JvNIR
 9Df1Gu1xCeINn6jnlSaVAonnCi1awJoaxFW+azCb+q5MP8lP6IU=
 =1ev6
 -----END PGP SIGNATURE-----

Merge tag '9p-fixes-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p fixes from Eric Van Hensbergen:
 "Two of these fix syzbot reported issues, and the other fixes a unused
  variable in some configurations"

* tag '9p-fixes-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: fix uninitialized values during inode evict
  fs/9p: remove redundant pointer v9ses
  fs/9p: fix uaf in in v9fs_stat2inode_dotl
2024-03-27 14:53:56 -07:00
Linus Torvalds
400dd456bd for-6.9-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmYEfJYACgkQxWXV+ddt
 WDucIg/+IupuqdLKnj6bepxX/VufnFAjecD3sRgZQQLIMfm3MQX3TzbNoPYEiAGU
 tNG6jxYgkGRoyhN3aIQnsJmRFje5epYjNA5+ueUNT2/KfyKonnS2TIKQt6u7XBls
 fl4SCTSNRX7w/QUNUWwyY5/86yzV4F8w19X5nVOKcp7Nz3hUBdeDZWAmMlYyHuFW
 N2YRyNdCxB4Y0U9g1vgI63wFjOac0F+7RTHGsDH7ueOZ2dbtDM38lHBoCbdX3jmy
 5nG7wVJZp2H/zCmzrVQJ897CMfr3h9r9Kxx8EE3JDJaJ5sMMaRh361rgsZTaGsjz
 SwUzT6Z0u0hsBANSTOUZixhfX5sqArmemG+XpFu6Rq+732DqS+c4vWRSu7c8Rc8i
 +4HIQNsjJqm/d1u2IyxXfuqSbaULLnyYQ8rdEx3o2AM37JnuTvOWoB+v/JqPb9TI
 aG+bOPvg7GM9Sl3IoM5sR+j3bEebranZbUF+UiDEujZJiY+uiw3vMbFvyOBRWaUU
 ODTpNoyCmz94mWg79hyosOjM9A/NCEkRH4oSc+YeqOvzTIBG3V+D3HxN/DX4FVTy
 VDxMdptu0aPIEkUQ3nvsj4t3OKgj1w9rxZFpRYH33zJVvRqZ8VrgtC9V6zgPv3h7
 suQL4s4i4EiIgAk2Z0OR23wDwg1TwXVWLGErfQVHslvhl/a2Qb4=
 =Lhhp
 -----END PGP SIGNATURE-----

Merge tag 'for-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - fix race when reading extent buffer and 'uptodate' status is missed
   by one thread (introduced in 6.5)

 - do additional validation of devices using major:minor numbers

 - zoned mode fixes:
     - use zone-aware super block access during scrub
     - fix use-after-free during device replace (found by KASAN)
     - also delete zones that are 100% unusable to reclaim space

 - extent unpinning fixes:
     - fix extent map leak after error handling
     - print correct range in error message

 - error code and message updates

* tag 'for-6.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix race in read_extent_buffer_pages()
  btrfs: return accurate error code on open failure in open_fs_devices()
  btrfs: zoned: don't skip block groups with 100% zone unusable
  btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()
  btrfs: fix message not properly printing interval when adding extent map
  btrfs: fix warning messages not printing interval at unpin_extent_range()
  btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()
  btrfs: validate device maj:min during open
  btrfs: zoned: fix use-after-free in do_zone_finish()
  btrfs: zoned: use zone aware sb location for scrub
2024-03-27 13:56:41 -07:00
Chuck Lever
99dc2ef039 NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies
There are one or two cases where CREATE_SESSION returns
NFS4ERR_DELAY in order to force the client to wait a bit and try
CREATE_SESSION again. However, after commit e4469c6cc6 ("NFSD: Fix
the NFSv4.1 CREATE_SESSION operation"), NFSD caches that response in
the CREATE_SESSION slot. Thus, when the client resends the
CREATE_SESSION, the server always returns the cached NFS4ERR_DELAY
response rather than actually executing the request and properly
recording its outcome. This blocks the client from making further
progress.

RFC 8881 Section 15.1.1.3 says:
> If NFS4ERR_DELAY is returned on an operation other than SEQUENCE
> that validly appears as the first operation of a request ... [t]he
> request can be retried in full without modification. In this case
> as well, the replier MUST avoid returning a response containing
> NFS4ERR_DELAY as the response to an initial operation of a request
> solely on the basis of its presence in the reply cache.

Neither the original NFSD code nor the discussion in section 18.36.4
refer explicitly to this important requirement, so I missed it.

Note also that not only must the server not cache NFS4ERR_DELAY, but
it has to not advance the CREATE_SESSION slot sequence number so
that it can properly recognize and accept the client's retry.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Fixes: e4469c6cc6 ("NFSD: Fix the NFSv4.1 CREATE_SESSION operation")
Tested-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-27 13:19:47 -04:00
David Howells
8876a37277 cifs: Fix duplicate fscache cookie warnings
fscache emits a lot of duplicate cookie warnings with cifs because the
index key for the fscache cookies does not include everything that the
cifs_find_inode() function does.  The latter is used with iget5_locked() to
distinguish between inodes in the local inode cache.

Fix this by adding the creation time and file type to the fscache cookie
key.

Additionally, add a couple of comments to note that if one is changed the
other must be also.

Signed-off-by: David Howells <dhowells@redhat.com>
Fixes: 70431bfd82 ("cifs: Support fscache indexing rewrite")
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-27 12:04:06 -05:00
Linus Torvalds
f4a432914a execve fixes for v6.9-rc2
- Fix selftests to conform to the TAP output format (Muhammad Usama Anjum)
 
 - Fix NOMMU linux_binprm::exec pointer in auxv (Max Filippov)
 
 - Replace deprecated strncpy usage (Justin Stitt)
 
 - Replace another /bin/sh instance in selftests
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmYDT3sWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjxlD/49PYpA4hMReEJ/01UkMn7IT2DP
 QWV9IfaPTodj9tjQngalhcF7r6O5guRR7MRfZxyaXriq4aJNzOLm2STmwSG1cOgP
 hP9D0HnMSc5CrqMJ2kSTr3ETK0a2mTivWl375TUgGdW+QJo7YYInHYaH2THhme1Z
 MkLHqSkruHw6YVvSvzoWiwZ4taiia7op8HbAEvJQiwnJdiVeCLIYbf2AxXNop2xv
 xcmoGkSh6KSiQ0XQ7VXs4LC3v/ElHBINSbChoXPBDY5kBWZybyxRwYCVt8mJftgF
 mVGXBFFpnaLU/gDayPg/Pyq9sW1bLpi8w0BBu419BVfAQ475K+YZ/V8nj4fm95e3
 gIWm3x1O48r0OxdzmPb5re/s7lG5uNLzzFEWIus18NmqgA8S1CyFveRB3Zh8LlXB
 9UEt4mlcgp/CLAo1Zv6IBe6UDcAf4AR4Tq+d+etmORTqHmM7n399XivNuft9myyB
 9ObLCfKvOa71uF0n714XLHc5STk2KTK70Me2L/H5gitSqjIEKFNQ5SOaSbsGImDv
 i4YPnptCJFTQumE0Tu5hna8uyjOXFIxq/zkfDmzc1wP8FcijwRx3UPoO6WlQsdfx
 5cmJSaIX1bhFC+4gxAoEHUDWPh/f4kLeDpIXX6NPH28Do1wxLnri3ryvkfgkw5Vj
 /1E03LXfcnnSbjQAPQ==
 =Siss
 -----END PGP SIGNATURE-----

Merge tag 'execve-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull execve fixes from Kees Cook:

 - Fix selftests to conform to the TAP output format (Muhammad Usama
   Anjum)

 - Fix NOMMU linux_binprm::exec pointer in auxv (Max Filippov)

 - Replace deprecated strncpy usage (Justin Stitt)

 - Replace another /bin/sh instance in selftests

* tag 'execve-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  binfmt: replace deprecated strncpy
  exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
  selftests/exec: Convert remaining /bin/sh to /bin/bash
  selftests/exec: execveat: Improve debug reporting
  selftests/exec: recursion-depth: conform test to TAP format output
  selftests/exec: load_address: conform test to TAP format output
  selftests/exec: binfmt_script: Add the overall result line according to TAP
2024-03-27 09:57:30 -07:00
Steve French
e9e9fbeb83 smb3: add trace event for mknod
Add trace points to help debug mknod and mkfifo:

   smb3_mknod_done
   smb3_mknod_enter
   smb3_mknod_err

Example output:

      TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
         | |         |   |||||     |         |
    mkfifo-6163    [003] .....   960.425558: smb3_mknod_enter: xid=12 sid=0xb55130f6 tid=0x46e6241c path=\fifo1
    mkfifo-6163    [003] .....   960.432719: smb3_mknod_done: xid=12 sid=0xb55130f6 tid=0x46e6241c

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-26 14:56:36 -05:00
Tavian Barnes
ef1e68236b btrfs: fix race in read_extent_buffer_pages()
There are reports from tree-checker that detects corrupted nodes,
without any obvious pattern so possibly an overwrite in memory.
After some debugging it turns out there's a race when reading an extent
buffer the uptodate status can be missed.

To prevent concurrent reads for the same extent buffer,
read_extent_buffer_pages() performs these checks:

    /* (1) */
    if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags))
        return 0;

    /* (2) */
    if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags))
        goto done;

At this point, it seems safe to start the actual read operation. Once
that completes, end_bbio_meta_read() does

    /* (3) */
    set_extent_buffer_uptodate(eb);

    /* (4) */
    clear_bit(EXTENT_BUFFER_READING, &eb->bflags);

Normally, this is enough to ensure only one read happens, and all other
callers wait for it to finish before returning.  Unfortunately, there is
a racey interleaving:

    Thread A | Thread B | Thread C
    ---------+----------+---------
       (1)   |          |
             |    (1)   |
       (2)   |          |
       (3)   |          |
       (4)   |          |
             |    (2)   |
             |          |    (1)

When this happens, thread B kicks of an unnecessary read. Worse, thread
C will see UPTODATE set and return immediately, while the read from
thread B is still in progress.  This race could result in tree-checker
errors like this as the extent buffer is concurrently modified:

    BTRFS critical (device dm-0): corrupted node, root=256
    block=8550954455682405139 owner mismatch, have 11858205567642294356
    expect [256, 18446744073709551360]

Fix it by testing UPTODATE again after setting the READING bit, and if
it's been set, skip the unnecessary read.

Fixes: d7172f52e9 ("btrfs: use per-buffer locking for extent_buffer reading")
Link: https://lore.kernel.org/linux-btrfs/CAHk-=whNdMaN9ntZ47XRKP6DBes2E5w7fi-0U3H2+PS18p+Pzw@mail.gmail.com/
Link: https://lore.kernel.org/linux-btrfs/f51a6d5d7432455a6a858d51b49ecac183e0bbc9.1706312914.git.wqu@suse.com/
Link: https://lore.kernel.org/linux-btrfs/c7241ea4-fcc6-48d2-98c8-b5ea790d6c89@gmx.com/
CC: stable@vger.kernel.org # 6.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tavian Barnes <tavianator@tavianator.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor update of changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Anand Jain
2f1aeab9fc btrfs: return accurate error code on open failure in open_fs_devices()
When attempting to exclusive open a device which has no exclusive open
permission, such as a physical device associated with the flakey dm
device, the open operation will fail, resulting in a mount failure.

In this particular scenario, we erroneously return -EINVAL instead of the
correct error code provided by the bdev_open_by_path() function, which is
-EBUSY.

Fix this, by returning error code from the bdev_open_by_path() function.
With this correction, the mount error message will align with that of
ext4 and xfs.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Johannes Thumshirn
a8b70c7f86 btrfs: zoned: don't skip block groups with 100% zone unusable
Commit f4a9f21941 ("btrfs: do not delete unused block group if it may be
used soon") changed the behaviour of deleting unused block-groups on zoned
filesystems. Starting with this commit, we're using
btrfs_space_info_used() to calculate the number of used bytes in a
space_info. But btrfs_space_info_used() also accounts
btrfs_space_info::bytes_zone_unusable as used bytes.

So if a block group is 100% zone_unusable it is skipped from the deletion
step.

In order not to skip fully zone_unusable block-groups, also check if the
block-group has bytes left that can be used on a zoned filesystem.

Fixes: f4a9f21941 ("btrfs: do not delete unused block group if it may be used soon")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Filipe Manana
2133460061 btrfs: use btrfs_warn() to log message at btrfs_add_extent_mapping()
At btrfs_add_extent_mapping(), if we failed to merge the extent map, which
is unexpected and theoretically should never happen, we use WARN_ONCE() to
log a message which is not great because we don't get information about
which filesystem it relates to in case we have multiple btrfs filesystems
mounted. So change this to use btrfs_warn() and surround the error check
with WARN_ON() so we always get a useful stack trace and the condition is
flagged as "unlikely" since it's not expected to ever happen.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Filipe Manana
379c872393 btrfs: fix message not properly printing interval when adding extent map
At btrfs_add_extent_mapping(), if we are unable to merge the existing
extent map, we print a warning message that suggests interval ranges in
the form "[X, Y)", where the first element is the inclusive start offset
of a range and the second element is the exclusive end offset. However
we end up printing the length of the ranges instead of the exclusive end
offsets. So fix this by printing the range end offsets.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:39 +01:00
Filipe Manana
4dc1d69c2b btrfs: fix warning messages not printing interval at unpin_extent_range()
At unpin_extent_range() we print warning messages that are supposed to
print an interval in the form "[X, Y)", with the first element being an
inclusive start offset and the second element being the exclusive end
offset of a range. However we end up printing the range's length instead
of the range's exclusive end offset, so fix that to avoid having confusing
and non-sense messages in case we hit one of these unexpected scenarios.

Fixes: 00deaf04df ("btrfs: log messages at unpin_extent_range() during unexpected cases")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:38 +01:00
Filipe Manana
8a565ec04d btrfs: fix extent map leak in unexpected scenario at unpin_extent_cache()
At unpin_extent_cache() if we happen to find an extent map with an
unexpected start offset, we jump to the 'out' label and never release the
reference we added to the extent map through the call to
lookup_extent_mapping(), therefore resulting in a leak. So fix this by
moving the free_extent_map() under the 'out' label.

Fixes: c03c89f821 ("btrfs: handle errors returned from unpin_extent_cache()")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:38 +01:00
Anand Jain
9f7eb8405d btrfs: validate device maj:min during open
Boris managed to create a device capable of changing its maj:min without
altering its device path.

Only multi-devices can be scanned. A device that gets scanned and remains
in the btrfs kernel cache might end up with an incorrect maj:min.

Despite the temp-fsid feature patch did not introduce this bug, it could
lead to issues if the above multi-device is converted to a single device
with a stale maj:min. Subsequently, attempting to mount the same device
with the correct maj:min might mistake it for another device with the same
fsid, potentially resulting in wrongly auto-enabling the temp-fsid feature.

To address this, this patch validates the device's maj:min at the time of
device open and updates it if it has changed since the last scan.

CC: stable@vger.kernel.org # 6.7+
Fixes: a5b8a5f9f8 ("btrfs: support cloned-device mount capability")
Reported-by: Boris Burkov <boris@bur.io>
Co-developed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Boris Burkov <boris@bur.io>#
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:42:38 +01:00
Johannes Thumshirn
1ec17ef591 btrfs: zoned: fix use-after-free in do_zone_finish()
Shinichiro reported the following use-after-free triggered by the device
replace operation in fstests btrfs/070.

 BTRFS info (device nullb1): scrub: finished on devid 1 with status: 0
 ==================================================================
 BUG: KASAN: slab-use-after-free in do_zone_finish+0x91a/0xb90 [btrfs]
 Read of size 8 at addr ffff8881543c8060 by task btrfs-cleaner/3494007

 CPU: 0 PID: 3494007 Comm: btrfs-cleaner Tainted: G        W          6.8.0-rc5-kts #1
 Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 3.3 02/21/2020
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5b/0x90
  print_report+0xcf/0x670
  ? __virt_addr_valid+0x200/0x3e0
  kasan_report+0xd8/0x110
  ? do_zone_finish+0x91a/0xb90 [btrfs]
  ? do_zone_finish+0x91a/0xb90 [btrfs]
  do_zone_finish+0x91a/0xb90 [btrfs]
  btrfs_delete_unused_bgs+0x5e1/0x1750 [btrfs]
  ? __pfx_btrfs_delete_unused_bgs+0x10/0x10 [btrfs]
  ? btrfs_put_root+0x2d/0x220 [btrfs]
  ? btrfs_clean_one_deleted_snapshot+0x299/0x430 [btrfs]
  cleaner_kthread+0x21e/0x380 [btrfs]
  ? __pfx_cleaner_kthread+0x10/0x10 [btrfs]
  kthread+0x2e3/0x3c0
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x31/0x70
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

 Allocated by task 3493983:
  kasan_save_stack+0x33/0x60
  kasan_save_track+0x14/0x30
  __kasan_kmalloc+0xaa/0xb0
  btrfs_alloc_device+0xb3/0x4e0 [btrfs]
  device_list_add.constprop.0+0x993/0x1630 [btrfs]
  btrfs_scan_one_device+0x219/0x3d0 [btrfs]
  btrfs_control_ioctl+0x26e/0x310 [btrfs]
  __x64_sys_ioctl+0x134/0x1b0
  do_syscall_64+0x99/0x190
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

 Freed by task 3494056:
  kasan_save_stack+0x33/0x60
  kasan_save_track+0x14/0x30
  kasan_save_free_info+0x3f/0x60
  poison_slab_object+0x102/0x170
  __kasan_slab_free+0x32/0x70
  kfree+0x11b/0x320
  btrfs_rm_dev_replace_free_srcdev+0xca/0x280 [btrfs]
  btrfs_dev_replace_finishing+0xd7e/0x14f0 [btrfs]
  btrfs_dev_replace_by_ioctl+0x1286/0x25a0 [btrfs]
  btrfs_ioctl+0xb27/0x57d0 [btrfs]
  __x64_sys_ioctl+0x134/0x1b0
  do_syscall_64+0x99/0x190
  entry_SYSCALL_64_after_hwframe+0x6e/0x76

 The buggy address belongs to the object at ffff8881543c8000
  which belongs to the cache kmalloc-1k of size 1024
 The buggy address is located 96 bytes inside of
  freed 1024-byte region [ffff8881543c8000, ffff8881543c8400)

 The buggy address belongs to the physical page:
 page:00000000fe2c1285 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1543c8
 head:00000000fe2c1285 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
 flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
 page_type: 0xffffffff()
 raw: 0017ffffc0000840 ffff888100042dc0 ffffea0019e8f200 dead000000000002
 raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff8881543c7f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff8881543c7f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 >ffff8881543c8000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                        ^
  ffff8881543c8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8881543c8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

This UAF happens because we're accessing stale zone information of a
already removed btrfs_device in do_zone_finish().

The sequence of events is as follows:

btrfs_dev_replace_start
  btrfs_scrub_dev
   btrfs_dev_replace_finishing
    btrfs_dev_replace_update_device_in_mapping_tree <-- devices replaced
    btrfs_rm_dev_replace_free_srcdev
     btrfs_free_device                              <-- device freed

cleaner_kthread
 btrfs_delete_unused_bgs
  btrfs_zone_finish
   do_zone_finish              <-- refers the freed device

The reason for this is that we're using a cached pointer to the chunk_map
from the block group, but on device replace this cached pointer can
contain stale device entries.

The staleness comes from the fact, that btrfs_block_group::physical_map is
not a pointer to a btrfs_chunk_map but a memory copy of it.

Also take the fs_info::dev_replace::rwsem to prevent
btrfs_dev_replace_update_device_in_mapping_tree() from changing the device
underneath us again.

Note: btrfs_dev_replace_update_device_in_mapping_tree() is holding
fs_info::mapping_tree_lock, but as this is a spinning read/write lock we
cannot take it as the call to blkdev_zone_mgmt() requires a memory
allocation which may not sleep.
But btrfs_dev_replace_update_device_in_mapping_tree() is always called with
the fs_info::dev_replace::rwsem held in write mode.

Many thanks to Shinichiro for analyzing the bug.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
CC: stable@vger.kernel.org # 6.8
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-26 16:41:01 +01:00
Linus Torvalds
928a87efa4 gfs2 fix
- Fix boundary check in punch_hole
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmYBaM0UHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrzqw/9GpK71h1dIA8vYqInumdrUabksLKy
 jRMR2ZxfzBKLdAfgn9AS3nrWNos72vjAxbjYCi/fbY9uvIK1/zzq7Ef7601kCetM
 NzxShY8AwLJa9mO8O5yReLL7O/61gjlcdD6rSjkYwphWuobd5vpudKkibgpdJyH8
 bn6U1/2K5ASFtWyTRbudOIsz4AqPUE6ZB4KxSuCDx7uFiQjnuh6sk8wfg48pdig7
 GAsNPmBFfWAQXClPnI/WFG0hpkuRIK1hk9ITWx1ybu2JqaNeVXRBqGoRZbEkPYju
 qEkp4oT3j/1siBz1sMOjC5tfmAzhLvAeL61pD2EOcm5Bpd3iKJibYt/uCIpYFHM0
 WfRcUmqEduN1zhDuSR4KSe49JQ5dFXVf83YqUgbtrHFiHHXNBYYqFNUVfcDAB1p7
 IH9AlNd82zyxJ3fsBX7VpEbGC2qNa3K8hYO7px8DNVrPGzW7AhPF1Lsh0OE9GlZU
 H5f70Nryi98iwadbePBUchTrx0S3iYjk2TQgLGf5L/lAl6J/MRNG31kittDtehri
 cct/JBr8sUAK014TS5NxPbpxqDnVot3UsYk7h6s7WdmM1svfs7j5f1mo3ovMEGqX
 io5Z6pFEE7n1ce5hbieDKr3JFh6LxP1ArUSY8oz5rR0shE2XHMcIdq3J26Vfi0Q0
 4VjdBic/7rUUBXI=
 =QXEK
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Fix boundary check in punch_hole

* tag 'gfs2-v6.8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix invalid metadata access in punch_hole
2024-03-25 10:53:39 -07:00
Eric Van Hensbergen
6630036b7c
fs/9p: fix uninitialized values during inode evict
If an iget fails due to not being able to retrieve information
from the server then the inode structure is only partially
initialized.  When the inode gets evicted, references to
uninitialized structures (like fscache cookies) were being
made.

This patch checks for a bad_inode before doing anything other
than clearing the inode from the cache.  Since the inode is
bad, it shouldn't have any state associated with it that needs
to be written back (and there really isn't a way to complete
those anyways).

Reported-by: syzbot+eb83fe1cce5833cd66a0@syzkaller.appspotmail.com
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-03-25 14:16:06 +00:00
Dave Chinner
f2e812c152 xfs: don't use current->journal_info
syzbot reported an ext4 panic during a page fault where found a
journal handle when it didn't expect to find one. The structure
it tripped over had a value of 'TRAN' in the first entry in the
structure, and that indicates it tripped over a struct xfs_trans
instead of a jbd2 handle.

The reason for this is that the page fault was taken during a
copy-out to a user buffer from an xfs bulkstat operation. XFS uses
an "empty" transaction context for bulkstat to do automated metadata
buffer cleanup, and so the transaction context is valid across the
copyout of the bulkstat info into the user buffer.

We are using empty transaction contexts like this in XFS to reduce
the risk of failing to release objects we reference during the
operation, especially during error handling. Hence we really need to
ensure that we can take page faults from these contexts without
leaving landmines for the code processing the page fault to trip
over.

However, this same behaviour could happen from any other filesystem
that triggers a page fault or any other exception that is handled
on-stack from within a task context that has current->journal_info
set.  Having a page fault from some other filesystem bounce into XFS
where we have to run a transaction isn't a bug at all, but the usage
of current->journal_info means that this could result corruption of
the outer task's journal_info structure.

The problem is purely that we now have two different contexts that
now think they own current->journal_info. IOWs, no filesystem can
allow page faults or on-stack exceptions while current->journal_info
is set by the filesystem because the exception processing might use
current->journal_info itself.

If we end up with nested XFS transactions whilst holding an empty
transaction, then it isn't an issue as the outer transaction does
not hold a log reservation. If we ignore the current->journal_info
usage, then the only problem that might occur is a deadlock if the
exception tries to take the same locks the upper context holds.
That, however, is not a problem that setting current->journal_info
would solve, so it's largely an irrelevant concern here.

IOWs, we really only use current->journal_info for a warning check
in xfs_vm_writepages() to ensure we aren't doing writeback from a
transaction context. Writeback might need to do allocation, so it
can need to run transactions itself. Hence it's a debug check to
warn us that we've done something silly, and largely it is not all
that useful.

So let's just remove all the use of current->journal_info in XFS and
get rid of all the potential issues from nested contexts where
current->journal_info might get misused by another filesystem
context.

Reported-by: syzbot+cdee56dbcdf0096ef605@syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Mark Tinguely <mark.tinguely@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-03-25 10:21:01 +05:30
Dave Chinner
15922f5dbf xfs: allow sunit mount option to repair bad primary sb stripe values
If a filesystem has a busted stripe alignment configuration on disk
(e.g. because broken RAID firmware told mkfs that swidth was smaller
than sunit), then the filesystem will refuse to mount due to the
stripe validation failing. This failure is triggering during distro
upgrades from old kernels lacking this check to newer kernels with
this check, and currently the only way to fix it is with offline
xfs_db surgery.

This runtime validity checking occurs when we read the superblock
for the first time and causes the mount to fail immediately. This
prevents the rewrite of stripe unit/width via
mount options that occurs later in the mount process. Hence there is
no way to recover this situation without resorting to offline xfs_db
rewrite of the values.

However, we parse the mount options long before we read the
superblock, and we know if the mount has been asked to re-write the
stripe alignment configuration when we are reading the superblock
and verifying it for the first time. Hence we can conditionally
ignore stripe verification failures if the mount options specified
will correct the issue.

We validate that the new stripe unit/width are valid before we
overwrite the superblock values, so we can ignore the invalid config
at verification and fail the mount later if the new values are not
valid. This, at least, gives users the chance of correcting the
issue after a kernel upgrade without having to resort to xfs-db
hacks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-03-25 10:17:18 +05:30
Gao Xiang
a97b59ed79 erofs: drop experimental warning for FSDAX
As EXT4/XFS filesystems, FSDAX functionality is considered to be stable.
Let's drop this warning.

Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240325005116.106351-1-hsiangkao@linux.alibaba.com
2024-03-25 10:48:15 +08:00
Colin Ian King
10211b4a23
fs/9p: remove redundant pointer v9ses
Pointer v9ses is being assigned the value from the return of inlined
function v9fs_inode2v9ses (which just returns inode->i_sb->s_fs_info).
The pointer is not used after the assignment, so the variable is
redundant and can be removed.

Cleans up clang scan warnings such as:
fs/9p/vfs_inode_dotl.c:300:28: warning: variable 'v9ses' set but not
used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-03-25 00:34:35 +00:00
Lizhi Xu
11763a8598
fs/9p: fix uaf in in v9fs_stat2inode_dotl
The incorrect logical order of accessing the st object code in v9fs_fid_iget_dotl
is causing this uaf.

Fixes: 724a08450f ("fs/9p: simplify iget to remove unnecessary paths")
Reported-and-tested-by: syzbot+7a3d75905ea1a830dbe5@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com>
Tested-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
2024-03-25 00:34:35 +00:00
Linus Torvalds
ff9c18e435 A patch to minimize blockage when processing very large batches of
dirty caps and two fixes to better handle EOF in the face of multiple
 clients performing reads and size-extending writes at the same time.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmX9xDETHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi3HzCACJYjTUKq4v8/LkhzyJM0WTYXKQ+Orz
 BnwgFHGIEiihQKko/7Ks+fcuEGpdy97Rsn9mtmkN0UCKfbcCHwGwflaoYfkkIA4t
 V9pNX0xRwDTyKaENtiI5GVjC/nYcfotbRK4BfURRYKb1xYHq8lO0mOxXwvt5weqQ
 CISWACp7k7eMcX0R0fKT9LemfBDDu2Pxi5ZnDNSdI6Z87Bwdv96jOaCaJ93Azo1W
 Mjr9ddMmaaqsrmaUE3jp58b56nxTrcOUGR7XQUZtjNjEy5h91WazydD4TJFaEQrF
 CQsV5nXHSRT8E4ROUZk8fa7amLs6FGrx307fkOKW02exQPBF7ij/SAt4
 =STLZ
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.9-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A patch to minimize blockage when processing very large batches of
  dirty caps and two fixes to better handle EOF in the face of multiple
  clients performing reads and size-extending writes at the same time"

* tag 'ceph-for-6.9-rc1' of https://github.com/ceph/ceph-client:
  ceph: set correct cap mask for getattr request for read
  ceph: stop copying to iter at EOF on sync reads
  ceph: remove SLAB_MEM_SPREAD flag usage
  ceph: break the check delayed cap loop every 5s
2024-03-22 11:15:45 -07:00
Linus Torvalds
6f6efce52d Bug fixes for 6.9:
* Fix invalid pointer dereference by initializing xmbuf before tracepoint
   function is invoked.
 * Use memalloc_nofs_save() when inserting into quota radix tree.
 
 Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQjMC4mbgVeU7MxEIYH7y4RirJu9AUCZffInwAKCRAH7y4RirJu
 9IyEAP9h8KMNtDRXyBxFe8vCtjoSj7fwwIijWLa/y2NH1oWQowEAk83m14akH1KH
 J0HInearcoRv8L16oe/tcNxxPPBuSQQ=
 =emhS
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.9-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Chandan Babu:

 - Fix invalid pointer dereference by initializing xmbuf before
   tracepoint function is invoked

 - Use memalloc_nofs_save() when inserting into quota radix tree

* tag 'xfs-6.9-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: quota radix tree allocations need to be NOFS on insert
  xfs: fix dev_t usage in xmbuf tracepoints
2024-03-22 11:12:21 -07:00
Jan Kara
9fe6e9e7b5 nfsd: Fix error cleanup path in nfsd_rename()
Commit a8b0026847 ("rename(): avoid a deadlock in the case of parents
having no common ancestor") added an error bail out path. However this
path does not drop the remount protection that has been acquired. Fix
the cleanup path to properly drop the remount protection.

Fixes: a8b0026847 ("rename(): avoid a deadlock in the case of parents having no common ancestor")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-22 09:52:00 -04:00
Justin Stitt
5248f40973 binfmt: replace deprecated strncpy
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

There is a _nearly_ identical implementation of fill_psinfo present in
binfmt_elf.c -- except that one uses get_task_comm over strncpy(). Let's
mirror that in binfmt_elf_fdpic.c

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://github.com/KSPP/linux/issues/90
Cc:  <linux-hardening@vger.kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240321-strncpy-fs-binfmt_elf_fdpic-c-v2-1-0b6daec6cc56@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-21 20:20:52 -07:00
Linus Torvalds
8e938e3986 9 cifs.ko changesets
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmX83AYACgkQiiy9cAdy
 T1GxyQwApHv/GjrvWXZAHoF2w6BOZSdirHaFPHL+9hqwmxDQ8vcQntqDqyzBtP7R
 aanJdyH8h56e09Dhie46zo6vndHzqD7gqTQtvyHxM3YyXNfW6AH0/NB/hT/UP4s2
 /IFzrWvuH9vobi/UyqjbukVQo3Gix53E1SlkSvERDWvi8ynsHUt4SxeVS9PBid4H
 cZrMeb9FjRmaGLQE3kDEmASsnMFcoGjNiWkfu3TRX0LDFk9TMClzBWGWrWUtNHE+
 QNm6BN7I/mEJP8+W5MSKy20UnWrTH6GkDVuB2E/hJsB4Eo2ggAWTz8X5MYjbx4/A
 f0zx/TbeEiDCGLZpUeySPpz0BjoGVRJkrq1wAXw1H6VfvfwWTQxB+wB57p4SwfVO
 u1By3h5DsUszz4haL34wLSLhLkuMBuO7yMoa5Fnv5gb5NBu6U2IOtMG9WcmPZr6M
 ZUwyxSFZ/l1poinvyI3pQ7pplI3RO57qCUdfFnracVF56qvSby2KXqZ5X907Kgw3
 q3NzTHUc
 =aohZ
 -----END PGP SIGNATURE-----

Merge tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - Various get_inode_info_fixes

 - Fix for querying xattrs of cached dirs

 - Four minor cleanup fixes (including adding some header corrections
   and a missing flag)

 - Performance improvement for deferred close

 - Two query interface fixes

* tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  smb311: additional compression flag defined in updated protocol spec
  smb311: correct incorrect offset field in compression header
  cifs: Move some extern decls from .c files to .h
  cifs: remove redundant variable assignment
  cifs: fixes for get_inode_info
  cifs: open_cached_dir(): add FILE_READ_EA to desired access
  cifs: reduce warning log level for server not advertising interfaces
  cifs: make sure server interfaces are requested only for SMB3+
  cifs: defer close file handles having RH lease
2024-03-21 19:14:28 -07:00
Linus Torvalds
85a79128c4 This pull request contains updates for UBI and UBIFS:
UBI:
         - Add Zhihao Cheng as reviewer
 	- Attach via device tree
 	- Add NVMEM layer
 	- Various fastmap related fixes
 
 UBIFS:
         - Add Zhihao Cheng as reviewer
 	- Convert to folios
 	- Various fixes (memory leaks in error paths, function prototypes)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmX8kjUWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wSUcD/sFJyv3oD9qqt+OZJUI2b84nHdk
 7EXC4vAd1ioTZzQS0txWx8rPPrhi/XKKGIea71qkDpHyi3foT0n2MlELHNpIZaoH
 r8F50LeMzxBC7NEdGMaU4JYR5FOhNrLJanF5H1MEiiN+IaovhPWrA0V9ViWvS8tM
 e3WDA3tEPo2bbpkzgstjow7YxIAD4OcXhgkFxqb0j299zZzO9GmhLqTlyaidBFne
 VJIjurHd4ixgFEBRJGxAxcAdST5ONwx5RmlTy+9/lubn326jRz5VTRj6pkcugjvn
 odyPeLHc3jEXGP+6qvtyuL2jy6AqyRksXQvZYgP5iL8m2+ga0Edj8/zfoiGPnjRN
 ukYIFI2l9Qv4jUsByHX/klSdILL2L5gK2G5u9LrgDameOTnBcQH/i/TBb1MWzPCA
 O48XJo8T0XvwOLCbgHOuQ7+yKKaI49C9AtM2cbrMRL1gJJKjUsXcC5YZu+3a9+Fi
 TO0o0Y61GKS893mmMznhQqTMMr+5JMMlHJ6C7F6pXdt90twThwABZidWQz1uZc2h
 s+KWo7ts5itxBLW4XP8oue4aBsRdVTQ0IbYcB7j+EXE3EjY7CEge2SNHY6/7eiEK
 Y86M75svkMkQdbLNgV+iSUrn7Uddozm14eHL6wIrWv8Pe9bx0OFlCTFsXzhM37hK
 EK3aNxhyIHk5EFkGHA==
 =70g8
 -----END PGP SIGNATURE-----

Merge tag 'ubifs-for-linus-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI and UBIFS updates from Richard Weinberger:
 "UBI:
   - Add Zhihao Cheng as reviewer
   - Attach via device tree
   - Add NVMEM layer
   - Various fastmap related fixes

  UBIFS:
   - Add Zhihao Cheng as reviewer
   - Convert to folios
   - Various fixes (memory leaks in error paths, function prototypes)"

* tag 'ubifs-for-linus-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (34 commits)
  mtd: ubi: fix NVMEM over UBI volumes on 32-bit systems
  mtd: ubi: provide NVMEM layer over UBI volumes
  mtd: ubi: populate ubi volume fwnode
  mtd: ubi: introduce pre-removal notification for UBI volumes
  mtd: ubi: attach from device tree
  mtd: ubi: block: use notifier to create ubiblock from parameter
  dt-bindings: mtd: ubi-volume: allow UBI volumes to provide NVMEM
  dt-bindings: mtd: add basic bindings for UBI
  ubifs: Queue up space reservation tasks if retrying many times
  ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path
  ubifs: dbg_check_idx_size: Fix kmemleak if loading znode failed
  ubi: Correct the number of PEBs after a volume resize failure
  ubi: fix slab-out-of-bounds in ubi_eba_get_ldesc+0xfb/0x130
  ubi: correct the calculation of fastmap size
  ubifs: Remove unreachable code in dbg_check_ltab_lnum
  ubifs: fix function pointer cast warnings
  ubifs: fix sort function prototype
  ubi: Check for too small LEB size in VTBL code
  MAINTAINERS: Add Zhihao Cheng as UBI/UBIFS reviewer
  ubifs: Convert populate_page() to take a folio
  ...
2024-03-21 15:09:29 -07:00
Linus Torvalds
241590e5a1 Driver core changes for 6.9-rc1
Here is the "big" set of driver core and kernfs changes for 6.9-rc1.
 
 Nothing all that crazy here, just some good updates that include:
   - automatic attribute group hiding from Dan Williams (he fixed up my
     horrible attempt at doing this.)
   - kobject lock contention fixes from Eric Dumazet
   - driver core cleanups from Andy
   - kernfs rcu work from Tejun
   - fw_devlink changes to resolve some reported issues
   - other minor changes, all details in the shortlog
 
 All of these have been in linux-next for a long time with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZfwsHg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynT4ACePcNRAsYrINlOPPKPHimJtyP01yEAn0pZYnj2
 0/UpqIqf3HVPu7zsLKTa
 =vR9S
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core and kernfs changes for 6.9-rc1.

  Nothing all that crazy here, just some good updates that include:

   - automatic attribute group hiding from Dan Williams (he fixed up my
     horrible attempt at doing this.)

   - kobject lock contention fixes from Eric Dumazet

   - driver core cleanups from Andy

   - kernfs rcu work from Tejun

   - fw_devlink changes to resolve some reported issues

   - other minor changes, all details in the shortlog

  All of these have been in linux-next for a long time with no reported
  issues"

* tag 'driver-core-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (28 commits)
  device: core: Log warning for devices pending deferred probe on timeout
  driver: core: Use dev_* instead of pr_* so device metadata is added
  driver: core: Log probe failure as error and with device metadata
  of: property: fw_devlink: Add support for "post-init-providers" property
  driver core: Add FWLINK_FLAG_IGNORE to completely ignore a fwnode link
  driver core: Adds flags param to fwnode_link_add()
  debugfs: fix wait/cancellation handling during remove
  device property: Don't use "proxy" headers
  device property: Move enum dev_dma_attr to fwnode.h
  driver core: Move fw_devlink stuff to where it belongs
  driver core: Drop unneeded 'extern' keyword in fwnode.h
  firmware_loader: Suppress warning on FW_OPT_NO_WARN flag
  sysfs:Addresses documentation in sysfs_merge_group and sysfs_unmerge_group.
  firmware_loader: introduce __free() cleanup hanler
  platform-msi: Remove usage of the deprecated ida_simple_xx() API
  sysfs: Introduce DEFINE_SIMPLE_SYSFS_GROUP_VISIBLE()
  sysfs: Document new "group visible" helpers
  sysfs: Fix crash on empty group attributes array
  sysfs: Introduce a mechanism to hide static attribute_groups
  sysfs: Introduce a mechanism to hide static attribute_groups
  ...
2024-03-21 13:34:15 -07:00
Max Filippov
2aea94ac14 exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
In NOMMU kernel the value of linux_binprm::p is the offset inside the
temporary program arguments array maintained in separate pages in the
linux_binprm::page. linux_binprm::exec being a copy of linux_binprm::p
thus must be adjusted when that array is copied to the user stack.
Without that adjustment the value passed by the NOMMU kernel to the ELF
program in the AT_EXECFN entry of the aux array doesn't make any sense
and it may break programs that try to access memory pointed to by that
entry.

Adjust linux_binprm::exec before the successful return from the
transfer_args_to_stack().

Cc: <stable@vger.kernel.org>
Fixes: b6a2fea393 ("mm: variable length argument support")
Fixes: 5edc2a5123 ("binfmt_elf_fdpic: wire up AT_EXECFD, AT_EXECFN, AT_SECURE")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Link: https://lore.kernel.org/r/20240320182607.1472887-1-jcmvbkbc@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-21 10:05:47 -07:00
Linus Torvalds
7b65c810a1 for-6.9-part2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmX8X6gACgkQxWXV+ddt
 WDs9fQ/+NdeEjgwmYD6EdGNB7dZZKz0XvZ+/2SyDTaQHRazPSmMQh7cHyCzyckmW
 ORRcluGWo+m7cg2bdW9c1KMFdvCaEfuDnDxzyMGvE1iy9gY6KuLbqFIZ1jQe7I6t
 X8voNKLYhF8W3nrBP+PR/PSi61Op2jzrpgzPKdQZFvV4UlTnsNQJyG77IgDB/FFu
 vvnQBtW1xsufnxzYX7+S2066GpCcxqQlaWcgbBRzDcWvJBtzte2hnX2A/wX19evO
 nOqSaWcvALYCPBXfJ4VQ9Znjd3dnU0p2Gf4bp/eTClNB9h5QiDtMCr54/OKT2O8W
 bdqg6RqqWTHtKTWW9MCrWN2qLT8aPoRQJTFv91D2njSelemLVGrBnxXWZeDpB6kX
 0GC4Iqld+F1AM1lOd91D6V7ICQAwf1msp3YE/cokCLZozssKHN+4wrk3lyngWgDT
 AnvRRFTC9TOqLavobI6Upfc/jxP3ZkrSuacgJCuIILvptCLOyVmUqNhQyXx5GVEm
 TeARUeLnNrvnmaXWiW3tSRNZd52VGsoGqW81N/Uefa0zG5HGnUEJbnHb3HXLgH17
 AxyXKDwPnRnOj3fNl0fjZaTFtSgB3noFMMKZ2j6gRiyh5iG3/f8pZgtxesqcwT1e
 vCdq6b3sYEU8bYkSXeaFvi5WBrUIwquej4k0t/F4I/1hmzk7tTg=
 =B5bp
 -----END PGP SIGNATURE-----

Merge tag 'for-6.9-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "Fix a problem found in 6.7 after adding the temp-fsid feature which
  changed device tracking in memory and broke grub-probe. This is used
  on initrd-less systems. There were several iterations of the fix and
  it took longer than expected"

* tag 'for-6.9-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not skip re-registration for the mounted device
2024-03-21 09:54:28 -07:00
Linus Torvalds
1b3e251373 Description for this pull request:
- Improve dirsync performance by syncing on a dentry-set rather than
     on a per-directory entry
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmX653kWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCMEZD/9bpRN1V5YezmQUUh928yC0OWai
 aEDZqmzEsrJ7wXKUYmwjCzxO9/h3CwdtWTaJMctz2XlfJ9L62hiGXki4Cc27vfgs
 RGDV7fHpRmRq+JxgZN+UEnfrJx6kA0xrOaoyrbvT0t8pTCyK8yOY28YsltbI8wKd
 yQbWS4u4r1/Gugfry7PeGA5x6fxGcP/kbjB81Q1+/ilJetqELcUH9INOGSwvfzOh
 k9lgF+ujJVauzP1Pbw4fSZhcfXGYu0x4rbwcUAJUvuc3NXbotKEAW/ICJW1bH5En
 nN7IjiCYwMjEpK7H4zaZ61zrTIfe/MYgKLsq5XWYrvAmSL8QIlJiB4gbSpwd95gc
 7AK4d4mgYrF3oZPjYvb7kpbPtrNywQzIhef2W67E+fGifAKSsuI8gf+LYA2a7m5A
 HqTeL+3z1DnCwEGfLkNtEBy4xbv039dftjDcR8qzPs3WgwWU7G1nhwuQttcXC1qB
 p5eYxL1RvCbDXER9K6chdYQg7KkxCThlIGuG6shqQt4ybApyGbngmw7s3laUHMZQ
 3R3eD1UqtDb3vr04/PmvQ8NbLy0js8Q2lmjIowznyO/ahpCw/++OgX7T4Q+Sse6h
 xfGj8YkyeXNF7t6NwXhErdygvovoXQ4ADZIcxlhSc3NKnZUzL7RDEx5yMUzKggHu
 BlZlCJZoOoAm5Gk9Cw==
 =0fPf
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Improve dirsync performance by syncing on a dentry-set rather than on
   a per-directory entry

* tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: remove duplicate update parent dir
  exfat: do not sync parent dir if just update timestamp
  exfat: remove unused functions
  exfat: convert exfat_find_empty_entry() to use dentry cache
  exfat: convert exfat_init_ext_entry() to use dentry cache
  exfat: move free cluster out of exfat_init_ext_entry()
  exfat: convert exfat_remove_entries() to use dentry cache
  exfat: convert exfat_add_entry() to use dentry cache
  exfat: add exfat_get_empty_dentry_set() helper
  exfat: add __exfat_get_dentry_set() helper
2024-03-21 09:47:12 -07:00
Linus Torvalds
2395690004 9 ksmbd changesets
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmX6Yr8ACgkQiiy9cAdy
 T1GOEwv/fOi8lx3BdIUoJI/AhlnCHze87MikdD3WAmhFaheTiCanVz+0o26ugDoZ
 Tzq3razDbK4Z6sFGHLI4i4XKkaWpj9RP1UAVckNMbz2IGqR5vR+mKk0Q6BHnViPs
 CbIsqo1Ya84IYBeCyoe99FZxBtDWxx+H16UtFsMro6leyP1bUGSIwfh4fH21hnXt
 OSrR2eMKSbEAQYzFiCnacbhh2ssWw2blnbh4eqxOyuKrU37GXpgfLA9OvlAXmnNa
 DBLgp9fdpng+q+zFOfLPSUburEJBndsvIMX/rZvpKJETL64WWNmjqnNcIuoovEAi
 sMLt4W5MWupcRCpArT/ZmjFLUhsjGUMmAbHaMFLe76Pi4ooeZhWEfakR4g/4g4kF
 Y/XXNEkwrdUQKam9CkkXzCB0DGAwfu5XLhpSTnOTQ0ECbGF5AYKdbChxioTSmxU8
 alz4wviviMY8OxTbJa75MpSac2eDSW6bprO7Q++cVYk1DzR3I36Wts3FH0+uz9fW
 4mEoMnS2
 =Zyn6
 -----END PGP SIGNATURE-----

Merge tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbd

Pull smb server updates from Steve French:

 - add support for durable file handles (an important data integrity
   feature)

 - fixes for potential out of bounds issues

 - fix possible null dereference in close

 - getattr fixes

 - trivial typo fix and minor cleanup

* tag 'v6.9-rc-smb3-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: remove module version
  ksmbd: fix potencial out-of-bounds when buffer offset is invalid
  ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
  ksmbd: Fix spelling mistake "connction" -> "connection"
  ksmbd: fix possible null-deref in smb_lazy_parent_lease_break_close
  ksmbd: add support for durable handles v1/v2
  ksmbd: mark SMB2_SESSION_EXPIRED to session when destroying previous session
  ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
  ksmbd: replace generic_fillattr with vfs_getattr
2024-03-20 16:42:47 -07:00
Steve French
e56bc745fa smb311: additional compression flag defined in updated protocol spec
Added new compression flag that was recently documented, in
addition fix some typos and clarify the sid_attr_data struct
definition.

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-20 11:49:44 -05:00
Steve French
68c5818a27 smb311: correct incorrect offset field in compression header
The offset field in the compression header is 32 bits not 16.

Reviewed-by: Bharath SM <bharathsm@microsoft.com>
Reported-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-20 11:49:44 -05:00
David Howells
5b142b37c7 cifs: Move some extern decls from .c files to .h
Move the following:

        extern mempool_t *cifs_sm_req_poolp;
        extern mempool_t *cifs_req_poolp;
        extern mempool_t *cifs_mid_poolp;
        extern bool disable_legacy_dialects;

from various .c files to cifsglob.h.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-20 11:49:24 -05:00
Linus Torvalds
a4145ce1e7 bcachefs fixes for 6.9-rc1
Assorted bugfixes.
 
 Most are fixes for simple assertion pops; the most significant fix is
 for a deadlock in recovery when we have to rewrite large numbers of
 btree nodes to fix errors. This was incorrectly running out of the same
 workqueue as the core interior btree update path - we now give it its
 own single threaded workqueue.
 
 This was visible to users as "bch2_btree_update_start(): error:
 BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmX6CoYACgkQE6szbY3K
 bnZp2hAAwAw8haQKeR0+0aAaqTavvBcjcloeKlQhRl+OxV1rAgxcjKGai5txZ9rI
 d4FVOOo7MqHq1oN9Ydsy1+0R70eCFzhDxhT1Ph5MhIzc7nd8lC0GQjO0atx23cni
 4UZgSxi6quEP401MTVhvVbCPLmvfPJLpIBzptJUDS/eysxSZpS4A10gEzipoNjPv
 DOdrsvoo8nQX53tERJ/IxtroFL44p4y8OyZK65NILFF9xZosKz1P9ktrWufmRVoY
 /Hl8SUfhSNJDFW5pIMPOmoG/+RG+hJK4BaiNWPXLaSvO+3PmQskJ2tvHQVNjHQYt
 dMYWcy4hN47XtYvrHG9xmaQP+lZCDijdBrhmik4brqfZbloH43MVdDFysjfIPhUm
 qk+zzb0uE0ZhwRvQOjnYEQpHjXmj7Bm80+dhfNuuiKlhz4bOeDz8UZykJOzgD0zH
 n4cd+nbCxuogkukzLLQMbFv1+MCsCZpStkXP3GQXCK0k+H2briPGALuA74sxfAhH
 ajHLNr6qMU+uB6Ce0oM7e+9dPLfV/NalEwWW7aR/4TamxPBt575Hpjp0BV//BRfD
 IxdEKrMNdbKBJDUj1s5aTwcSF6ae6zHtyQXuKr93mWQqNvVXvX5/FPQYr70uA1VP
 iieBkde7aSTGCbTdTEcY9NcXdT2X/91aobsPvwGeq1z5Y1JJ0nU=
 =J/hu
 -----END PGP SIGNATURE-----

Merge tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs

Pull bcachefs fixes from Kent Overstreet:
 "Assorted bugfixes.

  Most are fixes for simple assertion pops; the most significant fix is
  for a deadlock in recovery when we have to rewrite large numbers of
  btree nodes to fix errors. This was incorrectly running out of the
  same workqueue as the core interior btree update path - we now give it
  its own single threaded workqueue.

  This was visible to users as "bch2_btree_update_start(): error:
  BCH_ERR_journal_reclaim_would_deadlock" - and then recovery hanging"

* tag 'bcachefs-2024-03-19' of https://evilpiepirate.org/git/bcachefs:
  bcachefs: Fix lost wakeup on journal shutdown
  bcachefs; Fix deadlock in bch2_btree_update_start()
  bcachefs: ratelimit errors from async_btree_node_rewrite
  bcachefs: Run check_topology() first
  bcachefs: Improve bch2_fatal_error()
  bcachefs: Fix lost transaction restart error
  bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info
  bcachefs: fix for building in userspace
  bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery
  bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()
  bcachefs: Improve sysfs internal/btree_updates
  bcachefs: Split out btree_node_rewrite_worker
  bcachefs: Fix locking in bch2_alloc_write_key()
  bcachefs: Avoid extent entry type assertions in .invalid()
  bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested
  bcachefs: Fix check_key_has_snapshot() call
  bcachefs: Change "accounting overran journal reservation" to a warning
2024-03-19 17:27:25 -07:00
Xiubo Li
825b82f6b8 ceph: set correct cap mask for getattr request for read
In case of hitting the file EOF, ceph_read_iter() needs to retrieve the
file size from MDS, and Fr caps aren't neccessary.

[ idryomov: fold into existing retry_op == READ_INLINE branch ]

Reported-by: Frank Hsiao <frankhsiao@qnap.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Frank Hsiao <frankhsiao@qnap.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-03-19 14:35:55 +01:00
Xiubo Li
1065da21e5 ceph: stop copying to iter at EOF on sync reads
If EOF is encountered, ceph_sync_read() return value is adjusted down
according to i_size, but the "to" iter is advanced by the actual number
of bytes read.  Then, when retrying, the remainder of the range may be
skipped incorrectly.

Ensure that the "to" iter is advanced only until EOF.

[ idryomov: changelog ]

Fixes: c3d8e0b5de ("ceph: return the real size read when it hits EOF")
Reported-by: Frank Hsiao <frankhsiao@qnap.com>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Frank Hsiao <frankhsiao@qnap.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-03-19 14:35:55 +01:00
Yuezhang Mo
dc38fdc51b exfat: remove duplicate update parent dir
For renaming, the directory only needs to be updated once if it
is in the same directory.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:56:10 +09:00
Yuezhang Mo
96cf51accc exfat: do not sync parent dir if just update timestamp
When sync or dir_sync is enabled, there is no need to sync the
parent directory's inode if only for updating its timestamp.

1. If an unexpected power failure occurs, the timestamp of the
   parent directory is not updated to the storage, which has no
   impact on the user.

2. The number of writes will be greatly reduced, which can not
   only improve performance, but also prolong device life.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:56:05 +09:00
Yuezhang Mo
4d71455976 exfat: remove unused functions
exfat_count_ext_entries() is no longer called, remove it.
exfat_update_dir_chksum() is no longer called, remove it and
rename exfat_update_dir_chksum_with_entry_set() to it.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:56:01 +09:00
Yuezhang Mo
af02c72d0b exfat: convert exfat_find_empty_entry() to use dentry cache
Before this conversion, each dentry traversed needs to be read
from the storage device or page cache. There are at least 16
dentries in a sector. This will result in frequent page cache
searches.

After this conversion, if all directory entries in a sector are
used, the sector only needs to be read once.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:54 +09:00
Yuezhang Mo
d97e060673 exfat: convert exfat_init_ext_entry() to use dentry cache
Before this conversion, in exfat_init_ext_entry(), to init
the dentries in a dentry set, the sync times is equals the
dentry number if 'dirsync' or 'sync' is enabled.
That affects not only performance but also device life.

After this conversion, only needs to be synchronized once if
'dirsync' or 'sync' is enabled.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:49 +09:00
Yuezhang Mo
4e1aa22fea exfat: move free cluster out of exfat_init_ext_entry()
exfat_init_ext_entry() is an init function, it's a bit strange
to free cluster in it. And the argument 'inode' will be removed
from exfat_init_ext_entry(). So this commit changes to free the
cluster in exfat_remove_entries().

Code refinement, no functional changes.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:45 +09:00
Yuezhang Mo
ff4343da02 exfat: convert exfat_remove_entries() to use dentry cache
Before this conversion, in exfat_remove_entries(), to mark the
dentries in a dentry set as deleted, the sync times is equals
the dentry numbers if 'dirsync' or 'sync' is enabled.
That affects not only performance but also device life.

After this conversion, only needs to be synchronized once if
'dirsync' or 'sync' is enabled.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:40 +09:00
Yuezhang Mo
cf8663fa99 exfat: convert exfat_add_entry() to use dentry cache
After this conversion, if "dirsync" or "sync" is enabled, the
number of synchronized dentries in exfat_add_entry() will change
from 2 to 1.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:36 +09:00
Yuezhang Mo
01da3a5176 exfat: add exfat_get_empty_dentry_set() helper
This helper is used to lookup empty dentry set. If there are
no enough empty dentries at the input location, this helper will
return the number of dentries that need to be skipped for the
next lookup.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:33 +09:00
Yuezhang Mo
7b6bab2359 exfat: add __exfat_get_dentry_set() helper
Since exfat_get_dentry_set() invokes the validate functions of
exfat_validate_entry(), it only supports getting a directory
entry set of an existing file, doesn't support getting an empty
entry set.

To remove the limitation, add this helper.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19 20:55:28 +09:00
Kent Overstreet
2e92d26b25 bcachefs: Fix lost wakeup on journal shutdown
We need to check for journal shutdown first in __journal_res_get() -
after the journal is shutdown, j->watermark won't be changing anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 23:35:42 -04:00
Kent Overstreet
c502b5b878 bcachefs; Fix deadlock in bch2_btree_update_start()
BCH_TRANS_COMMIT_journal_reclaim with watermark != BCH_WATERMARK_reclaim
means nonblocking, and we need the journal_res_get() in
btree_update_start() to respect that.

In a future refactoring we'll be deleting
BCH_TRANS_COMMIT_journal_reclaim and replacing it with an explicit
BCH_TRANS_COMMIT_nonblocking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 23:35:42 -04:00
Namjae Jeon
def30e72d8 ksmbd: remove module version
ksmbd module version marking is not needed. Since there is a
Linux kernel version, there is no point in increasing it anymore.

Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-18 21:21:38 -05:00
Namjae Jeon
c6cd2e8d2d ksmbd: fix potencial out-of-bounds when buffer offset is invalid
I found potencial out-of-bounds when buffer offset fields of a few requests
is invalid. This patch set the minimum value of buffer offset field to
->Buffer offset to validate buffer length.

Cc: stable@vger.kernel.org
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-18 21:21:33 -05:00
Linus Torvalds
b3603fcb79 dlm for 6.9
- Fix mistaken variable assignment that caused a refcounting problem.
 - Revert a recent change that began using atomic counters where they
   were not needed (for lkb wait_count.)
 - Add comments around forced state reset for waiting lock operations
   during recovery.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEcGkeEvkvjdvlR90nOBtzx/yAaaoFAmX4raoACgkQOBtzx/yA
 aapyThAAtLcTZXOa9MuZDvLtaQKX4c2MDlqiAhdL0YOYnz3+DAveA8HF1FRbVwL0
 74lA1O/GX0t2TdCrLiq75u+N/Sm2ACtbZEr8z6VeEoxxtOwCVbGKjA0CwDgvhdSe
 hUv5beO4mlguc16l4+u88z1Ta6GylXmWHRL6l2q4dPKmO4qVX6wn9JUT4JHJSQy/
 ACJ3+Lu7ndREBzCmqb4cR4TcHAhBynYmV7IIE3LQprgkCKiX2A3boeOIk+lEhUn5
 aqmwNNF2WDjJ1D5QVKbXu07MraD71rnyZBDuHzjprP01OhgXfUHLIcgdi7GzK8aN
 KnQ9S5hQWHzTiWA/kYgrUq/S5124plm2pMRyh1WDG6g3dhBxh7XsOHUxtgbLaurJ
 LmMxdQgH0lhJ3f+LSm3w8e3m45KxTeCYC2NUVg/icjOGUjAsVx1xMDXzMxoABoWO
 GGVED4i4CesjOyijMuRO9G/0MRb/lIyZkfoZgtHgL20yphmtv0B5XIIz062N28Wf
 PqmsYUz4ESYkxR4u/5VPBey5aYYdhugnOSERC6yH4QQJXyRgGWQn/CSuRrEmJJS2
 CurprPKx99XJZjZE7RJNlvpUrSBcD9Y7R6I3vo6RyrUCNwPJ0Y+Qvydvc9FoMN3R
 tn7fJe7tDfEEsukhGkwp90vK3MLbW5iKv7IaAxyALdSW12A23WM=
 =6RCz
 -----END PGP SIGNATURE-----

Merge tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:

 - Fix mistaken variable assignment that caused a refcounting problem

 - Revert a recent change that began using atomic counters where they
   were not needed (for lkb wait_count)

 - Add comments around forced state reset for waiting lock operations
   during recovery

* tag 'dlm-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: add comments about forced waiters reset
  dlm: revert atomic_t lkb_wait_count
  dlm: fix user space lkb refcounting
2024-03-18 15:39:48 -07:00
Linus Torvalds
ad584d73a2 Tracing updates for 6.9:
Main user visible change:
 
 - User events can now have "multi formats"
 
   The current user events have a single format. If another event is created
   with a different format, it will fail to be created. That is, once an
   event name is used, it cannot be used again with a different format. This
   can cause issues if a library is using an event and updates its format.
   An application using the older format will prevent an application using
   the new library from registering its event.
 
   A task could also DOS another application if it knows the event names, and
   it creates events with different formats.
 
   The multi-format event is in a different name space from the single
   format. Both the event name and its format are the unique identifier.
   This will allow two different applications to use the same user event name
   but with different payloads.
 
 - Added support to have ftrace_dump_on_oops dump out instances and
   not just the main top level tracing buffer.
 
 Other changes:
 
 - Add eventfs_root_inode
 
   Only the root inode has a dentry that is static (never goes away) and
   stores it upon creation. There's no reason that the thousands of other
   eventfs inodes should have a pointer that never gets set in its
   descriptor. Create a eventfs_root_inode desciptor that has a eventfs_inode
   descriptor and a dentry pointer, and only the root inode will use this.
 
 - Added WARN_ON()s in eventfs
 
   There's some conditionals remaining in eventfs that should never be hit,
   but instead of removing them, add WARN_ON() around them to make sure that
   they are never hit.
 
 - Have saved_cmdlines allocation also include the map_cmdline_to_pid array
 
   The saved_cmdlines structure allocates a large amount of data to hold its
   mappings. Within it, it has three arrays. Two are already apart of it:
   map_pid_to_cmdline[] and saved_cmdlines[]. More memory can be saved by
   also including the map_cmdline_to_pid[] array as well.
 
 - Restructure __string() and __assign_str() macros used in TRACE_EVENT().
 
   Dynamic strings in TRACE_EVENT() are declared with:
 
       __string(name, source)
 
   And assigned with:
 
      __assign_str(name, source)
 
   In the tracepoint callback of the event, the __string() is used to get the
   size needed to allocate on the ring buffer and __assign_str() is used to
   copy the string into the ring buffer. There's a helper structure that is
   created in the TRACE_EVENT() macro logic that will hold the string length
   and its position in the ring buffer which is created by __string().
 
   There are several trace events that have a function to create the string
   to save. This function is executed twice. Once for __string() and again
   for __assign_str(). There's no reason for this. The helper structure could
   also save the string it used in __string() and simply copy that into
   __assign_str() (it also already has its length).
 
   By using the structure to store the source string for the assignment, it
   means that the second argument to __assign_str() is no longer needed.
 
   It will be removed in the next merge window, but for now add a warning if
   the source string given to __string() is different than the source string
   given to __assign_str(), as the source to __assign_str() isn't even used
   and will be going away.
 
 - Added checks to make sure that the source of __string() is also the
   source of __assign_str() so that it can be safely removed in the next
   merge window.
 
   Included fixes that the above check found.
 
 - Other minor clean ups and fixes
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZfhbUBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrhJAP9bfnYO7tfNGZVNPmTT7Fz0z4zCU1Pb
 P8M+24yiFTeFWwD/aIPlMFZONVkTdFAlLdffl6kJOKxZ7vW4XzUjfNWb6wo=
 =z/D6
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:
 "Main user visible change:

   - User events can now have "multi formats"

     The current user events have a single format. If another event is
     created with a different format, it will fail to be created. That
     is, once an event name is used, it cannot be used again with a
     different format. This can cause issues if a library is using an
     event and updates its format. An application using the older format
     will prevent an application using the new library from registering
     its event.

     A task could also DOS another application if it knows the event
     names, and it creates events with different formats.

     The multi-format event is in a different name space from the single
     format. Both the event name and its format are the unique
     identifier. This will allow two different applications to use the
     same user event name but with different payloads.

   - Added support to have ftrace_dump_on_oops dump out instances and
     not just the main top level tracing buffer.

  Other changes:

   - Add eventfs_root_inode

     Only the root inode has a dentry that is static (never goes away)
     and stores it upon creation. There's no reason that the thousands
     of other eventfs inodes should have a pointer that never gets set
     in its descriptor. Create a eventfs_root_inode desciptor that has a
     eventfs_inode descriptor and a dentry pointer, and only the root
     inode will use this.

   - Added WARN_ON()s in eventfs

     There's some conditionals remaining in eventfs that should never be
     hit, but instead of removing them, add WARN_ON() around them to
     make sure that they are never hit.

   - Have saved_cmdlines allocation also include the map_cmdline_to_pid
     array

     The saved_cmdlines structure allocates a large amount of data to
     hold its mappings. Within it, it has three arrays. Two are already
     apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory
     can be saved by also including the map_cmdline_to_pid[] array as
     well.

   - Restructure __string() and __assign_str() macros used in
     TRACE_EVENT()

     Dynamic strings in TRACE_EVENT() are declared with:

         __string(name, source)

     And assigned with:

        __assign_str(name, source)

     In the tracepoint callback of the event, the __string() is used to
     get the size needed to allocate on the ring buffer and
     __assign_str() is used to copy the string into the ring buffer.
     There's a helper structure that is created in the TRACE_EVENT()
     macro logic that will hold the string length and its position in
     the ring buffer which is created by __string().

     There are several trace events that have a function to create the
     string to save. This function is executed twice. Once for
     __string() and again for __assign_str(). There's no reason for
     this. The helper structure could also save the string it used in
     __string() and simply copy that into __assign_str() (it also
     already has its length).

     By using the structure to store the source string for the
     assignment, it means that the second argument to __assign_str() is
     no longer needed.

     It will be removed in the next merge window, but for now add a
     warning if the source string given to __string() is different than
     the source string given to __assign_str(), as the source to
     __assign_str() isn't even used and will be going away.

   - Added checks to make sure that the source of __string() is also the
     source of __assign_str() so that it can be safely removed in the
     next merge window.

     Included fixes that the above check found.

   - Other minor clean ups and fixes"

* tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits)
  tracing: Add __string_src() helper to help compilers not to get confused
  tracing: Use strcmp() in __assign_str() WARN_ON() check
  tracepoints: Use WARN() and not WARN_ON() for warnings
  tracing: Use div64_u64() instead of do_div()
  tracing: Support to dump instance traces by ftrace_dump_on_oops
  tracing: Remove second parameter to __assign_rel_str()
  tracing: Add warning if string in __assign_str() does not match __string()
  tracing: Add __string_len() example
  tracing: Remove __assign_str_len()
  ftrace: Fix most kernel-doc warnings
  tracing: Decrement the snapshot if the snapshot trigger fails to register
  tracing: Fix snapshot counter going between two tracers that use it
  tracing: Use EVENT_NULL_STR macro instead of open coding "(null)"
  tracing: Use ? : shortcut in trace macros
  tracing: Do not calculate strlen() twice for __string() fields
  tracing: Rework __assign_str() and __string() to not duplicate getting the string
  cxl/trace: Properly initialize cxl_poison region name
  net: hns3: tracing: fix hclgevf trace event strings
  drm/i915: Add missing ; to __assign_str() macros in tracepoint code
  NFSD: Fix nfsd_clid_class use of __string_len() macro
  ...
2024-03-18 15:11:44 -07:00
Chengming Zhou
a8922f7967 ceph: remove SLAB_MEM_SPREAD flag usage
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1, so it became a dead flag since the commit
16a1d96835 ("mm/slab: remove mm/slab.c and slab_def.h"). And the
series [1] went on to mark it obsolete to avoid confusion for users.
Here we can just remove all its users, which has no functional change.

[1] https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-03-18 22:03:29 +01:00
Xiubo Li
09927e7ef1 ceph: break the check delayed cap loop every 5s
In some cases this may take a long time and will block renewing
the caps to MDS.

[ idryomov: massage comment ]

Link: https://tracker.ceph.com/issues/50223#note-21
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-03-18 22:03:29 +01:00
Linus Torvalds
bf3a69c686 One fix, one cleanup...
Fix:
 Julia Lawall pointed out a null pointer dereference.
 
 Cleanup:
 Vlastimil Babka sent me a patch to remove some SLAB related code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIGSFVdO6eop9nER2z0QOqevODb4FAmXSUcsACgkQz0QOqevO
 Db5VmxAAiQlsxX2Ki3q00rMgaXFS4yTwSPsGsTrC5eQlyJfq4xEGMTWJGjjRS56k
 L2FGioP7OmIlo2VrzCc9Ms4ve/NyQjXpaoDMnsEUWSUfd8OHSJkBrpeVUWWfcHHk
 zLEmxNb2AETcJupAgoJOWOoSb59ggCKpCqmLezYoSZmlnb9qg6lhFbnWtkVC6q+p
 AREOfByoLIrJUtVh4Bmexo4nO5w3F84cfAV2WAmLMnXKjGnyFLGkqQvy8yXW0sA1
 hsZW+VjmoRaCG78M6OX+Sl3ok0V8i2AcOkguWPj+dkCPb8XLkxhGwjnqLziJ54Z5
 aFrtSzeZiNQOqy7b6cj6+x2KcWE5FhphAKjX/psEZrZNa0e6ZvNfby3yJ4TzNWaN
 eajtOtcq+Ec9IruWXt/WCsm0zYwW1HumUhga5QCHjQRRjOt36ua4QC02iCx2sYuX
 SBnsBCgQo1xxAta3uOMj2sG38lUwYoH0U5wlPsqrGh1nsbGbc49Ok7BYX/wWF8os
 CYnT5t2KR9yUvblV+dH9XTj2EwqgINMRYBW7uBjZqY9gq2v/RKrtQCjnedAAA+yx
 B6UUob/naV5VpXfhwpXiw2oJrQy/kqQuwOEcgY/a6cwmENd9RuwGXBiBI6hrAOzl
 ftxiUcByW8/hS13G04qJ7pGTACs4njteMvvg+Y68nWUmPWMZTio=
 =/gAC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-6.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "One fix, one cleanup...

  Fix: Julia Lawall pointed out a null pointer dereference.

  Cleanup: Vlastimil Babka sent me a patch to remove some SLAB related
  code"

* tag 'for-linus-6.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  Julia Lawall reported this null pointer dereference, this should fix it.
  fs/orangefs: remove ORANGEFS_CACHE_CREATE_FLAGS
2024-03-18 12:15:19 -07:00
Linus Torvalds
c5d9ab85eb f2fs update for 6.9-rc1
In this round, there are a number of updates on mainly two areas: Zoned block
 device support and Per-file compression. For example, we've found several issues
 to support Zoned block device especially having large sections regarding to GC
 and file pinning used for Android devices. In compression side, we've fixed many
 corner race conditions that had broken the design assumption.
 
 Enhancement:
  - Support file pinning for Zoned block device having large section
  - Enhance the data recovery after sudden power cut on Zoned block device
  - Add more error injection cases to easily detect the kernel panics
  - add a proc entry show the entire disk layout
  - Improve various error paths paniced by BUG_ON in block allocation and GC
  - support SEEK_DATA and SEEK_HOLE for compression files
 
 Bug fix:
  - fix to avoid use-after-free issue in f2fs_filemap_fault
  - fix some race conditions to break the atomic write design assumption
  - fix to truncate meta inode pages forcely
  - resolve various per-file compression issues wrt the space management and
    compression policies
  - fix some swap-related bugs
 
 In addition, we removed deprecated codes such as io_bits and heap_allocation,
 and also fixed minor error handling routines with neat debugging messages.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmX4gS0ACgkQQBSofoJI
 UNLmgBAAg4mvbWjmJ5VbXs4zGLOgLRJYcY1sZRO5Ufg4LhWzoGRxL1Dru+TELw0t
 1Ck2EQvP91XZ5weA5AZOfWbxcijy4+8L3P8L7ohOShudfACci0wQsx6IaUUWWylC
 ILA4+DkovpZrlu6th12Gj9QAM6TN9gdy3V1VLT5O/KmE1x6Pekwp2hQoIvVJRH5L
 I3KxOf5fTe3oWLvEN6m7yCz/8qGqz8+w0ae90UG0fqi0wVEuZJ99zsVPnuhu6uBo
 riFm2A6ra0I/JqoPyqn2QM6ApItM867ULo9EoyQVgq56Q1w31ENOJXsU9N7N4Wxt
 olgujH1SijkWk9ni57iKtMhR68e3Rs+pVsuNFmJuOPq0HASoggB66QRrVvCgM9JG
 z3D//CB2ONtX2XiKJMiTcX9VqIqrMw6L1eVxEZu0P96C3CS70MoBU69mdSR9Og2S
 5nQXja3yzFhdk3thp6+wAJ3I04ZQkf3qoHZB+0chU2Xl1pV+5NIkBgBsSw8g/TY3
 EIHMfK+TX0SBSNCvkUDEJ+Z8ZRID6tcbAquTSsBr6wxB+F9mq7onEvI8O7xwyH9W
 DU8xhymOE2QUoluNtyW7ww6HK913ripXIenI9LaYJnuj0XeDAcMIoPsgR7AGU5UG
 hshvirFdUdWRMTfXxNNUrvhOWI0qurQSVx+VV6Qb62DGqR5ofOw=
 =Qpvy
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs update from Jaegeuk Kim:
 "In this round, there are a number of updates on mainly two areas:
  Zoned block device support and Per-file compression. For example,
  we've found several issues to support Zoned block device especially
  having large sections regarding to GC and file pinning used for
  Android devices. In compression side, we've fixed many corner race
  conditions that had broken the design assumption.

  Enhancements:
   - Support file pinning for Zoned block device having large section
   - Enhance the data recovery after sudden power cut on Zoned block
     device
   - Add more error injection cases to easily detect the kernel panics
   - add a proc entry show the entire disk layout
   - Improve various error paths paniced by BUG_ON in block allocation
     and GC
   - support SEEK_DATA and SEEK_HOLE for compression files

  Bug fixes:
   - avoid use-after-free issue in f2fs_filemap_fault
   - fix some race conditions to break the atomic write design
     assumption
   - fix to truncate meta inode pages forcely
   - resolve various per-file compression issues wrt the space
     management and compression policies
   - fix some swap-related bugs

  In addition, we removed deprecated codes such as io_bits and
  heap_allocation, and also fixed minor error handling routines with
  neat debugging messages"

* tag 'f2fs-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (60 commits)
  f2fs: fix to avoid use-after-free issue in f2fs_filemap_fault
  f2fs: truncate page cache before clearing flags when aborting atomic write
  f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag
  f2fs: prevent atomic write on pinned file
  f2fs: fix to handle error paths of {new,change}_curseg()
  f2fs: unify the error handling of f2fs_is_valid_blkaddr
  f2fs: zone: fix to remove pow2 check condition for zoned block device
  f2fs: fix to truncate meta inode pages forcely
  f2fs: compress: fix reserve_cblocks counting error when out of space
  f2fs: compress: relocate some judgments in f2fs_reserve_compress_blocks
  f2fs: add a proc entry show disk layout
  f2fs: introduce SEGS_TO_BLKS/BLKS_TO_SEGS for cleanup
  f2fs: fix to check return value of f2fs_gc_range
  f2fs: fix to check return value __allocate_new_segment
  f2fs: fix to do sanity check in update_sit_entry
  f2fs: fix to reset fields for unloaded curseg
  f2fs: clean up new_curseg()
  f2fs: relocate f2fs_precache_extents() in f2fs_swap_activate()
  f2fs: fix blkofs_end correctly in f2fs_migrate_blocks()
  f2fs: ro: don't start discard thread for readonly image
  ...
2024-03-18 11:26:00 -07:00
Anand Jain
d565fffa68 btrfs: do not skip re-registration for the mounted device
There are reports that since version 6.7 update-grub fails to find the
device of the root on systems without initrd and on a single device.

This looks like the device name changed in the output of
/proc/self/mountinfo:

6.5-rc5 working

  18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ...

6.7 not working:

  17 1 0:15 / / rw,noatime - btrfs /dev/root ...

and "update-grub" shows this error:

  /usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?)

This looks like it's related to the device name, but grub-probe
recognizes the "/dev/root" path and tries to find the underlying device.
However there's a special case for some filesystems, for btrfs in
particular.

The generic root device detection heuristic is not done and it all
relies on reading the device infos by a btrfs specific ioctl. This ioctl
returns the device name as it was saved at the time of device scan (in
this case it's /dev/root).

The change in 6.7 for temp_fsid to allow several single device
filesystem to exist with the same fsid (and transparently generate a new
UUID at mount time) was to skip caching/registering such devices.

This also skipped mounted device. One step of scanning is to check if
the device name hasn't changed, and if yes then update the cached value.

This broke the grub-probe as it always read the device /dev/root and
couldn't find it in the system. A temporary workaround is to create a
symlink but this does not survive reboot.

The right fix is to allow updating the device path of a mounted
filesystem even if this is a single device one.

In the fix, check if the device's major:minor number matches with the
cached device. If they do, then we can allow the scan to happen so that
device_list_add() can take care of updating the device path. The file
descriptor remains unchanged.

This does not affect the temp_fsid feature, the UUID of the mounted
filesystem remains the same and the matching is based on device major:minor
which is unique per mounted filesystem.

This covers the path when the device (that exists for all mounted
devices) name changes, updating /dev/root to /dev/sdx. Any other single
device with filesystem and is not mounted is still skipped.

Note that if a system is booted and initial mount is done on the
/dev/root device, this will be the cached name of the device. Only after
the command "btrfs device scan" it will change as it triggers the
rename.

The fix was verified by users whose systems were affected.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353
Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs8Y2TQ@mail.gmail.com/
Fixes: bc27d6f0aa ("btrfs: scan but don't register device on single device filesystem")
CC: stable@vger.kernel.org # 6.7+
Tested-by: Alex Romosan <aromosan@gmail.com>
Tested-by: CHECK_1234543212345@protonmail.com
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-18 19:16:50 +01:00
Linus Torvalds
0d7ca657df overlayfs fixes for 6.9-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmX4AWYACgkQEVvVyTe/
 1WpUXA/4r0qYfDU3M1NYtBhcRcXe44WvfdMxUPgc5IqffHof058FWC6cRhRnKB9U
 R1Wb09xIguX2hWsseyq1MWBuoZUq3nWUNRfTkld0uUX1I97ggV70YqRY2SSUWqCq
 eLMYwR6e5Ar1/noMkHYMLupJknTJsyOOFTXZrAvl5aZDCKvAQUQoVWY772B71/4j
 odH7wpgVX4Ah6U5Zk3anEdn5TVcvl4XXW4U1o4kJzUjIgxI9ZyrBpN6EmoSgpoQj
 zRfbQ6OrT2FNAPc/y3Dwn9fAoUZT2F3A/R4dVddBAPyLY96Qj+p0fv+y+0RY9efV
 dcf0bvoj4xXT+9KDVVp2rWszmFnr0niu+dE6bCkzW/lC2Uz/NIghjKcgwOGIjqpb
 9o8CzqGOF6B07A7SoVtcQlZR+7iUn+SUPEIk8JpvDkDLCLBOqd6rc58MnlMwJwrs
 OsMKH8IQKb6bGO0wfxmeyWj8637k9dtc0cF16hHhCxlRoextU3guNw0uBlta29QQ
 J+QCkFR6F2xQH2UbKEpOTsQo+2x385mEByXZdSUGkCM7ABb3cictrIMXyWJMNoWq
 un5tnJ5d/E3N+FKc5r9UWGHguwdYKiucvx7NQC6lK0y3b8aZMyVioiCVYX+lPKHB
 vdSp6F5vBrL6PHVsD9oDjiqLl8gQk7eIc4ne6zs/T9M5h8yHSg==
 =cMqg
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs

Pull overlayfs fixes from Amir Goldstein:
 "Only minor fixes:

   - Fix uncalled for WARN_ON from v6.8-rc1

   - Fix the overlayfs MAINTAINERS entry"

* tag 'ovl-fixes-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
  ovl: relax WARN_ON in ovl_verify_area()
  MAINTAINERS: update overlayfs git tree
2024-03-18 11:15:58 -07:00
Linus Torvalds
0a7b0acece vfs-6.9-rc1.fixes
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZfglxgAKCRCRxhvAZXjc
 ovK9APsF7/TMFhNbtW+JsghSyrEk0cOVPizi8JkRDDWNW3qY+wEAxtydhbmWpbKq
 MpIjMHqwjPx3zXBL8Ec/b4vAoJqpJwQ=
 =NgvO
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.9-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs fixes from Christian Brauner:
 "This contains a few small fixes for this merge window:

   - Undo the hiding of silly-rename files in afs. If they're hidden
     they can't be deleted by rm manually anymore causing regressions

   - Avoid caching the preferred address for an afs server to avoid
     accidently overriding an explicitly specified preferred server
     address

   - Fix bad stat() and rmdir() interaction in afs

   - Take a passive reference on the superblock when opening a block
     device so the holder is available to concurrent callers from the
     block layer

   - Clear private data pointer in fscache_begin_operation() to avoid it
     being falsely treated as valid"

* tag 'vfs-6.9-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  fscache: Fix error handling in fscache_begin_operation()
  fs,block: get holder during claim
  afs: Fix occasional rmdir-then-VNOVNODE with generic/011
  afs: Don't cache preferred address
  afs: Revert "afs: Hide silly-rename files from userspace"
2024-03-18 09:15:50 -07:00
Steven Rostedt (Google)
c759e60903 tracing: Remove __assign_str_len()
Now that __assign_str() gets the length from the __string() (and
__string_len()) macros, there's no reason to have a separate
__assign_str_len() macro as __assign_str() can get the length of the
string needed.

Also remove __assign_rel_str() although it had no users anyway.

Link: https://lore.kernel.org/linux-trace-kernel/20240223152206.0b650659@gandalf.local.home

Cc: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-18 10:33:05 -04:00
Steven Rostedt (Google)
9388a2aa45 NFSD: Fix nfsd_clid_class use of __string_len() macro
I'm working on restructuring the __string* macros so that it doesn't need
to recalculate the string twice. That is, it will save it off when
processing __string() and the __assign_str() will not need to do the work
again as it currently does.

Currently __string_len(item, src, len) doesn't actually use "src", but my
changes will require src to be correct as that is where the __assign_str()
will get its value from.

The event class nfsd_clid_class has:

  __string_len(name, name, clp->cl_name.len)

But the second "name" does not exist and causes my changes to fail to
build. That second parameter should be: clp->cl_name.data.

Link: https://lore.kernel.org/linux-trace-kernel/20240222122828.3d8d213c@gandalf.local.home

Cc: Neil Brown <neilb@suse.de>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: stable@vger.kernel.org
Fixes: d27b74a867 ("NFSD: Use new __string_len C macros for nfsd_clid_class")
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-18 10:17:41 -04:00
David Howells
449ac55146
fscache: Fix error handling in fscache_begin_operation()
Fix fscache_begin_operation() to clear cres->cache_priv on error, otherwise
fscache_resources_valid() will report it as being valid.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/3933237.1710514106@warthog.procyon.org.uk
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-18 10:33:48 +01:00
Christian Brauner
59a55a63c2
fs,block: get holder during claim
Now that we open block devices as files we need to deal with the
realities that closing is a deferred operation. An operation on the
block device such as e.g., freeze, thaw, or removal that runs
concurrently with umount, tries to acquire a stable reference on the
holder. The holder might already be gone though. Make that reliable by
grabbing a passive reference to the holder during bdev_open() and
releasing it during bdev_release().

Fixes: f3a608827d ("bdev: open block device as files") # mainline only
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: https://lore.kernel.org/r/ZfEQQ9jZZVes0WCZ@infradead.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: https://lore.kernel.org/r/CAHj4cs8tbDwKRwfS1=DmooP73ysM__xAb2PQc6XsAmWR+VuYmg@mail.gmail.com
Link: https://lore.kernel.org/r/20240315-freibad-annehmbar-ca68c375af91@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-18 10:32:44 +01:00
Kent Overstreet
b38114dde0 bcachefs: ratelimit errors from async_btree_node_rewrite
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:24 -04:00
Kent Overstreet
8d347a5545 bcachefs: Run check_topology() first
check_topology() doesn't actually require alloc info - and running it
first means other passes don't have to catch btree read errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:24 -04:00
Kent Overstreet
3ed94062e3 bcachefs: Improve bch2_fatal_error()
error messages should always include __func__

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:24 -04:00
Kent Overstreet
ec35b30481 bcachefs: Fix lost transaction restart error
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18 00:24:23 -04:00
Kent Overstreet
a586036841 bcachefs: Don't corrupt journal keys gap buffer when dropping alloc info
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 21:17:38 -04:00
Kent Overstreet
f3589bfa7e bcachefs: fix for building in userspace
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:12 -04:00
Kent Overstreet
1c31b83a4e bcachefs: bch2_snapshot_is_ancestor() now safe to call in early recovery
this fixes an assertion pop in
  bch2_check_snapshot_trees() ->
  check_snapshot_tree() ->
  bch2_snapshot_tree_master_subvol() ->
  bch2_snapshot_is_ancestor()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:12 -04:00
Kent Overstreet
1ba6f48f09 bcachefs: Fix nested transaction restart handling in bch2_bucket_gens_init()
Nested transaction restart handling is typically best avoided; when the
inner context handles a transaction restart it invalidates the outer
transaction context, so we need to make sure to return a
transaction_restart_nested error.

This code wasn't doing that, and hit the assertion in
for_each_btree_key() that checks for that via trans->restart_count.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:12 -04:00
Kent Overstreet
3becdd4850 bcachefs: Improve sysfs internal/btree_updates
Print out the function that launched the btree update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:12 -04:00
Kent Overstreet
a0a466ea98 bcachefs: Split out btree_node_rewrite_worker
This fixes a deadlock due to using btree_interior_update_worker for non
interior updates - async btree node rewrites were blocking, and then
blocking other interior updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:12 -04:00
Kent Overstreet
37bb9c9572 bcachefs: Fix locking in bch2_alloc_write_key()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet
264b501f8f bcachefs: Avoid extent entry type assertions in .invalid()
After keys have passed bkey_ops.key_invalid we should never see invalid
extent entry types - but .key_invalid itself needs to cope with them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet
109ea419cf bcachefs: Fix spurious -BCH_ERR_transaction_restart_nested
We only need to return transaction_restart_nested when we're inside a
context that's handling transaction restarts.

Also, add a missing check_subdir_count() call.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet
3ff3475611 bcachefs: Fix check_key_has_snapshot() call
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00
Kent Overstreet
62f35024b2 bcachefs: Change "accounting overran journal reservation" to a warning
This doesn't need to be a BUG_ON(); the actual serious "things break"
condition is if the whole journal write overruns the available space,
and that has a fatal error, not a BUG_ON(). This check indicates we
screwed something up, but it should be a warning.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17 20:53:11 -04:00