- Assorted tiny syzbot fixes
- Shutdown path fix: "bch2_btree_write_buffer_flush_going_ro()"
The shutdown path wasn't flushing the btree write buffer, leading to
shutting down while we still had operations in flight. This fixes a
whole slew of syzbot bugs, and undoubtedly other strange heisenbugs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmc1ceQACgkQE6szbY3K
bna56w//V4On/z8q8r4vyj44NXfFNXL87QNB7e5BAP8lmow8S1KnqpHlpUq+MaJm
/LI1OPPaDJITokrQ4eUKyfRLQaH9PlDk7WafgzixnYmaXKVIraqGzWfISSbNhLdC
Hw/EjYdqmuQMAxTgj1upyx2UijvBRaT0dvjeaTX2OgQiqIlBUWGdUDFHWzc1sRgO
8PrJf9CYscdS+P+XrDBKLLCiKGn73Z93lSsFTwVzDictfrqUT8X5J5/a+ofCwp8W
sEiL/13bp05ABb3YjVrBupApyY6szC6YrKNwJ8RoHxBGo0zl5L/gd9FnBJX0T1wc
10dDMv4gZWf9mF2KePDH5EYBBV62N8qEwRAHrjkGu/87mNBRf6sUZ3fbTiMMmVds
nSTJUHH6t6uiwtezPPzDKtPYQBcGsP95DBdvL4un97jTlUtszHLU8OqVErojTYzZ
sZLilUUMz+CK/qfoJjUcSJwslevHeiri5Hto4jOvf8M7uH0Xx29LvmwqWLdZP2uT
RNjslFLvqzZoWGsbJEfe2YihJwJd9kkiE/0xQK60169bzoNEI6oWNmOK2Ts1Dr0X
tEkdHmWqNI65+nbjomGOa3u1wysrACuaRwu6fq0562+IvlDtVXLUWbmlS58+bqz2
KwUmZp4Y2OibeKFT+y0iqxpw9WBPQb7dwj9xwpwk3xgph1uVwbY=
=L6kv
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"This fixes one minor regression from the btree cache fixes (in the
scan_for_btree_nodes repair path) - and the shutdown path fix is the
big one here, in terms of bugs closed:
- Assorted tiny syzbot fixes
- Shutdown path fix: "bch2_btree_write_buffer_flush_going_ro()"
The shutdown path wasn't flushing the btree write buffer, leading
to shutting down while we still had operations in flight. This
fixes a whole slew of syzbot bugs, and undoubtedly other strange
heisenbugs.
* tag 'bcachefs-2024-11-13' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix assertion pop in bch2_ptr_swab()
bcachefs: Fix journal_entry_dev_usage_to_text() overrun
bcachefs: Allow for unknown key types in backpointers fsck
bcachefs: Fix assertion pop in topology repair
bcachefs: Fix hidden btree errors when reading roots
bcachefs: Fix validate_bset() repair path
bcachefs: Fix missing validation for bch_backpointer.level
bcachefs: Fix bch_member.btree_bitmap_shift validation
bcachefs: bch2_btree_write_buffer_flush_going_ro()
singletons.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzP1ZAAKCRDdBJ7gKXxA
jmBUAP9n2zTKoNeF/WpS0aSg+SpG78mtyMIwSUW2PPfGObYTBwD/bncG9U3fnno1
v6Sey0OjAKwGdV+gTd+5ymWJKPSQbgA=
=HxTA
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
singletons"
* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: swapfile: fix cluster reclaim work crash on rotational devices
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
mm/thp: fix deferred split queue not partially_mapped: fix
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
nommu: pass NULL argument to vma_iter_prealloc()
ocfs2: fix UBSAN warning in ocfs2_verify_volume()
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
mm: count zeromap read and set for swapout and swapin
This runs on extents that haven't yet been validated, so we don't want
to assert that we have a valid entry type.
Reported-by: syzbot+4f29c3f12f864d8a8d17@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the jset_entry_dev_usage is malformed, and too small, our nr_entries
calculation will be incorrect - just bail out.
Reported-by: syzbot+05d7520be047c9be86e0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When events are reported to userland and prefer_busy_poll is set, irqs
are temporarily suspended using napi_suspend_irqs.
If no events are found and ep_poll would go to sleep, irq suspension is
cancelled using napi_resume_irqs.
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://patch.msgid.link/20241109050245.191288-5-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Setting prefer_busy_poll now leads to an effectively nonblocking
iteration though napi_busy_loop, even when busy_poll_usecs is 0.
Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca>
Co-developed-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Joe Damato <jdamato@fastly.com>
Tested-by: Martin Karsten <mkarsten@uwaterloo.ca>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Link: https://patch.msgid.link/20241109050245.191288-4-jdamato@fastly.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty()
may cause a NULL pointer dereference, or a general protection fault when
KASAN is enabled.
This happens because, since the tracepoint was added in
mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev
regardless of whether the buffer head has a pointer to a block_device
structure.
In the current implementation, nilfs_grab_buffer(), which grabs a buffer
to read (or create) a block of metadata, including b-tree node blocks,
does not set the block device, but instead does so only if the buffer is
not in the "uptodate" state for each of its caller block reading
functions. However, if the uptodate flag is set on a folio/page, and the
buffer heads are detached from it by try_to_free_buffers(), and new buffer
heads are then attached by create_empty_buffers(), the uptodate flag may
be restored to each buffer without the block device being set to
bh->b_bdev, and mark_buffer_dirty() may be called later in that state,
resulting in the bug mentioned above.
Fix this issue by making nilfs_grab_buffer() always set the block device
of the super block structure to the buffer head, regardless of the state
of the buffer's uptodate flag.
Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com
Fixes: 5305cb8308 ("block: add block_{touch|dirty}_buffer tracepoint")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ubisectech Sirius <bugreport@valiantsec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints".
This series fixes null pointer dereference bugs that occur when using
nilfs2 and two block-related tracepoints.
This patch (of 2):
It has been reported that when using "block:block_touch_buffer"
tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a
NULL pointer dereference, or a general protection fault when KASAN is
enabled.
This happens because since the tracepoint was added in touch_buffer(), it
references the dev_t member bh->b_bdev->bd_dev regardless of whether the
buffer head has a pointer to a block_device structure. In the current
implementation, the block_device structure is set after the function
returns to the caller.
Here, touch_buffer() is used to mark the folio/page that owns the buffer
head as accessed, but the common search helper for folio/page used by the
caller function was optimized to mark the folio/page as accessed when it
was reimplemented a long time ago, eliminating the need to call
touch_buffer() here in the first place.
So this solves the issue by eliminating the touch_buffer() call itself.
Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com
Fixes: 5305cb8308 ("block: add block_{touch|dirty}_buffer tracepoint")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: Ubisectech Sirius <bugreport@valiantsec.com>
Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@valiantsec.com
Reported-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2
Tested-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We can't assume that btrees only contain keys of a given type - even if
they only have a single key type listed in the allowed key types for
that btree; this is a forwards compatibility issue.
Reported-by: syzbot+a27c3aaa3640dd3e1dfb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Three affect DAMON. Lorenzo's five-patch series to address the
mmap_region error handling is here also.
Apart from that, various singletons.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzBVmAAKCRDdBJ7gKXxA
ju42AQD0EEnzW+zFyI+E7x5FwCmLL6ofmzM8Sw9YrKjaeShdZgEAhcyS2Rc/AaJq
Uty2ZvVMDF2a9p9gqHfKKARBXEbN2w0=
=n+lO
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes, 14 of which are cc:stable.
Three affect DAMON. Lorenzo's five-patch series to address the
mmap_region error handling is here also.
Apart from that, various singletons"
* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Thorsten Blum
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
signal: restore the override_rlimit logic
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
ucounts: fix counter leak in inc_rlimit_get_ucounts()
selftests: hugetlb_dio: check for initial conditions to skip in the start
mm: fix docs for the kernel parameter ``thp_anon=``
mm/damon/core: avoid overflow in damon_feed_loop_next_input()
mm/damon/core: handle zero schemes apply interval
mm/damon/core: handle zero {aggregation,ops_update} intervals
mm/mlock: set the correct prev on failure
objpool: fix to make percpu slot allocation more robust
mm/page_alloc: keep track of free highatomic
mm: resolve faulty mmap_region() error path behaviour
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: refactor map_deny_write_exec()
mm: unconditionally close VMAs on error
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm/thp: fix deferred split unqueue naming and locking
mm/thp: fix deferred split queue not partially_mapped
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmcuWXMACgkQiiy9cAdy
T1Eu8gv+LUAmrvvv8PDoLUT50QZb6aAY2SeulgTdeG8OzImXH5VUSjptRYwP46Dk
KNLh85A4C39w/guxm3FX2qjeesZZD5DDubSJNATLy75jorq7z+1uTNg8oUZGpvJS
airmcv/0mcDZqVayCmiT7wPyhUSYa+VTvHrkFpsI20BrlyDybe5HGps77iCOJ5K0
uTRgM6VNxkKx+Z5NietpDyaUl2A5b6Yx/9J8vMq4ytBfEcSGi+ndpZNvG7kKg8gQ
3i/ND4O2+eScwvYclVP5mJbF71LW0Z/ljS4mEVH5UuRgLH2Ji35B9xaDFDSixI3x
EHFwnAX0QeGHIlIuFhRDdtR2gFqREAJOYxkDxfo7PXO5gOXLWZXru9F7v6lWsydN
varqSseBBucHOLn8NylvgJWwqYs+sIKQycYKsX3ZUnQfejaUwfV2H/ADJzccjFF8
PUzVQFyOZtUK3fdkoqvULr/zvwninhtLJYLIsPcUgSPCcxGxMApvtkCaJVV3JGfB
2acZPdMu
=ZzcZ
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
"Fix net namespace refcount use after free issue"
* tag 'v6.12-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Fix use-after-free of network namespace.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmcsXSUACgkQiiy9cAdy
T1FyOgv+Ks1lfl+6D/G89zFl5XOtCm8njsedJu9y3jR7hzophX2osfmodACMVX6B
0VLu0jzquvUo18VNlL+wF7YFH+Mc6zrevEnjBay9Xa05YyRqK5c7qjpiWEgXPN7/
ROQfC2slCAFjymhw+9qY+PGZYg3x0fyGdJC/gBNSFnu2ufag367Li+0fTKQTXFwz
F24S5eI+M9OWNgMnMYoNt+77f0n0JkKbQznq9nTEvUsbTWZFSEfmVczfSY0ltdOH
RER9zoyTU3zbPuMZqK+Jb7c2247ahsLzDEBAUG0Wn77wSaiWXU5dmVD5bWsDTp25
5p9uLpkr3irDWwJGkCrkpm2Tva/50IHPEFQ4kllVlm6ffoao/dxBCwFf/MEvJXzI
OgU+HpXyZdq6NF1hcB4xUlcbHvGCa6pEcYkcM7PwLml+6SKIwEsEGpnJ23kxGR3+
MGYMCITatRuvZstfEDolNyrO2+gPMd3ODnLhfjfjT47Kh38e7yxrLr4cmxbPAA+s
EVdm2N08
=zTn6
-----END PGP SIGNATURE-----
Merge tag 'v6.12-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
"Four fixes, all also marked for stable:
- fix two potential use after free issues
- fix OOM issue with many simultaneous requests
- fix missing error check in RPC pipe handling"
* tag 'v6.12-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: check outstanding simultaneous SMB operations
ksmbd: fix slab-use-after-free in smb3_preauth_hash_rsp
ksmbd: fix slab-use-after-free in ksmbd_smb2_session_create
ksmbd: Fix the missing xa_store error check
We silence btree errors in btree_node_scan, since it's probing and
errors are expected: add a fake pass so that btree_node_scan is no
longer recovery pass 0, and we don't think we're in btree node scan when
reading btree roots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we truncate a bset (due to it extending past the end of the btree
node), we can't skip the rest of the validation for e.g. the packed
format (if it's the first bset in the node).
Reported-by: syzbot+4d722d3c539d77c7bc82@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmct+40ACgkQxWXV+ddt
WDvCtRAAp0rheEu14hpVvWE2//+6u9Gx7Wfjzbj0+o4zBRWdg7BigFxfeb6JsH/E
2TjuWdcoP/OMV9ghCBQAQxySAPtsxH7skkyNy2UcMk5byBIrNvhw9auP5GXXlrhK
jSKDD4yfOMb++8LhrLevgTrijNyjLqaKXruw9a1Pmc3gxpdNmnMEySsQaF62o2Sm
YC3jwi0KpNAhu2qyJ6TnPgd5zf3BTM0JAeuB019IZW4WoeRTOdcPe7S7gqqJwZ+e
lL0D2/lfIE1lKvLE266Fab4FAQiJV07rozYj25XHiDpqThCxnJVOZCEHasOQ1PRy
d6j3RmGPqJYAYfQL1L+FH2hsS1BVZfVyCV1V7A/cN+lAffBfnROnf13C3gJ15Nbx
3lTyjBPQQw2WpfdmeyF3ikbrjZ8AfahChQO+mMnLN7oAWdIwWX5MRB+cwfWTxzA/
P8upz6HSTpSwy8nXdq264q1KkyCjx0Wv+8iyU7LirN2fCcEchA12HAIaOBeHedgh
PrGZDqrkZccQQxAvU5H7hQv0hZkGK8qba381oYHO09g72VM6ysuBU7tGrPZrlZYB
CvYTCwNZ/lqI8ikrcHOyUO1SPR9SaaWej1mWgBJ69ZIfg+ZuMtOMl171DU4S/i2V
iYgYoN8eCqTQWdaX5kk+3LWmK8fSU7F/KSDtJtT1KxkaSwCacfY=
=TQzP
-----END PGP SIGNATURE-----
Merge tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more one-liners that fix some user visible problems:
- use correct range when clearing qgroup reservations after COW
- properly reset freed delayed ref list head
- fix ro/rw subvolume mounts to be backward compatible with old and
new mount API"
* tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix the length of reserved qgroup to free
btrfs: reinitialize delayed ref list after deleting it from the list
btrfs: fix per-subvolume RO/RW flags with new mount API
Some trivial syzbot fixes, two more serious btree fixes found by looping
single_devices.ktest small_nodes:
- Topology error on split after merge, where we accidentaly picked the
node being deleted for the pivot, resulting in an assertion pop
- New nodes being preallocated were left on the freedlist, unlocked,
resulting in them sometimes being accidentally freed: this dated from
pre-cycle detector, when we could leave them locked. This should have
resulted in more explosions and fireworks, but turned out to be
surprisingly hard to hit because the preallocated nodes were being
used right away.
the fix for this is bigger than we'd like - reworking btree list
handling was a bit invasive - but we've now got more assertions and
it's well tested.
- Also another mishandled transaction restart fix (in
btree_node_prefetch) - we're almost done with those.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmctRYoACgkQE6szbY3K
bnbkxRAArtqV9/qsKbSYAaa/+GaL7YdapuYbi/pmC9X96F9qbTdEJzW5rs66iiGE
zbkfFqo2I85nacSTk3b12E3QUXj+CEmSIWOPQtamYw/0AkmVsKepgGsXLazZ0rYi
X8UDVc6fuFkoO1aC/9V2NJEFG9QXIj8ru0m2kyUE9ZM6rgskugVN/ec9ipNQNZhY
4L8U7Z6Y9AX4vs/BeV3i6cLrTaMroUFYSM0hJalBJ24KZsZ1bWflC39C0dXSvy/O
gCmBCobZTT5aDEQai1kdyFr4GZZUgCJg4YEUDfyOdpPmhbcP4iwX/cJqJHXqxXVt
nMyLz5nLs0nYO791UlLHZuUUUe99nl+tC09b034n20peLnQwWW/obTrhn86SDDka
2eQv1Rk5C5i8r5b0k8UYjy5ogfiVlC/X1OwmLKkarKnC/wd0eFQI71Qq9s8KpXbo
VVASENYFV3hrIV8ZcxiqiJ18g6o7++jtTAmIfRljQrO6B8tU5g5uWCTZli+wciii
qWnt1k7P92er8lBzUnQGh9CEwLVbe9ZyBJv+fYVwTOxPES/TbJS7n5fb+1f1rF9w
j5llXVUiaLucXoCpBjEDflvhBTRQHEkKk3gJgy86NKgRjEjPhQT8D2dksT4kgyHb
RqgOSUN+oVqi/i+7RKf9x/jG4id0uvMH5xT7qiXTUiQXtUD+J9g=
=cn3u
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Some trivial syzbot fixes, two more serious btree fixes found by
looping single_devices.ktest small_nodes:
- Topology error on split after merge, where we accidentaly picked
the node being deleted for the pivot, resulting in an assertion pop
- New nodes being preallocated were left on the freedlist, unlocked,
resulting in them sometimes being accidentally freed: this dated
from pre-cycle detector, when we could leave them locked. This
should have resulted in more explosions and fireworks, but turned
out to be surprisingly hard to hit because the preallocated nodes
were being used right away.
The fix for this is bigger than we'd like - reworking btree list
handling was a bit invasive - but we've now got more assertions and
it's well tested.
- Also another mishandled transaction restart fix (in
btree_node_prefetch) - we're almost done with those"
* tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix UAF in __promote_alloc() error path
bcachefs: Change OPT_STR max to be 1 less than the size of choices array
bcachefs: btree_cache.freeable list fixes
bcachefs: check the invalid parameter for perf test
bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket
bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recovery
bcachefs: Fix topology errors on split after merge
bcachefs: Ancient versions with bad bkey_formats are no longer supported
bcachefs: Fix error handling in bch2_btree_node_prefetch()
bcachefs: Fix null ptr deref in bucket_gen_get()
This fixes an assertion pop where we try to navigate to the target of
the backpointer, and the path level isn't what we expect.
Reported-by: syzbot+b17df21b4d370f2dc330@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The write buffer needs to be specifically flushed when going RO: keys in
the journal that haven't yet been moved to the write buffer don't have a
journal pin yet.
This fixes numerous syzbot bugs, all with symptoms of still doing writes
after we've got RO.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When build with !CONFIG_MMU, the variable 'vmcore_mmap_ops'
is defined but not used:
>> fs/proc/vmcore.c:458:42: warning: unused variable 'vmcore_mmap_ops'
458 | static const struct vm_operations_struct vmcore_mmap_ops = {
Fix this by only defining it when CONFIG_MMU is enabled.
Link: https://lkml.kernel.org/r/20241101034803.9298-1-xiqi2@huawei.com
Fixes: 9cb218131d ("vmcore: introduce remap_oldmem_pfn_range()")
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/lkml/202410301936.GcE8yUos-lkp@intel.com/
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If we error in data_update_init() after adding to the rhashtable of
outstanding promotes, kfree_rcu() is required.
Reported-by: Reed Riley <reed@riley.engineer>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Change OPT_STR max value to be 1 less than the "ARRAY_SIZE" of "_choices"
array. As a result, remove -1 from (opt->max-1) in bch2_opt_to_text.
The "_choices" array is a null-terminated array, so computing the maximum
using "ARRAY_SIZE" without subtracting 1 yields an incorrect result. Since
bch2_opt_validate don't subtract 1, as bch2_opt_to_text does, values
bigger than the actual maximum would pass through option validation.
Reported-by: syzbot+bee87a0c3291c06aa8c6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=bee87a0c3291c06aa8c6
Fixes: 63c4b25453 ("bcachefs: Better superblock opt validation")
Suggested-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When allocating new btree nodes, we were leaving them on the freeable
list - unlocked - allowing them to be reclaimed: ouch.
Additionally, bch2_btree_node_free_never_used() ->
bch2_btree_node_hash_remove was putting it on the freelist, while
bch2_btree_node_free_never_used() was putting it back on the btree
update reserve list - ouch.
Originally, the code was written to always keep btree nodes on a list -
live or freeable - and this worked when new nodes were kept locked.
But now with the cycle detector, we can't keep nodes locked that aren't
tracked by the cycle detector; and this is fine as long as they're not
reachable.
We also have better and more robust leak detection now, with memory
allocation profiling, so the original justification no longer applies.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The perf_test does not check the number of iterations and threads
when it is zero. If nr_thread is 0, the perf test will keep
waiting for wakekup. If iteration is 0, it will cause exception
of division by zero. This can be reproduced by:
echo "rand_insert 0 1" > /sys/fs/bcachefs/${uuid}/perf_test
or
echo "rand_insert 1 0" > /sys/fs/bcachefs/${uuid}/perf_test
Fixes: 1c6fdbd8f2 ("bcachefs: Initial commit")
Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bio_kmalloc may return NULL, will cause NULL pointer dereference.
Add check NULL return for bio_kmalloc in journal_read_bucket.
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Fixes: ac10a9611d ("bcachefs: Some fixes for building in userspace")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If BCH_FS_may_go_rw is not yet set, it indicates to the transaction
commit path that updates should be done via the list of journal replay
keys.
This must be set before multithreaded use commences.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a btree split picks a pivot that's being deleted by a btree node
merge, we're going to have problems.
Fix this by checking if the pivot is being deleted, the same as we check
for deletions in journal replay keys.
Found by single_devic.ktest small_nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Syzbot found an assertion pop, by generating an ancient filesystem
version with an invalid bkey_format (with fields that can overflow) as
well as packed keys that aren't representable unpacked.
This breaks key comparisons in all sorts of painful ways.
Filesystems have been automatically rewriting nodes with such invalid
formats for years; we can safely drop support for them.
Reported-by: syzbot+8a0109511de9d4b61217@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bucket_gen() checks if we're lookup up a valid bucket and returns NULL
otherwise, but bucket_gen_get() was failing to check; other callers were
correct.
Also do a bit of cleanup on callers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
seq_printf is costy, on a system with n CPUs, reading /proc/softirqs
would yield 10*n decimal values, and the extra cost parsing format string
grows linearly with number of cpus. Replace seq_printf with
seq_put_decimal_ull_width have significant performance improvement.
On an 8CPUs system, reading /proc/softirqs show ~40% performance
gain with this patch.
Signed-off-by: David Wang <00107082@163.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that recently, simple operations like "make" started
failing on NFSv3 mounts of ext4 exports. Network capture shows that
READDIRPLUS operated correctly but READDIR failed with
NFS3ERR_INVAL. The vfs_llseek() call returned EINVAL when it is
passed a non-zero starting directory cookie.
I bisected to commit c689bdd3bf ("nfsd: further centralize
protocol version checks.").
Turns out that nfsd3_proc_readdir() does not call fh_verify() before
it calls nfsd_readdir(), so the new fhp->fh_64bit_cookies boolean is
not set properly. This leaves the NFSD_MAY_64BIT_COOKIE unset when
the directory is opened.
For ext4, this causes the wrong "max file size" value to be used
when sanity checking the incoming directory cookie (which is a seek
offset value).
The fhp->fh_64bit_cookies boolean is /always/ properly initialized
after nfsd_open() returns. There doesn't seem to be a reason for the
generic NFSD open helper to handle the f_mode fix-up for
directories, so just move that to the one caller that tries to open
an S_IFDIR with NFSD_MAY_64BIT_COOKIE.
Suggested-by: NeilBrown <neilb@suse.de>
Fixes: c689bdd3bf ("nfsd: further centralize protocol version checks.")
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The dealloc flag may be cleared and the extent won't reach the disk in
cow_file_range when errors path. The reserved qgroup space is freed in
commit 30479f31d4 ("btrfs: fix qgroup reserve leaks in
cow_file_range"). However, the length of untouched region to free needs
to be adjusted with the correct remaining region size.
Fixes: 30479f31d4 ("btrfs: fix qgroup reserve leaks in cow_file_range")
CC: stable@vger.kernel.org # 6.11+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
At insert_delayed_ref() if we need to update the action of an existing
ref to BTRFS_DROP_DELAYED_REF, we delete the ref from its ref head's
ref_add_list using list_del(), which leaves the ref's add_list member
not reinitialized, as list_del() sets the next and prev members of the
list to LIST_POISON1 and LIST_POISON2, respectively.
If later we end up calling drop_delayed_ref() against the ref, which can
happen during merging or when destroying delayed refs due to a transaction
abort, we can trigger a crash since at drop_delayed_ref() we call
list_empty() against the ref's add_list, which returns false since
the list was not reinitialized after the list_del() and as a consequence
we call list_del() again at drop_delayed_ref(). This results in an
invalid list access since the next and prev members are set to poison
pointers, resulting in a splat if CONFIG_LIST_HARDENED and
CONFIG_DEBUG_LIST are set or invalid poison pointer dereferences
otherwise.
So fix this by deleting from the list with list_del_init() instead.
Fixes: 1d57ee9416 ("btrfs: improve delayed refs iterations")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
With util-linux 2.40.2, the 'mount' utility is already utilizing the new
mount API. e.g:
# strace mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test/
...
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
But this leads to a new problem, that per-subvolume RO/RW mount no
longer works, if the initial mount is RO:
# mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test
# mount -o rw,subvol=subv2 /dev/test/scratch1 /mnt/scratch
# mount | grep mnt
/dev/mapper/test-scratch1 on /mnt/test type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=256,subvol=/subv1)
/dev/mapper/test-scratch1 on /mnt/scratch type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/subv2)
# touch /mnt/scratch/foobar
touch: cannot touch '/mnt/scratch/foobar': Read-only file system
This is a common use cases on distros.
[CAUSE]
We have a workaround for remount to handle the RO->RW change, but if the
mount is using the new mount API, we do not do that, and rely on the
mount tool NOT to set the ro flag.
But that's not how the mount tool is doing for the new API:
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 <<<< Setting RO flag for super block
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
This means we will set the super block RO at the first mount.
Later RW mount will not try to reconfigure the fs to RW because the
mount tool is already using the new API.
This totally breaks the per-subvolume RO/RW mount behavior.
[FIX]
Do not skip the reconfiguration even if using the new API. The old
comments are just expecting any mount tool to properly skip the RO flag
set even if we specify "ro", which is not the reality.
Update the comments regarding the backward compatibility on the kernel
level so it works with old and new mount utilities.
CC: stable@vger.kernel.org # 6.8+
Fixes: f044b31867 ("btrfs: handle the ro->rw transition for mounting different subvolumes")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable Fixes:
* Fix KMSAN warning in decode_getfattr_attrs()
Other Bugfixes:
* Handle -ENOTCONN in xs_tcp_setup_socked()
* NFSv3: only use NFS timeout for MOUNT when protocols are compatible
* Fix attribute delegation behavior on exclusive create and a/mtime changes
* Fix localio to cope with racing nfs_local_probe()
* Avoid i_lock contention in fs_clear_invalid_mapping()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmcr1HUACgkQ18tUv7Cl
QOtdYBAA0YohWDHflcHPbltJu0UyCyDDtowvpVacSDJwZwVEXnLQRTTqrdUnWVxx
Bc2Ae8tGsfcwo10yZ6LUIPjcyEqLQeYvKoKv2Awf0j7eubjRYZrQVypIKtmy8aC2
H5ETCyrbIubE06jX8EPO8LFxQ+T6nGD7kC8qJZL8z/aNVXGA2nRRCi7AzdE4o6Ht
0t6fC+W5vxJ4hQHYKb59nGvREMwpKSLg2U4wo1lyFvkDxEJ06DobGOKEtD333cI8
Mou/1UlSZ6RzgfwJNIPMMpCepIp2spaDeet0XVN+zqzxg55Jmk7LqpxP5pswTjLb
WsxErV9ZRXtwutCCf+IDoMCv/YS4g4ZG7CLKXQ4felKJVYIuiS4z0n659xRqLyyi
nW71vrRUdOBE3rCXUW6crZYwX/fHDvl6bsq9/h7cy2ZPnbGkVvXx+LIm0dJRenfb
MaxVM3CyrMnzL3UUk/caK/rVCOHrDD5q/dAtSNfizMWnqoX+gXby3ho6Zwn0Wj89
NiUZJIRI/s4V1WzMw4g+Daz7LUUwGblODTtphH2nnKRDfTiYXeT/r/waU6zUOVcS
7Jd285DF/tkQp2SJ3nvsM/ni7TD2UuG2BsKA3Urlht9i32lwyENeS3nNcx6aHo3i
blNpD+9mp3vZfWWZNVvLM/JldcIqEvd30+P6GWwS/Td8Zz4PYIM=
=9mwu
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes that came up during the nfs bakeathon the other
week.
Stable Fixes:
- Fix KMSAN warning in decode_getfattr_attrs()
Other Bugfixes:
- Handle -ENOTCONN in xs_tcp_setup_socked()
- NFSv3: only use NFS timeout for MOUNT when protocols are compatible
- Fix attribute delegation behavior on exclusive create and a/mtime
changes
- Fix localio to cope with racing nfs_local_probe()
- Avoid i_lock contention in fs_clear_invalid_mapping()"
* tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: avoid i_lock contention in nfs_clear_invalid_mapping
nfs_common: fix localio to cope with racing nfs_local_probe()
NFS: Further fixes to attribute delegation a/mtime changes
NFS: Fix attribute delegation behaviour on exclusive create
nfs: Fix KMSAN warning in decode_getfattr_attrs()
NFSv3: only use NFS timeout for MOUNT when protocols are compatible
sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
The commit 78ff640819 ("vfs: Convert tracefs to use the new mount API")
broke the gid setting when set by fstab or other mount utility.
It is ignored when it is set. Fix the code so that it recognises the
option again and will honor the settings on mount at boot up.
Update the internal documentation and create a selftest to make sure
it doesn't break again in the future.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZyuidRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qsgQAQDuV0x4RLpCrrowDS/ITQw/eb/WjhR7
lhkXVROLN6RK6wD+JWmbaCP82q2S4A2Vx0Rjc72gUMmTzDb1HQflhQiLhwU=
=0dZF
-----END PGP SIGNATURE-----
Merge tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracefs fixes from Steven Rostedt:
"Fix tracefs mount options.
Commit 78ff640819 ("vfs: Convert tracefs to use the new mount API")
broke the gid setting when set by fstab or other mount utility. It is
ignored when it is set. Fix the code so that it recognises the option
again and will honor the settings on mount at boot up.
Update the internal documentation and create a selftest to make sure
it doesn't break again in the future"
* tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/selftests: Add tracefs mount options test
tracing: Document tracefs gid mount option
tracing: Fix tracefs mount options
If Client send simultaneous SMB operations to ksmbd, It exhausts too much
memory through the "ksmbd_work_cache”. It will cause OOM issue.
ksmbd has a credit mechanism but it can't handle this problem. This patch
add the check if it exceeds max credits to prevent this problem by assuming
that one smb request consumes at least one credit.
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
ksmbd_user_session_put should be called under smb3_preauth_hash_rsp().
It will avoid freeing session before calling smb3_preauth_hash_rsp().
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a race condition between ksmbd_smb2_session_create and
ksmbd_expire_session. This patch add missing sessions_table_lock
while adding/deleting session from global session table.
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Multi-threaded buffered reads to the same file exposed significant
inode spinlock contention in nfs_clear_invalid_mapping().
Eliminate this spinlock contention by checking flags without locking,
instead using smp_rmb and smp_load_acquire accordingly, but then take
spinlock and double-check these inode flags.
Also refactor nfs_set_cache_invalid() slightly to use
smp_store_release() to pair with nfs_clear_invalid_mapping()'s
smp_load_acquire().
While this fix is beneficial for all multi-threaded buffered reads
issued by an NFS client, this issue was identified in the context of
surprisingly low LOCALIO performance with 4K multi-threaded buffered
read IO. This fix dramatically speeds up LOCALIO performance:
before: read: IOPS=1583k, BW=6182MiB/s (6482MB/s)(121GiB/20002msec)
after: read: IOPS=3046k, BW=11.6GiB/s (12.5GB/s)(232GiB/20001msec)
Fixes: 17dfeb9113 ("NFS: Fix races in nfs_revalidate_mapping")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Fix the possibility of racing nfs_local_probe() resulting in:
list_add double add: new=ffff8b99707f9f58, prev=ffff8b99707f9f58, next=ffffffffc0f30000.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:35!
Add nfs_uuid_init() to properly initialize all nfs_uuid_t members
(particularly its list_head).
Switch to returning bool from nfs_uuid_begin(), returns false if
nfs_uuid_t is already in-use (its list_head is on a list). Update
nfs_local_probe() to return early if the nfs_client's cl_uuid
(nfs_uuid_t) is in-use.
Also, switch nfs_uuid_begin() from using list_add_tail_rcu() to
list_add_tail() -- rculist was used in an earlier version of the
localio code that had a lockless nfs_uuid_lookup interface.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
When asked to set both an atime and an mtime to the current system time,
ensure that the setting is atomic by calling inode_update_timestamps()
only once with the appropriate flags.
Fixes: e12912d941 ("NFSv4: Add support for delegated atime and mtime attributes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
When the client does an exclusive create and the server decides to store
the verifier in the timestamps, a SETATTR is subsequently sent to fix up
those timestamps. When that is the case, suppress the exceptions for
attribute delegations in nfs4_bitmap_copy_adjust().
Fixes: 32215c1f89 ("NFSv4: Don't request atime/mtime/size if they are delegated to us")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>