Commit Graph

1324010 Commits

Author SHA1 Message Date
Carlos Maiolino
bf354410af xfs: bug fixes for 6.13 [01/12]
Bug fixes for 6.13.
 
 This has been running on the djcloud for months with no problems.  Enjoy!
 
 Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZ1uRzwAKCRBKO3ySh0YR
 poYjAP0YxGr59OFEmdu9fZzLRQoARjchlqMmYiMOokbXxqGfhgD/Wo7Er+Dpj4KE
 jIvDWUy8anoKuE2pvcRVBYyYaPoTNgY=
 =i+9a
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.13-fixes_2024-12-12' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into next-rc

xfs: bug fixes for 6.13 [01/12]

Bug fixes for 6.13.

This has been running on the djcloud for months with no problems.  Enjoy!

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-12-13 07:47:12 +01:00
Darrick J. Wong
12f2930f5f xfs: port xfs_ioc_start_commit to multigrain timestamps
Take advantage of the multigrain timestamp APIs to ensure that nobody
can sneak in and write things to a file between starting a file update
operation and committing the results.  This should have been part of the
multigrain timestamp merge, but I forgot to fling it at jlayton when he
resubmitted the patchset due to developer bandwidth problems.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 4e40eff0b5 ("fs: add infrastructure for multigrain timestamps")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2024-12-12 17:45:13 -08:00
Darrick J. Wong
7f8b718c58 xfs: return from xfs_symlink_verify early on V4 filesystems
V4 symlink blocks didn't have headers, so return early if this is a V4
filesystem.

Cc: <stable@vger.kernel.org> # v5.1
Fixes: 39708c20ab ("xfs: miscellaneous verifier magic value fixups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:13 -08:00
Darrick J. Wong
c004a793e0 xfs: fix zero byte checking in the superblock scrubber
The logic to check that the region past the end of the superblock is all
zeroes is wrong -- we don't want to check only the bytes past the end of
the maximally sized ondisk superblock structure as currently defined in
xfs_format.h; we want to check the bytes beyond the end of the ondisk as
defined by the feature bits.

Port the superblock size logic from xfs_repair and then put it to use in
xfs_scrub.

Cc: <stable@vger.kernel.org> # v4.15
Fixes: 21fb4cb198 ("xfs: scrub the secondary superblocks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:13 -08:00
Darrick J. Wong
06b20ef09b xfs: check pre-metadir fields correctly
The checks that were added to the superblock scrubber for metadata
directories aren't quite right -- the old inode pointers are now defined
to be zeroes until someone else reuses them.  Also consolidate the new
metadir field checks to one place; they were inexplicably scattered
around.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 28d756d4d5 ("xfs: update sb field checks when metadir is turned on")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
e57e083be9 xfs: don't crash on corrupt /quotas dirent
If the /quotas dirent points to an inode but the inode isn't loadable
(and hence mkdir returns -EEXIST), don't crash, just bail out.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: e80fbe1ad8 ("xfs: use metadir for quota inodes")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
3853b5e1d7 xfs: don't move nondir/nonreg temporary repair files to the metadir namespace
Only directories or regular files are allowed in the metadata directory
tree.  Don't move the repair tempfile to the metadir namespace if this
is not true; this will cause the inode verifiers to trip.

xrep_tempfile_adjust_directory_tree opportunistically moves sc->tempip
from the regular directory tree to the metadata directory tree if sc->ip
is part of the metadata directory tree.  However, the scrub setup
functions grab sc->ip and create sc->tempip before we actually get
around to checking if the file mode is the right type for the scrubber.

IOWs, you can invoke the symlink scrubber with the file handle of a
subdirectory in the metadir.  xrep_setup_symlink will create a temporary
symlink file, xrep_tempfile_adjust_directory_tree will foolishly try to
set the METADATA flag on the temp symlink, which trips the inode
verifier in the inode item precommit, which shuts down the filesystem
when expensive checks are turned on.  If they're /not/ turned on, then
xchk_symlink will return ENOENT when it sees that it's been passed a
symlink, but the invalid inode could still get flushed to disk.  We
don't want that.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 9dc31acb01 ("xfs: move repair temporary files to the metadata directory tree")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
7f8a44f372 xfs: fix sb_spino_align checks for large fsblock sizes
For a sparse inodes filesystem, mkfs.xfs computes the values of
sb_spino_align and sb_inoalignmt with the following code:

	int     cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

	if (cfg->sb_feat.crcs_enabled)
		cluster_size *= cfg->inodesize / XFS_DINODE_MIN_SIZE;

	sbp->sb_spino_align = cluster_size >> cfg->blocklog;
	sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK *
			cfg->inodesize >> cfg->blocklog;

On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results
in cluster_size = 8192 * (512 / 256) = 16384.  As a result,
sb_spino_align and sb_inoalignmt are both set to zero.  Unfortunately,
this trips the new sb_spino_align check that was just added to
xfs_validate_sb_common, and the mkfs fails:

# mkfs.xfs -f -b size=64k, /dev/sda
meta-data=/dev/sda               isize=512    agcount=4, agsize=81136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0   metadir=0
data     =                       bsize=65536  blocks=324544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=65536  blocks=5006, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
Discarding blocks...Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: writing AG headers failed, err=22

Prior to commit 59e43f5479 this all worked fine, even if "sparse"
inodes are somewhat meaningless when everything fits in a single
fsblock.  Adjust the checks to handle existing filesystems.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 59e43f5479 ("xfs: sb_spino_align is not verified")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
ca378189fd xfs: convert quotacheck to attach dquot buffers
Now that we've converted the dquot logging machinery to attach the dquot
buffer to the li_buf pointer so that the AIL dqflush doesn't have to
allocate or read buffers in a reclaim path, do the same for the
quotacheck code so that the reclaim shrinker dqflush call doesn't have
to do that either.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: 903edea6c5 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:12 -08:00
Darrick J. Wong
acc8f8628c xfs: attach dquot buffer to dquot log item buffer
Ever since 6.12-rc1, I've observed a pile of warnings from the kernel
when running fstests with quotas enabled:

WARNING: CPU: 1 PID: 458580 at mm/page_alloc.c:4221 __alloc_pages_noprof+0xc9c/0xf18
CPU: 1 UID: 0 PID: 458580 Comm: xfsaild/sda3 Tainted: G        W          6.12.0-rc6-djwa #rc6 6ee3e0e531f6457e2d26aa008a3b65ff184b377c
<snip>
Call trace:
 __alloc_pages_noprof+0xc9c/0xf18
 alloc_pages_mpol_noprof+0x94/0x240
 alloc_pages_noprof+0x68/0xf8
 new_slab+0x3e0/0x568
 ___slab_alloc+0x5a0/0xb88
 __slab_alloc.constprop.0+0x7c/0xf8
 __kmalloc_noprof+0x404/0x4d0
 xfs_buf_get_map+0x594/0xde0 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_buf_read_map+0x64/0x2e0 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_trans_read_buf_map+0x1dc/0x518 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_qm_dqflush+0xac/0x468 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfs_qm_dquot_logitem_push+0xe4/0x148 [xfs 384cb02810558b4c490343c164e9407332118f88]
 xfsaild+0x3f4/0xde8 [xfs 384cb02810558b4c490343c164e9407332118f88]
 kthread+0x110/0x128
 ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---

This corresponds to the line:

	WARN_ON_ONCE(current->flags & PF_MEMALLOC);

within the NOFAIL checks.  What's happening here is that the XFS AIL is
trying to write a disk quota update back into the filesystem, but for
that it needs to read the ondisk buffer for the dquot.  The buffer is
not in memory anymore, probably because it was evicted.  Regardless, the
buffer cache tries to allocate a new buffer, but those allocations are
NOFAIL.  The AIL thread has marked itself PF_MEMALLOC (aka noreclaim)
since commit 43ff2122e6 ("xfs: on-stack delayed write buffer lists")
presumably because reclaim can push on XFS to push on the AIL.

An easy way to fix this probably would have been to drop the NOFAIL flag
from the xfs_buf allocation and open code a retry loop, but then there's
still the problem that for bs>ps filesystems, the buffer itself could
require up to 64k worth of pages.

Inode items had similar behavior (multi-page cluster buffers that we
don't want to allocate in the AIL) which we solved by making transaction
precommit attach the inode cluster buffers to the dirty log item.  Let's
solve the dquot problem in the same way.

So: Make a real precommit handler to read the dquot buffer and attach it
to the log item; pass it to dqflush in the push method; and have the
iodone function detach the buffer once we've flushed everything.  Add a
state flag to the log item to track when a thread has entered the
precommit -> push mechanism to skip the detaching if it turns out that
the dquot is very busy, as we don't hold the dquot lock between log item
commit and AIL push).

Reading and attaching the dquot buffer in the precommit hook is inspired
by the work done for inode cluster buffers some time ago.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: 903edea6c5 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:11 -08:00
Darrick J. Wong
ec88b41b93 xfs: clean up log item accesses in xfs_qm_dqflush{,_done}
Clean up these functions a little bit before we move on to the real
modifications, and make the variable naming consistent for dquot log
items.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:11 -08:00
Darrick J. Wong
a40fe30868 xfs: separate dquot buffer reads from xfs_dqflush
The first step towards holding the dquot buffer in the li_buf instead of
reading it in the AIL is to separate the part that reads the buffer from
the actual flush code.  There should be no functional changes.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:11 -08:00
Darrick J. Wong
07137e925f xfs: don't lose solo dquot update transactions
Quota counter updates are tracked via incore objects which hang off the
xfs_trans object.  These changes are then turned into dirty log items in
xfs_trans_apply_dquot_deltas just prior to commiting the log items to
the CIL.

However, updating the incore deltas do not cause XFS_TRANS_DIRTY to be
set on the transaction.  In other words, a pure quota counter update
will be silently discarded if there are no other dirty log items
attached to the transaction.

This is currently not the case anywhere in the filesystem because quota
updates always dirty at least one other metadata item, but a subsequent
bug fix will add dquot log item precommits, so we actually need a dirty
dquot log item prior to xfs_trans_run_precommits.  Also let's not leave
a logic bomb.

Cc: <stable@vger.kernel.org> # v2.6.35
Fixes: 0924378a68 ("xfs: split out iclog writing from xfs_trans_commit()")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:11 -08:00
Darrick J. Wong
3762113b59 xfs: don't lose solo superblock counter update transactions
Superblock counter updates are tracked via per-transaction counters in
the xfs_trans object.  These changes are then turned into dirty log
items in xfs_trans_apply_sb_deltas just prior to commiting the log items
to the CIL.

However, updating the per-transaction counter deltas do not cause
XFS_TRANS_DIRTY to be set on the transaction.  In other words, a pure sb
counter update will be silently discarded if there are no other dirty
log items attached to the transaction.

This is currently not the case anywhere in the filesystem because sb
counter updates always dirty at least one other metadata item, but let's
not leave a logic bomb.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:11 -08:00
Darrick J. Wong
a004afdc62 xfs: avoid nested calls to __xfs_trans_commit
Currently, __xfs_trans_commit calls xfs_defer_finish_noroll, which calls
__xfs_trans_commit again on the same transaction.  In other words,
there's a nested function call (albeit with slightly different
arguments) that has caused minor amounts of confusion in the past.
There's no reason to keep this around, since there's only one place
where we actually want the xfs_defer_finish_noroll, and that is in the
top level xfs_trans_commit call.

This also reduces stack usage a little bit.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:11 -08:00
Darrick J. Wong
44d9b07e52 xfs: only run precommits once per transaction object
Committing a transaction tx0 with a defer ops chain of (A, B, C)
creates a chain of transactions that looks like this:

tx0 -> txA -> txB -> txC

Prior to commit cb04211748, __xfs_trans_commit would run precommits
on tx0, then call xfs_defer_finish_noroll to convert A-C to tx[A-C].
Unfortunately, after the finish_noroll loop we forgot to run precommits
on txC.  That was fixed by adding the second precommit call.

Unfortunately, none of us remembered that xfs_defer_finish_noroll
calls __xfs_trans_commit a second time to commit tx0 before finishing
work A in txA and committing that.  In other words, we run precommits
twice on tx0:

xfs_trans_commit(tx0)
    __xfs_trans_commit(tx0, false)
        xfs_trans_run_precommits(tx0)
        xfs_defer_finish_noroll(tx0)
            xfs_trans_roll(tx0)
                txA = xfs_trans_dup(tx0)
                __xfs_trans_commit(tx0, true)
                xfs_trans_run_precommits(tx0)

This currently isn't an issue because the inode item precommit is
idempotent; the iunlink item precommit deletes itself so it can't be
called again; and the buffer/dquot item precommits only check the incore
objects for corruption.  However, it doesn't make sense to run
precommits twice.

Fix this situation by only running precommits after finish_noroll.

Cc: <stable@vger.kernel.org> # v6.4
Fixes: cb04211748 ("xfs: defered work could create precommits")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:10 -08:00
Darrick J. Wong
53b001a21c xfs: unlock inodes when erroring out of xfs_trans_alloc_dir
Debugging a filesystem patch with generic/475 caused the system to hang
after observing the following sequences in dmesg:

 XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x491520 len 32 error 5
 XFS (dm-0): metadata I/O error in "xfs_btree_read_buf_block+0xba/0x160 [xfs]" at daddr 0x3445608 len 8 error 5
 XFS (dm-0): metadata I/O error in "xfs_imap_to_bp+0x61/0xe0 [xfs]" at daddr 0x138e1c0 len 32 error 5
 XFS (dm-0): log I/O error -5
 XFS (dm-0): Metadata I/O Error (0x1) detected at xfs_trans_read_buf_map+0x1ea/0x4b0 [xfs] (fs/xfs/xfs_trans_buf.c:311).  Shutting down filesystem.
 XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
 XFS (dm-0): Internal error dqp->q_ino.reserved < dqp->q_ino.count at line 869 of file fs/xfs/xfs_trans_dquot.c.  Caller xfs_trans_dqresv+0x236/0x440 [xfs]
 XFS (dm-0): Corruption detected. Unmount and run xfs_repair
 XFS (dm-0): Unmounting Filesystem be6bcbcc-9921-4deb-8d16-7cc94e335fa7

The system is stuck in unmount trying to lock a couple of inodes so that
they can be purged.  The dquot corruption notice above is a clue to what
happened -- a link() call tried to set up a transaction to link a child
into a directory.  Quota reservation for the transaction failed after IO
errors shut down the filesystem, but then we forgot to unlock the inodes
on our way out.  Fix that.

Cc: <stable@vger.kernel.org> # v6.10
Fixes: bd5562111d ("xfs: Hold inode locks in xfs_trans_alloc_dir")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:10 -08:00
Darrick J. Wong
ffc3ea4f3c xfs: fix scrub tracepoints when inode-rooted btrees are involved
Fix a minor mistakes in the scrub tracepoints that can manifest when
inode-rooted btrees are enabled.  The existing code worked fine for bmap
btrees, but we should tighten the code up to be less sloppy.

Cc: <stable@vger.kernel.org> # v5.7
Fixes: 92219c292a ("xfs: convert btree cursor inode-private member names")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:10 -08:00
Darrick J. Wong
6d7b4bc1c3 xfs: update btree keys correctly when _insrec splits an inode root block
In commit 2c813ad66a, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Cc: <stable@vger.kernel.org> # v4.8
Fixes: 2c813ad66a ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:10 -08:00
Darrick J. Wong
23bee6f390 xfs: fix error bailout in xfs_rtginode_create
smatch reported that we screwed up the error cleanup in this function.
Fix it.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: ae897e0bed ("xfs: support creating per-RTG files in growfs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:10 -08:00
Darrick J. Wong
af9f02457f xfs: fix null bno_hint handling in xfs_rtallocate_rtg
xfs_bmap_rtalloc initializes the bno_hint variable to NULLRTBLOCK (aka
NULLFSBLOCK).  If the allocation request is for a file range that's
adjacent to an existing mapping, it will then change bno_hint to the
blkno hint in the bmalloca structure.

In other words, bno_hint is either a rt block number, or it's all 1s.
Unfortunately, commit ec12f97f1b didn't take the NULLRTBLOCK state
into account, which means that it tries to translate that into a
realtime extent number.  We then end up with an obnoxiously high rtx
number and pointlessly feed that to the near allocator.  This often
fails and falls back to the by-size allocator.  Seeing as we had no
locality hint anyway, this is a waste of time.

Fix the code to detect a lack of bno_hint correctly.  This was detected
by running xfs/009 with metadir enabled and a 28k rt extent size.

Cc: <stable@vger.kernel.org> # v6.12
Fixes: ec12f97f1b ("xfs: make the rtalloc start hint a xfs_rtblock_t")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
dc5a052739 xfs: mark metadir repair tempfiles with IRECOVERY
Once in a long while, xfs/566 and xfs/801 report directory corruption in
one of the metadata subdirectories while it's forcibly rebuilding all
filesystem metadata.  I observed the following sequence of events:

1. Initiate a repair of the parent pointers for the /quota/user file.
   This is the secret file containing user quota data.

2. The pptr repair thread creates a temporary file and begins staging
   parent pointers in the ondisk metadata in preparation for an
   exchange-range to commit the new pptr data.

3. At the same time, initiate a repair of the /quota directory itself.

4. The dir repair thread finds the temporary file from (2), scans it for
   parent pointers, and stages a dirent in its own temporary dir in
   preparation to commit the fixed directory.

5. The parent pointer repair completes and frees the temporary file.

6. The dir repair commits the new directory and scans it again.  It
   finds the dirent that points to the old temporary file in (2) and
   marks the directory corrupt.

Oops!  Repair code must never scan the temporary files that other repair
functions create to stage new metadata.  They're not supposed to do
that, but the predicate function xrep_is_tempfile is incorrect because
it assumes that any XFS_DIFLAG2_METADATA file cannot ever be a temporary
file, but xrep_tempfile_adjust_directory_tree creates exactly that.

Fix this by setting the IRECOVERY flag on temporary metadata directory
inodes and using that to correct the predicate.  Repair code is supposed
to erase all the data in temporary files before releasing them, so it's
ok if a thread scans the temporary file after we drop IRECOVERY.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: bb6cdd5529 ("xfs: hide metadata inodes from everyone because they are special")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
6f4669708a xfs: set XFS_SICK_INO_SYMLINK_ZAPPED explicitly when zapping a symlink
If we need to reset a symlink target to the "durr it's busted" string,
then we clear the zapped flag as well.  However, this should be using
the provided helper so that we don't set the zapped state on an
otherwise ok symlink.

Cc: <stable@vger.kernel.org> # v6.10
Fixes: 2651923d8d ("xfs: online repair of symbolic links")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
aa7bfb537e xfs: separate healthy clearing mask during repair
In commit d9041681dd we introduced some XFS_SICK_*ZAPPED flags so
that the inode record repair code could clean up a damaged inode record
enough to iget the inode but still be able to remember that the higher
level repair code needs to be called.  As part of that, we introduced a
xchk_mark_healthy_if_clean helper that is supposed to cause the ZAPPED
state to be removed if that higher level metadata actually checks out.
This was done by setting additional bits in sick_mask hoping that
xchk_update_health will clear all those bits after a healthy scrub.

Unfortunately, that's not quite what sick_mask means -- bits in that
mask are indeed cleared if the metadata is healthy, but they're set if
the metadata is NOT healthy.  fsck is only intended to set the ZAPPED
bits explicitly.

If something else sets the CORRUPT/XCORRUPT state after the
xchk_mark_healthy_if_clean call, we end up marking the metadata zapped.
This can happen if the following sequence happens:

1. Scrub runs, discovers that the metadata is fine but could be
   optimized and calls xchk_mark_healthy_if_clean on a ZAPPED flag.
   That causes the ZAPPED flag to be set in sick_mask because the
   metadata is not CORRUPT or XCORRUPT.

2. Repair runs to optimize the metadata.

3. Some other metadata used for cross-referencing in (1) becomes
   corrupt.

4. Post-repair scrub runs, but this time it sets CORRUPT or XCORRUPT due
   to the events in (3).

5. Now the xchk_health_update sets the ZAPPED flag on the metadata we
   just repaired.  This is not the correct state.

Fix this by moving the "if healthy" mask to a separate field, and only
ever using it to clear the sick state.

Cc: <stable@vger.kernel.org> # v6.8
Fixes: d9041681dd ("xfs: set inode sick state flags when we zap either ondisk fork")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
7ce31f20a0 xfs: don't drop errno values when we fail to ficlone the entire range
Way back when we first implemented FICLONE for XFS, life was simple --
either the the entire remapping completed, or something happened and we
had to return an errno explaining what happened.  Neither of those
ioctls support returning partial results, so it's all or nothing.

Then things got complicated when copy_file_range came along, because it
actually can return the number of bytes copied, so commit 3f68c1f562
tried to make it so that we could return a partial result if the
REMAP_FILE_CAN_SHORTEN flag is set.  This is also how FIDEDUPERANGE can
indicate that the kernel performed a partial deduplication.

Unfortunately, the logic is wrong if an error stops the remapping and
CAN_SHORTEN is not set.  Because those callers cannot return partial
results, it is an error for ->remap_file_range to return a positive
quantity that is less than the @len passed in.  Implementations really
should be returning a negative errno in this case, because that's what
btrfs (which introduced FICLONE{,RANGE}) did.

Therefore, ->remap_range implementations cannot silently drop an errno
that they might have when the number of bytes remapped is less than the
number of bytes requested and CAN_SHORTEN is not set.

Found by running generic/562 on a 64k fsblock filesystem and wondering
why it reported corrupt files.

Cc: <stable@vger.kernel.org> # v4.20
Fixes: 3fc9f5e409 ("xfs: remove xfs_reflink_remap_range")
Really-Fixes: 3f68c1f562 ("xfs: support returning partial reflink results")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
bd27c7bcdc xfs: return a 64-bit block count from xfs_btree_count_blocks
With the nrext64 feature enabled, it's possible for a data fork to have
2^48 extent mappings.  Even with a 64k fsblock size, that maps out to
a bmbt containing more than 2^32 blocks.  Therefore, this predicate must
return a u64 count to avoid an integer wraparound that will cause scrub
to do the wrong thing.

It's unlikely that any such filesystem currently exists, because the
incore bmbt would consume more than 64GB of kernel memory on its own,
and so far nobody except me has driven a filesystem that far, judging
from the lack of complaints.

Cc: <stable@vger.kernel.org> # v5.19
Fixes: df9ad5cc7a ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:09 -08:00
Darrick J. Wong
e1d8602b6c xfs: keep quota directory inode loaded
In the same vein as the previous patch, there's no point in the metapath
scrub setup function doing a lookup on the quota metadir just so it can
validate that lookups work correctly.  Instead, retain the quota
directory inode in memory for the lifetime of the mount so that we can
check this meaningfully.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 128a055291 ("xfs: scrub quota file metapaths")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:08 -08:00
Darrick J. Wong
9b72800103 xfs: metapath scrubber should use the already loaded inodes
Don't waste time in xchk_setup_metapath_dqinode doing a second lookup of
the quota inodes, just grab them from the quotainfo structure.  The
whole point of this scrubber is to make sure that the dirents exist, so
it's completely silly to do lookups.

Cc: <stable@vger.kernel.org> # v6.13-rc1
Fixes: 128a055291 ("xfs: scrub quota file metapaths")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:08 -08:00
Darrick J. Wong
a440a28ddb xfs: fix off-by-one error in fsmap's end_daddr usage
In commit ca6448aed4, we created an "end_daddr" variable to fix
fsmap reporting when the end of the range requested falls in the middle
of an unknown (aka free on the rmapbt) region.  Unfortunately, I didn't
notice that the the code sets end_daddr to the last sector of the device
but then uses that quantity to compute the length of the synthesized
mapping.

Zizhi Wo later observed that when end_daddr isn't set, we still don't
report the last fsblock on a device because in that case (aka when
info->last is true), the info->high mapping that we pass to
xfs_getfsmap_group_helper has a startblock that points to the last
fsblock.  This is also wrong because the code uses startblock to
compute the length of the synthesized mapping.

Fix the second problem by setting end_daddr unconditionally, and fix the
first problem by setting start_daddr to one past the end of the range to
query.

Cc: <stable@vger.kernel.org> # v6.11
Fixes: ca6448aed4 ("xfs: Fix missing interval for missing_owner in xfs fsmap")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reported-by: Zizhi Wo <wozizhi@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-12-12 17:45:08 -08:00
Linus Torvalds
f932fb9b40 three kernel server fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmdbcJkACgkQiiy9cAdy
 T1GhSQv/WCHq1jdgw/4IeAKKoGyPDq1fhWK3YVjTe6G8RDHbpdPtP1lPnQkwhW3U
 6f5IMbphRc7MlaHBo4nwSvCSSKieb+uQ9ppMMu5qi0iSkfvtyDZFyEsIpI2OlXEp
 s7QqcWe0vylyAwwClVZgRvlLa7j9T1QoaELoEV92JMaLpZ0Q8kHBlA4XLH2K5aYH
 WQ8MXnuZIl1G59SzIekvUDsAzKqxoJ7XYuaypGtp9/tmnmyEf2GcPlJ1lpGVdPjE
 y8H46CC9Kx2e/2a9J/d9HnPco4AQ4/VESrBPfvFKNaAL4P9DqXczuiFFkMtH1KYx
 06L9R6XPQaQVUPZZ7XMM79vvyvrhX1LoElMxApfmcB5evfJy4UIxcfbRjdIgKVIJ
 J4mOSOEkf8pn8T0jQ9r3787M3nFs8qxrg1PZEPvbaa5njHn5pYkxkZ71TddG+1pR
 /ryljIMDHZudOzzGJIUh90QRcWE/k8lc5pEqdEwholcq0nlkQ/kMgkwJ3I7XpAmh
 z5JPgeJ+
 =+OkY
 -----END PGP SIGNATURE-----

Merge tag 'v6.13-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

 - fix ctime setting in setattr

 - fix reference count on user session to avoid potential race with
   session expire

 - fix query dir issue

* tag 'v6.13-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: set ATTR_CTIME flags when setting mtime
  ksmbd: fix racy issue from session lookup and expire
  ksmbd: retry iterate_dir in smb2_query_dir
2024-12-12 17:33:20 -08:00
Linus Torvalds
01abac26dc perf tools fixes for v6.13
A set of random fixes for this cycle.
 
 perf record
 -----------
 * Fix build-id event size calculation in perf record
 * Fix perf record -C/--cpu option on hybrid systems
 * Fix perf mem record with precise-ip on SapphireRapids
 
 perf test
 ---------
 * Refresh hwmon directory before reading the test files
 * Make sure system_tsc_freq event is tested on x86 only
 
 Others
 ------
 * Usual header file sync
 * Fix undefined behavior in perf ftrace profile
 * Properly initialize a return variable in perf probe
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZ1taMgAKCRCMstVUGiXM
 g+JLAP99JamxAdmiXW9qwU7Z2OGWZ+J5RsVh5sId+XuH5Lui1wEAgxE5vhULfRaz
 IH74FuxietYfo0YpTLVlJJgntHO7ugo=
 =YUUe
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-fixes-for-v6.13-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools fixes from Namhyung Kim:
 "A set of random fixes for this cycle.

  perf record:
   - Fix build-id event size calculation in perf record
   - Fix perf record -C/--cpu option on hybrid systems
   - Fix perf mem record with precise-ip on SapphireRapids

  perf test:
   - Refresh hwmon directory before reading the test files
   - Make sure system_tsc_freq event is tested on x86 only

  Others:
   - Usual header file sync
   - Fix undefined behavior in perf ftrace profile
   - Properly initialize a return variable in perf probe"

* tag 'perf-tools-fixes-for-v6.13-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (21 commits)
  perf probe: Fix uninitialized variable
  libperf: evlist: Fix --cpu argument on hybrid platform
  perf test expr: Fix system_tsc_freq for only x86
  perf test hwmon_pmu: Fix event file location
  perf hwmon_pmu: Use openat rather than dup to refresh directory
  perf ftrace: Fix undefined behavior in cmp_profile_data()
  perf tools: Fix precise_ip fallback logic
  perf tools: Fix build error on generated/fs_at_flags_array.c
  tools headers: Sync uapi/linux/prctl.h with the kernel sources
  tools headers: Sync uapi/linux/mount.h with the kernel sources
  tools headers: Sync uapi/linux/fcntl.h with the kernel sources
  tools headers: Sync uapi/asm-generic/mman.h with the kernel sources
  tools headers: Sync *xattrat syscall changes with the kernel sources
  tools headers: Sync arm64 kvm header with the kernel sources
  tools headers: Sync x86 kvm and cpufeature headers with the kernel
  tools headers: Sync uapi/linux/kvm.h with the kernel sources
  tools headers: Sync uapi/linux/perf_event.h with the kernel sources
  tools headers: Sync uapi/drm/drm.h with the kernel sources
  perf machine: Initialize machine->env to address a segfault
  perf test: Don't signal all processes on system when interrupting tests
  ...
2024-12-12 17:26:55 -08:00
Linus Torvalds
eefa7a9c06 OpenRISC fixes for 6.13-rc3
A few fixes. One is just a formatting change but I figure I will include
 it here to avoid unnessesary pull requests in the future.  If you prefer
 not to merge it now I can leave it out.
 
  - Fix from Masahiro Yamada to fix 6.13 OpenRISC boot issues after
    vmlinux.lds.h symbol ordering was changed.
  - Code formatting fixups from Geert.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmdbE1YACgkQw7McLV5m
 J+RBFg//TP5RcFAJQkAZ3D6rrCpuFx5pFe0PZaVpD1JbgPIXCoj5+TghY2akfxA5
 ogU1P68FZosh0M0NAns36Viw0lH55zdGyrAnsc61KYXFZtE9OloLuBgXa3tTV9/y
 7Qep+thCBA4KIfMFQpsHeAG4zC+NBQbSZtzrQLY5FUvYHAaTiJj4NZxh9D7uEJhm
 k3qyCDmh5jpx0wkAvk97A2nM6ks7Io4+V4kKrijRwfSvSUmoQq/wAfjbpY41LqYR
 +BoX6cjC/+iDXOOvAYadmYGWOiTU1zFI9MUykYE6O6tRAgMaM0tSw1DkyInLni6n
 SYWKw3rPYIxcszNd9C15+L0U3xUYXf55AXWTIpgWnD/+nw+wn4ggX2iYjXuwHC4P
 hkQpervd0AxcV9KOn5Vr5g8mHOJ1JuaOtCtUu143TgOWY0JirbFOZHqe+fxTn5w0
 t2TRu/o+QWKV7HU91Pb219YRWCTArxS+QgSLOV/sRT9JkcS9Y33Rqd4osZ8PST8L
 mWEWFwiDft3kmMtLEJ1KOzBb8H8tDesL/o83CzTa/T3gpQpeLmRX7IsXR+QsLFo8
 rKo/tfjIUlgwu+ORJXiXKpoWifTetaPVh9eHdwLEu+O1Z8LK1q9aXon9rDeOXWyG
 qL058kBpkTxrrTVT0GSN9FMWOJl69ZgJtxAPcydk2kYzioSuNgM=
 =rh6a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of https://github.com/openrisc/linux

Pull OpenRISC fixes from Stafford Horne:

 - Fix from Masahiro Yamada to fix 6.13 OpenRISC boot issues after
   vmlinux.lds.h symbol ordering was changed

 - Code formatting fixups from Geert

* tag 'for-linus' of https://github.com/openrisc/linux:
  openrisc: Fix misalignments in head.S
  openrisc: place exception table at the head of vmlinux
2024-12-12 13:20:53 -08:00
Linus Torvalds
150b567e0d Including fixes from bluetooth, netfilter and wireless.
Current release - fix to a fix:
 
  - rtnetlink: fix error code in rtnl_newlink()
 
  - tipc: fix NULL deref in cleanup_bearer()
 
 Current release - regressions:
 
  - ip: fix warning about invalid return from in ip_route_input_rcu()
 
 Current release - new code bugs:
 
  - udp: fix L4 hash after reconnect
 
  - eth: lan969x: fix cyclic dependency between modules
 
  - eth: bnxt_en: fix potential crash when dumping FW log coredump
 
 Previous releases - regressions:
 
  - wifi: mac80211:
    - fix a queue stall in certain cases of channel switch
    - wake the queues in case of failure in resume
 
  - splice: do not checksum AF_UNIX sockets
 
  - virtio_net: fix BUG()s in BQL support due to incorrect accounting
    of purged packets during interface stop
 
  - eth: stmmac: fix TSO DMA API mis-usage causing oops
 
  - eth: bnxt_en: fixes for HW GRO: GSO type on 5750X chips and
    oops due to incorrect aggregation ID mask on 5760X chips
 
 Previous releases - always broken:
 
  - Bluetooth: improve setsockopt() handling of malformed user input
 
  - eth: ocelot: fix PTP timestamping in presence of packet loss
 
  - ptp: kvm: x86: avoid "fail to initialize ptp_kvm" when simply
    not supported
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmdbCXgACgkQMUZtbf5S
 Irtizg/9GVtTtu0OeQpHlxOpXOdqRciHDHBcjc0+rYihHazA47wtOszPg2BDiebV
 1D+uTaPoxuJUZo9jDAGMerUpy6gmC8+4h9gp72oSU9uGNHTrDWsylsn16foFkmpg
 hMsq+bzYr9ayekIXoI4T//PQ8MO8fqLFPdJmFPIKjkTtsrCzzARck9R4uDlWzrJj
 v5cQY+q/6qnwZTvvto67ahjdKUw8k3XIRZxLDqrDiW+zUzdk9XRwK46AdP3eybcx
 OCMHvXmWx6DTbjeEbzhq5YwDGAnBOE9rP4vJmpV9y+PcPDCmPzt7IDNWACcEPHY4
 3vuZv3JJP/5MIqGHidDn1JYgWl/Y3iv5ZfKInG585XH+5VWemq3WL1JOS2ua6Xmu
 hoGhwNTGea4KtCeutE8xSwMSBTxswkdPb93ZFPt28zKAN118chBvGLRv2jepSvQR
 3AQhJ9bgGuErHMYh5vdiluRVj/4bwSIFqEH6vr6w9+DUDFiTSKERLXSJ8dc8S+9K
 ghd/I8POb4VTfjZIyHzo1DJOulPXe84KGMcOuAfh0AV7o5HcuP+oNdR3+qS2Lf+G
 EByIX8osZsHjqaVr5ba+KnZz2XrdO7mbE54fCKa9ZUwkNIbcCEqOJBqcMlPWxvtK
 whrGDOS8ifYYK6fL6IFO5CtxBvWmQgMOYV6Sjp9J27PD4jiMrms=
 =TDKt
 -----END PGP SIGNATURE-----

Merge tag 'net-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from bluetooth, netfilter and wireless.

  Current release - fix to a fix:

   - rtnetlink: fix error code in rtnl_newlink()

   - tipc: fix NULL deref in cleanup_bearer()

  Current release - regressions:

   - ip: fix warning about invalid return from in ip_route_input_rcu()

  Current release - new code bugs:

   - udp: fix L4 hash after reconnect

   - eth: lan969x: fix cyclic dependency between modules

   - eth: bnxt_en: fix potential crash when dumping FW log coredump

  Previous releases - regressions:

   - wifi: mac80211:
      - fix a queue stall in certain cases of channel switch
      - wake the queues in case of failure in resume

   - splice: do not checksum AF_UNIX sockets

   - virtio_net: fix BUG()s in BQL support due to incorrect accounting
     of purged packets during interface stop

   - eth:
      - stmmac: fix TSO DMA API mis-usage causing oops
      - bnxt_en: fixes for HW GRO: GSO type on 5750X chips and oops
        due to incorrect aggregation ID mask on 5760X chips

  Previous releases - always broken:

   - Bluetooth: improve setsockopt() handling of malformed user input

   - eth: ocelot: fix PTP timestamping in presence of packet loss

   - ptp: kvm: x86: avoid "fail to initialize ptp_kvm" when simply not
     supported"

* tag 'net-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  net: dsa: tag_ocelot_8021q: fix broken reception
  net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries
  net: renesas: rswitch: fix initial MPIC register setting
  Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
  Bluetooth: iso: Fix circular lock in iso_conn_big_sync
  Bluetooth: iso: Fix circular lock in iso_listen_bis
  Bluetooth: SCO: Add support for 16 bits transparent voice setting
  Bluetooth: iso: Fix recursive locking warning
  Bluetooth: iso: Always release hdev at the end of iso_listen_bis
  Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
  Bluetooth: hci_core: Fix sleeping function called from invalid context
  team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
  team: Fix initial vlan_feature set in __team_compute_features
  bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
  bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
  net, team, bonding: Add netdev_base_features helper
  net/sched: netem: account for backlog updates from child qdisc
  net: dsa: felix: fix stuck CPU-injected packets with short taprio windows
  splice: do not checksum AF_UNIX sockets
  net: usb: qmi_wwan: add Telit FE910C04 compositions
  ...
2024-12-12 11:28:05 -08:00
Jakub Kicinski
ad913dfd8b bluetooth pull request for net:
- SCO: Fix transparent voice setting
  - ISO: Locking fixes
  - hci_core: Fix sleeping function called from invalid context
  - hci_event: Fix using rcu_read_(un)lock while iterating
  - btmtk: avoid UAF in btmtk_process_coredump
 -----BEGIN PGP SIGNATURE-----
 
 iQJNBAABCAA3FiEE7E6oRXp8w05ovYr/9JCA4xAyCykFAmda8pEZHGx1aXoudm9u
 LmRlbnR6QGludGVsLmNvbQAKCRD0kIDjEDILKTf/D/sFRQb6FTdttMV934GDH+1W
 DCS2prkqGUi6KGTxFFpT1rKafR0+h1osta1yvtM7h1tbXqpjAsv06ksFEP1vsXgl
 Kw1gTn/cEGlK6KQ+oX4ObEBWXtlnYparl86m+OJY8xVc2GEA7GGdbwwsuDLeEv8H
 JVoMN2FLb/Io3VHYBO595xh0BK4K61gM0zh/nxwWNxaOH1AzCwoh4oVEyCtlwQpn
 5okDjfawHUfU9T/VlzL0TwlXP/Rwi6afvaJ+vt7N6wqgrJ51Q1cXf00kTqKWD2mQ
 vsRDjIMn4YYSjH3X27i6xif3jFQ4z1fjto0N4PjE5IgNe+VUtEiQsT2h5NnxA/mt
 MWNx9EvUYXLrOkVot91FPJqYTNjGpKr4EBxRdFW1MW3sJX4rCGDVpW7gJF3fG44+
 iEFHaZpJ8XlyyT7gD6BffkePv5iicbJtmgk++Dx1Z0ekkvvjA4RkQHFMTwWh2a+Y
 s+1qS8rfhmyWf8IdUVCAxbrOW9nXRNFEaRh2ooqzEI/ycXtogzmoAI8g/xnZ7VHg
 H2sSOyO4HfsH/nHHkPvaIQL+8pt8EWYIuGUgMhNFuRIYpYvcoDNNpVuhKrvk9Qp+
 SsmmIw/ov5M9ucEE24fTf3jaSwin+fORUOMye31yF6tmxJOuTFnMyil8kOPovlu3
 IdxLOmCDxYXZCYNZslc65w==
 =1Po/
 -----END PGP SIGNATURE-----

Merge tag 'for-net-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

 - SCO: Fix transparent voice setting
 - ISO: Locking fixes
 - hci_core: Fix sleeping function called from invalid context
 - hci_event: Fix using rcu_read_(un)lock while iterating
 - btmtk: avoid UAF in btmtk_process_coredump

* tag 'for-net-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
  Bluetooth: iso: Fix circular lock in iso_conn_big_sync
  Bluetooth: iso: Fix circular lock in iso_listen_bis
  Bluetooth: SCO: Add support for 16 bits transparent voice setting
  Bluetooth: iso: Fix recursive locking warning
  Bluetooth: iso: Always release hdev at the end of iso_listen_bis
  Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
  Bluetooth: hci_core: Fix sleeping function called from invalid context
  Bluetooth: Improve setsockopt() handling of malformed user input
====================

Link: https://patch.msgid.link/20241212142806.2046274-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12 07:10:40 -08:00
Robert Hodaszi
36ff681d22 net: dsa: tag_ocelot_8021q: fix broken reception
The blamed commit changed the dsa_8021q_rcv() calling convention to
accept pre-populated source_port and switch_id arguments. If those are
not available, as in the case of tag_ocelot_8021q, the arguments must be
pre-initialized with -1.

Due to the bug of passing uninitialized arguments in tag_ocelot_8021q,
dsa_8021q_rcv() does not detect that it needs to populate the
source_port and switch_id, and this makes dsa_conduit_find_user() fail,
which leads to packet loss on reception.

Fixes: dcfe767378 ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()")
Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20241211144741.1415758-1-robert.hodaszi@digi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12 07:10:27 -08:00
Jesse Van Gavere
5af53577c6 net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries
Commit 8d7ae22ae9 ("net: dsa: microchip: KSZ9477 register regmap
alignment to 32 bit boundaries") fixed an issue whereby regmap_reg_range
did not allow writes as 32 bit words to KSZ9477 PHY registers, this fix
for KSZ9896 is adapted from there as the same errata is present in
KSZ9896C as "Module 5: Certain PHY registers must be written as pairs
instead of singly" the explanation below is likewise taken from this
commit.

The commit provided code
to apply "Module 6: Certain PHY registers must be written as pairs instead
of singly" errata for KSZ9477 as this chip for certain PHY registers
(0xN120 to 0xN13F, N=1,2,3,4,5) must be accessed as 32 bit words instead
of 16 or 8 bit access.
Otherwise, adjacent registers (no matter if reserved or not) are
overwritten with 0x0.

Without this patch some registers (e.g. 0x113c or 0x1134) required for 32
bit access are out of valid regmap ranges.

As a result, following error is observed and KSZ9896 is not properly
configured:

ksz-switch spi1.0: can't rmw 32bit reg 0x113c: -EIO
ksz-switch spi1.0: can't rmw 32bit reg 0x1134: -EIO
ksz-switch spi1.0 lan1 (uninitialized): failed to connect to PHY: -EIO
ksz-switch spi1.0 lan1 (uninitialized): error -5 setting up PHY for tree 0, switch 0, port 0

The solution is to modify regmap_reg_range to allow accesses with 4 bytes
boundaries.

Fixes: 5c844d57aa ("net: dsa: microchip: fix writes to phy registers >= 0x10")
Signed-off-by: Jesse Van Gavere <jesse.vangavere@scioteq.com>
Link: https://patch.msgid.link/20241211092932.26881-1-jesse.vangavere@scioteq.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12 07:10:11 -08:00
Nikita Yushchenko
fb9e6039c3 net: renesas: rswitch: fix initial MPIC register setting
MPIC.PIS must be set per phy interface type.
MPIC.LSC must be set per speed.

Do that strictly per datasheet, instead of hardcoding MPIC.PIS to GMII.

Fixes: 3590918b5d ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241211053012.368914-1-nikita.yoush@cogentembedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12 15:32:22 +01:00
Thadeu Lima de Souza Cascardo
b548f5e945 Bluetooth: btmtk: avoid UAF in btmtk_process_coredump
hci_devcd_append may lead to the release of the skb, so it cannot be
accessed once it is called.

==================================================================
BUG: KASAN: slab-use-after-free in btmtk_process_coredump+0x2a7/0x2d0 [btmtk]
Read of size 4 at addr ffff888033cfabb0 by task kworker/0:3/82

CPU: 0 PID: 82 Comm: kworker/0:3 Tainted: G     U             6.6.40-lockdep-03464-g1d8b4eb3060e #1 b0b3c1cc0c842735643fb411799d97921d1f688c
Hardware name: Google Yaviks_Ufs/Yaviks_Ufs, BIOS Google_Yaviks_Ufs.15217.552.0 05/07/2024
Workqueue: events btusb_rx_work [btusb]
Call Trace:
 <TASK>
 dump_stack_lvl+0xfd/0x150
 print_report+0x131/0x780
 kasan_report+0x177/0x1c0
 btmtk_process_coredump+0x2a7/0x2d0 [btmtk 03edd567dd71a65958807c95a65db31d433e1d01]
 btusb_recv_acl_mtk+0x11c/0x1a0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
 btusb_rx_work+0x9e/0xe0 [btusb 675430d1e87c4f24d0c1f80efe600757a0f32bec]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Allocated by task 82:
 stack_trace_save+0xdc/0x190
 kasan_set_track+0x4e/0x80
 __kasan_slab_alloc+0x4e/0x60
 kmem_cache_alloc+0x19f/0x360
 skb_clone+0x132/0xf70
 btusb_recv_acl_mtk+0x104/0x1a0 [btusb]
 btusb_rx_work+0x9e/0xe0 [btusb]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30

Freed by task 1733:
 stack_trace_save+0xdc/0x190
 kasan_set_track+0x4e/0x80
 kasan_save_free_info+0x28/0xb0
 ____kasan_slab_free+0xfd/0x170
 kmem_cache_free+0x183/0x3f0
 hci_devcd_rx+0x91a/0x2060 [bluetooth]
 worker_thread+0xe44/0x2cc0
 kthread+0x2ff/0x3a0
 ret_from_fork+0x51/0x80
 ret_from_fork_asm+0x1b/0x30

The buggy address belongs to the object at ffff888033cfab40
 which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 112 bytes inside of
 freed 232-byte region [ffff888033cfab40, ffff888033cfac28)

The buggy address belongs to the physical page:
page:00000000a174ba93 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x33cfa
head:00000000a174ba93 order:1 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0x4000000000000840(slab|head|zone=1)
page_type: 0xffffffff()
raw: 4000000000000840 ffff888100848a00 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080190019 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888033cfaa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
 ffff888033cfab00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
>ffff888033cfab80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
 ffff888033cfac00: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
 ffff888033cfac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Check if we need to call hci_devcd_complete before calling
hci_devcd_append. That requires that we check data->cd_info.cnt >=
MTK_COREDUMP_NUM instead of data->cd_info.cnt > MTK_COREDUMP_NUM, as we
increment data->cd_info.cnt only once the call to hci_devcd_append
succeeds.

Fixes: 0b70151328 ("Bluetooth: btusb: mediatek: add MediaTek devcoredump support")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:25:28 -05:00
Iulia Tanasescu
7a17308c17 Bluetooth: iso: Fix circular lock in iso_conn_big_sync
This fixes the circular locking dependency warning below, by reworking
iso_sock_recvmsg, to ensure that the socket lock is always released
before calling a function that locks hdev.

[  561.670344] ======================================================
[  561.670346] WARNING: possible circular locking dependency detected
[  561.670349] 6.12.0-rc6+ #26 Not tainted
[  561.670351] ------------------------------------------------------
[  561.670353] iso-tester/3289 is trying to acquire lock:
[  561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3},
               at: iso_conn_big_sync+0x73/0x260 [bluetooth]
[  561.670405]
               but task is already holding lock:
[  561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0},
               at: iso_sock_recvmsg+0xbf/0x500 [bluetooth]
[  561.670450]
               which lock already depends on the new lock.

[  561.670452]
               the existing dependency chain (in reverse order) is:
[  561.670453]
               -> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}:
[  561.670458]        lock_acquire+0x7c/0xc0
[  561.670463]        lock_sock_nested+0x3b/0xf0
[  561.670467]        bt_accept_dequeue+0x1a5/0x4d0 [bluetooth]
[  561.670510]        iso_sock_accept+0x271/0x830 [bluetooth]
[  561.670547]        do_accept+0x3dd/0x610
[  561.670550]        __sys_accept4+0xd8/0x170
[  561.670553]        __x64_sys_accept+0x74/0xc0
[  561.670556]        x64_sys_call+0x17d6/0x25f0
[  561.670559]        do_syscall_64+0x87/0x150
[  561.670563]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  561.670567]
               -> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[  561.670571]        lock_acquire+0x7c/0xc0
[  561.670574]        lock_sock_nested+0x3b/0xf0
[  561.670577]        iso_sock_listen+0x2de/0xf30 [bluetooth]
[  561.670617]        __sys_listen_socket+0xef/0x130
[  561.670620]        __x64_sys_listen+0xe1/0x190
[  561.670623]        x64_sys_call+0x2517/0x25f0
[  561.670626]        do_syscall_64+0x87/0x150
[  561.670629]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  561.670632]
               -> #0 (&hdev->lock){+.+.}-{3:3}:
[  561.670636]        __lock_acquire+0x32ad/0x6ab0
[  561.670639]        lock_acquire.part.0+0x118/0x360
[  561.670642]        lock_acquire+0x7c/0xc0
[  561.670644]        __mutex_lock+0x18d/0x12f0
[  561.670647]        mutex_lock_nested+0x1b/0x30
[  561.670651]        iso_conn_big_sync+0x73/0x260 [bluetooth]
[  561.670687]        iso_sock_recvmsg+0x3e9/0x500 [bluetooth]
[  561.670722]        sock_recvmsg+0x1d5/0x240
[  561.670725]        sock_read_iter+0x27d/0x470
[  561.670727]        vfs_read+0x9a0/0xd30
[  561.670731]        ksys_read+0x1a8/0x250
[  561.670733]        __x64_sys_read+0x72/0xc0
[  561.670736]        x64_sys_call+0x1b12/0x25f0
[  561.670738]        do_syscall_64+0x87/0x150
[  561.670741]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  561.670744]
               other info that might help us debug this:

[  561.670745] Chain exists of:
&hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH

[  561.670751]  Possible unsafe locking scenario:

[  561.670753]        CPU0                    CPU1
[  561.670754]        ----                    ----
[  561.670756]   lock(sk_lock-AF_BLUETOOTH);
[  561.670758]                                lock(sk_lock
                                              AF_BLUETOOTH-BTPROTO_ISO);
[  561.670761]                                lock(sk_lock-AF_BLUETOOTH);
[  561.670764]   lock(&hdev->lock);
[  561.670767]
                *** DEADLOCK ***

Fixes: 07a9342b94 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:25:13 -05:00
Iulia Tanasescu
168e28305b Bluetooth: iso: Fix circular lock in iso_listen_bis
This fixes the circular locking dependency warning below, by
releasing the socket lock before enterning iso_listen_bis, to
avoid any potential deadlock with hdev lock.

[   75.307983] ======================================================
[   75.307984] WARNING: possible circular locking dependency detected
[   75.307985] 6.12.0-rc6+ #22 Not tainted
[   75.307987] ------------------------------------------------------
[   75.307987] kworker/u81:2/2623 is trying to acquire lock:
[   75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO)
               at: iso_connect_cfm+0x253/0x840 [bluetooth]
[   75.308021]
               but task is already holding lock:
[   75.308022] ffff8fdd61a10078 (&hdev->lock)
               at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[   75.308053]
               which lock already depends on the new lock.

[   75.308054]
               the existing dependency chain (in reverse order) is:
[   75.308055]
               -> #1 (&hdev->lock){+.+.}-{3:3}:
[   75.308057]        __mutex_lock+0xad/0xc50
[   75.308061]        mutex_lock_nested+0x1b/0x30
[   75.308063]        iso_sock_listen+0x143/0x5c0 [bluetooth]
[   75.308085]        __sys_listen_socket+0x49/0x60
[   75.308088]        __x64_sys_listen+0x4c/0x90
[   75.308090]        x64_sys_call+0x2517/0x25f0
[   75.308092]        do_syscall_64+0x87/0x150
[   75.308095]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   75.308098]
               -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}:
[   75.308100]        __lock_acquire+0x155e/0x25f0
[   75.308103]        lock_acquire+0xc9/0x300
[   75.308105]        lock_sock_nested+0x32/0x90
[   75.308107]        iso_connect_cfm+0x253/0x840 [bluetooth]
[   75.308128]        hci_connect_cfm+0x6c/0x190 [bluetooth]
[   75.308155]        hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth]
[   75.308180]        hci_le_meta_evt+0xe7/0x200 [bluetooth]
[   75.308206]        hci_event_packet+0x21f/0x5c0 [bluetooth]
[   75.308230]        hci_rx_work+0x3ae/0xb10 [bluetooth]
[   75.308254]        process_one_work+0x212/0x740
[   75.308256]        worker_thread+0x1bd/0x3a0
[   75.308258]        kthread+0xe4/0x120
[   75.308259]        ret_from_fork+0x44/0x70
[   75.308261]        ret_from_fork_asm+0x1a/0x30
[   75.308263]
               other info that might help us debug this:

[   75.308264]  Possible unsafe locking scenario:

[   75.308264]        CPU0                CPU1
[   75.308265]        ----                ----
[   75.308265]   lock(&hdev->lock);
[   75.308267]                            lock(sk_lock-
                                                AF_BLUETOOTH-BTPROTO_ISO);
[   75.308268]                            lock(&hdev->lock);
[   75.308269]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO);
[   75.308270]
                *** DEADLOCK ***

[   75.308271] 4 locks held by kworker/u81:2/2623:
[   75.308272]  #0: ffff8fdd66e52148 ((wq_completion)hci0#2){+.+.}-{0:0},
                at: process_one_work+0x443/0x740
[   75.308276]  #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)),
                at: process_one_work+0x1ce/0x740
[   75.308280]  #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3}
                at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth]
[   75.308304]  #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2},
                at: hci_connect_cfm+0x29/0x190 [bluetooth]

Fixes: 02171da6e8 ("Bluetooth: ISO: Add hcon for listening bis sk")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:24:57 -05:00
Frédéric Danis
29a651451e Bluetooth: SCO: Add support for 16 bits transparent voice setting
The voice setting is used by sco_connect() or sco_conn_defer_accept()
after being set by sco_sock_setsockopt().

The PCM part of the voice setting is used for offload mode through PCM
chipset port.
This commits add support for mSBC 16 bits offloading, i.e. audio data
not transported over HCI.

The BCM4349B1 supports 16 bits transparent data on its I2S port.
If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this
gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives
correct audio.
This has been tested with connection to iPhone 14 and Samsung S24.

Fixes: ad10b1a487 ("Bluetooth: Add Bluetooth socket voice option")
Signed-off-by: Frédéric Danis <frederic.danis@collabora.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:24:35 -05:00
Iulia Tanasescu
9bde7c3b3a Bluetooth: iso: Fix recursive locking warning
This updates iso_sock_accept to use nested locking for the parent
socket, to avoid lockdep warnings caused because the parent and
child sockets are locked by the same thread:

[   41.585683] ============================================
[   41.585688] WARNING: possible recursive locking detected
[   41.585694] 6.12.0-rc6+ #22 Not tainted
[   41.585701] --------------------------------------------
[   41.585705] iso-tester/3139 is trying to acquire lock:
[   41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH)
               at: bt_accept_dequeue+0xe3/0x280 [bluetooth]
[   41.585905]
               but task is already holding lock:
[   41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
               at: iso_sock_accept+0x61/0x2d0 [bluetooth]
[   41.586064]
               other info that might help us debug this:
[   41.586069]  Possible unsafe locking scenario:

[   41.586072]        CPU0
[   41.586076]        ----
[   41.586079]   lock(sk_lock-AF_BLUETOOTH);
[   41.586086]   lock(sk_lock-AF_BLUETOOTH);
[   41.586093]
                *** DEADLOCK ***

[   41.586097]  May be due to missing lock nesting notation

[   41.586101] 1 lock held by iso-tester/3139:
[   41.586107]  #0: ffff988b29533a58 (sk_lock-AF_BLUETOOTH)
                at: iso_sock_accept+0x61/0x2d0 [bluetooth]

Fixes: ccf74f2390 ("Bluetooth: Add BTPROTO_ISO socket type")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:24:20 -05:00
Iulia Tanasescu
9c76fff747 Bluetooth: iso: Always release hdev at the end of iso_listen_bis
Since hci_get_route holds the device before returning, the hdev
should be released with hci_dev_put at the end of iso_listen_bis
even if the function returns with an error.

Fixes: 02171da6e8 ("Bluetooth: ISO: Add hcon for listening bis sk")
Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:24:05 -05:00
Luiz Augusto von Dentz
581dd2dc16 Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is
not safe since for the most part entries fetched this way shall be
treated as rcu_dereference:

	Note that the value returned by rcu_dereference() is valid
	only within the enclosing RCU read-side critical section [1]_.
	For example, the following is **not** legal::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		rcu_read_unlock();
		x = p->address;	/* BUG!!! */
		rcu_read_lock();
		y = p->data;	/* BUG!!! */
		rcu_read_unlock();

Fixes: a0bfde167b ("Bluetooth: ISO: Add support for connecting multiple BISes")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:23:49 -05:00
Luiz Augusto von Dentz
4d94f05558 Bluetooth: hci_core: Fix sleeping function called from invalid context
This reworks hci_cb_list to not use mutex hci_cb_list_lock to avoid bugs
like the bellow:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5070, name: kworker/u9:2
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
4 locks held by kworker/u9:2/5070:
 #0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
 #0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x1770 kernel/workqueue.c:3335
 #1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
 #1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x1770 kernel/workqueue.c:3335
 #2: ffff8880665d0078 (&hdev->lock){+.+.}-{3:3}, at: hci_le_create_big_complete_evt+0xcf/0xae0 net/bluetooth/hci_event.c:6914
 #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
 #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
 #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hci_le_create_big_complete_evt+0xdb/0xae0 net/bluetooth/hci_event.c:6915
CPU: 0 PID: 5070 Comm: kworker/u9:2 Not tainted 6.8.0-syzkaller-08073-g480e035fc4c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: hci0 hci_rx_work
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 __might_resched+0x5d4/0x780 kernel/sched/core.c:10187
 __mutex_lock_common kernel/locking/mutex.c:585 [inline]
 __mutex_lock+0xc1/0xd70 kernel/locking/mutex.c:752
 hci_connect_cfm include/net/bluetooth/hci_core.h:2004 [inline]
 hci_le_create_big_complete_evt+0x3d9/0xae0 net/bluetooth/hci_event.c:6939
 hci_event_func net/bluetooth/hci_event.c:7514 [inline]
 hci_event_packet+0xa53/0x1540 net/bluetooth/hci_event.c:7569
 hci_rx_work+0x3e8/0xca0 net/bluetooth/hci_core.c:4171
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
 kthread+0x2f0/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
 </TASK>

Reported-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com
Tested-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2fb0835e0c9cefc34614
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12 09:23:28 -05:00
Paolo Abeni
3d64c3d3c6 netfilter pull request 24-12-11
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEjF9xRqF1emXiQiqU1w0aZmrPKyEFAmdaFt8ACgkQ1w0aZmrP
 KyGzow//fio7wR3vGa+l822MpbXxtyPindBNdNbdIrc1UajcyWaja1fG0vrvnLC5
 Ze6OUI0Q4OumjmKHYRbWGxTufCURFA4IXJ92mnftu0d/7AFIoRms3F3rTSXzFQbJ
 Osgt6yUAUlw3naG0TSvUjejWBI8L/ISRyZlz/olJSubww54+4h3VcsrhAFvV8RWr
 cHoRwTnrIG2q0ACk77AX7pmP13kO8rvFG8Qv2lcGxhDCJUglQG+KYVleY0oddkEO
 9O8fNi9WocUCjkFGIpMUBxSSNIXbX1GY0qj52aSGfowgqgQNVHykHxSOiow84Qbg
 ea5uUBmtzswNzbSsRkNPX/rsk1FIVzmWDT3Gyt/rrED2/8gUuH9RvTWgjVHhB75G
 UpZYmnsFLTJXDBk3Y8q5RKR0+nntmVtsIqL8UoSJR1Ck8Kz3Xp1qSJqPJMGjfYkm
 3rstS2T2SPh9As0pqq0/ACEgCz+Gzztuoo4L+JZJRo3clu/lEtbJVnQy/6MkckcA
 Y5U6IHWEctB+xMvRirPcx6cJa/7dX8sCg4UdRVberdGKq+XUXmQpbPo1w89gVA5l
 BQq3eeX9lxEDNEjxgpVapm5BkDsA23sbCrjM26NryAJSkGh2MwXxp4zqWz4WIQGW
 R6FVUi4JgR2+LGkchHSZUULpVjNWyq9U8RWvv4ge00tSj5XEHmw=
 =JEfQ
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Fix bogus test reports in rpath.sh selftest by adding permanent
   neighbor entries, from Phil Sutter.

2) Lockdep reports possible ABBA deadlock in xt_IDLETIMER, fix it by
   removing sysfs out of the mutex section, also from Phil Sutter.

3) It is illegal to release basechain via RCU callback, for several
   reasons. Keep it simple and safe by calling synchronize_rcu() instead.
   This is a partially reverting a botched recent attempt of me to fix
   this basechain release path on netdevice removal.
   From Florian Westphal.

netfilter pull request 24-12-11

* tag 'nf-24-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_tables: do not defer rule destruction via call_rcu
  netfilter: IDLETIMER: Fix for possible ABBA deadlock
  selftests: netfilter: Stabilize rpath.sh
====================

Link: https://patch.msgid.link/20241211230130.176937-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12 13:11:38 +01:00
Daniel Borkmann
9871284458 team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Similar to bonding driver, add NETIF_F_GSO_ENCAP_ALL to TEAM_VLAN_FEATURES
in order to support slave devices which propagate NETIF_F_GSO_UDP_TUNNEL &
NETIF_F_GSO_UDP_TUNNEL_CSUM as vlan_features.

Fixes: 3625920b62 ("teaming: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-5-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12 11:59:18 +01:00
Daniel Borkmann
396699ac2c team: Fix initial vlan_feature set in __team_compute_features
Similarly as with bonding, fix the calculation of vlan_features to reuse
netdev_base_features() in order derive the set in the same way as
ndo_fix_features before iterating through the slave devices to refine the
feature set.

Fixes: 3625920b62 ("teaming: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-4-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12 11:59:18 +01:00
Daniel Borkmann
77b11c8bf3 bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL
Drivers like mlx5 expose NIC's vlan_features such as
NETIF_F_GSO_UDP_TUNNEL & NETIF_F_GSO_UDP_TUNNEL_CSUM which are
later not propagated when the underlying devices are bonded and
a vlan device created on top of the bond.

Right now, the more cumbersome workaround for this is to create
the vlan on top of the mlx5 and then enslave the vlan devices
to a bond.

To fix this, add NETIF_F_GSO_ENCAP_ALL to BOND_VLAN_FEATURES
such that bond_compute_features() can probe and propagate the
vlan_features from the slave devices up to the vlan device.

Given the following bond:

  # ethtool -i enp2s0f{0,1}np{0,1}
  driver: mlx5_core
  [...]

  # ethtool -k enp2s0f0np0 | grep udp
  tx-udp_tnl-segmentation: on
  tx-udp_tnl-csum-segmentation: on
  tx-udp-segmentation: on
  rx-udp_tunnel-port-offload: on
  rx-udp-gro-forwarding: off

  # ethtool -k enp2s0f1np1 | grep udp
  tx-udp_tnl-segmentation: on
  tx-udp_tnl-csum-segmentation: on
  tx-udp-segmentation: on
  rx-udp_tunnel-port-offload: on
  rx-udp-gro-forwarding: off

  # ethtool -k bond0 | grep udp
  tx-udp_tnl-segmentation: on
  tx-udp_tnl-csum-segmentation: on
  tx-udp-segmentation: on
  rx-udp_tunnel-port-offload: off [fixed]
  rx-udp-gro-forwarding: off

Before:

  # ethtool -k bond0.100 | grep udp
  tx-udp_tnl-segmentation: off [requested on]
  tx-udp_tnl-csum-segmentation: off [requested on]
  tx-udp-segmentation: on
  rx-udp_tunnel-port-offload: off [fixed]
  rx-udp-gro-forwarding: off

After:

  # ethtool -k bond0.100 | grep udp
  tx-udp_tnl-segmentation: on
  tx-udp_tnl-csum-segmentation: on
  tx-udp-segmentation: on
  rx-udp_tunnel-port-offload: off [fixed]
  rx-udp-gro-forwarding: off

Various users have run into this reporting performance issues when
configuring Cilium in vxlan tunneling mode and having the combination
of bond & vlan for the core devices connecting the Kubernetes cluster
to the outside world.

Fixes: a9b3ace44c ("bonding: fix vlan_features computing")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-3-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12 11:59:18 +01:00
Daniel Borkmann
d064ea7fe2 bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features
If a bonding device has slave devices, then the current logic to derive
the feature set for the master bond device is limited in that flags which
are fully supported by the underlying slave devices cannot be propagated
up to vlan devices which sit on top of bond devices. Instead, these get
blindly masked out via current NETIF_F_ALL_FOR_ALL logic.

vlan_features and mpls_features should reuse netdev_base_features() in
order derive the set in the same way as ndo_fix_features before iterating
through the slave devices to refine the feature set.

Fixes: a9b3ace44c ("bonding: fix vlan_features computing")
Fixes: 2e770b507c ("net: bonding: Inherit MPLS features from slave devices")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Cc: Ido Schimmel <idosch@idosch.org>
Cc: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20241210141245.327886-2-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12 11:59:18 +01:00