The aim of this patch is to reduce the fragmentation of block groups
under certain unhappy workloads. It is particularly effective when the
size of extents correlates with their lifetime, which is something we
have observed causing fragmentation in the fleet at Meta.
This patch categorizes extents into size classes:
- x < 128KiB: "small"
- 128KiB < x < 8MiB: "medium"
- x > 8MiB: "large"
and as much as possible reduces allocations of extents into block groups
that don't match the size class. This takes advantage of any (possible)
correlation between size and lifetime and also leaves behind predictable
re-usable gaps when extents are freed; small writes don't gum up bigger
holes.
Size classes are implemented in the following way:
- Mark each new block group with a size class of the first allocation
that goes into it.
- Add two new passes to ffe: "unset size class" and "wrong size class".
First, try only matching block groups, then try unset ones, then allow
allocation of new ones, and finally allow mismatched block groups.
- Filtering is done just by skipping inappropriate ones, there is no
special size class indexing.
Other solutions I considered were:
- A best fit allocator with an rb-tree. This worked well, as small
writes didn't leak big holes from large freed extents, but led to
regressions in ffe and write performance due to lock contention on
the rb-tree with every allocation possibly updating it in parallel.
Perhaps something clever could be done to do the updates in the
background while being "right enough".
- A fixed size "working set". This prevents freeing an extent
drastically changing where writes currently land, and seems like a
good option too. Doesn't take advantage of size in any way.
- The same size class idea, but implemented with xarray marks. This
turned out to be slower than looping the linked list and skipping
wrong block groups, and is also less flexible since we must have only
3 size classes (max #marks). With the current approach we can have as
many as we like.
Performance testing was done via: https://github.com/josefbacik/fsperf
Of particular relevance are the new fragmentation specific tests.
A brief summary of the testing results:
- Neutral results on existing tests. There are some minor regressions
and improvements here and there, but nothing that truly stands out as
notable.
- Improvement on new tests where size class and extent lifetime are
correlated. Fragmentation in these cases is completely eliminated
and write performance is generally a little better. There is also
significant improvement where extent sizes are just a bit larger than
the size class boundaries.
- Regression on one new tests: where the allocations are sized
intentionally a hair under the borders of the size classes. Results
are neutral on the test that intentionally attacks this new scheme by
mixing extent size and lifetime.
The full dump of the performance results can be found here:
https://bur.io/fsperf/size-class-2022-11-15.txt
(there are ANSI escape codes, so best to curl and view in terminal)
Here is a snippet from the full results for a new test which mixes
buffered writes appending to a long lived set of files and large short
lived fallocates:
bufferedappendvsfallocate results
metric baseline current stdev diff
======================================================================================
avg_commit_ms 31.13 29.20 2.67 -6.22%
bg_count 14 15.60 0 11.43%
commits 11.10 12.20 0.32 9.91%
elapsed 27.30 26.40 2.98 -3.30%
end_state_mount_ns 11122551.90 10635118.90 851143.04 -4.38%
end_state_umount_ns 1.36e+09 1.35e+09 12248056.65 -1.07%
find_free_extent_calls 116244.30 114354.30 964.56 -1.63%
find_free_extent_ns_max 599507.20 1047168.20 103337.08 74.67%
find_free_extent_ns_mean 3607.19 3672.11 101.20 1.80%
find_free_extent_ns_min 500 512 6.67 2.40%
find_free_extent_ns_p50 2848 2876 37.65 0.98%
find_free_extent_ns_p95 4916 5000 75.45 1.71%
find_free_extent_ns_p99 20734.49 20920.48 1670.93 0.90%
frag_pct_max 61.67 0 8.05 -100.00%
frag_pct_mean 43.59 0 6.10 -100.00%
frag_pct_min 25.91 0 16.60 -100.00%
frag_pct_p50 42.53 0 7.25 -100.00%
frag_pct_p95 61.67 0 8.05 -100.00%
frag_pct_p99 61.67 0 8.05 -100.00%
fragmented_bg_count 6.10 0 1.45 -100.00%
max_commit_ms 49.80 46 5.37 -7.63%
sys_cpu 2.59 2.62 0.29 1.39%
write_bw_bytes 1.62e+08 1.68e+08 17975843.50 3.23%
write_clat_ns_mean 57426.39 54475.95 2292.72 -5.14%
write_clat_ns_p50 46950.40 42905.60 2101.35 -8.62%
write_clat_ns_p99 148070.40 143769.60 2115.17 -2.90%
write_io_kbytes 4194304 4194304 0 0.00%
write_iops 2476.15 2556.10 274.29 3.23%
write_lat_ns_max 2101667.60 2251129.50 370556.59 7.11%
write_lat_ns_mean 59374.91 55682.00 2523.09 -6.22%
write_lat_ns_min 17353.10 16250 1646.08 -6.36%
There are some mixed improvements/regressions in most metrics along with
an elimination of fragmentation in this workload.
On the balance, the drastic 1->0 improvement in the happy cases seems
worth the mix of regressions and improvements we do observe.
Some considerations for future work:
- Experimenting with more size classes
- More hinting/search ordering work to approximate a best-fit allocator
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
find_free_extent is a complicated function. It consists (at least) of:
- a hint that jumps into the middle of a for loop macro
- a middle loop trying every raid level
- an outer loop ascending through ffe loop levels
- complicated logic for skipping some of those ffe loop levels
- multiple underlying in-bg allocators (zoned, cluster, no cluster)
Which is all to say that more tracing is helpful for debugging its
behavior. Add two new tracepoints: at the entrance to the block_groups
loop (hit for every raid level and every ffe_ctl loop) and at the point
we seriously consider a block_group for allocation. This way we can see
the whole path through the algorithm, including hints, multiple loops,
etc.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The allocator tracepoints currently have a pile of values from ffe_ctl.
In modifying the allocator and adding more tracepoints, I found myself
adding to the already long argument list of the tracepoints. It makes it
a lot simpler to just send in the ffe_ctl itself.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Given that wait is always set to 1, so remove the argument.
Last use of wait with 0 was in 0c304304fe ("Btrfs: remove
csum_bytes_left").
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We currently use 'ret' and 'err' to track the return value for
log_dir_items(), which is confusing and likely the cause for previous
bugs where log_dir_items() did not return an error when it should, fixed
in previous patches.
So change this and use only a single variable, 'ret', to track the return
value. This is simpler and makes it similar to most of the existing code.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently we use the value 1 for BTRFS_LOG_FORCE_COMMIT, but that value
has a few inconveniences:
1) If it's ever used by btrfs_log_inode(), or any function down the call
chain, we have to remember to btrfs_set_log_full_commit(), which is
repetitive and has a chance to be forgotten in future use cases.
btrfs_log_inode_parent() only calls btrfs_set_log_full_commit() when
it gets a negative value from btrfs_log_inode();
2) Down the call chain of btrfs_log_inode(), we may have functions that
need to force a log commit, but can return either an error (negative
value), false (0) or true (1). So they are forced to return some
random negative to force a log commit - using BTRFS_LOG_FORCE_COMMIT
would make the intention more clear. Currently the only example is
flush_dir_items_batch().
So turn BTRFS_LOG_FORCE_COMMIT into a negative value. The chosen value
is -(MAX_ERRNO + 1), so that it does not overlap any errno value and makes
it easier to debug.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The header file linux/mm.h provides PAGE_ALIGN, PAGE_ALIGNED,
PAGE_ALIGN_DOWN macros. Use these macros to make code more
concise.
Signed-off-by: Yushan Zhou <katrinzhou@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When btrfs_get_chunk_map fails to allocate a new em the cleanup does not
need to be done so the goto target is out_err, which is consistent with
current coding style.
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
We had a recent bug that would have been caught by a newer compiler with
-Wmaybe-uninitialized and would have saved us a month of failing tests
that I didn't have time to investigate.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
With -Wmaybe-uninitialized compiler complains about ret being possibly
uninitialized, which isn't possible as the WQ_ constants are set only
from our code, however we can handle the default case and get rid of the
warning.
The value is set to BLK_STS_IOERR so it does not issue any IO and could
be potentially detected, but this is basically a "cannot happen" error.
To catch any problems during development use the assert.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ set the error in default: ]
Signed-off-by: David Sterba <dsterba@suse.com>
Fix an uninitialized warning we get with -Wmaybe-uninitialized where it
thought zno may have been uninitialized, in both cases it depends on
zinfo->zone_cache but we know the value won't change between checks.
Reported-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/linux-btrfs/af6c527cbd8bdc782e50bd33996ee83acc3a16fb.1671221596.git.josef@toxicpanda.com/
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We only have 3 possible mirrors, and we have ASSERT()'s to make sure
we're not passing in an invalid super mirror into this function, so
technically this value isn't uninitialized. However
-Wmaybe-uninitialized will complain, so set it to U64_MAX so if we don't
have ASSERT()'s turned on it'll error out later on when it see's the
zone is beyond our maximum zones.
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We will pass in the parent and p pointer into our tree_search function
to avoid doing a second search when inserting a new extent state into
the tree. However because this is conditional upon passing in these
pointers the compiler seems to think these values can be uninitialized
if we're using -Wmaybe-uninitialized. Fix this by initializing these
values.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
reclaim isn't set in the alloc case, however we only care about
reclaim in the !alloc case. This isn't an actual problem, however
-Wmaybe-uninitialized will complain, so initialize reclaim to quiet the
compiler.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Anybody that calls get_inode_gen() can have an uninitialized gen if
there's an error. This isn't a big deal because all the users just exit
if they get an error, however it makes -Wmaybe-uninitialized complain,
so fix this up to always initialize the passed in gen, this quiets all
of the uninitialized warnings in send.c.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We can conditionally pass in a locked page, and then we'll use that page
range to skip marking errors as that will happen in another layer.
However this causes the compiler to complain because it doesn't
understand we only use these values when we have the page. Make the
compiler stop complaining by setting these values to 0.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While trying to sync messages.[ch] I ended up with this dependency on
messages.h in the rest of btrfs-progs code base because it's where
btrfs_abort_transaction() was now held. We want to keep messages.[ch]
limited to the kernel code, and the btrfs_abort_transaction() code
better fits in the transaction code and not in messages.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ move the __cold attributes ]
Signed-off-by: David Sterba <dsterba@suse.com>
Now that none of the functions called by btrfs_merge_delayed_refs() needs
a btrfs_trans_handle, directly pass in a btrfs_fs_info to
btrfs_merge_delayed_refs().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that drop_delayed_ref() doesn't need a btrfs_trans_handle, drop it
from insert_delayed_ref() as well.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that drop_delayed_ref() doesn't get the btrfs_trans_handle passed in
anymore, we can get rid of it in merge_ref() as well.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
drop_delayed_ref() doesn't use the btrfs_trans_handle it gets passed in,
so remove it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Both Rich Felker and Yoshinori Sato haven't done any work on arch/sh
for a while. As I have been maintaining Debian's sh4 port since 2014,
I am interested to keep the architecture alive.
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The TASK_COMM_LEN was converted from a macro into an enum so that BTF
would have access to it. But this unfortunately caused TASK_COMM_LEN to
display in the format fields of trace events, as they are created by the
TRACE_EVENT() macro and such, macros convert to their values, where as
enums do not.
To handle this, instead of using the field itself to be display, save the
value of the array size as another field in the trace_event_fields
structure, and use that instead. Not only does this fix the issue, but
also converts the other trace events that have this same problem (but were
not breaking tooling). With this change, the original work around
b3bc8547d3 ("tracing: Have TRACE_DEFINE_ENUM affect trace event types
as well") could be reverted (but that should be done in the merge window).
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY+lOqxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6quYPAQD+9j+RPIUm9Ms4XCIEOXkFI04yjsbd
rQSRcpYBWyP59AEAnZNYNwE11vDsKBGxPrOPgGYe4Pzfr5yOWY84mgaxgwo=
=iYsE
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix showing of TASK_COMM_LEN instead of its value
The TASK_COMM_LEN was converted from a macro into an enum so that BTF
would have access to it. But this unfortunately caused TASK_COMM_LEN
to display in the format fields of trace events, as they are created
by the TRACE_EVENT() macro and such, macros convert to their values,
where as enums do not.
To handle this, instead of using the field itself to be display, save
the value of the array size as another field in the trace_event_fields
structure, and use that instead.
Not only does this fix the issue, but also converts the other trace
events that have this same problem (but were not breaking tooling).
With this change, the original work around b3bc8547d3 ("tracing:
Have TRACE_DEFINE_ENUM affect trace event types as well") could be
reverted (but that should be done in the merge window)"
* tag 'trace-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix TASK_COMM_LEN in trace event format file
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmPo41YACgkQxWXV+ddt
WDsPXA/8DPCp1PEvmkJ998wBCgSuoVvG9b4l1HOI0aFWC/giJWYsTdBF/+rFP/83
+UFBmxDsbG8tMoq73Dw8XxTvmYwRUyCdtn/AmKkGpu/l9KF4fnM+RTIh94e4DaH7
O1R5zPVOX34ScgL/bR6Hmcrw8a7q6yUmW9xORR40AAbYOccUld4nvUZOI+hVUbtN
84pphG+U4KowtX2J4fqLWALGU/2hDP9Aiq3aKOdupoiRYJacx3FoMP4aaEblJlMk
ViLJYBXrJ+6v71frjT4LgSdDd7+l6QEaHHlQwIxMrf3r7AXUkMerwoiOhasMRXTB
WnZjC8XeS9yogY6Ls5/gIEEWB7buz6TFJwm3rwfXMM+0OQ1g0RFvjXQPD8sOLazS
X/5ToML8SZYpfkmIMnP+hBnmAMFKpjC06o40cN5/96xkqqMAwL7ws+XIlso/Hx+l
Lu01cgnDLluRflWtVwMLmrhOGLStjbiDJKmG4zKl/WsyqGdodjIUyCOjhB0Wy0CN
RMrkvOUwngTfAdWQYTHDdxkTdn1+b/nB+N9BvLbD8Dt+Q5H7loGR+0mS5xsRNg4Q
jDY0yLDtR6bDxvcp4L2Vz1ezn+dSo8XAR9zqd4pT+7mZ6tLsf0R5F3iedAZkaqQC
1uVkjiHyi1Gq/6iKRwf72rQMNKdDmAgM+sDx0uQK5JyG8ZGqgLA=
=KGNk
-----END PGP SIGNATURE-----
Merge tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- one more fix for a tree-log 'write time corruption' report, update
the last dir index directly and don't keep in the log context
- do VFS-level inode lock around FIEMAP to prevent a deadlock with
concurrent fsync, the extent-level lock is not sufficient
- don't cache a single-device filesystem device to avoid cases when a
loop device is reformatted and the entry gets stale
* tag 'for-6.2-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: free device in btrfs_close_devices for a single device filesystem
btrfs: lock the inode in shared mode before starting fiemap
btrfs: simplify update of last_dir_index_offset when logging a directory
Here are 2 small USB driver fixes that resolve some reported regressions
and one new device quirk. Specifically these are:
- new quirk for Alcor Link AK9563 smartcard reader
- revert of u_ether gadget change in 6.2-rc1 that caused problems
- typec pin probe fix
All of these have been in linux-next with no reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY+jGFQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylYkQCgybIKsO7j9Mfi5nTsZktsWJRexu8AoMGuckH6
zko7UCEUrN5mk4xy5w9p
=1LuO
-----END PGP SIGNATURE-----
Merge tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are 2 small USB driver fixes that resolve some reported
regressions and one new device quirk. Specifically these are:
- new quirk for Alcor Link AK9563 smartcard reader
- revert of u_ether gadget change in 6.2-rc1 that caused problems
- typec pin probe fix
All of these have been in linux-next with no reported problems"
* tag 'usb-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: core: add quirk for Alcor Link AK9563 smartcard reader
usb: typec: altmodes/displayport: Fix probe pin assign check
Revert "usb: gadget: u_ether: Do not make UDC parent of the net device"
A fix from Darren to widen the SMBIOS match for detecting Ampere Altra
machines with problematic firmware. In the mean time, we are working on
a more precise check, but this is still work in progress.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPo134ACgkQw08iOZLZ
jyT7Ywv/YrEUEm6dlWDSCkhWdli7B6pcwWhcvbsPcDLZWSDuo4vIJhtav1eeWkjs
E2YDewXRSwewDlHeQPX+v25loVNKqbeBUDK6kX/DkNsk+jRVVM4Xt9myJA9XO//v
YDO7Srbkbk/GlBkZFNUkVYPaEy5aKVO7l/hQZy+GTYhuA/UXZVtlZrqA0EJNOnT0
xR7s+yXUNd0gBsgvTypnRACviL1qgZkY/yso51Gv/oXxzsVqm8K1XGsRVZblHwHK
YfhgytI/kj6mZ4I6WaOYiCt5NTq+GT7g8lMUmHISHNXxl9qzvaZ51jV2Cxf/9Bck
1RyfsIh3JoLHBlwCrfKRqIooitRENXWlIj+8PxZYG2/ONov7MkqEork7mSb1ITJw
0uqb0tClIZE23C+fdHI7fctbNrh+CQLr1RjSz7iNX+HUWsXJRag6bDrjFXzwqQnx
tLur+4QbpC8KbpwDoEQu74wveacJ6kn4r0KeKRWTp7IRsdA7NH+wQl6IJkMKr49A
41UnT1x8
=g+5h
-----END PGP SIGNATURE-----
Merge tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fix from Ard Biesheuvel:
"A fix from Darren to widen the SMBIOS match for detecting Ampere Altra
machines with problematic firmware. In the mean time, we are working
on a more precise check, but this is still work in progress"
* tag 'efi-fixes-for-v6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
arm64: efi: Force the use of SetVirtualAddressMap() on eMAG and Altra Max machines
When we upgraded our kernel, we started seeing some page corruption like
the following consistently:
BUG: Bad page state in process ganesha.nfsd pfn:1304ca
page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
flags: 0x17ffffc0000000()
raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P B O 5.10.158-1.nutanix.20221209.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
Call Trace:
dump_stack+0x74/0x96
bad_page.cold+0x63/0x94
check_new_page_bad+0x6d/0x80
rmqueue+0x46e/0x970
get_page_from_freelist+0xcb/0x3f0
? _cond_resched+0x19/0x40
__alloc_pages_nodemask+0x164/0x300
alloc_pages_current+0x87/0xf0
skb_page_frag_refill+0x84/0x110
...
Sometimes, it would also show up as corruption in the free list pointer
and cause crashes.
After bisecting the issue, we found the issue started from commit
e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages"):
if (put_page_testzero(page))
free_the_page(page, order);
else if (!PageHead(page))
while (order-- > 0)
free_the_page(page + (1 << order), order);
So the problem is the check PageHead is racy because at this point we
already dropped our reference to the page. So even if we came in with
compound page, the page can already be freed and PageHead can return
false and we will end up freeing all the tail pages causing double free.
Fixes: e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages")
Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After commit 3087c61ed2 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN"),
the content of the format file under
/sys/kernel/tracing/events/task/task_newtask was changed from
field:char comm[16]; offset:12; size:16; signed:0;
to
field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0;
John reported that this change breaks older versions of perfetto.
Then Mathieu pointed out that this behavioral change was caused by the
use of __stringify(_len), which happens to work on macros, but not on enum
labels. And he also gave the suggestion on how to fix it:
:One possible solution to make this more robust would be to extend
:struct trace_event_fields with one more field that indicates the length
:of an array as an actual integer, without storing it in its stringified
:form in the type, and do the formatting in f_show where it belongs.
The result as follows after this change,
$ cat /sys/kernel/tracing/events/task/task_newtask/format
field:char comm[16]; offset:12; size:16; signed:0;
Link: https://lore.kernel.org/lkml/Y+QaZtz55LIirsUO@google.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230210155921.4610-1-laoar.shao@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230212151303.12353-1-laoar.shao@gmail.com
Cc: stable@vger.kernel.org
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kajetan Puchalski <kajetan.puchalski@arm.com>
CC: Qais Yousef <qyousef@layalina.io>
Fixes: 3087c61ed2 ("tools/testing/selftests/bpf: replace open-coded 16 with TASK_COMM_LEN")
Reported-by: John Stultz <jstultz@google.com>
Debugged-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
A couple of hopefully final fixes for spi, one driver specific fix for
an issue with very large transfers and a fix for an issue with the
locking fixes in spidev merged earlier this release cycle which was
missed.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmPmcGcACgkQJNaLcl1U
h9AR9gf+PPYoUik/vNzZgDZ9UBC3C8L8FK2utCJ9w3vezzc+4xY2o+Cstd+VlWjW
xyy+amw3RSSh7Itd7kYKLgWOBxrOD7oFq+scjVWUcdRI1vyz2AMvgyrmpyWvyVGI
MTM7LfHtoQSuIHmp4PvF4wegTI+rL0iWmWnbkX4UrsfCYVXaHss2ziHsq9YNvxc7
q2cKCDAzZFnaie7SFwc5sw16J331pV7B7GtS+OITpi+9Fmy4s0GoBZq4b8ZdOnLf
sk+nUrYkGngY1+FkMOsfYoLThFPDxf94mmEyHQm1b+hzn/oQddhRPDV+Yhzu658+
L6X7afIWQvhohnQ36+vDbI0nVx5G3A==
=9x2+
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple of hopefully final fixes for spi: one driver specific fix for
an issue with very large transfers and a fix for an issue with the
locking fixes in spidev merged earlier this release cycle which was
missed"
* tag 'spi-fix-v6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spidev: fix a recursive locking error
spi: dw: Fix wrong FIFO level setting for long xfers
upstream <asm/intel-family.h> header for drivers to use.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPnWH0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1ihFhAAuHAH9a9ylmIBV4gFmbAw7lKyIcWtPIBi
o6YcD3XEyeL3eXWHAdMhmPvOw5KO5cflgVAP7vHScTgeS3UosznoG2/khuapBbpX
Wz5gNuHNp6zOTWWFU5x43SWaSP6T9OJcNZuPBJjFN/HYoHiG7+13fVy30qC8i/Lw
Whib2ImwUMDHNkvw8/brE1cPg3Kko0SxzSb51HANm14s+Au1KHEd0+Dn2GtTTVbB
2AoxflCaRt8t3kJvdOrzgrm6GoyT+lwsWJch9n0EqqDCVi8uc83Pl1cW2/equ/xm
tLkTjl0kHjdRRbXP7Ry1nvhNSokw02JRQwZ01kx9eBdDxH3FXe5/r4+IsMvc0fcO
q3xrMk8OS9bEEl5iS0zYLfTSPmSP3BBcCro9CYoz2FP+OjsKTIZc0OI5X1gR8JoP
9y5PPhwx1U12e3411kAh2E1ezMa70BISvfK+fFidMMGQKPC4iBMAksEFqjj96SAo
1i10elkvzcLldS8rsHj6yHRnwadwP3ojUiiiofY1b9/+qxpXqYOaXdCLtYenYWpo
b6HKCQZJzlzy6tfbIERQDLI9jaOaUbVLegMxNIgK/x7NGa5dOLieKxapz2F+pRna
dvoEZUZHuV51u87SxkpF5dxtOGLcKphl+RkI7Hbe+wMs5jREU857U/AgZn+0jUmD
fv8nFJ+1g+w=
=7sB6
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix a kprobes bug, plus add a new Intel model number to the upstream
<asm/intel-family.h> header for drivers to use"
* tag 'x86-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Lunar Lake M
x86/kprobes: Fix 1 byte conditional jump target
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPnV1wRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gWxg/8D6jl/Wl/16LkcnLk8eRWGmF34u24lGPK
d8T9VN0fF0Pnz0vQVNyuhpdSARX1PvaOAyvx/gv0KE+OjeNYdXWTHavNEekpBeKA
0XFrw2xoa+M7kJE1LswARk23ynFtLZoFD05G3fNZ6NxK/uxRepZzxUyAmwYE9ibG
9gekg+IeTysaCtmIKHkCOgcLvSy41/JEJMtA3CHA3Bww/CdPd5JSc9ERqZKYCp3L
lVHNtmTZotZ4TA0Dzgx+OgF1JtoQqQyerPQVhkXjmq0MnTJLjKWnvesF2gBcFLHS
6rovr3eCaO2dm7KGsFlt8Ne6FYEd1us1ifK166xoJgRV+TFFpf2UoDZkEhiCOL63
5x3S35wOuKopsBT4IHK5j2LTfhT8KTFSOsZMN41EPhvYYY5/n1EOrHSvFQKmwEFO
jbVvAWHY56YGKH54qePULb+hSrAR1V+AO2ceghusg9BhT4IQ72aR0vkv4hbxd/Zh
mufUd5E3+vWLOVbYM7e3ZGFiC6DA/QVZkTvVxKIllE0bzJkGKI80ITeLbjAxFFmp
OCs+stGij+SwOxEWfK+I0qz6ae8mL/lgWUr7hhkAi8LXGA/t5q1jErCULZFiPr+6
vugrk2SeQZOEVvfUb0/U3GUdn01yGHak7sz7wJBsd++y8I8FR9q6fb7kawgMCo4I
ZydwCwXat2I=
=sfDB
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
"Fix an rtmutex missed-wakeup bug"
* tag 'locking-urgent-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Ensure that the top waiter is always woken up
- Fix a crash when shutting down regions in the presence of passthrough
decoders
- Fix region creation to understand passthrough decoders instead of the
narrower definition of passthrough ports
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY+b+6wAKCRDfioYZHlFs
Z7d4AQDiMrslPtG+izGEWWMn8a0B5P9MvVgNvMreEfTEug+9cgD+Pp6wJMsZOkVJ
4QX9nxdwnzCzM+l4ppBbW+dhZzBj6gE=
=sZBg
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"Two fixups for CXL (Compute Express Link) in presence of passthrough
decoders.
This primarily helps developers using the QEMU CXL emulation, but with
the impending arrival of CXL switches these types of topologies will
be of interest to end users.
- Fix a crash when shutting down regions in the presence of
passthrough decoders
- Fix region creation to understand passthrough decoders instead of
the narrower definition of passthrough ports"
* tag 'cxl-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Fix passthrough-decoder detection
cxl/region: Fix null pointer dereference for resetting decoder
- Resolve the conflict between KMSAN and NVDIMM with respect to
reserving pmem namespace / volume capacity for larger
sizeof(struct page)
- Fix a lockdep warning in the the NFIT code
- Fix a kernel-doc build warning
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY+b8BQAKCRDfioYZHlFs
Z+W4AQDNd/WepR9MiDZsDx+Kbte2WEpf3lHvP8Nzi9hspnlpQwD8Ds/rV4y4XGW5
/CXl6fqrs4A6O9jtz4FxCu+ZCy5YvAI=
=SiUf
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A fix for an issue that could causes users to inadvertantly reserve
too much capacity when debugging the KMSAN and persistent memory
namespace, a lockdep fix, and a kernel-doc build warning:
- Resolve the conflict between KMSAN and NVDIMM with respect to
reserving pmem namespace / volume capacity for larger sizeof(struct
page)
- Fix a lockdep warning in the the NFIT code
- Fix a kernel-doc build warning"
* tag 'libnvdimm-fixes-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Support sizeof(struct page) > MAX_STRUCT_PAGE_SIZE
ACPI: NFIT: fix a potential deadlock during NFIT teardown
dax: super.c: fix kernel-doc bad line warning
This reverts commit 115d9d77bb.
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range, __free_one_page()
might access nearby uninitialized pages when trying to coalesce buddies,
which will cause a crash.
A proper fix will be more involved so revert this change for the time
being.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmPnaSQQHHJwcHRAa2Vy
bmVsLm9yZwAKCRA5A4Ymyw79kQl5B/42xQ7QDacxL+okyQXYUytC5DqZ8+1bL5uU
bHg4rNyR7/+7r+D0p6z7MhpeoSdXMSgSLGbx8joaXDNhyNtQqMSj19IQjtzndj4L
pzH5jQ5RJR9ePJBJ3Mq3uInaEvACzPIkfyvHAT4JE65jle8WQ5F5BJ+TzwlWOU0Q
cf9orYTIlDp50saJ/rrw0WKelSZ1oCQJnvFsgIfshmD4b3fZ+X70gsIRAcvqizgw
gszZmpIkgU6idLlboku0jnVTkW2f1C5ZplrDrFXaDbai5mSviPSA7I3TsTA495iD
bwo6xAaPeVOoJOnu7XvKs0e2MFKIfNPIcGzxJe+4vSS+i4W62uyC
=h6Xh
-----END PGP SIGNATURE-----
Merge tag 'fixes-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock revert from Mike Rapoport:
"Revert 'mm: Always release pages to the buddy allocator in
memblock_free_late()'
The pages being freed by memblock_free_late() have already been
initialized, but if they are in the deferred init range,
__free_one_page() might access nearby uninitialized pages when trying
to coalesce buddies, which will cause a crash.
A proper fix will be more involved so revert this change for the time
being"
* tag 'fixes-2023-02-11' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
- Use devm_kasprintf() to avoid overflows when forming clk names
in the Microchip PolarFire driver
- Fix the pretty broken Ingenic JZ4760 M/N/OD calculation to actually
work and find proper divisors
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmPmy9gRHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSWtZA//YkdI9ObFNcDIy4hQB1G8ppdwmnwQyUI1
MpTYEKQA1Zuk1HzARcO4WBeGpswKK28gkFgoYuBPvOPsCOgFcb7C9d2sub5rDpKI
+rwG4cptgumrYXxwRHx1qiieW65Tl+vqmVDy8fKhtpnFkc2PHbtJDf2FSYgTyMkd
MZeqsjcys7kYgRTEBvTo/aG9PeDAzhoAWplFTcLvhnb0MxfzcwSSQp3U8/vOTH+i
T1GhWrh5AT0rKEpbuTlc7dodvvs35Y5xkTHRj+wZ63IvCE9/i08k6Jbon2oLsTro
Uw7hbfd9WucpKGnjNGXGJLnY3Wg2egx7l6eJaJZiIyRiprEJlAk2j0v1cd9ZrKGD
TMwszQuGmD/AzNR+oDzmzcRLIsPFmthfC3YRvwPYC/PpFuiKMeQD0BWkUh9z9c6n
AlPlSeNgyWr1nV+mEfoF8X2GsVqlWjvjz1Vc1anAqcfHiL47iSK2ovJN2xfM9kXg
rvmSpoNIuFObq525bTUZoQDGqXtB36h1NWuY5PIr9rNdzK+nlVdOp97Yvsl9scJS
D8fx5rcF3NlzVgHFa4NEbcJdEdKZVVTmdVHta3rOC0+EMxhLkpaowwaL5Rmgd3mT
p3KIzOsVSrL8+J7JyQqPCdEvrODpTyK7PguvicxmyVjFg6CtVeUCj1ucq4IpQV6k
GxZrX2A3MJk=
=RXcc
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two clk driver fixes
- Use devm_kasprintf() to avoid overflows when forming clk names in
the Microchip PolarFire driver
- Fix the pretty broken Ingenic JZ4760 M/N/OD calculation to actually
work and find proper divisors"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: ingenic: jz4760: Update M/N/OD calculation algorithm
clk: microchip: mpfs-ccc: Use devm_kasprintf() for allocating formatted strings
- Some pin drive register fixes in the Mediatek driver.
- Return proper error code in the Aspeed driver, and revert
and ill-advised force-disablement patch that needs to be
reworked.
- Fix AMD driver debug output.
- Fix potential NULL dereference in the Single driver.
- Fix a group definition error in the Qualcomm SM8450 LPASS
driver.
- Restore pins used in direct IRQ mode in the Intel driver.
(This fixes some laptop touchpads!)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElDRnuGcz/wPCXQWMQRCzN7AZXXMFAmPmxzYACgkQQRCzN7AZ
XXM57g//Yvm3K2OHsFzucCoi0YRmC8Z6Y9jlXQJEpxs85XoRVFWPgNVJLlLD1Yix
N01QmMAz6Nb6niDe3OESPwciYIOWRgCSLkgdc1agiW/cJ8mtmy7o20UlVjeJYRTK
NiPUHrjZFeVt/FNc4BlmrnQLGl1LS25WQzVCVj+5DvgWIiaUwbEotfGnTYsqCLqN
kImJ6FWDe/DqJBFvVeNySUOSZfY1FsWwt+zG9V6mjk/rKNdFy+sL1v/FweWJmLf/
mnyqzyZH+wDS0zx0lUIsEb+SBVSNdcV9MbhYj88qLv3HIihFoDsUnX36P1VPvKYG
QEXbskqpfOhAVei2FF1nOHnvA6dFmSNmYGPmipWYEO4zt3Oe/jMJx+5IOcN+fE+3
wtCuRnErrgW6NjDHUIqpfbCdcYno7loA2QvwJ24YyDJgw7bzC5sGQihDQ2bziJyZ
eaRjNSUCn71aB3Ex320mZakT9Rpy6tJnJsimbrusQpv8ljoRrOoFl+Es4apTFmR3
NDWLFIvFenkCg9GhGz1j8LQywutMKXxmK37lPCviPtAHckAktpEqs6HWCsUErepd
HNOqumZvNx2MuBGnOS/yqNizkoNgf++pPdX6l22QKMky8rd5WxTTd9FtHvCUFkgX
FgAphYShPdmux8COxGrpuevcSLDJ0pMbKfKGARYcctjanJ/UpMA=
=HOZg
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Some assorted pin control fixes, the most interesting will be the
Intel patch fixing a classic problem: laptop touchpad IRQs...
- Some pin drive register fixes in the Mediatek driver.
- Return proper error code in the Aspeed driver, and revert and
ill-advised force-disablement patch that needs to be reworked.
- Fix AMD driver debug output.
- Fix potential NULL dereference in the Single driver.
- Fix a group definition error in the Qualcomm SM8450 LPASS driver.
- Restore pins used in direct IRQ mode in the Intel driver (This
fixes some laptop touchpads!)"
* tag 'pinctrl-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
pinctrl: qcom: sm8450-lpass-lpi: correct swr_rx_data group
pinctrl: aspeed: Revert "Force to disable the function's signal"
pinctrl: single: fix potential NULL dereference
pinctrl: amd: Fix debug output for debounce time
pinctrl: aspeed: Fix confusing types in return value
pinctrl: mediatek: Fix the drive register definition of some Pins
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPmuwEUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzEYg//XHHddqDRiZmx9McETDAi33rJ9DDo
CMCwiydUzGlDl/IDnBxwcmq0K5wiA5jFvXlRFmzHfnHGpWpRf6ntcT436QnhKe4G
/DAXxVdZGWr079m7s4NKjByDunhkkkT/elapFCtZTwXxMkUvbprM0ozMdtSMnC/M
RDCJKfaV2CKUkl/5Mk9Iw3vzrr62PP8fVHHMIr+6O39frZ2+MrzYCgpGkW0pubmT
He0gmeVnNFzR6qB1GraXVNwlapjPjzvHe1IggDDLJRxM4+sz8qKJz0vKew10JwSo
R5s8ACfTNtHwY45af1EWIeO9BoGD3soNLvWmK/5uNrCWJx9wnczQuz4b/Km2y02Y
KCJaudiC6EfAzu5gCSgao3VZ/EQ45sHrYZN9qiyDujOgAUUPl0oonwa1HW/1WUSH
Pd/ff9o78vASxdZP1o1hF0davNET1HOsvXGxQj71TJLXVsB2pifWvAoNocHHnpoe
cPCix8t3c4pgXzI0RG04tcfqGWAgsaVz73SdU0/g5qk+hPRvypjcY1lw6U66sk9f
/ZNII5fSX6hIWTetD27JiCZNOxJq1jikxOD4/LZizMTjdZYf6VxjDxkIaLS99pZw
RCOQ8chKVemr12lD//8eFUJJvblug2aTlHIwFnMuKiavy6pL5Sm1zGMBrqhYmUSO
pkNXzFaZe+GyF3k=
=NSFX
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Move to a shared PCI git tree (Bjorn Helgaas)
- Add Krzysztof Wilczyński as another PCI maintainer (Lorenzo
Pieralisi)
- Revert a couple ASPM patches to fix suspend/resume regressions (Bjorn
Helgaas)
* tag 'pci-v6.2-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI/ASPM: Refactor L1 PM Substates Control Register programming"
Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"
MAINTAINERS: Promote Krzysztof to PCI controller maintainer
MAINTAINERS: Move to shared PCI tree
This reverts commit 5e85eba6f5.
Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates
Control Register programming") broke suspend/resume on a Tuxedo
Infinitybook S 14 v5, which seems to use a Clevo L140CU Mainboard.
The main symptom is:
iwlwifi 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
nvme 0000:03:00.0: Unable to change power state from D3hot to D0, device inaccessible
and the machine is only partially usable after resume. It can't run dmesg
and can't do a clean reboot. This happens on every suspend/resume cycle.
Revert 5e85eba6f5 until we can figure out the root cause.
Fixes: 5e85eba6f5 ("PCI/ASPM: Refactor L1 PM Substates Control Register programming")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
This reverts commit 4ff116d0d5.
Tasev Nikola and Mark Enriquez reported that resume from suspend was broken
in v6.1-rc1. Tasev bisected to a47126ec29 ("PCI/PTM: Cache PTM
Capability offset"), but we can't figure out how that could be related.
Mark saw the same symptoms and bisected to 4ff116d0d5 ("PCI/ASPM: Save L1
PM Substates Capability for suspend/resume"), which does have a connection:
it restores L1 Substates configuration while ASPM L1 may be enabled:
pci_restore_state
pci_restore_aspm_l1ss_state
aspm_program_l1ss
pci_write_config_dword(PCI_L1SS_CTL1, ctl1) # L1SS restore
pci_restore_pcie_state
pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++]) # L1 restore
which is a problem because PCIe r6.0, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM
Substates, both ports must be configured as described in this
section while ASPM L1 is disabled.
Separately, Thomas Witt reported that 5e85eba6f5 ("PCI/ASPM: Refactor L1
PM Substates Control Register programming") broke suspend/resume, and it
depends on 4ff116d0d5.
Revert 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") to fix the resume issue and enable revert of 5e85eba6f5
to fix the issue Thomas reported.
Note that reverting 4ff116d0d5 means L1 Substates config may be lost on
suspend/resume. As far as we know the system will use more power but will
still *work* correctly.
Fixes: 4ff116d0d5 ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Reported-by: Tasev Nikola <tasev.stefanoska@skynet.be>
Reported-by: Mark Enriquez <enriquezmark36@gmail.com>
Reported-by: Thomas Witt <kernel@witt.link>
Tested-by: Mark Enriquez <enriquezmark36@gmail.com>
Tested-by: Thomas Witt <kernel@witt.link>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v6.1+
Cc: Vidya Sagar <vidyas@nvidia.com>
All the changes this time are minor devicetree corrections, the majority
being for 64-bit Rockchip SoC support. These are a couple of corrections
for properties that are in violation of the binding, some that put the
machine into safer operating points for the eMMC and thermal settings,
and missing properties that prevented rk356x PCIe and ethernet from
working correctly.
The changes for amlogic and mediatek address incorrect properties that
were preventing the display support on MT8195 and the MMC support
on various Meson SoCs from working correctly.
The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner
to allow this to be used correctly after a futre driver change,
though it has no effect on older kernels.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPmbm4ACgkQmmx57+YA
GNmtWhAAur/HSnseXtPfSViejI2QU2zd4nQmUlaJ7c2CgQFEAoc1aL+8FQIiGgtK
UN8eC+SBrrEzEAouHcUptfwo1/SAIcxwtF96s16xOu/za9yTk0QSugqd1WNh+MuK
aCXG6iFkStosEPxgpKAWLTI48pdK32MxnckrSLASYIv84LwlaK7QackBmtmXHiiw
TbBoldv1k1kvhnK7uYjdN5D35fYywv7gwFmEMU3otHLO+aTZZ6RJOfkXN6hc+3lt
sQg/cacgONznFlCyfCLKIgabb01Aya0oG1nYZrn4c3PrJciDkiVyTKut6OHKqSQV
CTg+x2DGOeD2Rqtq5K2gvu2kUkvgBK0oghAROIK2u4xTFIqiWyNqcA3AADNePlaz
p3/H0Io2xyfixt4KNTR7onJ6pTTh5x7PJA5147lX/2WzxoY4W9t3Y8Q4Z2RfLLBw
jq+DWuLDoJT1TpcvlVuflKalsVnfdVXXYDkNTuXnFRl4j+zSQ36v6fZAUl4g0DTG
+kFI4Xa11KWKwxAbANYgqDKFS/BG+KuEuPmYnxCuOMnRxIhpv+2Wj+wlsARDUSn/
Gyv9bsRkEGURAVAvrNnlpTpwp84Vb2b/fBs+7Yg1dKLk4SZ7txJ9vAIaEgDrRt6J
smlS8NOZem4pZTP8Nr2bvbDPPosEMFj72py9KJU57mbtQ16fsxs=
=23z6
-----END PGP SIGNATURE-----
Merge tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"All the changes this time are minor devicetree corrections, the
majority being for 64-bit Rockchip SoC support. These are a couple of
corrections for properties that are in violation of the binding, some
that put the machine into safer operating points for the eMMC and
thermal settings, and missing properties that prevented rk356x PCIe
and ethernet from working correctly.
The changes for amlogic and mediatek address incorrect properties that
were preventing the display support on MT8195 and the MMC support on
various Meson SoCs from working correctly.
The stihxxx-b2120 change fixes the GPIO polarity for the DVB tuner to
allow this to be used correctly after a futre driver change, though it
has no effect on older kernels"
* tag 'soc-fixes-6.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
ARM: dts: stihxxx-b2120: fix polarity of reset line of tsin0 port
arm64: dts: mediatek: mt8195: Fix vdosys* compatible strings
arm64: dts: rockchip: align rk3399 DMC OPP table with bindings
arm64: dts: rockchip: set sdmmc0 speed to sd-uhs-sdr50 on rock-3a
arm64: dts: rockchip: fix probe of analog sound card on rock-3a
arm64: dts: rockchip: add missing #interrupt-cells to rk356x pcie2x1
arm64: dts: rockchip: fix input enable pinconf on rk3399
ARM: dts: rockchip: add power-domains property to dp node on rk3288
arm64: dts: rockchip: add io domain setting to rk3566-box-demo
arm64: dts: rockchip: remove unsupported property from sdmmc2 for rock-3a
arm64: dts: rockchip: drop unused LED mode property from rk3328-roc-cc
arm64: dts: rockchip: reduce thermal limits on rk3399-pinephone-pro
arm64: dts: rockchip: use correct reset names for rk3399 crypto nodes
* A fix to avoid partial TLB fences for huge pages, which are disallowed
by the ISA.
* A fix to to avoid missing a frame when dumping stacks.
* A fix to avoid misaligned accesses (and possibly overflows) in
kprobes.
* A fix for a race condition in tracking page dirtiness.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmPmZ50THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiXg6D/0S9sZJWHNHRUIPzTSkmZX7T5awnT+J
G6sQ1c48ukNz6ujHnlW9x05xHi41taRco+OIpZ7rLmEN/sdjTDEk7fhSaYqqpzZ+
KsVjhT3rBfoAf6B/zwHOkLFnvCLHagv4NzPQrT+6Kwc7TAm2XFm0UihBDFaTUXJt
6SXCFhy3YPft4WqFxQX5umo8W+TrIJd39c3/suwUABPFGyHjmRt5RHEaeOsil7SG
rdxq2KlYVeZ7485bYIKshmIRXQNmxwVI75+CGdzQqL0g8z8dT6rnV4FOW2PBP074
xr+nn0jV9DBmDNbZ7fcsjAJeAzNZZVyYzE9PGESF5euu09iO+VEdnRD3Bqx/rVcB
PQjB78PsaL0OV8fdJLssa0W9D2kvV7FF3aZLNgEJ3+lhmOw2kkuCyp7ud0QXMU0t
rYb6TQQ3kr5lKu3pJPbt8oir9/KDXaFCEN2e5Lmr7T58KIvoYns+jZnY/7FctDTq
S3Ev9sxDzfJWttVYNmciNTmUV8NMlTOTllBFcp68IWmTrOlKETURH7XBOLDPl8Pd
xPmu226whK/5s1/vjwMH/uUnSCxQXWlxiljMg8tkYt44+RU68AWMcnhylWy2eu1c
njdQIcpWKtZVy/NTH4gSvaGIuY3bi4h0zsY23x713eDp5ZbEeEWEcEDCrnjjuU1c
EcOgqzpIzc/c8w==
=hyiU
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"This is a little bigger that I'd hope for this late in the cycle, but
they're all pretty concrete fixes and the only one that's bigger than
a few lines is pmdp_collapse_flush() (which is almost all
boilerplate/comment). It's also all bug fixes for issues that have
been around for a while.
So I think it's not all that scary, just bad timing.
- avoid partial TLB fences for huge pages, which are disallowed by
the ISA
- avoid missing a frame when dumping stacks
- avoid misaligned accesses (and possibly overflows) in kprobes
- fix a race condition in tracking page dirtiness"
* tag 'riscv-for-linus-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
riscv: kprobe: Fixup misaligned load text
riscv: stacktrace: Fix missing the first frame
riscv: mm: Implement pmdp_collapse_flush for THP
from Xiubo, marked for stable.
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmPmbYkTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi+XXB/0c7jZNIR7+sQX6Tf+iaDPuCn2p03eP
vfogzoSCg+7yLq526PTfLYkG/MlLVdcQtm+w86VdUPv0b6G2FPWp1XA3xVkVcWOF
4kD640luNCWrHxB6Rw/NIwCogJGp0YKd3BDvkMgNdAd03gBNgHvzKIWtRJYZ/cUw
WY1LTCXZg3mJ7RL+3F9Mjvzesms/W/v3mW21ieTtAV1OJt1yEhPmosSZelU0tSt6
FLRMlkYtLcAkt2w86//J+b4sShcbcp/W4Io5QRrngGiT8v2Cd+PyoqqnC0V6cbVm
kUo9H0k31zv7p5r4zqjz0YWn130aSQG2MycFk2YywPptHZBrW6pr5AMm
=IUMM
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a pretty embarrassing omission in the session flush handler
from Xiubo, marked for stable"
* tag 'ceph-for-6.2-rc8' of https://github.com/ceph/ceph-client:
ceph: flush cap releases when the session is flushed
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPmbGcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpms0D/9O87gfqnGasICPYI4ra8b1mTNWN2yonOVC
W8TNjgqfm085CSSyfDgZHhJQzB5/t4W+3upLKHjGzs9gi4xPlc/f+I1cnGj/NkKZ
Z1ocomKZ6Z6xRf+GjXGikzHneJ7e3SiGUGFHpQCHU1mqDAv1sAoie+sHCZviHso+
UbwrNZ8Yxln2sXeGcernbNisjrh++uFFoIZ/Uf3MeAzbE6vfA4p/FdPn/+jNBoc2
fFzfcta0KHejiZx9xbm3PkJ+tkPX8f4cVBSWfde0vsp6k4Lsf/pw4PQucLIY6OzW
gOWnsmq+kjpmuE5fwOukQ7mhdnvogVoIPugwS6HVs11AztBYAwZTSy2XerB+npEk
qiGCGHgsO3xRr3p+fzGbS+sHsWYg8/65k6dACCiopfasMiblD5FltuHhMdqZcP98
evDsQ8saAsANnssA/plLEs2TRMXEMajXMjNTIxqnq0rwZzzdWYjAxu9EvdBWMs3H
KlM9YiZ5WaDEkARHoC0o/SSA+brXEQwR/+mp9gppmWhzMAZvB8K7hV9hEQ1vA7Ja
153C1mB2Z+vKSkd62UFf0bupSH26j5TbvO5/t2RnRBAD+riYc23EGreuunq1vLz0
evxFGHbBJuNQwWzEL/CIf0fmR5Bxnlq7yK2Do5Yt5AmSVYMRwTuZQ1p401Vmw4YM
AWMizMxv1g==
=FUPy
-----END PGP SIGNATURE-----
Merge tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux
Pull block fix from Jens Axboe:
"A single fix for a smatch regression introduced in this merge window"
* tag 'block-6.2-2023-02-10' of git://git.kernel.dk/linux:
nvme-auth: mark nvme_auth_wq static
Hopefully the last one for 6.2, a collection of the fixes that have
been gathered since the last PR. All changes are small and trivial
device-specific fixes.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmPknxUOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8A+BAAqAgryk1HDUJz5QeTH4sHwphhrqnSlIwIRFop
LcUBHXRKoOfmJsVfKq80JLuRkUmdUojKUc2t32XTxdcybPEARFz8VSIzIsWZGwzG
UDEpsK41ItpySCl3FhIE9oNfgFp68HzZhoHvKt21DdUg2kVTF63nBVaC/3ao3YIS
FrPhpLco78w40HDiGFM7bInFLn52Gyi/yS5eva6RwRIZ4xPg/jYYO+z9d2xCsboC
iL45AyeV2r6yIAK8ESxgDj3qyEUJIMYMKj4BDJVVENstLQ4Rj5UspP/na5l1Qv/0
c/2cYwkdU8vq4hdrqI5C5gN+h2VOUalowUDRfvjR44gLGz2jIA3+iAWzIhMHZ1Hs
888giJFPD8H/2xErVwU1jlYj0YxhvX0C1vlUQzSHsgjeY9HTwt4IN52u2LaMVyog
76NUcopoSaqAeZDfSjn4/X22QvKijHe0/NVtAF2s18uTQQ6z13q0XLs74PI67Fyd
AOZZcJrLWezbzHHq/ssRiQb47tSzmCEHiAAuzN7VxGPv+2cNvsw/Epi+pq1bKztz
AapPKeyIPng08Perws2ZMh+Psg6rL2ap6Y3hW/WbnXS4/7e1fb9s//JAdIKDMOK2
YlSq/KGjmOAzkFWDxtxLY5wsMFR0UiBN1P+ZGbeG0o5rWufPW8N1nO1H9D6lXlTk
zG5CVXE=
=fz1j
-----END PGP SIGNATURE-----
Merge tag 'sound-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Hopefully the last one for 6.2, a collection of the fixes that have
been gathered since the last pull.
All changes are small and trivial device-specific fixes"
* tag 'sound-6.2-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add Positivo N14KP6-TG
ASoC: topology: Return -ENOMEM on memory allocation failure
ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
ASoC: fsl_sai: fix getting version from VERID
ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform.
ALSA: hda/realtek: Add quirk for ASUS UM3402 using CS35L41
ASoC: codecs: es8326: Fix DTS properties reading
ASoC: tas5805m: add missing page switch.
ASoC: tas5805m: rework to avoid scheduling while atomic.
ALSA: hda/realtek: Enable mute/micmute LEDs on HP Elitebook, 645 G9
ASoC: SOF: amd: Fix for handling spurious interrupts from DSP
ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro 360
ALSA: pci: lx6464es: fix a debug loop
ASoC: rt715-sdca: fix clock stop prepare timeout issue
The usage of edge-triggered interrupts lead to lost interrupts under load,
see [0]. This was confirmed to be fixed by using level-triggered
interrupts.
The report was about SDIO. However, as the host controller is the same
for SD and MMC, apply the change to all mmc controller instances.
[0] https://www.spinics.net/lists/linux-mmc/msg73991.html
Fixes: ef8d2ffedf ("ARM64: dts: meson-gxbb: add MMC support")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/76e042e0-a610-5ed5-209f-c4d7f879df44@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
The usage of edge-triggered interrupts lead to lost interrupts under load,
see [0]. This was confirmed to be fixed by using level-triggered
interrupts.
The report was about SDIO. However, as the host controller is the same
for SD and MMC, apply the change to all mmc controller instances.
[0] https://www.spinics.net/lists/linux-mmc/msg73991.html
Fixes: 4759fd87b9 ("arm64: dts: meson: g12a: add mmc nodes")
Tested-by: FUKAUMI Naoki <naoki@radxa.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Tested-by: Jerome Brunet <jbrunet@baylibre.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/27d89baa-b8fa-baca-541b-ef17a97cde3c@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
The usage of edge-triggered interrupts lead to lost interrupts under load,
see [0]. This was confirmed to be fixed by using level-triggered
interrupts.
The report was about SDIO. However, as the host controller is the same
for SD and MMC, apply the change to all mmc controller instances.
[0] https://www.spinics.net/lists/linux-mmc/msg73991.html
Fixes: 221cf34bac ("ARM64: dts: meson-axg: enable the eMMC controller")
Reported-by: Peter Suti <peter.suti@streamunlimited.com>
Tested-by: Vyacheslav Bocharov <adeep@lexina.in>
Tested-by: Peter Suti <peter.suti@streamunlimited.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/c00655d3-02f8-6f5f-4239-ca2412420cad@gmail.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>