The bitops.h is for bit related operations. The aligned_byte_mask()
is about byte (or part of the machine word) operations, for which
we have a separate header, move the mentioned macro to wordpart.h
to consolidate similar operations.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Bitops API is the very basic, and it's widely used by the kernel. But
corresponding files are not maintained.
Bitmaps actively use bit operations, and big share of bitops material
already moves through the bitmap branch.
I would like to take a closer look to bitops.
This patch creates a BITOPS API record in the MAINTAINERS, and adds
Rasmus as a reviewer, and myself as a maintainer of those files.
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The function claims to return the bitmap size, if Nth bit doesn't exist.
This rule is violated in inline case because the fns() that is used
there doesn't know anything about size of the bitmap.
So, relax this requirement to '>= size', and make the outline
implementation a bit cheaper.
All in-tree kernel users of find_nth_bit() are safe against that.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Closes: https://lore.kernel.org/all/Zi50cAgR8nZvgLa3@yury-ThinkPad/T/#m6da806a0525e74dcc91f35e5f20766ed4e853e8a
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The test now is limited to be compiled as a module. There's no technical
reason for it. Now that the test bears some performance benchmarks, it
would be reasonable to run it at kernel load time, before userspace
starts, to reduce possible jitter.
Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The current fns() repeatedly uses __ffs() to find the index of the
least significant bit and then clears the corresponding bit using
__clear_bit(). The method for clearing the least significant bit can be
optimized by using word &= word - 1 instead.
Typically, the execution time of one __ffs() plus one __clear_bit() is
longer than that of a bitwise AND operation and a subtraction. To
improve performance, the loop for clearing the least significant bit
has been replaced with word &= word - 1, followed by a single __ffs()
operation to obtain the answer. This change reduces the number of
__ffs() iterations from n to just one, enhancing overall performance.
This modification significantly accelerates the fns() function in the
test_bitops benchmark, improving its speed by approximately 7.6 times.
Additionally, it enhances the performance of find_nth_bit() in the
find_bit benchmark by approximately 26%.
Before:
test_bitops: fns: 58033164 ns
find_nth_bit: 4254313 ns, 16525 iterations
After:
test_bitops: fns: 7637268 ns
find_nth_bit: 3362863 ns, 16501 iterations
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Introduce a benchmark test for the fns(). It measures the total time
taken by fns() to process 10,000 test data generated using
get_random_bytes() for each n in the range [0, BITS_PER_LONG).
example:
test_bitops: fns: 7637268 ns
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
CC: David Laight <David.Laight@aculab.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
In some cases like performance benchmarking, we need to call a
function, but don't need to read the returned value. If compiler
recognizes the function as pure or const, it can remove the function
invocation, which is not what we want.
To prevent that, the common practice is assigning the return value to
a temporary static volatile variable. From compiler's point of view,
the variable is unused because never read back after been assigned.
To make sure the variable is always emitted, we provide a __used
attribute. This works with GCC, but clang still emits
Wunused-but-set-variable. To suppress that warning, we need to teach
clang to do that with the 'unused' attribute.
Nathan Chancellor explained that in details:
While having used and unused attributes together might look unusual,
reading the GCC attribute manual makes it seem like these attributes
fulfill similar yet different roles, __unused__ prevents any unused
warnings while __used__ forces the variable to be emitted. A strict
reading of that does not make it seem like __used__ implies disabling
unused warnings
The compiler documentation makes it clear what happens behind the 'used'
and 'unused' attributes, but the chosen names may confuse readers if
such combination catches an eye in a random code.
This patch adds __always_used macro, which combines both attributes
and comments on what happens for those interested in details.
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/oe-kbuild-all/202405030808.UsoMKFNP-lkp@intel.com/
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Optimize topology_span_sane() by removing duplicate comparisons.
Since topology_span_sane() is called inside of for_each_cpu(), each
previous CPU has already been compared against every other CPU. The
current CPU only needs to be compared against higher-numbered CPUs.
The total number of comparisons is reduced from N * (N - 1) to
N * (N - 1) / 2 on each non-NUMA scheduling domain level.
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Add for_each_cpu_from() as a generic cpumask macro.
for_each_cpu_from() is the same as for_each_cpu(), except it starts at
@cpu instead of zero.
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
The device tree changes this time are all for NXP i.MX platforms, addressing
issues with clocks and regulators on i.MX7 and i.MX8.
The old OMAP2 based Nokia N8x0 tablet get a couple of code fixes for
regressions that came in.
The ARM SCMI and FF-A firmware interfaces get a couple of minor
bug fixes.
A regression fix for RISC-V cache management addresses a problem with
probe order on Sifive cores.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmYZjn8ACgkQYKtH/8kJ
UidrDQ/+MISVzAtuF9ZVpGC8K7QTHjd/Izv7lDFQq+Um1AM1CCHlb1SyKintdDCK
qQBn9TXElP5K1j3+9wAooShqvZI9CFLNQm7qg0f16rX2EGoV1ZTNQnAru6c/E0iq
8Sxe7xE0gk9ju5VfYNBt8suBtetFwU+x5befODmTh4TP2dvOkewB7GF90E2z43yr
/JD5IMDGboxomdQ0vWhvQpI++GBjxAolXRFA/uIdr91PnD76WamXIG2kRxBVZnCF
/Z/VIKrBJ4iRntrOwtZLocbDwfUCEMNITx4NuSsbu0JWgm6+Eucf8j9/G1v7zSZ6
hT5v4L742clGHs1GLYPA9rRfejaWb6x2aUgm2JZ0vF6JG8ZMBKllUVxlwye/kJ8Y
SizZDX1ztRN/tlFGMkkreb22gYNZxFG5HtilGwQ+/sH8mgjhGaiU3VywiraSDWBg
5ksGXAuwWDyHb9GHL5a/lbYB6hsvPKzoCNIEluuFbL2Fpz+vpBeD21EZWYwmnKLi
U/YJc0eYge6MrWn+6960rTs+7ZouuAejPyJcPzjz1Gz2Lsapa4zwLuX2vtSavEC2
Hl5DqnB6/vRdEQml4tDukt1uJ3guFdDa2a7RE2EVgSwKqScnUH40Tb7GLE/fQL/o
Nr1Mdz0T48PG9pvur9i1TdiWZqhrjsEID/xmYheIzJa4bQyrh+s=
=qMYh
-----END PGP SIGNATURE-----
Merge tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"The device tree changes this time are all for NXP i.MX platforms,
addressing issues with clocks and regulators on i.MX7 and i.MX8.
The old OMAP2 based Nokia N8x0 tablet get a couple of code fixes for
regressions that came in.
The ARM SCMI and FF-A firmware interfaces get a couple of minor bug
fixes.
A regression fix for RISC-V cache management addresses a problem with
probe order on Sifive cores"
* tag 'soc-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (23 commits)
MAINTAINERS: Change Krzysztof Kozlowski's email address
arm64: dts: imx8qm-ss-dma: fix can lpcg indices
arm64: dts: imx8-ss-dma: fix can lpcg indices
arm64: dts: imx8-ss-dma: fix adc lpcg indices
arm64: dts: imx8-ss-dma: fix pwm lpcg indices
arm64: dts: imx8-ss-dma: fix spi lpcg indices
arm64: dts: imx8-ss-conn: fix usb lpcg indices
arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
ARM: dts: imx7-mba7: Use 'no-mmc' property
arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
arm64: dts: freescale: imx8mp-venice-gw73xx-2x: fix USB vbus regulator
arm64: dts: freescale: imx8mp-venice-gw72xx-2x: fix USB vbus regulator
cache: sifive_ccache: Partially convert to a platform driver
firmware: arm_scmi: Make raw debugfs entries non-seekable
firmware: arm_scmi: Fix wrong fastchannel initialization
firmware: arm_ffa: Fix the partition ID check in ffa_notification_info_get()
ARM: OMAP2+: fix USB regression on Nokia N8x0
mmc: omap: restore original power up/down steps
mmc: omap: fix deferred probe
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmYZZyUUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyDBg/9FZpeff8u0p+DscTL51bVttVqm7Mk
mFvJErifFdSiuCN1izjZ27yrOL6Ef9q3AUAcaWmt9y11s3+l7iqt1PP87T3OHQ2e
Lg8lHjrtRjxViQ9AYqi7/KdTlcQ6kRbqED5PHSx4uw4whSBliJRPbrovqtyM9zuG
8rcCUGmgFwOvCKLVz9XuwyTbPTu8daGOtGAVns1cQodYbp8MlSb6lNlX1JHUbaj9
4GObD2nWr8bUfcA4CrIOo/N/Gn1jZmsHXkbZ7OzORVB2PTG5/1atYHSYgLC1CU/z
C+eCp8XROH/TS23KTv58sFRXdby4x/hvIhfev5k8v7n5cLrLruDpWF7qfT/s0cKE
ZN81kx+Kr+5knV27+NVEUdUIrhaoHJEjHo9Y2A1Xd3+NNCX4pWf/Iws3DBCZ07sv
OiE2U6H3qpQnoyzSwAw2NiS6jgCYnukgt1GiyyiHh/41SeAmsY5JW8+mBOFnSpYe
FIhE7hT4sG4Dh1yGrPkSND3NOAMW3GFL/i87avOMvdwbd3lIg99X7CgBpq7uZakq
I6ffTCyCic/WVp74xxMxRmadU/EaieYIHKaIR2DQEyeUKs0GcfY2c1QMfcDPBU6t
k0QmhmNDsvH6ARzbWAIwaZldhqIFomimgRPI9L7DD39o/MVhD71yJVT9L+Rf+8mo
DN+ElQ9trGT26ho=
=lxSz
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Revert a quirk that prevented Secondary Bus Reset for LSI / Agere
FW643.
We thought the device was broken, but the reset does work correctly
on other platforms, and the reset avoids leaking data out of VMs
(Bjorn Helgaas)
- Update MAINTAINERS to reflect that Gustavo Pimentel is no longer
reachable (Manivannan Sadhasivam)
* tag 'pci-v6.9-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: Mark LSI FW643 to avoid bus reset"
MAINTAINERS: Drop Gustavo Pimentel as PCI DWC Maintainer
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmYZWKYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpl7VD/0VyVykpWz9ydtEQvlTLndMASi4U0RpSPPy
mPX8ydz5k8IywiaAaKxCWK0hCSEXWceQKrbqJqQau9M33JD1Ss7x44BQR/XBZD5J
vRV5PbnMpIFL1knGzf9xgXKAeEHI2Hi/VKuW3dMoXcXHABfHlbS2ulvpQ97S9oAo
iCS3728sFOiqcKh2SSSMI04f0o+48F2mh3g8CAd45Kx/kwbEogaxc6U590wbnA+5
+TTVb+XRT0x55DCq0awtXDxLs4030IkcqJ+S5891m5Dfm8LUt0m0ekr8R9ks/sA+
+axbcLLw3M3B6QTsT8QCytpAyW5eNBdyGSj4BFl5AOGjUDjuassdzXFtKevItgEd
kPsuYexNHERnntb+zEXfO7wVvyewy4Dby/WjoYi2hOhvkt73UrR9y93cp3Lq3iI4
G1+JNLNVkQo/bI0ZtyN5+q39odHcvhGvAr6zAygR+fqU6qOymRvQBpV/bnnMrv2j
wxlGC5r7tGd7kEwTg/UBK+sVR8OyOm8IJAszK372KRORDRiYLlw7cF+4d5jpG9P5
frk6z6QrJPoH15iaOjZY62PKEEXiuJbQm2FJYMZvD5upR+UHVvg2xVdLMy/tSnaX
zgNzlPFPKXXQ72IS8shJ9JE0XZWfZxirbUNIL+1Z2HUUj3t5SduBV/dWEeupPvuv
5Ml9Lg2uMA==
=rCiN
-----END PGP SIGNATURE-----
Merge tag 'block-6.9-20240412' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- MD pull request via Song:
- UAF fix (Yu)
- Avoid out-of-bounds shift in blk-iocost (Rik)
- Fix for q->blkg_list corruption (Ming)
- Relax virt boundary mask/size segment checking (Ming)
* tag 'block-6.9-20240412' of git://git.kernel.dk/linux:
block: fix that blk_time_get_ns() doesn't update time after schedule
block: allow device to have both virt_boundary_mask and max segment size
block: fix q->blkg_list corruption during disk rebind
blk-iocost: avoid out of bounds shift
raid1: fix use-after-free for original bio in raid1_write_request()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmYZWJQQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgprlZEACKCVoeCG6JJ5u4do3YvQcYu4cYHAkixCqj
wFJnJ+vZBvv/DLi04SjKPZWFWiR3z4GFGEkZxgOakLYxJRQ8gnvosn65V/aH8qMR
UK03E6fFt5hQ1lgUMbn0HyiiUsoDzsx/WkrQHYr/aQaxgW5827qWPZuKIvRzA6dy
gPT8Rxcr+IJMcm1iwdiLQbKScFznUzHAmBpE9TXc8rqHRVAzSbd+i8CpVB6+UJGY
ftkkPQsRRa3PGtQO5Q23F48XI22gRFwDw6x2odieAPGIyNrt8DIIexJeknhJ7JOl
j5/XAupmS3tNen+scd7l3fD/jToH5yO2a3VpbW6s2ptwHaMVpqRUmCP2BsA6FgSi
q31GeQC5UJwKc6Bhj5iUZ3HgrTj50V4zobGbb3XNlMfqf+zmKWYm8c56Wl78Jm1X
WkGIJrxKsRs9ewVxFAm3uYi4SFE1+YNiNTcrHX/d8E93mgoLb0j2z8ay1eWPOY1k
F4alHFyWEVP/8EQJPfTj2R/EjztHHkzTx+2GPU/dEAVqgs61Hhghwpo/shT/jLrQ
I0I28+VhBjA2hy6ZZGA6HRQTP7UhMOnR9CmW7ldRse0a9r6UittBaZrABhVb3e0Q
pw6KcwZNdANODkANS4s+hvce1PEYg5Y/a6OxXY8kTioXBRdjvH4PO3LRVPfN96Ii
8PrOgM3eqA==
=M7Ul
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for sigmask restoring while waiting for events (Alexey)
- Typo fix in comment (Haiyue)
- Fix for a msg_control retstore on SEND_ZC retries (Pavel)
* tag 'io_uring-6.9-20240412' of git://git.kernel.dk/linux:
io-uring: correct typo in comment for IOU_F_TWQ_LAZY_WAKE
io_uring/net: restore msg_control on sendzc retry
io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64() failure
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmYZXkYTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHziyLNCACkJv5GjDeHC22P+UB6re3gO39wdFsL
BvHmhyebz5iidjS1l6pC/RcLOmB+OwekOiDfFvkwu0E+3d5dZTBInUV4BWzbJVLR
Hkjd+oCEThV+f58n2Ol/r6lBGIFCgJpgtW02DQurM8wHTLm/vqxpj2GsvJCRXNr6
lfqWVvZBteEmqxHUGNWldOQh2jNPGyvOpDRCQTzdjEfUAijoc8arLyuwbxl9tFdA
qVtCZk+Jv0Lz9uQUSsI7aM+SEXUgGW3pZXJL9h9PPjC9d811GCwQnUcL4hNE2B3X
Nf5Qzo9jV2GKJh8+kHui0dmpt0dMQfFZ2YaeJDZl8c8dE2uqBL+zMO79
=mvie
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two CephFS fixes marked for stable and a MAINTAINERS update"
* tag 'ceph-for-6.9-rc4' of https://github.com/ceph/ceph-client:
MAINTAINERS: remove myself as a Reviewer for Ceph
ceph: switch to use cap_delay_lock for the unlink delay list
ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE
Commit d96c36004e ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig
entry") removed a hidden tab because it apparently showed breakage in
some third-party kernel config parsing tool.
It wasn't clear what tool it was, but let's make sure it gets fixed.
Because if you can't parse tabs as whitespace, you should not be parsing
the kernel Kconfig files.
In fact, let's make such breakage more obvious than some esoteric ftrace
record size option. If you can't parse tabs, you can't have page sizes.
Yes, tab-vs-space confusion is sadly a traditional Unix thing, and
'make' is famous for being broken in this regard. But no, that does not
mean that it's ok.
I'd add more random tabs to our Kconfig files, but I don't want to make
things uglier than necessary. But it *might* bbe necessary if it turns
out we see more of this kind of silly tooling.
Fixes: d96c36004e ("tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry")
Link: https://lore.kernel.org/lkml/CAHk-=wj-hLLN_t_m5OL4dXLaxvXKy_axuoJYXif7iczbfgAevQ@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix the buffer_percent accounting as it is dependent on three variables:
1) pages_read - number of subbuffers read
2) pages_lost - number of subbuffers lost due to overwrite
3) pages_touched - number of pages that a writer entered
These three counters only increment, and to know how many active pages
there are on the buffer at any given time, the pages_read and
pages_lost are subtracted from pages_touched. But the pages touched
was incremented whenever any writer went to the next subbuffer even
if it wasn't the only one, so it was incremented more than it should
be causing the counter for how many subbuffers currently have content
incorrect, which caused the buffer_percent that holds waiters until
the ring buffer is filled to a given percentage to wake up early.
- Fix warning of unused functions when PERF_EVENTS is not configured in
- Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE
- Fix to some kerneldoc function comments in eventfs code.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZhk+khQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qvs0AP98c226UFU6Dha4wvgSulC/wKVvHN3X
jeclMTdn8RGs2gD/b9OULKNv1//6fP16ZRun7ntRQkotVhlNhf9Ee0smiwU=
=UYrk
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix the buffer_percent accounting as it is dependent on three
variables:
1) pages_read - number of subbuffers read
2) pages_lost - number of subbuffers lost due to overwrite
3) pages_touched - number of pages that a writer entered
These three counters only increment, and to know how many active
pages there are on the buffer at any given time, the pages_read and
pages_lost are subtracted from pages_touched.
But the pages touched was incremented whenever any writer went to the
next subbuffer even if it wasn't the only one, so it was incremented
more than it should be causing the counter for how many subbuffers
currently have content incorrect, which caused the buffer_percent
that holds waiters until the ring buffer is filled to a given
percentage to wake up early.
- Fix warning of unused functions when PERF_EVENTS is not configured in
- Replace bad tab with space in Kconfig for FTRACE_RECORD_RECURSION_SIZE
- Fix to some kerneldoc function comments in eventfs code.
* tag 'trace-v6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Only update pages_touched when a new page is touched
tracing: hide unused ftrace_event_id_fops
tracing: Fix FTRACE_RECORD_RECURSION_SIZE Kconfig entry
eventfs: Fix kernel-doc comments to functions
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmYY7QoaHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHDuJxAAhS4A48oPpC98MvxzzIUz
xJIOoJbQZuFK7rEd2CEusbt4Ri3tojsIQ1IqLKEGHq6OJxuWS8U9egn8wQZVcO3s
BsJrtfiu4jEhUohWNDzVG7XiMc+/H1vRN7/BXV3Fvl8bSvwouqWcYfMU/0qe7Ntq
X34aidIvVLUCDZPz/LMUXGoaxW2oA2yAEuH+LaLfW/sZ+FWzM9f1e6BgxzV5HnqV
qeRiZq56C1y2UMUbbgoLdHlP0T2PgiZ4eZ3wzlw9goka47hXUQlvPvg6x7pCv+ZS
9mOZWMqTiMczhQlm3jjeKBo45eVKLnf6QfV3Sr4TwRYoxoO95hTEsgf6JTgprxiP
+M5BG5XJchXUE2m3Oi9cnIo9wXcIyj+QlYhxqGPR5zIHcAOOKGUn3E4yuhezbN8X
tvdF8kUv7He9drRA5rUdZ0AJ1P+nooaBfEPoIPhb7laSxlyXi4SGLKziY9CRdiMq
jxJezT2ES75cz66XMHJMSCZTHEF4rthVA6UsTa3gGer2BEKWAMFZ2zoMTv1ZO9Mo
tcxsSonL2vfW99lcFt8PWAucoVi0EjB0sxDASccg0NS/sT6hBb9qrWuG3Bq9u2N7
DzWPO1275yMuPwYxWpjxQ5bDMBnT/TzPxSv06obFayCip6NJABHAYpZ/gzK8CaaD
bGmMowZIPpsdoUzrIdFN3DA=
=fJiH
-----END PGP SIGNATURE-----
Merge tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fix from Thomas Bogendoerfer:
"Fix for syscall_get_nr() to make it work even if tracing is disabled"
* tag 'mips-fixes_6.9_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: scall: Save thread_info.syscall unconditionally on entry
client:
- Protect connector modes with mode_config mutex
ast:
- Fix soft lockup
host1x:
- Do not setup DMA for virtual addresses
ivpu:
- Fix deadlock in context_xa
- PCI fixes
- Fixes to error handling
nouveau:
- gsp: Fix OOB access
- Fix casting
panfrost:
- Fix error path in MMU code
qxl:
- Revert "drm/qxl: simplify qxl_fence_wait"
vmwgfx:
- Enable DMA for SEV mappings
i915:
- Couple CDCLK programming fixes
- HDCP related fix
- 4 Bigjoiner related fixes
- Fix for a circular locking around GuC on reset+wedged case
xe:
- Fix double display mutex initializations
- Fix u32 -> u64 implicit conversions
- Fix RING_CONTEXT_CONTROL not marked as masked
msm:
- DP refcount leak fix on disconnect
- Add missing newlines to prints in msm_fb and msm_kms
- fix dpu debugfs entry permissions
- Fix the interface table for the catalog of X1E80100
- fix irq message printing
- Bindings fix to add DP node as child of mdss for mdss node
- Minor typo fix in DP driver API which handles port status change
- fix CHRASHDUMP_READ()
- fix HHB (highest bank bit) for a619 to fix UBWC corruption
amdgpu:
- GPU reset fixes
- Fix some confusing logging
- UMSCH fix
- Aborted suspend fix
- DCN 3.5 fixes
- S4 fix
- MES logging fixes
- SMU 14 fixes
- SDMA 4.4.2 fix
- KASAN fix
- SMU 13.0.10 fix
- VCN partition fix
- GFX11 fixes
- DWB fixes
- Plane handling fix
- FAMS fix
- DCN 3.1.6 fix
- VSC SDP fixes
- OLED panel fix
- GFX 11.5 fix
amdkfd:
- GPU reset fixes
- fix ioctl integer overflow
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmYYjYoACgkQDHTzWXnE
hr5+AhAAqM5ZXNEC5Y5zbxqLmMyr08sn4lm/fE0CNZYPoRXerBJIOTXvc8V9YL9W
5Nxux+5qLzLrIOKdY5PIGEyq0TlA53ZyR0GSF3NfFDxI2OiBRbuRpS6obuimPRY6
bHdfb89S3TRj3XtP7eJs3fuejygAzLDg+4AXIEF9htlXVARrEBVA2Cw1Km0goWoa
uXhfLlt9tycIdtuedpPDnetZGlffZa1eCFqFlDJg671moEdQDfdSSsgVtkaDkAy1
Cy1jdu4+9WHTNAS6kGHy8iB7Ebq+n6UKiBr3o9GoWTIgqVz33FeMU1zrJ2IdwNNK
nuPBeyk7Z125VEntyj+29GdLRcKv+K75zbdoKrifYp5/937DBjpiJ5bcqOnZdDVG
hc6IPeMWcvLJ10W+ZrtUqs7BA2cS2dO06uBKEHqNL5Th/TmOSpyKeS/JOtU9Tpkq
b2OkGy/H+20wXFBFmanLlPuS0lrvuUkLi7bIvtqZq2e5vOHbm4xKLrCDPWFtp1sR
ohkDtqmkGocQz5l0Ublqdcnrtttg3+Rr5Jh+cCkq5f9gEqsbQWEoPzG9dyZOOBq2
xEtWf5enH+3J711/9FB8+aWD+j7T04a7ZEODeDDAQBwgW6gMYeGHQPKvTWC0xkYX
8LBYvxfV1TSK+4l3geF9m+MQuZnuX1sZ5QSh7b/nfBvxMaFVJ84=
=tDFk
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Looks like everyone woke up after holidays, this weeks pull has a
bunch of stuff all over, 2 weeks worth of amdgpu is a lot of it, then
i915/xe have a few, a bunch of msm fixes, then some scattered driver
fixes.
I expect things will settle down for rc5.
client:
- Protect connector modes with mode_config mutex
ast:
- Fix soft lockup
host1x:
- Do not setup DMA for virtual addresses
ivpu:
- Fix deadlock in context_xa
- PCI fixes
- Fixes to error handling
nouveau:
- gsp: Fix OOB access
- Fix casting
panfrost:
- Fix error path in MMU code
qxl:
- Revert "drm/qxl: simplify qxl_fence_wait"
vmwgfx:
- Enable DMA for SEV mappings
i915:
- Couple CDCLK programming fixes
- HDCP related fix
- 4 Bigjoiner related fixes
- Fix for a circular locking around GuC on reset+wedged case
xe:
- Fix double display mutex initializations
- Fix u32 -> u64 implicit conversions
- Fix RING_CONTEXT_CONTROL not marked as masked
msm:
- DP refcount leak fix on disconnect
- Add missing newlines to prints in msm_fb and msm_kms
- fix dpu debugfs entry permissions
- Fix the interface table for the catalog of X1E80100
- fix irq message printing
- Bindings fix to add DP node as child of mdss for mdss node
- Minor typo fix in DP driver API which handles port status change
- fix CHRASHDUMP_READ()
- fix HHB (highest bank bit) for a619 to fix UBWC corruption
amdgpu:
- GPU reset fixes
- Fix some confusing logging
- UMSCH fix
- Aborted suspend fix
- DCN 3.5 fixes
- S4 fix
- MES logging fixes
- SMU 14 fixes
- SDMA 4.4.2 fix
- KASAN fix
- SMU 13.0.10 fix
- VCN partition fix
- GFX11 fixes
- DWB fixes
- Plane handling fix
- FAMS fix
- DCN 3.1.6 fix
- VSC SDP fixes
- OLED panel fix
- GFX 11.5 fix
amdkfd:
- GPU reset fixes
- fix ioctl integer overflow"
* tag 'drm-fixes-2024-04-12' of https://gitlab.freedesktop.org/drm/kernel: (65 commits)
amdkfd: use calloc instead of kzalloc to avoid integer overflow
drm/xe: Label RING_CONTEXT_CONTROL as masked
drm/xe/xe_migrate: Cast to output precision before multiplying operands
drm/xe/hwmon: Cast result to output precision on left shift of operand
drm/xe/display: Fix double mutex initialization
drm/amdgpu: differentiate external rev id for gfx 11.5.0
drm/amd/display: Adjust dprefclk by down spread percentage.
drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
drm/amd/display: fix disable otg wa logic in DCN316
drm/amd/display: Do not recursively call manual trigger programming
drm/amd/display: always reset ODM mode in context when adding first plane
drm/amdgpu: fix incorrect number of active RBs for gfx11
drm/amd/display: Return max resolution supported by DWB
amd/amdkfd: sync all devices to wait all processes being evicted
drm/amdgpu: clear set_q_mode_offs when VM changed
drm/amdgpu: Fix VCN allocation in CPX partition
drm/amd/pm: fix the high voltage issue after unload
drm/amd/display: Skip on writeback when it's not applicable
drm/amdgpu: implement IRQ_STATE_ENABLE for SDMA v4.4.2
...
While monitoring the throttle time of IO from iocost, it's found that
such time is always zero after the io_schedule() from ioc_rqos_throttle,
for example, with the following debug patch:
+ printk("%s-%d: %s enter %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
while (true) {
set_current_state(TASK_UNINTERRUPTIBLE);
if (wait.committed)
break;
io_schedule();
}
+ printk("%s-%d: %s exit %llu\n", current->comm, current->pid, __func__, blk_time_get_ns());
It can be observerd that blk_time_get_ns() always return the same time:
[ 1068.096579] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.272587] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.274389] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.472690] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.474485] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.672656] fio-1268: ioc_rqos_throttle exit 1067901962288
[ 1068.674451] fio-1268: ioc_rqos_throttle enter 1067901962288
[ 1068.872655] fio-1268: ioc_rqos_throttle exit 1067901962288
And I think the root cause is that 'PF_BLOCK_TS' is always cleared
by blk_flush_plug() before scheduel(), hence blk_plug_invalidate_ts()
will never be called:
blk_time_get_ns
plug->cur_ktime = ktime_get_ns();
current->flags |= PF_BLOCK_TS;
io_schedule:
io_schedule_prepare
blk_flush_plug
__blk_flush_plug
/* the flag is cleared, while time is not */
current->flags &= ~PF_BLOCK_TS;
schedule
sched_update_worker
/* the flag is not set, hence plug->cur_ktime is not cleared */
if (tsk->flags & PF_BLOCK_TS)
blk_plug_invalidate_ts()
blk_time_get_ns
/* got the time stashed before schedule */
return plug->cur_ktime;
Fix the problem by clearing cached time in __blk_flush_plug().
Fixes: 06b23f92af ("block: update cached timestamp post schedule/preemption")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240411032349.3051233-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use consistent log severity (pr_warn) to log all messages in SNP
enable path.
Suggested-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240410101643.32309-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 1a75cc710b ("iommu/vt-d: Use rbtree to track iommu probed
devices") adds all devices probed by the iommu driver in a rbtree
indexed by the source ID of each device. It assumes that each device
has a unique source ID. This assumption is incorrect and the VT-d
spec doesn't state this requirement either.
The reason for using a rbtree to track devices is to look up the device
with PCI bus and devfunc in the paths of handling ATS invalidation time
out error and the PRI I/O page faults. Both are PCI ATS feature related.
Only track the devices that have PCI ATS capabilities in the rbtree to
avoid unnecessary WARN_ON in the iommu probe path. Otherwise, on some
platforms below kernel splat will be displayed and the iommu probe results
in failure.
WARNING: CPU: 3 PID: 166 at drivers/iommu/intel/iommu.c:158 intel_iommu_probe_device+0x319/0xd90
Call Trace:
<TASK>
? __warn+0x7e/0x180
? intel_iommu_probe_device+0x319/0xd90
? report_bug+0x1f8/0x200
? handle_bug+0x3c/0x70
? exc_invalid_op+0x18/0x70
? asm_exc_invalid_op+0x1a/0x20
? intel_iommu_probe_device+0x319/0xd90
? debug_mutex_init+0x37/0x50
__iommu_probe_device+0xf2/0x4f0
iommu_probe_device+0x22/0x70
iommu_bus_notifier+0x1e/0x40
notifier_call_chain+0x46/0x150
blocking_notifier_call_chain+0x42/0x60
bus_notify+0x2f/0x50
device_add+0x5ed/0x7e0
platform_device_add+0xf5/0x240
mfd_add_devices+0x3f9/0x500
? preempt_count_add+0x4c/0xa0
? up_write+0xa2/0x1b0
? __debugfs_create_file+0xe3/0x150
intel_lpss_probe+0x49f/0x5b0
? pci_conf1_write+0xa3/0xf0
intel_lpss_pci_probe+0xcf/0x110 [intel_lpss_pci]
pci_device_probe+0x95/0x120
really_probe+0xd9/0x370
? __pfx___driver_attach+0x10/0x10
__driver_probe_device+0x73/0x150
driver_probe_device+0x19/0xa0
__driver_attach+0xb6/0x180
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x77/0xd0
bus_add_driver+0x114/0x210
driver_register+0x5b/0x110
? __pfx_intel_lpss_pci_driver_init+0x10/0x10 [intel_lpss_pci]
do_one_initcall+0x57/0x2b0
? kmalloc_trace+0x21e/0x280
? do_init_module+0x1e/0x210
do_init_module+0x5f/0x210
load_module+0x1d37/0x1fc0
? init_module_from_file+0x86/0xd0
init_module_from_file+0x86/0xd0
idempotent_init_module+0x17c/0x230
__x64_sys_finit_module+0x56/0xb0
do_syscall_64+0x6e/0x140
entry_SYSCALL_64_after_hwframe+0x71/0x79
Fixes: 1a75cc710b ("iommu/vt-d: Use rbtree to track iommu probed devices")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10689
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20240407011429.136282-1-baolu.lu@linux.intel.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The page request queue is per IOMMU, its allocation should be made
NUMA-aware for performance reasons.
Fixes: a222a7f0bb ("iommu/vt-d: Implement page request handling")
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20240403214007.985600-1-jacob.jun.pan@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded
based on the alias from of_device_id table.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20240410164109.233308-1-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
DTE[Mode]=0 is not supported when SNP is enabled in the host. That means
to support SNP, IOMMU must be configured with V1 page table (See IOMMU
spec [1] for the details). If user passes kernel command line to configure
IOMMU domains with v2 page table (amd_iommu=pgtbl_v2) then disable SNP
as the user asked by not forcing the page table to v1.
[1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifications/48882_IOMMU.pdf
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20240410085702.31869-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
LOCKDEP detector reported below warning:
----------------------------------------
[ 23.796949] ========================================================
[ 23.796950] WARNING: possible irq lock inversion dependency detected
[ 23.796952] 6.8.0fix+ #811 Not tainted
[ 23.796954] --------------------------------------------------------
[ 23.796954] kworker/0:1/8 just changed the state of lock:
[ 23.796956] ff365325e084a9b8 (&domain->lock){..-.}-{3:3}, at: amd_iommu_flush_iotlb_all+0x1f/0x50
[ 23.796969] but this lock took another, SOFTIRQ-unsafe lock in the past:
[ 23.796970] (pd_bitmap_lock){+.+.}-{3:3}
[ 23.796972]
and interrupts could create inverse lock ordering between them.
[ 23.796973]
other info that might help us debug this:
[ 23.796974] Chain exists of:
&domain->lock --> &dev_data->lock --> pd_bitmap_lock
[ 23.796980] Possible interrupt unsafe locking scenario:
[ 23.796981] CPU0 CPU1
[ 23.796982] ---- ----
[ 23.796983] lock(pd_bitmap_lock);
[ 23.796985] local_irq_disable();
[ 23.796985] lock(&domain->lock);
[ 23.796988] lock(&dev_data->lock);
[ 23.796990] <Interrupt>
[ 23.796991] lock(&domain->lock);
Fix this issue by disabling interrupt when acquiring pd_bitmap_lock.
Note that this is temporary fix. We have a plan to replace custom bitmap
allocator with IDA allocator.
Fixes: 87a6f1f22c ("iommu/amd: Introduce per-device domain ID to fix potential TLB aliasing issue")
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
Link: https://lore.kernel.org/r/20240404102717.6705-1-vasant.hegde@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fixes for v6.9
Display:
- Fixes for PM refcount leak when DP goes to disconnected state and
also when link training fails. This is also one of the issues found
with the pm runtime series
- Add missing newlines to prints in msm_fb and msm_kms
- Change permissions of some dpu debugfs entries which write to const
data from catalog to read-only to avoid protection faults
- Fix the interface table for the catalog of X1E80100. This is an
important fix to bringup DP for X1E80100.
- Logging fix to print the callback symbol in the invalid IRQ message
case rather than printing when its known to be NULL.
- Bindings fix to add DP node as child of mdss for mdss node
- Minor typo fix in DP driver API which handles port status change
GPU:
- fix CHRASHDUMP_READ()
- fix HHB (highest bank bit) for a619 to fix UBWC corruption
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvFwRUcHGWva7oDeydq1PTiZMduuykCD2MWaFrT4iMGZA@mail.gmail.com
- Fix index of Clear Event Record handles in cxl_clear_event_record().
- Fix use before init of map->reg_type in cxl_decode_regblock().
- Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log().
- Series fixing CXL path access_coordinate computation.
- Remove unneded check of iter in loop.
- Fix of retrieving of access_coordinate in PCI topology walk.
- Fix of incorrect region access_coordinate data calculation.
- Consolidate of access_coordinates attached to downstream port
context.
- Add check to validate access_coordinate validity to prevent
incorrect data being exposed via sysfs.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAmYYa+YACgkQYGjFFmlT
OErU2A/+MOjbUrgHAm2NECLR2SrXb7JHJA6J5glaWLwjUpuV97BopHpEAU5Whlf4
sLk5o1j7DcNjKDBQQTtBvefYDdfzMQGS2amZdu9Z7FZJtWW1DRiVYjuKdMS3y4mC
I6U0jRHWp6ojhf6Wa/09LYrRzOxu+sPLV8t3MGkkIpdYFwunJXl169H22EIjCuWD
BALjl2jCqKSPIwxZMnM7hR817s1z6sDM25XK2Wr1oSCGaIeV0uGvZyx0PnY1jFSG
z2iFN/ZntivbT554JTNEFMeHheOlkzZL7liy5QZRGCKmrfTM0WnVtFyMWGpQ85XI
GMoi/xSCDozrmOk3aMTPqyhabCX9VGdUO5IZDyiMwfofCXKZQrGs6IbzQLvTC/MV
Ngtzb8CExvel+N24UAiWDBilhsgvzrRLCBRWc8Scl08cGXF0/C+n2+Nq3brTqAaP
aDn4Zj9IOpSG0POawN4mqLb90A7JkbCNux35ssQ6b/lXVjIe7uRqrmlFcXMnV6ja
dQ1fw5dxZBCr1wtTSOOAqOqVt1XNw16VP85nmQ6SwWed++4Ja2U/cMZVwtRLnf1A
sz53Po209RJODhwjzyQ5kxj6oTss3voqQ7MlVUiWrOnXPohQsGMRk3gGW/fC1DG3
prvNFZlHfeeWyw5H7goJ+Newx/fcY991ytuI9X7II/cG+TLQ0cM=
=SGRd
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:
- Fix index of Clear Event Record handles in cxl_clear_event_record()
- Fix use before init of map->reg_type in cxl_decode_regblock()
- Fix initialization of mbox_cmd.size_out in cxl_mem_get_records_log()
- Fix CXL path access_coordinate computation:
- Remove unneded check of iter in loop
- Fix of retrieving of access_coordinate in PCI topology walk
- Fix of incorrect region access_coordinate data calculation
- Consolidate of access_coordinates attached to downstream port
context
- Add check to validate access_coordinate validity to prevent
incorrect data being exposed via sysfs
* tag 'cxl-fixes-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add checks to access_coordinate calculation to fail missing data
cxl: Consolidate dport access_coordinate ->hb_coord and ->sw_coord into ->coord
cxl: Fix incorrect region perf data calculation
cxl: Fix retrieving of access_coordinates in PCIe path
cxl: Remove checking of iter in cxl_endpoint_get_perf_coordinates()
cxl/core: Fix initialization of mbox_cmd.size_out in get event
cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
cxl/mem: Fix for the index of Clear Event Record Handle
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmYYYPkTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXnxhB/4/8c7lFT53VbujFmVA5sNvpP5Ji5Xg
ERhVID7tzDyaVRPr+tpPJIW0Oj/t34SH9seoTYCBCM/UABWe/Gxceg2JaoOzIx+l
LHi73T4BBaqExiXbCCFj8N7gLO5P4Xz6ZZRgwHws1KmMXsiWYmYsbv36eSv9x6qK
+z/n6p9/ubKFNj2/vsvfiGmY0XHayD3NM4Y4toMbYE/tuRT8uZ7D5sqWdRf+UhW/
goRDA5qppeSfuaQu2LNVoz1e6wRmeJFv8OHgaPvQqAjTRLzPwwss28HICmKc8gh3
HDDUUJCHSs1XItSGDFip6rIFso5X/ZHO0d6pV75hOKCisd7lV0qH6NIZ
=k62H
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed-20240411' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Some cosmetic changes (Erni Sri Satya Vennela, Li Zhijian)
- Introduce hv_numa_node_to_pxm_info() (Nuno Das Neves)
- Fix KVP daemon to handle IPv4 and IPv6 combination for keyfile format
(Shradha Gupta)
- Avoid freeing decrypted memory in a confidential VM (Rick Edgecombe
and Michael Kelley)
* tag 'hyperv-fixes-signed-20240411' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Don't free ring buffers that couldn't be re-encrypted
uio_hv_generic: Don't free decrypted memory
hv_netvsc: Don't free decrypted memory
Drivers: hv: vmbus: Track decrypted status in vmbus_gpadl
Drivers: hv: vmbus: Leak pages if set_memory_encrypted() fails
hv/hv_kvp_daemon: Handle IPv4 and Ipv6 combination for keyfile format
hv: vmbus: Convert sprintf() family to sysfs_emit() family
mshyperv: Introduce hv_numa_node_to_pxm_info()
x86/hyperv: Cosmetic changes for hv_apic.c
The "buffer_percent" logic that is used by the ring buffer splice code to
only wake up the tasks when there's no data after the buffer is filled to
the percentage of the "buffer_percent" file is dependent on three
variables that determine the amount of data that is in the ring buffer:
1) pages_read - incremented whenever a new sub-buffer is consumed
2) pages_lost - incremented every time a writer overwrites a sub-buffer
3) pages_touched - incremented when a write goes to a new sub-buffer
The percentage is the calculation of:
(pages_touched - (pages_lost + pages_read)) / nr_pages
Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.
It was observed that over time, the amount read from the splice was
constantly decreasing the longer the trace was running. That is, if one
asked for 60%, it would read over 60% when it first starts tracing, but
then it would be woken up at under 60% and would slowly decrease the
amount of data read after being woken up, where the amount becomes much
less than the buffer percent.
This was due to an accounting of the pages_touched incrementation. This
value is incremented whenever a writer transfers to a new sub-buffer. But
the place where it was incremented was incorrect. If a writer overflowed
the current sub-buffer it would go to the next one. If it gets preempted
by an interrupt at that time, and the interrupt performs a trace, it too
will end up going to the next sub-buffer. But only one should increment
the counter. Unfortunately, that was not the case.
Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched. This
will only increment the counter once for when the writer moves to a new
sub-buffer, and not when there's a race and is incremented for when a
writer and its preempting writer both move to the same new sub-buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 2c2b0a78b3 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
It has been a couple of years since I stepped down as CephFS maintainer.
I'm not involved in any meaningful way with the project these days, so
while I'm happy to help review the occasional patch, I don't need to be
cc'ed on all of them.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
The same list item will be used in both cap_delay_list and
cap_unlink_delay_list, so it's buggy to use two different locks
to protect them.
Cc: stable@vger.kernel.org
Fixes: dbc347ef7f ("ceph: add ceph_cap_unlink_work to fire check_caps() immediately")
Link: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AODC76VXRAMXKLFDCTK4TKFDDPWUSCN5
Reported-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Marc Ruhmann <ruhmann@luis.uni-hannover.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- Modify the ACPI device enumeration code to avoid counting
dependencies that have been met already as unmet (Hans de Goede).
- Make _UID matching take the integer value of 0 into account as
appropriate (Raag Jadav).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmYYI2sSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxPAYP/0tbALMVJFkigYEoMsAGPSJqa94T6ZTC
UwtnmBl7/IgVAGFiPOqsaGEaJ2PIxNZCz/UBwfDbE98HR9Ow+kvl6V8U/PFLweEj
8JH1OSm5EpvpGzb+Iqmuq49kcewE3usabm5BPoXnJwchC939YSDMgTcKrNci9X3l
az9CIeWMTxfSEMyCWP3N90Iaga1cDUJeuZJ344G29RPFLjiDBjZQNFUX4mQro5l8
NjdeTxYJXfr6P7D81OH944Agw43d6oyjw3hDRvOz2t7EPeEdg8nd8doHL4P31TMy
378hxzpwmKnK1Q/5d6DdDqUCaTXhxs5ztV/fgGMlLccBabgE7g/Wp+Pd6zEut0wO
Rw1YrVv/ahuf3VB47kgVK77Q1ij7EXk9nE9ity2CMuLET2QfbFoFgIOlWCfSxVzs
aoBtS2NmKmvKHxvSxEN7doe3ImVV2EeWMvWNw/03CcA6/SlILskQO/6MKfBhtSPO
AUcTRKM2ia5PDzPfjFCYoiryLf6TEbNrRgwQ9Sq3bQuHimQAL8IqGaFeMd2VIaB8
WIsjV7LM+IebiPXf65DKRmVvq6uDe7uUTH8vsGnB9hXeTgrUcmhab9LZtZIdf2Ao
UtCq024W01QBTz/erW0gDwOMalRYqlsUkuL/gz1QDffXbzKMirtoQqZvonCqYEZO
iiH4PKpxxSRK
=94Dn
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix the handling of dependencies between devices in the ACPI
device enumeration code and address a _UID matching regression from
the 6.8 development cycle.
Specifics:
- Modify the ACPI device enumeration code to avoid counting
dependencies that have been met already as unmet (Hans de Goede)
- Make _UID matching take the integer value of 0 into account as
appropriate (Raag Jadav)"
* tag 'acpi-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: bus: allow _UID matching for integer zero
ACPI: scan: Do not increase dep_unmet for already met dependencies
Fix the suspend-to-idle core code to guarantee that timers queued on
CPUs other than the one that has first left the idle state, which should
expire directly after resume, will be handled (Anna-Maria Behnsen).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmYYIvYSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxq5oP/RS98YTjjrhqieQ2mfKN+thVpgTEeYBa
C4NoRbQf0x8kyhHbqEmjEdxlIUUmLDpolWMxFtlYdhTSZB7+k0z9GL3fZfTKRTVq
8lYSuxEzyicylve+b+gVcQ6m61kaV3M2jwP23Myaf5MOIx2OvQy7SnNS02p3yfJ8
NhBTqqTXbupWnnRuSPZvejnZaU/3wW1UvVmbkBUNwhR0O+6J0bxC5fNrvgip//kG
Iyx04j88787nwS9rUL68A9xSA9WVRuLNZRMb7RE2mZ9YVR6qLhxhUxfA8WrKignl
SxFPPmzGNkWdwtlWeIsQvcAOOhTKnWqOm7aggBfcf7jGoYBOJ7Uo4gfxvEvA1gnc
36UTpJXc/5HQDeVKYcSx5O9VSm7/cKx4sP5jiDKT/iBPLnSXAKoAflgl1RvkchvA
Gg0aJ37MRIvoP53OidNACkHN2VsikTMuYA4b21G+we/ib7q8kIbdI9yqDF4aPcCU
vV0rMlSGSfM5+PGn8fDei2sbZ4E6Mk3XRwzF386xJDIyrvnqr6hhmVxDmWOx7oEb
Jpv80971cbr1lmtP2SKFVdsRxElhRWfXk3OwLOGRbLbQI1BP+SAWeyICAYUanPJI
W6vmJwDqylCUFRy4mhufDe7fceXHnvFUYMhi0XWkGvCgWEgLffKHTzFNup12hU1Y
65Rcv19bRP8h
=8DDE
-----END PGP SIGNATURE-----
Merge tag 'pm-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix the suspend-to-idle core code to guarantee that timers queued on
CPUs other than the one that has first left the idle state, which
should expire directly after resume, will be handled (Anna-Maria
Behnsen)"
* tag 'pm-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: s2idle: Make sure CPUs will wakeup directly on resume
Current release - new code bugs:
- netfilter: complete validation of user input
- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
Previous releases - regressions:
- core: fix u64_stats_init() for lockdep when used repeatedly in one file
- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
- bluetooth: fix memory leak in hci_req_sync_complete()
- batman-adv: avoid infinite loop trying to resize local TT
- drv: geneve: fix header validation in geneve[6]_xmit_skb
- drv: bnxt_en: fix possible memory leak in bnxt_rdma_aux_device_init()
- drv: mlx5: offset comp irq index in name by one
- drv: ena: avoid double-free clearing stale tx_info->xdpf value
- drv: pds_core: fix pdsc_check_pci_health deadlock
Previous releases - always broken:
- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
- bluetooth: fix setsockopt not validating user input
- af_unix: clear stale u->oob_skb.
- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
- drv: virtio_net: fix guest hangup on invalid RSS update
- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
- dsa: mt7530: trap link-local frames regardless of ST Port State
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmYXyoQSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk72wQAJJ9DQra9b/8S3Zla1dutBcznSCxruas
vWrpgIZiT3Aw5zUmZUZn+rNP8xeWLBK78Yv4m236B8/D3Kji2uMbVrjUAApBBcHr
/lLmctZIhDHCoJCYvRTC/VOVPuqbbbmxOmx6rNvry93iNAiHAnBdCUlOYzMNPzJz
6XIGtztFP0ICtt9owFtQRnsPeKhZJ5DoxqgE9KS2Pmb9PU99i1bEShpwLwB5I83S
yTHKUY5W0rknkQZTW1gbv+o3dR0iFy7LZ+1FItJ/UzH0bG6JmcqzSlH5mZZJCc/L
5LdUwtwMmKG2Kez/vKr1DAwTeAyhwVU+d+Hb28QXiO0kAYbjbOgNXse1st3RwDt5
YKMKlsmR+kgPYLcvs9df2aubNSRvi2utwIA2kuH33HxBYF5PfQR5PTGeR21A+cKo
wvSit8aMaGFTPJ7rRIzkNaPdIHSvPMKYcXV/T8EPvlOHzi5GBX0qHWj99JO9Eri+
VFci+FG3HCPHK8v683g/WWiiVNx/IHMfNbcukes1oDFsCeNo7KZcnPY+zVhtdvvt
QBnvbAZGKeDXMbnHZLB3DCR3ENHWTrJzC3alLDp3/uFC79VKtfIRO2wEX3gkrN8S
JHsdYU13Yp1ERaNjUeq7Sqk2OGLfsBt4HSOhcK8OPPgE5rDRON5UPjkuNvbaEiZY
Morzaqzerg1B
=a9bB
-----END PGP SIGNATURE-----
Merge tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Current release - new code bugs:
- netfilter: complete validation of user input
- mlx5: disallow SRIOV switchdev mode when in multi-PF netdev
Previous releases - regressions:
- core: fix u64_stats_init() for lockdep when used repeatedly in one
file
- ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
- bluetooth: fix memory leak in hci_req_sync_complete()
- batman-adv: avoid infinite loop trying to resize local TT
- drv: geneve: fix header validation in geneve[6]_xmit_skb
- drv: bnxt_en: fix possible memory leak in
bnxt_rdma_aux_device_init()
- drv: mlx5: offset comp irq index in name by one
- drv: ena: avoid double-free clearing stale tx_info->xdpf value
- drv: pds_core: fix pdsc_check_pci_health deadlock
Previous releases - always broken:
- xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
- bluetooth: fix setsockopt not validating user input
- af_unix: clear stale u->oob_skb.
- nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
- drv: virtio_net: fix guest hangup on invalid RSS update
- drv: mlx5e: Fix mlx5e_priv_init() cleanup flow
- dsa: mt7530: trap link-local frames regardless of ST Port State"
* tag 'net-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: ena: Set tx_info->xdpf value to NULL
net: ena: Fix incorrect descriptor free behavior
net: ena: Wrong missing IO completions check order
net: ena: Fix potential sign extension issue
af_unix: Fix garbage collector racing against connect()
net: dsa: mt7530: trap link-local frames regardless of ST Port State
Revert "s390/ism: fix receive message buffer allocation"
net: sparx5: fix wrong config being used when reconfiguring PCS
net/mlx5: fix possible stack overflows
net/mlx5: Disallow SRIOV switchdev mode when in multi-PF netdev
net/mlx5e: RSS, Block XOR hash with over 128 channels
net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
net/mlx5e: Fix mlx5e_priv_init() cleanup flow
net/mlx5e: RSS, Block changing channels number when RXFH is configured
net/mlx5: Correctly compare pkt reformat ids
net/mlx5: Properly link new fs rules into the tree
net/mlx5: offset comp irq index in name by one
net/mlx5: Register devlink first under devlink lock
net/mlx5: E-switch, store eswitch pointer before registering devlink_param
...
The most important fix is the sg one because the regression it fixes
(spurious warning and use after final put) is already backported to
stable. The next biggest impact is the target fix for wrong
credentials used to load a module because it's affecting new kernels
installed on selinux based distributions. The other three fixes are
an obvious off by one and SATA protocol issues.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZhfZJyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaOtAQCADxSy
dAO9ygAEEoz25FXPUvJquzBhwuiFMy878fdd7AD8DNYBs7p5mo+1omWqpLpa0l6Z
3ZXOBQ6JiuDOx6iKSiQ=
=+qtC
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The most important fix is the sg one because the regression it fixes
(spurious warning and use after final put) is already backported to
stable.
The next biggest impact is the target fix for wrong credentials used
to load a module because it's affecting new kernels installed on
selinux based distributions.
The other three fixes are an obvious off by one and SATA protocol
issues"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
scsi: hisi_sas: Handle the NCQ error returned by D2H frame
scsi: target: Fix SELinux error when systemd-modules loads the target module
scsi: sg: Avoid race in error handling & drop bogus warn
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmYXUTAWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImeuw1D/9unmPbKPdYGT8sfLAjr/xfR6dK
lyXCBQAoa7mazMjyV+iCjbWYOloMXDsuJ5jaFYU1RFEXLRez7pTI8RAQuGOuS6ql
liBlZ7zbyo/SRs6GMR0eAGVzNYyKHH/xyqGGmwlQabkZt0yqOXAU/Q2FTNRkzN1w
peTILEi2MLPZjJzJx6iQt2QDcWHSpqQBWwNQSLoNnSmKkbSwgMEAGqYArJ6w1OPC
KA513Ct7Lk+Qbbbj4JNHSrZvXcR7sqywiJQHK90MTXlOx6Yha+ymCu33iZf5ryJe
V8RFJGBLfFj8fywk272skuX5XnhXsK14ej6o1R6lp0g8lqKr1jWnpVC0f2+kbChN
P2XefOe85cXOkt8TqjuTGchg2P95xBGG59xavQo2y9mjtvZG+YyWJsjQCUR7ppEV
fZ70h+vBsIVAikEKIfEahruk2x5OU+zhzs+OdXy/Har9JLd8sKymNHbfne0RV7Lq
BKSFeBBGSSPyxL4Hr5Oo0pnowsY29rBWM6mKEVDvkREptkU8nxJdJ07gI0n/cfWQ
AD3Z7Op6bEU2oOU2oUk1L2471+p2ozjdU+YNCZKz3RNiciuTVgLkeIxwjbRYDojz
cV7jD84LGEoSgHdJNE5F3URF3ncX64k8ARS05KCnCVeCz9ko6hTPs5wcKLwMuKG9
GePDGTqGU7H9OGA2qg==
=tiT2
-----END PGP SIGNATURE-----
Merge tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- make {virt, phys, page, pfn} translation work with KFENCE for
LoongArch (otherwise NVMe and virtio-blk cannot work with KFENCE
enabled)
- update dts files for Loongson-2K series to make devices work
correctly
- fix a build error
* tag 'loongarch-fixes-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Include linux/sizes.h in addrspace.h to prevent build errors
LoongArch: Update dts for Loongson-2K2000 to support GMAC/GNET
LoongArch: Update dts for Loongson-2K2000 to support PCI-MSI
LoongArch: Update dts for Loongson-2K2000 to support ISA/LPC
LoongArch: Update dts for Loongson-2K1000 to support ISA/LPC
LoongArch: Make virt_addr_valid()/__virt_addr_valid() work with KFENCE
LoongArch: Make {virt, phys, page, pfn} translation work with KFENCE
mm: Move lowmem_page_address() a little later
Notable user impacting bugs
- On multi device filesystems, recovery was looping in
btree_trans_too_many_iters(). This checks if a transaction has touched
too many btree paths (because of iteration over many keys), and isuses
a restart to drop unneeded paths. But it's now possible for some paths
to exceed the previous limit without iteration in the interior btree
update path, since the transaction commit will do alloc updates for
every old and new btree node, and during journal replay we don't use
the btree write buffer for locking reasons and thus those updates use
btree paths when they wouldn't normally.
- Fix a corner case in rebalance when moving extents on a durability=0
device. This wouldn't be hit when a device was formatted with
durability=0 since in that case we'll only use it as a write through
cache (only cached extents will live on it), but durability can now be
changed on an existing device.
- bch2_get_acl() could rarely forget to handle a transaction restart;
this manifested as the occasional missing acl that came back after
dropping caches.
- Fix a major performance regression on high iops multithreaded write
workloads (only since 6.9-rc1); a previous fix for a deadlock in the
interior btree update path to check the journal watermark introduced a
dependency on the state of btree write buffer flushing that we didn't
want.
- Assorted other repair paths and recovery fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmYXTd8ACgkQE6szbY3K
bna4MA//Y/CSB2JupxJPFUAb69+WmNDMnJJV3FlD/Hwo19kOR7aRbKQMUxsH51nb
dfv5o/58n39QIBqMcMtTipnoND6jrwv7l5NimFKQmj/YehxosdsOf2BgbD/M4Ozz
84IUmHXSs5+zezjF8IAw/bjR/p13XNJGSYTfl1RWbGUMERLpfZcLn70FvoCR/qpQ
Bcp+z70K6bBhrPZqFYd2mEC+Cfo42aCD1lqWUQ/e0FHiNEZnCNH2lNca4phBzGt4
f9sBxBcwmDfizQxqpyZ4izRzbS9ZwZ3ega336L2DrPpwgjMgTRLKLjdXqJDjGvDW
ngvnaUw7SgICs+q48g3f67cGhw/4lSPdVu/9a/ldqDEP9PynQiUS1G8aDofLdboW
xCZM6toX86p0DMNY2kP9vc5kxr377cSXAL7VeKbE+ZV5vGyEz1qFDLRAYVixHDfr
Q7KgYvoJq8CgjfK7rIZbO2tqKhz2TP4+2rJa6tDLwKXs5ice++w/aF5d/nBZ7iCe
+dyJ+aJiosjHEVG+QscACFrjBJdNNspJqBWDP396XeOsdl+iCeIRP0VHOGytLLRf
gisE0Lhj4pz6bv7OhAAkdxvegVQ8HfqN1E+f/WM7Kogqos0NS2skEhy9cTuDCHUm
qtPTUq5XNibiE6J+NOK86pu6o+6sqpWIpfTPuTbMid5sevLsomE=
=BQC0
-----END PGP SIGNATURE-----
Merge tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs
Pull more bcachefs fixes from Kent Overstreet:
"Notable user impacting bugs
- On multi device filesystems, recovery was looping in
btree_trans_too_many_iters(). This checks if a transaction has
touched too many btree paths (because of iteration over many keys),
and isuses a restart to drop unneeded paths.
But it's now possible for some paths to exceed the previous limit
without iteration in the interior btree update path, since the
transaction commit will do alloc updates for every old and new
btree node, and during journal replay we don't use the btree write
buffer for locking reasons and thus those updates use btree paths
when they wouldn't normally.
- Fix a corner case in rebalance when moving extents on a
durability=0 device. This wouldn't be hit when a device was
formatted with durability=0 since in that case we'll only use it as
a write through cache (only cached extents will live on it), but
durability can now be changed on an existing device.
- bch2_get_acl() could rarely forget to handle a transaction restart;
this manifested as the occasional missing acl that came back after
dropping caches.
- Fix a major performance regression on high iops multithreaded write
workloads (only since 6.9-rc1); a previous fix for a deadlock in
the interior btree update path to check the journal watermark
introduced a dependency on the state of btree write buffer flushing
that we didn't want.
- Assorted other repair paths and recovery fixes"
* tag 'bcachefs-2024-04-10' of https://evilpiepirate.org/git/bcachefs: (25 commits)
bcachefs: Fix __bch2_btree_and_journal_iter_init_node_iter()
bcachefs: Kill read lock dropping in bch2_btree_node_lock_write_nofail()
bcachefs: Fix a race in btree_update_nodes_written()
bcachefs: btree_node_scan: Respect member.data_allowed
bcachefs: Don't scan for btree nodes when we can reconstruct
bcachefs: Fix check_topology() when using node scan
bcachefs: fix eytzinger0_find_gt()
bcachefs: fix bch2_get_acl() transaction restart handling
bcachefs: fix the count of nr_freed_pcpu after changing bc->freed_nonpcpu list
bcachefs: Fix gap buffer bug in bch2_journal_key_insert_take()
bcachefs: Rename struct field swap to prevent macro naming collision
MAINTAINERS: Add entry for bcachefs documentation
Documentation: filesystems: Add bcachefs toctree
bcachefs: JOURNAL_SPACE_LOW
bcachefs: Disable errors=panic for BCH_IOCTL_FSCK_OFFLINE
bcachefs: Fix BCH_IOCTL_FSCK_OFFLINE for encrypted filesystems
bcachefs: fix rand_delete unit test
bcachefs: fix ! vs ~ typo in __clear_bit_le64()
bcachefs: Fix rebalance from durability=0 device
bcachefs: Print shutdown journal sequence number
...
The page has been marked clean before writepage is called. If we don't
redirty it before postponing the write, it might never get written.
Cc: stable@vger.kernel.org
Fixes: 503d4fa6ee ("ceph: remove reliance on bdi congestion")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
RING_CONTEXT_CONTROL is a masked register.
v2: Also clean up setting register value (Lucas)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240404161256.3852502-1-ashutosh.dixit@intel.com
(cherry picked from commit dc30c6e714)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Addressing potential overflow in result of multiplication of two lower
precision (u32) operands before widening it to higher precision
(u64).
-v2
Fix commit message and description. (Rodrigo)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240401175300.3823653-1-himal.prasad.ghimiray@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 34820967ae)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>