Merge tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"pmdomain core:
- Add GENPD_FLAG_DEV_NAME_FW flag to generate unique names
pmdomain providers:
- arm: Use FLAG_DEV_NAME_FW to ensure unique names
- imx93-blk-ctrl: Fix the remove path
arm_scmi/qcom-cpucp:
- Report duplicate OPPs as firmware bugs for arm_scmi
- Skip OPP duplicates for arm_scmi
- Mark the qcom-cpucp mailbox irq with IRQF_NO_SUSPEND flag"
* tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
firmware: arm_scmi: Report duplicate opps as firmware bugs
firmware: arm_scmi: Skip opp duplicates
pmdomain: imx93-blk-ctrl: correct remove path
pmdomain: arm: Use FLAG_DEV_NAME_FW to ensure unique names
pmdomain: core: Add GENPD_FLAG_DEV_NAME_FW flag
Merge tag 'sound-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few last-minute fixes. All changes are device-specific small fixes
that should be pretty safe to apply"
* tag 'sound-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - update set GPIO3 to default for Thinkpad with ALC1318
ALSA: hda/realtek: fix mute/micmute LEDs for a HP EliteBook 645 G10
ALSA: hda/realtek - Fixed Clevo platform headset Mic issue
ALSA: usb-audio: Fix Yamaha P-125 Quirk Entry
ASoC: max9768: Fix event generation for playback mute
ASoC: intel: sof_sdw: add quirk for Dell SKU
ASoC: audio-graph-card2: Purge absent supplies for device tree nodes
Merge tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression in the MIPS CRC32C code"
* tag 'v6.12-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: mips/crc32 - fix the CRC32C implementation
Merge tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fix from Tejun Heo:
"One more fix for v6.12-rc7
ops.cpu_acquire() was being invoked with the wrong kfunc mask allowing
the operation to call kfuncs which shouldn't be allowed. Fix it by
using SCX_KF_REST instead, which is trivial and low risk"
* tag 'sched_ext-for-6.12-rc7-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: ops.cpu_acquire() should be called with SCX_KF_REST
Merge tag 'for-6.12-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"One more fix that seems urgent and good to have in 6.12 final.
It could potentially lead to unexpected transaction aborts, due to
wrong comparison and order of processing of delayed refs"
* tag 'for-6.12-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix incorrect comparison for delayed refs
Regions will serve multiple purposes. First, with them we can decouple
ring/etc. object creation from registration / mapping of the memory they
will be placed in. We already have hacks that allow putting both the SQ and
CQ into the same huge page; in the future we should be able to do:
region = create_region(io_ring);
create_pbuf_ring(io_uring, region, offset=0);
create_pbuf_ring(io_uring, region, offset=N);
The second use case is efficiently passing parameters. The following
patch re-enables IORING_ENTER_EXT_ARG_REG on top of regions, which
optimises wait arguments. It will also be useful for request arguments
replacing iovecs, msghdr, etc. pointers, and eventually it would be
handy for BPF as well if that comes to fruition.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0798cf3a14fad19cfc96fc9feca5f3e11481691d.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We've got a good number of mappings we share with userspace, which
includes the main rings, provided buffer rings, upcoming rings for
zerocopy rx, and more. All of them duplicate user argument parsing and
some internal details as well (page pinning, huge page optimisations,
mmap'ing, etc.).
Introduce a notion of regions. For userspace for now it's just a new
structure called struct io_uring_region_desc which is supposed to
parameterise all such mapping / queue creations. A region either
represents a user provided chunk of memory, in which case the user_addr
field should point to it, or a request for the kernel to allocate the
memory, in which case the user would need to mmap it after using the
offset returned in the mmap_offset field. With a uniform userspace API
we can avoid additional boilerplate code and apply future optimisations
to all of them at once.
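As a rough illustration only, the descriptor could look something like the sketch below; user_addr and mmap_offset are named above, while the remaining fields and the exact layout are assumptions.
```
#include <linux/types.h>

/* Hedged sketch of the userspace descriptor; only user_addr and
 * mmap_offset come from the description above, everything else is an
 * assumption. */
struct io_uring_region_desc {
	__u64 user_addr;	/* user provided memory, if any */
	__u64 size;		/* size of the region in bytes */
	__u32 flags;		/* e.g. "memory is user provided" */
	__u32 __resv1;
	__u64 mmap_offset;	/* set by the kernel for a later mmap() */
	__u64 __resv2[4];
};
```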
Internally, there is a new structure struct io_mapped_region holding all
relevant runtime information and some helpers to work with it. This
patch limits it to user provided regions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0e6fe25818dfbaebd1bd90b870a6cac503fe1a24.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We're a bit too frivolous with the types of nr_pages arguments, converting
them to long and back to int, passing an unsigned int pointer as an int
pointer, and so on. It shouldn't cause any problems, but it should be
carefully reviewed; until then, let's add a WARN_ON_ONCE check to be more
confident that callers don't pass poorly checked arguments.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d48e0c097cbd90fb47acaddb6c247596510d8cfc.1731689588.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use CLASS(fd) to get the file for sync message ring requests, rather
than open-code the file retrieval dance.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/r/20241115034902.GP3387508@ZenIV
[axboe: make a more coherent commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For unbound workqueues, pwqs usually map to just a few pools. Most of
the time, pwqs are linked sequentially to the wq->pwqs list by CPU
index. Usually, consecutive CPUs have the same workqueue attributes
(e.g. they belong to the same NUMA node). This makes pwqs with the same
pool cluster together in the pwq list.
Only do lock/unlock if the pool has changed in flush_workqueue_prep_pwqs().
This reduces the number of expensive lock operations.
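A minimal sketch of the idea, assuming the usual pwq iteration in flush_workqueue_prep_pwqs() (variable names approximate, not the exact kernel code):
```
	struct worker_pool *last_pool = NULL;
	struct pool_workqueue *pwq;

	for_each_pwq(pwq, wq) {
		struct worker_pool *pool = pwq->pool;

		/* Only switch locks when the pool actually changes. */
		if (pool != last_pool) {
			if (last_pool)
				raw_spin_unlock_irq(&last_pool->lock);
			raw_spin_lock_irq(&pool->lock);
			last_pool = pool;
		}
		/* ... examine the pwq under pool->lock ... */
	}
	if (last_pool)
		raw_spin_unlock_irq(&last_pool->lock);
```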
The performance data shows this change boosts FIO by 65x in some cases
when multiple concurrent threads write to xfs mount points with fsync.
FIO Benchmark Details
- FIO version: v3.35
- FIO Options: ioengine=libaio,iodepth=64,norandommap=1,rw=write,
size=128M,bs=4k,fsync=1
- FIO Job Configs: 64 jobs in total writing to 4 mount points (ramdisks
formatted as xfs file system).
- Kernel Codebase: v6.12-rc5
- Test Platform: Xeon 8380 (2 sockets)
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
When CONFIG_CMDLINE_EXTEND is set, the core kernel command line handling
logic appends CONFIG_CMDLINE to the bootloader provided command line.
The EFI stub does the opposite, and parses the builtin one first.
The usual behavior of command line options is that the last one takes
precedence if it appears multiple times, unless there is a meaningful
way to combine them. In either case, having the stub parse the builtin
command line first while the core kernel parses it last is likely to
produce inconsistent results.
Therefore, switch the order in the stub to match the core kernel.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Kexec bypasses EFI's switch to virtual mode. In exchange, it has its own
routine, kexec_enter_virtual_mode(), which replays the mappings made by
the original kernel. Unfortunately, that function fails to reinstate
EFI's memory attributes, which would've otherwise been set after
entering virtual mode. Remediate this by calling
efi_runtime_update_mappings() within kexec's routine.
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Drop support for the EFI_PROPERTIES_TABLE. It was a failed, short-lived
experiment that broke the boot both on Linux and Windows, and was
replaced by the EFI_MEMORY_ATTRIBUTES_TABLE shortly after.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nicolas Saenz Julienne <nsaenz@amazon.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
A previous commit changed how requests are linked in the plug structure,
but unlike the previous method, it uses a new type for it rather than
struct request. The latter is available even for !CONFIG_BLOCK, while
struct rq_list is not. Move its definition outside the CONFIG_BLOCK guard.
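For reference, a sketch of the type being moved, assuming the definition introduced by the referenced commit:
```
/* Hedged sketch: the list type has no CONFIG_BLOCK dependency of its
 * own, so it can live outside the guard. */
struct rq_list {
	struct request *head;
	struct request *tail;
};
```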
Reported-by: Nathan Chancellor <nathan@kernel.org>
Fixes: a3396b9999 ("block: add a rq_list type")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In cesa/cipher.c, most declarations of struct mv_cesa_op_ctx are uninitialized.
This causes one of the values in the struct to be left uninitialized in later
uses.
This patch fixes it by adding initializations in the same way it is done in
cesa/hash.c.
Fixes errors discovered in coverity: 1600942, 1600939, 1600935, 1600934, 1600929, 1600927,
1600925, 1600921, 1600920, 1600919, 1600915, 1600914
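A minimal sketch of the kind of change, mirroring what cesa/hash.c already does (the variable name is an assumption):
```
	/* Hedged sketch: zero-initialize the on-stack op context so no
	 * field carries stack garbage into later uses. */
	struct mv_cesa_op_ctx tmpl = { };
```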
Signed-off-by: Karol Przybylski <karprzy7@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If do_cpt_init() fails, a previous dma_alloc_coherent() call needs to be
undone.
Add the needed dma_free_coherent() before returning.
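A hedged sketch of the error path; only dma_free_coherent() itself comes from the description, the surrounding names are assumptions:
```
	ret = do_cpt_init(cpt, mcode);
	if (ret) {
		/* Hedged sketch: undo the earlier dma_alloc_coherent()
		 * before bailing out (exact arguments assumed). */
		dma_free_coherent(&cpt->pdev->dev, mcode->code_size,
				  mcode->code, mcode->phys_base);
		goto error;
	}
```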
Fixes: 9e2c7d9994 ("crypto: cavium - Add Support for Octeon-tx CPT Engine")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch reverts commit 0fbafd06bd
("crypto: aesni - fix failing setkey for rfc4106-gcm-aesni") by
moving the aesni init function back to module_init from late_initcall.
The original patch was needed because tests were synchronous. This
is no longer the case so there is no need to postpone the registration.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This function is part of the exposed API and should be exported.
Otherwise a modular user would fail to build, e.g., crypto/rsa.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A hwcap feature bit is passed to cpu_has_feature, resulting in testing
for CPU_FTR_MMCRA instead of the 3.1 platform revision.
Fixes: c954b252de ("crypto: powerpc/p10-aes-gcm - Register modules as SIMD")
Reported-by: Nicolai Stange <nstange@suse.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 62f8f307c80e ("powerpc/64: Remove maple platform") removes the
PPC_MAPLE config as a consequence of the platform’s removal.
The config definition of HW_RANDOM_AMD refers to this removed config option
in its dependencies.
Remove the reference to the removed config option.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CRC-T10DIF algorithm produces a 16-bit CRC, and this is reflected in
the folding coefficients, which are also only 16 bits wide.
This means that the polynomial multiplications involving these
coefficients can be performed using 8-bit long polynomial multiplication
(8x8 -> 16) in only a few steps, and this is an instruction that is part
of the base NEON ISA, which is all most real ARMv7 cores implement. (The
64-bit PMULL instruction is part of the crypto extensions, which are
only implemented by 64-bit cores.)
The final reduction is a bit more involved, but we can delegate that to
the generic CRC-T10DIF implementation after folding the entire input
into a 16 byte vector.
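As a rough illustration of the building block (not the driver code itself), the base-NEON 8x8 -> 16 polynomial multiply is available as an intrinsic:
```
#include <arm_neon.h>

/* Hedged sketch: one 8x8 -> 16 carryless multiply step via vmull.p8,
 * which is part of the base NEON ISA. Since the folding coefficients
 * are only 16 bits wide, a few such steps are enough. */
static poly16x8_t fold_step(poly8x8_t data, poly8x8_t coeff)
{
	return vmull_p8(data, coeff);
}
```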
This results in a speedup of around 6.6x on Cortex-A72 running in 32-bit
mode. On Cortex-A8 (BeagleBone White), the results are substantially
better than that, but not sufficiently reproducible (with tcrypt) to
quote a number here.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
To allow an alternative version to be created of the PMULL based
CRC-T10DIF algorithm, turn the bulk of it into a macro, except for the
final reduction, which will only be used by the existing version.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The only remaining user of the fallback implementation of 64x64
polynomial multiplication using 8x8 PMULL instructions is the final
reduction from a 16 byte vector to a 16-bit CRC.
The fallback code is complicated and messy, and this reduction has
little impact on the overall performance, so instead, let's calculate
the final CRC by passing the 16 byte vector to the generic CRC-T10DIF
implementation when running the fallback version.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The CRC-T10DIF implementation for arm64 has a version that uses 8x8
polynomial multiplication, for cores that lack the crypto extensions,
which cover the 64x64 polynomial multiplication instruction that the
algorithm was built around.
This fallback version rather naively adopted the 64x64 polynomial
multiplication algorithm that I ported from ARM for the GHASH driver,
which needs 8 PMULL8 instructions to implement one PMULL64. This is
reasonable, given that each 8-bit vector element needs to be multiplied
with each element in the other vector, producing 8 vectors with partial
results that need to be combined to yield the correct result.
However, most PMULL64 invocations in the CRC-T10DIF code involve
multiplication by a pair of 16-bit folding coefficients, and so all the
partial results from higher order bytes will be zero, and there is no
need to calculate them to begin with.
Then, the CRC-T10DIF algorithm always XORs the output values of the
PMULL64 instructions being issued in pairs, and so there is no need to
faithfully implement each individual PMULL64 instruction, as long as
XORing the results pairwise produces the expected result.
Implementing these improvements results in a speedup of 3.3x on low-end
platforms such as Raspberry Pi 4 (Cortex-A72).
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This is a partial revert of commit fc754c024a, which moved into C code
the logic that ensures kernel mode NEON code does not hog the CPU for
too long.
This is no longer needed now that kernel mode NEON no longer disables
preemption, so we can drop it.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ahash_init functions may fail, and ahash_hmac_init should not
return success when ahash_init returns an error. For example, ahash_init
will return -ENOMEM when memory allocation fails.
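A minimal sketch of the fix, assuming the driver's ahash_hmac_init()/ahash_init() pair (the HMAC-specific body is elided):
```
static int ahash_hmac_init(struct ahash_request *req)
{
	/* Hedged sketch: propagate ahash_init()'s error instead of
	 * unconditionally returning 0. */
	int ret = ahash_init(req);

	if (ret)
		return ret;

	/* ... HMAC-specific context setup ... */
	return 0;
}
```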
Fixes: 9d12ba86f8 ("crypto: brcm - Add Broadcom SPU driver")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
caam_rsa_set_priv_key_form() did not check for memory allocation errors.
Add the missing checks.
Fixes: 52e26d77b8 ("crypto: caam - add support for RSA key form 2")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Gaurav Jain <gaurav.jain@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are reports [0] of cases where a corrupt EFI Memory Attributes
Table leads to out of memory issues at boot because the descriptor size
and entry count in the table header are still used to reserve the entire
table in memory, even though the resulting region is gigabytes in size.
Given that the EFI Memory Attributes Table is supposed to carry up to 3
entries for each EfiRuntimeServicesCode region in the EFI memory map,
and given that there is no reason for the descriptor size used in the
table to exceed the one used in the EFI memory map, 3x the size of the
entire EFI memory map is a reasonable upper bound for the size of this
table. This means that sizes exceeding that are highly likely to be
based on corrupted data, and the table should just be ignored instead.
[0] https://bugzilla.suse.com/show_bug.cgi?id=1231465
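A hedged sketch of the sanity check; the variable names and the exact message are assumptions:
```
	/* Hedged sketch: a well-formed table needs at most ~3 entries per
	 * EfiRuntimeServicesCode region, so anything larger than 3x the
	 * EFI memory map is treated as corrupted and ignored. */
	if (tbl_size > 3 * efi_memmap_size) {
		pr_warn(FW_BUG "EFI Memory Attributes Table too large, ignoring\n");
		return 0;
	}
```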
Cc: Gregory Price <gourry@gourry.net>
Cc: Usama Arif <usamaarif642@gmail.com>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Acked-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/all/20240912155159.1951792-2-ardb+git@google.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
EFI zboot no longer uses LoadImage/StartImage, but subsumes the arch
code to load and start the bare metal image directly. Fix the Kconfig
description accordingly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
cmdline_ptr is an out parameter, which is not allocated by the function
itself, and likely points into the caller's stack.
cmdline refers to the pool allocation that should be freed when cleaning
up after a failure, so pass this instead to free_pool().
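A minimal sketch of the change (the label name is an assumption; efi_bs_call() is the stub's boot-services wrapper):
```
fail_free_cmdline:
	/* Hedged sketch: free the pool allocation, not the caller's
	 * out-parameter. */
	efi_bs_call(free_pool, cmdline);	/* was: cmdline_ptr */
```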
Fixes: 42c8ea3dca ("efi: libstub: Factor out EFI stub entrypoint ...")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Syzbot has reported the following BUG:
kernel BUG at fs/ocfs2/uptodate.c:509!
...
Call Trace:
<TASK>
? __die_body+0x5f/0xb0
? die+0x9e/0xc0
? do_trap+0x15a/0x3a0
? ocfs2_set_new_buffer_uptodate+0x145/0x160
? do_error_trap+0x1dc/0x2c0
? ocfs2_set_new_buffer_uptodate+0x145/0x160
? __pfx_do_error_trap+0x10/0x10
? handle_invalid_op+0x34/0x40
? ocfs2_set_new_buffer_uptodate+0x145/0x160
? exc_invalid_op+0x38/0x50
? asm_exc_invalid_op+0x1a/0x20
? ocfs2_set_new_buffer_uptodate+0x2e/0x160
? ocfs2_set_new_buffer_uptodate+0x144/0x160
? ocfs2_set_new_buffer_uptodate+0x145/0x160
ocfs2_group_add+0x39f/0x15a0
? __pfx_ocfs2_group_add+0x10/0x10
? __pfx_lock_acquire+0x10/0x10
? mnt_get_write_access+0x68/0x2b0
? __pfx_lock_release+0x10/0x10
? rcu_read_lock_any_held+0xb7/0x160
? __pfx_rcu_read_lock_any_held+0x10/0x10
? smack_log+0x123/0x540
? mnt_get_write_access+0x68/0x2b0
? mnt_get_write_access+0x68/0x2b0
? mnt_get_write_access+0x226/0x2b0
ocfs2_ioctl+0x65e/0x7d0
? __pfx_ocfs2_ioctl+0x10/0x10
? smack_file_ioctl+0x29e/0x3a0
? __pfx_smack_file_ioctl+0x10/0x10
? lockdep_hardirqs_on_prepare+0x43d/0x780
? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
? __pfx_ocfs2_ioctl+0x10/0x10
__se_sys_ioctl+0xfb/0x170
do_syscall_64+0xf3/0x230
entry_SYSCALL_64_after_hwframe+0x77/0x7f
...
</TASK>
When 'ioctl(OCFS2_IOC_GROUP_ADD, ...)' has failed for the particular
inode in 'ocfs2_verify_group_and_input()', the corresponding buffer head
remains cached, and a subsequent call to the same 'ioctl()' for the same
inode triggers the BUG() in 'ocfs2_set_new_buffer_uptodate()' (when trying
to cache the same buffer head of that inode). Fix this by uncaching
the buffer head with 'ocfs2_remove_from_cache()' on the error path in
'ocfs2_group_add()'.
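A hedged sketch of the error path addition in ocfs2_group_add(); the label and variable names are assumptions:
```
out_free_group_bh:
	/* Hedged sketch: drop the buffer head from the uptodate cache on
	 * failure so a retried ioctl() does not hit the BUG() in
	 * ocfs2_set_new_buffer_uptodate(). */
	if (ret < 0)
		ocfs2_remove_from_cache(INODE_CACHE(inode), group_bh);
	brelse(group_bh);
```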
Link: https://lkml.kernel.org/r/20241114043844.111847-1-dmantipov@yandex.ru
Fixes: 7909f2bf83 ("[PATCH 2/2] ocfs2: Implement group add for online resize")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reported-by: syzbot+453873f1588c2d75b447@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=453873f1588c2d75b447
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Dmitry Antipov <dmantipov@yandex.ru>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We triggered a NULL pointer dereference for ac.preferred_zoneref->zone in
alloc_pages_bulk_noprof() when the task is migrated between cpusets.
When cpuset is enabled, in prepare_alloc_pages(), ac->nodemask may be
&current->mems_allowed. When first_zones_zonelist() is called to find the
preferred_zoneref, ac->nodemask may be modified concurrently if the
task is migrated between different cpusets. Assuming we have 2 NUMA nodes,
when traversing Node1 in ac->zonelist the nodemask is 2, and when
traversing Node2 in ac->zonelist the nodemask is 1. As a result,
ac->preferred_zoneref points to a NULL zone.
In alloc_pages_bulk_noprof(), for_each_zone_zonelist_nodemask() finds an
allowable zone and calls zonelist_node_idx(ac.preferred_zoneref), leading
to a NULL pointer dereference.
__alloc_pages_noprof() fixes this issue by checking NULL pointer in commit
ea57485af8 ("mm, page_alloc: fix check for NULL preferred_zone") and
commit df76cee6bb ("mm, page_alloc: remove redundant checks from alloc
fastpath").
To fix it, check NULL pointer for preferred_zoneref->zone.
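A minimal sketch of the check, assuming the existing 'failed' fallback label in alloc_pages_bulk_noprof():
```
	/* Hedged sketch: if the concurrently modified nodemask left us
	 * without a preferred zone, fall back to the regular path. */
	if (unlikely(!ac.preferred_zoneref->zone))
		goto failed;
```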
Link: https://lkml.kernel.org/r/20241113083235.166798-1-tujinjiang@huawei.com
Fixes: 387ba26fb1 ("mm/page_alloc: add a bulk page allocator")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
MADV_HUGEPAGE is a new addition to readahead with behavior distinct from
normal pages. To prevent confusion, we should update the documentation
accordingly.
Link: https://lkml.kernel.org/r/20241113150711.1685-1-laoar.shao@gmail.com
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The "arg->vec_len" variable is a u64 that comes from the user at the start
of the function. The "arg->vec_len * sizeof(struct page_region))"
multiplication can lead to integer wrapping. Use size_mul() to avoid
that.
Also the size_add/mul() functions work on unsigned long so for 32bit
systems we need to ensure that "arg->vec_len" fits in an unsigned long.
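A hedged sketch of the overflow-safe computation; the surrounding code and names are assumptions, and size_mul() is the helper from overflow.h:
```
	/* Hedged sketch: reject vec_len values that don't fit in an
	 * unsigned long, then let size_mul() saturate rather than wrap. */
	if (arg->vec_len > ULONG_MAX)
		return -EINVAL;
	buf_size = size_mul((unsigned long)arg->vec_len,
			    sizeof(struct page_region));
```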
Link: https://lkml.kernel.org/r/39d41335-dd4d-48ed-8a7f-402c57d8ea84@stanley.mountain
Fixes: 52526ca7fd ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware.
On these machines, the kernel refuses to boot from non-zero
PHYSICAL_START, which occurs when CRASH_DUMP is on.
Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
default to off for them. Users booting via some other mechanism can still
turn it on explicitly.
Does not change the default on any other architectures for the
time being.
Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca
Fixes: 75bc255a74 ("crash: clean up kdump related config items")
Signed-off-by: Dave Vasilevsky <dave@vasilevsky.ca>
Reported-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Closes: https://lists.debian.org/debian-powerpc/2024/07/msg00001.html
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On 32-bit platforms, it is possible for the expression `len + old_addr <
old_end` to be false-positive if `len + old_addr` wraps around.
`old_addr` is the cursor in the old range up to which page table entries
have been moved; so if the operation succeeded, `old_addr` is the *end* of
the old region, and adding `len` to it can wrap.
The overflow causes mremap() to mistakenly believe that PTEs have been
copied; the consequence is that mremap() bails out, but doesn't move the
PTEs back before the new VMA is unmapped, causing anonymous pages in the
region to be lost. So basically if userspace tries to mremap() a
private-anon region and hits this bug, mremap() will return an error and
the private-anon region's contents appear to have been zeroed.
The idea of this check is that `old_end - len` is the original start
address, and writing the check that way also makes it easier to read; so
fix the check by rearranging the comparison accordingly.
(An alternate fix would be to refactor this function by introducing an
"orig_old_start" variable or such.)
Tested in a VM with a 32-bit X86 kernel; without the patch:
```
user@horn:~/big_mremap$ cat test.c
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <err.h>
#include <sys/mman.h>
#define ADDR1 ((void*)0x60000000)
#define ADDR2 ((void*)0x10000000)
#define SIZE 0x50000000uL
int main(void) {
  unsigned char *p1 = mmap(ADDR1, SIZE, PROT_READ|PROT_WRITE,
        MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0);
  if (p1 == MAP_FAILED)
    err(1, "mmap 1");
  unsigned char *p2 = mmap(ADDR2, SIZE, PROT_NONE,
        MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED_NOREPLACE, -1, 0);
  if (p2 == MAP_FAILED)
    err(1, "mmap 2");
  *p1 = 0x41;
  printf("first char is 0x%02hhx\n", *p1);
  unsigned char *p3 = mremap(p1, SIZE, SIZE,
        MREMAP_MAYMOVE|MREMAP_FIXED, p2);
  if (p3 == MAP_FAILED) {
    printf("mremap() failed; first char is 0x%02hhx\n", *p1);
  } else {
    printf("mremap() succeeded; first char is 0x%02hhx\n", *p3);
  }
}
user@horn:~/big_mremap$ gcc -static -o test test.c
user@horn:~/big_mremap$ setarch -R ./test
first char is 0x41
mremap() failed; first char is 0x00
```
With the patch:
```
user@horn:~/big_mremap$ setarch -R ./test
first char is 0x41
mremap() succeeded; first char is 0x41
```
Link: https://lkml.kernel.org/r/20241111-fix-mremap-32bit-wrap-v1-1-61d6be73b722@google.com
Fixes: af8ca1c149 ("mm/mremap: optimize the start addresses in move_page_tables()")
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
scx_next_task_picked() has been replaced with switch_class(), but the
comment still references the old name, so update it.
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
Signed-off-by: Tejun Heo <tj@kernel.org>
Dan reported that after the rework, the newly introduced
scf_add_to_free_list() may get passed a NULL pointer. It replaced
kfree(), which was fine with a NULL pointer, but scf_add_to_free_list()
isn't.
Let scf_add_to_free_list() handle NULL pointer.
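A minimal sketch of the change; the type and field names are assumptions:
```
static void scf_add_to_free_list(struct scf_check *scfcp)
{
	/* Hedged sketch: tolerate NULL, as kfree() did. */
	if (!scfcp)
		return;
	llist_add(&scfcp->scf_node, &scf_free_pool);
}
```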
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/2375aa2c-3248-4ffa-b9b0-f0a24c50f237@stanley.mountain
Fixes: 4788c861ad ("scftorture: Use a lock-less list to free memory.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
There are two flags used to synchronize allocation and scanning with
swapoff: SWP_WRITEOK and SWP_SCANNING.
SWP_WRITEOK: Swapoff will first unset this flag; at that point any further
swap allocation or scanning on this device should just abort, so no more
new entries will reference this device. Swapoff will then unuse all
existing swap entries.
SWP_SCANNING: This flag is set while the device is being scanned. Swapoff
will wait for all scanners to stop before the final release of the swap
device structures to avoid UAF. Note this flag is the highest used bit of
si->flags, so it can be added up arithmetically if there are multiple
scanners.
commit 5f843a9a3a ("mm: swap: separate SSD allocation from
scan_swap_map_slots()") ignored SWP_SCANNING and SWP_WRITEOK flags while
separating cluster allocation path from the old allocation path. Add the
flags back to fix swapoff race. The race is hard to trigger as si->lock
prevents most parallel operations, but si->lock could be dropped for
reclaim or discard. This issue is found during code review.
This commit fixes the problem. For SWP_SCANNING, just like before, set
the flag before scanning and remove it afterwards.
For SWP_WRITEOK, there are several places where si->lock could be dropped;
it would be error-prone and make the code hard to follow if we tried to
cover these places one by one. So just do one check before the real
allocation, which is also very similar to before. With the new cluster
allocator it may waste a bit of time iterating the clusters, but that
won't take long, and swapoff is not performance sensitive.
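A rough sketch of the resulting shape (not the exact code):
```
	/* Hedged sketch: bail out if swapoff already revoked SWP_WRITEOK,
	 * and bracket the scan with SWP_SCANNING, which is added/removed
	 * arithmetically since it is the highest used bit of si->flags. */
	if (!(si->flags & SWP_WRITEOK))
		return false;

	si->flags += SWP_SCANNING;
	/* ... cluster scan / allocation, si->lock may be dropped ... */
	si->flags -= SWP_SCANNING;
```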
Link: https://lkml.kernel.org/r/20241112083414.78174-1-ryncsn@gmail.com
Fixes: 5f843a9a3a ("mm: swap: separate SSD allocation from scan_swap_map_slots()")
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Closes: https://lore.kernel.org/linux-mm/87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com/
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
ops.cpu_acquire() is currently called with a 0 kf_mask, which is interpreted
as SCX_KF_UNLOCKED and allows all unlocked kfuncs, but ops.cpu_acquire() is
called from balance_one() under the rq lock and should only be allowed to
call kfuncs that are safe under the rq lock. Update it to use SCX_KF_REST.
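A minimal sketch of the change in balance_one() (surrounding arguments assumed):
```
	/* Hedged sketch: pass SCX_KF_REST instead of 0 so only kfuncs that
	 * are safe under the rq lock may be called from ops.cpu_acquire(). */
	SCX_CALL_OP(SCX_KF_REST, cpu_acquire, cpu_of(rq), NULL);
```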
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Zhao Mengmeng <zhaomzhao@126.com>
Link: http://lkml.kernel.org/r/ZzYvf2L3rlmjuKzh@slm.duckdns.org
Fixes: 245254f708 ("sched_ext: Implement sched_ext_ops.cpu_acquire/release()")
Some recent proposed changes [1] in the deadline server code caused a
test failure in test_cpuset_prs.sh when a change is made to an isolated
partition. This is due to failing the cpuset_cpumask_can_shrink() check
for SCHED_DEADLINE tasks in validate_change().
This is actually a false positive as the failed test case involves an
isolated partition with load balancing disabled. The deadline check
is not meaningful in this case and the users should know what they
are doing.
Fix this by doing the cpuset_cpumask_can_shrink() check only when load
balancing is enabled. Also change its arguments to use effective_cpus
for the current cpuset and user_xcpus() as an approximation for the
target effective_cpus, as the real effective_cpus hasn't been fully
computed yet at this early stage.
As the check isn't comprehensive, there may be false positives or
negatives. We may have to revise the code to do a more thorough check
in the future if this becomes a concern.
[1] https://lore.kernel.org/lkml/82be06c1-6d6d-4651-86c9-bcc828cbcb80@redhat.com/T/#t
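A hedged sketch of the resulting condition in validate_change(); the exact structure is an assumption, while user_xcpus() and effective_cpus are the names mentioned above:
```
	/* Hedged sketch: skip the deadline shrink test for cpusets that
	 * are not load balanced (e.g. isolated partitions). */
	ret = -EBUSY;
	if (is_sched_load_balance(cur) &&
	    !cpuset_cpumask_can_shrink(cur->effective_cpus,
				       user_xcpus(trial)))
		goto out;
	ret = 0;
```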
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>