Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE.
To ensure all the values were properly initialized, switch to initialize
all of them to NUMA_NO_NODE.
Fixes: e189624916 ("arm64: numa: rework ACPI NUMA initialization")
Cc: <stable@vger.kernel.org> # 4.19.x
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Haibo Xu <haibo1.xu@intel.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.1722828421.git.haibo1.xu@intel.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y version of __get_mem_asm(), we
incorrectly use _ASM_EXTABLE_##type##ACCESS_ERR() such that upon a fault
the extable fixup handler writes -EFAULT into "%w0", which is the
register containing 'x' (the result of the load).
This was a thinko in commit:
86a6a68feb ("arm64: start using 'asm goto' for get_user() when available")
Prior to that commit _ASM_EXTABLE_##type##ACCESS_ERR_ZERO() was used
such that the extable fixup handler wrote -EFAULT into "%w0" (the
register containing 'err'), and zero into "%w1" (the register containing
'x'). When the 'err' variable was removed, the extable entry was updated
incorrectly.
Writing -EFAULT to the value register is unnecessary but benign:
* We never want -EFAULT in the value register, and previously this would
have been zeroed in the extable fixup handler.
* In __get_user_error() the value is overwritten with zero explicitly in
the error path.
* The asm goto outputs cannot be used when the goto label is taken, as
older compilers (e.g. clang < 16.0.0) do not guarantee that asm goto
outputs are usable in this path and may use a stale value rather than
the value in an output register. Consequently, zeroing in the extable
fixup handler is insufficient to ensure callers see zero in the error
path.
* The expected usage of unsafe_get_user() and get_kernel_nofault()
requires that the value is not consumed in the error path.
Some versions of GCC would mis-compile asm goto with outputs, and
erroneously omit subsequent assignments, breaking the error path
handling in __get_user_error(). This was discussed at:
https://lore.kernel.org/lkml/ZpfxLrJAOF2YNqCk@J2N7QTR9R3.cambridge.arm.com/
... and was fixed by removing support for asm goto with outputs on those
broken compilers in commit:
f2f6a8e887 ("init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND")
With that out of the way, we can safely replace the usage of
_ASM_EXTABLE_##type##ACCESS_ERR() with _ASM_EXTABLE_##type##ACCESS(),
leaving the value register unchanged in the case a fault is taken, as
was originally intended. This matches other architectures and matches
our __put_mem_asm().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240807103731.2498893-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
directly emulate instructions for ES/SNP, and instead the guest must
explicitly request emulation. Unless the guest explicitly requests
emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
because except for ES/SNP, doing so requires setting reserved bits in the
SPTE, i.e. the SPTE can't be readable while also generating a #VC on
writes. Because KVM never creates MMIO SPTEs and jumps directly to
emulation, the guest never gets a #VC. And since KVM simply resumes the
guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
into an infinite #NPF loop if the vCPU attempts to write read-only memory.
Disallow read-only memory for all VMs with protected state, i.e. for
upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible
to support read-only memory, as TDX uses EPT Violation #VE to reflect the
fault into the guest, e.g. KVM could configure read-only SPTEs with RX
protections and SUPPRESS_VE=0. But there is no strong use case for
supporting read-only memslots on TDX, e.g. the main historical usage is
to emulate option ROMs, but TDX disallows executing from shared memory.
And if someone comes along with a legitimate, strong use case, the
restriction can always be lifted for TDX.
Don't bother trying to retroactively apply the restriction to SEV-ES
VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't
possibly work for SEV-ES, i.e. disallowing such memslots is really just
means reporting an error to userspace instead of silently hanging vCPUs.
Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
isn't worth the marginal benefit it would provide userspace.
Fixes: 26c44aa9e0 ("KVM: SEV: define VM types for SEV and SEV-ES")
Fixes: 1dfe571c12 ("KVM: SEV: Add initial SEV-SNP support")
Cc: Peter Gonda <pgonda@google.com>
Cc: Michael Roth <michael.roth@amd.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Cc: Ackerly Tng <ackerleytng@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240809190319.1710470-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAma8yoAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXOJWw/8D8ZhOLdK9HFKpywIA855CHXbTVHz
VSM9avEykSHTPqWhGx9dqaS8hG/xG7Fh5t3Me2goh0E7dAZjJQCGf/PXyh8Ntotj
5QJB3CEzi9o4yM5yCgNgEYtnVdP52Y8SxfzE9TQWST7EbWsrWDjE68a/08h0cRkr
XTScFIB+SVHOJGBcdSKdYyVqnpgoE8OeWOxVa2OnThug8VPB5WMmJIPkMQ2ykOR4
4dGykI8ULNSiZ+0skfdhxXaykYDdajzCSrk21OP2iaD/41X+I627HaJzCxa3m9s0
z04XtaOV31Jye8McPF6dCHtloFQvZ/deXXeow2yytR7WnT5/KDwBt+8A7mDcKL0V
uhW9DiwxCit6Pao4a/YuEerXqxSH4DdJmTCbqrHITnQWdAW9UQ0jsY1msxwal/aq
l9kK1SJoB3/5A0Dcsc57TFCvh2dhwRnAU3Ik00SCW2LlZrsCmBSPMRCsTHDBVg49
R5ya6qFB9+KEcAjZgJqa15X8WC8cmx+/96mDScd6VjS2Eyh57tZmJgonjlUwJplH
A1erboJtlo2NHzMaxmIXEsaxJYvrOcwKDeQ80H81SBIDPLweW2FYDVZpp4/nx++2
+8myHz9KM7iWFVceJuqvNRix4iR3RkXxvgocm1CzUedgXYUJOVM/li+0pjgJ/bUM
gPo7Sr3BAk9T43k=
=B712
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Fix a xperms counting problem where we adding to the xperms count
even if we failed to add the xperm.
- Propogate errors from avc_add_xperms_decision() back to the caller so
that we can trigger the proper cleanup and error handling.
- Revert our use of vma_is_initial_heap() in favor of our older logic
as vma_is_initial_heap() doesn't correctly handle the no-heap case
and it is causing issues with the SELinux process/execheap access
control. While the older SELinux logic may not be perfect, it
restores the expected user visible behavior.
Hopefully we will be able to resolve the problem with the
vma_is_initial_heap() macro with the mm folks, but we need to fix
this in the meantime.
* tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: revert our use of vma_is_initial_heap()
selinux: add the processing of the failure of avc_add_xperms_decision()
selinux: fix potential counting error in avc_add_xperms_decision()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZrym4AAKCRCRxhvAZXjc
oqT3AP9ydoUNavaZcRayH8r3ybvz9+aJGJ6Q7NznFVCk71vn0gD/buLzmq96Muns
M5DWHbft2AFwK0Rz2nx8j5OXUeHwrQg=
=HZBL
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"VFS:
- Fix the name of file lease slab cache. When file leases were split
out of file locks the name of the file lock slab cache was used for
the file leases slab cache as well.
- Fix a type in take_fd() helper.
- Fix infinite directory iteration for stable offsets in tmpfs.
- When the icache is pruned all reclaimable inodes are marked with
I_FREEING and other processes that try to lookup such inodes will
block.
But some filesystems like ext4 can trigger lookups in their inode
evict callback causing deadlocks. Ext4 does such lookups if the
ea_inode feature is used whereby a separate inode may be used to
store xattrs.
Introduce I_LRU_ISOLATING which pins the inode while its pages are
reclaimed. This avoids inode deletion during inode_lru_isolate()
avoiding the deadlock and evict is made to wait until
I_LRU_ISOLATING is done.
netfs:
- Fault in smaller chunks for non-large folio mappings for
filesystems that haven't been converted to large folios yet.
- Fix the CONFIG_NETFS_DEBUG config option. The config option was
renamed a short while ago and that introduced two minor issues.
First, it depended on CONFIG_NETFS whereas it wants to depend on
CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
does. Second, the documentation for the config option wasn't fixed
up.
- Revert the removal of the PG_private_2 writeback flag as ceph is
using it and fix how that flag is handled in netfs.
- Fix DIO reads on 9p. A program watching a file on a 9p mount
wouldn't see any changes in the size of the file being exported by
the server if the file was changed directly in the source
filesystem. Fix this by attempting to read the full size specified
when a DIO read is requested.
- Fix a NULL pointer dereference bug due to a data race where a
cachefiles cookies was retired even though it was still in use.
Check the cookie's n_accesses counter before discarding it.
nsfs:
- Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
the kernel is writing to userspace.
pidfs:
- Prevent the creation of pidfds for kthreads until we have a
use-case for it and we know the semantics we want. It also confuses
userspace why they can get pidfds for kthreads.
squashfs:
- Fix an unitialized value bug reported by KMSAN caused by a
corrupted symbolic link size read from disk. Check that the
symbolic link size is not larger than expected"
* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
Squashfs: sanity check symbolic link size
9p: Fix DIO read through netfs
vfs: Don't evict inode under the inode lru traversing context
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
file: fix typo in take_fd() comment
pidfd: prevent creation of pidfds for kthreads
netfs: clean up after renaming FSCACHE_DEBUG config
libfs: fix infinite directory reads for offset dir
nsfs: fix ioctl declaration
fs/netfs/fscache_cookie: add missing "n_accesses" check
filelock: fix name of file_lease slab cache
netfs: Fault in smaller chunks for non-large folio mappings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAma75lcACgkQ6rmadz2v
bToAARAAjGtU7TXWr7mRO9IuwjhbnIv4STWsfZQAOm9qIVxl72vqE6YQVISSI6+l
lBgVbrzvrd5DhO3y/xz5/4e/uS2zrS566jiwI+g7rvHYTI4CWBXrfewXIwPEjprT
nXntq+tRM7DpdiZk3KmO3nf/CQyj0Pft5AJQR3B3fmehoqjrhDNk6YnSum1SoRPx
yBparmWpIn4ysk2y0RCB4bpzjQIxCQwhYAVpQeiFar7blfWMKGYi+/ti0fPnvWFG
atUU7JZhIBaXcDfVEC82vECd/CkxZpWrgtv9kgp3b75ylE3vhR6C9Kq2nd0zBOFU
+wqrAwStLRDFnv+WkAHkPH2vIXQ+9ACwefINYrzzjFmoQr75jEYlo7oal2GU7sQI
Y5M9qGW6HcVUtw9soGT5eoqlwK2Uot/FpvsAVKCZq6TcniCGDeQomXEy6Rmc5aSz
HHIbmYSsSwuoWkXZ8Gn2He2QKTmGSG9d3EII8Ge6C7Ev2Rx1ARLZKG+jAy+igMfn
QDL9capp3LSTvbPGYU1z0llN6nW4u9JrkCPjFdopwM8RZoXphArtZMki87StXmH/
eZxA6PEJcFdBhCUkge14s/D7YW+dw05YBRsJ6QnEQULHx6pd77jlhT4Sv+U/SGL/
ghtmqIdT8JreL4e4dCS/sH7fkaT9ZmO12Awi3dPz8cs/B5LwdWY=
=sGTX
-----END PGP SIGNATURE-----
Merge tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix bpftrace regression from Kyle Huey.
Tracing bpf prog was called with perf_event input arguments causing
bpftrace produce garbage output.
- Fix verifier crash in stacksafe() from Yonghong Song.
Daniel Hodges reported verifier crash when playing with sched-ext.
The stack depth in the known verifier state was larger than stack
depth in being explored state causing out-of-bounds access.
- Fix update of freplace prog in prog_array from Leon Hwang.
freplace prog type wasn't recognized correctly.
* tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
perf/bpf: Don't call bpf_overflow_handler() for tracing events
selftests/bpf: Add a test to verify previous stacksafe() fix
bpf: Fix a kernel verifier crash in stacksafe()
bpf: Fix updating attached freplace prog in prog_array map
If a file has the S_DAX flag (aka fsdax access mode) set, we cannot
allow users to change the realtime flag unless the datadev and rtdev
both support fsdax access modes. Even if there are no extents allocated
to the file, the setattr thread could be racing with another thread
that has already started down the write code paths.
Fixes: ba23cba9b3 ("fs: allow per-device dax status checking for filesystems")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
In commit 9adf40249e, we changed the behavior of the AIL thread to
set its own task state to KILLABLE whenever the timeout value is
nonzero. Unfortunately, this missed the fact that xfsaild_push will
return 50ms (aka a longish sleep) when we reach the push target or the
AIL becomes empty, so xfsaild goes to sleep for a long period of time in
uninterruptible D state.
This results in artificially high load averages because KILLABLE
processes are UNINTERRUPTIBLE, which contributes to load average even
though the AIL is asleep waiting for someone to interrupt it. It's not
blocked on IOs or anything, but people scrap ps for processes that look
like they're stuck in D state, so restore the previous threshold.
Fixes: 9adf40249e ("xfs: AIL doesn't need manual pushing")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
It turns out that I misunderstood the difference between the attr and
attr2 feature bits. "attr" means that at some point an attr fork was
created somewhere in the filesystem. "attr2" means that inodes have
variable-sized forks, but says nothing about whether or not there
actually /are/ attr forks in the system.
If we have an attr fork, we only need to check that attr is set.
Fixes: 99d9d8d05d ("xfs: scrub inode block mappings")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
The data conversion is done rather by a wrong function. We convert to
BE32, not from BE32. Although the end result must be same, this was
complained by the compiler.
Fix the code again and align with another similar function
tas2563_apply_calib() that does already right.
Fixes: 3beddef84d ("ALSA: hda/tas2781: fix wrong calibrated data order")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408141630.DiDUB8Z4-lkp@intel.com/
Link: https://patch.msgid.link/20240814100500.1944-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This reverts commit 28ab976911.
Sense data can be in either fixed format or descriptor format.
SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
"The SATL shall support this bit as defined in SPC-5 with the following
exception: if the D_ SENSE bit is set to zero (i.e., fixed format sense
data), then the SATL should return fixed format sense data for ATA
PASS-THROUGH commands."
The libata SATL has always kept D_SENSE set to zero by default. (It is
however possible to change the value using a MODE SELECT SG_IO command.)
Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit,
however, successful ATA PASS-THROUGH commands incorrectly returned the
sense data in descriptor format (regardless of the D_SENSE bit).
Commit 28ab976911 ("ata: libata-scsi: Honor the D_SENSE bit for
CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
commands.
However, after commit 28ab976911 ("ata: libata-scsi: Honor the D_SENSE
bit for CK_COND=1 and no error"), there were bug reports that hdparm,
hddtemp, and udisks were no longer working as expected.
These applications incorrectly assume the returned sense data is in
descriptor format, without even looking at the RESPONSE CODE field in the
returned sense data (to see which format the returned sense data is in).
Considering that there will be broken versions of these applications around
roughly forever, we are stuck with being bug compatible with older kernels.
Cc: stable@vger.kernel.org # 4.19+
Reported-by: Stephan Eisvogel <eisvogel@seitics.de>
Reported-by: Christian Heusel <christian@heusel.eu>
Closes: https://lore.kernel.org/linux-ide/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heusel.eu/
Fixes: 28ab976911 ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240813131900.1285842-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
This patch is based on the discussions between Neal Cardwell and
Eric Dumazet in the link
https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/
It was correctly pointed out that tp->window_clamp would not be
updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if
(copied <= tp->rcvq_space.space). While it is expected for most
setups to leave the sysctl enabled, the latter condition may
not end up hitting depending on the TCP receive queue size and
the pattern of arriving data.
The updated check should be hit only on initial MSS update from
TCP_MIN_MSS to measured MSS value and subsequently if there was
an update to a larger value.
Fixes: 05f76b2d63 ("tcp: Adjust clamping window for applications specifying SO_RCVBUF")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit a0821ca14b ("media: atomisp: Remove test pattern generator (TPG)
support") broke BYT support because it removed a seemingly unused field
from struct sh_css_sp_config and a seemingly unused value from enum
ia_css_input_mode.
But these are part of the ABI between the kernel and firmware on ISP2400
and this part of the TPG support removal changes broke ISP2400 support.
ISP2401 support was not affected because on ISP2401 only a part of
struct sh_css_sp_config is used.
Restore the removed field and enum value to fix this.
Fixes: a0821ca14b ("media: atomisp: Remove test pattern generator (TPG) support")
Cc: stable@vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
This adds another disk accounting counter to track usage per inode
number (any snapshot ID).
This will be used for a couple things:
- It'll give us a way to tell the user how much space a given file ista
consuming in all snapshots; i.e. how much extra space it's consuming
due to snapshot versioning.
- It counts number of extents and total size of extents (both in btree
keyspace sectors and actual disk usage), meaning it gives us average
extent size: that is, it'll let us cheaply find fragmented files that
should be defragmented.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch will be adding a disk accounting counter type which is
not kept in the in-memory eytzinger tree.
As prep, fold __bch2_accounting_mem_mod() into
bch2_accounting_mem_mod_locked() so that we can check for that counter
type and bail out without calling bpos_to_disk_accounting_pos() twice.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bkey_fsck_err() was added as an interface that looks like fsck_err(),
but previously all it did was ensure that the appropriate error counter
was incremented in the superblock.
This is a cleanup and bugfix patch that converts it to a wrapper around
fsck_err(). This is needed to fix an issue with the upgrade path to
disk_accounting_v3, where the "silent fix" error list now includes
bkey_fsck errors; fsck_err() handles this in a unified way, and since we
need to change printing of bkey fsck errors from the caller to the inner
bkey_fsck_err() calls, this ends up being a pretty big change.
Als,, rename .invalid() methods to .validate(), for clarity, while we're
changing the function signature anyways (to drop the printbuf argument).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This helps ensure key cache reclaim isn't contending with threads
waiting for the key cache to be helped, and fixes a severe performance
bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If we need to increase the tree depth, allocate a new node, and then
race with another thread that increased the tree depth before us, we'll
still have a preallocated node that might be used later.
If we then use that node for a new non-root node, it'll still have a
pointer to the old root instead of being zeroed - fix this by zeroing it
in the cmpxchg failure path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
for_each_btree_node() now works similarly to for_each_btree_key(), where
the loop body is passed as an argument to be passed to lockrestart_do().
This now calls trans_begin() on every loop iteration - which fixes an
SRCU warning in backpointers fsck.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_trigger_alloc was assuming that the new key would always be newly
created and thus always an alloc_v4 key, but - not when called from
btree_gc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_path_traverse_cached() was previously checking if it could
just relock the path, which is a common idiom in path traversal.
However, it was using btree_node_relock(), not btree_path_relock();
btree_path_relock() only succeeds if the path was in state
BTREE_ITER_NEED_RELOCK.
If the path was in state BTREE_ITER_NEED_TRAVERSE a full traversal is
needed; this led to a null ptr deref in
bch2_btree_path_traverse_cached().
And the short circuit check here isn't needed, since it was already done
in the main bch2_btree_path_traverse_one().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
ssn_offset field is u32 and is placed into the netlink response with
nla_put_u32(), but only 2 bytes are reserved for the attribute payload
in subflow_get_info_size() (even though it makes no difference
in the end, as it is aligned up to 4 bytes). Supply the correct
argument to the relevant nla_total_size() call to make it less
confusing.
Fixes: 5147dfb508 ("mptcp: allow dumping subflow context to userspace")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20240812065024.GA19719@asgard.redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- binfmt_flat: Fix corruption when not offsetting data start
- exec: Fix ToCToU between perm check and set-uid/gid usage
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCZrvBmwAKCRA2KwveOeQk
u9Z2AQDgtypF4Kficiwn9BwZL5OxCxr9XnFuCYUue8Ufzm2WdQEAwhl1tcbs+Auf
VAqpr6gZTRlpBjbl55LHeyMdxIbwhg4=
=L28M
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve fixes from Kees Cook:
- binfmt_flat: Fix corruption when not offsetting data start
- exec: Fix ToCToU between perm check and set-uid/gid usage
* tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Fix ToCToU between perm check and set-uid/gid usage
binfmt_flat: Fix corruption when not offsetting data start
Add the missing geni_icc_disable() call before returning in the
geni_i2c_runtime_resume() function.
Commit 9ba48db9f7 ("i2c: qcom-geni: Add missing
geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed
disabling the interconnect in one case.
Fixes: bf225ed357 ("i2c: i2c-qcom-geni: Add interconnect support")
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
When of_irq_parse_raw() is invoked with a device address smaller than
the interrupt parent node (from #address-cells property), KASAN detects
the following out-of-bounds read when populating the initial match table
(dyndbg="func of_irq_parse_* +p"):
OF: of_irq_parse_one: dev=/soc@0/picasso/watchdog, index=0
OF: parent=/soc@0/pci@878000000000/gpio0@17,0, intsize=2
OF: intspec=4
OF: of_irq_parse_raw: ipar=/soc@0/pci@878000000000/gpio0@17,0, size=2
OF: -> addrsize=3
==================================================================
BUG: KASAN: slab-out-of-bounds in of_irq_parse_raw+0x2b8/0x8d0
Read of size 4 at addr ffffff81beca5608 by task bash/764
CPU: 1 PID: 764 Comm: bash Tainted: G O 6.1.67-484c613561-nokia_sm_arm64 #1
Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2023.01-12.24.03-dirty 01/01/2023
Call trace:
dump_backtrace+0xdc/0x130
show_stack+0x1c/0x30
dump_stack_lvl+0x6c/0x84
print_report+0x150/0x448
kasan_report+0x98/0x140
__asan_load4+0x78/0xa0
of_irq_parse_raw+0x2b8/0x8d0
of_irq_parse_one+0x24c/0x270
parse_interrupts+0xc0/0x120
of_fwnode_add_links+0x100/0x2d0
fw_devlink_parse_fwtree+0x64/0xc0
device_add+0xb38/0xc30
of_device_add+0x64/0x90
of_platform_device_create_pdata+0xd0/0x170
of_platform_bus_create+0x244/0x600
of_platform_notify+0x1b0/0x254
blocking_notifier_call_chain+0x9c/0xd0
__of_changeset_entry_notify+0x1b8/0x230
__of_changeset_apply_notify+0x54/0xe4
of_overlay_fdt_apply+0xc04/0xd94
...
The buggy address belongs to the object at ffffff81beca5600
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 8 bytes inside of
128-byte region [ffffff81beca5600, ffffff81beca5680)
The buggy address belongs to the physical page:
page:00000000230d3d03 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1beca4
head:00000000230d3d03 order:1 compound_mapcount:0 compound_pincount:0
flags: 0x8000000000010200(slab|head|zone=2)
raw: 8000000000010200 0000000000000000 dead000000000122 ffffff810000c300
raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffff81beca5500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff81beca5580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffff81beca5600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffff81beca5680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff81beca5700: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
==================================================================
OF: -> got it !
Prevent the out-of-bounds read by copying the device address into a
buffer of sufficient size.
Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
Link: https://lore.kernel.org/r/20240812100652.3800963-1-stefan.wiehler@nokia.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
When opening a file for exec via do_filp_open(), permission checking is
done against the file's metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how
to set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed.
While this race condition is rare in real-world scenarios, it has been
observed (and proven exploitable) when package managers are updating
the setuid bits of installed programs. Such files start with being
world-executable but then are adjusted to be group-exec with a set-uid
bit. For example, "chmod o-x,u+s target" makes "target" executable only
by uid "root" and gid "cdrom", while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group "cdrom" membership can
get the permission to execute "target" just before the chmod, and when
the chmod finishes, the exec reaches brpm_fill_uid(), and performs the
setuid to root, violating the expressed authorization of "only cdrom
group members can setuid to root".
Re-check that we still have execute permissions in case the metadata
has changed. It would be better to keep a copy from the perm-check time,
but until we can do that refactoring, the least-bad option is to do a
full inode_permission() call (under inode lock). It is understood that
this is safe against dead-locks, but hardly optimal.
Reported-by: Marco Vanotti <mvanotti@google.com>
Tested-by: Marco Vanotti <mvanotti@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Kees Cook <kees@kernel.org>
kmalloc is unreliable when allocating more than 8 pages of memory. It may
fail when there is plenty of free memory but the memory is fragmented.
Zdenek Kabelac observed such failure in his tests.
This commit changes kmalloc to kvmalloc - kvmalloc will fall back to
vmalloc if the large allocation fails.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Cc: stable@vger.kernel.org
The regressing commit is new in 6.10. It assumed that anytime event->prog
is set bpf_overflow_handler() should be invoked to execute the attached bpf
program. This assumption is false for tracing events, and as a result the
regressing commit broke bpftrace by invoking the bpf handler with garbage
inputs on overflow.
Prior to the regression the overflow handlers formed a chain (of length 0,
1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
(the tracing case) did not. Both set event->prog. The chain of overflow
handlers was replaced by a single overflow handler slot and a fixed call to
bpf_overflow_handler() when appropriate. This modifies the condition there
to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the
previous behavior and fixing bpftrace.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Reported-by: Joe Damato <jdamato@fastly.com>
Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/
Fixes: f11f10bfa1 ("perf/bpf: Call BPF handler directly, not through overflow machinery")
Cc: stable@vger.kernel.org
Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
add HDP_SD support on gc 12.0.0/1
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 61cffacb3a)
kmd_fw_shared changed in VCN5
Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa02486fb1)
Add JPEG IB command parser to ensure registers
in the command are within the JPEG IP block.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a7f670d5d8)
Cc: stable@vger.kernel.org
Use mes pipe to unmap kcq and kgq.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f7fb9d677f)
Free memory for two pipes and unmap pipe0 via pipe1.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98cae695a8)
Configure two pipes with different hardware resources.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ea5d6db17a)
Adjust mes12 sw/hw initiailization for both pipe0 and pipe1
enablement. The two pipes are almost identical pipe. Pipe0
behaves like schq and pipe1 like kiq, pipe0 was mapped by pipe1.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aa539da8af)
Add mes pipe switch to let caller choose pipe
to submit packet.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b2dee0837a)
Enable unified mes firmware to load on pipe0 and pipe1.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e69c2dd753)
Add multiple mes ring instances in mes structure to support
multiple mes pipes.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c7d4355648)
Update mes12 api definition.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2ab5dc5917)
Missing validation ...
Checked libdrm and it clears all the structs, so we should be
safe to just check everything.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c6b86421f1)
Cc: stable@vger.kernel.org
This needs to be set as well if the IB uses atomics.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit c6c2e8b6a4)
Cc: stable@vger.kernel.org
This needs to be set as well if the IB uses atomics.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 35c628774e)
Cc: stable@vger.kernel.org
[why & how]
When the commit 9d84c7ef8a ("drm/amd/display: Correct cursor position
on horizontal mirror") was introduced, it used the wrong calculation for
the position copy for X. This commit uses the correct calculation for that
based on the original patch.
Fixes: 9d84c7ef8a ("drm/amd/display: Correct cursor position on horizontal mirror")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8f9b23abba)
Cc: stable@vger.kernel.org