mem=[X][G|M] is broken on ARM64 platform, there are cases that even
type.cnt is 1, but total_size is not 0 because regions are merged into
1. So only check 'cnt' is not enough, total_size should be used,
othersize bootargs 'mem=[X][G|B]' not work anymore.
Link: https://lkml.kernel.org/r/20210930024437.32598-1-peng.fan@oss.nxp.com
Fixes: e888fa7bb8 ("memblock: Check memory add/cap ordering")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Starting with kernel 5.11 built with CONFIG_FORTIFY_SOURCE mouting an
ocfs2 filesystem with either o2cb or pcmk cluster stack fails with the
trace below. Problem seems to be that strings for cluster stack and
cluster name are not guaranteed to be null terminated in the disk
representation, while strlcpy assumes that the source string is always
null terminated. This causes a read outside of the source string
triggering the buffer overflow detection.
detected buffer overflow in strlen
------------[ cut here ]------------
kernel BUG at lib/string.c:1149!
invalid opcode: 0000 [#1] SMP PTI
CPU: 1 PID: 910 Comm: mount.ocfs2 Not tainted 5.14.0-1-amd64 #1
Debian 5.14.6-2
RIP: 0010:fortify_panic+0xf/0x11
...
Call Trace:
ocfs2_initialize_super.isra.0.cold+0xc/0x18 [ocfs2]
ocfs2_fill_super+0x359/0x19b0 [ocfs2]
mount_bdev+0x185/0x1b0
legacy_get_tree+0x27/0x40
vfs_get_tree+0x25/0xb0
path_mount+0x454/0xa20
__x64_sys_mount+0x103/0x140
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Link: https://lkml.kernel.org/r/20210929180654.32460-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6dbf7bb555 ("fs: Don't invalidate page buffers in
block_write_full_page()") uncovered a latent bug in ocfs2 conversion
from inline inode format to a normal inode format.
The code in ocfs2_convert_inline_data_to_extents() attempts to zero out
the whole cluster allocated for file data by grabbing, zeroing, and
dirtying all pages covering this cluster. However these pages are
beyond i_size, thus writeback code generally ignores these dirty pages
and no blocks were ever actually zeroed on the disk.
This oversight was fixed by commit 693c241a5f ("ocfs2: No need to zero
pages past i_size.") for standard ocfs2 write path, inline conversion
path was apparently forgotten; the commit log also has a reasoning why
the zeroing actually is not needed.
After commit 6dbf7bb555, things became worse as writeback code stopped
invalidating buffers on pages beyond i_size and thus these pages end up
with clean PageDirty bit but with buffers attached to these pages being
still dirty. So when a file is converted from inline format, then
writeback triggers, and then the file is grown so that these pages
become valid, the invalid dirtiness state is preserved,
mark_buffer_dirty() does nothing on these pages (buffers are already
dirty) but page is never written back because it is clean. So data
written to these pages is lost once pages are reclaimed.
Simple reproducer for the problem is:
xfs_io -f -c "pwrite 0 2000" -c "pwrite 2000 2000" -c "fsync" \
-c "pwrite 4000 2000" ocfs2_file
After unmounting and mounting the fs again, you can observe that end of
'ocfs2_file' has lost its contents.
Fix the problem by not doing the pointless zeroing during conversion
from inline format similarly as in the standard write path.
[akpm@linux-foundation.org: fix whitespace, per Joseph]
Link: https://lkml.kernel.org/r/20210930095405.21433-1-jack@suse.cz
Fixes: 6dbf7bb555 ("fs: Don't invalidate page buffers in block_write_full_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: "Markov, Andrey" <Markov.Andrey@Dell.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The node demotion order needs to be updated during CPU hotplug. Because
whether a NUMA node has CPU may influence the demotion order. The
update function should be called during CPU online/offline after the
node_states[N_CPU] has been updated. That is done in
CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during
CPU offline. But in commit 884a6e5d1f ("mm/migrate: update node
demotion order on hotplug events"), the function to update node demotion
order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline. This
doesn't satisfy the order requirement.
For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0
and P2, P3 in S1), the demotion order is
- S0 -> NUMA_NO_NODE
- S1 -> NUMA_NO_NODE
After P2 and P3 is offlined, because S1 has no CPU now, the demotion
order should have been changed to
- S0 -> S1
- S1 -> NO_NODE
but it isn't changed, because the order updating callback for CPU
hotplug doesn't see the new nodemask. After that, if P1 is offlined,
the demotion order is changed to the expected order as above.
So in this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and
CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and
CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the
update function on them.
Link: https://lkml.kernel.org/r/20210929060351.7293-1-ying.huang@intel.com
Fixes: 884a6e5d1f ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Once upon a time, the node demotion updates were driven solely by memory
hotplug events. But now, there are handlers for both CPU and memory
hotplug.
However, the #ifdef around the code checks only memory hotplug. A
system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU
hotplug events.
Update the #ifdef around the common code. Add memory and CPU-specific
#ifdefs for their handlers. These memory/CPU #ifdefs avoid unused
function warnings when their Kconfig option is off.
[arnd@arndb.de: rework hotplug_memory_notifier() stub]
Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com
Fixes: 884a6e5d1f ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2.
This contains two fixes for the "automatic demotion" code which was
merged into 5.15:
* Fix memory hotplug performance regression by watching
suppressing any real action on irrelevant hotplug events.
* Ensure CPU hotplug handler is registered when memory hotplug
is disabled.
This patch (of 2):
== tl;dr ==
Automatic demotion opted for a simple, lazy approach to handling hotplug
events. This noticeably slows down memory hotplug[1]. Optimize away
updates to the demotion order when memory hotplug events should have no
effect.
This has no effect on CPU hotplug. There is no known problem on the CPU
side and any work there will be in a separate series.
== Background ==
Automatic demotion is a memory migration strategy to ensure that new
allocations have room in faster memory tiers on tiered memory systems.
The kernel maintains an array (node_demotion[]) to drive these
migrations.
The node_demotion[] path is calculated by starting at nodes with CPUs
and then "walking" to nodes with memory. Only hotplug events which
online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will
actually affect the migration order.
== Problem ==
However, the current code is lazy. It completely regenerates the
migration order on *any* CPU or memory hotplug event. The logic was
that these events are extremely rare and that the overhead from
indiscriminate order regeneration is minimal.
Part of the update logic involves a synchronize_rcu(), which is a pretty
big hammer. Its overhead was large enough to be detected by some 0day
tests that watch memory hotplug performance[1].
== Solution ==
Add a new helper (node_demotion_topo_changed()) which can differentiate
between superfluous and impactful hotplug events. Skip the expensive
update operation for superfluous events.
== Aside: Locking ==
It took me a few moments to declare the locking to be safe enough for
node_demotion_topo_changed() to work. It all hinges on the memory
hotplug lock:
During memory hotplug events, 'mem_hotplug_lock' is held for write.
This ensures that two memory hotplug events can not be called
simultaneously.
CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides
mutual exclusion between CPU hotplug events. In addition, the demotion
code acquire and hold the mem_hotplug_lock for read during its CPU
hotplug handlers. This provides mutual exclusion between the demotion
memory hotplug callbacks and the CPU hotplug callbacks.
This effectively allows treating the migration target generation code to
act as if it is single-threaded.
1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com
Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com
Fixes: 884a6e5d1f ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A race is possible when a process exits, its VMAs are removed by
exit_mmap() and at the same time userfaultfd_writeprotect() is called.
The race was detected by KASAN on a development kernel, but it appears
to be possible on vanilla kernels as well.
Use mmget_not_zero() to prevent the race as done in other userfaultfd
operations.
Link: https://lkml.kernel.org/r/20210921200247.25749-1-namit@vmware.com
Fixes: 63b2d4174c ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Signed-off-by: Nadav Amit <namit@vmware.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In RHEL's gating selftests we've encountered memory corruption in the
uffd event test even with upstream kernel:
# ./userfaultfd anon 128 4
nr_pages: 32768, nr_pages_per_cpu: 32768
bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
testing uffd-wp with pagemap (pgsize=4096): done
testing uffd-wp with pagemap (pgsize=2097152): done
testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
ERROR: faulting process failed (errno=0, line=1117)
It can be easily reproduced when global thp enabled, which is the
default for RHEL.
It's also known as a side effect of commit 0db282ba2c ("selftest: use
mmap instead of posix_memalign to allocate memory", 2021-07-23), which
is imho right itself on using mmap() to make sure the addresses will be
untagged even on arm.
The problem is, for each test we allocate buffers using two
allocate_area() calls. We assumed these two buffers won't affect each
other, however they could, because mmap() could have found that the two
buffers are near each other and having the same VMA flags, so they got
merged into one VMA.
It won't be a big problem if thp is not enabled, but when thp is
agressively enabled it means when initializing the src buffer it could
accidentally setup part of the dest buffer too when there's a shared THP
that overlaps the two regions. Then some of the dest buffer won't be
able to be trapped by userfaultfd missing mode, then it'll cause memory
corruption as described.
To fix it, do release_pages() after initializing the src buffer.
Since the previous two release_pages() calls are after
uffd_test_ctx_clear() which will unmap all the buffers anyway (which is
stronger than release pages; as unmap() also tear town pgtables), drop
them as they shouldn't really be anything useful.
We can mark the Fixes tag upon 0db282ba2c as it's reported to only
happen there, however the real "Fixes" IMHO should be 8ba6e86408, as
before that commit we'll always do explicit release_pages() before
registration of uffd, and 8ba6e86408 changed that logic by adding
extra unmap/map and we didn't release the pages at the right place.
Meanwhile I don't have a solid glue anyway on whether posix_memalign()
could always avoid triggering this bug, hence it's safer to attach this
fix to commit 8ba6e86408.
Link: https://lkml.kernel.org/r/20210923232512.210092-1-peterx@redhat.com
Fixes: 8ba6e86408 ("userfaultfd/selftests: reinitialize test context in each test")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Li Wang <liwan@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a Jieli Technology USB Webcam is connected, the video part works
well, but the mic sound is speeded up. On dmesg there are messages
about different rates from the runtime rates, warnings about volume
resolution and lastly, the log is filled, every 5 seconds, with
retire_capture_urb error messages.
The mic works only when ep packet size is set to wMaxPacketSize (normal
sound and no more retire_capture_urb error messages). Skipping reading
sample rate, fixes the messages about different rates and forcing a volume
resolution, fixes warnings about volume range. I have arbitrarily choosed
the value (16): I read in a comment that there should be no more than 255
levels, so 4096 (max volume) / 16 = 0-255.
Signed-off-by: Marco Giunta <giun7a@gmail.com>
Link: https://lore.kernel.org/r/20211018162552.12082-1-giun7a@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Implement the ->restore() PM operation and set the link to off, which will
force a full reset and restore. This ensures that Host Performance Booster
is reset after suspend-to-disk.
The Host Performance Booster feature caches logical-to-physical mapping
information in the host memory. After suspend-to-disk, such information is
not valid, so a full reset and restore is needed.
A full reset and restore is done if the SPM level is 5 or 6, but not for
other SPM levels, so this change fixes those cases.
A full reset and restore also restores base address registers, so that code
is removed.
Link: https://lore.kernel.org/r/20211018151004.284200-2-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The sgl is freed in the target stack in target_release_cmd_kref() before
calling qlt_free_cmd() but there is an unmap of sgl in qlt_free_cmd() that
causes a panic if sgl is not yet DMA unmapped:
NIP dma_direct_unmap_sg+0xdc/0x180
LR dma_direct_unmap_sg+0xc8/0x180
Call Trace:
ql_dbg_prefix+0x68/0xc0 [qla2xxx] (unreliable)
dma_unmap_sg_attrs+0x54/0xf0
qlt_unmap_sg.part.19+0x54/0x1c0 [qla2xxx]
qlt_free_cmd+0x124/0x1d0 [qla2xxx]
tcm_qla2xxx_release_cmd+0x4c/0xa0 [tcm_qla2xxx]
target_put_sess_cmd+0x198/0x370 [target_core_mod]
transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
tcm_qla2xxx_complete_free+0x6c/0x90 [tcm_qla2xxx]
The sgl may be left unmapped in error cases of response sending. For
instance, qlt_rdy_to_xfer() maps sgl and exits when session is being
deleted keeping the sgl mapped.
This patch removes use-after-free of the sgl and ensures that the sgl is
unmapped for any command that was not sent to firmware.
Link: https://lore.kernel.org/r/20211018122650.11846-1-d.bogdanov@yadro.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 8c0eb596ba ("[SCSI] qla2xxx: Fix a memory leak in an error path of
qla2x00_process_els()"), intended to change:
bsg_job->request->msgcode == FC_BSG_HST_ELS_NOLOGIN
to:
bsg_job->request->msgcode != FC_BSG_RPT_ELS
but changed it to:
bsg_job->request->msgcode == FC_BSG_RPT_ELS
instead.
Change the == to a != to avoid leaking the fcport structure or freeing
unallocated memory.
Link: https://lore.kernel.org/r/20211012191834.90306-2-jgu@purestorage.com
Fixes: 8c0eb596ba ("[SCSI] qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Joy Gu <jgu@purestorage.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver probing function should return < 0 for failure, otherwise
kernel will treat value > 0 as success.
Link: https://lore.kernel.org/r/1634522181-31166-1-git-send-email-zheyuma97@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix possible null-pointer dereference in audit_filter_rules.
audit_filter_rules() error: we previously assumed 'ctx' could be null
Cc: stable@vger.kernel.org
Fixes: bf361231c2 ("audit: add saddr_fam filter field")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
While writing an email explaining the "bit = 0" logic for a discussion on
making ftrace_test_recursion_trylock() disable preemption, I discovered a
path that makes the "not do the logic if bit is zero" unsafe.
The recursion logic is done in hot paths like the function tracer. Thus,
any code executed causes noticeable overhead. Thus, tricks are done to try
to limit the amount of code executed. This included the recursion testing
logic.
Having recursion testing is important, as there are many paths that can
end up in an infinite recursion cycle when tracing every function in the
kernel. Thus protection is needed to prevent that from happening.
Because it is OK to recurse due to different running context levels (e.g.
an interrupt preempts a trace, and then a trace occurs in the interrupt
handler), a set of bits are used to know which context one is in (normal,
softirq, irq and NMI). If a recursion occurs in the same level, it is
prevented*.
Then there are infrastructure levels of recursion as well. When more than
one callback is attached to the same function to trace, it calls a loop
function to iterate over all the callbacks. Both the callbacks and the
loop function have recursion protection. The callbacks use the
"ftrace_test_recursion_trylock()" which has a "function" set of context
bits to test, and the loop function calls the internal
trace_test_and_set_recursion() directly, with an "internal" set of bits.
If an architecture does not implement all the features supported by ftrace
then the callbacks are never called directly, and the loop function is
called instead, which will implement the features of ftrace.
Since both the loop function and the callbacks do recursion protection, it
was seemed unnecessary to do it in both locations. Thus, a trick was made
to have the internal set of recursion bits at a more significant bit
location than the function bits. Then, if any of the higher bits were set,
the logic of the function bits could be skipped, as any new recursion
would first have to go through the loop function.
This is true for architectures that do not support all the ftrace
features, because all functions being traced must first go through the
loop function before going to the callbacks. But this is not true for
architectures that support all the ftrace features. That's because the
loop function could be called due to two callbacks attached to the same
function, but then a recursion function inside the callback could be
called that does not share any other callback, and it will be called
directly.
i.e.
traced_function_1: [ more than one callback tracing it ]
call loop_func
loop_func:
trace_recursion set internal bit
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
traced_function_2: [ only traced by above callback ]
call callback
callback:
trace_recursion [ skipped because internal bit is set, return 0 ]
call traced_function_2
[ wash, rinse, repeat, BOOM! out of shampoo! ]
Thus, the "bit == 0 skip" trick is not safe, unless the loop function is
call for all functions.
Since we want to encourage architectures to implement all ftrace features,
having them slow down due to this extra logic may encourage the
maintainers to update to the latest ftrace features. And because this
logic is only safe for them, remove it completely.
[*] There is on layer of recursion that is allowed, and that is to allow
for the transition between interrupt context (normal -> softirq ->
irq -> NMI), because a trace may occur before the context update is
visible to the trace recursion logic.
Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/
Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Jisheng Zhang <jszhang@kernel.org>
Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Fixes: edc15cafcb ("tracing: Avoid unnecessary multiple recursion checks")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
A new warning in clang points out two places in this driver where
boolean expressions are being used with a bitwise OR instead of a
logical one:
drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: note: cast one or both operands to int to silence this warning
drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: note: cast one or both operands to int to silence this warning
2 errors generated.
The motivation for the warning is that logical operations short circuit
while bitwise operations do not. In this case, it does not seem like
short circuiting is harmful so implement the suggested fix of changing
to a logical operation to fix the warning.
Link: https://github.com/ClangBuiltLinux/linux/issues/1479
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211018193101.2340261-1-nathan@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Until we better understand the stability issues caused by frequent
frequency changes, lets limit them to a618.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Tested-by: Caleb Connolly <caleb.connolly@linaro.org>
Link: https://lore.kernel.org/r/20211018153627.2787882-1-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
In commit fda31c5029 ("signal: avoid double atomic counter
increments for user accounting") Linus made a clever optimization to
how rlimits and the struct user_struct. Unfortunately that
optimization does not work in the obvious way when moved to nested
rlimits. The problem is that the last decrement of the per user
namespace per user sigpending counter might also be the last decrement
of the sigpending counter in the parent user namespace as well. Which
means that simply freeing the leaf ucount in __free_sigqueue is not
enough.
Maintain the optimization and handle the tricky cases by introducing
inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
By moving the entire optimization into functions that perform all of
the work it becomes possible to ensure that every level is handled
properly.
The new function inc_rlimit_get_ucounts returns 0 on failure to
increment the ucount. This is different than inc_rlimit_ucounts which
increments the ucounts and returns LONG_MAX if the ucount counter has
exceeded it's maximum or it wrapped (to indicate the counter needs to
decremented).
I wish we had a single user to account all pending signals to across
all of the threads of a process so this complexity was not necessary
Cc: stable@vger.kernel.org
Fixes: d646969055 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133
Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Tested-by: Rune Kleveland <rune.kleveland@infomedia.dk>
Tested-by: Yu Zhao <yuzhao@google.com>
Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT),
so there is no need for it to be a u64. This fixes a build error on 32-bit
systems:
i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io:
sev.c:(.text+0x110f): undefined reference to `__udivdi3'
Cc: stable@vger.kernel.org
Fixes: 019057bd73 ("KVM: SEV-ES: fix length of string I/O")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.
We can remove redundant handling of bus lock vmexit. Unconditionally Set
exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with
KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com>
Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to commit 111d0bda8e ("tools/kvm_stat: Exempt time-based
counters"), we should not show timer values in kvm_stat. Remove the new
halt_wait_ns.
Fixes: 87bcc5fa09 ("KVM: stats: Add halt_wait_ns stats for all architectures")
Cc: Jing Zhang <jingzhangos@google.com>
Cc: Stefan Raspl <raspl@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Stefan Raspl <raspl@linux.ibm.com>
Message-Id: <20211006121724.4154-1-borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
WARN if the static keys used to track if any vCPU has disabled its APIC
are left elevated at module exit. Unlike the underflow case, nothing in
the static key infrastructure will complain if a key is left elevated,
and because an elevated key only affects performance, nothing in KVM will
fail if either key is improperly incremented.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211013003554.47705-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Revert a change to open code bits of kvm_lapic_set_base() when emulating
APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base
and apic_hw_disabled being unsyncrhonized when the APIC is created. If
kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic()
will see the initialized-to-zero vcpu->arch.apic_base and decrement
apic_hw_disabled without KVM ever having incremented apic_hw_disabled.
Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a
potential future where KVM supports RESET outside of vCPU creation, in
which case all the side effects of kvm_lapic_set_base() are needed, e.g.
to handle the transition from x2APIC => xAPIC.
Alternatively, KVM could temporarily increment apic_hw_disabled (and call
kvm_lapic_set_base() at RESET), but that's a waste of cycles and would
impact the performance of other vCPUs and VMs. The other subtle side
effect is that updating the xAPIC ID needs to be done at RESET regardless
of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs
an explicit call to kvm_apic_set_xapic_id() regardless of whether or not
kvm_lapic_set_base() also performs the update. That makes stuffing the
enable bit at vCPU creation slightly more palatable, as doing so affects
only the apic_hw_disabled key.
Opportunistically tweak the comment to explicitly call out the connection
between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to
call out the need to always do kvm_apic_set_xapic_id() at RESET.
Underflow scenario:
kvm_vm_ioctl() {
kvm_vm_ioctl_create_vcpu() {
kvm_arch_vcpu_create() {
if (something_went_wrong)
goto fail_free_lapic;
/* vcpu->arch.apic_base is initialized when something_went_wrong is false. */
kvm_vcpu_reset() {
kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) {
vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
}
}
return 0;
fail_free_lapic:
kvm_free_lapic() {
/* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */
if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug.
}
return r;
}
}
}
This (mostly) reverts commit 421221234a.
Fixes: 421221234a ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET")
Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211013003554.47705-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The refactoring in commit bb18a67774 ("KVM: SEV: Acquire
vcpu mutex when updating VMSA") left behind the assignment to
svm->vcpu.arch.guest_state_protected; add it back.
Signed-off-by: Peter Gonda <pgonda@google.com>
[Delta between v2 and v3 of Peter's patch, which had already been
committed; the commit message is my own. - Paolo]
Fixes: bb18a67774 ("KVM: SEV: Acquire vcpu mutex when updating VMSA")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If allocation of rmaps fails, but some of the pointers have already been written,
those pointers can be cleaned up when the memslot is freed, or even reused later
for another attempt at allocating the rmaps. Therefore there is no need to
WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When adding partitions to the disk, the reference count of the disk
object is increased. then alloc partition device and called
device_add(), if the device_add() return error, the reference
count of the disk object will be reduced twice, at put_device(pdev)
and put_disk(disk). this leads to the end of the object's life cycle
prematurely, and trigger following calltrace.
__init_work+0x2d/0x50 kernel/workqueue.c:519
synchronize_rcu_expedited+0x3af/0x650 kernel/rcu/tree_exp.h:847
bdi_remove_from_list mm/backing-dev.c:938 [inline]
bdi_unregister+0x17f/0x5c0 mm/backing-dev.c:946
release_bdi+0xa1/0xc0 mm/backing-dev.c:968
kref_put include/linux/kref.h:65 [inline]
bdi_put+0x72/0xa0 mm/backing-dev.c:976
bdev_free_inode+0x11e/0x220 block/bdev.c:408
i_callback+0x3f/0x70 fs/inode.c:226
rcu_do_batch kernel/rcu/tree.c:2508 [inline]
rcu_core+0x76d/0x16c0 kernel/rcu/tree.c:2743
__do_softirq+0x1d7/0x93b kernel/softirq.c:558
invoke_softirq kernel/softirq.c:432 [inline]
__irq_exit_rcu kernel/softirq.c:636 [inline]
irq_exit_rcu+0xf2/0x130 kernel/softirq.c:648
sysvec_apic_timer_interrupt+0x93/0xc0
making disk is NULL when calling put_disk().
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211018103422.2043-1-qiang.zhang1211@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Both arch/nios2/ and drivers/mmc/host/tmio_mmc.c define a macro
with the name "CTL_STATUS". Change the one in arch/nios2/ to be
"CTL_FSTATUS" (flags status) to eliminate the build warning.
In file included from ../drivers/mmc/host/tmio_mmc.c:22:
drivers/mmc/host/tmio_mmc.h:31: warning: "CTL_STATUS" redefined
31 | #define CTL_STATUS 0x1c
arch/nios2/include/asm/registers.h:14: note: this is the location of the previous definition
14 | #define CTL_STATUS 0
Fixes: b31ebd8055 ("nios2: Nios2 registers")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
Kernel TLS test has added SM4 GCM/CCM algorithm support, but SM4
algorithm is not compiled by default, this patch add SM4 config
dependency.
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently have some implicit padding in struct sockaddr_mctp. This
patch makes this padding explicit, and ensures we have consistent
layout on platforms with <32bit alignmnent.
Fixes: 60fc639816 ("mctp: Add sockaddr_mctp to uapi")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the more precise __kernel_sa_family_t for smctp_family, to match
struct sockaddr.
Also, use an unsigned int for the network member; negative networks
don't make much sense. We're already using unsigned for mctp_dev and
mctp_skb_cb, but need to change mctp_sock to suit.
Fixes: 60fc639816 ("mctp: Add sockaddr_mctp to uapi")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Acked-by: Eugene Syromiatnikov <esyr@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the process of driver probing, the probe function should return < 0
for failure, otherwise, the kernel will treat value > 0 as success.
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following coccicheck warning:
./drivers/net/ethernet/mscc/ocelot_vsc7514.c:946:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before goto.
Early exits from for_each_available_child_of_node should decrement the
node reference counter.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following coccicheck warning:
./drivers/net/ethernet/microchip/sparx5/s4parx5_main.c:723:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before goto
Early exits from for_each_available_child_of_node should decrement the
node reference counter.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First fragmented packets (frag offset = 0) byte len is zeroed
when stolen by ip_defrag(). And since act_ct update the stats
only afterwards (at end of execute), bytes aren't correctly
accounted for such packets.
To fix this, move stats update to start of action execute.
Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting ds->num_ports to DSA_MAX_PORTS made DSA core allocate unnecessary
dsa_port's and call mt7530_port_disable for non-existent ports.
Set it to MT7530_NUM_PORTS to fix that, and dsa_is_user_port check in
port_enable/disable is no longer required.
Cc: stable@vger.kernel.org
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I compared the register definitions with the D-Link DWR-966
GPL sources and found that the PUAFD field definition was
incorrect. This definition is unused and causes no issues.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmFsjT8THG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqXJNCACBFemjdVEKUaWEjd10YvHaILdnEkge
6bj51eIpYy/aAfIqGLhhUOYiyVRlrTD4phJBSVl25jx0Vaf6UqXQ6Otd3GjJAV7j
16Hp8EvGc14cnx+rCPU471YUWyTrPl7YU9AoYKeBmbRmwh4i6HiJZ75AbG6xGvE8
2JZLmrF0QvpZRFfxE0uF5oVwjP9tOSvDmuFa6Me/JQ30CBrpxSalayizMvCO27eO
k2PaXSCpfms2K2eSBTqcjcqjJV57izJzNVtPkcmgOcyB1gvbBmHD7Y2GiiJ6Uigg
YkVEscsZNE1Wn/5p+sIS3pKDErsCsTgiAaopxCiaEsMLdv8b7HLecRGN
=CfwZ
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.15-20211017' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-10-17
this is a pull request of 11 patches for net/master.
The first 4 patches are by Ziyang Xuan and Zhang Changzhong and fix 1
use after free and 3 standard conformance problems in the j1939 CAN
stack.
The next 2 patches are by Ziyang Xuan and fix 2 concurrency problems
in the ISOTP CAN stack.
Yoshihiro Shimoda's patch for the rcar_can fix suspend/resume on not
running CAN interfaces.
Aswath Govindraju's patch for the m_can driver fixes access for MMIO
devices.
Zheyu Ma contributes a patch for the peak_pci driver to fix a use
after free.
Stephane Grosjean's 2 patches fix CAN error state handling in the
peak_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
On i386, the baycom_epp driver wants to inspect X86 CPU features (TSC)
and then act on that data, but that info is not available when running
on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/hamradio/baycom_epp.c: In function ‘epp_bh’:
../drivers/net/hamradio/baycom_epp.c:630:6: error: implicit declaration of function ‘boot_cpu_has’; did you mean ‘get_cpu_mask’? [-Werror=implicit-function-declaration]
if (boot_cpu_has(X86_FEATURE_TSC)) \
^
../drivers/net/hamradio/baycom_epp.c:658:2: note: in expansion of macro ‘GETTICK’
GETTICK(time1);
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Cc: linux-hams@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Mapping something twice should be possible as long as,
DMA_ATTR_SKIP_CPU_SYNC is passed to the strictly speaking second relevant
mapping operation (that attempts to map the same thing). So, don't issue a
warning if the specified condition is met in add_dma_entry().
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2 fixes for this cycle:
* Fix a null pointer dereference in ahci-platform driver (from Hai)
* Fix uninitialized variables in pata_legacy driver (from Dan)
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYWzTkAAKCRDdoc3SxdoY
ductAQCNJ4xNuZVaFFiwvrQOn16xQsu/UHnBReqdBBLytSaW1wD+IjAKEaOYVTbU
0nLtzNSxV52N6o6xSouUT7vxVJ7kjQI=
=H02G
-----END PGP SIGNATURE-----
Merge tag 'libata-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull libata fixes from Damien Le Moal:
"Two fixes for this cycle:
- Fix a null pointer dereference in ahci-platform driver (from Hai)
- Fix uninitialized variables in pata_legacy driver (from Dan)"
* tag 'libata-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: ahci_platform: fix null-ptr-deref in ahci_platform_enable_regulators()
pata_legacy: fix a couple uninitialized variable bugs
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmFsIqAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppbBEACDewLUv7bg1VFIGdroRN51OGiOv1oV+8HP
ruY7O9CPtV7wcb3lA1Zy9igICuzuC5culHjbRJrNIUeTdWCQHCFk/sfKSD6VGMoT
cFTqpKxV7M3vYr9G2m5TFWgY2mfS+I5fxyDZxK2z2esHCFw6TZ7A5W13xScVXKP+
QdNFSlTrGkpggsSIEeHApG+NLsIecnkT4qzm8zPfUodUtQ3A8JMjQjnYUFEAWfWv
l9x9zDIzaGjPtXf5soFEvmdh1ALh3WWiYb1kIwK1FeP/PYX0JV/3zCMgqOwpK+4b
69OM3Q0NPHvu2TgSRK+ghekAtz5qgPDMCrzdhSgLYJEL/PGAOboqjrB9E+wWoEjd
IKrYLx4Xao2TUZLJF2y34hHfODGdasx7d+wS191UpVFEZHFhDhIaazZ2rDd5xnQK
LdzQw1JQF/igJovHauhSkGFIdJWBSDneLQoMimBnitZlsWARUmFSZej34FFRLZsW
8ZXfqipn/x+fh4sQ/HdEfWxnGHtveDpU+0Ka5bMUe/tJ9RPtmn/Ye7nFjYecC6NY
4UzFSNn+4e9DpHaDuP3I/eA1YBmVlcB5Hum3ve7X6ovwpjArYg3dgJOEi8uCZjfb
hdMANmkVptcPiEO9njEHhC7S8+Nm3t+8o3qQceN81j6Vcjgzt/Y/n3Z6UkKeSlkn
Ila+cZI1oA==
=J/e4
-----END PGP SIGNATURE-----
Merge tag 'block-5.15-2021-10-17' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Bigger than usual for this point in time, the majority is fixing some
issues around BDI lifetimes with the move from the request_queue to
the disk in this release. In detail:
- Series on draining fs IO for del_gendisk() (Christoph)
- NVMe pull request via Christoph:
- fix the abort command id (Keith Busch)
- nvme: fix per-namespace chardev deletion (Adam Manzanares)
- brd locking scope fix (Tetsuo)
- BFQ fix (Paolo)"
* tag 'block-5.15-2021-10-17' of git://git.kernel.dk/linux-block:
block, bfq: reset last_bfqq_created on group change
block: warn when putting the final reference on a registered disk
brd: reduce the brd_devices_mutex scope
kyber: avoid q->disk dereferences in trace points
block: keep q_usage_counter in atomic mode after del_gendisk
block: drain file system I/O on del_gendisk
block: split bio_queue_enter from blk_queue_enter
block: factor out a blk_try_enter_queue helper
block: call submit_bio_checks under q_usage_counter
nvme: fix per-namespace chardev deletion
block/rnbd-clt-sysfs: fix a couple uninitialized variable bugs
nvme-pci: Fix abort command id
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmFsIe4QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpgjCD/9szxypMlfPv0ZuUo/mnWLt/qSI/tFkfDt2
4uwDnibObDUii9jw+2NBvIAHhSnotx+/nOvoI4DeSVahMx7+AbE/zZKHqAjLZDAS
Uc81SyDJLA7IYlNN0XL4cxyQo9LSHOESqLyEekAGvECBIwI/M+HIWIPOn7NCChOr
YU6gZImgO1ty+IrRWCS/QV1goOm6mtwHvzS3vtcrGCorApTACdAuuePDrYfsOAiO
btdhdSgSkkFg0L+fCsTpxVKCddUvP196wesNCEW/yAOXSaZKRX11hAb82LtlB6S1
Il4fLuJGlCHSdEFmgUkfIqHWUW8xDuKLu52NYb7aoU76xKEYE59HrMjEpo1NHOKS
iAZxr3BaTaCfPq0y3r/rCwBQhDVKz9vyXSuELoKqpf1CHCiFFmLYxE4qWM3A0GoE
0Y4DcYLroVWiobvsArsaMPNRQsWdl6nWqwlpGyaWrCP/cLlyaiNmmre0CEr3tf3s
u7nMiPMrk9NkRSYl9O14WlEduuR5Bng97ORVGYB+/Bm3x8VKLyO7VHL2OmpFKC23
Z07Pg9UrH/tm5Hh6VGDuxTYfbJ4iqoIrNYeA+zdxpMDYwozmXGfw90Zia+JgtDXS
Zt/oY9LYp6qJsVFcy+70lAl7EEQqfC9zFRbW3UZ6A7lDKNBgXKlBCpdvCliWqi05
UbWe4AEntQ==
=4wXI
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.15-2021-10-17' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Just a single fix for a wrong condition for grabbing a lock, a
regression in this merge window"
* tag 'io_uring-5.15-2021-10-17' of git://git.kernel.dk/linux-block:
io_uring: fix wrong condition to grab uring lock
Fixes up some issues in rc5.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmFm1OcPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpp0AIAL5lvwd/Dzj0trDEOm93yTiKigkllJ9if8pF
G3V6IgyyeNujmEJLD2Fz5GPUVXg5G2+Yh2xgf8OP4syS/pZoCGkGcLIoo7lJRSvz
D3KpNYrZ2O1kFw2XWP3p7O/H8pxWAMPjykRaVoniCd+rIIpRzWdYBDDWConUGqC1
IlB5BS6cv3vRIoJ4Ac3YVOEUM4WOw/2fwzxejVjxQdNjgbWL0JBY1IBiJrfp7iEo
L3KRmN25JWCwE0x+Ehy6/uSalVzPgjESBYzrBFihnFXDS/LIIIXuRb9Q3HzEFFHT
UhccuExc8Nq8JDKWrZkYP2s950Gnv249bC9tqlHHTnoE+pxG3Ms=
=i68W
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Fixes up some issues in rc5"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-vdpa: Fix the wrong input in config_cb
VDUSE: fix documentation underline warning
Revert "virtio-blk: Add validation for block size in config space"
vhost_vdpa: unset vq irq before freeing irq
virtio: write back F_VERSION_1 before validate
Fix a bug where guests on P9 with interrupts passed through could get stuck in
synchronize_irq().
Fix a bug in KVM on P8 where secondary threads entering a guest would write outside their
allocated stack.
Fix a bug in KVM on P8 where secondary threads could confuse the host offline code and
cause the guest or host to crash.
Thanks to: Cédric Le Goater
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmFsFZkTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgBQND/0VSdTHegF+7kBqWnyP+Rc8aXwr/kPL
xijmmeWS+BY0nkW7BNabRwVtOHZ4lKyDxPMwFiQ+tbpsmm4LX9sA9igV2yXdK73Y
pRFkJBxljxXuO3eYa4peQwsvHnI7cR//fUr94b0OT6r0N39T5tKq5MZvWawk05y4
RivpMOnEtDEd12ySJk5/fshwNKV5L7InI2FRKi3EU8sPNV3XA/z5JQ1CKuvXUUiK
1HIwwxYqECcMM0dkIo7Qa7F93Wr0V8ZjsXLFV96jpBNL2aUfXVMaIBmITwa1Pk+i
1581QmI4AJx3b3YAMqoujrgFJQKxaw2ZjiFfDcHGcTD4qCNVH2MNeyHM86Z7lFKL
osYI7Ya3zzlwd4qhPsLV21mBdBp1xjpiBj1FiRrq4o+Me+O6eQnmG2abL8xUFI/3
F7ab9qV8xjqfw/XlAXKtb7cco4SSQfi6XuLYSdTduCU5A1unLgpBw9F6H14YrQ5A
8/uriGLIzhgklwDYgBJUGOjJzMCv0ey1+MJF4R4tEeIJ7+83WuAHCf2eAnxSVXLT
Q2JaFt1LP8fdW1IPYYICjVgahREsWvTW4wK2lk5sO+BSoop6EbOVfz5TK52ePxLC
II4ETYG3tIipLCkf2tqF/rU4De7d2SRg70lCNdEBVU9GpmiXec4uwPyX3uAwKdlf
9qphXh0d15LAPg==
=fPMr
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a bug where guests on P9 with interrupts passed through could get
stuck in synchronize_irq().
- Fix a bug in KVM on P8 where secondary threads entering a guest would
write outside their allocated stack.
- Fix a bug in KVM on P8 where secondary threads could confuse the host
offline code and cause the guest or host to crash.
Thanks to Cédric Le Goater.
* tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest
KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()
powerpc/xive: Discard disabled interrupts in get_irqchip_state()
trigger a safety check in elftoolchain's implementation of libelf
- Do not add garbage data to the .rela.orc_unwind_ip section
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmFr+qQACgkQEsHwGGHe
VUqYCxAAqkHYOu4ok1t0ZUePJSDNFXbMNEq8gfb9gY4zQfsk80eE40EUeZhaqyvq
a4JHKJSMus+towDB+Zpy1s3ZEfT8V2leZCjMsIS4yIy/dLHPC3TCL4kS1CVlpw0U
funRhy4FkeiY2Un7ghvhvqSM9l9Dnw3TqxhGUZ0LcdlUaPIKIwzFhHvEVmZsXRFM
cccz5F/r0yeg7r4TbGtTAmxpKSIsx4BaFqf/LICaLrWGo3pDVGlhn9sPflzv1vue
rlR4Lb1NKKCaQTefFmLdEpV7FCiJ0QXeoZNGCJlk9NHZyGoNSRfkVSZRDGt/fb5A
aKfaDvrimgALDB4naWSEplC3slVp9Z2+TpS2zHyqDchGnHIAyx9E6MSOt9F0fs//
DSv83K2j18rRvmnOPsZoy2pySqtMULohIPECPHOtdZ1y+PwhD/FpSNtO51t/hig5
2cPq8DnVycJiXM7VQpTcNXv+M2tZ2IZ/E3g8SHJCNSlGZ73YmZdiO2ZTmXGRPnmx
Xq8gFl/nomMFQSdv6H+nlSy0EA5C3dWqa3pwtKBNrZ8mGAsrEZ2Ptl1TSKfjs28x
tI7KTeZouxA2NlggBllgOFzm5HSSq+4S6l/gWjuR0yu7j2Kpha+UZUaXZHKHt+OU
iFZ6BlITsFEE+cd2Nns2rTJjsjN08j/AfNJOGcjtiFVUjHeCc24=
=tiWi
-----END PGP SIGNATURE-----
Merge tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
- Update section headers before the respective relocations to not
trigger a safety check in elftoolchain's implementation of libelf
- Do not add garbage data to the .rela.orc_unwind_ip section
* tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Update section header before relocations
objtool: Check for gelf_update_rel[a] failures
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmFr9jUACgkQEsHwGGHe
VUp23g//XVP10DoYN3kX+wNre2fUDVjwRQ15s0fpLkCyCCYqlSyfj0ijozmvefY1
jvTkfsQUVPmFDsdh7n0GOrZJ02WKxZR9it5NVANc6uDI7uUf1oKRq/+JVF3eoU/s
r5wZcgj53qr697gNi5Edg80P2QUeh/vBBFSI67b7q/+oymQuWjau4GHyLLWM8ekO
7D8D0AKgYCvUluGfbC7Z78Zu9E1Kf5a0WyZofAmXu9sxwVkmNs7rrK/GWNnxp3NR
H9LjJaaSMeOF43DtXC5O0Ag4TJpQiRIZNiysaljrqEPbqLwX/YZhjOeKzLZQHjvU
8vq9p8IrXC4aQ4ZLivsaQBmmcihEJP2Es0YDj73SUlhC03Dxuq8opiFHehPm3mXA
BZ5FqldwWHrqX1I1i45tvA56qi4XhGKcB8lquGTtZ7WGqMB1xqKqg3oW2T/ocD1r
5TaNZ7O0c/1Uo1+OWX6aHETn2pUrk+e1Fn8rT/sUVjfdV+4+LJiNkpleS2n5nfdQ
a7cX3XO9wC7LunA3okyhnM4e9ON7eXfdFCWFV8yZqiybskp9kvgHjLpYWxWn6L0B
uDwOj0lgDQOOoh2focPryBxYFN9NYOibUXCGICFJlup9dowpCsMUOoWm8CYP8x2p
j3YfMzX6WUEKwUOhwRj6Ugfn+fwmWR4ckeWT8ff1lHFxhA6gPMM=
=yLQd
-----END PGP SIGNATURE-----
Merge tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Add Sapphire Rapids to the list of CPUs supporting the SMI count MSR
* tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/msr: Add Sapphire Rapids CPU support
Low priority fixes but fixes nonetheless:
- update stub diagnostic print that is no longer accurate
- avoid statically allocated buffer for CPER error record decoding
- avoid sleeping on the efi_runtime semaphore when calling the
ResetSystem EFI runtime service
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmFoF6QACgkQw08iOZLZ
jyQgrgv8DBU0nIiCGcWBC8WJaODZau4ife6kspD3rAmk7xMsjNcTSZ2QXooIXFbM
iChzZvwrJaWCa1HyclBSjVix9IWE1zwU/sfacvbbpooxCA00/UGyboK0uKPgahPF
Gy3pRcRK36mfVO14kWd5lSKD71mGS+VhMqL+Dvz9MxRCwTzanLafGb+QVuSIz2zX
iHSqsUCrOIIMqOZJHB9U3vZdBEnT2xlFirm+KM4DINoxcdo0EZ/aAtnUS/hlXLl8
eWec1nSGPwmCz6Ud10WU+CbQDCT3cFB3NJd9F0lKx5nwWNRkDD/1y0Cq5KVcAFmx
hsYfXOC/zyWMzBMwfeQYqCiVHWsfGjPH4HwE7hG2kloUfhzUfbHPSroDF6vpyXB/
f/K3Zj6ZJRA9V5NaZXaqgIRgpcSWasYLNwmzdNIyV4KuBwmjcjvzKP+mSYSXt7xy
oo8gakoJ3qYCxpIsUOB7iCN8At+sDhuxRncg5RILrN7HXJyPIgAwJ7OVbaMAjRG2
TG58O/em
=+LOo
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Borislav Petkov:
"Forwarded from Ard Biesheuvel through the tip tree. Ard will send
stuff directly in the near future.
Low priority fixes but fixes nonetheless:
- update stub diagnostic print that is no longer accurate
- avoid statically allocated buffer for CPER error record decoding
- avoid sleeping on the efi_runtime semaphore when calling the
ResetSystem EFI runtime service"
* tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Change down_interruptible() in virt_efi_reset_system() to down_trylock()
efi/cper: use stack buffer for error record decoding
efi/libstub: Simplify "Exiting bootservices" message
shortcomings of some platforms, leading to boot failures.
- Mask out invalid bits in the MXCSR for 32-bit kernels again because
Thomas and I don't know how to mask out bits properly. Third time's the
charm.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmFr8jgACgkQEsHwGGHe
VUrdew//UzC9aVoDuhD/oyK5/rN24x6eipOjjhbFRNHZP6zcPE2+YXiWRk16JKUz
wbQnJP+1MlU4K8swDoN+mqYAvl/SK48dOPGrknnLzMITV3HsfTr6X8wEAKfzSFH6
m25ElTzay4V10oHPSB2B6jOc9GRnMeA/dinfguzmGJfYHgr7dJ89weJ7F3gPP6Hj
+q6g8lJaSJjtZMwweas/0vc6mLb5De6InoimVJQoX2bXcuczS7mou+1SSaaE9wNp
/NnSzAPcbTYkY+2iEFM8hKWKxX/a3j0brJzvrxIw/ggcmoYSNm6Gvw2htjwoPhYs
KlI8Wx2NTvfua93zleaad3131tnIqBgIfpC/7xhD+gYJanbn6p4ZlqrY74qrfPQ2
L15OUnBRFb9/I8zohke7FvjV8OPDJ7P3fp9QkeRp5wwIKSGunQ/AreWo3or+scpY
eNP4DOUSOOObsgpQXbmOBphqnV9UpOtCQUcEaBH4PyF9G2H0KZQiSjD4DCDIiaBa
aQjNlWoC5s89aQdHDGU4iouM5ldmfiT4F65OI2FEClm1/xyqdpMqM+Q0khZKnXow
5hSZitV5tRiR40M4wnE/3XoQt57OaFtwLs8K1InimZGOttgGhr//fsuwaGU0nL5l
oQLg9FuwphRupeMz0a6KmzNcMG34rb1IE8q2twDq3BULi9c97L0=
=dZ4h
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Do not enable AMD memory encryption in Kconfig by default due to
shortcomings of some platforms, leading to boot failures.
- Mask out invalid bits in the MXCSR for 32-bit kernels again because
Thomas and I don't know how to mask out bits properly. Third time's
the charm.
* tag 'x86_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Mask out the invalid MXCSR bits properly
x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically