When deleting the netns and recreating a new one while re-adding the
veth interface, there is a small window of time during which the old
veth interface has not yet been removed. This can cause the new addition
to fail. To resolve this issue, we can either wait for a short while to
ensure that the old veth interface is deleted, or we can specifically
remove the veth interface.
Before this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
RTNETLINK answers: File exists
changing rps_default_mask in child ns don't affect the main one[ ok ]
cat: /sys/class/net/vethC11an1/queues/rx-0/rps_cpus: No such file or directory
changing rps_default_mask in child ns affects new childns devices./rps_default_mask.sh: line 36: [: -eq: unary operator expected
[fail] expected 1 found
changing rps_default_mask in child ns don't affect existing devices[ ok ]
After this patch:
# ./rps_default_mask.sh
empty rps_default_mask [ ok ]
changing rps_default_mask dont affect existing devices [ ok ]
changing rps_default_mask dont affect existing netns [ ok ]
changing rps_default_mask affect newly created devices [ ok ]
changing rps_default_mask don't affect newly child netns[II][ ok ]
rps_default_mask is 0 by default in child netns [ ok ]
changing rps_default_mask in child ns don't affect the main one[ ok ]
changing rps_default_mask in child ns affects new childns devices[ ok ]
changing rps_default_mask in child ns don't affect existing devices[ ok ]
Fixes: 3a7d84eae0 ("self-tests: more rps self tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20230404072411.879476-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not all fec MDIO bus drivers support C45 mode transactions. The older fec
hardware block in many ColdFire SoCs does not appear to support them, at
least according to most of the different ColdFire SoC reference manuals.
The bits used to generate C45 access on the iMX parts, in the OP field
of the MMFR register, are documented as generating non-compliant MII
frames (it is not documented as to exactly how they are non-compliant).
Commit 8d03ad1ab0 ("net: fec: Separate C22 and C45 transactions")
means the fec driver will always register c45 MDIO read and write
methods. During probe these will always be accessed now generating
non-compliant MII accesses on ColdFire based devices.
Add a quirk define, FEC_QUIRK_HAS_MDIO_C45, that can be used to
distinguish silicon that supports MDIO C45 framing or not. Add this to
all the existing iMX quirks, so they will be behave as they do now (*).
(*) it seems that some iMX parts may not support C45 transactions either.
The iMX25 and iMX50 Reference Manuals contain similar wording to
the ColdFire Reference Manuals on this.
Fixes: 8d03ad1ab0 ("net: fec: Separate C22 and C45 transactions")
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230404052207.3064861-1-gerg@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
There is a concurrency bug that may cause the wrong value to be loaded
when a CPU is modifying the maple tree.
CPU1:
mtree_insert_range()
mas_insert()
mas_store_root()
...
mas_root_expand()
...
rcu_assign_pointer(mas->tree->ma_root, mte_mk_root(mas->node));
ma_set_meta(node, maple_leaf_64, 0, slot); <---IP
CPU2:
mtree_load()
mtree_lookup_walk()
ma_data_end();
When CPU1 is about to execute the instruction pointed to by IP, the
ma_data_end() executed by CPU2 may return the wrong end position, which
will cause the value loaded by mtree_load() to be wrong.
An example of triggering the bug:
Add mdelay(100) between rcu_assign_pointer() and ma_set_meta() in
mas_root_expand().
static DEFINE_MTREE(tree);
int work(void *p) {
unsigned long val;
for (int i = 0 ; i< 30; ++i) {
val = (unsigned long)mtree_load(&tree, 8);
mdelay(5);
pr_info("%lu",val);
}
return 0;
}
mt_init_flags(&tree, MT_FLAGS_USE_RCU);
mtree_insert(&tree, 0, (void*)12345, GFP_KERNEL);
run_thread(work)
mtree_insert(&tree, 1, (void*)56789, GFP_KERNEL);
In RCU mode, mtree_load() should always return the value before or after
the data structure is modified, and in this example mtree_load(&tree, 8)
may return 56789 which is not expected, it should always return NULL. Fix
it by put ma_set_meta() before rcu_assign_pointer().
Link: https://lkml.kernel.org/r/20230314124203.91572-4-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
if (likely(offset > end))
max = pivots[offset];
The above code should be changed to if (likely(offset < end)), which is
correct. This affects the correctness of ma_data_end(). Now it seems
that the final result will not be wrong, but it is best to change it.
This patch does not change the code as above, because it simplifies the
code by the way.
Link: https://lkml.kernel.org/r/20230314124203.91572-1-zhangpeng.00@bytedance.com
Link: https://lkml.kernel.org/r/20230314124203.91572-2-zhangpeng.00@bytedance.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The si->lock must be held when deleting the si from the available list.
Otherwise, another thread can re-add the si to the available list, which
can lead to memory corruption. The only place we have found where this
happens is in the swapoff path. This case can be described as below:
core 0 core 1
swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock
and re-add si into
swap_avail_head
acquire si->lock but missing si already being added again, and continuing
to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered
inside get_swap_pages() by some special cases, for example, we call
madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile,
run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In
swapoff(), the memory used by si could be kept in swap_info[] after
turning off a swap. This means memory corruption will not be caused
immediately until allocated and reset for a new swap in the swapon path.
A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------
top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
Modules linked in: rfkill(E) crct10dif_ce(E)...
CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
pc : plist_check_prev_next_node+0x50/0x70
lr : plist_check_prev_next_node+0x50/0x70
sp : ffff0018009d3c30
x29: ffff0018009d3c40 x28: ffff800011b32a98
x27: 0000000000000000 x26: ffff001803908000
x25: ffff8000128ea088 x24: ffff800011b32a48
x23: 0000000000000028 x22: ffff001800875c00
x21: ffff800010f9e520 x20: ffff001800875c00
x19: ffff001800fdc6e0 x18: 0000000000000030
x17: 0000000000000000 x16: 0000000000000000
x15: 0736076307640766 x14: 0730073007380731
x13: 0736076307640766 x12: 0730073007380731
x11: 000000000004058d x10: 0000000085a85b76
x9 : ffff8000101436e4 x8 : ffff800011c8ce08
x7 : 0000000000000000 x6 : 0000000000000001
x5 : ffff0017df9ed338 x4 : 0000000000000001
x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
plist_check_prev_next_node+0x50/0x70
plist_check_head+0x80/0xf0
plist_add+0x28/0x140
add_to_avail_list+0x9c/0xf0
_enable_swap_info+0x78/0xb4
__do_sys_swapon+0x918/0xa10
__arm64_sys_swapon+0x20/0x30
el0_svc_common+0x8c/0x220
do_el0_svc+0x2c/0x90
el0_svc+0x1c/0x30
el0_sync_handler+0xa8/0xb0
el0_sync+0x148/0x180
irq event stamp: 2082270
Now, si->lock locked before calling 'del_from_avail_list()' to make sure
other thread see the si had been deleted and SWP_WRITEOK cleared together,
will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.com
Fixes: a2468cc9bf ("swap: choose swap device according to numa node")
Tested-by: Yongchen Yin <wb-yyc939293@alibaba-inc.com>
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The current nilfs2 sysfs support has issues with the timing of creation
and deletion of sysfs entries, potentially leading to null pointer
dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to
metadata file "cpfile", "sufile", or "dat", but
nilfs_sysfs_create_device_group that creates those attributes is executed
before the inodes for these metadata files are loaded, and
nilfs_sysfs_delete_device_group which deletes these sysfs entries is
called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of
the lifetime of these metadata files, resulting in inode NULL pointer
dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during
the locking period of the semaphore "ns_sem" of nilfs object, so the
shrinker call caused by the memory allocation for the sysfs entries, may
derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in
nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other
locks via its error handler __nilfs_error(), this causes lockdep to report
circular locking. This is a false positive and no circular locking
actually occurs as no inodes exist yet when
nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep
warnings can be resolved by simply moving the call to
nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs
interface is created/deleted and keeping its lifetime within the lifetime
of the metadata files above.
Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com
Fixes: dd70edbde2 ("nilfs2: integrate sysfs support into driver")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com
Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device. Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access. When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.
The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present. However the fault
handling code does not hold the PTL when taking the page lock. This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.
This can lead to threads locking or waiting on a folio with a zero
refcount. Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration. This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.
Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there. It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.
Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7 ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The finalization of nilfs_segctor_thread() can race with
nilfs_segctor_kill_thread() which terminates that thread, potentially
causing a use-after-free BUG as KASAN detected.
At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member
of "struct nilfs_sc_info" to indicate the thread has finished, and then
notifies nilfs_segctor_kill_thread() of this using waitqueue
"sc_wait_task" on the struct nilfs_sc_info.
However, here, immediately after the NULL assignment to "sc_task", it is
possible that nilfs_segctor_kill_thread() will detect it and return to
continue the deallocation, freeing the nilfs_sc_info structure before the
thread does the notification.
This fixes the issue by protecting the NULL assignment to "sc_task" and
its notification, with spinlock "sc_state_lock" of the struct
nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to
see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate
the race.
Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com
Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When freeable class stat was added to classes file (back in 2016) we
forgot to update zsmalloc documentation. Fix that.
Link: https://lkml.kernel.org/r/20230325024631.2817153-3-senozhatsky@chromium.org
Fixes: 1120ed5483 ("mm/zsmalloc: add `freeable' column to pool stat")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "zsmalloc: minor documentation updates".
Two minor patches that bring zsmalloc documentation up to date.
This patch (of 2):
Update documentation and reflect new zspages fullness grouping (we don't
use almost_empty and almost_full anymore).
Link: https://lkml.kernel.org/r/20230325024631.2817153-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20230325024631.2817153-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Fixes: 67e157eb3639 ("zsmalloc: show per fullness group class stats")
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch fixes an issue that a hugetlb uffd-wr-protected mapping can be
writable even with uffd-wp bit set. It only happens with hugetlb private
mappings, when someone firstly wr-protects a missing pte (which will
install a pte marker), then a write to the same page without any prior
access to the page.
Userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault() before
reaching hugetlb_wp() to avoid taking more locks that userfault won't
need. However there's one CoW optimization path that can trigger
hugetlb_wp() inside hugetlb_no_page(), which will bypass the trap.
This patch skips hugetlb_wp() for CoW and retries the fault if uffd-wp bit
is detected. The new path will only trigger in the CoW optimization path
because generic hugetlb_fault() (e.g. when a present pte was
wr-protected) will resolve the uffd-wp bit already. Also make sure
anonymous UNSHARE won't be affected and can still be resolved, IOW only
skip CoW not CoR.
This patch will be needed for v5.19+ hence copy stable.
[peterx@redhat.com: v2]
Link: https://lkml.kernel.org/r/ZBzOqwF2wrHgBVZb@x1n
[peterx@redhat.com: v3]
Link: https://lkml.kernel.org/r/20230324142620.2344140-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230321191840.1897940-1-peterx@redhat.com
Fixes: 166f3ecc0d ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[Liam.Howlett@Oracle.com: we don't need to free the nodes with RCU[
Link: https://lore.kernel.org/linux-mm/000000000000b0a65805f663ace6@google.com/
Link: https://lkml.kernel.org/r/20230227173632.3292573-9-surenb@google.com
Fixes: d4af56c5c7 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: syzbot+8d95422d3537159ca390@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree. Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking. It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.
Link: https://lkml.kernel.org/r/20230227173632.3292573-8-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add an smp_rmb() before reading the parent pointer to ensure that anything
read from the node prior to the parent pointer hasn't been reordered ahead
of this check.
The is necessary for RCU mode.
Link: https://lkml.kernel.org/r/20230227173632.3292573-7-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-6-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The call to mte_set_dead_node() before the smp_wmb() already calls
smp_wmb() so this is not needed. This is an optimization for the RCU mode
of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-5-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types. This is necessary for the
RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-4-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When initially starting a search, the root node may already be in the
process of being replaced in RCU mode. Detect and restart the walk if
this is the case. This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230227173632.3292573-3-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Fix VMA tree modification under mmap read lock".
Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused by an
inconsistency between threads walking the VMA maple tree. The
inconsistency is caused by the page fault handler modifying the maple tree
while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed a
test case where the stack had no guard page and grew the stack to abut the
next VMA. This causes us to delete the NULL entry between the two VMAs
and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is similar
to the one we intended to merge next merge window.
These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset. They
were also sent as part of the "Per-VMA locks" v4 patch series. Patches 1
to 7 are bug fixes for RCU mode of the tree and patch 8 enables RCU mode
for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3: Running these changes through
mmtests showed there was a 15-20% performance decrease in
will-it-scale/brk1-processes. This tests creating and inserting a single
VMA repeatedly through the brk interface and isn't representative of any
real world applications.
This patch (of 8):
ma_pivots() and ma_data_end() may be called with a dead node. Ensure to
that the node isn't dead before using the returned values.
This is necessary for RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/20230327185532.2354250-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230227173632.3292573-2-surenb@google.com
Fixes: 54a611b605 ("Maple Tree: add new data structure")
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: freak07 <michalechner92@googlemail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@google.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Oskolkov <posk@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Punit Agrawal <punit.agrawal@bytedance.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.
-----BEGIN PGP SIGNATURE-----
iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmQtUzQRHGt2YWxvQGtl
cm5lbC5vcmcACgkQbhckVSbrbZuz/wgAlpzStpfnJkVUJRRjCb0yWDPKz+NYYeLh
wQ3Ad+T/KUiav870DlFStRcaZTUi6JyIk61cEnIk6ZB36ivCML6KJ/7GAZFIcI2l
lq/dY8zG2GX0kkSufw24P+6tD3LQguejMv0YHYVUEEXTo4b16dyE5nja2acaOjQI
LDQY6Gq5A/TdMyq9VPLWpPjBJkwEWjexyzJu57A4hExgGZviMAyVWNlBdNOkRn2E
sMt/4ibgchVTxnmHfvS8d+gmEuxiMLi7PHn17gY80kHh2seCc9ODg6EbVxvIi18b
gUj6QXfD9dSJ+ghhV3M7oH+6GLSXDf/c+f1Xy8gupupoQiDxJ+T8UQ==
=mdQP
-----END PGP SIGNATURE-----
Merge tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.3
mt76 has a fix for leaking cleartext frames on a certain scenario and
two firmware file handling related fixes. For brcmfmac we have a fix
for an older SDIO suspend regression and for ath11k avoiding a kernel
crash during hibernation with SUSE kernels.
* tag 'wireless-2023-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: mt76: ignore key disable commands
wifi: ath11k: reduce the MHI timeout to 20s
wifi: mt76: mt7921: fix fw used for offload check for mt7922
wifi: mt76: mt7921: Fix use-after-free in fw features query.
wifi: brcmfmac: Fix SDIO suspend/resume regression
====================
Link: https://lore.kernel.org/r/20230405105536.4E946C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEDs2BvajyNKlf9TJQvlAcSiqKBOgFAmQtPNcTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRC+UBxKKooE6DYXB/wKpuSPkpn6VHES2Gv3kl1+v3eOSCVI
gt6rNy5HbxcsezYZTbfOzqlh2wxOMdZNDsbG9jDm2W/knwDP01JY0OhQLcxUT6w0
xFgWnoTTkGVtkjPSEE/BL5cy5YMvjbBF2Xn9d6aTDms5BJPdY2mz8NpjITBcf2bd
CLcqCthK2DabZQR9WWVDgGP/q2b+hKDBefioEfX9EbBTAYZ9+0HjMhFl1MdTQ1li
rnva5EFx9XmzqXyBhB+iOZFSgHAxdEF2ZhVN48mDsqAee5jw/COoTv1qc4qMSX79
dCkYP1oTRW39NBkCtw252YbwaYzn9RPhPjPy3kLx7+H/0AWft2ns6wBh
=Dgqo
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-6.3-20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2023-04-05
The first patch is by Oleksij Rempel and fixes a out-of-bounds memory
access in the j1939 protocol.
The remaining 3 patches target the ISOTP protocol. Oliver Hartkopp
fixes the ISOTP protocol to pass information about dropped PDUs to the
user space via control messages. Michal Sojka's patch fixes poll() to
not forward false EPOLLOUT events. And Oliver Hartkopp fixes a race
condition between isotp_sendsmg() and isotp_release().
* tag 'linux-can-fixes-for-6.3-20230405' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: isotp: fix race between isotp_sendsmg() and isotp_release()
can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos
can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
====================
Link: https://lore.kernel.org/r/20230405092444.1802340-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-04-04 (ice)
This series contains updates to ice driver only.
Simei adjusts error path on adding VF Flow Director filters that were
not releasing all resources.
Lingyu adds setting/resetting of VF Flow Director filters counters
during initialization.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Reset FDIR counter in FDIR init stage
ice: fix wrong fallback logic for FDIR
====================
Link: https://lore.kernel.org/r/20230404172306.450880-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Lenovo ThinkPad W530 uses a nvidia k1000m GPU. When this gets used
together with one of the older nvidia binary driver series (the latest
series does not support it), then backlight control does not work.
This is caused by commit 3dbc80a3e4 ("ACPI: video: Make backlight
class device registration a separate step (v2)") combined with
commit 5aa9d943e9 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default").
After these changes the acpi_video# backlight device is only registered
when requested by a GPU driver calling acpi_video_register_backlight()
which the nvidia binary driver does not do.
I realize that using the nvidia binary driver is not a supported use-case
and users can workaround this by adding acpi_backlight=video on the kernel
commandline, but the ThinkPad W530 is a popular model under Linux users,
so it seems worthwhile to add a quirk for this.
I will also email Nvidia asking them to make the driver call
acpi_video_register_backlight() when an internal LCD panel is detected.
So maybe the next maintenance release of the drivers will fix this...
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On the Apple iMac14,1 and iMac14,2 all-in-ones (monitors with builtin "PC")
the connection between the GPU and the panel is seen by the GPU driver as
regular DP instead of eDP, causing the GPU driver to never call
acpi_video_register_backlight().
(GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.)
Fix the missing acpi_video# backlight device on these all-in-ones by
adding a acpi_backlight=video DMI quirk, so that video.ko will
immediately register the backlight device instead of waiting for
an acpi_video_register_backlight() call.
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 3dbc80a3e4 ("ACPI: video: Make backlight class device
registration a separate step (v2)") combined with
commit 5aa9d943e9 ("ACPI: video: Don't enable fallback path for
creating ACPI backlight by default")
Means that the video.ko code now fully depends on the GPU driver calling
acpi_video_register_backlight() for the acpi_video# backlight class
devices to get registered.
This means that if the GPU driver does not do this, acpi_backlight=video
on the cmdline, or DMI quirks for selecting acpi_video# will not work.
This is a problem on for example Apple iMac14,1 all-in-ones where
the monitor's LCD panel shows up as a regular DP connection instead of
eDP so the GPU driver will not call acpi_video_register_backlight() [1].
Fix this by making video.ko directly register the acpi_video# devices
when these have been explicitly requested either on the cmdline or
through DMI quirks (rather then auto-detection being used).
[1] GPU drivers only call acpi_video_register_backlight() when an internal
panel is detected, to avoid non working acpi_video# devices getting
registered on desktops which unfortunately is a real issue.
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk and set to true if auto-detection was used.
And make __acpi_video_get_backlight_type() non static so that it can
be called directly outside of video_detect.c .
While at it turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes: 5aa9d943e9 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <stable@vger.kernel.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After a server reboot, clients are failing to move files with ENOENT.
This is caused by DFS referrals containing multiple separators, which
the server move call doesn't recognize.
v1: Initial patch.
v2: Move prototype to header.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2182472
Fixes: a31080899d ("cifs: sanitize multiple delimiters in prepath")
Actually-Fixes: 24e0a1eff9 ("cifs: switch to new mount api")
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Thiago Rafael Becker <tbecker@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
iov_iter for ep_read_iter can be ITER_UBUF with io_uring.
In that case dup_iter() does not have to allocate iov and it can
return NULL. Fix the assumption by checking for iter_is_ubuf()
other wise ep_read_iter can treat this as failure and return -ENOMEM.
Fixes: 1e23db450c ("io_uring: use iter_ubuf for single range imports")
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230401060509.3608259-3-dhavale@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
iov_iter for ffs_epfile_read_iter can be ITER_UBUF with io_uring.
In that case dup_iter() does not have to allocate anything and it
can return NULL. ffs_epfile_read_iter treats this as a failure and
returns -ENOMEM. Fix it by checking if iter_is_ubuf().
Fixes: 1e23db450c ("io_uring: use iter_ubuf for single range imports")
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230401060509.3608259-2-dhavale@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While determining the initial pin assignment to be sent in the configure
message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to
send both Pin Assignment C and E when both are supported by the DFP_U and
UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates,
VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never
selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only
Pin Assignment C if it is available.
Fixes: 0e3bb7d689 ("usb: typec: Add driver for DisplayPort alternate mode")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20230329215159.2046932-1-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan reports that smatch complains about a potential uninitialized
variable being used in the compat alignment fixup code.
The logic is not wrong per se, but we do end up using an uninitialized
variable if reading the instruction that triggered the alignment fault
from user space faults, even if the fault ensures that the uninitialized
value doesn't propagate any further.
Given that we just give up and return 1 if any fault occurs when reading
the instruction, let's get rid of the 'success handling' pattern that
captures the fault in a variable and aborts later, and instead, just
return 1 immediately if any of the get_user() calls result in an
exception.
Fixes: 3fc24ef32d ("arm64: compat: Implement misalignment fixups for multiword loads")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202304021214.gekJ8yRc-lkp@intel.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230404103625.2386382-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fix timerlat notification, as it was not triggering the
notify to users when a new max latency was hit.
- Do not trigger max latency if the tracing is off.
When tracing is off, the ring buffer is not updated, it
does not make sense to notify when there's a new max latency
detected by the tracer, as why that latency happened is not available.
The tracing logic still runs when the ring buffer is disabled, but
it should not be triggering notifications.
- Fix race on freeing the synthetic event "last_cmd" variable by
adding a mutex around it.
- Fix race between reader and writer of the ring buffer by adding
memory barriers. When the writer is still on the reader page
it must have its content visible on the buffer before it moves
the commit index that the reader uses to know how much content is
on the page.
- Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_
and _RET_IP_, which gets broken if it is not inlined.
- Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.
The field formats of trace events are calculated by using sizeof(type)
and other means by what is passed into the structure macros like
__field(). The __field() macro is only meant for atom types like
int, long, short, pointer, etc. It is not meant for arrays. But
the code will currently compile with arrays, but then the format
produced will be inaccurate, and user space parsing tools will break.
Two bugs have already been fixed, now add code that will make the
kernel fail to build if another trace event includes this buggy
field format.
- Fix boot up snapshot code:
Boot snapshots were triggering when not even asked for on the
kernel command line. This was caused by two bugs.
1) It would trigger a snapshot on any instance if one was created
from the kernel command line.
2) The error handling would only affect the top level instance.
So the fact that a snapshot was done on a instance that didn't
allocate a buffer triggered a warning written into the top level
buffer, and worse yet, disabled the top level buffer.
- Fix memory leak that was caused when an error was logged in a
trace buffer instance, and then the buffer instance was removed.
The allocated error log messages still need to be freed.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZC2GEBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qsXuAP4xHoYnHMfixLZz9Z4KSz/2LWTjBce8
GItuEmfWXANQ7wD7BS0li8/ISCDg0P/epuWFOHyxP3jpbfPZtpYfrj5Cog4=
=71kh
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix timerlat notification, as it was not triggering the notify to
users when a new max latency was hit.
- Do not trigger max latency if the tracing is off.
When tracing is off, the ring buffer is not updated, it does not make
sense to notify when there's a new max latency detected by the
tracer, as why that latency happened is not available. The tracing
logic still runs when the ring buffer is disabled, but it should not
be triggering notifications.
- Fix race on freeing the synthetic event "last_cmd" variable by adding
a mutex around it.
- Fix race between reader and writer of the ring buffer by adding
memory barriers. When the writer is still on the reader page it must
have its content visible on the buffer before it moves the commit
index that the reader uses to know how much content is on the page.
- Make get_lock_parent_ip() always inlined, as it uses _THIS_IP_ and
_RET_IP_, which gets broken if it is not inlined.
- Make __field(int, arr[5]) in a TRACE_EVENT() macro fail to build.
The field formats of trace events are calculated by using
sizeof(type) and other means by what is passed into the structure
macros like __field(). The __field() macro is only meant for atom
types like int, long, short, pointer, etc. It is not meant for
arrays.
The code will currently compile with arrays, but then the format
produced will be inaccurate, and user space parsing tools will break.
Two bugs have already been fixed, now add code that will make the
kernel fail to build if another trace event includes this buggy field
format.
- Fix boot up snapshot code:
Boot snapshots were triggering when not even asked for on the kernel
command line. This was caused by two bugs:
1) It would trigger a snapshot on any instance if one was created
from the kernel command line.
2) The error handling would only affect the top level instance.
So the fact that a snapshot was done on a instance that didn't
allocate a buffer triggered a warning written into the top level
buffer, and worse yet, disabled the top level buffer.
- Fix memory leak that was caused when an error was logged in a trace
buffer instance, and then the buffer instance was removed.
The allocated error log messages still needed to be freed.
* tag 'trace-v6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Free error logs of tracing instances
tracing: Fix ftrace_boot_snapshot command line logic
tracing: Have tracing_snapshot_instance_cond() write errors to the appropriate instance
tracing: Error if a trace event has an array for a __field()
tracing/osnoise: Fix notify new tracing_max_latency
tracing/timerlat: Notify new max thread latency
ftrace: Mark get_lock_parent_ip() __always_inline
ring-buffer: Fix race while reader and writer are on the same page
tracing/synthetic: Fix races on freeing last_cmd
The device can report discard support without setting the ONCS DSM bit.
When not set, the driver clears max_discard_size expecting it to be set
later. We don't know the size until we have the namespace format,
though, so setting it is deferred until configuring one, but the driver
was abandoning the discard settings due to that initial clearing.
Move the max_discard_size calculation above the check for a '0' discard
size.
Fixes: 1a86924e4f ("nvme: fix interpretation of DMRSL")
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
drm/i915 fixes for v6.3-rc6:
- Fix DP MST DSC M/N calculation to use compressed bpp
- Fix racy use-after-free in perf ioctl
- Fix context runtime accounting
- Fix handling of GT reset during HuC loading
- Fix use of unsigned vm_fault_t for error values
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87zg7mzomz.fsf@intel.com
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
could occur during the memcpy() operation if the size of skb->cb is
larger than the size of struct j1939_sk_buff_cb. This is because the
memcpy() operation uses the size of skb->cb, leading to a read beyond
the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct
j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
memcpy() operation only reads the memory within the bounds of struct
j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
is greater than or equal to the size of struct j1939_sk_buff_cb. This
ensures that the skb->cb buffer is large enough to hold the
j1939_sk_buff_cb structure.
Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
Cc: stable@vger.kernel.org
[mkl: rephrase commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The same task check in perf_event_set_output has some potential issues
for some usages.
For the current perf code, there is a problem if using of
perf_event_open() to have multiple samples getting into the same mmap’d
memory when they are both attached to the same process.
https://lore.kernel.org/all/92645262-D319-4068-9C44-2409EF44888E@gmail.com/
Because the event->ctx is not ready when the perf_event_set_output() is
invoked in the perf_event_open().
Besides the above issue, before the commit bd27568117 ("perf: Rewrite
core context handling"), perf record can errors out when sampling with
a hardware event and a software event as below.
$ perf record -e cycles,dummy --per-thread ls
failed to mmap with 22 (Invalid argument)
That's because that prior to the commit a hardware event and a software
event are from different task context.
The problem should be a long time issue since commit c3f00c7027
("perk: Separate find_get_context() from event initialization").
The task struct is stored in the event->hw.target for each per-thread
event. It is a more reliable way to determine whether two events are
attached to the same task.
The event->hw.target was also introduced several years ago by the
commit 50f16a8bf9 ("perf: Remove type specific target pointers"). It
can not only be used to fix the issue with the current code, but also
back port to fix the issues with an older kernel.
Note: The event->hw.target was introduced later than commit
c3f00c7027. The patch may cannot be applied between the commit
c3f00c7027 and commit 50f16a8bf9. Anybody that wants to back-port
this at that period may have to find other solutions.
Fixes: c3f00c7027 ("perf: Separate find_get_context() from event initialization")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lkml.kernel.org/r/20230322202449.512091-1-kan.liang@linux.intel.com
Thomas reported that offlining CPUs spends a lot of time in
synchronize_rcu() as called from perf_pmu_migrate_context() even though
he's not actually using uncore events.
Turns out, the thing is unconditionally waiting for RCU, even if there's
no actual events to migrate.
Fixes: 0cda4c0231 ("perf: Introduce perf_pmu_migrate_context()")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20230403090858.GT4253@hirez.programming.kicks-ass.net
Wait for VPU to be idle in ivpu_pm_suspend_cb() before powering off
the device, so jobs are not lost and TDRs are not triggered after
resume.
Fixes: 852be13f3b ("accel/ivpu: Add PM support")
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-3-stanislaw.gruszka@linux.intel.com
Currently job->done_fence is added to every BO handle within a job. If job
handle (command buffer) is shared between multiple submits, KMD will add
the fence in each of them. Then bo_wait_ioctl() executed on command buffer
will exit only when all jobs containing that handle are done.
This creates deadlock scenario for user mode driver in case when job handle
is added as dependency of another job, because bo_wait_ioctl() of first job
will wait until second job finishes, and second job can not finish before
first one.
Having fences added only to job buffer handle allows user space to execute
bo_wait_ioctl() on the job even if it's handle is submitted with other job.
Fixes: cd7272215c ("accel/ivpu: Add command buffer submission logic")
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230331113603.2802515-2-stanislaw.gruszka@linux.intel.com