linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-26 22:21:42 +00:00

Author	SHA1	Message	Date
Peter Xu	bb373dce2c	mm/hugetlb: don't wait for migration entry during follow page That's what the code does with !hugetlb pages, so we should logically do the same for hugetlb, so migration entry will also be treated as no page. This is probably also the last piece in follow_page code that may sleep, the last one should be removed in cf994dd8af27 ("mm/gup: remove FOLL_MIGRATION", 2022-11-16). Link: https://lkml.kernel.org/r/20221216155100.2043537-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Jann Horn <jannh@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:38 -08:00
Peter Xu	243b1f2d3b	mm/hugetlb: let vma_offset_start() to return start Patch series "mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare", v4. Problem ======= huge_pte_offset() is a major helper used by hugetlb code paths to walk a hugetlb pgtable. It's used mostly everywhere since that's needed even before taking the pgtable lock. huge_pte_offset() is always called with mmap lock held with either read or write. It was assumed to be safe but it's actually not. One race condition can easily trigger by: (1) firstly trigger pmd share on a memory range, (2) do huge_pte_offset() on the range, then at the meantime, (3) another thread unshare the pmd range, and the pgtable page is prone to lost if the other shared process wants to free it completely (by either munmap or exit mm). The recent work from Mike on vma lock can resolve most of this already. It's achieved by forbidden pmd unsharing during the lock being taken, so no further risk of the pgtable page being freed. It means if we can take the vma lock around all huge_pte_offset() callers it'll be safe. There're already a bunch of them that we did as per the latest mm-unstable, but also quite a few others that we didn't for various reasons especially on huge_pte_offset() usage. One more thing to mention is that besides the vma lock, i_mmap_rwsem can also be used to protect the pgtable page (along with its pgtable lock) from being freed from under us. IOW, huge_pte_offset() callers need to either hold the vma lock or i_mmap_rwsem to safely walk the pgtables. A reproducer of such problem, based on hugetlb GUP (NOTE: since the race is very hard to trigger, one needs to apply another kernel delay patch too, see below): ======8<======= #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <unistd.h> #include <sys/mman.h> #include <fcntl.h> #include <linux/memfd.h> #include <assert.h> #include <pthread.h> #define MSIZE (1UL << 30) /* 1GB / #define PSIZE (2UL << 20) / 2MB / #define HOLD_SEC (1) int pipefd[2]; void buf; void do_map(int fd) { unsigned char tmpbuf, p; int ret; ret = posix_memalign((void )&tmpbuf, MSIZE, MSIZE); if (ret) { perror("posix_memalign() failed"); return NULL; } tmpbuf = mmap(tmpbuf, MSIZE, PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_FIXED, fd, 0); if (tmpbuf == MAP_FAILED) { perror("mmap() failed"); return NULL; } printf("mmap() -> %p\n", tmpbuf); for (p = tmpbuf; p < tmpbuf + MSIZE; p += PSIZE) { p = 1; } return tmpbuf; } void do_unmap(void buf) { munmap(buf, MSIZE); } void proc2(int fd) { unsigned char c; buf = do_map(fd); if (!buf) return; read(pipefd[0], &c, 1); / * This frees the shared pgtable page, causing use-after-free in * proc1_thread1 when soft walking hugetlb pgtable. / do_unmap(buf); printf("Proc2 quitting\n"); } void proc1_thread1(void data) { / * Trigger follow-page on 1st 2m page. Kernel hack patch needed to * withhold this procedure for easier reproduce. / madvise(buf, PSIZE, MADV_POPULATE_WRITE); printf("Proc1-thread1 quitting\n"); return NULL; } void proc1_thread2(void data) { unsigned char c; / Wait a while until proc1_thread1() start to wait / sleep(0.5); / Trigger pmd unshare / madvise(buf, PSIZE, MADV_DONTNEED); / Kick off proc2 to release the pgtable / write(pipefd[1], &c, 1); printf("Proc1-thread2 quitting\n"); return NULL; } void proc1(int fd) { pthread_t tid1, tid2; int ret; buf = do_map(fd); if (!buf) return; ret = pthread_create(&tid1, NULL, proc1_thread1, NULL); assert(ret == 0); ret = pthread_create(&tid2, NULL, proc1_thread2, NULL); assert(ret == 0); / Kick the child to share the PUD entry / pthread_join(tid1, NULL); pthread_join(tid2, NULL); do_unmap(buf); } int main(void) { int fd, ret; fd = memfd_create("test-huge", MFD_HUGETLB \| MFD_HUGE_2MB); if (fd < 0) { perror("open failed"); return -1; } ret = ftruncate(fd, MSIZE); if (ret) { perror("ftruncate() failed"); return -1; } ret = pipe(pipefd); if (ret) { perror("pipe() failed"); return -1; } if (fork()) { proc1(fd); } else { proc2(fd); } close(pipefd[0]); close(pipefd[1]); close(fd); return 0; } ======8<======= The kernel patch needed to present such a race so it'll trigger 100%: ======8<======= : diff --git a/mm/hugetlb.c b/mm/hugetlb.c : index 9d97c9a2a15d..f8d99dad5004 100644 : --- a/mm/hugetlb.c : +++ b/mm/hugetlb.c : @@ -38,6 +38,7 @@ : #include <asm/page.h> : #include <asm/pgalloc.h> : #include <asm/tlb.h> : +#include <asm/delay.h> : : #include <linux/io.h> : #include <linux/hugetlb.h> : @@ -6290,6 +6291,7 @@ long follow_hugetlb_page(struct mm_struct mm, struct vm_area_struct vma, : bool unshare = false; : int absent; : struct page page; : + unsigned long c = 0; : : /* : * If we have a pending SIGKILL, don't keep faulting pages and : @@ -6309,6 +6311,13 @@ long follow_hugetlb_page(struct mm_struct mm, struct vm_area_struct vma, : */ : pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), : huge_page_size(h)); : + : + pr_info("%s: withhold 1 sec...\n", __func__); : + for (c = 0; c < 100; c++) { : + udelay(10000); : + } : + pr_info("%s: withhold 1 sec...done\n", __func__); : + : if (pte) : ptl = huge_pte_lock(h, mm, pte); : absent = !pte \|\| huge_pte_none(huge_ptep_get(pte)); : ======8<======= It'll trigger use-after-free of the pgtable spinlock: ======8<======= [ 16.959907] follow_hugetlb_page: withhold 1 sec... [ 17.960315] follow_hugetlb_page: withhold 1 sec...done [ 17.960550] ------------[ cut here ]------------ [ 17.960742] DEBUG_LOCKS_WARN_ON(1) [ 17.960756] WARNING: CPU: 3 PID: 542 at kernel/locking/lockdep.c:231 __lock_acquire+0x955/0x1fa0 [ 17.961264] Modules linked in: [ 17.961394] CPU: 3 PID: 542 Comm: hugetlb-pmd-sha Not tainted 6.1.0-rc4-peterx+ #46 [ 17.961704] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 17.962266] RIP: 0010:__lock_acquire+0x955/0x1fa0 [ 17.962516] Code: c0 0f 84 5f fe ff ff 44 8b 1d 0f 9a 29 02 45 85 db 0f 85 4f fe ff ff 48 c7 c6 75 50 83 82 48 c7 c7 1b 4b 7d 82 e8 d3 22 d8 00 <0f> 0b 31 c0 4c 8b 54 24 08 4c 8b 04 24 e9 [ 17.963494] RSP: 0018:ffffc90000e4fba8 EFLAGS: 00010096 [ 17.963704] RAX: 0000000000000016 RBX: fffffffffd3925a8 RCX: 0000000000000000 [ 17.963989] RDX: 0000000000000002 RSI: ffffffff82863ccf RDI: 00000000ffffffff [ 17.964276] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffc90000e4fa58 [ 17.964557] R10: 0000000000000003 R11: ffffffff83162688 R12: 0000000000000000 [ 17.964839] R13: 0000000000000001 R14: ffff888105eac748 R15: 0000000000000001 [ 17.965123] FS: 00007f17c0a00640(0000) GS:ffff888277cc0000(0000) knlGS:0000000000000000 [ 17.965443] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.965672] CR2: 00007f17c09ffef8 CR3: 000000010c87a005 CR4: 0000000000770ee0 [ 17.965956] PKRU: 55555554 [ 17.966068] Call Trace: [ 17.966172] <TASK> [ 17.966268] ? tick_nohz_tick_stopped+0x12/0x30 [ 17.966455] lock_acquire+0xbf/0x2b0 [ 17.966603] ? follow_hugetlb_page.cold+0x75/0x5c4 [ 17.966799] ? _printk+0x48/0x4e [ 17.966934] _raw_spin_lock+0x2f/0x40 [ 17.967087] ? follow_hugetlb_page.cold+0x75/0x5c4 [ 17.967285] follow_hugetlb_page.cold+0x75/0x5c4 [ 17.967473] __get_user_pages+0xbb/0x620 [ 17.967635] faultin_vma_page_range+0x9a/0x100 [ 17.967817] madvise_vma_behavior+0x3c0/0xbd0 [ 17.967998] ? mas_prev+0x11/0x290 [ 17.968141] ? find_vma_prev+0x5e/0xa0 [ 17.968304] ? madvise_vma_anon_name+0x70/0x70 [ 17.968486] madvise_walk_vmas+0xa9/0x120 [ 17.968650] do_madvise.part.0+0xfa/0x270 [ 17.968813] __x64_sys_madvise+0x5a/0x70 [ 17.968974] do_syscall_64+0x37/0x90 [ 17.969123] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 17.969329] RIP: 0033:0x7f1840f0efdb [ 17.969477] Code: c3 66 0f 1f 44 00 00 48 8b 15 39 6e 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 68 [ 17.970205] RSP: 002b:00007f17c09ffe38 EFLAGS: 00000202 ORIG_RAX: 000000000000001c [ 17.970504] RAX: ffffffffffffffda RBX: 00007f17c0a00640 RCX: 00007f1840f0efdb [ 17.970786] RDX: 0000000000000017 RSI: 0000000000200000 RDI: 00007f1800000000 [ 17.971068] RBP: 00007f17c09ffe50 R08: 0000000000000000 R09: 00007ffd3954164f [ 17.971353] R10: 00007f1840e10348 R11: 0000000000000202 R12: ffffffffffffff80 [ 17.971709] R13: 0000000000000000 R14: 00007ffd39541550 R15: 00007f17c0200000 [ 17.972083] </TASK> [ 17.972199] irq event stamp: 2353 [ 17.972372] hardirqs last enabled at (2353): [<ffffffff8117fe4e>] __up_console_sem+0x5e/0x70 [ 17.972869] hardirqs last disabled at (2352): [<ffffffff8117fe33>] __up_console_sem+0x43/0x70 [ 17.973365] softirqs last enabled at (2330): [<ffffffff810f763d>] __irq_exit_rcu+0xed/0x160 [ 17.973857] softirqs last disabled at (2323): [<ffffffff810f763d>] __irq_exit_rcu+0xed/0x160 [ 17.974341] ---[ end trace 0000000000000000 ]--- [ 17.974614] BUG: kernel NULL pointer dereference, address: 00000000000000b8 [ 17.975012] #PF: supervisor read access in kernel mode [ 17.975314] #PF: error_code(0x0000) - not-present page [ 17.975615] PGD 103f7b067 P4D 103f7b067 PUD 106cd7067 PMD 0 [ 17.975943] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 17.976197] CPU: 3 PID: 542 Comm: hugetlb-pmd-sha Tainted: G W 6.1.0-rc4-peterx+ #46 [ 17.976712] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 17.977370] RIP: 0010:__lock_acquire+0x190/0x1fa0 [ 17.977655] Code: 98 00 00 00 41 89 46 24 81 e2 ff 1f 00 00 48 0f a3 15 e4 ba dd 02 0f 83 ff 05 00 00 48 8d 04 52 48 c1 e0 06 48 05 c0 d2 f4 83 <44> 0f b6 a0 b8 00 00 00 41 0f b7 46 20 6f [ 17.979170] RSP: 0018:ffffc90000e4fba8 EFLAGS: 00010046 [ 17.979787] RAX: 0000000000000000 RBX: fffffffffd3925a8 RCX: 0000000000000000 [ 17.980838] RDX: 0000000000000002 RSI: ffffffff82863ccf RDI: 00000000ffffffff [ 17.982048] RBP: 0000000000000000 R08: ffff888105eac720 R09: ffffc90000e4fa58 [ 17.982892] R10: ffff888105eab900 R11: ffffffff83162688 R12: 0000000000000000 [ 17.983771] R13: 0000000000000001 R14: ffff888105eac748 R15: 0000000000000001 [ 17.984815] FS: 00007f17c0a00640(0000) GS:ffff888277cc0000(0000) knlGS:0000000000000000 [ 17.985924] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.986265] CR2: 00000000000000b8 CR3: 000000010c87a005 CR4: 0000000000770ee0 [ 17.986674] PKRU: 55555554 [ 17.986832] Call Trace: [ 17.987012] <TASK> [ 17.987266] ? tick_nohz_tick_stopped+0x12/0x30 [ 17.987770] lock_acquire+0xbf/0x2b0 [ 17.988118] ? follow_hugetlb_page.cold+0x75/0x5c4 [ 17.988575] ? _printk+0x48/0x4e [ 17.988889] _raw_spin_lock+0x2f/0x40 [ 17.989243] ? follow_hugetlb_page.cold+0x75/0x5c4 [ 17.989687] follow_hugetlb_page.cold+0x75/0x5c4 [ 17.990119] __get_user_pages+0xbb/0x620 [ 17.990500] faultin_vma_page_range+0x9a/0x100 [ 17.990928] madvise_vma_behavior+0x3c0/0xbd0 [ 17.991354] ? mas_prev+0x11/0x290 [ 17.991678] ? find_vma_prev+0x5e/0xa0 [ 17.992024] ? madvise_vma_anon_name+0x70/0x70 [ 17.992421] madvise_walk_vmas+0xa9/0x120 [ 17.992793] do_madvise.part.0+0xfa/0x270 [ 17.993166] __x64_sys_madvise+0x5a/0x70 [ 17.993539] do_syscall_64+0x37/0x90 [ 17.993879] entry_SYSCALL_64_after_hwframe+0x63/0xcd ======8<======= Resolution ========== This patchset protects all the huge_pte_offset() callers to also take the vma lock properly. Patch Layout ============ Patch 1-2: cleanup, or dependency of the follow up patches Patch 3: before fixing, document huge_pte_offset() on lock required Patch 4-8: each patch resolves one possible race condition Patch 9: introduce hugetlb_walk() to replace huge_pte_offset() Tests ===== The series is verified with the above reproducer so the race cannot trigger anymore. It also passes all hugetlb kselftests. This patch (of 9): Even though vma_offset_start() is named like that, it's not returning "the start address of the range" but rather the offset we should use to offset the vma->vm_start address. Make it return the real value of the start vaddr, and it also helps for all the callers because whenever the retval is used, it'll be ultimately added into the vma->vm_start anyway, so it's better. Link: https://lkml.kernel.org/r/20221216155100.2043537-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20221216155100.2043537-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Jann Horn <jannh@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:38 -08:00
Mike Kravetz	379c2e60e8	hugetlb: update vma flag check for hugetlb vma lock The check for whether a hugetlb vma lock exists partially depends on the vma's flags. Currently, it checks for either VM_MAYSHARE or VM_SHARED. The reason both flags are used is because VM_MAYSHARE was previously cleared in hugetlb vmas as they are tore down. This is no longer the case, and only the VM_MAYSHARE check is required. Link: https://lkml.kernel.org/r/20221212235042.178355-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:38 -08:00
Jeff Xu	11f75a0144	selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC Tests to verify MFD_NOEXEC, MFD_EXEC and vm.memfd_noexec sysctl. Link: https://lkml.kernel.org/r/20221215001205.51969-6-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@google.com> Co-developed-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: kernel test robot <lkp@intel.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:37 -08:00
Jeff Xu	c4f75bc8bd	mm/memfd: add write seals when apply SEAL_EXEC to executable memfd In order to avoid WX mappings, add F_SEAL_WRITE when apply F_SEAL_EXEC to an executable memfd, so W^X from start. This implys application need to fill the content of the memfd first, after F_SEAL_EXEC is applied, application can no longer modify the content of the memfd. Typically, application seals the memfd right after writing to it. For example: 1. memfd_create(MFD_EXEC). 2. write() code to the memfd. 3. fcntl(F_ADD_SEALS, F_SEAL_EXEC) to convert the memfd to W^X. 4. call exec() on the memfd. Link: https://lkml.kernel.org/r/20221215001205.51969-5-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Daniel Verkamp <dverkamp@chromium.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: kernel test robot <lkp@intel.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:37 -08:00
Jeff Xu	105ff5339f	mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC The new MFD_NOEXEC_SEAL and MFD_EXEC flags allows application to set executable bit at creation time (memfd_create). When MFD_NOEXEC_SEAL is set, memfd is created without executable bit (mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to be executable (mode: 0777) after creation. when MFD_EXEC flag is set, memfd is created with executable bit (mode:0777), this is the same as the old behavior of memfd_create. The new pid namespaced sysctl vm.memfd_noexec has 3 values: 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like MFD_EXEC was set. 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like MFD_NOEXEC_SEAL was set. 2: memfd_create() without MFD_NOEXEC_SEAL will be rejected. The sysctl allows finer control of memfd_create for old-software that doesn't set the executable bit, for example, a container with vm.memfd_noexec=1 means the old-software will create non-executable memfd by default. Also, the value of memfd_noexec is passed to child namespace at creation time. For example, if the init namespace has vm.memfd_noexec=2, all its children namespaces will be created with 2. [akpm@linux-foundation.org: add stub functions to fix build] [akpm@linux-foundation.org: remove unneeded register_pid_ns_ctl_table_vm() stub, per Jeff] [akpm@linux-foundation.org: s/pr_warn_ratelimited/pr_warn_once/, per review] [akpm@linux-foundation.org: fix CONFIG_SYSCTL=n warning] Link: https://lkml.kernel.org/r/20221215001205.51969-4-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@google.com> Co-developed-by: Daniel Verkamp <dverkamp@chromium.org> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:37 -08:00
Daniel Verkamp	32d118ad50	selftests/memfd: add tests for F_SEAL_EXEC Basic tests to ensure that user/group/other execute bits cannot be changed after applying F_SEAL_EXEC to a memfd. Link: https://lkml.kernel.org/r/20221215001205.51969-3-jeffxu@google.com Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Co-developed-by: Jeff Xu <jeffxu@google.com> Signed-off-by: Jeff Xu <jeffxu@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: kernel test robot <lkp@intel.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:37 -08:00
Daniel Verkamp	6fd7353829	mm/memfd: add F_SEAL_EXEC Patch series "mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC", v8. Since Linux introduced the memfd feature, memfd have always had their execute bit set, and the memfd_create() syscall doesn't allow setting it differently. However, in a secure by default system, such as ChromeOS, (where all executables should come from the rootfs, which is protected by Verified boot), this executable nature of memfd opens a door for NoExec bypass and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm process created a memfd to share the content with an external process, however the memfd is overwritten and used for executing arbitrary code and root escalation. [2] lists more VRP in this kind. On the other hand, executable memfd has its legit use, runc uses memfd’s seal and executable feature to copy the contents of the binary then execute them, for such system, we need a solution to differentiate runc's use of executable memfds and an attacker's [3]. To address those above, this set of patches add following: 1> Let memfd_create() set X bit at creation time. 2> Let memfd to be sealed for modifying X bit. 3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of X bit.For example, if a container has vm.memfd_noexec=2, then memfd_create() without MFD_NOEXEC_SEAL will be rejected. 4> A new security hook in memfd_create(). This make it possible to a new LSM, which rejects or allows executable memfd based on its security policy. This patch (of 5): The new F_SEAL_EXEC flag will prevent modification of the exec bits: written as traditional octal mask, 0111, or as named flags, S_IXUSR \| S_IXGRP \| S_IXOTH. Any chmod(2) or similar call that attempts to modify any of these bits after the seal is applied will fail with errno EPERM. This will preserve the execute bits as they are at the time of sealing, so the memfd will become either permanently executable or permanently un-executable. Link: https://lkml.kernel.org/r/20221215001205.51969-1-jeffxu@google.com Link: https://lkml.kernel.org/r/20221215001205.51969-2-jeffxu@google.com Signed-off-by: Daniel Verkamp <dverkamp@chromium.org> Co-developed-by: Jeff Xu <jeffxu@google.com> Signed-off-by: Jeff Xu <jeffxu@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:37 -08:00
Peter Xu	f1eb1bacfb	mm/uffd: always wr-protect pte in pte\|pmd_mkuffd_wp() This patch is a cleanup to always wr-protect pte/pmd in mkuffd_wp paths. The reasons I still think this patch is worthwhile, are: (1) It is a cleanup already; diffstat tells. (2) It just feels natural after I thought about this, if the pte is uffd protected, let's remove the write bit no matter what it was. (2) Since x86 is the only arch that supports uffd-wp, it also redefines pte\|pmd_mkuffd_wp() in that it should always contain removals of write bits. It means any future arch that want to implement uffd-wp should naturally follow this rule too. It's good to make it a default, even if with vm_page_prot changes on VM_UFFD_WP. (3) It covers more than vm_page_prot. So no chance of any potential future "accident" (like pte_mkdirty() sparc64 or loongarch, even though it just got its pte_mkdirty fixed <1 month ago). It'll be fairly clear when reading the code too that we don't worry anything before a pte_mkuffd_wp() on uncertainty of the write bit. We may call pte_wrprotect() one more time in some paths (e.g. thp split), but that should be fully local bitop instruction so the overhead should be negligible. Although this patch should logically also fix all the known issues on uffd-wp too recently on page migration (not for numa hint recovery - that may need another explcit pte_wrprotect), but this is not the plan for that fix. So no fixes, and stable doesn't need this. Link: https://lkml.kernel.org/r/20221214201533.1774616-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ives van Hoorne <ives@codesandbox.io> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:37 -08:00
Sidhartha Kumar	04a42e72d7	mm: move folio_set_compound_order() to mm/internal.h folio_set_compound_order() is moved to an mm-internal location so external folio users cannot misuse this function. Change the name of the function to folio_set_order() and use WARN_ON_ONCE() rather than BUG_ON. Also, handle the case if a non-large folio is passed and add clarifying comments to the function. Link: https://lore.kernel.org/lkml/20221207223731.32784-1-sidhartha.kumar@oracle.com/T/ Link: https://lkml.kernel.org/r/20221215061757.223440-1-sidhartha.kumar@oracle.com Fixes: `9fd330582b` ("mm: add folio dtor and order setter functions") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Suggested-by: Muchun Song <songmuchun@bytedance.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Suggested-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:12:36 -08:00
Andrew Morton	1301f93134	Pull mm-hotfixes-stable dependencies into mm-stable. Merge branch 'mm-hotfixes-stable' into mm-stable	2023-01-18 17:03:20 -08:00
Peter Xu	7e3ce3f8d2	mm: fix a few rare cases of using swapin error pte marker This patch should harden commit `15520a3f04` ("mm: use pte markers for swap errors") on using pte markers for swapin errors on a few corner cases. 1. Propagate swapin errors across fork()s: if there're swapin errors in the parent mm, after fork()s the child should sigbus too when an error page is accessed. 2. Fix a rare condition race in pte_marker_clear() where a uffd-wp pte marker can be quickly switched to a swapin error. 3. Explicitly ignore swapin error pte markers in change_protection(). I mostly don't worry on (2) or (3) at all, but we should still have them. Case (1) is special because it can potentially cause silent data corrupt on child when parent has swapin error triggered with swapoff, but since swapin error is rare itself already it's probably not easy to trigger either. Currently there is a priority difference between the uffd-wp bit and the swapin error entry, in which the swapin error always has higher priority (e.g. we don't need to wr-protect a swapin error pte marker). If there will be a 3rd bit introduced, we'll probably need to consider a more involved approach so we may need to start operate on the bits. Let's leave that for later. This patch is tested with case (1) explicitly where we'll get corrupted data before in the child if there's existing swapin error pte markers, and after patch applied the child can be rightfully killed. We don't need to copy stable for this one since `15520a3f04` just landed as part of v6.2-rc1, only "Fixes" applied. Link: https://lkml.kernel.org/r/20221214200453.1772655-3-peterx@redhat.com Fixes: `15520a3f04` ("mm: use pte markers for swap errors") Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:02:19 -08:00
Peter Xu	49d6d7fb63	mm/uffd: fix pte marker when fork() without fork event Patch series "mm: Fixes on pte markers". Patch 1 resolves the syzkiller report from Pengfei. Patch 2 further harden pte markers when used with the recent swapin error markers. The major case is we should persist a swapin error marker after fork(), so child shouldn't read a corrupted page. This patch (of 2): When fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if src may have it and has pte marker installed. The warning is improper along with the comment. The right thing is to inherit the pte marker when needed, or keep the dst pte empty. A vague guess is this happened by an accident when there's the prior patch to introduce src/dst vma into this helper during the uffd-wp feature got developed and I probably messed up in the rebase, since if we replace dst_vma with src_vma the warning & comment it all makes sense too. Hugetlb did exactly the right here (copy_hugetlb_page_range()). Fix the general path. Reproducer: https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_page_range/repro.c Bugzilla report: https://bugzilla.kernel.org/show_bug.cgi?id=216808 Link: https://lkml.kernel.org/r/20221214200453.1772655-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20221214200453.1772655-2-peterx@redhat.com Fixes: `c56d1b62cc` ("mm/shmem: handle uffd-wp during fork()") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Pengfei Xu <pengfei.xu@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> # 5.19+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2023-01-18 17:02:19 -08:00
Andrew Morton	bd86d2ea36	Sync with v6.2-rc4 Merge branch 'master' into mm-hotfixes-stable	2023-01-18 16:52:20 -08:00
Andrew Morton	0e18a6b49b	Sync with v6.2-rc4 Merge branch 'master' into mm-stable	2023-01-18 16:51:53 -08:00
Linus Torvalds	5dc4c995db	Linux 6.2-rc4	2023-01-15 09:22:43 -06:00
Linus Torvalds	f0f70ddb8f	- Make sure the poking PGD is pinned for Xen PV as it requires it this way - Fixes for two resctrl races when moving a task or creating a new monitoring group - Fix SEV-SNP guests running under HyperV where MTRRs are disabled to not return a UC- type mapping type on memremap() and thus cause a serious slowdown - Fix insn mnemonics in bioscall.S now that binutils is starting to fix confusing insn suffixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD5xsACgkQEsHwGGHe VUr65g/8CkfKQKIQ/kPn1B+M/PI4S8DBmz7CdufQTbB66GSfDwRpGnxIKJKZM1UG pyOmP1kHVXGGCsvFQimalxnBtFx6t3wFS+R+c/l5pRtgU63bwQpNjJ+vBcBpb4xs J7f06VgF6jB8qk0NqXBKNJt4kauZA50wiErYm8PU/yf0tmHratjrvDfKQmus9pI4 AMcQEudRhDDo8xqn8fvIC/S7xm9/TgNzKxYP+HciPa74HZm5vrjS0hIyZkIsh5d7 Q4MsP4VaBH12W6MbpF9TBw5VTDomwY8oS4xTNfkhyOTcHi8uLrMjA66VmaJwWXpe EZjOk4+KhtxYigaI5oQO9M5e9IINK6ZfbzU65P74wTvP3orszsBPG7QvodIzLc76 YI1GEea/bXgxYgPuMR6spZoGMS58K7BXgViVMVRwZ9bZk17+5pfbLkzKqZsfw/s/ nj3n8z4ayoc+ffDPllh0471Dn16ugKMf+EvW2Su1Q1QoaA7icNNkZrKaM6eSuoFr bClsrHeglQFadwy4kmY0fi7BUfiKTp+c5Ur7lM7VojrjH0XcoZreThVg0tNrk3ZR IAyBjtCdtg1rzrRfb7qBf6B+WNGdxYWGuDgOPiH1Xh+/plvyCqQ2/3AGWQAbu/qP to+IYmZg09mRpVBDYpCY4z6IzRbMJ9rRho4rNXCQUeAzSVMCrZE= =W2zR -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Make sure the poking PGD is pinned for Xen PV as it requires it this way - Fixes for two resctrl races when moving a task or creating a new monitoring group - Fix SEV-SNP guests running under HyperV where MTRRs are disabled to not return a UC- type mapping type on memremap() and thus cause a serious slowdown - Fix insn mnemonics in bioscall.S now that binutils is starting to fix confusing insn suffixes * tag 'x86_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: fix poking_init() for Xen PV guests x86/resctrl: Fix event counts regression in reused RMIDs x86/resctrl: Fix task CLOSID/RMID update race x86/pat: Fix pat_x_mtrr_type() for MTRR disabled case x86/boot: Avoid using Intel mnemonics in AT&T syntax asm	2023-01-15 07:17:44 -06:00
Linus Torvalds	8aa9761223	- Fix the EDAC device's confusion in the polling setting units - Fix a memory leak in highbank's probing function -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPD41kACgkQEsHwGGHe VUqYOQ//bsKMgAbPDbvgSm8OkFvtLd+r38RE+dcg84QrBaeQ7z2x269i1bUhqkQV baatecdz+VoNQK3zmrIRshwgCGuwDEpRWAuPKJxoCfSU1eNzEipxHCoctNcp2q+Y 7nP65JtkQ6y1nEnBYh5h1snwbM3yuLIvmmuhYhPpd204k00L16Tzsu1uGiz7nDsZ 9ohBtS6laHztJnVd9gKIHOcgBCylPjNYpRecsgZk2COK/O9uTo8drQ46gKxwHQ4l P4RsvDligoaFO8rRLAzg2sfsOt9O2runkdWhYnVKSLdXOOA+wQ+eacOuzWs5dNpQ BQfWZB3GwxQ3+G1c2WOVrq/15YNrxpqMA9/060vPzGqW9VDCnOnoS3w5nHCDw5Ep YB+3OVn31Z4Hw+nP/WCR/0QYKDSBmqvGd4wKmHcUnL22rRZSl6UZTSqnBw3yrhV9 YzmZ+r3B9b7AtvDs3qWMjQwFYnxmeU7DSXsxjbUpDcMDuy/Y8/P9CSzE2ruC26L1 TdK/Vy6igoDCThzv0RMFEUfmpojIYoTtGNuyMrA90ZNKwXKVOcf9mM3TQm4vLH7m P3Q9UvBlApoaKbXSZaU0cV+klnba2prAnu4LTVeBWoyjPlmFZkv9ZxDHId8n2rie dTe/KZJRlScfwVtp83xh428ixdYWNs03R6P5tQAMKoZNLnixqXQ= =TiMH -----END PGP SIGNATURE----- Merge tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Fix the EDAC device's confusion in the polling setting units - Fix a memory leak in highbank's probing function * tag 'edac_urgent_for_v6.2_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/highbank: Fix memory leak in highbank_mc_probe() EDAC/device: Fix period calculation in edac_device_reset_delay_period()	2023-01-15 07:12:58 -06:00
Linus Torvalds	b1d63f0c77	powerpc fixes for 6.2 #3 - Fix a build failure with some versions of ld that have an odd version string. - Fix incorrect use of mutex in the IMC PMU driver. Thanks to: Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, Yang Yingliang. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmPD2ZoTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgIJTD/9HeqFTpviUPd0QCW2HWVv2jwLA97Qy BGsMVz6eSk+f3YBFDFJ0n5KqYR4Bry5mgwPm1nQe86cKr+ZCdtnF+10vPL0TEdK5 M4a+u9PA+xqq9ukIN96ZP7QGi80YY0RRIrfkOwK6iQJVLSxT+TUcl1ko/iPalbI7 /5c2BCnkIoDbwU1ux3/6+uJCDFE3oLy5v1nt/mbzuekFTRvPGRCt+DWBywrYKYi8 XfBBVTz6F+PoBZ7vZa6Kv5IQANiuU4COTjpm9AjVvjp0oKfYskBmtZvHQhr3v6v4 HZsms49w0r3D7sZgEB2hmKFM2/QijSDeyBxnmt/hHeNYMMiFg+0lhoxoNoykzcN9 UfFU7NID3uqruYMkhAxiCIvyul9Vzcr+pe3GgooY+AtuokhuMUEUXJjEDIyXtoci 2VnEsdbl0/gihdHedfhLRXlEn8xz6fQvxDcpYZClSIDeS/nL7cuPd/9+JHn1hilq aJ0MX6VUKnwkSBb9Gkd1bt09jS6lqDUQS5+88IMvoJo53xHVlHF+5kqXAJRkpt1w XsDMBuKqT/aEC+rI5GyHXNglGuBYqMvEmbdEGtIVFCbUkcI35Xe8RXKmqDbO2U7N qqfYj53dgaptJF/sllcGuCTj6JrdJvKEVnuzsIwoE+XyXzr1e9UBmyZ+I//VjTjL Mhy9rXp7tJYnkw== =BaFo -----END PGP SIGNATURE----- Merge tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a build failure with some versions of ld that have an odd version string - Fix incorrect use of mutex in the IMC PMU driver Thanks to Kajol Jain, Michael Petlan, Ojaswin Mujoo, Peter Zijlstra, and Yang Yingliang. * tag 'powerpc-6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s/hash: Make stress_hpt_timer_fn() static powerpc/imc-pmu: Fix use of mutex in IRQs disabled section powerpc/boot: Fix incorrect version calculation issue in ld_version	2023-01-15 07:09:41 -06:00
Linus Torvalds	7c69844052	IOMMU Fixes for Linux v6.2-rc3 Including: - Core: Fix an iommu-group refcount leak - Fix overflow issue in IOVA alloc path - ARM-SMMU fixes from Will: - Fix VFIO regression on NXP SoCs by reporting IOMMU_CAP_CACHE_COHERENCY - Fix SMMU shutdown paths to avoid device unregistration race - Error handling fix for Mediatek IOMMU driver -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmPC1nsACgkQK/BELZcB GuN+aw//c4rOO4buNaG/T00HfSdSGq1VwG1aIicslC82IDnh28R4A0iGoGtlmXJo +2qC2VPQaH7SpU7WEjhwIjuXUyuYQF5gvrZFnrumHGRSYI7IYze793TAbsGA9bLV Wn20rygyLlptu+wnGYHIG9PkB041ysjqJtQpRvT5AvUYW3Z9BoNDWs5YwJ9Qfm+W pm781ctgURPSmNK+wKKkRh5CCteWRxhKh8FKMvQ9o6lAoJNB/dcPpyE2oJ+lMojm kKhONbvQe3DdRm/zNY3gV1chTDPNeyIHhDGc6/NA1oAjuETlhzOG3JIrroijzsnA dZOJSJ6/jzqA6ZBh5hhuyUSbB0rRAN2URnrO2eFfJaVw7GJH60pdA7asxu37gNuF umbtsdzBZW0xba3qL7tvASZnKZCVeEsR4D6Apb36eaR7h6U7X1kKXOAK1PqHVS7+ LjT7RCMBx+UbKSpvT2ETMlLHpSDNA81X9yzssA4H7Cyk17NguB/L9Hd/I6uzbb26 ZHI/mZRJ0d4DvXCzmQK4760A2TSfAPA9UiseevDNQHgb8ZYw0hWqJbTaKsF5UfUe MEi0Jd0djQek++oqX7lQar04hbWC6BJ/aY12eATNVVVcuPqJPluysczcQdat342s fZzNx5ghRsSLxCVyuu1pigmoikSCI/SdoPnO69Dw/86bWvS+Djs= =THhI -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Core: Fix an iommu-group refcount leak - Fix overflow issue in IOVA alloc path - ARM-SMMU fixes from Will: - Fix VFIO regression on NXP SoCs by reporting IOMMU_CAP_CACHE_COHERENCY - Fix SMMU shutdown paths to avoid device unregistration race - Error handling fix for Mediatek IOMMU driver * tag 'iommu-fixes-v6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() iommu/iova: Fix alloc iova overflows issue iommu: Fix refcount leak in iommu_device_claim_dma_owner iommu/arm-smmu-v3: Don't unregister on shutdown iommu/arm-smmu: Don't unregister on shutdown iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer	2023-01-14 10:48:15 -06:00
Linus Torvalds	4f43ade45d	memblock: always release pages to the buddy allocator in memblock_free_late() If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages() only releases pages to the buddy allocator if they are not in the deferred range. This is correct for free pages (as defined by for_each_free_mem_pfn_range_in_zone()) because free pages in the deferred range will be initialized and released as part of the deferred init process. memblock_free_pages() is called by memblock_free_late(), which is used to free reserved ranges after memblock_free_all() has run. All pages in reserved ranges have been initialized at that point, and accordingly, those pages are not touched by the deferred init process. This means that currently, if the pages that memblock_free_late() intends to release are in the deferred range, they will never be released to the buddy allocator. They will forever be reserved. In addition, memblock_free_pages() calls kmsan_memblock_free_pages(), which is also correct for free pages but is not correct for reserved pages. KMSAN metadata for reserved pages is initialized by kmsan_init_shadow(), which runs shortly before memblock_free_all(). For both of these reasons, memblock_free_pages() should only be called for free pages, and memblock_free_late() should call __free_pages_core() directly instead. One case where this issue can occur in the wild is EFI boot on x86_64. The x86 EFI code reserves all EFI boot services memory ranges via memblock_reserve() and frees them later via memblock_free_late() (efi_reserve_boot_services() and efi_free_boot_services(), respectively). If any of those ranges happens to fall within the deferred init range, the pages will not be released and that memory will be unavailable. For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI: v6.2-rc2: Node 0, zone DMA spanned 4095 present 3999 managed 3840 Node 0, zone DMA32 spanned 246652 present 245868 managed 178867 v6.2-rc2 + patch: Node 0, zone DMA spanned 4095 present 3999 managed 3840 Node 0, zone DMA32 spanned 246652 present 245868 managed 222816 # +43,949 pages -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmPCrI8QHHJwcHRAa2Vy bmVsLm9yZwAKCRA5A4Ymyw79kT1lB/wPbLpePLzZfDGyV/NR9gi4FuJiaRfhlklV rbxnJce050GERbSQoF/r4zrxn2pzvIWGMh1xWZBGi/q8mT2rOIYtVqUahY9YuL/Z 7+xqdCOALIxEj+cXqYocqp8/NFgUWLGuMoomc9lWvEkUs+zOvkD8Z/bRecfPYvOa BftPALmtXgx46Ecce0gZvvh4YULpVLNdDPPiwZTabV+47Cl8+cJ0Y+iEHsUfOesU hQG0unWJH77O3IU4QxiirLekLP/6a5O5f0W7u3PZmNNv7N+UdwE+De+QF0aamfgA LZDO1qOakflegFZvK0JchCzS4hc6dtRKqIvNM3cCBMXLvV4REHKP =geNh -----END PGP SIGNATURE----- Merge tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock fix from Mike Rapoport: "memblock: always release pages to the buddy allocator in memblock_free_late() If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages() only releases pages to the buddy allocator if they are not in the deferred range. This is correct for free pages (as defined by for_each_free_mem_pfn_range_in_zone()) because free pages in the deferred range will be initialized and released as part of the deferred init process. memblock_free_pages() is called by memblock_free_late(), which is used to free reserved ranges after memblock_free_all() has run. All pages in reserved ranges have been initialized at that point, and accordingly, those pages are not touched by the deferred init process. This means that currently, if the pages that memblock_free_late() intends to release are in the deferred range, they will never be released to the buddy allocator. They will forever be reserved. In addition, memblock_free_pages() calls kmsan_memblock_free_pages(), which is also correct for free pages but is not correct for reserved pages. KMSAN metadata for reserved pages is initialized by kmsan_init_shadow(), which runs shortly before memblock_free_all(). For both of these reasons, memblock_free_pages() should only be called for free pages, and memblock_free_late() should call __free_pages_core() directly instead. One case where this issue can occur in the wild is EFI boot on x86_64. The x86 EFI code reserves all EFI boot services memory ranges via memblock_reserve() and frees them later via memblock_free_late() (efi_reserve_boot_services() and efi_free_boot_services(), respectively). If any of those ranges happens to fall within the deferred init range, the pages will not be released and that memory will be unavailable. For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI: v6.2-rc2: Node 0, zone DMA spanned 4095 present 3999 managed 3840 Node 0, zone DMA32 spanned 246652 present 245868 managed 178867 v6.2-rc2 + patch: Node 0, zone DMA spanned 4095 present 3999 managed 3840 Node 0, zone DMA32 spanned 246652 present 245868 managed 222816 # +43,949 pages" * tag 'fixes-2023-01-14' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: mm: Always release pages to the buddy allocator in memblock_free_late().	2023-01-14 10:08:08 -06:00
Linus Torvalds	880ca43e5c	kernel hardening fixes for v6.2-rc4 - Fix CFI hash randomization with KASAN (Sami Tolvanen) - Check size of coreboot table entry and use flex-array -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmPB6IwWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJgqhEACZc2ehC6DNc3JSQEbZ9H47FL6Z pnyNvi+ZkC+vxENjH6WMMjtwehWHiQJVcHNaK9eF9/7A3pq58axw3RyeeVbPmC3B E0bDjJqaknAKa9FtFdyCTXD0V1TmY/s+oHTZHUXohq9ctI+hJT3reTJ55Uo5jlyV 8aB2lvbg8Bch4BAmg7z8gd3208VL30Q3Go0mspmovYUXVCvnwe08SyROIoJZnE9+ m5IIRfVCNFrAda1DPfiNeqQcE2EnKhTT0ESwtZbQ0HS5z1zJRYjs8gaeY63iQTNn tR1mpP97RngzQ1jCfZP3dZIuYA1TLgz/px0WraYflrpnYpzJOl0XLiigXefU5lyL 7YtGb9xuu8TXMI2D+n52DlYXGRjc9I7zUMPg03y7sC4BnKX5eA6Qda4plP5kvxxp K9PSO91RkS+01nwvXCNs7ISkQ1YpayDyNxsiDIqmHx3po9QB5QniceAa5mIYR/ld v9QKzRhLELiq8cYdu+fgfSOEaY8q9+/k+kEHakfsrXoLaiK2RVw4Y++S6Fh1QIy4 R8DHdhd8j33Yws96xRhI2P+g5mVzDpdEN1TtskdO5WjefCT83R84qqJsEaklVTrI AQDSweQfF+hc+B1PkDRbCgiLeSUnPfxzwdSoy35fc9/qg/JnoQMHFrkJB2Xn2+hv KaFfgM93f1CbCW/KDg== =KwzM -----END PGP SIGNATURE----- Merge tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kernel hardening fixes from Kees Cook: - Fix CFI hash randomization with KASAN (Sami Tolvanen) - Check size of coreboot table entry and use flex-array * tag 'hardening-v6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: Fix CFI hash randomization with KASAN firmware: coreboot: Check size of table entry and use flex-array	2023-01-14 10:04:00 -06:00
Linus Torvalds	8b7be52f3f	modules-6.2-rc4 Just one fix for modules by Nick. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmPB5S4SHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoin0OUQALx+uch7cZvl3aznH4CAkF1U2mKuESLT unQsnSlxIF0BETTAdPbhcCVEtKi102Xq4R6ffCSilB33b0IGM6qY6ZUQb/GMGhg5 qIur7EDiFksjCBmBuE1iNaAknkiKeirkJTEC4lEOl8OLXprmnuMeeR+E0BDi5sH3 YIxexLuhyKeF9Ke4bjAJ7A4WKvh2R7yamISCUcP6GL3w7U9urSjo9FHKcUmFH9nr qCX/1zE0fz7iJzb9YtBNVhdgKNOYNOxa7TDNXQCVuLcZQfmQqDMBVgKdOkj6TAWX 6L2CHT4N9IjPt9+DrlEUf6bSSwP4N4aFdyMAo6UlVXcgvEbTS/kdoMSqFOAQSAdb G/lzsvS3fD76VcCdZkwAFYnEhUTJ4xWTS0oaI++tu0EFX5lvRuQv2DRt0WULlD/u L0paUwmjtVajcSIATxRZkjoMiVD4btDRz30kaIUU/xoc1Gg/EADrSLHESaZ9eZVL EJ40aqLLIRBXGZrVEzvf97HIzuQiKfaPzywNvbMpxG3m0tV2pn3Z4ts/A8aO7c+O mBDnTURiZN6pT+xsnJBvqWrlXwPRUGwI+NjRcdPZhUyfgj5MHpEI8PAcUWy6TTUn H2P6x2iC3/nypqhnwjoixSptjaUWcak3R6UgwVS2YqfjePCqaq0wg9DAFDQH0Yx3 awAOoum0Ubin =aa3U -----END PGP SIGNATURE----- Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module fix from Luis Chamberlain: "Just one fix for modules by Nick" * tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: kallsyms: Fix scheduling with interrupts disabled in self-test	2023-01-14 08:17:27 -06:00
Linus Torvalds	b35ad63eec	7 smb3 client fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmPB/t0ACgkQiiy9cAdy T1Gvbwv+LIXF5dHNGHDuezecbD8T9sRF2v15Mh7i6SSg7BWeXXebYY4dyrSQ5SHu KqsN2Y3P3A0ZQ3ApzFryImM6BwSpOeyCsVRhl7VCWnMgXcroqc/O6F6/YRFVkUAi iWWZLXM7WFBQGXUbPXaiPc+wCRARrnul9p+48Teyy0CJWiWormQmkznVxeihErDX /pWdQdvJeFcUrIj1H3e4cyJF2hVzRiUGI/eZmBGlDyaK192vYgGYO2AhHnTfd7fU dUJ+/trVw0koyC5/86veHRqCcXzFD44ORkAB46NaCic1K8t+RPhmxgtriiNLcCrQ kEmeub6ayPkuniV88NBEPXaDy0S/cEYHr7GEuZTn4sq+hw//y5KbKN3sa6aLHsMH 46BIHcyTXU59eNJ4lWOjoqD2NiqP2GYFY4PZftB85H1EW8Fchwcsw4WzW0ENpzmi qcWslXDKYjJIZ6NHnBiR/FYI1VdsmoGbDzMhrHreCrWCuIuVaqfrFPMnvE8f5dqY fkfnkqT4 =SdZA -----END PGP SIGNATURE----- Merge tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: - memory leak and double free fix - two symlink fixes - minor cleanup fix - two smb1 fixes * tag '6.2-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix uninitialized memory read for smb311 posix symlink create cifs: fix potential memory leaks in session setup cifs: do not query ifaces on smb1 mounts cifs: fix double free on failed kerberos auth cifs: remove redundant assignment to the variable match cifs: fix file info setting in cifs_open_file() cifs: fix file info setting in cifs_query_path_info()	2023-01-14 08:08:25 -06:00
Linus Torvalds	8e76813085	SCSI fixes on 20230114 Two minor fixes in the hisi_sas driver which only impact enterprise style multi-expander and shared disk situations and no core changes. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCY8KxpSYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRAlAP9cQmiq M8WGimymqulRFEpkWpDM1R06eqHDI2K0h5/rfAD+KlgS0cOVvx0nenWyhmprYvDX 2Z239J+WXOjSJq8UJL4= =gMQX -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two minor fixes in the hisi_sas driver which only impact enterprise style multi-expander and shared disk situations and no core changes" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id scsi: hisi_sas: Use abort task set to reset SAS disks when discovered	2023-01-14 07:57:25 -06:00
Linus Torvalds	34cbf89afc	ATA fixes for 6.2-rc4 A single fix for rc4 to prevent building the pata_cs5535 driver with user mode linux as it uses msr operations that are not defined with UML. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY8HnowAKCRDdoc3SxdoY dmDWAPwKMUiDzFSkeD7zfKGwCd72HWUa1298yL+XnD8Y7vLBtgEAqhYi9fAnVnN0 dUHm12rEwFOay+lVwxWuQowFaVmzGAs= =RXSo -----END PGP SIGNATURE----- Merge tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ATA fix from Damien Le Moal: "A single fix to prevent building the pata_cs5535 driver with user mode linux as it uses msr operations that are not defined with UML" * tag 'ata-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: ata: pata_cs5535: Don't build on UML	2023-01-14 07:52:11 -06:00
Linus Torvalds	97ec4d559d	block-6.2-2023-01-13 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPBsFAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqgnEAC0OqxnMsOPNbkLO7k6FsSrG7ZoENkOIMCt Grk3D1cPkM13I0xc+WiaOBezMriPzfdXvt5AGDn9fd53Ih47qpSY4eU6pCqoCk5y HWdn8KXZvhJGZsSy0Nz+cfPDW/8diJON8YBpJwWM/DfDdP8XibtjlIMTVTtJab6h aGWjmy3leNfghOJ0cZ1wjL6maWFoowQASs52PZfajSc0mQ5X0i8BgQb1WOHNu89C vEir9PYlTmdMnYlAKLsyEL3KoGUPm++zSLtJeyWYavlCMGK5WTyNkzmeXqsQhAGf b1LjovQASe//1t2wvCzQviRf4cae0pE9JhiaYt2oxoDdHrfQj/WPndVS4yE9c+0O BnLVTCFHNv86TRXNCbEUzI+Ftj6m9qt4MrHz8YpstX7FxGxYC+T5RqTwYClWZQ0j llBuJUHj+kkAv6kBMJCHTyat6pxIDgcb52QMJr5mFWuEaTloraBIJC70hMtxBQV/ j5mrBYqCngCHVs+hAl9UQ4zqQVSvkeT11QFvwFolxIfs7qtfLqeGzYxvaeomqO3V sA+H5NY50OEuPfFFmCpcNUJXeUKg7wP39iNHdz6P5cCDBCfUwbNbgKKKNmBovaC+ KhPd8Xo1MmzDuF+cylvTcjOBDte4425GN7PBj4vP1xbuHYcjg6AEFLawgqE9Y4XX xyNlgJXPOg== =ujiw -----END PGP SIGNATURE----- Merge tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux Pull block fixes from Jens Axboe: "Nothing major in here, just a collection of NVMe fixes and dropping a wrong might_sleep() that static checkers tripped over but which isn't valid" * tag 'block-6.2-2023-01-13' of git://git.kernel.dk/linux: MAINTAINERS: stop nvme matching for nvmem files nvme: don't allow unprivileged passthrough on partitions nvme: replace the "bool vec" arguments with flags in the ioctl path nvme: remove __nvme_ioctl nvme-pci: fix error handling in nvme_pci_enable() nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers nvme-apple: add NVME_QUIRK_IDENTIFY_CNS quirk to fix regression block: Drop spurious might_sleep() from blk_put_queue()	2023-01-13 17:41:19 -06:00
Linus Torvalds	2ce7592df9	io_uring-6.2-2023-01-13 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPBsL8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplLXD/9kmbwSseR8sm8s6rB3mZcfgN4vvkRDa5Kg r5bj32sOl2o37szObGI53MneFcCpdOszA2R3AHqUoj7ZQsyPlt59OIpKtdBlp6xv ACX1weBnxMTtdMZ90Mp/GxGLZuvSifbfL4z5YgbRRKnGorz3prmfRkKuO51DcJes mhoPGTDUsAmxoU261LzuHI7DtD69We8yBzj58p21NF8DvBI3NtuWWhOI+EGs2CDr aqilG5LxOYBKuEY660KhWKMAXwVbVkyLak4eIh4iax0R0Um/SKXnRMIRI3L1QZoP eqGtF4zGJGM+47RUnl6GF59/IsyRZR04GJlZ5ma8aqqZNh4oj8E6A+hpgczJ0025 QM4hG+NwqXeNpMjN0PwDypo3WoYjyIhaMoEyCT7dmVzj62y3pm3DDaZDnzUlQF5w 8YvoSPwyCzWHWFDGEZp6WRYn2YfoiE+L48euknADmTL5FwUOs1J5OUK0v8BcAYLO tqJ8Q8emx15ZTzHeI+Z9lDYIuYKBU9XuO8ugfbss/Xx2tQMDeHe6BIaujhwam6c4 BWyAIViXgWKpIIDB3Emsu3lStc3PJ1WLbBdw4ja0nwhCRB7IeclzZf1IZHTT4xsQ eg5EK2QORLrlY9keCoFhqfT4guSYQptBlwPuxhHM2gcrxXegR6294hBoPVSd1be5 g5NK0rrSWA== =9C0J -----END PGP SIGNATURE----- Merge tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: "A fix for a regression that happened last week, rest is fixes that will be headed to stable as well. In detail: - Fix for a regression added with the leak fix from last week (me) - In writing a test case for that leak, inadvertently discovered a case where we a poll request can race. So fix that up and mark it for stable, and also ensure that fdinfo covers both the poll tables that we have. The latter was an oversight when the split poll table were added (me) - Fix for a lockdep reported issue with IOPOLL (Pavel)" * tag 'io_uring-6.2-2023-01-13' of git://git.kernel.dk/linux: io_uring: lock overflowing for IOPOLL io_uring/poll: attempt request issue after racy poll wakeup io_uring/fdinfo: include locked hash table in fdinfo output io_uring/poll: add hash if ready poll request can't complete inline io_uring/io-wq: only free worker if it was allocated for creation	2023-01-13 17:37:09 -06:00
Linus Torvalds	9e058c2952	pci-v6.2-fixes-1 -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmPBniAUHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vwOjRAAhjyRAgyiZV2rWS4pyvpQpqcpZWD9 796ZSqnzLJjVYCymGvUTX23FEA48n59+bCM/WpfEGUPrBf8LZQxC9YOCm6ltuM8+ FoSBykW/tHPq5IWaLzgrWpHeDOgEnZu/WFGGvrV3tl1mLpM1SJT8bGDsjHXlo+FM qkTEiA3nUEKQs5x9r2TTLCeUWGPNTIHNd2VfuxOqM3qC/nVCOfTTxU8nm6Lk7Eix nboAugAIADJIjs/+ZGekLBuzZYPkLYuDTyMYJ5hdo1p7wWCLc9gArEqvXKwVgmD3 ptenZeOlQi9Ay45HmkfIgfgKeeQ7REJj3dx04vf67neAianyUrB0EZDqDjR7LmgM ozlNt0XjyoeEhu6AQS0s1LZtbDiED1R/00P6Gb+YEjUCVipW2lEYYwP0v9dsnNoh 6wblgnkQoxLFM+5CAXRmCmpaoQn0Uam7okfVeohtsz8/kNQF2St0hjzr4Dmws+O3 k9PUqnnUl4ByElzpEDesVGZMJ3pxFVH15ufu8VnRqN60pLTvNrsPyU4cVnG176Rc 3RSDN3zMtPxnHJVy4r3bTNEZsX/7RUrOb4xScOXMmRDBMUc8QdscF8Oj1ucKlj5j mp7vB/7+VjU96uRarRyqUxGeQc77DCTcvOa1IGh/cuYom8ZJ6vpSCpKy6f6SFGuf i8iTTUcQKCdqVW4= =Fv2v -----END PGP SIGNATURE----- Merge tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci fixes from Bjorn Helgaas: - Work around apparent firmware issue that made Linux reject MMCONFIG space, which broke PCI extended config space (Bjorn Helgaas) - Fix CONFIG_PCIE_BT1 dependency due to mid-air collision between a PCI_MSI_IRQ_DOMAIN -> PCI_MSI change and addition of PCIE_BT1 (Lukas Bulwahn) * tag 'pci-v6.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space x86/pci: Simplify is_mmconf_reserved() messages PCI: dwc: Adjust to recent removal of PCI_MSI_IRQ_DOMAIN	2023-01-13 17:32:22 -06:00
Sami Tolvanen	42633ed852	kbuild: Fix CFI hash randomization with KASAN Clang emits a asan.module_ctor constructor to each object file when KASAN is enabled, and these functions are indirectly called in do_ctors. With CONFIG_CFI_CLANG, the compiler also emits a CFI type hash before each address-taken global function so they can pass indirect call checks. However, in commit `0c3e806ec0` ("x86/cfi: Add boot time hash randomization"), x86 implemented boot time hash randomization, which relies on the .cfi_sites section generated by objtool. As objtool is run against vmlinux.o instead of individual object files with X86_KERNEL_IBT (enabled by default), CFI types in object files that are not part of vmlinux.o end up not being included in .cfi_sites, and thus won't get randomized and trip CFI when called. Only .vmlinux.export.o and init/version-timestamp.o are linked into vmlinux separately from vmlinux.o. As these files don't contain any functions, disable KASAN for both of them to avoid breaking hash randomization. Link: https://github.com/ClangBuiltLinux/linux/issues/1742 Fixes: `0c3e806ec0` ("x86/cfi: Add boot time hash randomization") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230112224948.1479453-2-samitolvanen@google.com	2023-01-13 15:22:03 -08:00
Kees Cook	3b293487b8	firmware: coreboot: Check size of table entry and use flex-array The memcpy() of the data following a coreboot_table_entry couldn't be evaluated by the compiler under CONFIG_FORTIFY_SOURCE. To make it easier to reason about, add an explicit flexible array member to struct coreboot_device so the entire entry can be copied at once. Additionally, validate the sizes before copying. Avoids this run-time false positive warning: memcpy: detected field-spanning write (size 168) of single field "&device->entry" at drivers/firmware/google/coreboot_table.c:103 (size 8) Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Link: https://lore.kernel.org/all/03ae2704-8c30-f9f0-215b-7cdf4ad35a9a@molgen.mpg.de/ Cc: Jack Rosenthal <jrosenth@chromium.org> Cc: Guenter Roeck <groeck@chromium.org> Cc: Julius Werner <jwerner@chromium.org> Cc: Brian Norris <briannorris@chromium.org> Cc: Stephen Boyd <swboyd@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Julius Werner <jwerner@chromium.org> Reviewed-by: Guenter Roeck <groeck@chromium.org> Link: https://lore.kernel.org/r/20230107031406.gonna.761-kees@kernel.org Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Jack Rosenthal <jrosenth@chromium.org> Link: https://lore.kernel.org/r/20230112230312.give.446-kees@kernel.org	2023-01-13 15:22:03 -08:00
Nicholas Piggin	da35048f26	kallsyms: Fix scheduling with interrupts disabled in self-test kallsyms_on_each* may schedule so must not be called with interrupts disabled. The iteration function could disable interrupts, but this also changes lookup_symbol() to match the change to the other timing code. Reported-by: Erhard F. <erhard_f@mailbox.org> Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/ Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com Fixes: `30f3bb0977` ("kallsyms: Add self-test facility") Tested-by: "Erhard F." <erhard_f@mailbox.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2023-01-13 15:09:08 -08:00
Peter Foley	22eebaa631	ata: pata_cs5535: Don't build on UML This driver uses MSR functions that aren't implemented under UML. Avoid building it to prevent tripping up allyesconfig. e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3a3): undefined reference to `__tracepoint_read_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x3d2): undefined reference to `__tracepoint_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x457): undefined reference to `__tracepoint_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x481): undefined reference to `do_trace_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4d5): undefined reference to `do_trace_write_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x4f5): undefined reference to `do_trace_read_msr' /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: pata_cs5535.c:(.text+0x51c): undefined reference to `do_trace_write_msr' Signed-off-by: Peter Foley <pefoley2@pefoley.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>	2023-01-14 07:38:48 +09:00
Linus Torvalds	92783a90bc	ARM: * Fix the PMCR_EL0 reset value after the PMU rework * Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots * Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot * Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it * Reviewer updates: Alex is stepping down, replaced by Zenghui x86: * Fix various rare locking issues in Xen emulation and teach lockdep to detect them * Documentation improvements * Do not return host topology information from KVM_GET_SUPPORTED_CPUID -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmPAT3EUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPmDAf+ICCVMwgm+PjAc6NuXzaUk6BFGWKF 1lzMvnKb6ARnhMKwyjl/Sf5EgnTuucnSTBHuE1kjaLkPUDNJvi4oRXVdDwKjtXnZ Zxk4dpsNLWVfALHTk1KweIkR5KNif0kugUh9RNp6zOBnoTVRh8XdCHpeDv73tJaG R1gCAreVTDbp+wNrVpiImUfYAZ4GrGpwwWRH/xLAGDWoTL9Z9J5tQygf+0C429n/ eJoTrToLjESbYadDgCNDD+TUkHbeDVg8aeio2JZga9SvH3RBhwriLqz26v9yvikL UoY96AySMaiox4pgCUYUl8nng8MR8AG4C4vpNnLalj7tfHxRfhtAwD0EYw== =gDOV -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "ARM: - Fix the PMCR_EL0 reset value after the PMU rework - Correctly handle S2 fault triggered by a S1 page table walk by not always classifying it as a write, as this breaks on R/O memslots - Document why we cannot exit with KVM_EXIT_MMIO when taking a write fault from a S1 PTW on a R/O memslot - Put the Apple M2 on the naughty list for not being able to correctly implement the vgic SEIS feature, just like the M1 before it - Reviewer updates: Alex is stepping down, replaced by Zenghui x86: - Fix various rare locking issues in Xen emulation and teach lockdep to detect them - Documentation improvements - Do not return host topology information from KVM_GET_SUPPORTED_CPUID" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/xen: Avoid deadlock by adding kvm->arch.xen.xen_lock leaf node lock KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule KVM: x86/xen: Fix potential deadlock in kvm_xen_update_runstate_guest() KVM: x86/xen: Fix lockdep warning on "recursive" gpc locking Documentation: kvm: fix SRCU locking order docs KVM: x86: Do not return host topology information from KVM_GET_SUPPORTED_CPUID KVM: nSVM: clarify recalc_intercepts() wrt CR8 MAINTAINERS: Remove myself as a KVM/arm64 reviewer MAINTAINERS: Add Zenghui Yu as a KVM/arm64 reviewer KVM: arm64: vgic: Add Apple M2 cpus to the list of broken SEIS implementations KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_* KVM: arm64: Document the behaviour of S1PTW faults on RO memslots KVM: arm64: Fix S1PTW handling on RO memslots KVM: arm64: PMU: Fix PMCR_EL0 reset value	2023-01-13 14:41:50 -06:00
Mateusz Guzik	f5fe24ef17	lockref: stop doing cpu_relax in the cmpxchg loop On the x86-64 architecture even a failing cmpxchg grants exclusive access to the cacheline, making it preferable to retry the failed op immediately instead of stalling with the pause instruction. To illustrate the impact, below are benchmark results obtained by running various will-it-scale tests on top of the 6.2-rc3 kernel and Cascade Lake (2 sockets * 24 cores * 2 threads) CPU. All results in ops/s. Note there is some variance in re-runs, but the code is consistently faster when contention is present. open3 ("Same file open/close"): proc stock no-pause 1 805603 814942 (+%1) 2 1054980 1054781 (-0%) 8 1544802 1822858 (+18%) 24 1191064 2199665 (+84%) 48 851582 1469860 (+72%) 96 609481 1427170 (+134%) fstat2 ("Same file fstat"): proc stock no-pause 1 3013872 3047636 (+1%) 2 4284687 4400421 (+2%) 8 3257721 5530156 (+69%) 24 2239819 5466127 (+144%) 48 1701072 5256609 (+209%) 96 `1269157` 6649326 (+423%) Additionally, a kernel with a private patch to help access() scalability: access2 ("Same file access"): proc stock patched patched +nopause 24 2378041 2005501 5370335 (-15% / +125%) That is, fixing the problems in access itself reduces scalability after the cacheline ping-pong only happens in lockref with the pause instruction. Note that fstat and access benchmarks are not currently integrated into will-it-scale, but interested parties can find them in pull requests to said project. Code at hand has a rather tortured history. First modification showed up in commit `d472d9d98b` ("lockref: Relax in cmpxchg loop"), written with Itanium in mind. Later it got patched up to use an arch-dependent macro to stop doing it on s390 where it caused a significant regression. Said macro had undergone revisions and was ultimately eliminated later, going back to cpu_relax. While I intended to only remove cpu_relax for x86-64, I got the following comment from Linus: I would actually prefer just removing it entirely and see if somebody else hollers. You have the numbers to prove it hurts on real hardware, and I don't think we have any numbers to the contrary. So I think it's better to trust the numbers and remove it as a failure, than say "let's just remove it on x86-64 and leave everybody else with the potentially broken code" Additionally, Will Deacon (maintainer of the arm64 port, one of the architectures previously benchmarked): So, from the arm64 side of the fence, I'm perfectly happy just removing the cpu_relax() calls from lockref. As such, come back full circle in history and whack it altogether. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/all/CAGudoHHx0Nqg6DE70zAVA75eV-HXfWyhVMWZ-aSeOofkA_=WdA@mail.gmail.com/ Acked-by: Tony Luck <tony.luck@intel.com> # ia64 Acked-by: Nicholas Piggin <npiggin@gmail.com> # powerpc Acked-by: Will Deacon <will@kernel.org> # arm64 Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2023-01-13 14:35:38 -06:00
Bjorn Helgaas	fd3a8cff4d	x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space Normally we reject ECAM space unless it is reported as reserved in the E820 table or via a PNP0C02 _CRS method (PCI Firmware, r3.3, sec 4.1.2). `07eab0901e` ("efi/x86: Remove EfiMemoryMappedIO from E820 map"), removes E820 entries that correspond to EfiMemoryMappedIO regions because some other firmware uses EfiMemoryMappedIO for PCI host bridge windows, and the E820 entries prevent Linux from allocating BAR space for hot-added devices. Some firmware doesn't report ECAM space via PNP0C02 _CRS methods, but does mention it as an EfiMemoryMappedIO region via EFI GetMemoryMap(), which is normally converted to an E820 entry by a bootloader or EFI stub. After `07eab0901e`, that E820 entry is removed, so we reject this ECAM space, which makes PCI extended config space (offsets 0x100-0xfff) inaccessible. The lack of extended config space breaks anything that relies on it, including perf, VSEC telemetry, EDAC, QAT, SR-IOV, etc. Allow use of ECAM for extended config space when the region is covered by an EfiMemoryMappedIO region, even if it's not included in E820 or PNP0C02 _CRS. Link: https://lore.kernel.org/r/ac2693d8-8ba3-72e0-5b66-b3ae008d539d@linux.intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216891 Fixes: `07eab0901e` ("efi/x86: Remove EfiMemoryMappedIO from E820 map") Link: https://lore.kernel.org/r/20230110180243.1590045-3-helgaas@kernel.org Reported-by: Kan Liang <kan.liang@linux.intel.com> Reported-by: Tony Luck <tony.luck@intel.com> Reported-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reported-by: Yunying Sun <yunying.sun@intel.com> Reported-by: Baowen Zheng <baowen.zheng@corigine.com> Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reported-by: Yang Lixiao <lixiao.yang@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>	2023-01-13 11:53:54 -06:00
Linus Torvalds	0bf913e07b	First batch of EFI fixes for v6.2: - avoid a potential crash on the efi_subsys_init() error path - use more appropriate error code for runtime services calls issued after a crash in the firmware occurred - avoid READ_ONCE() for accessing firmware tables that may appear misaligned in memory -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPBg68ACgkQw08iOZLZ jyQs5Qv+PVg06BhEqN+vwNQy6vd4ezTxmDAy7yx751mo3HIw0qT0ohsCIpRydq0c +qlCXa+Uu/yr/IQplfDT9vY+MEwD9iuwJha8ltGRWM3++yEF4uQXowHDoEKsO84l 5PaC37EfOvHmV6UdFdIF0OYDOcRvX2FsIbmUKRyvIav1e+QRLvUWWKKEmAh04c7G yNc0837kmoOpjKrYPc8j2n3dVUbhrFUW5eLIFmd8yrR+GRu6Ae5RH3J7iF7Nqtrq oReYYq3XpmYg8c00WV0NKVuB0DK7fhGY7jcbDfLmTrPwqVzLjxQGecxsQPYnqrJd mZywkm2fM8KIJy2LQDJOVOZaDAzaC2SkrpELHX/MnPK1UrP561AIv/sXK+3+UBEm b6m5dHbJgaifKP3kkbc9Cy4f9avLJOdjdXH5f5zPe7it54yHLsacEvjT6M2oiunx zIvTd/MXi24J+tzgxr08KM5wHXgLGh+fUM7BfZTvEVQmUjY8TnIPjsaAJhTS3jzV TN3/XAWi =4LbF -----END PGP SIGNATURE----- Merge tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - avoid a potential crash on the efi_subsys_init() error path - use more appropriate error code for runtime services calls issued after a crash in the firmware occurred - avoid READ_ONCE() for accessing firmware tables that may appear misaligned in memory * tag 'efi-fixes-for-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi: tpm: Avoid READ_ONCE() for accessing the event log efi: rt-wrapper: Add missing include efi: fix userspace infinite retry read efivars after EFI runtime services page fault efi: fix NULL-deref in init error path	2023-01-13 10:37:10 -06:00
Linus Torvalds	40d92fc4fa	Three documentation fixes (or rather two and one warning): - Sphinx 6.0 broke our configuration mechanism, so fix it. - I broke our configuration for non-Alabaster themes; Akira fixed it. - Deprecate Sphinx < 2.4 with an eye toward future removal -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmPBhWwPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5Y+i8IAJCd8qgopxIcmzif8ncsrZFIdk3FBd4INCU3 gEr5IBQN10Fm3es8FcWQPhX8nqFzlyG9GyjNSEfZpKYF9y3zWx9l5xOD0f6Ki6F4 HEaBcP11zQCSbrdZMR2in7fW+SNqIjJ+srDLrLkG2d4il6IbbSwx121pjPxJgHkK Y4Sj4Aa3fm5m5JzqArc8/IQRl6ewrfCuGXGh2RdunMzdf22Q2vMdIzEfhyilV1Cg FSXttLDTq5huRSBv8PYaMJnpx2mMn+si8c5mFcNV6oDP+VG4m2rBw4kYQk6q0rU2 xJFnbh7oThKyQ955k+sxJYoSxq9Fd5lXX/3d+HtqSvvC/WAP8gY= =ttzZ -----END PGP SIGNATURE----- Merge tag 'docs-6.2-fixes' of git://git.lwn.net/linux Pull documentation fixes from Jonathan Corbet: "Three documentation fixes (or rather two and one warning): - Sphinx 6.0 broke our configuration mechanism, so fix it - I broke our configuration for non-Alabaster themes; Akira fixed it - Deprecate Sphinx < 2.4 with an eye toward future removal" * tag 'docs-6.2-fixes' of git://git.lwn.net/linux: docs/conf.py: Use about.html only in sidebar of alabaster theme docs: Deprecate use of Sphinx < 2.4.x docs: Fix the docs build with Sphinx 6.0	2023-01-13 10:35:26 -06:00
Ard Biesheuvel	d3f450533b	efi: tpm: Avoid READ_ONCE() for accessing the event log Nathan reports that recent kernels built with LTO will crash when doing EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a misaligned load from the TPM event log, which is annotated with READ_ONCE(), and under LTO, this gets translated into a LDAR instruction which does not tolerate misaligned accesses. Interestingly, this does not happen when booting the same kernel straight from the UEFI shell, and so the fact that the event log may appear misaligned in memory may be caused by a bug in GRUB or SHIM. However, using READ_ONCE() to access firmware tables is slightly unusual in any case, and here, we only need to ensure that 'event' is not dereferenced again after it gets unmapped, but this is already taken care of by the implicit barrier() semantics of the early_memunmap() call. Cc: <stable@vger.kernel.org> Cc: Peter Jones <pjones@redhat.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/1782 Signed-off-by: Ard Biesheuvel <ardb@kernel.org>	2023-01-13 17:15:17 +01:00
Pavel Begunkov	544d163d65	io_uring: lock overflowing for IOPOLL syzbot reports an issue with overflow filling for IOPOLL: WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0 Workqueue: events_unbound io_ring_exit_work Call trace: io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734 io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773 io_fill_cqe_req io_uring/io_uring.h:168 [inline] io_do_iopoll+0x474/0x62c io_uring/rw.c:1065 io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513 io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056 io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869 process_one_work+0x2d8/0x504 kernel/workqueue.c:2289 worker_thread+0x340/0x610 kernel/workqueue.c:2436 kthread+0x12c/0x158 kernel/kthread.c:376 ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863 There is no real problem for normal IOPOLL as flush is also called with uring_lock taken, but it's getting more complicated for IOPOLL\|SQPOLL, for which __io_cqring_overflow_flush() happens from the CQ waiting path. Reported-and-tested-by: syzbot+6805087452d72929404e@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-01-13 07:32:46 -07:00
Linus Torvalds	689968db7b	sound fixes for 6.2-rc4 This became a slightly big update, but it's more or less expected, as the first batch after holidays. All changes (but for the last two last-minute fixes) have been stewed in linux-next long enough, so it's fairly safe to take. - PCM UAF fix in 32bit compat layer - ASoC board-specific fixes for Intel, AMD, Medathek, Qualcomm - SOF power management fixes - ASoC Intel link failure fixes - A series of fixes for USB-audio regressions - CS35L41 HD-audio codec regression fixes - HD-audio device-specific fixes / quirks Note that one SPI patch has been taken in ASoC subtree mistakenly, and the same fix is found in spi tree, but it should be OK to apply. -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmPBZIsOHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE/LEQ//XLbjI0EnjdpBgYRMocykaI2Y3n+72rncfTwQ FX8mtGtVpBsOp6LCQhJX/0A5B+NcBRk1atyJRsuaAuWxC298jZPMQaSBjlHiP56f 3IcV67gBM5Ve3JwttjF3AtveQPHe+pYh0RRy9HHd4qEEPpO+JdTwHbex+5Xnf3Qq jLDPwtrC5fd/Q38mfs881yoRLmnq3OppHJ630SklJBmdjuvVjHhXinOaDY5KBR42 /ewf946RKJFUpz0w72Hl4Kaw6kQ1HWb2kef+RZmdSp49EmUXyVmi9mxSSBJlDV0f QueZlvcZj+Hm6cAw1Kt6pSpShDJO/LwuCuRTaiprBYq+SbCSxaq2mWz0vzyGx8un U9QmLK4FRtPsZm3SPfVfDpROXc7ovydMOvOzFF6uRou/UdJZLnQYXmFTV5gphREa jSaekWqQhrV53LTYBFn38PDfq+xLEeBx7VsYBOLUHyR7QX2rdpRpLc9XJkgLy5tJ xozwrQFNdAvynfPGwBPBUAuWQOvRnuj0W9QIUmA4U2mxwvCbNGJETkI46/Wzixx0 C7n6JA6bc8rBsWHWXefGaabUaCqnBrPKxVqThYLJy05FUh58SP85Xo+k780CRQT7 tLMw2P7sJ1KKlvrVrmCqA2DlsraRT2pBcQEghKmfgded31/3PxAylhPuNO+E1QxP Eq2uo2g= =PKWN -----END PGP SIGNATURE----- Merge tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became a slightly big update, but it's more or less expected, as the first batch after holidays. All changes (but for the last two last-minute fixes) have been stewed in linux-next long enough, so it's fairly safe to take: - PCM UAF fix in 32bit compat layer - ASoC board-specific fixes for Intel, AMD, Medathek, Qualcomm - SOF power management fixes - ASoC Intel link failure fixes - A series of fixes for USB-audio regressions - CS35L41 HD-audio codec regression fixes - HD-audio device-specific fixes / quirks Note that one SPI patch has been taken in ASoC subtree mistakenly, and the same fix is found in spi tree, but it should be OK to apply" * tag 'sound-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits) ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate() ALSA: hda/realtek: Enable mute/micmute LEDs on HP Spectre x360 13-aw0xxx ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list ALSA: control-led: use strscpy in set_led_id() ALSA: usb-audio: Always initialize fixed_rate in snd_usb_find_implicit_fb_sync_format() ASoC: dt-bindings: qcom,lpass-tx-macro: correct clocks on SC7280 ASoC: dt-bindings: qcom,lpass-wsa-macro: correct clocks on SM8250 ASoC: qcom: Fix building APQ8016 machine driver without SOUNDWIRE ALSA: hda: cs35l41: Check runtime suspend capability at runtime_idle ALSA: hda: cs35l41: Don't return -EINVAL from system suspend/resume ASoC: fsl_micfil: Correct the number of steps on SX controls ALSA: hda/realtek: fix mute/micmute LEDs don't work for a HP platform Revert "ALSA: usb-audio: Drop superfluous interface setup at parsing" ALSA: usb-audio: More refactoring of hw constraint rules ALSA: usb-audio: Relax hw constraints for implicit fb sync ALSA: usb-audio: Make sure to stop endpoints before closing EPs ALSA: hda - Enable headset mic on another Dell laptop with ALC3254 ...	2023-01-13 08:20:29 -06:00
Linus Torvalds	d863f0539b	Power management fixes for 6.2-rc4 - Fix cpufreq policy reference counting in amd-pstate to prevent it from crashing on removal (Perry Yuan). - Fix double initialization and set suspend-freq for Apple's cpufreq driver (Arnd Bergmann, Hector Martin). - Fix reading of "reg" property, update cpufreq-dt's blocklist and update DT documentation for Qualcomm's cpufreq driver (Konrad Dybcio, Krzysztof Kozlowski). - Replace 0 with NULL in the Armada cpufreq driver (Miles Chen). - Fix potential overflows in the CPPC cpufreq driver (Pierre Gondois). - Update blocklist for the Tegra234 Soc cpufreq driver (Sumit Gupta). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPBNrMSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxk9kP+waNWD/1Nuh8owqkTJduMV6uh46rOMdH EO1siHGFxyfQn6IW6ch4wOwcMEjdk0VnRgcq5/sI1EVIE66aERBM0og3Se+g3RCg nhYm3ZAhHOisHB3adhUVvdkHvD43iwuE1DedHSKI+TWDO5DmZeDD92/iCmXJ0esK 4mBAGHYX4Kr3aJNDvpMX6Q9VfLrBJ5+o3XM6PrKCk+tx1Aq7dH1M/iV/oLlM+tPz OJus5KAwJGR0U+VXlIwu1d9CFyvynND15mfSl57G8RCxluhJl59+W4Vfs2ti/Pqo Vf553jcftEBJB/aey5QwdEgJa3zkJYJZRg698NadF+WmlEZR1xGA0n4R7VSFVU88 xE7iSed4fIGfWTP01G72ZruF9Z7A6KIyvzq3vyhcl01yb7vsF1WC21qG/YGsPkRQ i7JrLpbKYU+94rsfNEY2dTxYAL5Qs2cWld0S5iXxZVPixPeOeOtm56ci/dwPcuSA FEeXGolStd1hwM70i4LYlcQ2RkcvJcEI/SF6jJpaiawXWdZihlQYHEq8/uNUsG5y 6gEfMUd1H7aershUBylqlcgMCpQIYHctI1M4hSePjhNy91Q1Ajuy+RCzL2YEtuwE /NdPsVPHmbrHvHdQkbH3ntdj+S40Tnn9o8ytZit4iSGW1OIqqNexhqEb6NR82YGb lx98QCiCw48e =j/pc -----END PGP SIGNATURE----- Merge tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix assorted issues in the ARM cpufreq drivers and in the AMD P-state driver. Specifics: - Fix cpufreq policy reference counting in amd-pstate to prevent it from crashing on removal (Perry Yuan) - Fix double initialization and set suspend-freq for Apple's cpufreq driver (Arnd Bergmann, Hector Martin) - Fix reading of "reg" property, update cpufreq-dt's blocklist and update DT documentation for Qualcomm's cpufreq driver (Konrad Dybcio, Krzysztof Kozlowski) - Replace 0 with NULL in the Armada cpufreq driver (Miles Chen) - Fix potential overflows in the CPPC cpufreq driver (Pierre Gondois) - Update blocklist for the Tegra234 Soc cpufreq driver (Sumit Gupta)" * tag 'pm-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: amd-pstate: fix kernel hang issue while amd-pstate unregistering cpufreq: armada-37xx: stop using 0 as NULL pointer cpufreq: apple-soc: Switch to the lowest frequency on suspend dt-bindings: cpufreq: cpufreq-qcom-hw: document interrupts cpufreq: Add SM6375 to cpufreq-dt-platdev blocklist cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist cpufreq: qcom-hw: Fix reading "reg" with address/size-cells != 2 cpufreq: CPPC: Add u64 casts to avoid overflowing cpufreq: apple: remove duplicate intializer	2023-01-13 07:38:14 -06:00
Linus Torvalds	cdbbca256c	ACPI fixes for 6.2-rc4 - Improve ACPI companion lookup for backlight devices in the cases when there is more than one candidate ACPI device object (Hans de Goede). - Add missing support for manual selection of NVidia-WMI-EC or Apple GMUX backlight in the kernel command line to the ACPI backlight driver (Hans de Goede). - Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPBNbISHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxixkP/3aIp3UZCjMwRqv0KtFGB3+bLiIcrVVb WILER6FwaC9zy1fts2oa23nDT6l+DRdr6nqYA4K4557gK0pETJr/UK6uAFNOVFbA fiVe3saGeHTOd+Oq8xcchX/sMSMneYwJwkByVaYwixLVvXSb4XNHAew5y0L3diod xig1OreiPtEX8FvLCg6GNWBf9ULq1xiRPz2NMfx/FmkmvpQuFNVwlkrO7RJQMiCD 0uIwNqvM4wCgHyi7dwV6XsPebgpEJmwz4PaDkJ8FhpzARqes00Ua6wiBnhptuOsd J1G1MEFeH095+xeSivnLesYhwrwiFGvLAq9pKIJYnuiRz80m2Y7ekhUuP1APPN3Z 5e+o2KqlDYOBZxD1tfDuJiaBwP+nzqKNseaYDeBR+Anj4lPnolCq/XQ4tdR1P+0o hL8Xo8JWzN17r8AewLYN+pLwUceFrv7IL8pkSb+JF4UyZsLqmpYbdsulVSh8BDLt g36lrABcZYqeFLiQD/ep3F39MTMNtkN3F1N6GUq0KHdyh/2f8LUWIpBQD35udaeb GSamDfCIevSBDc6lk9kIvIrsh+uaB+Tk9ziMMZLIm4wB9ZD5ugYov7GrKo9cQuhi 3tIhjwScsGAfRMcvzo4zKkjxq1+nffHL6HJs4y15xsDnxgd09VjNNseBU+8D1y1F 2lcIUduQWUyd =bDXc -----END PGP SIGNATURE----- Merge tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These add one more ACPI IRQ override quirk, improve ACPI companion lookup for backlight devices and add missing kernel command line option values for backlight detection. Specifics: - Improve ACPI companion lookup for backlight devices in the cases when there is more than one candidate ACPI device object (Hans de Goede) - Add missing support for manual selection of NVidia-WMI-EC or Apple GMUX backlight in the kernel command line to the ACPI backlight driver (Hans de Goede) - Skip ACPI IRQ override on Asus Expertbook B2402CBA (Tamim Khan)" * tag 'acpi-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: Fix selecting wrong ACPI fwnode for the iGPU on some Dell laptops ACPI: video: Allow selecting NVidia-WMI-EC or Apple GMUX backlight from the cmdline ACPI: resource: Skip IRQ override on Asus Expertbook B2402CBA	2023-01-13 07:32:55 -06:00
Linus Torvalds	0d0833e039	platform-drivers-x86 for v6.2-2 A small set of assorted fixes and hardware-id additions for 6.2. The following is an automated git shortlog grouped by driver: asus-nb-wmi: - Add alternate mapping for KEY_SCREENLOCK - Add alternate mapping for KEY_CAMERA asus-wmi: - Don't load fan curves without fan - Ignore fan on E410MA - Add quirk wmi_ignore_fan dell-privacy: - Only register SW_CAMERA_LENS_COVER if present - Fix SW_CAMERA_LENS_COVER reporting ideapad-laptop: - Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[] int3472/discrete: - Ensure the clk/power enable pins are in output mode intel/pmc/core: - Add Meteor Lake mobile support platform/surface: - aggregator: Add missing call to ssam_request_sync_free() - aggregator: Ignore command messages not intended for us platform/x86/amd: - Fix refcount leak in amd_pmc_probe simatic-ipc: - add another model - correct name of a model sony-laptop: - Don't turn off 0x153 keyboard backlight during probe thinkpad_acpi: - Fix profile mode display in AMT mode touchscreen_dmi: - Add info for the CSL Panther Tab HD -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmPBN78UHGhkZWdvZWRl QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yQ7AgAuKK+TDlru+rup5PSvUBiRddYX8VI U+cJokT9sp748zau+S7zy+1PDYtAnaXbV6wf6/YwANq6Pw9aI9MCMFyc2iXzIDCW fp6d8xvow5XuWG/cK3rggl3WxzInyE2rcSI5epQPV9ylZSOPSPI8CKug/68I2L7W kohws/18ujOU4J5Y8ATH1jY3t8Zx+uA7sdU/Oo6hiA4Xen1qrABCSgcGgWNqxfqb C6tk1kF5agLmvR5I7Y0bDh1EHeN1CALPjl8MibEyYFldASLxmCYogx4bGDQBf0Qm XFZ5MxLdFbHDFXiyaKh+RNW2uHzbJV3rXYVOyUy2eXahBRGj+yoFwDK8Zw== =tB+M -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "A set of assorted fixes and hardware-id additions" * tag 'platform-drivers-x86-v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode platform/x86/amd: Fix refcount leak in amd_pmc_probe platform/x86: intel/pmc/core: Add Meteor Lake mobile support platform/x86: simatic-ipc: add another model platform/x86: simatic-ipc: correct name of a model platform/x86: dell-privacy: Only register SW_CAMERA_LENS_COVER if present platform/x86: dell-privacy: Fix SW_CAMERA_LENS_COVER reporting platform/x86: asus-wmi: Don't load fan curves without fan platform/x86: asus-wmi: Ignore fan on E410MA platform/x86: asus-wmi: Add quirk wmi_ignore_fan platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK platform/x86: asus-nb-wmi: Add alternate mapping for KEY_CAMERA platform/surface: aggregator: Add missing call to ssam_request_sync_free() platform/surface: aggregator: Ignore command messages not intended for us platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD platform/x86: ideapad-laptop: Add Legion 5 15ARH05 DMI id to set_fn_lock_led_list[] platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe	2023-01-13 07:26:40 -06:00
Linus Torvalds	ff5ebafd51	drm fixes for 6.2-rc4 buddy: - benchmark regression fix for top-down buddy allocation panel: - add Lenovo panel orientation quirk ttm: - fix kernel oops regression amdgpu: - fix missing fence references - fix missing pipeline sync fencing - SMU13 fan speed fix - SMU13 fix power cap handling - SMU13 BACO fix - Fix a possible segfault in bo validation error case - Delay removal of firmware framebuffer - Fix error when unloading amdkfd: - SVM fix when clearing vram - GC11 fix for multi-GPU i915: - Reserve enough fence slot for i915_vma_unbind_vsync - Fix potential use after free - Reset engines twice in case of reset failure - Use multi-cast registers for SVG Unit registers msm: - display: - doc warning fixes - dt attribs cleanups - memory leak fix - error handing in hdmi probe fix - dp_aux_isr incorrect signalling fix - shutdown path fix - accel: - a5xx: fix quirks to be a bitmask - a6xx: fix gx halt to avoid 1s hang - kexec shutdown fix - fix potential double free vmwgfx: - drop rcu usage to make code more robust virtio: - fix use-after-free in gem handle code nouveau: - drop unused nouveau_fbcon.c -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEEKbZHaGwW9KfbeusDHTzWXnEhr4FAmPA5skACgkQDHTzWXnE hr69GA/8CgYwN/4vWDiQ+p4rX6muw0gicxKmfJZLxt8BFnQSSjESWZ6/L201JeZT dl34SdZ++rx8yYLeoJKHKocyvMIj9goHeGdNkeyhaoZvlfPVf0sAMNjYfysIvcGk s4HoLoXgzH8SCO2dp2MgOstJNncZNSZrH13b3UkgqQuB+VrL+pGx3qJM7z9Khe1j vtgpBStgFIlkvDYHuTJDsn0X4E543EBs54U4g/Jc3WcnQwtRycUhXmkFDOtwkM/d bwghisms6P+OvSAMdU2JWDwYLe/87zeXklKZqJzWpbrcB1iZPF/L2B40CuSXidj+ cXjNbWlwm0Yn6ytHMXp7+3bV6VTvDFmI+1uabXVH0wn8UIxX9WoKJeW7JgYnveXU FG4Un/PhILeRxZa3jRNJVhPPq4JWjoINJvVmwSMMZdKT9x5MvdfHy+gsCpP6Ojjy ++MjslROZE0ciYPmwG2WPsmwylV00aztdIcNHzXZp4tX79hGw6cOfFjH9rfUUqJv W52WVQnJ+JHv3BgFCyqXReUdSkXT39J3c54L1E9rK+OpVvc1i2gEN+eTVoNp1Vwn 4Gyb8MPKj//NxaUNwpEDfBRp6scd553xz5K+0SPBNB+G/XnRxoPT8jz0/ivpYGDd WB73KZbvxRz2vzyy+biuEctyVTDlKDNM3UADW83eFspxHNthX28= =OCT2 -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "There is a bit of a post-holiday build up here I expect, small fixes across the board, amdgpu and msm being the main leaders, with others having a few. One code removal patch for nouveau: buddy: - benchmark regression fix for top-down buddy allocation panel: - add Lenovo panel orientation quirk ttm: - fix kernel oops regression amdgpu: - fix missing fence references - fix missing pipeline sync fencing - SMU13 fan speed fix - SMU13 fix power cap handling - SMU13 BACO fix - Fix a possible segfault in bo validation error case - Delay removal of firmware framebuffer - Fix error when unloading amdkfd: - SVM fix when clearing vram - GC11 fix for multi-GPU i915: - Reserve enough fence slot for i915_vma_unbind_vsync - Fix potential use after free - Reset engines twice in case of reset failure - Use multi-cast registers for SVG Unit registers msm: - display: - doc warning fixes - dt attribs cleanups - memory leak fix - error handing in hdmi probe fix - dp_aux_isr incorrect signalling fix - shutdown path fix - accel: - a5xx: fix quirks to be a bitmask - a6xx: fix gx halt to avoid 1s hang - kexec shutdown fix - fix potential double free vmwgfx: - drop rcu usage to make code more robust virtio: - fix use-after-free in gem handle code nouveau: - drop unused nouveau_fbcon.c" * tag 'drm-fixes-2023-01-13' of git://anongit.freedesktop.org/drm/drm: (35 commits) drm: Optimize drm buddy top-down allocation method drm/ttm: Fix a regression causing kernel oops'es drm/i915/gt: Cover rest of SVG unit MCR registers drm/nouveau: Remove file nouveau_fbcon.c drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU drm/amd/pm/smu13: BACO is supported when it's in BACO state drm/amdkfd: Add sync after creating vram bo drm/i915/gt: Reset twice drm/amdgpu: fix pipeline sync v2 drm/vmwgfx: Remove rcu locks from user resources drm/virtio: Fix GEM handle creation UAF drm/amdgpu: Fixed bug on error when unloading amdgpu drm/amd: Delay removal of the firmware framebuffer drm/amdgpu: Fix potential NULL dereference drm/i915: Fix potential context UAFs drm/i915: Reserve enough fence slot for i915_vma_unbind_async drm: Add orientation quirk for Lenovo ideapad D330-10IGL drm/msm/a6xx: Avoid gx gbit halt during rpm suspend drm/msm/adreno: Make adreno quirks not overwrite each other drm/msm: another fix for the headless Adreno GPU ...	2023-01-13 07:18:59 -06:00
Clement Lecigne	56b88b5056	ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF Takes rwsem lock inside snd_ctl_elem_read instead of snd_ctl_elem_read_user like it was done for write in commit `1fa4445f9a` ("ALSA: control - introduce snd_ctl_notify_one() helper"). Doing this way we are also fixing the following locking issue happening in the compat path which can be easily triggered and turned into an use-after-free. 64-bits: snd_ctl_ioctl snd_ctl_elem_read_user [takes controls_rwsem] snd_ctl_elem_read [lock properly held, all good] [drops controls_rwsem] 32-bits: snd_ctl_ioctl_compat snd_ctl_elem_write_read_compat ctl_elem_write_read snd_ctl_elem_read [missing lock, not good] CVE-2023-0266 was assigned for this issue. Cc: stable@kernel.org # 5.13+ Signed-off-by: Clement Lecigne <clecigne@google.com> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20230113120745.25464-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>	2023-01-13 14:15:26 +01:00
Linus Torvalds	d45b832d6f	arm64 fixes for -rc4 - Fix PAGE_TABLE_CHECK failures on hugepage splitting path - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header - Fix NULL deref when accessing debugfs node if PSCI is not present - Fix MTE core dumping when VMA list is being updated concurrently - Fix SME signal frame handling when SVE is not implemented by the CPU - Fix asm constraints for cmpxchg_double() to hazard both words - Fix build failure with stack tracer and older versions of Clang - Bring back workaround for Cortex-A715 erratum 2645198 -----BEGIN PGP SIGNATURE----- iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmO9SzwQHHdpbGxAa2Vy bmVsLm9yZwAKCRC3rHDchMFjNLdYB/9pX4El38TX4Y4M6sR2yl+m1rkGRiU4nV3N MKJ3ZVjrx87QZ8CKVYmJbnHzolN0Art9WvqFnyxtPMBlZyWzHjtsrQnad3VwLDOu 4qmqjDCXvPod1EncCxBiGu28FZ88HoLqhnwWB6O2Su6TlczD0kJTfzincdyzqvi2 r0uUlBd9gtFt3sjV+sLPjE6NqMf9MfhoOLLafijz7ZMElQL+2/BjZxhpHLaWhUz1 aHIp4w841TJOuSlCwstX20Nc6Q9+6ta07bw+TD/flyQ+IGUptgDEoIrpjdSO5b2t zFFHHN5IXovAJPDfhAdXGAbC2SDFyYJtURCpv6hVt/SSsilGEbYg =241k -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Here's a sizeable batch of Friday the 13th arm64 fixes for -rc4. What could possibly go wrong? The obvious reason we have so much here is because of the holiday season right after the merge window, but we've also brought back an erratum workaround that was previously dropped at the last minute and there's an MTE coredumping fix that strays outside of the arch/arm64 directory. Summary: - Fix PAGE_TABLE_CHECK failures on hugepage splitting path - Fix PSCI encoding of MEM_PROTECT_RANGE function in UAPI header - Fix NULL deref when accessing debugfs node if PSCI is not present - Fix MTE core dumping when VMA list is being updated concurrently - Fix SME signal frame handling when SVE is not implemented by the CPU - Fix asm constraints for cmpxchg_double() to hazard both words - Fix build failure with stack tracer and older versions of Clang - Bring back workaround for Cortex-A715 erratum 2645198" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix build with CC=clang, CONFIG_FTRACE=y and CONFIG_STACK_TRACER=y arm64/mm: Define dummy pud_user_exec() when using 2-level page-table arm64: errata: Workaround possible Cortex-A715 [ESR\|FAR]_ELx corruption firmware/psci: Don't register with debugfs if PSCI isn't available firmware/psci: Fix MEM_PROTECT_RANGE function numbers arm64/signal: Always allocate SVE signal frames on SME only systems arm64/signal: Always accept SVE signal frames on SME only systems arm64/sme: Fix context switch for SME only systems arm64: cmpxchg_double*: hazard against entire exchange variable arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning arm64: mte: Avoid the racy walk of the vma list during core dump elfcore: Add a cprm parameter to elf_core_extra_{phdrs,data_size} arm64: mte: Fix double-freeing of the temporary tag storage during coredump arm64: ptrace: Use ARM64_SME to guard the SME register enumerations arm64/mm: add pud_user_exec() check in pud_user_accessible_page() arm64/mm: fix incorrect file_map_count for invalid pmd	2023-01-13 07:11:45 -06:00
Christophe JAILLET	142e821f68	iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in the error handling path of mtk_iommu_v1_probe(). Add the corresponding clk_disable_unprepare(), as already done in the remove function. Fixes: `b17336c55d` ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Yong Wu <yong.wu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-01-13 13:46:32 +01:00
Yunfei Wang	dcdb3ba7e2	iommu/iova: Fix alloc iova overflows issue In __alloc_and_insert_iova_range, there is an issue that retry_pfn overflows. The value of iovad->anchor.pfn_hi is ~0UL, then when iovad->cached_node is iovad->anchor, curr_iova->pfn_hi + 1 will overflow. As a result, if the retry logic is executed, low_pfn is updated to 0, and then new_pfn < low_pfn returns false to make the allocation successful. This issue occurs in the following two situations: 1. The first iova size exceeds the domain size. When initializing iova domain, iovad->cached_node is assigned as iovad->anchor. For example, the iova domain size is 10M, start_pfn is 0x1_F000_0000, and the iova size allocated for the first time is 11M. The following is the log information, new->pfn_lo is smaller than iovad->cached_node. Example log as follows: [ 223.798112][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range start_pfn:0x1f0000,retry_pfn:0x0,size:0xb00,limit_pfn:0x1f0a00 [ 223.799590][T1705487] sh: [name:iova&]__alloc_and_insert_iova_range success start_pfn:0x1f0000,new->pfn_lo:0x1efe00,new->pfn_hi:0x1f08ff 2. The node with the largest iova->pfn_lo value in the iova domain is deleted, iovad->cached_node will be updated to iovad->anchor, and then the alloc iova size exceeds the maximum iova size that can be allocated in the domain. After judging that retry_pfn is less than limit_pfn, call retry_pfn+1 to fix the overflow issue. Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com> Signed-off-by: Yunfei Wang <yf.wang@mediatek.com> Cc: <stable@vger.kernel.org> # 5.15.* Fixes: `4e89dce725` ("iommu/iova: Retry from last rb tree node if iova search fails") Acked-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20230111063801.25107-1-yf.wang@mediatek.com Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-01-13 13:46:31 +01:00
Miaoqian Lin	a6a9a5da68	iommu: Fix refcount leak in iommu_device_claim_dma_owner iommu_group_get() returns the group with the reference incremented. Move iommu_group_get() after owner check to fix the refcount leak. Fixes: `89395ccedb` ("iommu: Add device-centric DMA ownership interfaces") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221230083100.1489569-1-linmq006@gmail.com [ joro: Remove *group = NULL initialization ] Signed-off-by: Joerg Roedel <jroedel@suse.de>	2023-01-13 13:46:22 +01:00

1 2 3 4 5 ...

1154258 Commits