From 16c0cc0ce3059e315a0aab6538061d95a6612589 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 11 Dec 2020 13:36:27 -0800 Subject: [PATCH 1/8] revert "mm/filemap: add static for function __add_to_page_cache_locked" Revert commit 3351b16af494 ("mm/filemap: add static for function __add_to_page_cache_locked") due to incompatibility with ALLOW_ERROR_INJECTION which result in build errors. Link: https://lkml.kernel.org/r/CAADnVQJ6tmzBXvtroBuEH6QA0H+q7yaSKxrVvVxhqr3KBZdEXg@mail.gmail.com Tested-by: Justin Forbes Tested-by: Greg Thelen Acked-by: Alexei Starovoitov Cc: Michal Kubecek Cc: Alex Shi Cc: Souptick Joarder Cc: Daniel Borkmann Cc: Josef Bacik Cc: Tony Luck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/filemap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index 331f4261d723..0b2067b3c328 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -827,7 +827,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask) } EXPORT_SYMBOL_GPL(replace_page_cache_page); -static noinline int __add_to_page_cache_locked(struct page *page, +noinline int __add_to_page_cache_locked(struct page *page, struct address_space *mapping, pgoff_t offset, gfp_t gfp, void **shadowp) From 40d6366e9d86d9a67b5642040e76082fdb5bdcf9 Mon Sep 17 00:00:00 2001 From: Miles Chen Date: Fri, 11 Dec 2020 13:36:31 -0800 Subject: [PATCH 2/8] proc: use untagged_addr() for pagemap_read addresses When we try to visit the pagemap of a tagged userspace pointer, we find that the start_vaddr is not correct because of the tag. To fix it, we should untag the userspace pointers in pagemap_read(). I tested with 5.10-rc4 and the issue remains. Explanation from Catalin in [1]: "Arguably, that's a user-space bug since tagged file offsets were never supported. In this case it's not even a tag at bit 56 as per the arm64 tagged address ABI but rather down to bit 47. You could say that the problem is caused by the C library (malloc()) or whoever created the tagged vaddr and passed it to this function. It's not a kernel regression as we've never supported it. Now, pagemap is a special case where the offset is usually not generated as a classic file offset but rather derived by shifting a user virtual address. I guess we can make a concession for pagemap (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)" My test code is based on [2]: A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8 userspace program: uint64 OsLayer::VirtualToPhysical(void *vaddr) { uint64 frame, paddr, pfnmask, pagemask; int pagesize = sysconf(_SC_PAGESIZE); off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0 int fd = open(kPagemapPath, O_RDONLY); ... if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) { int err = errno; string errtxt = ErrorString(err); if (fd >= 0) close(fd); return 0; } ... } kernel fs/proc/task_mmu.c: static ssize_t pagemap_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { ... src = *ppos; svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54 start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000 end_vaddr = mm->task_size; /* watch out for wraparound */ // svpfn == 0xb400007662f54 // (mm->task_size >> PAGE) == 0x8000000 if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4 start_vaddr = end_vaddr; ret = 0; while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr int len; unsigned long end; ... } ... } [1] https://lore.kernel.org/patchwork/patch/1343258/ [2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158 Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.com Signed-off-by: Miles Chen Reviewed-by: Vincenzo Frascino Reviewed-by: Catalin Marinas Cc: Alexey Dobriyan Cc: Andrey Konovalov Cc: Alexander Potapenko Cc: Vincenzo Frascino Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Marco Elver Cc: Will Deacon Cc: Eric W. Biederman Cc: Song Bao Hua (Barry Song) Cc: [5.4-] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/task_mmu.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 217aa2705d5d..ee5a235b3056 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1599,11 +1599,15 @@ static ssize_t pagemap_read(struct file *file, char __user *buf, src = *ppos; svpfn = src / PM_ENTRY_BYTES; - start_vaddr = svpfn << PAGE_SHIFT; end_vaddr = mm->task_size; /* watch out for wraparound */ - if (svpfn > mm->task_size >> PAGE_SHIFT) + start_vaddr = end_vaddr; + if (svpfn <= (ULONG_MAX >> PAGE_SHIFT)) + start_vaddr = untagged_addr(svpfn << PAGE_SHIFT); + + /* Ensure the address is inside the task */ + if (start_vaddr > mm->task_size) start_vaddr = end_vaddr; /* From 84edc2eff82730d45ab513ecec49cb63beb973c9 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 11 Dec 2020 13:36:35 -0800 Subject: [PATCH 3/8] selftest/fpu: avoid clang warning With extra warnings enabled, clang complains about the redundant -mhard-float argument: clang: error: argument unused during compilation: '-mhard-float' [-Werror,-Wunused-command-line-argument] Move this into the gcc-only part of the Makefile. Link: https://lkml.kernel.org/r/20201203223652.1320700-1-arnd@kernel.org Fixes: 4185b3b92792 ("selftests/fpu: Add an FPU selftest") Signed-off-by: Arnd Bergmann Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Petteri Aimonen Cc: Borislav Petkov Cc: Arnd Bergmann Cc: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- lib/Makefile | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/Makefile b/lib/Makefile index ce45af50983a..d415fc7067c5 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -107,7 +107,7 @@ obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS # get appended last to CFLAGS and thus override those previous compiler options. # -FPU_CFLAGS := -mhard-float -msse -msse2 +FPU_CFLAGS := -msse -msse2 ifdef CONFIG_CC_IS_GCC # Stack alignment mismatch, proceed with caution. # GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 @@ -120,6 +120,7 @@ ifdef CONFIG_CC_IS_GCC # -mpreferred-stack-boundary=3 is not between 4 and 12 # # can be triggered. Otherwise gcc doesn't complain. +FPU_CFLAGS += -mhard-float FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) endif From 14dc3983b5dff513a90bd5a8cc90acaf7867c3d0 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 11 Dec 2020 13:36:38 -0800 Subject: [PATCH 4/8] kbuild: avoid static_assert for genksyms genksyms does not know or care about the _Static_assert() built-in, and sometimes falls back to ignoring the later symbols, which causes undefined behavior such as WARNING: modpost: EXPORT symbol "ethtool_set_ethtool_phy_ops" [vmlinux] version generation failed, symbol will not be versioned. ld: net/ethtool/common.o: relocation R_AARCH64_ABS32 against `__crc_ethtool_set_ethtool_phy_ops' can not be used when making a shared object net/ethtool/common.o:(_ftrace_annotated_branch+0x0): dangerous relocation: unsupported relocation Redefine static_assert for genksyms to avoid that. Link: https://lkml.kernel.org/r/20201203230955.1482058-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Suggested-by: Ard Biesheuvel Cc: Masahiro Yamada Cc: Michal Marek Cc: Kees Cook Cc: Rikard Falkeborn Cc: Marco Elver Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/build_bug.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h index e3a0be2c90ad..7bb66e15b481 100644 --- a/include/linux/build_bug.h +++ b/include/linux/build_bug.h @@ -77,4 +77,9 @@ #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) +#ifdef __GENKSYMS__ +/* genksyms gets confused by _Static_assert */ +#define _Static_assert(expr, ...) +#endif + #endif /* _LINUX_BUILD_BUG_H */ From 55d5b7dd6451b58489ce384282ca5a4a289eb8d5 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 11 Dec 2020 13:36:42 -0800 Subject: [PATCH 5/8] initramfs: fix clang build failure There is only one function in init/initramfs.c that is in the .text section, and it is marked __weak. When building with clang-12 and the integrated assembler, this leads to a bug with recordmcount: ./scripts/recordmcount "init/initramfs.o" Cannot find symbol for section 2: .text. init/initramfs.o: failed I'm not quite sure what exactly goes wrong, but I notice that this function is only ever called from an __init function, and normally inlined. Marking it __init as well is clearly correct and it leads to recordmcount no longer complaining. Link: https://lkml.kernel.org/r/20201204165742.3815221-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Barret Rhoden Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/initramfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/init/initramfs.c b/init/initramfs.c index 1f97c0328a7a..55b74d7e5260 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -535,7 +535,7 @@ extern unsigned long __initramfs_size; #include #include -void __weak free_initrd_mem(unsigned long start, unsigned long end) +void __weak __init free_initrd_mem(unsigned long start, unsigned long end) { #ifdef CONFIG_ARCH_KEEP_MEMBLOCK unsigned long aligned_start = ALIGN_DOWN(start, PAGE_SIZE); From 6e7b64b9dd6d96537d816ea07ec26b7dedd397b9 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 11 Dec 2020 13:36:46 -0800 Subject: [PATCH 6/8] elfcore: fix building with clang kernel/elfcore.c only contains weak symbols, which triggers a bug with clang in combination with recordmcount: Cannot find symbol for section 2: .text. kernel/elfcore.o: failed Move the empty stubs into linux/elfcore.h as inline functions. As only two architectures use these, just use the architecture specific Kconfig symbols to key off the declaration. Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org Signed-off-by: Arnd Bergmann Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Barret Rhoden Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/elfcore.h | 22 ++++++++++++++++++++++ kernel/Makefile | 1 - kernel/elfcore.c | 26 -------------------------- 3 files changed, 22 insertions(+), 27 deletions(-) delete mode 100644 kernel/elfcore.c diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h index 46c3d691f677..de51c1bef27d 100644 --- a/include/linux/elfcore.h +++ b/include/linux/elfcore.h @@ -104,6 +104,7 @@ static inline int elf_core_copy_task_fpregs(struct task_struct *t, struct pt_reg #endif } +#if defined(CONFIG_UM) || defined(CONFIG_IA64) /* * These functions parameterize elf_core_dump in fs/binfmt_elf.c to write out * extra segments containing the gate DSO contents. Dumping its @@ -118,5 +119,26 @@ elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset); extern int elf_core_write_extra_data(struct coredump_params *cprm); extern size_t elf_core_extra_data_size(void); +#else +static inline Elf_Half elf_core_extra_phdrs(void) +{ + return 0; +} + +static inline int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) +{ + return 1; +} + +static inline int elf_core_write_extra_data(struct coredump_params *cprm) +{ + return 1; +} + +static inline size_t elf_core_extra_data_size(void) +{ + return 0; +} +#endif #endif /* _LINUX_ELFCORE_H */ diff --git a/kernel/Makefile b/kernel/Makefile index af601b9bda0e..6c9f19911be0 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -97,7 +97,6 @@ obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o obj-$(CONFIG_TRACEPOINTS) += tracepoint.o obj-$(CONFIG_LATENCYTOP) += latencytop.o -obj-$(CONFIG_ELFCORE) += elfcore.o obj-$(CONFIG_FUNCTION_TRACER) += trace/ obj-$(CONFIG_TRACING) += trace/ obj-$(CONFIG_TRACE_CLOCK) += trace/ diff --git a/kernel/elfcore.c b/kernel/elfcore.c deleted file mode 100644 index 57fb4dcff434..000000000000 --- a/kernel/elfcore.c +++ /dev/null @@ -1,26 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include -#include -#include -#include -#include - -Elf_Half __weak elf_core_extra_phdrs(void) -{ - return 0; -} - -int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) -{ - return 1; -} - -int __weak elf_core_write_extra_data(struct coredump_params *cprm) -{ - return 1; -} - -size_t __weak elf_core_extra_data_size(void) -{ - return 0; -} From 6c82d45c7f0348b44e00bd7dcccfc47dec7577d1 Mon Sep 17 00:00:00 2001 From: Kuan-Ying Lee Date: Fri, 11 Dec 2020 13:36:49 -0800 Subject: [PATCH 7/8] kasan: fix object remaining in offline per-cpu quarantine We hit this issue in our internal test. When enabling generic kasan, a kfree()'d object is put into per-cpu quarantine first. If the cpu goes offline, object still remains in the per-cpu quarantine. If we call kmem_cache_destroy() now, slub will report "Objects remaining" error. ============================================================================= BUG test_module_slab (Not tainted): Objects remaining in test_module_slab on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Slab 0x(____ptrval____) objects=34 used=1 fp=0x(____ptrval____) flags=0x2ffff00000010200 CPU: 3 PID: 176 Comm: cat Tainted: G B 5.10.0-rc1-00007-g4525c8781ec0-dirty #10 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2b0 show_stack+0x18/0x68 dump_stack+0xfc/0x168 slab_err+0xac/0xd4 __kmem_cache_shutdown+0x1e4/0x3c8 kmem_cache_destroy+0x68/0x130 test_version_show+0x84/0xf0 module_attr_show+0x40/0x60 sysfs_kf_seq_show+0x128/0x1c0 kernfs_seq_show+0xa0/0xb8 seq_read+0x1f0/0x7e8 kernfs_fop_read+0x70/0x338 vfs_read+0xe4/0x250 ksys_read+0xc8/0x180 __arm64_sys_read+0x44/0x58 el0_svc_common.constprop.0+0xac/0x228 do_el0_svc+0x38/0xa0 el0_sync_handler+0x170/0x178 el0_sync+0x174/0x180 INFO: Object 0x(____ptrval____) @offset=15848 INFO: Allocated in test_version_show+0x98/0xf0 age=8188 cpu=6 pid=172 stack_trace_save+0x9c/0xd0 set_track+0x64/0xf0 alloc_debug_processing+0x104/0x1a0 ___slab_alloc+0x628/0x648 __slab_alloc.isra.0+0x2c/0x58 kmem_cache_alloc+0x560/0x588 test_version_show+0x98/0xf0 module_attr_show+0x40/0x60 sysfs_kf_seq_show+0x128/0x1c0 kernfs_seq_show+0xa0/0xb8 seq_read+0x1f0/0x7e8 kernfs_fop_read+0x70/0x338 vfs_read+0xe4/0x250 ksys_read+0xc8/0x180 __arm64_sys_read+0x44/0x58 el0_svc_common.constprop.0+0xac/0x228 kmem_cache_destroy test_module_slab: Slab cache still has objects Register a cpu hotplug function to remove all objects in the offline per-cpu quarantine when cpu is going offline. Set a per-cpu variable to indicate this cpu is offline. [qiang.zhang@windriver.com: fix slab double free when cpu-hotplug] Link: https://lkml.kernel.org/r/20201204102206.20237-1-qiang.zhang@windriver.com Link: https://lkml.kernel.org/r/1606895585-17382-2-git-send-email-Kuan-Ying.Lee@mediatek.com Signed-off-by: Kuan-Ying Lee Signed-off-by: Zqiang Suggested-by: Dmitry Vyukov Reported-by: Guangye Yang Reviewed-by: Dmitry Vyukov Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Matthias Brugger Cc: Nicholas Tang Cc: Miles Chen Cc: Qian Cai Cc: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/kasan/quarantine.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c index 4c5375810449..0e3f8494628f 100644 --- a/mm/kasan/quarantine.c +++ b/mm/kasan/quarantine.c @@ -29,6 +29,7 @@ #include #include #include +#include #include "../slab.h" #include "kasan.h" @@ -43,6 +44,7 @@ struct qlist_head { struct qlist_node *head; struct qlist_node *tail; size_t bytes; + bool offline; }; #define QLIST_INIT { NULL, NULL, 0 } @@ -188,6 +190,10 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache) local_irq_save(flags); q = this_cpu_ptr(&cpu_quarantine); + if (q->offline) { + local_irq_restore(flags); + return; + } qlist_put(q, &info->quarantine_link, cache->size); if (unlikely(q->bytes > QUARANTINE_PERCPU_SIZE)) { qlist_move_all(q, &temp); @@ -328,3 +334,36 @@ void quarantine_remove_cache(struct kmem_cache *cache) synchronize_srcu(&remove_cache_srcu); } + +static int kasan_cpu_online(unsigned int cpu) +{ + this_cpu_ptr(&cpu_quarantine)->offline = false; + return 0; +} + +static int kasan_cpu_offline(unsigned int cpu) +{ + struct qlist_head *q; + + q = this_cpu_ptr(&cpu_quarantine); + /* Ensure the ordering between the writing to q->offline and + * qlist_free_all. Otherwise, cpu_quarantine may be corrupted + * by interrupt. + */ + WRITE_ONCE(q->offline, true); + barrier(); + qlist_free_all(q, NULL); + return 0; +} + +static int __init kasan_cpu_quarantine_init(void) +{ + int ret = 0; + + ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mm/kasan:online", + kasan_cpu_online, kasan_cpu_offline); + if (ret < 0) + pr_err("kasan cpu quarantine register failed [%d]\n", ret); + return ret; +} +late_initcall(kasan_cpu_quarantine_init); From ba9c1201beaa86a773e83be5654602a0667e4a4d Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Fri, 11 Dec 2020 13:36:53 -0800 Subject: [PATCH 8/8] mm/hugetlb: clear compound_nr before freeing gigantic pages Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order") added compound_nr counter to first tail struct page, overlaying with page->mapping. The overlay itself is fine, but while freeing gigantic hugepages via free_contig_range(), a "bad page" check will trigger for non-NULL page->mapping on the first tail page: BUG: Bad page state in process bash pfn:380001 page:00000000c35f0856 refcount:0 mapcount:0 mapping:00000000126b68aa index:0x0 pfn:0x380001 aops:0x0 flags: 0x3ffff00000000000() raw: 3ffff00000000000 0000000000000100 0000000000000122 0000000100000000 raw: 0000000000000000 0000000000000000 ffffffff00000000 0000000000000000 page dumped because: non-NULL mapping Modules linked in: CPU: 6 PID: 616 Comm: bash Not tainted 5.10.0-rc7-next-20201208 #1 Hardware name: IBM 3906 M03 703 (LPAR) Call Trace: show_stack+0x6e/0xe8 dump_stack+0x90/0xc8 bad_page+0xd6/0x130 free_pcppages_bulk+0x26a/0x800 free_unref_page+0x6e/0x90 free_contig_range+0x94/0xe8 update_and_free_page+0x1c4/0x2c8 free_pool_huge_page+0x11e/0x138 set_max_huge_pages+0x228/0x300 nr_hugepages_store_common+0xb8/0x130 kernfs_fop_write+0xd2/0x218 vfs_write+0xb0/0x2b8 ksys_write+0xac/0xe0 system_call+0xe6/0x288 Disabling lock debugging due to kernel taint This is because only the compound_order is cleared in destroy_compound_gigantic_page(), and compound_nr is set to 1U << order == 1 for order 0 in set_compound_order(page, 0). Fix this by explicitly clearing compound_nr for first tail page after calling set_compound_order(page, 0). Link: https://lkml.kernel.org/r/20201208182813.66391-2-gerald.schaefer@linux.ibm.com Fixes: 1378a5ee451a ("mm: store compound_nr as well as compound_order") Signed-off-by: Gerald Schaefer Reviewed-by: Matthew Wilcox (Oracle) Cc: Heiko Carstens Cc: Mike Kravetz Cc: Christian Borntraeger Cc: [5.9+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 37f15c3c24dc..d029d938d26d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1216,6 +1216,7 @@ static void destroy_compound_gigantic_page(struct page *page, } set_compound_order(page, 0); + page[1].compound_nr = 0; __ClearPageHead(page); }