Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
Since commit 4064b98270 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:
flags & FAULT_FLAG_ALLOW_RETRY
So just remove it.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"Lots of cleanups and preparation; highlights:
- futex: Cleanup and remove runtime futex_cmpxchg detection
- rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
- kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
annotate the racy owner->on_cpu access *once*.
- atomic64: Dead-Code-Elemination"
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHdvssACgkQEsHwGGHe
VUrbBg//VQvz5BwddIJDj9utt5AvSixNcTF5mJyFKCSIqO0S4J8nCNcvJjZ2bs4S
w1YmInFbp0WFGUhaIZiw0e6KWJUoINTng4MfHDZosS1doT2of53ZaQqXs3i81jDz
87w8ADVHL0x4+BNjdsIwbcuPSDTmJFoyFOdeXTIl9hv9ZULT8m4Mt+LJuUHNZ+vF
rS1jyseVPWkcm5y+Yie0rhip+ygzbfbt0ArsLfRcrBJsKr6oxLxV2DDF+2djXuuP
d2OgGT7VkbgAhoKpzVXUiHsT6ppR5Mn5TLSa4EZ4bPPCUFldOhKuCAImF3T6yVIa
44iX5vQN9v5VHBy6ocPbdOIBuYBYVGCMurh1t7pbpB6G+mmSxMiyta5MY37POwjv
K2JT9mC2A6a4d17gue5FT3mnJMBB4eHwVaDfAwCZs/5rRNuoTz4aY5Xy04Mq0ltI
39uarwBd5hwSugBWg44AS5E9h52E654FQ7g6iS4NtUvJuuaXBTl43EcZWx2+mnPL
zY+iOMVMgg33VIVcm/mlf/6zWL0LXPmILUiA1fp4Q9/n8u1EuOOyeA/GsC9Pl3wO
HY3KpYJA5eQpIk/JEnzKm5ZE3pCrUdH6VDC/SB4owQtafQG6OxyQVP1Gj7KYxZsD
NqqpJ4nkKooc5f5DqVEN8wrjyYsnVxEfriEG09OoR6wI3MqyUA4=
=vrYy
-----END PGP SIGNATURE-----
Merge tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Borislav Petkov:
"Lots of cleanups and preparation. Highlights:
- futex: Cleanup and remove runtime futex_cmpxchg detection
- rtmutex: Some fixes for the PREEMPT_RT locking infrastructure
- kcsan: Share owner_on_cpu() between mutex,rtmutex and rwsem and
annotate the racy owner->on_cpu access *once*.
- atomic64: Dead-Code-Elemination"
[ Description above by Peter Zijlstra ]
* tag 'locking_core_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomic: atomic64: Remove unusable atomic ops
futex: Fix additional regressions
locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h
x86/mm: Include spinlock_t definition in pgtable.
locking: Mark racy reads of owner->on_cpu
locking: Make owner_on_cpu() into <linux/sched.h>
lockdep/selftests: Adapt ww-tests for PREEMPT_RT
lockdep/selftests: Skip the softirq related tests on PREEMPT_RT
lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock().
lockdep/selftests: Avoid using local_lock_{acquire|release}().
lockdep: Remove softirq accounting on PREEMPT_RT.
locking/rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable().
locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
locking: Remove rt_rwlock_is_contended().
sched: Trigger warning if ->migration_disabled counter underflows.
futex: Fix sparc32/m68k/nds32 build regression
futex: Remove futex_cmpxchg detection
futex: Ensure futex_atomic_cmpxchg_inatomic() is present
kernel/locking: Use a pointer in ww_mutex_trylock().
Add a dtb=<filename> option to boot UML with a devicetree blob. This
can be used for testing driver code using UML.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
[rw: Add dependency on CONFIG_OF]
Signed-off-by: Richard Weinberger <richard@nod.at>
The file loading support in initrd.c can be re-used for
loading devicetrees. Move it out of initrd.c.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Remove address space overrides using set_fs() for User Mode Linux.
Note that just like the existing kernel access case of the uaccess
routines the new nofault kernel handlers do not actually have any
exception handling. This is probably broken, but not change to the
status quo.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Only one extern in there is needed in processor-generic.h, and it's
not needed anywhere else. So move it over there and get rid of
the include in processor-generic.h, adding includes of registers.h
to the few files that need the declarations in it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fix the following coccinelle reports:
./arch/um/kernel/mem.c:89:2-5: WARNING: Use BUG_ON instead of if
condition followed by BUG.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
Rename set_signals() as there's at least one driver that
uses the same name and can now be built on UM due to PCI
support, and thus we can get symbol conflicts.
Also rename set_signals_trace() to be consistent.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 68f5d3f3b6 ("um: add PCI over virtio emulation driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Now that all architectures have a working futex implementation in any
configuration, remove the runtime detection code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20211026100432.1730393-2-arnd@kernel.org
Pull exit cleanups from Eric Biederman:
"While looking at some issues related to the exit path in the kernel I
found several instances where the code is not using the existing
abstractions properly.
This set of changes introduces force_fatal_sig a way of sending a
signal and not allowing it to be caught, and corrects the misuse of
the existing abstractions that I found.
A lot of the misuse of the existing abstractions are silly things such
as doing something after calling a no return function, rolling BUG by
hand, doing more work than necessary to terminate a kernel thread, or
calling do_exit(SIGKILL) instead of calling force_sig(SIGKILL).
In the review a deficiency in force_fatal_sig and force_sig_seccomp
where ptrace or sigaction could prevent the delivery of the signal was
found. I have added a change that adds SA_IMMUTABLE to change that
makes it impossible to interrupt the delivery of those signals, and
allows backporting to fix force_sig_seccomp
And Arnd found an issue where a function passed to kthread_run had the
wrong prototype, and after my cleanup was failing to build."
* 'exit-cleanups-for-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
soc: ti: fix wkup_m3_rproc_boot_thread return type
signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed
signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
exit/r8188eu: Replace the macro thread_exit with a simple return 0
exit/rtl8712: Replace the macro thread_exit with a simple return 0
exit/rtl8723bs: Replace the macro thread_exit with a simple return 0
signal/x86: In emulate_vsyscall force a signal instead of calling do_exit
signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig
signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails
exit/syscall_user_dispatch: Send ordinary signals on failure
signal: Implement force_fatal_sig
exit/kthread: Have kernel threads return instead of calling do_exit
signal/s390: Use force_sigsegv in default_trap_handler
signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.
signal/vm86_32: Replace open coded BUG_ON with an actual BUG_ON
signal/sparc: In setup_tsb_params convert open coded BUG into BUG
signal/powerpc: On swapcontext failure force SIGSEGV
signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL)
signal/mips: Update (_save|_restore)_fp_context to fail with -EFAULT
signal/sparc32: Remove unreachable do_exit in do_sparc_fault
...
Merge misc updates from Andrew Morton:
"257 patches.
Subsystems affected by this patch series: scripts, ocfs2, vfs, and
mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
cleanups, kfence, and damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
mm/damon: remove return value from before_terminate callback
mm/damon: fix a few spelling mistakes in comments and a pr_debug message
mm/damon: simplify stop mechanism
Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Docs/admin-guide/mm/damon/start: simplify the content
Docs/admin-guide/mm/damon/start: fix a wrong link
Docs/admin-guide/mm/damon/start: fix wrong example commands
mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
mm/damon: remove unnecessary variable initialization
Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
selftests/damon: support watermarks
mm/damon/dbgfs: support watermarks
mm/damon/schemes: activate schemes based on a watermarks mechanism
tools/selftests/damon: update for regions prioritization of schemes
mm/damon/dbgfs: support prioritization weights
mm/damon/vaddr,paddr: support pageout prioritization
mm/damon/schemes: prioritize regions within the quotas
mm/damon/selftests: support schemes quotas
mm/damon/dbgfs: support quotas of schemes
...
Rename memblock_free_ptr() to memblock_free() and use memblock_free()
when freeing a virtual pointer so that memblock_free() will be a
counterpart of memblock_alloc()
The callers are updated with the below semantic patch and manual
addition of (void *) casting to pointers that are represented by
unsigned long variables.
@@
identifier vaddr;
expression size;
@@
(
- memblock_phys_free(__pa(vaddr), size);
+ memblock_free(vaddr, size);
|
- memblock_free_ptr(vaddr, size);
+ memblock_free(vaddr, size);
)
[sfr@canb.auug.org.au: fixup]
Link: https://lkml.kernel.org/r/20211018192940.3d1d532f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20210930185031.18648-7-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since memblock_free() operates on a physical range, make its name
reflect it and rename it to memblock_phys_free(), so it will be a
logical counterpart to memblock_phys_alloc().
The callers are updated with the below semantic patch:
@@
expression addr;
expression size;
@@
- memblock_free(addr, size);
+ memblock_phys_free(addr, size);
Link: https://lkml.kernel.org/r/20210930185031.18648-6-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Juergen Gross <jgross@suse.com>
Cc: Shahab Vahedi <Shahab.Vahedi@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Revert the printk format based wchan() symbol resolution as it can leak
the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset and
__sched_setscheduler() introduced a new lock dependency which is now
triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmF/OUkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoR/5D/9ikdGNpKg9osNqJ3GjAmxsK6kVkB29
iFe2k8pIpWDToWQf/wQRGih4Yj3Cl49QSnZcPIibh2/12EB1qrrW6iSPJkInz8Ec
/1LS5/Vewn2OyoxyXZjdvGC5gTXEodSbIazASvX7nvdMeI4gsAsL5etzrMJirT/t
aymqvr7zovvywrwMTQJrGjUMo9l4ewE8tafMNNhRu1BHU1U4ojM9yvThyRAAcmp7
3Xy49A+Yq3IgrvYI4u8FMK5Zh08KaxSFjiLhePGm/bF+wSfYmWop2TP1jY05W2Uo
ti8hfbJMUoFRYuMxAiEldkItnc0wV4M9PtWZZ/x+B71bs65Y4Zjt9cW+rxJv2+m1
vzV31EsQwGnOti072dzWN4c/cZqngVXAjaNtErvDwJUr+Tw1ayv9KUvuodMQqZY6
mu68bFUO2kV9EMe1CBOv51Uy1RGHyLj3rlNqrkw+Xp5ISE9Ad2vhUEiRp5bQx5Ci
V/XFhGZkGUluh0vccrdFlNYZwhj8cZEzkOPCnPSeZ+bq8SyZE6xuHH/lTP1CJCOy
s800rW1huM+kgV+zRN8adDkGXibAk9N3RtVGnQXmuEy8gB9LZmQg+JeM2wsc9B+6
i0gdqZnsjNAfoK+BBAG4holxptSL8/eOJsFH8ZNIoxQ+iqooyPx9tFX7yXnRTBQj
d2qWG7UvoseT+g==
=fgtS
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- Revert the printk format based wchan() symbol resolution as it can
leak the raw value in case that the symbol is not resolvable.
- Make wchan() more robust and work with all kind of unwinders by
enforcing that the task stays blocked while unwinding is in progress.
- Prevent sched_fork() from accessing an invalid sched_task_group
- Improve asymmetric packing logic
- Extend scheduler statistics to RT and DL scheduling classes and add
statistics for bandwith burst to the SCHED_FAIR class.
- Properly account SCHED_IDLE entities
- Prevent a potential deadlock when initial priority is assigned to a
newly created kthread. A recent change to plug a race between cpuset
and __sched_setscheduler() introduced a new lock dependency which is
now triggered. Break the lock dependency chain by moving the priority
assignment to the thread function.
- Fix the idle time reporting in /proc/uptime for NOHZ enabled systems.
- Improve idle balancing in general and especially for NOHZ enabled
systems.
- Provide proper interfaces for live patching so it does not have to
fiddle with scheduler internals.
- Add cluster aware scheduling support.
- A small set of tweaks for RT (irqwork, wait_task_inactive(), various
scheduler options and delaying mmdrop)
- The usual small tweaks and improvements all over the place
* tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits)
sched/fair: Cleanup newidle_balance
sched/fair: Remove sysctl_sched_migration_cost condition
sched/fair: Wait before decaying max_newidle_lb_cost
sched/fair: Skip update_blocked_averages if we are defering load balance
sched/fair: Account update_blocked_averages in newidle_balance cost
x86: Fix __get_wchan() for !STACKTRACE
sched,x86: Fix L2 cache mask
sched/core: Remove rq_relock()
sched: Improve wake_up_all_idle_cpus() take #2
irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT
irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT
irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support.
sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ
sched: Add cluster scheduler level for x86
sched: Add cluster scheduler level in core and related Kconfig for ARM64
topology: Represent clusters of CPUs within a die
sched: Disable -Wunused-but-set-variable
sched: Add wrapper for get_wchan() to keep task blocked
x86: Fix get_wchan() to support the ORC unwinder
proc: Use task_is_running() for wchan in /proc/$pid/stat
...
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted. So change every instance we can to make the code clearer.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rewrite retpoline thunk call sites to be indirect calls for
spectre_v2=off. This ensures spectre_v2=off is as near to a
RETPOLINE=n build as possible.
This is the replacement for objtool writing alternative entries to
ensure the same and achieves feature-parity with the previous
approach.
One noteworthy feature is that it relies on the thunks to be in
machine order to compute the register index.
Specifically, this does not yet address the Jcc __x86_indirect_thunk_*
calls generated by clang, a future patch will add this.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211026120310.232495794@infradead.org
Having a stable wchan means the process must be blocked and for it to
stay that way while performing stack unwinding.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm]
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org
- Support for VMAP_STACK
- Support for splice_write in hostfs
- Fixes for virt-pci
- Fixes for virtio_uml
- Various fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmE6Xv4WHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wbpMD/0UBswFdI9J6ePQf2+UyQ3sfFay
xZ5/gyL+Ou0k/hwcjLx4DtIQBXkNiwgiKF+ncHvMXTr/oKAo5f7UsGYyMNIKlbKO
LrIpc6avqmeovTtOuVhm6VML/m7rvJYC/wJ0VFu6CN2aELoRZLXfeogwn1beAl6p
3JKc54tbew5022lZF6Df/QEpkCyuOjWMnEn/khJGuz+vmkodV+5cegZqxJIAnWrU
NVGf7laiV+rBWY4SVXiuJBGTNFwLZkORNa5evBScum85aqwaFawepZT0pNKEt4tc
Lalyy7jACriWeQJeQksWACfexYFPywQU/ebYcAlQ9b0wd5aZxi8IJc9wj0a1Oz3N
i2DEf09/Zk8eE1cbpp6GP+pbvlqNVsAgtLane2Wzxc1kuJGiFYeXCiDyCFzbhbxW
rsTiP3oAxC7OjFwebmtCvBbK9GSl5ETDwfOg+nl2idIK0cds292ju3bWL9vO6VRP
Cjxzn7ZaJYvPlrRHo5yujLURqRZSrkPcL/XthIDQJNjXMd8j2AYMRVM2n0gFLu7g
jSphwg8t3SmCrolGtUucadTPNMR5pE3rQTN+tbhqwGp+Cs+MnM7CqKUv+JoRC7KF
1qH/1p9tiz/utIpjKmvNZtZRwnElBoEgyoY6RdtqlCMnDcuLpDdmCRyWDsHAzXKg
1X9ym5QqDj5zSLxsXg==
=RgAO
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for VMAP_STACK
- Support for splice_write in hostfs
- Fixes for virt-pci
- Fixes for virtio_uml
- Various fixes
* tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix stub location calculation
um: virt-pci: fix uapi documentation
um: enable VMAP_STACK
um: virt-pci: don't do DMA from stack
hostfs: support splice_write
um: virtio_uml: fix memory leak on init failures
um: virtio_uml: include linux/virtio-uml.h
lib/logic_iomem: fix sparse warnings
um: make PCI emulation driver init/exit static
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
There are some empty trap_init() definitions in different ARCHs, Introduce
a new weak trap_init() function to clean them up.
Link: https://lkml.kernel.org/r/20210812123602.76356-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm32]
Acked-by: Vineet Gupta [arc]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <palmerdabbelt@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 9f0b4807a4 ("um: rework userspace stubs to not hard-code
stub location") I changed stub_segv_handler() to do a calculation with
a pointer to a stack variable to find the data page that we're using
for the stack and the rest of the data. This same commit was meant to
do it as well for stub_clone_handler(), but the change inadvertently
went into commit 84b2789d61 ("um: separate child and parent errors
in clone stub") instead.
This was reported to not be compiled correctly by gcc 5, causing the
code to crash here. I'm not sure why, perhaps it's UB because the var
isn't initialized? In any case, this trick always seemed bad, so just
create a new inline function that does the calculation in assembly.
Reported-by: subashab@codeaurora.org
Fixes: 9f0b4807a4 ("um: rework userspace stubs to not hard-code stub location")
Fixes: 84b2789d61 ("um: separate child and parent errors in clone stub")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This is a preparation for changing over architectures to the
generic implementation one at a time. As there are no callers
of either __strncpy_from_user() or __strnlen_user(), fold these
into the strncpy_from_user() and strnlen_user() functions to make
each implementation independent of the others.
Many of these implementations have known bugs, but the intention
here is to not change behavior at all and stay compatible with
those bugs for the moment.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmDnZwAWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wW1BD/9SHWGYhxLY+xL27eO0Q8XOPePb
diqllGavzq3fcakmJ3+6iIpb/WYX0ztu1M4KMBRP3QxNjP6nFkS1ph3PC0LL3ec2
h23hRfOrhlQd4rdonPcq/Z7oXKhrkem9G6KneVfvB94HmXnaZIrNBjwQRy0uRMXE
/IVNH4o6YMR8Av/VrG+L6BS+O/oXVnYVSLOuXsIrxmxS24NybsOpRzHvl14ZUsHt
eiwzcRC3ugAaxJn8cOSrHdBwvdOgbFFWEtMITcesQpYru+EmQcsCZdmJ0DbwsV2e
9k+LrVoy0CZFoekBtaaFvZq+JVBjUZKoAUYBML4ejWnQKolJH0BZQRh4RT0rbTjc
UMiuE3kFUsdJjzJRyO4pcqpwaNhCiZ2XrwyKeev/FLIn95bD1xbLJWfRvoKhioiI
X+1vujN2+N5n8T+u8sCVohujJCkUkMjevfF6ew8rvYOj3FrGqTi4jgrXUFAIsjLa
mHdA92oHIjNOCjyVIqnoUFTDltVMW9CwnLtd5nPnGvJoMtsj7lthy6fdtdPH0WVu
iNR4toE/AjBJo4rtib/irYbZtqmw2AbBFqoRk4yj8Fw4ZdSPYELwAR1aah0Oce9R
t1T9OE66vlr28XIC0NF917JfSNkc2eXnx4B21Zh+a/68XSJ1FzXPTob3lvXVVhQR
Ou4aw6dH7mql/2bq1w==
=wAww
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for optimized routines based on the host CPU
- Support for PCI via virtio
- Various fixes
* tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: remove unneeded semicolon in um_arch.c
um: Remove the repeated declaration
um: fix error return code in winch_tramp()
um: fix error return code in slip_open()
um: Fix stack pointer alignment
um: implement flush_cache_vmap/flush_cache_vunmap
um: add a UML specific futex implementation
um: enable the use of optimized xor routines in UML
um: Add support for host CPU flags and alignment
um: allow not setting extra rpaths in the linux binary
um: virtio/pci: enable suspend/resume
um: add PCI over virtio emulation driver
um: irqs: allow invoking time-travel handler multiple times
um: time-travel/signals: fix ndelay() in interrupt
um: expose time-travel mode to userspace side
um: export signals_enabled directly
um: remove unused smp_sigio_handler() declaration
lib: add iomem emulation (logic_iomem)
um: allow disabling NO_IOMEM
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.
There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain
At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.
[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Consolidate the macros for .byte ... opcode sequences
- Deduplicate register offset defines in include files
- Simplify the ia32,x32 compat handling of the related syscall tables to
get rid of #ifdeffery.
- Clear all EFLAGS which are not required for syscall handling
- Consolidate the syscall tables and switch the generation over to the
generic shell script and remove the CFLAGS tweaks which are not longer
required.
- Use 'int' type for system call numbers to match the generic code.
- Add more selftests for syscalls
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbKzMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoae8D/9+pksdf8lE5dRLtngSeTDLiyIV+qq4
vSks7XfrTTAhOV2nRwtIulc2CO6H7jcvn6ehmiC/X0Tn9JK5brwSJJYryNEjA3cp
3p9jPrB1w1SDhx35JzILN4DDaJfI3jobLSLDq0KQzuEL0+c0R4l3WBplpCzbLjqj
NaFQgslf8RSnjha9NLTKzlzSaNNNo9Ioo6DyrsBDEdcRBtAPlFfdVtT3oJE73ANH
dK5POoVWysmAnDAwEW17j9bBJLtxeWsrhM9CrtqvcKr3HhK9WjWUFAr+diQf5GKf
BAD2A+5y8wZQXvFOuC9WZxfQwUFSLExt8BfcXblOUbf2CdlvoYVzOlvI141kA++4
q4wQ1vl6MbLCp6wLysc3bnwKUEmnf2E4Iyj5JR2aFrw096pAoZ3ZbAQi7s3Vhb16
aSbGxIw3rHRuB0f8VmOA0iEHiXlkRmE/K+nH1/uDTUZLaDpktPvpKQJsp0+9qXFk
eVtEw4bVKJ7q5ozjMzpm9aPxPp1v8MGxUOJOy80W7Ti+vBp2KmMKc1gy8QsYrTvW
Vzvpp3U+/WFh2X7AG0zlP/JEnOuJmMwMK5QhzMC2rEbaHJ66ht7SABvtSbOHHw5Z
zugxTE0lx3n7izCxW1RLEu//xtWY0FbU2L5oE2Ace27myUPeBQCDJzynUn93dMM9
9nq2TtgTCF6XvA==
=+sb9
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry code related updates from Thomas Gleixner:
- Consolidate the macros for .byte ... opcode sequences
- Deduplicate register offset defines in include files
- Simplify the ia32,x32 compat handling of the related syscall tables
to get rid of #ifdeffery.
- Clear all EFLAGS which are not required for syscall handling
- Consolidate the syscall tables and switch the generation over to the
generic shell script and remove the CFLAGS tweaks which are not
longer required.
- Use 'int' type for system call numbers to match the generic code.
- Add more selftests for syscalls
* tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/syscalls: Don't adjust CFLAGS for syscall tables
x86/syscalls: Remove -Wno-override-init for syscall tables
x86/uml/syscalls: Remove array index from syscall initializers
x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env
x86/entry: Use int everywhere for system call numbers
x86/entry: Treat out of range and gap system calls the same
x86/entry/64: Sign-extend system calls on entry to int
selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64
selftests/x86/syscall: Simplify message reporting in syscall_numbering
selftests/x86/syscall: Update and extend syscall_numbering_64
x86/syscalls: Switch to generic syscallhdr.sh
x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max
x86/unistd: Define X32_NR_syscalls only for 64-bit kernel
x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall
x86/syscalls: Switch to generic syscalltbl.sh
x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
Fix following coccicheck warning:
./arch/um/kernel/um_arch.c:284:34-35: Unneeded semicolon
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
GCC assumes that stack is aligned to 16-byte on call sites [1].
Since GCC 8, GCC began using 16-byte aligned SSE instructions to
implement assignments to structs on stack. When
CC_OPTIMIZE_FOR_PERFORMANCE is enabled, this affects
os-Linux/sigio.c, write_sigio_thread:
struct pollfds *fds, tmp;
tmp = current_poll;
Note that struct pollfds is exactly 16 bytes in size.
GCC 8+ generates assembly similar to:
movdqa (%rdi),%xmm0
movaps %xmm0,-0x50(%rbp)
This is an issue, because movaps will #GP if -0x50(%rbp) is not
aligned to 16 bytes [2], and how rbp gets assigned to is via glibc
clone thread_start, then function prologue, going though execution
trace similar to (showing only relevant instructions):
sub $0x10,%rsi
mov %rcx,0x8(%rsi)
mov %rdi,(%rsi)
syscall
pop %rax
pop %rdi
callq *%rax
push %rbp
mov %rsp,%rbp
The stack pointer always points to the topmost element on stack,
rather then the space right above the topmost. On push, the
pointer decrements first before writing to the memory pointed to
by it. Therefore, there is no need to have the stack pointer
pointer always point to valid memory unless the stack is poped;
so the `- sizeof(void *)` in the code is unnecessary.
On the other hand, glibc reserves the 16 bytes it needs on stack
and pops itself, so by the call instruction the stack pointer
is exactly the caller-supplied sp. It then push the 16 bytes of
the return address and the saved stack pointer, so the base
pointer will be 16-byte aligned if and only if the caller
supplied sp is 16-byte aligned. Therefore, the caller must supply
a 16-byte aligned pointer, which `stack + UM_KERN_PAGE_SIZE`
already satisfies.
On a side note, musl is unaffected by this issue because it forces
16 byte alignment via `and $-16,%rsi` in its clone wrapper.
Similarly, glibc i386 is also unaffected because it has
`andl $0xfffffff0, %ecx`.
To reproduce this bug, enable CONFIG_UML_RTC and
CC_OPTIMIZE_FOR_PERFORMANCE. uml_rtc will call
add_sigio_fd which will then cause write_sigio_thread to either go
into segfault loop or panic with "Segfault with no mm".
Similarly, signal stacks will be aligned by the host kernel upon
signal delivery. `- sizeof(void *)` to sigaltstack is
unconventional and extraneous.
On a related note, initialization of longjmp buffers do require
`- sizeof(void *)`. This is to account for the return address
that would have been pushed to the stack at the call site.
The reason for uml to respect 16-byte alignment, rather than
telling GCC to assume 8-byte alignment like the host kernel since
commit d9b0cde91c ("x86-64, gcc: Use
-mpreferred-stack-boundary=3 if supported"), is because uml links
against libc. There is no reason to assume libc is also compiled
with that flag and assumes 8-byte alignment rather than 16-byte.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40838
[2] https://c9x.me/x86/html/file_module_x86_id_180.html
Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
The generic asm futex implementation emulates atomic access to
memory by doing a get_user followed by put_user. These translate
to two mapping operations on UML with paging enabled in the
meantime. This, in turn may end up changing interrupts,
invoking the signal loop, etc.
This replaces the generic implementation by a mapping followed
by an operation on the mapped segment.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
1. Reflect host cpu flags into the UML instance so they can
be used to select the correct implementations for xor, crypto, etc.
2. Reflect host cache alignment into UML instance. This is
important when running 32 bit on a 64 bit host as 32 bit by
default aligns to 32 while the actual alignment should be 64.
Ditto for some Xeons which align at 128.
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
To support testing of PCI/PCIe drivers in UML, add a PCI bus
support driver. This driver uses virtio, which in UML is really
just vhost-user, to talk to devices, and adds the devices to
the virtual PCI bus in the system.
Since virtio already allows DMA/bus mastering this really isn't
all that hard, of course we need the logic_iomem infrastructure
that was added by a previous patch.
The protocol to talk to the device is has a few fairly simple
messages for reading to/writing from config and IO spaces, and
messages for the device to send the various interrupts (INT#,
MSI/MSI-X and while suspended PME#).
Note that currently no offical virtio device ID is assigned for
this protocol, as a consequence this patch requires defining it
in the Kconfig, with a default that makes the driver refuse to
work at all.
Finally, in order to add support for MSI/MSI-X interrupts, some
small changes are needed in the UML IRQ code, it needs to have
more interrupts, changing NR_IRQS from 64 to 128 if this driver
is enabled, but not actually use them for anything so that the
generic IRQ domain/MSI infrastructure can allocate IRQ numbers.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If we happen to get multiple messages while IRQS are already
suspended, we still need to handle them, since otherwise the
simulation blocks.
Remove the "prevent nesting" part, time_travel_add_irq_event()
will deal with being called multiple times just fine.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We should be able to ndelay() from any context, even from an
interrupt context! However, this is broken (not functionally,
but locking-wise) in time-travel because we'll get into the
time-travel code and enable interrupts to handle messages on
other time-travel aware subsystems (only virtio for now).
Luckily, I've already reworked the time-travel aware signal
(interrupt) delivery for suspend/resume to have a time travel
handler, which runs directly in the context of the signal and
not from the Linux interrupt.
In order to fix this time-travel issue then, we need to do a
few things:
1) rework the signal handling code to call time-travel handlers
(only) if interrupts are disabled but signals aren't blocked,
instead of marking it only pending there. This is needed to
not deadlock other communication.
2) rework time-travel to not enable interrupts while it's
waiting for a message;
3) rework time-travel to not (just) disable interrupts but
rather block signals at a lower level while it needs them
disabled for communicating with the controller.
Finally, since now we can actually spend even virtual time
in interrupts-disabled sections, the delay warning when we
deliver a time-travel delayed interrupt is no longer valid,
things can (and should) now get delayed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Use signals_enabled instead of always jumping through
a function call to read it, there's not much point in
that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
__NR_syscall_max is only used by x86 and UML. In contrast, __NR_syscalls is
widely used by all the architectures.
Convert __NR_syscall_max to __NR_syscalls and adjust the usage sites.
This prepares x86 to switch to the generic syscallhdr.sh script.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210517073815.97426-6-masahiroy@kernel.org
- Disable CONFIG_GCOV when built with modules
- Many fixes for W=1 related warnings
- Code cleanup
Due to lack of time I was unable to prepare a bigger pull request.
PR for the next merge window will contain more interesting material, I promise. :-)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmCRqjcWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wRmdD/9bm7ob+9PxQ/weLPgMC97J+neq
h34lxoQrxryDDv85uO5sGmg75BZ9TRC4NJUwEC9KuqsbPBDexbTiUyZQCI6p7CnZ
frfIWsnnNfSWRHluMr26/fZZnUpbz4myw3BrplH266ULPmGomCQD27Nbg+BtVIgv
2Na54B1IBVVQYi1kliirRC0+GC6JE4wifbDmqglweOMT7tiBfDbTrQP0s6Qez6jO
9/yosugD9dsnyWzlwsLHe28Wlj3mlFDTHYAWcuYzR1B4RA60tjf5w0sYaVw862o1
eq59B3aRH9v+KUkEOWa/85G91ZNRN/KO+CrLAsUDlicFelzFQwYGdWwLzMiXT++y
D9joaRRDhoACO03M4kAPAoRFyUjn4k3/WD0HNUZYhWKSRaRzVffYH9caybmsLmlt
mMXv8AQKBuZQP1EVaEPS8S1w4uprS1JTUks8YXNuD7r0/k3zPEiSGHL35JUns9BG
N8XuPFz52NGffylGEt8wriOV7qbVJ7OUnAABGyQ8hUOuDKnObx/YpJTdOacmS6NP
jXZrxV5Y1KDG1d4D9BcPbaouAw0+HPO02PuFBp8K3Uc19BZ+bo4/IpinjFXKLo9z
3LaC2mw9r6Dfws35ksrvYZiRWrH7bVXqP+EJG+SvW6OBpNYg4/woRT7hbvc0IkxL
2KTnQspgIQWVO5u4Hg==
=xWpC
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Disable CONFIG_GCOV when built with modules
- Many fixes for W=1 related warnings
- Code cleanup
* tag 'for-linus-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Fix W=1 missing-include-dirs warnings
um: elf.h: Fix W=1 warning for empty body in 'do' statement
um: pgtable.h: Fix W=1 warning for empty body in 'do' statement
um: Remove unused including <linux/version.h>
um: Add 2 missing libs to fix various build errors
um: Replace if (cond) BUG() with BUG_ON()
um: Disable CONFIG_GCOV with MODULES
um: Remove unneeded variable 'ret'
um: Mark all kernel symbols as local
um: Fix tag order in stub_32.h
mem_init_print_info() is called in mem_init() on each architecture, and
pass NULL argument, so using void argument and move it into mm_init().
Link: https://lkml.kernel.org/r/20210317015210.33641-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86]
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> [powerpc]
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com> [sparc64]
Acked-by: Russell King <rmk+kernel@armlinux.org.uk> [arm]
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Guo Ren <guoren@kernel.org>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Peter Zijlstra" <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmCIBMIACgkQUqAMR0iA
lPIt9w//bbHUN/JsNtLCs/849oExdUn/thVajrD5yELttYZXhdzbXncNdkGX9tlU
4JmExmUoqKYdN6JhSnrcYvckHj7XXZM7pVh9IdzqRh10MEXIQ+7IUHjQc8034Zs/
W4/oZmfMtBjszap+cJ9hvdp9qaJkPz/fRLGlrbjc1K4hhxDa1gGmeD35SKswGltm
q6RzX3uRl5JbBrYsLoqb28MGYRHhjf2+Pvndoj+5Nn9FtwPSot6jAkyqY5Y6iJlS
W2EsFqOt+Kv7/I93FyQlnXC6Nx7vntmow7knmmGPXDf2BqLb0J8Bxl3fwuzpQoao
nZzL/p9GQ4ZXF6y8gRV8+RzPIcftBdayOswEDGH0LzlTkbAe/9Sq9Lo7a4Z8jxHW
ro0P+PSRK5Ksm7jvpVmSTg+Nt+XqDA5zA1lAorX1UjsyeDDNF9ndQ4C+ZNhCKo54
y+RDgtAArJMIvsHLQ53ReoOct5NnGVNb8G/r3bIAu+Dn6K3nesr6fP1XG8iduseL
yFlLB7w214BQMr2B/C+8lQvj54wWE4lea2+LNvObxC5b8puYj0fEniUxTYP6bcB5
QT+LfTToufYz4US7ggJy6hoEfohifGWVvDHbn9tXmyXotSTHH7pHdYypqY+UO+kl
7BkwzNFCm4qCIKsg8nyJxT2hDOlpcCrQx1dBIjveMqJ0c5+ahXU=
=ovSn
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Stop synchronizing kernel log buffer readers by logbuf_lock. As a
result, the access to the buffer is fully lockless now.
Note that printk() itself still uses locks because it tries to flush
the messages to the console immediately. Also the per-CPU temporary
buffers are still there because they prevent infinite recursion and
serialize backtraces from NMI. All this is going to change in the
future.
- kmsg_dump API rework and cleanup as a side effect of the logbuf_lock
removal.
- Make bstr_printf() aware that %pf and %pF formats could deference the
given pointer.
- Show also page flags by %pGp format.
- Clarify the documentation for plain pointer printing.
- Do not show no_hash_pointers warning multiple times.
- Update Senozhatsky email address.
- Some clean up.
* tag 'printk-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (24 commits)
lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf()
printk: clarify the documentation for plain pointer printing
kernel/printk.c: Fixed mundane typos
printk: rename vprintk_func to vprintk
vsprintf: dump full information of page flags in pGp
mm, slub: don't combine pr_err with INFO
mm, slub: use pGp to print page flags
MAINTAINERS: update Senozhatsky email address
lib/vsprintf: do not show no_hash_pointers message multiple times
printk: console: remove unnecessary safe buffer usage
printk: kmsg_dump: remove _nolock() variants
printk: remove logbuf_lock
printk: introduce a kmsg_dump iterator
printk: kmsg_dumper: remove @active field
printk: add syslog_lock
printk: use atomic64_t for devkmsg_user.seq
printk: use seqcount_latch for clear_seq
printk: introduce CONSOLE_LOG_MAX
printk: consolidate kmsg_dump_get_buffer/syslog_print_all code
printk: refactor kmsg_dump_get_buffer()
...
Fix the following coccinelle reports:
./arch/um/kernel/mem.c:77:3-6: WARNING: Use BUG_ON instead of if
condition followed by BUG.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
CONFIG_GCOV doesn't work with modules, and for various reasons
it cannot work, see also
https://lore.kernel.org/r/d36ea54d8c0a8dd706826ba844a6f27691f45d55.camel@sipsolutions.net
Make CONFIG_GCOV depend on !MODULES to avoid anyone
running into issues there. This also means we need
not export the gcov symbols.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Ritesh reported a bug [1] against UML, noting that it crashed on
startup. The backtrace shows the following (heavily redacted):
(gdb) bt
...
#26 0x0000000060015b5d in sem_init () at ipc/sem.c:268
#27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
#28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72
...
#40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
...
#44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
#45 0x00007f8990968b85 in __getgrnam_r [...]
#46 0x00007f89909d6b77 in grantpt [...]
#47 0x00007f8990a9394e in __GI_openpty [...]
#48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
#49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
#50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45
#51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
#52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144
indicating that the UML function openpty_cb() calls openpty(),
which internally calls __getgrnam_r(), which causes the nsswitch
machinery to get started.
This loads, through lots of indirection that I snipped, the
libcom_err.so.2 library, which (in an unknown function, "??")
calls sem_init().
Now, of course it wants to get libpthread's sem_init(), since
it's linked against libpthread. However, the dynamic linker
looks up that symbol against the binary first, and gets the
kernel's sem_init().
Hajime Tazaki noted that "objcopy -L" can localize a symbol,
so the dynamic linker wouldn't do the lookup this way. I tried,
but for some reason that didn't seem to work.
Doing the same thing in the linker script instead does seem to
work, though I cannot entirely explain - it *also* works if I
just add "VERSION { { global: *; }; }" instead, indicating that
something else is happening that I don't really understand. It
may be that explicitly doing that marks them with some kind of
empty version, and that's different from the default.
Explicitly marking them with a version breaks kallsyms, so that
doesn't seem to be possible.
Marking all the symbols as local seems correct, and does seem
to address the issue, so do that. Also do it for static link,
nsswitch libraries could still be loaded there.
[1] https://bugs.debian.org/983379
Reported-by: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Tested-By: Ritesh Raj Sarraf <rrs@debian.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Rather than storing the iterator information in the registered
kmsg_dumper structure, create a separate iterator structure. The
kmsg_dump_iter structure can reside on the stack of the caller, thus
allowing lockless use of the kmsg_dump functions.
Update code that accesses the kernel logs using the kmsg_dumper
structure to use the new kmsg_dump_iter structure. For kmsg_dumpers,
this also means adding a call to kmsg_dump_rewind() to initialize
the iterator.
All this is in preparation for removal of @logbuf_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org> # pstore
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-13-john.ogness@linutronix.de
The kmsg_dumper can be called from any context and CPU, possibly
from multiple CPUs simultaneously. Since a static buffer is used
to retrieve the kernel logs, this buffer must be protected against
simultaneous dumping. Skip dumping if another context is already
dumping.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20210303101528.29901-2-john.ogness@linutronix.de
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
sense that we don't assign ->set_child_tid with our own structure. Just
ensure that every arch sets up the PF_IO_WORKER threads like kthreads
in the arch implementation of copy_thread().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a pseudo RTC that simply is able to send an alarm signal
waking up the system at a given time in the future.
Since apparently timerfd_create() FDs don't support SIGIO, we
use the sigio-creating helper thread, which just learned to do
suspend/resume properly in the previous patch.
For time-travel mode, OTOH, just add an event at the specified
time in the future, and that's already sufficient to wake up
the system at that point in time since suspend will just be in
an "endless wait".
For s2idle support also call pm_system_wakeup().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>