mirror of
https://github.com/torvalds/linux.git
synced 2024-11-22 12:11:40 +00:00
f2ef39727a
548 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
0352387523 |
First step of consolidating the VDSO data page handling:
The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: * consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. * removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. * seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kyoTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVBjD/9awdN2YeCGIM9rlHIktUdNRmRSL2SL 6av1CPffN5DenONYTXWrDYPkC4yfjUwIs8H57uzFo10yA7RQ/Qfq+O68k5GnuFew jvpmmYSZ6TT21AmAaCIhn+kdl9YbEJFvN2AWH85Bl29k9FGB04VzJlQMMjfEZ1a5 Mhwv+cfYNuPSZmU570jcxW2XgbyTWlLZBByXX/Tuz9bwpmtszba507bvo45x6gIP twaWNzrsyJpdXfMrfUnRiChN8jHlDN7I6fgQvpsoRH5FOiVwIFo0Ip2rKbk+ONfD W/rcU5oeqRIxRVDHzf2Sv8WPHMCLRv01ZHBcbJOtgvZC3YiKgKYoeEKabu9ZL1BH 6VmrxjYOBBFQHOYAKPqBuS7BgH5PmtMbDdSZXDfRaAKaCzhCRysdlWW7z48r2R// zPufb7J6Tle23AkuZWhFjvlGgSBl4zxnTFn31HYOyQps3TMI4y50Z2DhE/EeU8a6 DRl8/k1KQVDUZ6udJogS5kOr1J8pFtUPrA2uhR8UyLdx7YKiCzcdO1qWAjtXlVe8 oNpzinU+H9bQqGe9IyS7kCG9xNaCRZNkln5Q1WfnkTzg5f6ihfaCvIku3l4bgVpw 3HmcxYiC6RxQB+ozwN7hzCCKT4L9aMhr/457TNOqRkj2Elw3nvJ02L4aI86XAKLE jwO9Fkp9qcCxCw== =q5eD -----END PGP SIGNATURE----- Merge tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vdso data page handling updates from Thomas Gleixner: "First steps of consolidating the VDSO data page handling. The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: - consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. - removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. - seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately" * tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/vdso: Add missing brackets in switch case vdso: Rename struct arch_vdso_data to arch_vdso_time_data powerpc: Split systemcfg struct definitions out from vdso powerpc: Split systemcfg data out of vdso data page powerpc: Add kconfig option for the systemcfg page powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors powerpc/pseries/lparcfg: Fix printing of system_active_processors powerpc/procfs: Propagate error of remap_pfn_range() powerpc/vdso: Remove offset comment from 32bit vdso_arch_data x86/vdso: Split virtual clock pages into dedicated mapping x86/vdso: Delete vvar.h x86/vdso: Access vdso data without vvar.h x86/vdso: Move the rng offset to vsyscall.h x86/vdso: Access rng vdso data without vvar.h x86/vdso: Access timens vdso data without vvar.h x86/vdso: Allocate vvar page from C code x86/vdso: Access rng data from kernel without vvar x86/vdso: Place vdso_data at beginning of vvar page x86/vdso: Use __arch_get_vdso_data() to access vdso data x86/mm/mmap: Remove arch_vma_name() ... |
||
Linus Torvalds
|
3f020399e4 |
Scheduler changes for v6.13:
- Core facilities: - Add the "Lazy preemption" model (CONFIG_PREEMPT_LAZY=y), which optimizes fair-class preemption by delaying preemption requests to the tick boundary, while working as full preemption for RR/FIFO/DEADLINE classes. (Peter Zijlstra) - x86: Enable Lazy preemption (Peter Zijlstra) - riscv: Enable Lazy preemption (Jisheng Zhang) - Initialize idle tasks only once (Thomas Gleixner) - sched/ext: Remove sched_fork() hack (Thomas Gleixner) - Fair scheduler: - Optimize the PLACE_LAG when se->vlag is zero (Huang Shijie) - Idle loop: Optimize the generic idle loop by removing unnecessary memory barrier (Zhongqiu Han) - RSEQ: - Improve cache locality of RSEQ concurrency IDs for intermittent workloads (Mathieu Desnoyers) - Waitqueues: - Make wake_up_{bit,var} less fragile (Neil Brown) - PSI: - Pass enqueue/dequeue flags to psi callbacks directly (Johannes Weiner) - Preparatory patches for proxy execution: - core: Add move_queued_task_locked helper (Connor O'Brien) - core: Consolidate pick_*_task to task_is_pushable helper (Connor O'Brien) - core: Split out __schedule() deactivate task logic into a helper (John Stultz) - core: Split scheduler and execution contexts (Peter Zijlstra) - locking/mutex: Make mutex::wait_lock irq safe (Juri Lelli) - locking/mutex: Expose __mutex_owner() (Juri Lelli) - locking/mutex: Remove wakeups from under mutex::wait_lock (Peter Zijlstra) - Misc fixes and cleanups: - core: Remove unused __HAVE_THREAD_FUNCTIONS hook support (David Disseldorp) - core: Update the comment for TIF_NEED_RESCHED_LAZY (Sebastian Andrzej Siewior) - wait: Remove unused bit_wait_io_timeout (Dr. David Alan Gilbert) - fair: remove the DOUBLE_TICK feature (Huang Shijie) - fair: fix the comment for PREEMPT_SHORT (Huang Shijie) - uclamp: Fix unnused variable warning (Christian Loehle) - rt: No PREEMPT_RT=y for all{yes,mod}config Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7fnQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hZTBAAozVdWA2m51aNa67HvAZta/olmrIagVbW inwbTgqa8b+UfeWEuKOfrZr5khjEh6pLgR3dBTib1uH6xxYj/Okds+qbPWSBPVLh yzavlm/zJZM1U1XtxE3eyVfqWik4GrY7DoIMDQQr+YH7rNXonJeJkll38OI2E5MC q3Q01qyMo8RJJX8qkf3f8ObOoP/51NsVniTw0Zb2fzEhXz8FjezLlxk6cMfgSkJG lg9gfIwUZ7Xg5neRo4kJcc3Ht31KYOhWSiupBJzRD1hss/N/AybvMcTX/Cm8d07w HIAdDDAn84o46miFo/a0V/hsJZ72idWbqxVJUCtaezrpOUiFkG+uInRvG/ynr0lF 5dEI9f+6PUw8Nc7L72IyHkobjPqS2IefSaxYYCBKmxMX2qrenfTor/pKiWzzhBIl rX3MZSuUJ8NjV4rNGD/qXRM1IsMJrsDwxDyv+sRec3XdH33x286ds6aAUEPDQ6N7 96VS0sOKcNUJN8776ErNjlIxRl8HTlpkaO3nZlQIfXgTlXUpRvOuKbEWqP+606lo oANgJTKgUhgJPWZnvmdRxDjSiOp93QcImjus9i1tN81FGiEDleONsJUxu2Di1E5+ s1nCiytjq+cdvzCqFyiOZUh+g6kSZ4yXxNgLg2UvbXzX1zOeUQT3WtyKUhMPXhU8 esh1TgbUbpE= =Zcqj -----END PGP SIGNATURE----- Merge tag 'sched-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core facilities: - Add the "Lazy preemption" model (CONFIG_PREEMPT_LAZY=y), which optimizes fair-class preemption by delaying preemption requests to the tick boundary, while working as full preemption for RR/FIFO/DEADLINE classes. (Peter Zijlstra) - x86: Enable Lazy preemption (Peter Zijlstra) - riscv: Enable Lazy preemption (Jisheng Zhang) - Initialize idle tasks only once (Thomas Gleixner) - sched/ext: Remove sched_fork() hack (Thomas Gleixner) Fair scheduler: - Optimize the PLACE_LAG when se->vlag is zero (Huang Shijie) Idle loop: - Optimize the generic idle loop by removing unnecessary memory barrier (Zhongqiu Han) RSEQ: - Improve cache locality of RSEQ concurrency IDs for intermittent workloads (Mathieu Desnoyers) Waitqueues: - Make wake_up_{bit,var} less fragile (Neil Brown) PSI: - Pass enqueue/dequeue flags to psi callbacks directly (Johannes Weiner) Preparatory patches for proxy execution: - Add move_queued_task_locked helper (Connor O'Brien) - Consolidate pick_*_task to task_is_pushable helper (Connor O'Brien) - Split out __schedule() deactivate task logic into a helper (John Stultz) - Split scheduler and execution contexts (Peter Zijlstra) - Make mutex::wait_lock irq safe (Juri Lelli) - Expose __mutex_owner() (Juri Lelli) - Remove wakeups from under mutex::wait_lock (Peter Zijlstra) Misc fixes and cleanups: - Remove unused __HAVE_THREAD_FUNCTIONS hook support (David Disseldorp) - Update the comment for TIF_NEED_RESCHED_LAZY (Sebastian Andrzej Siewior) - Remove unused bit_wait_io_timeout (Dr. David Alan Gilbert) - remove the DOUBLE_TICK feature (Huang Shijie) - fix the comment for PREEMPT_SHORT (Huang Shijie) - Fix unnused variable warning (Christian Loehle) - No PREEMPT_RT=y for all{yes,mod}config" * tag 'sched-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) sched, x86: Update the comment for TIF_NEED_RESCHED_LAZY. sched: No PREEMPT_RT=y for all{yes,mod}config riscv: add PREEMPT_LAZY support sched, x86: Enable Lazy preemption sched: Enable PREEMPT_DYNAMIC for PREEMPT_RT sched: Add Lazy preemption model sched: Add TIF_NEED_RESCHED_LAZY infrastructure sched/ext: Remove sched_fork() hack sched: Initialize idle tasks only once sched: psi: pass enqueue/dequeue flags to psi callbacks directly sched/uclamp: Fix unnused variable warning sched: Split scheduler and execution contexts sched: Split out __schedule() deactivate task logic into a helper sched: Consolidate pick_*_task to task_is_pushable helper sched: Add move_queued_task_locked helper locking/mutex: Expose __mutex_owner() locking/mutex: Make mutex::wait_lock irq safe locking/mutex: Remove wakeups from under mutex::wait_lock sched: Improve cache locality of RSEQ concurrency IDs for intermittent workloads sched: idle: Optimize the generic idle loop by removing needless memory barrier ... |
||
Dave Vasilevsky
|
31daa34315 |
crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
Fixes boot failures on 6.9 on PPC_BOOK3S_32 machines using Open Firmware.
On these machines, the kernel refuses to boot from non-zero
PHYSICAL_START, which occurs when CRASH_DUMP is on.
Since most PPC_BOOK3S_32 machines boot via Open Firmware, it should
default to off for them. Users booting via some other mechanism can still
turn it on explicitly.
Does not change the default on any other architectures for the
time being.
Link: https://lkml.kernel.org/r/20240917163720.1644584-1-dave@vasilevsky.ca
Fixes:
|
||
Jisheng Zhang
|
22aaec357c |
riscv: add PREEMPT_LAZY support
riscv has switched to GENERIC_ENTRY, so adding PREEMPT_LAZY is as simple as adding TIF_NEED_RESCHED_LAZY related definitions and enabling ARCH_HAS_PREEMPT_LAZY. [bigeasy: Replace old PREEMPT_AUTO bits with new PREEMPT_LAZY ] Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lkml.kernel.org/r/20241021151257.102296-4-bigeasy@linutronix.de |
||
Nam Cao
|
a812eee0b6 |
vdso: Rename struct arch_vdso_data to arch_vdso_time_data
The struct arch_vdso_data is only about vdso time data. So rename it to arch_vdso_time_data to make it obvious. Non time-related data will be migrated out of these structs soon. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-28-b64f0842d512@linutronix.de |
||
Conor Dooley
|
33549fcf37
|
RISC-V: disallow gcc + rust builds
During the discussion before supporting rust on riscv, it was decided
not to support gcc yet, due to differences in extension handling
compared to llvm (only the version of libclang matching the c compiler
is supported). Recently Jason Montleon reported [1] that building with
gcc caused build issues, due to unsupported arguments being passed to
libclang. After some discussion between myself and Miguel, it is better
to disable gcc + rust builds to match the original intent, and
subsequently support it when an appropriate set of extensions can be
deduced from the version of libclang.
Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1]
Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2]
Fixes:
|
||
Alexandre Ghiti
|
cfb10de185
|
riscv: Fix kernel stack size when KASAN is enabled
We use Kconfig to select the kernel stack size, doubling the default
size if KASAN is enabled.
But that actually only works if KASAN is selected from the beginning,
meaning that if KASAN config is added later (for example using
menuconfig), CONFIG_THREAD_SIZE_ORDER won't be updated, keeping the
default size, which is not enough for KASAN as reported in [1].
So fix this by moving the logic to compute the right kernel stack into a
header.
Fixes:
|
||
Linus Torvalds
|
5701725692 |
Rust changes for v6.12
Toolchain and infrastructure: - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up objtool warnings), teach objtool about 'noreturn' Rust symbols and mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we should be objtool-warning-free, so enable it to run for all Rust object files. - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support. - Support 'RUSTC_VERSION', including re-config and re-build on change. - Split helpers file into several files in a folder, to avoid conflicts in it. Eventually those files will be moved to the right places with the new build system. In addition, remove the need to manually export the symbols defined there, reusing existing machinery for that. - Relax restriction on configurations with Rust + GCC plugins to just the RANDSTRUCT plugin. 'kernel' crate: - New 'list' module: doubly-linked linked list for use with reference counted values, which is heavily used by the upcoming Rust Binder. This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed unique for the given ID), 'AtomicTracker' (tracks whether a 'ListArc' exists using an atomic), 'ListLinks' (the prev/next pointers for an item in a linked list), 'List' (the linked list itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor into a 'List' that allows to remove elements), 'ListArcField' (a field exclusively owned by a 'ListArc'), as well as support for heterogeneous lists. - New 'rbtree' module: red-black tree abstractions used by the upcoming Rust Binder. This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a node), 'RBTreeNodeReservation' (a memory reservation for a node), 'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor' (bidirectional cursor that allows to remove elements), as well as an entry API similar to the Rust standard library one. - 'init' module: add 'write_[pin_]init' methods and the 'InPlaceWrite' trait. Add the 'assert_pinned!' macro. - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by introducing an associated type in the trait. - 'alloc' module: add 'drop_contents' method to 'BoxExt'. - 'types' module: implement the 'ForeignOwnable' trait for 'Pin<Box<T>>' and improve the trait's documentation. In addition, add the 'into_raw' method to the 'ARef' type. - 'error' module: in preparation for the upcoming Rust support for 32-bit architectures, like arm, locally allow Clippy lint for those. Documentation: - https://rust.docs.kernel.org has been announced, so link to it. - Enable rustdoc's "jump to definition" feature, making its output a bit closer to the experience in a cross-referencer. - Debian Testing now also provides recent Rust releases (outside of the freeze period), so add it to the list. MAINTAINERS: - Trevor is joining as reviewer of the "RUST" entry. And a few other small bits. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmbzNz4ACgkQGXyLc2ht IW3muA/9HcPL0QqVB5+SqSRqcatmrFU/wq8Oaa6Z/No0JaynqyikK+R1WNokUd/5 WpQi4PC1OYV+ekyAuWdkooKmaSqagH5r53XlezNw+cM5zo8y7p0otVlbepQ0t3Ky pVEmfDRIeSFXsKrg91BJUKyJf70TQlgSggDVCExlanfOjPz88C1+s3EcJ/XWYGKQ cRk/XDdbF5eNaldp2MriVF0fw7XktgIrmVzxt/z0lb4PE7RaCAnO6gSQI+90Vb2d zvyOYKS4AkqE3suFvDIIUlPUv+8XbACj0c4wvBZHH5uZGTbgWUffqygJ45GqChEt c4fS/+E8VaM1z0EvxNczC0nQkfLwkTc1mgbP+sG3VZJMPVCJ2zQan1/ond7GqCpw pt6uQaGvDsAvllm7sbiAIVaAY81icqyYWKfNBXLLEL7DhY5je5Wq+E83XQ8d5u5F EuapnZhW3y12d6UCsSe9bD8W45NFoWHPXky1TzT+whTxnX1yH9YsPXbJceGSbbgd Lw3GmUtZx2bVAMToVjNFD2lPA3OmPY1e2lk0jwzTuQrEXfnZYuzbjqs3YUijb7xR AlsWfIb0IHBwHWpB7da24ezqWP2VD4eaDdD8/+LmDSj6XLngxMNWRLKmXT000eTW vIFP9GJrvag2R3YFPhrurgGpRsp8HUTLtvcZROxp2JVQGQ7Z4Ww= =52BN -----END PGP SIGNATURE----- Merge tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Support 'MITIGATION_{RETHUNK,RETPOLINE,SLS}' (which cleans up objtool warnings), teach objtool about 'noreturn' Rust symbols and mimic '___ADDRESSABLE()' for 'module_{init,exit}'. With that, we should be objtool-warning-free, so enable it to run for all Rust object files. - KASAN (no 'SW_TAGS'), KCFI and shadow call sanitizer support. - Support 'RUSTC_VERSION', including re-config and re-build on change. - Split helpers file into several files in a folder, to avoid conflicts in it. Eventually those files will be moved to the right places with the new build system. In addition, remove the need to manually export the symbols defined there, reusing existing machinery for that. - Relax restriction on configurations with Rust + GCC plugins to just the RANDSTRUCT plugin. 'kernel' crate: - New 'list' module: doubly-linked linked list for use with reference counted values, which is heavily used by the upcoming Rust Binder. This includes 'ListArc' (a wrapper around 'Arc' that is guaranteed unique for the given ID), 'AtomicTracker' (tracks whether a 'ListArc' exists using an atomic), 'ListLinks' (the prev/next pointers for an item in a linked list), 'List' (the linked list itself), 'Iter' (an iterator over a 'List'), 'Cursor' (a cursor into a 'List' that allows to remove elements), 'ListArcField' (a field exclusively owned by a 'ListArc'), as well as support for heterogeneous lists. - New 'rbtree' module: red-black tree abstractions used by the upcoming Rust Binder. This includes 'RBTree' (the red-black tree itself), 'RBTreeNode' (a node), 'RBTreeNodeReservation' (a memory reservation for a node), 'Iter' and 'IterMut' (immutable and mutable iterators), 'Cursor' (bidirectional cursor that allows to remove elements), as well as an entry API similar to the Rust standard library one. - 'init' module: add 'write_[pin_]init' methods and the 'InPlaceWrite' trait. Add the 'assert_pinned!' macro. - 'sync' module: implement the 'InPlaceInit' trait for 'Arc' by introducing an associated type in the trait. - 'alloc' module: add 'drop_contents' method to 'BoxExt'. - 'types' module: implement the 'ForeignOwnable' trait for 'Pin<Box<T>>' and improve the trait's documentation. In addition, add the 'into_raw' method to the 'ARef' type. - 'error' module: in preparation for the upcoming Rust support for 32-bit architectures, like arm, locally allow Clippy lint for those. Documentation: - https://rust.docs.kernel.org has been announced, so link to it. - Enable rustdoc's "jump to definition" feature, making its output a bit closer to the experience in a cross-referencer. - Debian Testing now also provides recent Rust releases (outside of the freeze period), so add it to the list. MAINTAINERS: - Trevor is joining as reviewer of the "RUST" entry. And a few other small bits" * tag 'rust-6.12' of https://github.com/Rust-for-Linux/linux: (54 commits) kasan: rust: Add KASAN smoke test via UAF kbuild: rust: Enable KASAN support rust: kasan: Rust does not support KHWASAN kbuild: rust: Define probing macros for rustc kasan: simplify and clarify Makefile rust: cfi: add support for CFI_CLANG with Rust cfi: add CONFIG_CFI_ICALL_NORMALIZE_INTEGERS rust: support for shadow call stack sanitizer docs: rust: include other expressions in conditional compilation section kbuild: rust: replace proc macros dependency on `core.o` with the version text kbuild: rust: rebuild if the version text changes kbuild: rust: re-run Kconfig if the version text changes kbuild: rust: add `CONFIG_RUSTC_VERSION` rust: avoid `box_uninit_write` feature MAINTAINERS: add Trevor Gross as Rust reviewer rust: rbtree: add `RBTree::entry` rust: rbtree: add cursor rust: rbtree: add mutable iterator rust: rbtree: add iterator rust: rbtree: add red-black tree implementation backed by the C version ... |
||
Linus Torvalds
|
97d8894b6f |
RISC-V Patches for the 6.12 Merge Window, Part 1
* Support for using Zkr to seed KASLR. * Support for IPI-triggered CPU backtracing. * Support for generic CPU vulnerabilities reporting to userspace. * A few cleanups for missing licenses. * The size limit on the XIP kernel has been removed. * Support for tracing userspace stacks. * Support for the Svvptc extension. * Various cleanups and fixes throughout the tree. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmbykZITHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiTDLEACABTPlMX+Ke1lMYWj6MLBTXMSlQJ6w TKLVk3slp4POPsd+ViWy4J6EJDpCkNp6iK30Bv4AGainA3RJnbS7neKYy+MTw0ZV pJWTn73sxHeF+APnZ+geFYcmyteL/SZplgHgwLipFaojs7i/+tOvLFRRD3LueCz6 Q6Muvke9R5uN6LL3tUrmIhJeyZjOwJtKEYoRz6Fo5Tq3RRK0VHYZkdpqRQ86rr9U yNbRNlBLlANstq8xjtiwm22ImPGIsvvhs0AvFft0H72zhf3Tfy0VcTLlDJYwkdKN Bz59v4N4jg1ajXuAsj4/BQdj5uRkzJNZ3Yq1yvMmGI+2UV1tInHEMeZi63+4gs8F 0FYCziA+j6/08u8v8CrwdDOpJ9Iq/NMV6359bt5ySu3rM6QnPRGnHGkv4Ptk9Oyh sSPiuHS8YxpRBOB8xXNp5xFipTZ4+65VqCz253pAsbbp5aZ/o9Jw/bbBFFWcSsRZ tV2xhELVzPiC7dowfYkQhcNZOLlCaJ/Mvo2nMOtjbUwzaXkcjIRcYEy7+dTlfXyr 3h5sStjWAKEmtLEvCYSI+lAT544tbV1x+lCMJGuvztapsTMtAP4GpQKblXl5qqnV L+VWIPJV1elI26H/p/Max9Z1s48NtfwoJSRL7Rnaya6uJ3iWG/BtajHlFbvgOkJf ObPZ//Yr9e91gg== =zDwL -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support using Zkr to seed KASLR - Support IPI-triggered CPU backtracing - Support for generic CPU vulnerabilities reporting to userspace - A few cleanups for missing licenses - The size limit on the XIP kernel has been removed - Support for tracing userspace stacks - Support for the Svvptc extension - Various cleanups and fixes throughout the tree * tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits) crash: Fix riscv64 crash memory reserve dead loop perf/riscv-sbi: Add platform specific firmware event handling tools: Optimize ring buffer for riscv tools: Add riscv barrier implementation RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE riscv: Enable bitops instrumentation riscv: Omit optimized string routines when using KASAN ACPI: RISCV: Make acpi_numa_get_nid() to be static riscv: Randomize lower bits of stack address selftests: riscv: Allow mmap test to compile on 32-bit riscv: Make riscv_isa_vendor_ext_andes array static riscv: Use LIST_HEAD() to simplify code riscv: defconfig: Disable RZ/Five peripheral support RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup riscv: avoid Imbalance in RAS riscv: cacheinfo: Add back init_cache_level() function riscv: Remove unused _TIF_WORK_MASK drivers/perf: riscv: Remove redundant macro check riscv: define ILLEGAL_POINTER_VALUE for 64bit ... |
||
Linus Torvalds
|
7856a56541 |
Many singleton patches - please see the various changelogs for details.
Quite a lot of nilfs2 work this time around. Notable patch series in this pull request are: "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64() to provide (much) more accurate results. The current implementation was causing Uwe some issues in the PWM drivers. "xz: Updates to license, filters, and compression options" from Lasse Collin. Miscellaneous maintenance and kinor feature work to the xz decompressor. "Fix some GDB command error and add some GDB commands" from Kuan-Ying Lee. Fixes and enhancements to the gdb scripts. "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of warnings about this. "nilfs2: add support for some common ioctls" from Ryusuke Konishi. Adds various commonly-available ioctls to nilfs2. "This series fixes a number of formatting issues in kernel doc comments" from Ryusuke Konishi does that. "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke Konishi. Fix issues where -ENOENT was being unintentionally and inappropriately returned to userspace. "nilfs2: assorted cleanups" from Huang Xiaojia. "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke Konishi fixes some issues which can occur on corrupted nilfs2 filesystems. "scripts/decode_stacktrace.sh: improve error reporting and usability" from Luca Ceresoli does those things. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu7dpAAKCRDdBJ7gKXxA jsPqAPwMDEZyKlfSw7QioEHNHDkmkbP7VYCYR0CbUnppbztwpAD8D37aVbWQ+UzM 3nnOq3W2Pc2o/20zqi8Upf1mnvUrygQ= =/NWE -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Many singleton patches - please see the various changelogs for details. Quite a lot of nilfs2 work this time around. Notable patch series in this pull request are: - "mul_u64_u64_div_u64: new implementation" by Nicolas Pitre, with assistance from Uwe Kleine-König. Reimplement mul_u64_u64_div_u64() to provide (much) more accurate results. The current implementation was causing Uwe some issues in the PWM drivers. - "xz: Updates to license, filters, and compression options" from Lasse Collin. Miscellaneous maintenance and kinor feature work to the xz decompressor. - "Fix some GDB command error and add some GDB commands" from Kuan-Ying Lee. Fixes and enhancements to the gdb scripts. - "treewide: add missing MODULE_DESCRIPTION() macros" from Jeff Johnson. Adds lots of MODULE_DESCRIPTIONs, thus fixing lots of warnings about this. - "nilfs2: add support for some common ioctls" from Ryusuke Konishi. Adds various commonly-available ioctls to nilfs2. - "This series fixes a number of formatting issues in kernel doc comments" from Ryusuke Konishi does that. - "nilfs2: prevent unexpected ENOENT propagation" from Ryusuke Konishi. Fix issues where -ENOENT was being unintentionally and inappropriately returned to userspace. - "nilfs2: assorted cleanups" from Huang Xiaojia. - "nilfs2: fix potential issues with empty b-tree nodes" from Ryusuke Konishi fixes some issues which can occur on corrupted nilfs2 filesystems. - "scripts/decode_stacktrace.sh: improve error reporting and usability" from Luca Ceresoli does those things" * tag 'mm-nonmm-stable-2024-09-21-07-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (103 commits) list: test: increase coverage of list_test_list_replace*() list: test: fix tests for list_cut_position() proc: use __auto_type more treewide: correct the typo 'retun' ocfs2: cleanup return value and mlog in ocfs2_global_read_info() nilfs2: remove duplicate 'unlikely()' usage nilfs2: fix potential oob read in nilfs_btree_check_delete() nilfs2: determine empty node blocks as corrupted nilfs2: fix potential null-ptr-deref in nilfs_btree_insert() user_namespace: use kmemdup_array() instead of kmemdup() for multiple allocation tools/mm: rm thp_swap_allocator_test when make clean squashfs: fix percpu address space issues in decompressor_multi_percpu.c lib: glob.c: added null check for character class nilfs2: refactor nilfs_segctor_thread() nilfs2: use kthread_create and kthread_stop for the log writer thread nilfs2: remove sc_timer_task nilfs2: do not repair reserved inode bitmap in nilfs_new_inode() nilfs2: eliminate the shared counter and spinlock for i_generation nilfs2: separate inode type information from i_state field nilfs2: use the BITS_PER_LONG macro ... |
||
Sebastian Andrzej Siewior
|
2638e4e6b1 |
riscv: Allow to enable PREEMPT_RT.
It is really time. riscv has all the required architecture related changes, that have been identified over time, in order to enable PREEMPT_RT. With the recent printk changes, the last known road block has been addressed. Allow to enable PREEMPT_RT on riscv. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nam Cao <namcao@linutronix.de> # Visionfive 2 Link: https://lore.kernel.org/all/20240906111841.562402-4-bigeasy@linutronix.de |
||
Palmer Dabbelt
|
9b2863e2cc
|
Merge patch series "riscv: select ARCH_USE_SYM_ANNOTATIONS"
Jisheng Zhang <jszhang@kernel.org> says:
commit
|
||
Palmer Dabbelt
|
f25170a053
|
Merge patch series "riscv: stacktrace: Add USER_STACKTRACE support"
Jinjie Ruan <ruanjinjie@huawei.com> says: Add RISC-V USER_STACKTRACE support, and fix the fp alignment bug in perf_callchain_user() by the way as Björn pointed out. * b4-shazam-merge: riscv: stacktrace: Add USER_STACKTRACE support riscv: Fix fp alignment bug in perf_callchain_user() Link: https://lore.kernel.org/r/20240708032847.2998158-1-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Jisheng Zhang
|
5c178472af
|
riscv: define ILLEGAL_POINTER_VALUE for 64bit
This is used in poison.h for poison pointer offset. Based on current
SV39, SV48 and SV57 vm layout, 0xdead000000000000 is a proper value
that is not mappable, this can avoid potentially turning an oops to
an expolit.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Fixes:
|
||
Jisheng Zhang
|
7c9d980e46
|
riscv: select ARCH_USE_SYM_ANNOTATIONS
Now, riscv has been converted to the new style SYM_ assembler annotations. So select ARCH_USE_SYM_ANNOTATIONS to ensure the deprecated macros such as ENTRY(), END(), WEAK() and so on are not available and we don't regress. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-By: Clément Léger <cleger@rivosinc.com> Link: https://lore.kernel.org/r/20240709160536.3690-3-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Jinjie Ruan
|
1a74833182
|
riscv: stacktrace: Add USER_STACKTRACE support
Currently, userstacktrace is unsupported for riscv. So use the perf_callchain_user() code as blueprint to implement the arch_stack_walk_user() which add userstacktrace support on riscv. Meanwhile, we can use arch_stack_walk_user() to simplify the implementation of perf_callchain_user(). A ftrace test case is shown as below: # cd /sys/kernel/debug/tracing # echo 1 > options/userstacktrace # echo 1 > options/sym-userobj # echo 1 > events/sched/sched_process_fork/enable # cat trace ...... bash-178 [000] ...1. 97.968395: sched_process_fork: comm=bash pid=178 child_comm=bash child_pid=231 bash-178 [000] ...1. 97.970075: <user stack trace> => /lib/libc.so.6[+0xb5090] Also a simple perf test is ok as below: # perf record -e cpu-clock --call-graph fp top # perf report --call-graph ..... [[31m 66.54%[[m 0.00% top [kernel.kallsyms] [k] ret_from_exception | ---ret_from_exception | |--[[31m58.97%[[m--do_trap_ecall_u | | | |--[[31m17.34%[[m--__riscv_sys_read | | ksys_read | | | | | --[[31m16.88%[[m--vfs_read | | | | | |--[[31m10.90%[[m--seq_read Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Tested-by: Jinjie Ruan <ruanjinjie@huawei.com> Cc: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240708032847.2998158-3-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Alice Ryhl
|
d077242d68 |
rust: support for shadow call stack sanitizer
Add all of the flags that are needed to support the shadow call stack (SCS) sanitizer with Rust, and updates Kconfig to allow only configurations that work. The -Zfixed-x18 flag is required to use SCS on arm64, and requires rustc version 1.80.0 or greater. This restriction is reflected in Kconfig. When CONFIG_DYNAMIC_SCS is enabled, the build will be configured to include unwind tables in the build artifacts. Dynamic SCS uses the unwind tables at boot to find all places that need to be patched. The -Cforce-unwind-tables=y flag ensures that unwind tables are available for Rust code. In non-dynamic mode, the -Zsanitizer=shadow-call-stack flag is what enables the SCS sanitizer. Using this flag requires rustc version 1.82.0 or greater on the targets used by Rust in the kernel. This restriction is reflected in Kconfig. It is possible to avoid the requirement of rustc 1.80.0 by using -Ctarget-feature=+reserve-x18 instead of -Zfixed-x18. However, this flag emits a warning during the build, so this patch does not add support for using it and instead requires 1.80.0 or greater. The dependency is placed on `select HAVE_RUST` to avoid a situation where enabling Rust silently turns off the sanitizer. Instead, turning on the sanitizer results in Rust being disabled. We generally do not want changes to CONFIG_RUST to result in any mitigations being changed or turned off. At the time of writing, rustc 1.82.0 only exists via the nightly release channel. There is a chance that the -Zsanitizer=shadow-call-stack flag will end up needing 1.83.0 instead, but I think it is small. Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20240829-shadow-call-stack-v7-1-2f62a4432abf@google.com [ Fixed indentation using spaces. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> |
||
Rafael J. Wysocki
|
45de40574f |
Merge branch 'acpi-riscv'
Merge ACPI and irqchip updates related to external interrupt controller support on RISC-V: - Add ACPI device enumeration support for interrupt controller probing including taking dependencies into account (Sunil V L). - Implement ACPI-based interrupt controller probing on RISC-V (Sunil V L). - Add ACPI support for AIA in riscv-intc and add ACPI support to riscv-imsic, riscv-aplic, and sifive-plic (Sunil V L). * acpi-riscv: irqchip/sifive-plic: Add ACPI support irqchip/riscv-aplic: Add ACPI support irqchip/riscv-imsic: Add ACPI support irqchip/riscv-imsic-state: Create separate function for DT irqchip/riscv-intc: Add ACPI support for AIA ACPI: RISC-V: Implement function to add implicit dependencies ACPI: RISC-V: Initialize GSI mapping structures ACPI: RISC-V: Implement function to reorder irqchip probe entries ACPI: RISC-V: Implement PCI related functionality ACPI: pci_link: Clear the dependencies after probe ACPI: bus: Add RINTC IRQ model for RISC-V ACPI: scan: Define weak function to populate dependencies ACPI: scan: Add RISC-V interrupt controllers to honor list ACPI: scan: Refactor dependency creation ACPI: bus: Add acpi_riscv_init() function ACPI: scan: Add a weak arch_sort_irqchip_probe() to order the IRQCHIP probe arm64: PCI: Migrate ACPI related functions to pci-acpi.c |
||
Anton Blanchard
|
5ba7a75a53
|
riscv: Fix toolchain vector detection
A recent change to gcc flags rv64iv as no longer valid:
cc1: sorry, unimplemented: Currently the 'V' implementation
requires the 'M' extension
and as a result vector support is disabled. Fix this by adding m
to our toolchain vector detection code.
Signed-off-by: Anton Blanchard <antonb@tenstorrent.com>
Fixes:
|
||
Lasse Collin
|
ab4ce9831a |
riscv: boot: add Image.xz support
The Image.* targets existed for other compressors already. Bootloader support is needed for decompression. This is for CONFIG_EFI_ZBOOT=n. With CONFIG_EFI_ZBOOT=y, XZ was already available. Comparision with Linux 6.10 RV64GC tinyconfig (in KiB): 1027 Image 594 Image.gz 541 Image.zst 510 Image.lzma 474 Image.xz Link: https://lkml.kernel.org/r/20240721133633.47721-17-lasse.collin@tukaani.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Jules Maselbas <jmaselbas@zdiv.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Joel Stanley <joel@jms.id.au> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Jubin Zhong <zhongjubin@huawei.com> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rui Li <me@lirui.org> Cc: Sam James <sam@gentoo.org> Cc: Simon Glass <sjg@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Sunil V L
|
01415e78cf |
ACPI: RISC-V: Implement PCI related functionality
Replace the dummy implementation for PCI related functions with actual implementation. This needs ECAM and MCFG CONFIG options to be enabled for RISC-V. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> Link: https://patch.msgid.link/20240812005929.113499-10-sunilvl@ventanamicro.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Jinjie Ruan
|
0e3f3649d4
|
riscv: Enable generic CPU vulnerabilites support
Currently x86, ARM and ARM64 support generic CPU vulnerabilites, but RISC-V not, such as: # cd /sys/devices/system/cpu/vulnerabilities/ x86: # cat spec_store_bypass Mitigation: Speculative Store Bypass disabled via prctl and seccomp # cat meltdown Not affected ARM64: # cat spec_store_bypass Mitigation: Speculative Store Bypass disabled via prctl and seccomp # cat meltdown Mitigation: PTI RISC-V: # cat /sys/devices/system/cpu/vulnerabilities # ... No such file or directory As SiFive RISC-V Core IP offerings are not affected by Meltdown and Spectre, it can use the default weak function as below: # cat spec_store_bypass Not affected # cat meltdown Not affected Link: https://www.sifive.cn/blog/sifive-statement-on-meltdown-and-spectre Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20240703022732.2068316-1-ruanjinjie@huawei.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Linus Torvalds
|
c9f33436d8 |
RISC-V Patches for the 6.11 Merge Window, Part 2
* Support for NUMA (via SRAT and SLIT), console output (via SPCR), and cache info (via PPTT) on ACPI-based systems. * The trap entry/exit code no longer breaks the return address stack predictor on many systems, which results in an improvement to trap latency. * Support for HAVE_ARCH_STACKLEAK. * The sv39 linear map has been extended to support 128GiB mappings. * The frequency of the mtime CSR is now visible via hwprobe. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmaj2EYTHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiVG3D/9kNHTI09iPDJd6fTChE3cpMxy7xXXE URX3Avu+gYsJmIbYyg4RnQ8FGFN7icKBCrQqs7JmLliU0NU+YMcCcjsJA2QaivbD VAlaex1qNcvNGteHrpbqhr3Zs4zw8GlBkB3KFTLyPAp61bybGo0a/A5ONJ7ScQIW RWHewAPgb86cQ0Q34JpO87TqvMM0KMvhQP5dip+olaFjLRBzhXmGFZfHqA80kTWl 0ytYclVCHZMtO/5mnQpuIOVs1IKw9L4wa0sivOQF0iLTqfKDFALa6yZsThHA/w3e JVuBAdQhcPZ3fgO2fUfJPlW16GmRC2/tdiFg5NFw8k4vo7DYBwX55ztPKXqDrJDM 8ah85IeLiPar/A/uHdn6bPjK+aGMuzklKF50r62XXAc2fL8mza1sdvKCVOy2EOLn JyGI9c/10KpvN/DW8g7hPefhvbx4+tCKkFcPqf++VQha6W8cQdCKi+Li0Pm8TTnp XPQjIvSlDDG1Pl4ofgBSFoyB8pkBXNzvv8NZp+YYtnqSOLAKaZuP+KwA8TwHdvGM pdCXcL3KHiLy4/pJWEoNTutD0mbJ7PUIb2P/KkjqYDgp4F1n0Hg+/aeSIp+7a4Pv yTBctIGxrlriQMIdtWCR8tyhcPP4pDpGYkW0K15EE16G0NK0fjD89LEXYqT6ae2R C0QgiwnVe/eopg== =zeUn -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and cache info (via PPTT) on ACPI-based systems. - The trap entry/exit code no longer breaks the return address stack predictor on many systems, which results in an improvement to trap latency. - Support for HAVE_ARCH_STACKLEAK. - The sv39 linear map has been extended to support 128GiB mappings. - The frequency of the mtime CSR is now visible via hwprobe. * tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits) RISC-V: Provide the frequency of time CSR via hwprobe riscv: Extend sv39 linear mapping max size to 128G riscv: enable HAVE_ARCH_STACKLEAK riscv: signal: Remove unlikely() from WARN_ON() condition riscv: Improve exception and system call latency RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() RISC-V: ACPI: Enable SPCR table for console output on RISC-V riscv: boot: remove duplicated targets line trace: riscv: Remove deprecated kprobe on ftrace support riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions RISC-V: run savedefconfig for defconfig RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init ACPI: NUMA: change the ACPI_NUMA to a hidden option ACPI: NUMA: Add handler for SRAT RINTC affinity structure ... |
||
Palmer Dabbelt
|
3aa1a7d013
|
Merge patch series "RISC-V: Select ACPI PPTT drivers"
This series adds support for ACPI PPTT via cacheinfo. * b4-shazam-merge: RISC-V: Select ACPI PPTT drivers riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init() Link: https://lore.kernel.org/r/20240617131425.7526-1-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
ec1dc56b54
|
Merge patch "Enable SPCR table for console output on RISC-V"
Sia Jee Heng <jeeheng.sia@starfivetech.com> says: The ACPI SPCR code has been used to enable console output for ARM64 and X86. The same code can be reused for RISC-V. Furthermore, SPCR table is mandated for headless system as outlined in the RISC-V BRS Specification, chapter 6. * b4-shazam-merge: RISC-V: ACPI: Enable SPCR table for console output on RISC-V Link: https://lore.kernel.org/r/20240502073751.102093-1-jeeheng.sia@starfivetech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Jisheng Zhang
|
b5db73fb18
|
riscv: enable HAVE_ARCH_STACKLEAK
Add support for the stackleak feature. Whenever the kernel returns to user space the kernel stack is filled with a poison value. At the same time, disables the plugin in EFI stub code because EFI stub is out of scope for the protection. Tested on qemu and milkv duo: / # echo STACKLEAK_ERASING > /sys/kernel/debug/provoke-crash/DIRECT [ 38.675575] lkdtm: Performing direct entry STACKLEAK_ERASING [ 38.678448] lkdtm: stackleak stack usage: [ 38.678448] high offset: 288 bytes [ 38.678448] current: 496 bytes [ 38.678448] lowest: 1328 bytes [ 38.678448] tracked: 1328 bytes [ 38.678448] untracked: 448 bytes [ 38.678448] poisoned: 14312 bytes [ 38.678448] low offset: 8 bytes [ 38.689887] lkdtm: OK: the rest of the thread stack is properly erased Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240623235316.2010-1-jszhang@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Yunhui Cui
|
66381d3677
|
RISC-V: Select ACPI PPTT drivers
After adding ACPI support to populate_cache_leaves(), RISC-V can build cacheinfo through the ACPI PPTT table, thus enabling the ACPI_PPTT configuration. Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20240617131425.7526-3-cuiyunhui@bytedance.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Sia Jee Heng
|
38738947db
|
RISC-V: ACPI: Enable SPCR table for console output on RISC-V
The ACPI SPCR code has been used to enable console output for ARM64 and X86. The same code can be reused for RISC-V. Furthermore, SPCR table is mandated for headless system as outlined in the RISC-V BRS Specification, chapter 6. Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Link: https://lore.kernel.org/r/20240502073751.102093-2-jeeheng.sia@starfivetech.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Jinjie Ruan
|
3308172276
|
trace: riscv: Remove deprecated kprobe on ftrace support
Since commit
|
||
Linus Torvalds
|
ca83c61cb3 |
Kbuild updates for v6.11
- Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_DEBUG_INFO_BTF and CONFIG_KALLSYMS - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmagBLUVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGmoUQAJ8pnURs0g+Rcyk6bdY/qtXBYkS+ nXpIK1ssFgRRgAQdeszYtvBqLFzb0wRCSie87G1AriD/JkVVTjCCY1For1y+vs0u a7HfxitHhZpPyZW/T+WMQ3LViNccpkx+DFAcoRH8xOY/XPEJKVUby332jOIXMuyg +NKIELQJVsLhcDofTUGb5VfIQektw219n5c4jKjXdNk4ZtE24xCRM5X528ZebwWJ RZhMvJ968PyIH1IRXvNt6dsKBxoGIwPP8IO6yW9hzHaNsBqt7MGSChSel7r1VKpk iwCNApJvEiVBe5wvTSVOVro7/8p/AZ70CQAqnMJV+dNnRqtGqW7NvL6XAjZRJgJJ Uxe5NSrXgQd3FtqfcbXLetBgp9zGVt328nHm1HXHR5rFsvoOiTvO7hHPbhA+OoWJ fs+jHzEXdAMRgsNrczPWU5Svq6MgGe4v8HBf0m8N1Uy65t/O+z9ti2QAw7kIFlbu /VSFNjw4CHmNxGhnH0khCMsy85FwVIt9Ux+2d6IEc0gP8S1Qa1HgHGAoVI4U51eS 9dxEPVJNPOugaIVHheuS3wimEO6wzaJcQHn4IXaasMA7P6Yo4G/jiGoy4cb9qPTM Hb+GaOltUy7vDoG4D2LSym8zR8rdKwbIf/5psdZrq/IWVKq5p+p7KWs3aOykSoM7 o6Hb532Ioalhm8je =BYu7 -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_KALLSYMS and CONFIG_DEBUG_INFO_BTF - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig * tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits) kbuild: doc: gcc to CC change kallsyms: change sym_entry::percpu_absolute to bool type kallsyms: unify seq and start_pos fields of struct sym_entry kallsyms: add more original symbol type/name in comment lines kallsyms: use \t instead of a tab in printf() kallsyms: avoid repeated calculation of array size for markers kbuild: add script and target to generate pacman package modpost: use generic macros for hash table implementation kbuild: move some helper headers from scripts/kconfig/ to scripts/include/ Makefile: add comment to discourage tools/* addition for kernel builds kbuild: clean up scripts/remove-stale-files kconfig: recursive checks drop file/lineno kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec kallsyms: get rid of code for absolute kallsyms kbuild: Create INSTALL_PATH directory if it does not exist kbuild: Abort make on install failures kconfig: remove 'e1' and 'e2' macros from expression deduplication kconfig: remove SYMBOL_CHOICEVAL flag kconfig: add const qualifiers to several function arguments kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups() ... |
||
Palmer Dabbelt
|
b9a603da42
|
Merge patch series "riscv: Separate vendor extensions from standard extensions"
Charlie Jenkins <charlie@rivosinc.com> says: All extensions, both standard and vendor, live in one struct "riscv_isa_ext". There is currently one vendor extension, xandespmu, but it is likely that more vendor extensions will be added to the kernel in the future. As more vendor extensions (and standard extensions) are added, riscv_isa_ext will become more bloated with a mix of vendor and standard extensions. This also allows each vendor to be conditionally enabled through Kconfig. * b4-shazam-merge: riscv: cpufeature: Extract common elements from extension checking riscv: Introduce vendor variants of extension helpers riscv: Add vendor extensions to /proc/cpuinfo riscv: Extend cpufeature.c to detect vendor extensions Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-0-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Charlie Jenkins
|
23c996fc2b
|
riscv: Extend cpufeature.c to detect vendor extensions
Instead of grouping all vendor extensions into the same riscv_isa_ext that standard instructions use, create a struct "riscv_isa_vendor_ext_data_list" that allows each vendor to maintain their vendor extensions independently of the standard extensions. xandespmu is currently the only vendor extension so that is the only extension that is affected by this change. An additional benefit of this is that the extensions of each vendor can be conditionally enabled. A config RISCV_ISA_VENDOR_EXT_ANDES has been added to allow for that. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andy Chiu <andy.chiu@sifive.com> Tested-by: Yu Chien Peter Lin <peterlin@andestech.com> Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com> Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-1-0af7587bbec0@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Linus Torvalds
|
f557af081d |
RISC-V Patches for the 6.11 Merge Window, Part 1
* Support for various new ISA extensions: * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector extension. * Zimop and Zcmop for may-be-operations. * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension. * Zawrs, * riscv,cpu-intc is now dtschema. * A handful of performance improvements and cleanups to text patching. * Support for memory hot{,un}plug * The highest user-allocatable virtual address is now visible in hwprobe. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmabIGETHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiQe8D/9QPCaOnoP5OCZbwjkRBwaVxyknNyD0 l+YNXk7Jk3B/oaOv3d7Bz+uWt1SG4j4jkfyuGJ81StZykp4/R7T823TZrPhog9VX IJm580MtvE49I2i1qJ+ZQti9wpiM+80lFnyMCzY6S7rrM9m62tKgUpARZcWoA55P iUo5bku99TYCcU2k1pnPrNSPQvVpECpv7tG0PwKpQd5DiYjbPp+aw5cQWN+izdOB 6raOZ0buzP7McszvO/gcJs+kuHwrp0JSRvNxc2pwYZ0lx00p3hSV8UdtIMlI9Qm/ z3gkQGHwc6UVMPHo1x0Gr5ShUTCI/iSwy4/7aY4NNXF6Sj99b8alt9GcbYqNAE7V k7sibCR7dhL4ods/GFMmzR7cQYlwlwtO+/ILak7rXhNvA32Xy1WUABguhP9ElTmw 1ZS2hnRv6wc7MA2V7HBamf5mPXM6HQyC3oKy3njzDSJdiGIG7aa+TOfRAD+L/1Du QjIrKp6XcPIsZNjh8H3nMDVJ0VvDNnS4d4LbfNQc23VPzf57kFUqbli1pS0hBjFT ELEItH9dgSx+T5Qebdy/QMC3RG8Yc1IUdw6VQ7Jny/uCCEZNq+VZ+bXxspMmswCp sUIyDplJTJfRt3G2OxK0b95x6oj8jbaJOQfv6PBF71dDBsChg8eXFVJ2NDrX4Bvr h2MPK7vGBtFz8w== =+ICi -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various new ISA extensions: * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector extension * Zimop and Zcmop for may-be-operations * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension * Zawrs - riscv,cpu-intc is now dtschema - A handful of performance improvements and cleanups to text patching - Support for memory hot{,un}plug - The highest user-allocatable virtual address is now visible in hwprobe * tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits) riscv: lib: relax assembly constraints in hweight riscv: set trap vector earlier KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' riscv: hwprobe: export highest virtual userspace address riscv: Improve sbi_ecall() code generation by reordering arguments riscv: Add tracepoints for SBI calls and returns riscv: Optimize crc32 with Zbc extension riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add ... |
||
Masahiro Yamada
|
b9d73218d7 |
treewide: change conditional prompt for choices to 'depends on'
While Documentation/kbuild/kconfig-language.rst provides a brief explanation, there are recurring confusions regarding the usage of a prompt followed by 'if <expr>'. This conditional controls _only_ the prompt. A typical usage is as follows: menuconfig BLOCK bool "Enable the block layer" if EXPERT default y When EXPERT=n, the prompt is hidden, but this config entry is still active, and BLOCK is set to its default value 'y'. This is reasonable because you are likely want to enable the block device support. When EXPERT=y, the prompt is shown, allowing you to toggle BLOCK. Please note that it is different from 'depends on EXPERT', which would enable and disable the entire config entry. However, this conditional prompt has never worked in a choice block. The following two work in the same way: when EXPERT is disabled, the choice block is entirely disabled. [Test Code 1] choice prompt "choose" if EXPERT config A bool "A" config B bool "B" endchoice [Test Code 2] choice prompt "choose" depends on EXPERT config A bool "A" config B bool "B" endchoice I believe the first case should hide only the prompt, producing the default: CONFIG_A=y # CONFIG_B is not set The next commit will change (fix) the behavior of the conditional prompt in choice blocks. I see several choice blocks wrongly using a conditional prompt, where 'depends on' makes more sense. To preserve the current behavior, this commit converts such misuses. I did not touch the following entry in arch/x86/Kconfig: choice prompt "Memory split" if EXPERT default VMSPLIT_3G This is truly the correct use of the conditional prompt; when EXPERT=n, this choice block should silently select the reasonable VMSPLIT_3G, although the resulting PAGE_OFFSET will not be affected anyway. Presumably, the one in fs/jffs2/Kconfig is also correct, but I converted it to 'depends on' to avoid any potential behavioral change. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Palmer Dabbelt
|
5ee121a393
|
Merge patch series "riscv: Apply Zawrs when available"
Andrew Jones <ajones@ventanamicro.com> says:
Zawrs provides two instructions (wrs.nto and wrs.sto), where both are
meant to allow the hart to enter a low-power state while waiting on a
store to a memory location. The instructions also both wait an
implementation-defined "short" duration (unless the implementation
terminates the stall for another reason). The difference is that while
wrs.sto will terminate when the duration elapses, wrs.nto, depending on
configuration, will either just keep waiting or an ILL exception will be
raised. Linux will use wrs.nto, so if platforms have an implementation
which falls in the "just keep waiting" category (which is not expected),
then it should _not_ advertise Zawrs in the hardware description.
Like wfi (and with the same {m,h}status bits to configure it), when
wrs.nto is configured to raise exceptions it's expected that the higher
privilege level will see the instruction was a wait instruction, do
something, and then resume execution following the instruction. For
example, KVM does configure exceptions for wfi (hstatus.VTW=1) and
therefore also for wrs.nto. KVM does this for wfi since it's better to
allow other tasks to be scheduled while a VCPU waits for an interrupt.
For waits such as those where wrs.nto/sto would be used, which are
typically locks, it is also a good idea for KVM to be involved, as it
can attempt to schedule the lock holding VCPU.
This series starts with Christoph's addition of the riscv
smp_cond_load_relaxed function which applies wrs.sto when available.
That patch has been reworked to use wrs.nto and to use the same approach
as Arm for the wait loop, since we can't have arbitrary C code between
the load-reserved and the wrs. Then, hwprobe support is added (since the
instructions are also usable from usermode), and finally KVM is
taught about wrs.nto, allowing guests to see and use the Zawrs
extension.
We still don't have test results from hardware, and it's not possible to
prove that using Zawrs is a win when testing on QEMU, not even when
oversubscribing VCPUs to guests. However, it is possible to use KVM
selftests to force a scenario where we can prove Zawrs does its job and
does it well. [4] is a test which does this and, on my machine, without
Zawrs it takes 16 seconds to complete and with Zawrs it takes 0.25
seconds.
This series is also available here [1]. In order to use QEMU for testing
a build with [2] is needed. In order to enable guests to use Zawrs with
KVM using kvmtool, the branch at [3] may be used.
[1] https://github.com/jones-drew/linux/commits/riscv/zawrs-v3/
[2] https://lore.kernel.org/all/20240312152901.512001-2-ajones@ventanamicro.com/
[3] https://github.com/jones-drew/kvmtool/commits/riscv/zawrs/
[4]
|
||
Christoph Müllner
|
b8ddb0df30
|
riscv: Add Zawrs support for spinlocks
RISC-V code uses the generic ticket lock implementation, which calls the macros smp_cond_load_relaxed() and smp_cond_load_acquire(). Introduce a RISC-V specific implementation of smp_cond_load_relaxed() which applies WRS.NTO of the Zawrs extension in order to reduce power consumption while waiting and allows hypervisors to enable guests to trap while waiting. smp_cond_load_acquire() doesn't need a RISC-V specific implementation as the generic implementation is based on smp_cond_load_relaxed() and smp_acquire__after_ctrl_dep() sufficiently provides the acquire semantics. This implementation is heavily based on Arm's approach which is the approach Andrea Parri also suggested. The Zawrs specification can be found here: https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Co-developed-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240426100820.14762-11-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Andrew Jones
|
6da111574b
|
riscv: Provide a definition for 'pause'
If we're going to provide the encoding for 'pause' in cpu_relax() anyway, then we can drop the toolchain checks and just always use it. The advantage of doing this is that other code that need pause don't need to also define it (yes, another use is coming). Add the definition to insn-def.h since it's an instruction definition and also because insn-def.h doesn't include much, so it's safe to include from asm/vdso/processor.h without concern for circular dependencies. Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240426100820.14762-9-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Xiao Wang
|
a43fe27d65
|
riscv: Optimize crc32 with Zbc extension
As suggested by the B-ext spec, the Zbc (carry-less multiplication) instructions can be used to accelerate CRC calculations. Currently, the crc32 is the most widely used crc function inside kernel, so this patch focuses on the optimization of just the crc32 APIs. Compared with the current table-lookup based optimization, Zbc based optimization can also achieve large stride during CRC calculation loop, meantime, it avoids the memory access latency of the table-lookup based implementation and it reduces memory footprint. If Zbc feature is not supported in a runtime environment, then the table-lookup based implementation would serve as fallback via alternative mechanism. By inspecting the vmlinux built by gcc v12.2.0 with default optimization level (-O2), we can see below instruction count change for each 8-byte stride in the CRC32 loop: rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13) rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16) The compile target CPU is little endian, extra effort is needed for byte swapping for the crc32_be API, thus, the instruction count change is not as significant as that in the *_le cases. This patch is tested on QEMU VM with the kernel CRC32 selftest for both rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1) with Zbc extension shows 65% and 125% performance improvement respectively on crc32_test() and crc32c_test(). Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Palmer Dabbelt
|
60a6707f58
|
Merge patch series "riscv: Memory Hot(Un)Plug support"
Björn Töpel <bjorn@kernel.org> says: From: Björn Töpel <bjorn@rivosinc.com> ================================================================ Memory Hot(Un)Plug support (and ZONE_DEVICE) for the RISC-V port ================================================================ Introduction ============ To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory hot(un)plug allows for increasing and decreasing the size of physical memory available to a machine at runtime." This series adds memory hot(un)plugging, and ZONE_DEVICE support for the RISC-V Linux port. MM configuration ================ RISC-V MM has the following configuration: * Memory blocks are 128M, analogous to x86-64. It uses PMD ("hugepage") vmemmaps. From that follows that 2M (PMD) worth of vmemmap spans 32768 pages á 4K which gets us 128M. * The pageblock size is the minimum minimum virtio_mem size, and on RISC-V it's 2M (2^9 * 4K). Implementation ============== The PGD table on RISC-V is shared/copied between for all processes. To avoid doing page table synchronization, the first patch (patch 1) pre-allocated the PGD entries for vmemmap/direct map. By doing that the init_mm PGD will be fixed at kernel init, and synchronization can be avoided all together. The following two patches (patch 2-3) does some preparations, followed by the actual MHP implementation (patch 4-5). Then, MHP and virtio-mem are enabled (patch 6-7), and finally ZONE_DEVICE support is added (patch 8). MHP and locking =============== TL;DR: The MHP does not step on any toes, except for ptdump. Additional locking is required for ptdump. Long version: For v2 I spent some time digging into init_mm synchronization/update. Here are my findings, and I'd love them to be corrected if incorrect. It's been a gnarly path... The `init_mm` structure is a special mm (perhaps not a "real" one). It's a "lazy context" that tracks kernel page table resources, e.g., the kernel page table (swapper_pg_dir), a kernel page_table_lock (more about the usage below), mmap_lock, and such. `init_mm` does not track/contain any VMAs. Having the `init_mm` is convenient, so that the regular kernel page table walk/modify functions can be used. Now, `init_mm` being special means that the locking for kernel page tables are special as well. On RISC-V the PGD (top-level page table structure), similar to x86, is shared (copied) with user processes. If the kernel PGD is modified, it has to be synched to user-mode processes PGDs. This is avoided by pre-populating the PGD, so it'll be fixed from boot. The in-kernel pgd regions are documented in `Documentation/arch/riscv/vm-layout.rst`. The distinct regions are: * vmemmap * vmalloc/ioremap space * direct mapping of all physical memory * kasan * modules, BPF * kernel Memory hotplug is the process of adding/removing memory to/from the kernel. Adding is done in two phases: 1. Add the memory to the kernel 2. Online memory, making it available to the page allocator. Step 1 is partially architecture dependent, and updates the init_mm page table: * Update the direct map page tables. The direct map is a linear map, representing all physical memory: `virt = phys + PAGE_OFFSET` * Add a `struct page` for each added page of memory. Update the vmemmap (virtual mapping to the `struct page`, so we can easily transform a kernel virtual address to a `struct page *` address. From an MHP perspective, there are two regions of the PGD that are updated: * vmemmap * direct mapping of all physical memory The `struct mm_struct` has a couple of locks in play: * `spinlock_t page_table_lock` protects the page table, and some counters * `struct rw_semaphore mmap_lock` protect an mm's VMAs Note again that `init_mm` does not contain any VMAs, but still uses the mmap_lock in some places. The `page_table_lock` was originally used to to protect all pages tables, but more recently a split page table lock has been introduced. The split lock has a per-table lock for the PTE and PMD tables. If split lock is disabled, all tables are guarded by `mm->page_table_lock` (for user processes). Split page table locks are not used for init_mm. MHP operations is typically synchronized using `DEFINE_STATIC_PERCPU_RWSEM(mem_hotplug_lock)`. Actors ------ The following non-MHP actors in the kernel traverses (read), and/or modifies the kernel PGD. * `ptdump` Walks the entire `init_mm`, via `ptdump_walk_pgd()` with the `mmap_write_lock(init_mm)` taken. Observation: ptdump can race with MHP, and needs additional locking to avoid crashes/races. * `set_direct_*` / `arch/riscv/mm/pageattr.c` The `set_direct_*` functionality is used to "synchronize" the direct map to other kernel mappings, e.g. modules/kernel text. The direct map is using "as large huge table mappings as possible", which means that the `set_direct_*` might need to split the direct map. The `set_direct_*` functions operates with the `mmap_write_lock(init_mm)` taken. Observation: `set_direct_*` uses the direct map, but will never modify the same entry as MHP. If there is a mapping, that entry will never race with MHP. Further, MHP acts when memory is offline. * HVO / `mm/hugetlb_vmemmap` HVO optimizes the backing `struct page` for hugetlb pages, which means changing the "vmemmap" region. HVO can split (merge?) a vmemmap pmd. However, it will never race with MHP, since HVO only operates at online memory. HVO cannot touch memory being MHP added or removed. * `apply_to_page_range` Walks a range, creates pages and applies a callback (setting permissions) for the page. When creating a table, it might use `int __pte_alloc_kernel(pmd_t *pmd)` which takes the `init_mm.page_table_lock` to synchronize pmd populate. Used by: `mm/vmalloc.c` and `mm/kasan/shadow.c`. The KASAN callback takes the `init_mm.page_table_lock` to synchronize pte creation. Observations: `apply_to_page_range` applies to the "vmalloc/ioremap space" region, and "kasan" region. *Not* affected by MHP. * `apply_to_existing_page_range` Walks a range, applies a callback (setting permissions) for the page (no page creation). Used by: `kernel/bpf/arena.c` and `mm/kasan/shadow.c`. The KASAN callback takes the `init_mm.page_table_lock` to synchronize pte creation. *Not* affected by MHP regions. * `apply_to_existing_page_range` applies to the "vmalloc/ioremap space" region, and "kasan" region. *Not* affected by MHP regions. * `ioremap_page_range` and `vmap_page_range` Uses the same internal function, and might create table entries at the "vmalloc/ioremap space" region. Can call `__pte_alloc_kernel()` which takes the `init_mm.page_table_lock` synchronizing pmd populate in the region. *Not* affected by MHP regions. Summary: * MHP add will never modify the same page table entries, as any of the other actors. * MHP remove is done when memory is offlined, and will not clash with any of the actors. * Functions that walk the entire kernel page table need synchronization * It's sufficient to add the MHP lock ptdump. Testing ======= This series adds basic DT supported hotplugging. There is a QEMU series enabling MHP for the RISC-V "virt" machine here: [1] ACPI/MSI support is still in the making for RISC-V, and prior proper (ACPI) PCI MSI support lands [2] and NUMA SRAT support [3], it hard to try it out. I've prepared a QEMU branch with proper ACPI GED/PC-DIMM support [4], and a this series with the required prerequisites [5] (AIA, ACPI AIA MADT, ACPI NUMA SRAT). To test with virtio-mem, e.g.: | qemu-system-riscv64 \ | -machine virt,aia=aplic-imsic \ | -cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \ | -nodefaults \ | -nographic -smp 8 -kernel rv64-u-boot.bin \ | -drive file=rootfs.img,format=raw,if=virtio \ | -device virtio-rng-pci \ | -m 16G,slots=3,maxmem=32G \ | -object memory-backend-ram,id=mem0,size=16G \ | -numa node,nodeid=0,memdev=mem0 \ | -serial chardev:char0 \ | -mon chardev=char0,mode=readline \ | -chardev stdio,mux=on,id=char0 \ | -device pci-serial,id=serial0,chardev=char0 \ | -object memory-backend-ram,id=vmem0,size=2G \ | -device virtio-mem-pci,id=vm0,memdev=vmem0,node=0 where "rv64-u-boot.bin" is U-boot with EFI/ACPI-support (use [6] if you're lazy). In the QEMU monitor: | (qemu) info memory-devices | (qemu) qom-set vm0 requested-size 1G ...to test DAX/KMEM, use the follow QEMU parameters: | -object memory-backend-file,id=mem1,share=on,mem-path=virtio_pmem.img,size=4G \ | -device virtio-pmem-pci,memdev=mem1,id=nv1 and the regular ndctl/daxctl dance. If you're brave to try the ACPI branch, add "acpi=on" to "-machine virt", and test PC-DIMM MHP (in addition to virtio-{p},mem): In the QEMU monitor: | (qemu) object_add memory-backend-ram,id=mem1,size=1G | (qemu) device_add pc-dimm,id=dimm1,memdev=mem1 You can also try hot-remove with some QEMU options, say: | -object memory-backend-file,id=mem-1,size=256M,mem-path=/pagesize-2MB | -device pc-dimm,id=mem1,memdev=mem-1 | -object memory-backend-file,id=mem-2,size=1G,mem-path=/pagesize-1GB | -device pc-dimm,id=mem2,memdev=mem-2 | -object memory-backend-file,id=mem-3,size=256M,mem-path=/pagesize-2MB | -device pc-dimm,id=mem3,memdev=mem-3 Remove "acpi=on" to run with DT. Thanks to Alex, Andrew, David, and Oscar for all comments/tests/fixups. References ========== [1] https://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/ [2] https://lore.kernel.org/linux-riscv/20240501121742.1215792-1-sunilvl@ventanamicro.com/ [3] https://lore.kernel.org/linux-riscv/cover.1713778236.git.haibo1.xu@intel.com/ [4] https://github.com/bjoto/qemu/commits/virtio-mem-pc-dimm-mhp-acpi-v2/ [5] https://github.com/bjoto/linux/commits/mhp-v4-acpi [6] https://github.com/bjoto/riscv-rootfs-utils/tree/acpi * b4-shazam-merge: riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add riscv: mm: Change attribute from __init to __meminit for page functions riscv: mm: Pre-allocate vmemmap/direct map/kasan PGD entries riscv: mm: Properly forward vmemmap_populate() altmap parameter Link: https://lore.kernel.org/r/20240605114100.315918-1-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Björn Töpel
|
4705c1571a
|
riscv: Enable DAX VMEMMAP optimization
Now that DAX is usable, enable the DAX VMEMMAP optimization as well. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-12-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Björn Töpel
|
216e04bf1e
|
riscv: mm: Add support for ZONE_DEVICE
ZONE_DEVICE pages need DEVMAP PTEs support to function (ARCH_HAS_PTE_DEVMAP). Claim another RSW (reserved for software) bit in the PTE for DEVMAP mark, add the corresponding helpers, and enable ARCH_HAS_PTE_DEVMAP for riscv64. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-11-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Björn Töpel
|
f8c2a24055
|
riscv: Enable memory hotplugging for RISC-V
Enable ARCH_ENABLE_MEMORY_HOTPLUG and ARCH_ENABLE_MEMORY_HOTREMOVE for RISC-V. Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Link: https://lore.kernel.org/r/20240605114100.315918-9-bjorn@kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Haibo Xu
|
d6ecd18893
|
riscv: dmi: Add SMBIOS/DMI support
Enable the dmi driver for riscv which would allow access the SMBIOS info through some userspace file(/sys/firmware/dmi/*). The change was based on that of arm64 and has been verified by dmidecode tool. Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Atish Patra <atishp@rivosinc.com> Link: https://lore.kernel.org/r/20240613065507.287577-1-haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> |
||
Jakub Kicinski
|
62b5bf58b9 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/pensando/ionic/ionic_txrx.c |
||
Jakub Kicinski
|
e19de2064f |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_classifier.c |
||
Nam Cao
|
7bed516174
|
riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel
HAVE_ARCH_HUGE_VMAP also works on XIP kernel, so remove its dependency on
!XIP_KERNEL.
This also fixes a boot problem for XIP kernel introduced by the commit in
"Fixes:". This commit used huge page mapping for vmemmap, but huge page
vmap was not enabled for XIP kernel.
Fixes:
|
||
Jakub Kicinski
|
4b3529edbb |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZlWtmQAKCRDbK58LschI g0TUAQDT76jx7Rq1DShCtZ3eqiBMNkYczK8b+GqNsSG8YGduaAEA1jn/GN+H65Rh atQZ/pYAfLZflMV04+XE0GyBr5q1uQg= =NczG -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-05-28 We've added 23 non-merge commits during the last 11 day(s) which contain a total of 45 files changed, 696 insertions(+), 277 deletions(-). The main changes are: 1) Rename skb's mono_delivery_time to tstamp_type for extensibility and add SKB_CLOCK_TAI type support to bpf_skb_set_tstamp(), from Abhishek Chauhan. 2) Add netfilter CT zone ID and direction to bpf_ct_opts so that arbitrary CT zones can be used from XDP/tc BPF netfilter CT helper functions, from Brad Cowie. 3) Several tweaks to the instruction-set.rst IETF doc to address the Last Call review comments, from Dave Thaler. 4) Small batch of riscv64 BPF JIT optimizations in order to emit more compressed instructions to the JITed image for better icache efficiency, from Xiao Wang. 5) Sort bpftool C dump output from BTF, aiming to simplify vmlinux.h diffing and forcing more natural type definitions ordering, from Mykyta Yatsenko. 6) Use DEV_STATS_INC() macro in BPF redirect helpers to silence a syzbot/KCSAN race report for the tx_errors counter, from Jiang Yunshui. 7) Un-constify bpf_func_info in bpftool to fix compilation with LLVM 17+ which started treating const structs as constants and thus breaking full BTF program name resolution, from Ivan Babrou. 8) Fix up BPF program numbers in test_sockmap selftest in order to reduce some of the test-internal array sizes, from Geliang Tang. 9) Small cleanup in Makefile.btf script to use test-ge check for v1.25-only pahole, from Alan Maguire. 10) Fix bpftool's make dependencies for vmlinux.h in order to avoid needless rebuilds in some corner cases, from Artem Savkov. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) bpf, net: Use DEV_STAT_INC() bpf, docs: Fix instruction.rst indentation bpf, docs: Clarify call local offset bpf, docs: Add table captions bpf, docs: clarify sign extension of 64-bit use of 32-bit imm bpf, docs: Use RFC 2119 language for ISA requirements bpf, docs: Move sentence about returning R0 to abi.rst bpf: constify member bpf_sysctl_kern:: Table riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT riscv, bpf: Use STACK_ALIGN macro for size rounding up riscv, bpf: Optimize zextw insn with Zba extension selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets net: Add additional bit to support clockid_t timestamp type net: Rename mono_delivery_time to tstamp_type for scalabilty selftests/bpf: Update tests for new ct zone opts for nf_conntrack kfuncs net: netfilter: Make ct zone opts configurable for bpf ct helpers selftests/bpf: Fix prog numbers in test_sockmap bpf: Remove unused variable "prev_state" bpftool: Un-const bpf_func_info to fix it for llvm 17 and newer bpf: Fix order of args in call to bpf_map_kvcalloc ... ==================== Link: https://lore.kernel.org/r/20240528105924.30905-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Linus Torvalds
|
f1f9984fdc |
RISC-V Patches for the 6.10 Merge Window, Part 2
* The compression format used for boot images is now configurable at build time, and these formats are shown in `make help`. * access_ok() has been optimized. * A pair of performance bugs have been fixed in the uaccess handlers. * Various fixes and cleanups, including one for the IMSIC build failure and one for the early-boot ftrace illegal NOPs bug. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmZQtRwTHHBhbG1lckBk YWJiZWx0LmNvbQAKCRAuExnzX7sYiWPBD/0UitwMg88m6urvMd0Pfvwwbu/OnGqW TZT8C55iJi/e5f9K4mBrSyjATI8z/MblD+Zz0adX8ygavS4JuQ7DoWwb1yTT3pww +z74FkWeJuiar+HfbhQ602CfMrnzvWjnyJ3URemqy5pIBKyvD9gGkDJDZwf8hJTk Vqh5qVtnBqFBO9kWpIx+/pLCfpyHVNkhWr1AzKfoqQ1WPIpZ/o0IGdvS88rL+EBR QOXiwVhEsRfC+LT6Jhn8l2bGp7PaSRVOid19OxNsJKpAhpL6AOscaafclVrLBuTd gkys0rT2dHdoWTAkPHQpvlOI6OmGTgopxo5pUKJHS8J9VRoBun25zC1FGBF8uyVd 05CabWPnh7olNsRge9XiNj3x8PXjGVi7X7wUbRgOBG5aDc6TbKdxu37J0tXe0M7a Q74ctQvk8Nk6bQWirgTNlfJJHzL5pJbKc9VwY5uGX4qTmH+yEvCIt45ZXgXOuS/F eqijStkkdXUDnkMdcpaZJvXP80rHcgfP8bqevvPymRli8ER9zj9aXJQ3rmCUcPz+ EtbyS+vOEN31wNTA1EQlfIRxfvr22x7r70DDdRwmhuD1W1tgfblm+R0Cq76I5rnJ VSgXKq1b4mY0eautqXEnPGyqb7H8iJIq7AoyfbzzWN+4u6yVEUvpDKueeksy+fFt sGNtjWqGhWyKXg== =/Qtt -----END PGP SIGNATURE----- Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull more RISC-V updates from Palmer Dabbelt: - The compression format used for boot images is now configurable at build time, and these formats are shown in `make help` - access_ok() has been optimized - A pair of performance bugs have been fixed in the uaccess handlers - Various fixes and cleanups, including one for the IMSIC build failure and one for the early-boot ftrace illegal NOPs bug * tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix early ftrace nop patching irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict riscv: selftests: Add signal handling vector tests riscv: mm: accelerate pagefault when badaccess riscv: uaccess: Relax the threshold for fast path riscv: uaccess: Allow the last potential unrolled copy riscv: typo in comment for get_f64_reg Use bool value in set_cpu_online() riscv: selftests: Add hwprobe binaries to .gitignore riscv: stacktrace: fixed walk_stackframe() ftrace: riscv: move from REGS to ARGS riscv: do not select MODULE_SECTIONS by default riscv: show help string for riscv-specific targets riscv: make image compression configurable riscv: cpufeature: Fix extension subset checking riscv: cpufeature: Fix thead vector hwcap removal riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled riscv: Define TASK_SIZE_MAX for __access_ok() riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN |
||
Xiao Wang
|
c12603e76e |
riscv, bpf: Optimize zextw insn with Zba extension
The Zba extension provides add.uw insn which can be used to implement zext.w with rs2 set as ZERO. Signed-off-by: Xiao Wang <xiao.w.wang@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Pu Lehui <pulehui@huawei.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/bpf/20240516090430.493122-1-xiao.w.wang@intel.com |
||
Linus Torvalds
|
c760b3725e |
- A series ("kbuild: enable more warnings by default") from Arnd
Bergmann which enables a number of additional build-time warnings. We fixed all the fallout which we could find, there may still be a few stragglers. - Samuel Holland has developed the series "Unified cross-architecture kernel-mode FPU API". This does a lot of consolidation of per-architecture kernel-mode FPU usage and enables the use of newer AMD GPUs on RISC-V. - Tao Su has fixed some selftests build warnings in the series "Selftests: Fix compilation warnings due to missing _GNU_SOURCE definition". - This pull also includes a nilfs2 fixup from Ryusuke Konishi. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZk6OSAAKCRDdBJ7gKXxA jpTGAP9hQaZ+g7CO38hKQAtEI8rwcZJtvUAP84pZEGMjYMGLxQD/S8z1o7UHx61j DUbnunbOkU/UcPx3Fs/gp4KcJARMEgs= =EPi9 -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more non-mm updates from Andrew Morton: - A series ("kbuild: enable more warnings by default") from Arnd Bergmann which enables a number of additional build-time warnings. We fixed all the fallout which we could find, there may still be a few stragglers. - Samuel Holland has developed the series "Unified cross-architecture kernel-mode FPU API". This does a lot of consolidation of per-architecture kernel-mode FPU usage and enables the use of newer AMD GPUs on RISC-V. - Tao Su has fixed some selftests build warnings in the series "Selftests: Fix compilation warnings due to missing _GNU_SOURCE definition". - This pull also includes a nilfs2 fixup from Ryusuke Konishi. * tag 'mm-nonmm-stable-2024-05-22-17-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) nilfs2: make block erasure safe in nilfs_finish_roll_forward() selftests/harness: use 1024 in place of LINE_MAX Revert "selftests/harness: remove use of LINE_MAX" selftests/fpu: allow building on other architectures selftests/fpu: move FP code to a separate translation unit drm/amd/display: use ARCH_HAS_KERNEL_FPU_SUPPORT drm/amd/display: only use hard-float, not altivec on powerpc riscv: add support for kernel-mode FPU x86: implement ARCH_HAS_KERNEL_FPU_SUPPORT powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT LoongArch: implement ARCH_HAS_KERNEL_FPU_SUPPORT lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS arm64: crypto: use CC_FLAGS_FPU for NEON CFLAGS arm64: implement ARCH_HAS_KERNEL_FPU_SUPPORT ARM: crypto: use CC_FLAGS_FPU for NEON CFLAGS ARM: implement ARCH_HAS_KERNEL_FPU_SUPPORT arch: add ARCH_HAS_KERNEL_FPU_SUPPORT x86/fpu: fix asm/fpu/types.h include guard kbuild: enable -Wcast-function-type-strict unconditionally kbuild: enable -Wformat-truncation on clang ... |