Commit Graph

385 Commits

Author SHA1 Message Date
Palmer Dabbelt
782aefb177
Merge patch series "riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION"
Jisheng Zhang <jszhang@kernel.org> says:

When trying to run linux with various opensource riscv core on
resource limited FPGA platforms, for example, those FPGAs with less
than 16MB SDRAM, I want to save mem as much as possible. One of the
major technologies is kernel size optimizations, I found that riscv
does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which
passes -fdata-sections, -ffunction-sections to CFLAGS and passes the
--gc-sections flag to the linker.

This not only benefits my case on FPGA but also benefits defconfigs.
Here are some notable improvements from enabling this with defconfigs:

nommu_k210_defconfig:
   text    data     bss     dec     hex
1112009  410288   59837 1582134  182436     before
 962838  376656   51285 1390779  1538bb     after

rv32_defconfig:
   text    data     bss     dec     hex
8804455 2816544  290577 11911576 b5c198     before
8692295 2779872  288977 11761144 b375f8     after

defconfig:
   text    data     bss     dec     hex
9438267 3391332  485333 13314932 cb2b74     before
9285914 3350052  483349 13119315 c82f53     after

patch1 and patch2 are clean ups.
patch3 fixes a typo.
patch4 finally enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for riscv.

* b4-shazam-merge:
  riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
  riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
  vmlinux.lds.h: use correct .init.data.* section name
  riscv: vmlinux-xip.lds.S: remove .alternative section
  riscv: move options to keep entries sorted
  riscv: Fix orphan section warnings caused by kernel/pi

Link: https://lore.kernel.org/r/20230523165502.2592-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-07-01 07:38:19 -07:00
Linus Torvalds
533925cb76 RISC-V Patches for the 6.5 Merge Window, Part 1
* Support for ACPI.
 * Various cleanups to the ISA string parsing, including making them
   case-insensitive
 * Support for the vector extension.
 * Support for independent irq/softirq stacks.
 * Our CPU DT binding now has "unevaluatedProperties: false"
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmSe70ATHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWNPD/0ZfSdQ0A/gMVOzAD4zFKPEqQ6ffW2V
 Zy6Jo7UDNqKsiai7QA4XB1uyYIv/y1yUKJ0oeBVcA9Nzyq+TW9QDcApDBTabxAUI
 agY19YKw6VVZ+p7I9sMsf6EbdJdkNfSAzcQACPxb4ScEoaf9X+oAK5qgXuRuWluh
 qQuVkkJlgWc/t1cuUkrRdJmHQYvjP3zL7z4o344q2IVpXJkNNu0GeP+HbF8BYKcA
 +I/TTA5JY3kCIaxkpF2rU6pE6T5T9xrPmRYZ7bZoPUPnbL+M8As/jx3ym52Y4WGp
 kf8pgkxixOjU64kVJOH66CA8GaOiaAH/ptjQb0ZmCaGrHhr7aOT9HrkX4rU1lS8T
 stPphfM4gGPcCoPgRqSl+mEhBzjII8maOBLtbricAoQi6efRq8fzoOGaif/QpCbc
 6n0LGS4nQPGVyD3rAPfHxxfrlGJR+SsgyDvjZoDhqauFglims14GnK+eBeO8zrui
 Aj/uuAS63VIYprJWC1NOBJlU2WKZiOGhCANpZ6W6SH21PYn2WjsVILqaGh+WN8ZO
 KOHxZNaN8fQag0Yg7oNAUb7l6S0DHYtJIksFnFW2Rf2+VT58RAMYRQbpbhr7Tqr+
 jLgIR8PkFrBERHE49IqLGhAxGDnNzAUysMRw9pIk7WIre2Jt4wPqUdl+ee+5ErIX
 jiYfSFZw9q28UA==
 =Fpq8
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for ACPI

 - Various cleanups to the ISA string parsing, including making them
   case-insensitive

 - Support for the vector extension

 - Support for independent irq/softirq stacks

 - Our CPU DT binding now has "unevaluatedProperties: false"

* tag 'riscv-for-linus-6.5-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (78 commits)
  riscv: hibernate: remove WARN_ON in save_processor_state
  dt-bindings: riscv: cpus: switch to unevaluatedProperties: false
  dt-bindings: riscv: cpus: add a ref the common cpu schema
  riscv: stack: Add config of thread stack size
  riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
  riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK
  RISC-V: always report presence of extensions formerly part of the base ISA
  dt-bindings: riscv: explicitly mention assumption of Zicntr & Zihpm support
  RISC-V: remove decrement/increment dance in ISA string parser
  RISC-V: rework comments in ISA string parser
  RISC-V: validate riscv,isa at boot, not during ISA string parsing
  RISC-V: split early & late of_node to hartid mapping
  RISC-V: simplify register width check in ISA string parsing
  perf: RISC-V: Limit the number of counters returned from SBI
  riscv: replace deprecated scall with ecall
  riscv: uprobes: Restore thread.bad_cause
  riscv: mm: try VMA lock-based page fault handling first
  riscv: mm: Pre-allocate PGD entries for vmalloc/modules area
  RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
  RISC-V: Track ISA extensions per hart
  ...
2023-06-30 09:37:26 -07:00
Linus Torvalds
9471f1f2f5 Merge branch 'expand-stack'
This modifies our user mode stack expansion code to always take the
mmap_lock for writing before modifying the VM layout.

It's actually something we always technically should have done, but
because we didn't strictly need it, we were being lazy ("opportunistic"
sounds so much better, doesn't it?) about things, and had this hack in
place where we would extend the stack vma in-place without doing the
proper locking.

And it worked fine.  We just needed to change vm_start (or, in the case
of grow-up stacks, vm_end) and together with some special ad-hoc locking
using the anon_vma lock and the mm->page_table_lock, it all was fairly
straightforward.

That is, it was all fine until Ruihan Li pointed out that now that the
vma layout uses the maple tree code, we *really* don't just change
vm_start and vm_end any more, and the locking really is broken.  Oops.

It's not actually all _that_ horrible to fix this once and for all, and
do proper locking, but it's a bit painful.  We have basically three
different cases of stack expansion, and they all work just a bit
differently:

 - the common and obvious case is the page fault handling. It's actually
   fairly simple and straightforward, except for the fact that we have
   something like 24 different versions of it, and you end up in a maze
   of twisty little passages, all alike.

 - the simplest case is the execve() code that creates a new stack.
   There are no real locking concerns because it's all in a private new
   VM that hasn't been exposed to anybody, but lockdep still can end up
   unhappy if you get it wrong.

 - and finally, we have GUP and page pinning, which shouldn't really be
   expanding the stack in the first place, but in addition to execve()
   we also use it for ptrace(). And debuggers do want to possibly access
   memory under the stack pointer and thus need to be able to expand the
   stack as a special case.

None of these cases are exactly complicated, but the page fault case in
particular is just repeated slightly differently many many times.  And
ia64 in particular has a fairly complicated situation where you can have
both a regular grow-down stack _and_ a special grow-up stack for the
register backing store.

So to make this slightly more manageable, the bulk of this series is to
first create a helper function for the most common page fault case, and
convert all the straightforward architectures to it.

Thus the new 'lock_mm_and_find_vma()' helper function, which ends up
being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa.  So we not only convert more
than half the architectures, we now have more shared code and avoid some
of those twisty little passages.

And largely due to this common helper function, the full diffstat of
this series ends up deleting more lines than it adds.

That still leaves eight architectures (ia64, m68k, microblaze, openrisc,
parisc, s390, sparc64 and um) that end up doing 'expand_stack()'
manually because they are doing something slightly different from the
normal pattern.  Along with the couple of special cases in execve() and
GUP.

So there's a couple of patches that first create 'locked' helper
versions of the stack expansion functions, so that there's a obvious
path forward in the conversion.  The execve() case is then actually
pretty simple, and is a nice cleanup from our old "grow-up stackls are
special, because at execve time even they grow down".

The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because
it's just more straightforward to write out the stack expansion there
manually, instead od having get_user_pages_remote() do it for us in some
situations but not others and have to worry about locking rules for GUP.

And the final step is then to just convert the remaining odd cases to a
new world order where 'expand_stack()' is called with the mmap_lock held
for reading, but where it might drop it and upgrade it to a write, only
to return with it held for reading (in the success case) or with it
completely dropped (in the failure case).

In the process, we remove all the stack expansion from GUP (where
dropping the lock wouldn't be ok without special rules anyway), and add
it in manually to __access_remote_vm() for ptrace().

Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases.
Everything else here felt pretty straightforward, but the ia64 rules for
stack expansion are really quite odd and very different from everything
else.  Also thanks to Vegard Nossum who caught me getting one of those
odd conditions entirely the wrong way around.

Anyway, I think I want to actually move all the stack expansion code to
a whole new file of its own, rather than have it split up between
mm/mmap.c and mm/memory.c, but since this will have to be backported to
the initial maple tree vma introduction anyway, I tried to keep the
patches _fairly_ minimal.

Also, while I don't think it's valid to expand the stack from GUP, the
final patch in here is a "warn if some crazy GUP user wants to try to
expand the stack" patch.  That one will be reverted before the final
release, but it's left to catch any odd cases during the merge window
and release candidates.

Reported-by: Ruihan Li <lrh2000@pku.edu.cn>

* branch 'expand-stack':
  gup: add warning if some caller would seem to want stack expansion
  mm: always expand the stack with the mmap write lock held
  execve: expand new process stack manually ahead of time
  mm: make find_extend_vma() fail if write lock not held
  powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
  mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
  arm/mm: Convert to using lock_mm_and_find_vma()
  riscv/mm: Convert to using lock_mm_and_find_vma()
  mips/mm: Convert to using lock_mm_and_find_vma()
  powerpc/mm: Convert to using lock_mm_and_find_vma()
  arm64/mm: Convert to using lock_mm_and_find_vma()
  mm: make the page fault mmap locking killable
  mm: introduce new 'lock_mm_and_find_vma()' page fault helper
2023-06-28 20:35:21 -07:00
Linus Torvalds
9244724fbf A large update for SMP management:
- Parallel CPU bringup
 
     The reason why people are interested in parallel bringup is to shorten
     the (kexec) reboot time of cloud servers to reduce the downtime of the
     VM tenants.
 
     The current fully serialized bringup does the following per AP:
 
       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state
 
     There are two significant delays:
 
       #3 The time for an AP to report alive state in start_secondary() on
          x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.
 
       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending on
          the microcode patch size to apply.
 
     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to come
     up and apply microcode. That's more than 80% of the actual onlining
     procedure.
 
     This can be reduced significantly by splitting the bringup mechanism
     into two parts:
 
       1) Run the prepare callbacks and kick the AP alive for each AP which
       	 needs to be brought up.
 
 	 The APs wake up, do their firmware initialization and run the low
       	 level kernel startup code including microcode loading in parallel
       	 up to the first synchronization point. (#1 and #2 above)
 
       2) Run the rest of the bringup code strictly serialized per CPU
       	 (#3 - #5 above) as it's done today.
 
 	 Parallelizing that stage of the CPU bringup might be possible in
 	 theory, but it's questionable whether required surgery would be
 	 justified for a pretty small gain.
 
     If the system is large enough the first AP is already waiting at the
     first synchronization point when the boot CPU finished the wake-up of
     the last AP. That reduces the AP bringup time on that SKL from ~800ms
     to ~80ms, i.e. by a factor ~10x.
 
     The actual gain varies wildly depending on the system, CPU, microcode
     patch size and other factors. There are some opportunities to reduce
     the overhead further, but that needs some deep surgery in the x86 CPU
     bringup code.
 
     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.
 
   - Enhancements for SMP function call tracing so it is possible to locate
     the scheduling and the actual execution points. That allows to measure
     IPI delivery time precisely.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSZb/YTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRoOD/9vAiGI3IhGyZcX/RjXxauSHf8Pmqll
 05jUubFi5Vi3tKI1ubMOsnMmJTw2yy5xDyS/iGj7AcbRLq9uQd3iMtsXXHNBzo/X
 FNxnuWTXYUj0vcOYJ+j4puBumFzzpRCprqccMInH0kUnSWzbnaQCeelicZORAf+w
 zUYrswK4HpBXHDOnvPw6Z7MYQe+zyDQSwjSftstLyROzu+lCEw/9KUaysY2epShJ
 wHClxS2XqMnpY4rJ/CmJAlRhD0Plb89zXyo6k9YZYVDWoAcmBZy6vaTO4qoR171L
 37ApqrgsksMkjFycCMnmrFIlkeb7bkrYDQ5y+xqC3JPTlYDKOYmITV5fZ83HD77o
 K7FAhl/CgkPq2Ec+d82GFLVBKR1rijbwHf7a0nhfUy0yMeaJCxGp4uQ45uQ09asi
 a/VG2T38EgxVdseC92HRhcdd3pipwCb5wqjCH/XdhdlQrk9NfeIeP+TxF4QhADhg
 dApp3ifhHSnuEul7+HNUkC6U+Zc8UeDPdu5lvxSTp2ooQ0JwaGgC5PJq3nI9RUi2
 Vv826NHOknEjFInOQcwvp6SJPfcuSTF75Yx6xKz8EZ3HHxpvlolxZLq+3ohSfOKn
 2efOuZO5bEu4S/G2tRDYcy+CBvNVSrtZmCVqSOS039c8quBWQV7cj0334cjzf+5T
 TRiSzvssbYYmaw==
 =Y8if
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The reason why people are interested in parallel bringup is to
     shorten the (kexec) reboot time of cloud servers to reduce the
     downtime of the VM tenants.

     The current fully serialized bringup does the following per AP:

       1) Prepare callbacks (allocate, intialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state

     There are two significant delays:

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor and CPU type, BIOS microcode size etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads depending
          on the microcode patch size to apply.

     On a two socket SKL server with 56 cores (112 threads) the boot CPU
     spends on current mainline about 800ms busy waiting for the APs to
     come up and apply microcode. That's more than 80% of the actual
     onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts:

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

          Parallelizing that stage of the CPU bringup might be possible
          in theory, but it's questionable whether required surgery
          would be justified for a pretty small gain.

     If the system is large enough the first AP is already waiting at
     the first synchronization point when the boot CPU finished the
     wake-up of the last AP. That reduces the AP bringup time on that
     SKL from ~800ms to ~80ms, i.e. by a factor ~10x.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP capable architectures.

   - Enhancements for SMP function call tracing so it is possible to
     locate the scheduling and the actual execution points. That allows
     to measure IPI delivery time precisely"

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-26 13:59:56 -07:00
Nick Desaulniers
f7584322e4
riscv: disable HAVE_LD_DEAD_CODE_DATA_ELIMINATION for LLD
Linking allyesconfig with ld.lld-17 with CONFIG_DEAD_CODE_ELIMINATION=y
takes hours.  Assuming this is a performance regression that can be
fixed, tentatively disable this for now so that allyesconfig builds
don't start timing out.  If and when there's a fix to ld.lld, this can
be converted to a version check instead so that users of older but still
supported versions of ld.lld don't hurt themselves by enabling
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y.

Link: https://github.com/ClangBuiltLinux/linux/issues/1881
Link: https://lore.kernel.org/linux-riscv/ZJXTwqZIkXLxXaSi@google.com/
Reported-by: Palmer Dabbelt <palmer@dabbelt.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-25 16:30:50 -07:00
Zhangjin Wu
c828856b51
riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
Select CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION for RISC-V, allowing
the user to enable dead code elimination. In order for this to work,
ensure that we keep the alternative table by annotating them with KEEP.

This boots well on qemu with both rv32_defconfig & rv64 defconfig, but
it only shrinks their builds by ~1%, a smaller config is thereforce
customized to test this feature:

          | rv32                   | rv64
  --------|------------------------|---------------------
   No DCE | 4460684                | 4893488
      DCE | 3986716                | 4376400
   Shrink |  473968 (~10.6%)       |  517088 (~10.5%)

The config used above only reserves necessary options to boot on qemu
with serial console, more like the size-critical embedded scenes:

  - rv64 config: https://pastebin.com/crz82T0s
  - rv32 config: rv64 config + 32-bit.config

Here is Jisheng's original commit-msg:
When trying to run linux with various opensource riscv core on
resource limited FPGA platforms, for example, those FPGAs with less
than 16MB SDRAM, I want to save mem as much as possible. One of the
major technologies is kernel size optimizations, I found that riscv
does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION, which
passes -fdata-sections, -ffunction-sections to CFLAGS and passes the
--gc-sections flag to the linker.

This not only benefits my case on FPGA but also benefits defconfigs.
Here are some notable improvements from enabling this with defconfigs:

nommu_k210_defconfig:
   text    data     bss     dec     hex
1112009  410288   59837 1582134  182436     before
 962838  376656   51285 1390779  1538bb     after

rv32_defconfig:
   text    data     bss     dec     hex
8804455 2816544  290577 11911576 b5c198     before
8692295 2779872  288977 11761144 b375f8     after

defconfig:
   text    data     bss     dec     hex
9438267 3391332  485333 13314932 cb2b74     before
9285914 3350052  483349 13119315 c82f53     after

Signed-off-by: Zhangjin Wu <falcon@tinylab.org>
Co-developed-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Bin Meng <bmeng@tinylab.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
Link: https://lore.kernel.org/r/20230523165502.2592-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-25 16:24:05 -07:00
Jisheng Zhang
ab7fa6b05e
riscv: move options to keep entries sorted
Recently, some commits break the entries order. Properly move their
locations to keep entries sorted.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Guo Ren <guoren@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # build
Link: https://lore.kernel.org/r/20230523165502.2592-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-25 16:24:02 -07:00
Ben Hutchings
7267ef7b0b riscv/mm: Convert to using lock_mm_and_find_vma()
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-24 14:12:58 -07:00
Palmer Dabbelt
b5e13f3ace
Merge patch series "riscv: Add independent irq/softirq stacks support"
guoren@kernel.org <guoren@kernel.org> says:

From: Guo Ren <guoren@linux.alibaba.com>

This patch series adds independent irq/softirq stacks to decrease the
press of the thread stack. Also, add a thread STACK_SIZE config for
users to adjust the proper size during compile time.

* b4-shazam-merge:
  riscv: stack: Add config of thread stack size
  riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
  riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK

Link: https://lore.kernel.org/r/20230614013018.2168426-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-23 10:06:21 -07:00
Guo Ren
a7555f6b62
riscv: stack: Add config of thread stack size
The commit 0cac21b02b ("riscv: use 16KB kernel stack on 64-bit")
increases the thread size mandatory, but some scenarios, such as D1 with
a small memory footprint, would suffer from that. After independent irq
stack support, let's give users a choice to determine their custom stack
size.

Link: https://lore.kernel.org/linux-riscv/5f6e6c39-b846-4392-b468-02202404de28@www.fastmail.com/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230614013018.2168426-4-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-22 10:38:37 -07:00
Guo Ren
dd69d07a5a
riscv: stack: Support HAVE_SOFTIRQ_ON_OWN_STACK
Add the HAVE_SOFTIRQ_ON_OWN_STACK feature for the IRQ_STACKS config, and
the irq and softirq use the same irq_stack of percpu.

Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230614013018.2168426-3-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-22 10:38:36 -07:00
Guo Ren
163e76cc6e
riscv: stack: Support HAVE_IRQ_EXIT_ON_IRQ_STACK
Add independent irq stacks for percpu to prevent kernel stack overflows.
It is also compatible with VMAP_STACK by arch_alloc_vmap_stack.

Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20230614013018.2168426-2-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-22 10:38:35 -07:00
Jisheng Zhang
648321fa0d
riscv: mm: try VMA lock-based page fault handling first
Attempt VMA lock-based page fault handling first, and fall back to the
existing mmap_lock-based handling if that fails.

A simple running the ebizzy benchmark on Lichee Pi 4A shows that
PER_VMA_LOCK can improve the ebizzy benchmark by about 32.68%. In
theory, the more CPUs, the bigger improvement, but I don't have any
HW platform which has more than 4 CPUs.

This is the riscv variant of "x86/mm: try VMA lock-based page fault
handling first".

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Link: https://lore.kernel.org/r/20230523165942.2630-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20 08:10:13 -07:00
Ruan Jinjie
99a670b206
riscv: fix kprobe __user string arg print fault issue
On riscv qemu platform, when add kprobe event on do_sys_open() to show
filename string arg, it just print fault as follow:

echo 'p:myprobe do_sys_open dfd=$arg1 filename=+0($arg2):string flags=$arg3
mode=$arg4' > kprobe_events

bash-166     [000] ...1.   360.195367: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6

bash-166     [000] ...1.   360.219369: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x8241 mode=0x1b6

bash-191     [000] ...1.   360.378827: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename=(fault) flags=0x98800 mode=0x0

As riscv do not select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE,
the +0($arg2) addr is processed as a kernel address though it is a
userspace address, cause the above filename=(fault) print. So select
ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE to avoid the issue, after that the
kprobe trace is ok as below:

bash-166     [000] ...1.    96.767641: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6

bash-166     [000] ...1.    96.793751: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/dev/null" flags=0x8241 mode=0x1b6

bash-177     [000] ...1.    96.962354: myprobe: (do_sys_open+0x0/0x84)
dfd=0xffffffffffffff9c filename="/sys/kernel/debug/tracing/events/kprobes/"
flags=0x98800 mode=0x0

Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Fixes: 0ebeea8ca8 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work")
Link: https://lore.kernel.org/r/20230504072910.3742842-1-ruanjinjie@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 10:23:19 -07:00
Palmer Dabbelt
d5e45e810e
Merge patch series "riscv: Add vector ISA support"
Andy Chiu <andy.chiu@sifive.com> says:

This is the v21 patch series for adding Vector extension support in
Linux. Please refer to [1] for the introduction of the patchset. The
v21 patch series was aimed to solve build issues from v19, provide usage
guideline for the prctl interface, and address review comments on v20.

Thank every one who has been reviewing, suggesting on the topic. Hope
this get a step closer to the final merge.

* b4-shazam-merge: (27 commits)
  selftests: add .gitignore file for RISC-V hwprobe
  selftests: Test RISC-V Vector prctl interface
  riscv: Add documentation for Vector
  riscv: Enable Vector code to be built
  riscv: detect assembler support for .option arch
  riscv: Add sysctl to set the default vector rule for new processes
  riscv: Add prctl controls for userspace vector management
  riscv: hwcap: change ELF_HWCAP to a function
  riscv: KVM: Add vector lazy save/restore support
  riscv: kvm: Add V extension to KVM ISA
  riscv: prevent stack corruption by reserving task_pt_regs(p) early
  riscv: signal: validate altstack to reflect Vector
  riscv: signal: Report signal frame size to userspace via auxv
  riscv: signal: Add sigcontext save/restore for vector
  riscv: signal: check fp-reserved words unconditionally
  riscv: Add ptrace vector support
  riscv: Allocate user's vector context in the first-use trap
  riscv: Add task switch support for vector
  riscv: Introduce struct/helpers to save/restore per-task Vector state
  riscv: Introduce riscv_v_vsize to record size of Vector context
  ...

Link: https://lore.kernel.org/r/20230605110724.21391-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:17:09 -07:00
Guo Ren
fa8e7cce55
riscv: Enable Vector code to be built
This patch adds configs for building Vector code. First it detects the
reqired toolchain support for building the code. Then it provides an
option setting whether Vector is implicitly enabled to userspace.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230605110724.21391-25-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:56 -07:00
Andy Chiu
e4bb020f3d
riscv: detect assembler support for .option arch
Some extensions use .option arch directive to selectively enable certain
extensions in parts of its assembly code. For example, Zbb uses it to
inform assmebler to emit bit manipulation instructions. However,
supporting of this directive only exist on GNU assembler and has not
landed on clang at the moment, making TOOLCHAIN_HAS_ZBB depend on
AS_IS_GNU.

While it is still under review at https://reviews.llvm.org/D123515, the
upcoming Vector patch also requires this feature in assembler. Thus,
provide Kconfig AS_HAS_OPTION_ARCH to detect such feature. Then
TOOLCHAIN_HAS_XXX will be turned on automatically when the feature land.

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-24-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:55 -07:00
Sunil V L
a91a9ffbd3
RISC-V: Add support to build the ACPI core
Enable ACPI core for RISC-V after adding architecture-specific
interfaces and header files required to build the ACPI core.

1) Couple of header files are required unconditionally by the ACPI
core. Add empty acenv.h and cpu.h header files.

2) If CONFIG_PCI is enabled, a few PCI related interfaces need to
be provided by the architecture. Define dummy interfaces for now
so that build succeeds. Actual implementation will be added when
PCI support is added for ACPI along with external interrupt
controller support.

3) A few globals and memory mapping related functions specific
to the architecture need to be provided.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-7-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:01 -07:00
Conor Dooley
ed309ce522
RISC-V: mark hibernation as nonportable
Hibernation support depends on firmware marking its reserved/PMP
protected regions as not accessible from Linux.
The latest versions of the de-facto SBI implementation (OpenSBI) do
not do this, having dropped the no-map property to enable 1 GiB huge
page mappings by the kernel.
This was exposed by commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages
for the linear mapping"), which made the first 2 MiB of DRAM (where SBI
typically resides) accessible by the kernel.
Attempting to hibernate with either OpenSBI, or other implementations
following its lead, will lead to a kernel panic ([1], [2]) as the
hibernation process will attempt to save/restore any mapped regions,
including the PMP protected regions in use by the SBI implementation.

Mark hibernation as depending on "NONPORTABLE", as only a small subset
of systems are capable of supporting it, until such time that an SBI
implementation independent way to communicate what regions are in use
has been agreed on.

As hibernation support landed in v6.4-rc1, disabling it for most
platforms does not constitute a regression. The alternative would have
been reverting commit 3335068f87 ("riscv: Use PUD/P4D/PGD pages for
the linear mapping").
Doing so would permit hibernation on platforms with these SBI
implementations, but would limit the options we have to solve the
protection of the region without causing a regression in hibernation
support.

Reported-by: Song Shuai <suagrfillet@gmail.com>
Link: https://lore.kernel.org/all/CAAYs2=gQvkhTeioMmqRDVGjdtNF_vhB+vm_1dHJxPNi75YDQ_Q@mail.gmail.com/ [1]
Reported-by: JeeHeng Sia <jeeheng.sia@starfivetech.com>
Link: https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/ITXwaKfA6z8 [2]
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230526-astride-detonator-9ae120051159@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-29 06:38:04 -07:00
Thomas Gleixner
72b11aa7f8 riscv: Switch to hotplug core state synchronization
Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.916055844@linutronix.de
2023-05-15 13:44:59 +02:00
Linus Torvalds
982365a8f5 RISC-V Patches for the 6.4 Merge Window, Part 2
* Support for hibernation.
 * .rela.dyn has been moved to init.
 * A fix for the SBI probing to allow for implementation-defined
   behavior.
 * Various other fixes and cleanups throughout the tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRVHRATHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiearD/9tUL5STN3icSO58t2EBAmp4CuyBqWo
 KVhOmLmvZqz259GeqfcRsHANszLTwRPzyWxHQJGugPzAphZu3ukQRR8BEDTwwZJO
 toIhv9hXZ4RAu8Chi6Fs/J1WyYVyqSneGTk68xXBXOmm1MWaqU91z92Q5bJGfWqy
 yBSPOTMFvnHHAOdhIXigxLl+z0Y9EV013L18aesHArnuDHIgPGSF9UI6slQ7ThNV
 PhR+VsApd3Ho7+njOzK+mn+1afICKXXGAtmrPjyEt+nE4LmaJc/XY471SPTSlr3U
 BLWm3jmVTK/0peZxce4I2H6k3gz21PiSAy21E+26Bp2+lZD1iWH601eUyasLY88n
 FYXF5VQNvwMx8Ba/yN4VmQ8M25eJ7s7AKWvGa6VLwu0iHxGWmePqoaFuI6JaSXON
 TzJFJDN9xAaBf4Jt7c2c4X9tPJTEFZu6V51AaDDJllw/IJicwHNlNskZUsfvmqqb
 wE/fF6VtcrvEoeKvizOyZGXMs6Wgg6soufL0Ve8rD12U6ZBknVkGruQxF7B+JYsJ
 Ri6ndfKuguMRm6hZmJlVCfFULtm+D6wFczWmmfF562AFISAticib8u/kPz3jAGCu
 GbozEi333FFLBat2QpPK9zL0sH6tj7GCT3ppJjpjUtCmGPyyZuD8zT3rgTxSc8pe
 fp1EE13A2rsU3A==
 =xoqj
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.4-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for hibernation

 - The .rela.dyn section has been moved to the init area

 - A fix for the SBI probing to allow for implementation-defined
   behavior

 - Various other fixes and cleanups throughout the tree

* tag 'riscv-for-linus-6.4-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: include cpufeature.h in cpufeature.c
  riscv: Move .rela.dyn to the init sections
  dt-bindings: riscv: explicitly mention assumption of Zicsr & Zifencei support
  riscv: compat_syscall_table: Fixup compile warning
  RISC-V: fixup in-flight collision with ARCH_WANT_OPTIMIZE_VMEMMAP rename
  RISC-V: fix sifive and thead section mismatches in errata
  RISC-V: Align SBI probe implementation with spec
  riscv: mm: remove redundant parameter of create_fdt_early_page_table
  riscv: Adjust dependencies of HAVE_DYNAMIC_FTRACE selection
  RISC-V: Add arch functions to support hibernation/suspend-to-disk
  RISC-V: mm: Enable huge page support to kernel_page_present() function
  RISC-V: Factor out common code of __cpu_resume_enter()
  RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function
2023-05-05 12:23:33 -07:00
Conor Dooley
26b0812f4c
RISC-V: fixup in-flight collision with ARCH_WANT_OPTIMIZE_VMEMMAP rename
Lukas warned that ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP had been
renamed in the mm tree & that RISC-V would need a fixup as part of the
merge. The warning was missed however, and RISC-V is selecting the
orphaned Kconfig option.

Fixes: 89d77f71f4 ("Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux")
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>.
Link: https://lore.kernel.org/linux-riscv/CAKXUXMyVeg2kQK_edKHtMD3eADrDK_PKhCSVkMrLDdYgTQQ5rg@mail.gmail.com/
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230429-trilogy-jolly-12bf5c53d62d@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 15:33:10 -07:00
Palmer Dabbelt
38dab744f7
Merge patch series "RISC-V Hibernation Support"
Sia Jee Heng <jeeheng.sia@starfivetech.com> says:

This series adds RISC-V Hibernation/suspend to disk support.
Low level Arch functions were created to support hibernation.
swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
cpu state onto the stack, then calling swsusp_save() to save the memory
image.

Arch specific hibernation header is implemented and is utilized by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch specific hibernation header consists of satp, hartid,
and the cpu_resume address. The kernel built version is also need to be
saved into the hibernation image header to making sure only the same
kernel is restore when resume.

swsusp_arch_resume() creates a temporary page table that covering only
the linear map. It copies the restore code to a 'safe' page, then start to
restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.

To enable hibernation/suspend to disk into RISCV, the below config
need to be enabled:
- CONFIG_HIBERNATION
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

At high-level, this series includes the following changes:
1) Change suspend_save_csrs() and suspend_restore_csrs()
   to public function as these functions are common to
   suspend/hibernation. (patch 1)
2) Refactor the common code in the __cpu_resume_enter() function and
   __hibernate_cpu_resume() function. The common code are used by
   hibernation and suspend. (patch 2)
3) Enhance kernel_page_present() function to support huge page. (patch 3)
4) Add arch/riscv low level functions to support
   hibernation/suspend to disk. (patch 4)

* b4-shazam-merge:
  RISC-V: Add arch functions to support hibernation/suspend-to-disk
  RISC-V: mm: Enable huge page support to kernel_page_present() function
  RISC-V: Factor out common code of __cpu_resume_enter()
  RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function

Link: https://lore.kernel.org/r/20230330064321.1008373-1-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:27:33 -07:00
Nathan Chancellor
b3d6bdfea2
riscv: Adjust dependencies of HAVE_DYNAMIC_FTRACE selection
When building allmodconfig with clang and its integrated assembler and
linking with a version of GNU ld prior to 2.36, the following link error
occurs:

  riscv64-linux-gnu-ld: .init.data has both ordered [`__patchable_function_entries' in init/main.o] and unordered [`.init_array.0' in kernel/trace/trace_benchmark.o] sections
  riscv64-linux-gnu-ld: final link failed: bad value

This is the same error addressed by commit 45bd895180 ("arm64: Improve
HAVE_DYNAMIC_FTRACE_WITH_REGS selection for clang") for arm64. See that
changelog for a full description of why this error occurs with this
combination of tools.

In a similar manner as that change, restrict the
CONFIG_HAVE_DYNAMIC_FTRACE selection to combinations of tools known to
work so that there are no errors.

Link: https://github.com/ClangBuiltLinux/linux/issues/1817
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230404-riscv-dynamic-ftrace-checks-clang-v1-1-0ce296b7d423@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:27:32 -07:00
Sia Jee Heng
c031721001
RISC-V: Add arch functions to support hibernation/suspend-to-disk
Low level Arch functions were created to support hibernation.
swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
cpu state onto the stack, then calling swsusp_save() to save the memory
image.

Arch specific hibernation header is implemented and is utilized by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch specific hibernation header consists of satp, hartid,
and the cpu_resume address. The kernel built version is also need to be
saved into the hibernation image header to making sure only the same
kernel is restore when resume.

swsusp_arch_resume() creates a temporary page table that covering only
the linear map. It copies the restore code to a 'safe' page, then start
to restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.

To enable hibernation/suspend to disk into RISCV, the below config
need to be enabled:
- CONFIG_HIBERNATION
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230330064321.1008373-5-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:25:13 -07:00
Linus Torvalds
b28e6315a0 dma-mapping updates for Linux 6.4
- fix a PageHighMem check in dma-coherent initialization (Doug Berger)
  - clean up the coherency defaul initialiation (Jiaxun Yang)
  - add cacheline to user/kernel dma-debug space dump messages
    (Desnes Nunes, Geert Uytterhoeve)
  - swiotlb statistics improvements (Michael Kelley)
  - misc cleanups (Petr Tesarik)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmRLYsoLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYP4+RAAwpIqI198CrPxodCuBdwetuxznwncdwFvU3W+NQLF
 cC5gDeUB2ZZevVh3moKITV7gXHrbTJF7jQs9jpWV0QEA5APzu0WDf3Y0m4sXPVpn
 E9jS3jGJyntZ9rIMzHFs/lguI37xzT1YRAHAYgoZ84b7K/9g94NgEE2HecfNKVqZ
 D6PN0UJcA4KQo+5UJ7MWiQxWM3QAwVfSKsP1mXv51tiRGo4UUzNW77Ej2nKRJjhK
 wDNiZ+08khfeS2BuF9J2ebAzpgma5EgweH2z7zmx8Ch5t4Cx6hVAQ4Z6axbZMGjP
 HxXPw5rIwZTnQYoaGU86BrxrFH2j2bb963kWoDzliH+4PQrJ/iIEpkF7vu5Y2oWr
 WtXdOo6CsdQh1rT1UWA87ZYDtkWgj3/ITv5xJrXf8VyD9WHHSPst616XHLzBLGzo
 Hc+lAPhnVm59XZhQbVgXZy37Eqa9qHEG6GIRUkwD13nttSSfLfizO0IlXlH+awQV
 2A+TjbAt2lneUaRzMPfxG/yFt3rPqbBfSWj3o2ClPPn9sKksKxj7IjNW0v81Ztq/
 H6UmYRuq+wlQJzlwiF8+6SzoBXObztrmtIa2ipiM5k+xePG1jsPGFLm98UMlPcxN
 5IMz78DQ/hE3K3fKRt6clImd98xq5R0H9iUQPor2I7C/67fpTjThDRdHDUina1tk
 Oxo=
 =vAit
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - fix a PageHighMem check in dma-coherent initialization (Doug Berger)

 - clean up the coherency defaul initialiation (Jiaxun Yang)

 - add cacheline to user/kernel dma-debug space dump messages (Desnes
   Nunes, Geert Uytterhoeve)

 - swiotlb statistics improvements (Michael Kelley)

 - misc cleanups (Petr Tesarik)

* tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: Omit total_used and used_hiwater if !CONFIG_DEBUG_FS
  swiotlb: track and report io_tlb_used high water marks in debugfs
  swiotlb: fix debugfs reporting of reserved memory pools
  swiotlb: relocate PageHighMem test away from rmem_swiotlb_setup
  of: address: always use dma_default_coherent for default coherency
  dma-mapping: provide CONFIG_ARCH_DMA_DEFAULT_COHERENT
  dma-mapping: provide a fallback dma_default_coherent
  dma-debug: Use %pa to format phys_addr_t
  dma-debug: add cacheline to user/kernel space dump messages
  dma-debug: small dma_debug_entry's comment and variable name updates
  dma-direct: cleanup parameters to dma_direct_optimal_gfp_mask
2023-04-29 10:29:57 -07:00
Linus Torvalds
89d77f71f4 RISC-V Patches for the 6.4 Merge Window, Part 1
* Support for runtime detection of the Svnapot extension.
 * Support for Zicboz when clearing pages.
 * We've moved to GENERIC_ENTRY.
 * Support for !MMU on rv32 systems.
 * The linear region is now mapped via huge pages.
 * Support for building relocatable kernels.
 * Support for the hwprobe interface.
 * Various fixes and cleanups throughout the tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRL5rcTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYibpcD/0RnmO+N2OJxsJXf0KtHv4LlChAFaMZ
 mfcsU8lv8r3Rz1USJGyVoE57885R+iUw1664ic6Gj9Ll9/A+BDVyqlNeo1BZ7nnv
 6hZawSh8XGMyCJoatjaCSMW6VKObsSpHXLoA0mxtj06w1XhtpUnzjv4SZQqBYxC2
 7+/cfy6l3uGdSKQ0R402sF8PE+l3HthhO+Cw9NYHQZisAHEQrfFpXRnrovhs+vX0
 aVxoWo8bmIhhNke2jh6dnGhfFfAs+UClbaKgZfe8af6feboo+Tal3+OibiEy1K1j
 hDQ3w/G5jAdwSqnNPdXzpk4srskUOhP9is8AG79vCasMxybQIBfZcc7/kLmmQX+2
 xt1EoDVD/lSO1p+CWRautLXEsInWbpBYaSJie7WcR4SHe8S7/nomTDlwkJHx5cma
 mkSYHJKNwCbamDTI3gXg8nrScbxsRnJQsQUolFDwAeRz7AYVwtqVh8VxAWqAdU3q
 xUNKrUpCAzNC3d5GL7pmRfZrqjpQhuFXkHFSy85vaCPuckBu926OzxpKBmX4Kea1
 qLYWfxv78bcwuY47FWJKcd97Ib63iBYDgarJxvrHrwDaHV2xjBOmdapNPUc2PswT
 a938enbYYnJHIbuSmbeNBPF4iF6nKUXshyfZu7tCZl6MzsXloUckGdm++j97Bpvr
 g6G3ZP6STSQBmw==
 =oxQd
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for runtime detection of the Svnapot extension

 - Support for Zicboz when clearing pages

 - We've moved to GENERIC_ENTRY

 - Support for !MMU on rv32 systems

 - The linear region is now mapped via huge pages

 - Support for building relocatable kernels

 - Support for the hwprobe interface

 - Various fixes and cleanups throughout the tree

* tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits)
  RISC-V: hwprobe: Explicity check for -1 in vdso init
  RISC-V: hwprobe: There can only be one first
  riscv: Allow to downgrade paging mode from the command line
  dt-bindings: riscv: add sv57 mmu-type
  RISC-V: hwprobe: Remove __init on probe_vendor_features()
  riscv: Use --emit-relocs in order to move .rela.dyn in init
  riscv: Check relocations at compile time
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Introduce CONFIG_RELOCATABLE
  riscv: Move .rela.dyn outside of init to avoid empty relocations
  riscv: Prepare EFI header for relocatable kernels
  riscv: Unconditionnally select KASAN_VMALLOC if KASAN
  riscv: Fix ptdump when KASAN is enabled
  riscv: Fix EFI stub usage of KASAN instrumented strcmp function
  riscv: Move DTB_EARLY_BASE_VA to the kernel address space
  riscv: Rework kasan population functions
  riscv: Split early and final KASAN population functions
  riscv: Use PUD/P4D/PGD pages for the linear mapping
  riscv: Move the linear mapping creation in its own function
  riscv: Get rid of riscv_pfn_base variable
  ...
2023-04-28 16:55:39 -07:00
Linus Torvalds
53b5e72b9d asm-generic updates for 6.4
These are various cleanups, fixing a number of uapi header files to no
 longer reference CONFIG_* symbols, and one patch that introduces the
 new CONFIG_HAS_IOPORT symbol for architectures that provide working
 inb()/outb() macros, as a preparation for adding driver dependencies
 on those in the following release.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmRG8IkACgkQYKtH/8kJ
 Uid15Q/9E/neIIEqEk6IvtyhUicrJiIZUM0rGoYtWXiz75ggk6Kx9+3I+j8zIQ/E
 kf2TzAG7q9Md7nfTDFLr4FSr0IcNDj+VG4nYxUyDHdKGcARO+g9Kpdvscxip3lgU
 Rw5w74Gyd30u4iUKGS39OYuxcCgl9LaFjMA9Gh402Oiaoh+OYLmgQS9h/goUD5KN
 Nd+AoFvkdbnHl0/SpxthLRyL5rFEATBmAY7apYViPyMvfjS3gfDJwXJR9jkKgi6X
 Qs4t8Op8BA3h84dCuo6VcFqgAJs2Wiq3nyTSUnkF8NxJ2RFTpeiVgfsLOzXHeDgz
 SKDB4Lp14o3mlyZyj00MWq1uMJRRetUgNiVb6iHOoKQ/E4demBdh+mhIFRybjM5B
 XNTWFcg9PWFCMa4W9jnLfZBc881X4+7T+qUF8I0W/1AbRJUmyGj8HO6jLceC4yGD
 UYLn5oFPM6OWXHp6DqJrCr9Yw8h6fuviQZFEbl/ARlgVGt+J4KbYweJYk8DzfX6t
 PZIj8LskOqyIpRuC2oDA1PHxkaJ1/z+N5oRBHq1uicSh4fxY5HW7HnyzgF08+R3k
 cf+fjAhC3TfGusHkBwQKQJvpxrxZjPuvYXDZ0GxTvNKJRB8eMeiTm1n41E5oTVwQ
 swSblSCjZj/fMVVPXLcjxEW4SBNWRxa9Lz3tIPXb3RheU10Lfy8=
 =H3k4
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic updates from Arnd Bergmann:
 "These are various cleanups, fixing a number of uapi header files to no
  longer reference CONFIG_* symbols, and one patch that introduces the
  new CONFIG_HAS_IOPORT symbol for architectures that provide working
  inb()/outb() macros, as a preparation for adding driver dependencies
  on those in the following release"

* tag 'asm-generic-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  Kconfig: introduce HAS_IOPORT option and select it as necessary
  scripts: Update the CONFIG_* ignore list in headers_install.sh
  pktcdvd: Remove CONFIG_CDROM_PKTCDVD_WCACHE from uapi header
  Move bp_type_idx to include/linux/hw_breakpoint.h
  Move ep_take_care_of_epollwakeup() to fs/eventpoll.c
  Move COMPAT_ATM_ADDPARTY to net/atm/svc.c
2023-04-25 12:22:11 -07:00
Thomas Gleixner
f37202aa6e irqchip changes for 6.4
- Large RISC-V IPI rework to make way for a new interrupt
   architecture
 
 - More Loongarch fixes from Lianmin Lv, fixing issues in the so
   called "dual-bridge" systems.
 
 - Workaround for the nvidia T241 chip that gets confused in
   3 and 4 socket configurations, leading to the GIC
   malfunctionning in some contexts
 
 - Drop support for non-firmware driven GIC configurarations
   now that the old ARM11MP Cavium board is gone
 
 - Workaround for the Rockchip 3588 chip that doesn't
   correctly deal with the shareability attributes.
 
 - Replace uses of of_find_property() with the more appropriate
   of_property_read_bool()
 
 - Make bcm-6345-l1 request its MMIO region
 
 - Add suspend support to the SiFive PLIC
 
 - Drop support for stih415, stih416 and stid127 platforms
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmRCjF0PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDfAcQAMCqxwZcMPtMDcj1pfuiPM4mcPnb81nB80ms
 0Clc+O/fLvW4bduVQN9S+P4aJ3WNYUw0nTrF13CzUZpPOLGt8NlylSZ0+EQV6+th
 60+YoJ0X5ovlQ6tMUzhJjuU2HFQxXyvNmkXsrr4xaPlZaOt01OCNGhKVwuq+f7nJ
 U87id3M3Nt6dSj//NonBdy+61y/XZiPQXZZP6act71f5Rz+REnBdEESzWLbzV9jo
 UiVZD24OWLE+A0N4JqTXI4ruXIaOdwfOi31hN/OF3JrWWrM99RYK/pLJM/mEvCdw
 HA0acS/uj9wQLUD9OFPCKE3aGOA4rdcWUrwzQvINZMIezHgaWfuOT/J5BbF9iH+Y
 nrySLQB9xl9TVtMsfCOmjc+yPDDQ0AycrV++pn6CH59dx4WR3e2wTe68arHMoO9h
 ljvEHVb2inDYJqHhyYKhNehlD1YGNFcOT4QRrduWxQezl3RKdYEjJngmm7QBfRT1
 ee62Jm2RodPAknuDnnJNsFnHAE8emu9RLdKviEmPT66oclhw2sgn/V2JC+O+Oe0e
 TkXrn4AzrIpEjbEUMJAow7RTSpzIGx921fcqbSpZrf6k75rgxaKBPPxtV4uANfII
 gbOKklj15pljuPuxsputWzkdU9xVvgdg/5CHw6+AA68rNeB16NbTMe9RsiYNzlF6
 7XBGnW3Q
 =TsID
 -----END PGP SIGNATURE-----

Merge tag 'irqchip-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

Pull irqchip changes from Marc Zyngier:

 - Large RISC-V IPI rework to make way for a new interrupt
   architecture

 - More Loongarch fixes from Lianmin Lv, fixing issues in the so
   called "dual-bridge" systems.

 - Workaround for the nvidia T241 chip that gets confused in
   3 and 4 socket configurations, leading to the GIC
   malfunctionning in some contexts

 - Drop support for non-firmware driven GIC configurarations
   now that the old ARM11MP Cavium board is gone

 - Workaround for the Rockchip 3588 chip that doesn't
   correctly deal with the shareability attributes.

 - Replace uses of of_find_property() with the more appropriate
   of_property_read_bool()

 - Make bcm-6345-l1 request its MMIO region

 - Add suspend support to the SiFive PLIC

 - Drop support for stih415, stih416 and stid127 platforms

Link: https://lore.kernel.org/lkml/20230421132104.3021536-1-maz@kernel.org
2023-04-21 17:30:57 +02:00
Palmer Dabbelt
310c33dc7a
Merge patch series "Introduce 64b relocatable kernel"
Alexandre Ghiti <alexghiti@rivosinc.com> says:

After multiple attempts, this patchset is now based on the fact that the
64b kernel mapping was moved outside the linear mapping.

The first patch allows to build relocatable kernels but is not selected
by default. That patch is a requirement for KASLR.
The second and third patches take advantage of an already existing powerpc
script that checks relocations at compile-time, and uses it for riscv.

* b4-shazam-merge:
  riscv: Use --emit-relocs in order to move .rela.dyn in init
  riscv: Check relocations at compile time
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Introduce CONFIG_RELOCATABLE
  riscv: Move .rela.dyn outside of init to avoid empty relocations
  riscv: Prepare EFI header for relocatable kernels

Link: https://lore.kernel.org/r/20230329045329.64565-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:47:45 -07:00
Alexandre Ghiti
39b3307294
riscv: Introduce CONFIG_RELOCATABLE
This config allows to compile 64b kernel as PIE and to relocate it at
any virtual address at runtime: this paves the way to KASLR.
Runtime relocation is possible since relocation metadata are embedded into
the kernel.

Note that relocating at runtime introduces an overhead even if the
kernel is loaded at the same address it was linked at and that the compiler
options are those used in arm64 which uses the same RELA relocation
format.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:46:30 -07:00
Palmer Dabbelt
2667e3673f
Merge patch series "RISC-V kasan rework"
Alexandre Ghiti <alexghiti@rivosinc.com> says:

As described in patch 2, our current kasan implementation is intricate,
so I tried to simplify the implementation and mimic what arm64/x86 are
doing.

In addition it fixes UEFI bootflow with a kasan kernel and kasan inline
instrumentation: all kasan configurations were tested on a large ubuntu
kernel with success with KASAN_KUNIT_TEST and KASAN_MODULE_TEST.

inline ubuntu config + uefi:
 sv39: OK
 sv48: OK
 sv57: OK

outline ubuntu config + uefi:
 sv39: OK
 sv48: OK
 sv57: OK

Actually 1 test always fails with KASAN_KUNIT_TEST that I have to check:
KASAN failure expected in "set_bit(nr, addr)", but none occurrred

Note that Palmer recently proposed to remove COMMAND_LINE_SIZE from the
userspace abi
https://lore.kernel.org/lkml/20221211061358.28035-1-palmer@rivosinc.com/T/
so that we can finally increase the command line to fit all kasan kernel
parameters.

All of this should hopefully fix the syzkaller riscv build that has been
failing for a few months now, any test is appreciated and if I can help
in any way, please ask.

* b4-shazam-merge:
  riscv: Unconditionnally select KASAN_VMALLOC if KASAN
  riscv: Fix ptdump when KASAN is enabled
  riscv: Fix EFI stub usage of KASAN instrumented strcmp function
  riscv: Move DTB_EARLY_BASE_VA to the kernel address space
  riscv: Rework kasan population functions
  riscv: Split early and final KASAN population functions

Link: https://lore.kernel.org/r/20230203075232.274282-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:24:56 -07:00
Alexandre Ghiti
864046c512
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
If KASAN is enabled, VMAP_STACK depends on KASAN_VMALLOC so enable
KASAN_VMALLOC with KASAN so that we can enable VMAP_STACK by default.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:24:54 -07:00
Conor Dooley
5464912cfa
RISC-V: align ISA extension Kconfig help text with each other
Other extensions only capitalise the first letter in the text visible
in Kconfig menus, and provide a short comment about the extension's
meaning. Do the same for Svnapot & Svpbmt.

The precedent for capitalisation in the Kconfig text was set by Zicbom
& sorta followed for Zicboz. The RVI styling used for multi-letter
extensions only capitalises the first letter, so do the same here.
If nothing else, my OCD likes it when the extensions follow a consistent
pattern.

While editing one of the lines, reformat the "spelling" of 64-bit.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230405-pucker-cogwheel-3a999a94a2f2@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 20:37:21 -07:00
Song Shuai
8bf7b3b667
riscv: Kconfig: enable SCHED_MC kconfig
RISC-V now builds the sched domain based on the simple possible map.

Enable SCHED_MC to make the building based on cpu_coregroup_mask()
which also takes care of the NUMA and cores with LLC.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230310110336.970985-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 20:35:43 -07:00
Palmer Dabbelt
eb04e72b34
Merge patch series "RISC-V Hardware Probing User Interface"
Evan Green <evan@rivosinc.com> says:

There's been a bunch of off-list discussions about this, including at
Plumbers.  The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string).  That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.

Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system.  The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example).  The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.

The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.

An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].

I was asked about the performance delta between this and something like
sysfs. I created a small test program and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
 - open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
 - access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
 - riscv_hwprobe() vDSO and syscall: .0094us
 - riscv_hwprobe() vDSO with no syscall: 0.0091us

These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.

[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050

* b4-shazam-merge:
  RISC-V: Add hwprobe vDSO function and data
  selftests: Test the new RISC-V hwprobe interface
  RISC-V: hwprobe: Support probing of misaligned access performance
  RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
  RISC-V: Add a syscall for HW probing
  RISC-V: Move struct riscv_cpuinfo to new header

Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 19:49:51 -07:00
Evan Green
aa5af0aa90
RISC-V: Add hwprobe vDSO function and data
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the
riscv_hwprobe syscall and answer common queries. We stash a copy of
static answers for the "all CPUs" case in the vDSO data page. This data
is private to the vDSO, so we can decide later to change what's stored
there or under what conditions we defer to the syscall. Currently all
data can be discovered at boot, so the vDSO function answers all queries
when the cpumask is set to the "all CPUs" hint.

There's also a boolean in the data that lets the vDSO function know that
all CPUs are the same. In that case, the vDSO will also answer queries
for arbitrary CPU masks in addition to the "all CPUs" hint.

Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 15:48:18 -07:00
Anup Patel
832f15f426 RISC-V: Treat IPIs as normal Linux IRQs
Currently, the RISC-V kernel provides arch specific hooks (i.e.
struct riscv_ipi_ops) to register IPI handling methods. The stats
gathering of IPIs is also arch specific in the RISC-V kernel.

Other architectures (such as ARM, ARM64, and MIPS) have moved away
from custom arch specific IPI handling methods. Currently, these
architectures have Linux irqchip drivers providing a range of Linux
IRQ numbers to be used as IPIs and IPI triggering is done using
generic IPI APIs. This approach allows architectures to treat IPIs
as normal Linux IRQs and IPI stats gathering is done by the generic
Linux IRQ subsystem.

We extend the RISC-V IPI handling as-per above approach so that arch
specific IPI handling methods (struct riscv_ipi_ops) can be removed
and the IPI handling is done through the Linux IRQ subsystem.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-4-apatel@ventanamicro.com
2023-04-08 11:26:24 +01:00
Jiaxun Yang
c00a60d6f4 of: address: always use dma_default_coherent for default coherency
As for now all arches have dma_default_coherent reflecting default
DMA coherency for of devices, so there is no need to have a standalone
config option.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-07 07:38:28 +02:00
Niklas Schnelle
fcbfe8121a
Kconfig: introduce HAS_IOPORT option and select it as necessary
We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
Port access. In a future patch HAS_IOPORT=n will disable compilation of
the I/O accessor functions inb()/outb() and friends on architectures
which can not meaningfully support legacy I/O spaces such as s390.

The following architectures do not select HAS_IOPORT:

* ARC
* C-SKY
* Hexagon
* Nios II
* OpenRISC
* s390
* User-Mode Linux
* Xtensa

All other architectures select HAS_IOPORT at least conditionally.

The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
for HAS_IOPORT specific sections will be added in subsequent patches on
a per subsystem basis.

Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net> # for ARCH=um
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-04-05 22:15:19 +02:00
Conor Dooley
d34a6b715a
RISC-V: convert new selectors of RISCV_ALTERNATIVE to dependencies
for-next contains two additional extensions that select
RISCV_ALTERNATIVE. RISCV_ALTERNATIVE no longer needs to be selected by
individual config options as it is now selected for !XIP_KERNEL builds
by the top level RISCV option.
These extensions rely on the alternative framework, so convert the
"select"s to "depends on"s instead.

Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230324121240.3594777-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-29 12:27:01 -07:00
Palmer Dabbelt
be693ef2a4
Merge patch series "RISC-V: Fixes for riscv_has_extension[un]likely()'s alternative dependency"
Conor Dooley <conor.dooley@microchip.com> says:

Here's my attempt at fixing both the use of an FPU on XIP kernels and
the issue that Jason ran into where CONFIG_FPU, which needs the
alternatives frame work for has_fpu() checks, could be enabled without
the alternatives actually being present.

For the former, a "slow" fallback that does not use alternatives is
added to riscv_has_extension_[un]likely() that can be used with XIP.
Obviously, we want to make use of Jisheng's alternatives based approach
where possible, so any users of riscv_has_extension_[un]likely() will
want to make sure that they select RISCV_ALTERNATIVE.
If they don't however, they'll hit the fallback path which (should,
sparing a silly mistake from me!) behave in the same way, thus
succeeding silently. Sounds like a

To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading
like the plague through the various places that want to check for the
presence of extensions, and sidestep the potential silent "success"
mentioned above, all users RISCV_ALTERNATIVE are converted from selects
to dependencies, with the option being selected for all !XIP_KERNEL
builds.

I know that the VDSO was a key place that Jisheng wanted to use the new
helper rather than static branches, and I think the fallback path
should not cause issues there.

See the thread at [1] for the prior discussion.

1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573

[Palmer: these were also merged into fixes, but there's a cleanup that
depends on the merge so I'm taking it into for-next as well.]

* b4-shazam-merge:
  RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
  RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()

Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

* commit '1ee7fc3f4d0a93831a20d5566f203d5ad6d44de8':
  RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
  RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()
2023-03-29 12:26:38 -07:00
Palmer Dabbelt
4622f15909
Merge patch series "RISC-V: Fixes for riscv_has_extension[un]likely()'s alternative dependency"
Conor Dooley <conor.dooley@microchip.com> says:

Here's my attempt at fixing both the use of an FPU on XIP kernels and
the issue that Jason ran into where CONFIG_FPU, which needs the
alternatives frame work for has_fpu() checks, could be enabled without
the alternatives actually being present.

For the former, a "slow" fallback that does not use alternatives is
added to riscv_has_extension_[un]likely() that can be used with XIP.
Obviously, we want to make use of Jisheng's alternatives based approach
where possible, so any users of riscv_has_extension_[un]likely() will
want to make sure that they select RISCV_ALTERNATIVE.
If they don't however, they'll hit the fallback path which (should,
sparing a silly mistake from me!) behave in the same way, thus
succeeding silently. Sounds like a

To prevent "depends on !XIP_KERNEL; select RISCV_ALTERNATIVE" spreading
like the plague through the various places that want to check for the
presence of extensions, and sidestep the potential silent "success"
mentioned above, all users RISCV_ALTERNATIVE are converted from selects
to dependencies, with the option being selected for all !XIP_KERNEL
builds.

I know that the VDSO was a key place that Jisheng wanted to use the new
helper rather than static branches, and I think the fallback path
should not cause issues there.

See the thread at [1] for the prior discussion.

1 - https://lore.kernel.org/linux-riscv/20230128172856.3814-1-jszhang@kernel.org/T/#m21390d570997145d31dd8bb95002fd61f99c6573

[Palmer: merging in the fixes as a branch as there's some features that
depend on it.]

* b4-shazam-merge:
  RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
  RISC-V: add non-alternative fallback for riscv_has_extension_[un]likely()

Link: https://lore.kernel.org/r/20230324100538.3514663-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-29 12:23:00 -07:00
Conor Dooley
1ee7fc3f4d
RISC-V: always select RISCV_ALTERNATIVE for non-xip kernels
When moving switch_to's has_fpu() over to using
riscv_has_extension_likely() rather than static branches, the FPU code
gained a dependency on the alternatives framework.
That dependency has now been removed, as riscv_has_extension_ikely() now
contains a fallback path, using __riscv_isa_extension_available(), but
if CONFIG_RISCV_ALTERNATIVE isn't selected when CONFIG_FPU is, has_fpu()
checks will not benefit from the "fast path" that the alternatives
framework provides.

We want to ensure that alternatives are available whenever
riscv_has_extension_[un]likely() is used, rather than silently falling
back to the slow path, but rather than rely on selecting
RISCV_ALTERNATIVE in the myriad of locations that may use
riscv_has_extension_[un]likely(), select it (almost) always instead by
adding it to the main RISCV config entry.
xip kernels cannot make use of the alternatives framework, so it is not
enabled for those configurations, although this is the status quo.

All current sites that select RISCV_ALTERNATIVE are converted to
dependencies on the option instead. The explicit dependencies on
!XIP_KERNEL can be dropped, as RISCV_ALTERNATIVE is not user selectable.

Fixes: 702e64550b ("riscv: fpu: switch has_fpu() to riscv_has_extension_likely()")
Link: https://lore.kernel.org/all/ZBruFRwt3rUVngPu@zx2c4.com/
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/r/20230324100538.3514663-3-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-29 11:48:39 -07:00
Palmer Dabbelt
e97be4fbc1
Merge patch series "Add RISC-V 32 NOMMU support"
Jesse Taube <mr.bossman075@gmail.com> says:

This patch-set aims to add NOMMU support to RV32.
Many people want to build simple emulators or HDL
models of RISC-V this patch makes it possible to
run linux on them.

Yimin Gu is the original author of this set.
Submitted here:
https://lists.buildroot.org/pipermail/buildroot/2022-November/656134.html

Though Jesse T rewrote the Dconf.

* b4-shazam-merge:
  riscv: configs: Add nommu PHONY defconfig for RV32
  riscv: Kconfig: Allow RV32 to build with no MMU

Link: https://lore.kernel.org/r/20230301002657.352637-1-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-27 16:27:49 -07:00
Yimin Gu
b5e2c507b0
riscv: Kconfig: Allow RV32 to build with no MMU
Some RISC-V 32bit cores do not have an MMU, and the kernel should be
able to build for them. This patch enables the RV32 to be built with
no MMU support.

Signed-off-by: Yimin Gu <ustcymgu@gmail.com>
CC: Jesse Taube <Mr.Bossman075@gmail.com>
Tested-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: Jesse Taube <Mr.Bossman075@gmail.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230301002657.352637-3-Mr.Bossman075@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-27 16:27:46 -07:00
Palmer Dabbelt
e45d6a52fe
Merge patch series "riscv: Add GENERIC_ENTRY support"
guoren@kernel.org <guoren@kernel.org> says:

From: Guo Ren <guoren@linux.alibaba.com>

The patches convert riscv to use the generic entry infrastructure from
kernel/entry/*. Some optimization for entry.S with new .macro and merge
ret_from_kernel_thread into ret_from_fork.

* b4-shazam-merge:
  riscv: entry: Consolidate general regs saving/restoring
  riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
  riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
  riscv: entry: Convert to generic entry
  riscv: entry: Add noinstr to prevent instrumentation inserted
  riscv: ptrace: Remove duplicate operation

Link: https://lore.kernel.org/r/20230222033021.983168-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-24 13:34:43 -07:00
Nathan Chancellor
e89c2e815e
riscv: Handle zicsr/zifencei issues between clang and binutils
There are two related issues that appear in certain combinations with
clang and GNU binutils.

The first occurs when a version of clang that supports zicsr or zifencei
via '-march=' [1] (i.e, >= 17.x) is used in combination with a version
of GNU binutils that do not recognize zicsr and zifencei in the
'-march=' value (i.e., < 2.36):

  riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
  riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o
  riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei'
  riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o

The second occurs when a version of clang that does not support zicsr or
zifencei via '-march=' (i.e., <= 16.x) is used in combination with a
version of GNU as that defaults to a newer ISA base spec, which requires
specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >=
2.38):

  ../arch/riscv/kernel/kexec_relocate.S: Assembler messages:
  ../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required
  clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)

This is the same issue addressed by commit 6df2a016c0 ("riscv: fix
build with binutils 2.38") (see [2] for additional information) but
older versions of clang miss out on it because the cc-option check
fails:

  clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
  clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'

To resolve the first issue, only attempt to add zicsr and zifencei to
the march string when using the GNU assembler 2.38 or newer, which is
when the default ISA spec was updated, requiring these extensions to be
specified explicitly. LLVM implements an older version of the base
specification for all currently released versions, so these instructions
are available as part of the 'i' extension. If LLVM's implementation is
updated in the future, a CONFIG_AS_IS_LLVM condition can be added to
CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI.

To resolve the second issue, use version 2.2 of the base ISA spec when
using an older version of clang that does not support zicsr or zifencei
via '-march=', as that is the spec version most compatible with the one
clang/LLVM implements and avoids the need to specify zicsr and zifencei
explicitly due to still being a part of 'i'.

[1]: 22e199e6af
[2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/

Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1808
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840a551@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 12:52:49 -07:00
Guo Ren
f0bddf5058
riscv: entry: Convert to generic entry
This patch converts riscv to use the generic entry infrastructure from
kernel/entry/*. The generic entry makes maintainers' work easier and
codes more elegant. Here are the changes:

 - More clear entry.S with handle_exception and ret_from_exception
 - Get rid of complex custom signal implementation
 - Move syscall procedure from assembly to C, which is much more
   readable.
 - Connect ret_from_fork & ret_from_kernel_thread to generic entry.
 - Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
 - Use the standard preemption code instead of custom

Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:47:00 -07:00
Palmer Dabbelt
4b740779ac
Merge patch series "RISC-V: Apply Zicboz to clear_page"
Andrew Jones <ajones@ventanamicro.com> says:

When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch6's commit message for more details).

For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).

* b4-shazam-merge:
  RISC-V: KVM: Expose Zicboz to the guest
  RISC-V: KVM: Provide UAPI for Zicboz block size
  RISC-V: Use Zicboz in clear_page when available
  RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
  RISC-V: Add Zicboz detection and block size parsing
  dt-bindings: riscv: Document cboz-block-size
  RISC-V: Factor out body of riscv_init_cbom_blocksize loop
  RISC-V: alternatives: Support patching multiple insns in assembly

Link: https://lore.kernel.org/r/20230224162631.405473-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-15 07:11:08 -07:00