mirror of
https://github.com/torvalds/linux.git
synced 2024-11-24 05:02:12 +00:00
b66973b82d
436 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
857f1268a5 |
Changes in this cycle were:
- Shrink 'struct instruction', to improve objtool performance & memory footprint. - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o. - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). This requires the extension of the ORC metadata structure with a 'signal' field. - Misc fixes & cleanups. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmQAVp8RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1gV6A//YbWb4nNxYbRFBd1O3FnFfy4efrDQ4btI hwkL6f7jka9RnIpIEatJvaLdNvyN5tuPCC/+B5eVnvFdd1JcBUmj5D+zYFt6H6qt BG4M6TNHFkP1kOJVfFGn8UPRfoMz2oMiEqilpsc1Yuf7b3ldMJtGUoHaeZC9pyqe RUisKNw4WHZp2G/gTBUWxW17xpWY3Awgch/w4HCu8wMnR+uEC44i0UCBfnAadl36 ar66PfhMJcQIv0XkK9wu43g7+HFnjpxHOx35JW3lRot0xRnwl/JcsmaX5iPkh0gt HV8eLH80J0homeMZDY7vWIKJxGeLkIdfjO5gxwTdnFc9rQw3GwHp1B7WTS6J3Vwe gM00kyaGly3CvkKMiz5QQBfViWCjE25nYS8X0i9Oz6Gk58IkRPGByaDTKRjNrDJB BwH9DE9xb3dPVZRv/PejkTdggQWo+FDTrL8ulHIjUFK11M7VubwkskecNHkfpAOE TRy5iLjMocF8u7hdyec6Mma2K6qEndC2Rw9ZMPQ7TeieMsBcl63cSRgSJLFfdRhr /5c6Hr2SNQKU8xu+3j49GyBwFvp4CwCa+GPs9/o+l0uCvuKNIn9B788cm4TjxLJ9 C3PRzE6B/CaLhYvlC5k5cNM+I4YpoMU/mvSvY6HcC0Duj2nSAWS2VV60MVMDpqVX 8nK4xnla2tM= =bpPY -----END PGP SIGNATURE----- Merge tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Shrink 'struct instruction', to improve objtool performance & memory footprint - Other maximum memory usage reductions - this makes the build both faster, and fixes kernel build OOM failures on allyesconfig and similar configs when they try to build the final (large) vmlinux.o - Fix ORC unwinding when a kprobe (INT3) is set on a stack-modifying single-byte instruction (PUSH/POP or LEAVE). 
This requires the extension of the ORC metadata structure with a 'signal' field - Misc fixes & cleanups * tag 'objtool-core-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) objtool: Fix ORC 'signal' propagation objtool: Remove instruction::list x86: Fix FILL_RETURN_BUFFER objtool: Fix overlapping alternatives objtool: Union instruction::{call_dest,jump_table} objtool: Remove instruction::reloc objtool: Shrink instruction::{type,visited} objtool: Make instruction::alts a single-linked list objtool: Make instruction::stack_ops a single-linked list objtool: Change arch_decode_instruction() signature x86/entry: Fix unwinding from kprobe on PUSH/POP instruction x86/unwind/orc: Add 'signal' field to ORC metadata objtool: Optimize layout of struct special_alt objtool: Optimize layout of struct symbol objtool: Allocate multiple structures with calloc() objtool: Make struct check_options static objtool: Make struct entries[] static and const objtool: Fix HOSTCC flag usage objtool: Properly support make V=1 objtool: Install libsubcmd in build ... |
||
Linus Torvalds
|
a8356cdb5b |
LoongArch changes for v6.3
1, Make -mstrict-align configurable; 2, Add kernel relocation and KASLR support; 3, Add single kernel image implementation for kdump; 4, Add hardware breakpoints/watchpoints support; 5, Add kprobes/kretprobes/kprobes_on_ftrace support; 6, Add LoongArch support for some selftests. -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmP+9H0WHGNoZW5odWFj YWlAa2VybmVsLm9yZwAKCRAChivD8uImerz+D/98MjkLXM4qtgfAxuBKpVdEVA4U bzO19UlpqWlwTJbwrhf0GYsRrAis37PTVJG4eNORJairJ/oTkMtEEBPhwq0D9Whc URDEh+VrjzFztLsu2OlvzOA9gE7lpg+xAx2LKflP7ixlOELOWeercDLW3octp5/J CJDE8wPaw9tJrMHFWuiVybs03yZmY3YFV55JdWL9hY8Ryy4DY5997mruOfzjvHpl EfDgQM2zCn2JSQwaD+Kl3MHxHyRx07Tj2wnZAh9ptaGeptK/yplc7nqRwhe7BevS QwClhJNPICcOi+evZ7cDUY0PTL4evpw2KRnF1N4zw+58RhZECjVrCEJNdf6L1scj muptQngWKrE/TJvn4way3cJr44stSCtT71elPhn629S23my/CauMmFqCqKpYOPOf pxwzzCaqDcaZKwMu96qBkZS76tIrhoNeNFntj+C9RS+8ezY3+o144S3vF1A6A9Zb M4gwa2NiQuLqnCUwKK6dZkLQVX2NMIMViUkYNKdUStxNWx/K7fFmXcl0ycAFpGYp 8Q95LLH34jUrpSgqMSCmcylsPvNiN1QnuXFnw8Tu+zDthp5dOzio60tORLPM1ZUq gobPeGjeTQInq4eMCf2B5HH8fOMVtJyj6H4K9G1M6HUMg64UtcBp6BvEbwPxTxNN sIOFUjDfDnBiIXWF4w== =SzL5 -----END PGP SIGNATURE----- Merge tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch updates from Huacai Chen: - Make -mstrict-align configurable - Add kernel relocation and KASLR support - Add single kernel image implementation for kdump - Add hardware breakpoints/watchpoints support - Add kprobes/kretprobes/kprobes_on_ftrace support - Add LoongArch support for some selftests. 
* tag 'loongarch-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (23 commits) selftests/ftrace: Add LoongArch kprobe args string tests support selftests/seccomp: Add LoongArch selftesting support tools: Add LoongArch build infrastructure samples/kprobes: Add LoongArch support LoongArch: Mark some assembler symbols as non-kprobe-able LoongArch: Add kprobes on ftrace support LoongArch: Add kretprobes support LoongArch: Add kprobes support LoongArch: Simulate branch and PC* instructions LoongArch: ptrace: Add hardware single step support LoongArch: ptrace: Add function argument access API LoongArch: ptrace: Expose hardware breakpoints to debuggers LoongArch: Add hardware breakpoints/watchpoints support LoongArch: kdump: Add crashkernel=YM handling LoongArch: kdump: Add single kernel image implementation LoongArch: Add support for kernel address space layout randomization (KASLR) LoongArch: Add support for kernel relocation LoongArch: Add la_abs macro implementation LoongArch: Add JUMP_VIRT_ADDR macro implementation to avoid using la.abs LoongArch: Use la.pcrel instead of la.abs when it's trivially possible ... |
||
Huacai Chen
|
121ff07bde |
tools: Add LoongArch build infrastructure
We will add tools support for LoongArch (bpf, perf, objtool, etc.), add build infrastructure and common headers for preparation. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> |
||
Linus Torvalds
|
0df82189bc |
perf tools changes for v6.3:
- 'perf lock contention' improvements: - Add -o/--lock-owner option: $ sudo ./perf lock contention -abo -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 4.766 [sec] 4.766540 usecs/op 209795 ops/sec contended total wait max wait avg wait pid owner 403 565.32 us 26.81 us 1.40 us -1 Unknown 4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe 1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe 1 2.03 us 2.03 us 2.03 us 5068 chrome The owner is unknown in most cases. Filtering only for the mutex locks, it will more likely get the owners. - -S/--callstack-filter is to limit display entries having the given string in the callstack $ sudo ./perf lock contention -abv -S net sleep 1 ... contended total wait max wait avg wait type caller 5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d 0xffffffffa5dd1c60 _raw_spin_lock+0x30 0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d 0xffffffffa5cd8267 ip6_finish_output2+0x2c7 0xffffffffa5cdac14 ip6_finish_output+0x1d4 0xffffffffa5cdb477 ip6_xmit+0x457 0xffffffffa5d1fd17 inet6_csk_xmit+0xd7 0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a 0xffffffffa5c6467d tcp_keepalive_timer+0x2fd Please note that to have the -b option (BPF) working above one has to build with BUILD_BPF_SKEL=1. - Add more 'perf test' entries to test these new features. - Add Ian Rogers to MAINTAINERS as a perf tools reviewer. - Add support for retire latency feature (pipeline stall of a instruction compared to the previous one, in cycles) present on some Intel processors. - Add 'perf c2c' report option to show false sharing with adjacent cachelines, to be used in machines with cacheline prefetching, where accesses to a cacheline brings the next one too. - Skip 'perf test bpf' when the required kernel-debuginfo package isn't installed. 
perf script: - Add 'cgroup' field for 'perf script' output: $ perf record --all-cgroups -- true $ perf script -F comm,pid,cgroup true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... - Add support for showing branch speculation information in 'perf script' and in the 'perf report' raw dump (-D). perf record: - Fix 'perf record' segfault with --overwrite and --max-size. Intel PT: - Add support for synthesizing "cycle" events from Intel PT traces as we support "instruction" events when Intel PT CYC packets are available. This enables much more accurate profiles than when using the regular 'perf record -e cycles' (the default) when the workload lasts for very short periods (<10ms). - .plt symbol handling improvements, better handling IBT (in the past MPX) done in the context of decoding Intel PT processor traces, IFUNC symbols on x86_64, static executables, understanding .plt.got symbols on x86_64. - Add a 'perf test' to test symbol resolution, part of the .plt improvements series, this tests things like symbol size in contexts where only the symbol start is available (kallsyms), etc. - Better handle auxtrace/Intel PT data when using pipe mode (perf record sleep 1|perf report). - Fix symbol lookup with kcore with multiple segments match stext, getting the symbol resolution to just show DSOs as unknown. ARM: - Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace Macrocell v4). - Ensure ARM64 CoreSight timestamps don't go backwards. - Document that ARM64 SPE (Statistical Profiling Extension) is used with 'perf c2c/mem'. - Add raw decoding for ARM64 SPEv1.2 previous branch address. - Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1, TLB, cache, branch, PE utilization and instruction mix metrics. 
- Update decoder code for OpenCSD version 1.4, on ARM64 systems. - Fix command line auto-complete of CPU events on aarch64. perf test/bench: - Switch basic BPF filtering test to use syscall tracepoint to avoid the variable number of probes inserted when using the previous probe point (do_epoll_wait) that happens on different CPU architectures. - Fix DWARF unwind test by adding non-inline to expected function in a backtrace. - Use 'grep -c' where the longer form 'grep | wc -l' was being used. - Add getpid and execve benchmarks to 'perf bench syscall'. Miscellaneous: - Avoid d3-flame-graph package dependency in 'perf script flamegraph', making this feature more generally available. - Add JSON metric events to present CPI stall cycles in Power10. - Assorted improvements/refactorings on the JSON metrics parsing code. Build: - Fix 'perf probe' and 'perf test' when libtraceevent isn't linked, as several tests use tracepoints, those should be skipped. - More fallout fixes for the removal of tools/lib/traceevent/. - Fix build error when linking with libpfm. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY/YzGgAKCRCyPKLppCJ+ J98CAP4/GD3E86Dk+S+w5FmPEHuBKootuZ3pHOqCnXLiyKFZqgEAs9TWOg9KVKGh io9cLluMjzfRwQrND8cpn3VfXxWvVAQ= =L1qh -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "Miscellaneous: - Add Ian Rogers to MAINTAINERS as a perf tools reviewer. - Add support for retire latency feature (pipeline stall of a instruction compared to the previous one, in cycles) present on some Intel processors. - Add 'perf c2c' report option to show false sharing with adjacent cachelines, to be used in machines with cacheline prefetching, where accesses to a cacheline brings the next one too. 
- Skip 'perf test bpf' when the required kernel-debuginfo package isn't installed. - Avoid d3-flame-graph package dependency in 'perf script flamegraph', making this feature more generally available. - Add JSON metric events to present CPI stall cycles in Power10. - Assorted improvements/refactorings on the JSON metrics parsing code. perf lock contention: - Add -o/--lock-owner option: $ sudo ./perf lock contention -abo -- ./perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 4.766 [sec] 4.766540 usecs/op 209795 ops/sec contended total wait max wait avg wait pid owner 403 565.32 us 26.81 us 1.40 us -1 Unknown 4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe 1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe 1 2.03 us 2.03 us 2.03 us 5068 chrome The owner is unknown in most cases. Filtering only for the mutex locks, it will more likely get the owners. - -S/--callstack-filter is to limit display entries having the given string in the callstack: $ sudo ./perf lock contention -abv -S net sleep 1 ... contended total wait max wait avg wait type caller 5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d 0xffffffffa5dd1c60 _raw_spin_lock+0x30 0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d 0xffffffffa5cd8267 ip6_finish_output2+0x2c7 0xffffffffa5cdac14 ip6_finish_output+0x1d4 0xffffffffa5cdb477 ip6_xmit+0x457 0xffffffffa5d1fd17 inet6_csk_xmit+0xd7 0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a 0xffffffffa5c6467d tcp_keepalive_timer+0x2fd Please note that to have the -b option (BPF) working above one has to build with BUILD_BPF_SKEL=1. - Add more 'perf test' entries to test these new features. perf script: - Add 'cgroup' field for 'perf script' output: $ perf record --all-cgroups -- true $ perf script -F comm,pid,cgroup true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... 
true 337112 /user.slice/user-657345.slice/user@657345.service/... true 337112 /user.slice/user-657345.slice/user@657345.service/... - Add support for showing branch speculation information in 'perf script' and in the 'perf report' raw dump (-D). perf record: - Fix 'perf record' segfault with --overwrite and --max-size. perf test/bench: - Switch basic BPF filtering test to use syscall tracepoint to avoid the variable number of probes inserted when using the previous probe point (do_epoll_wait) that happens on different CPU architectures. - Fix DWARF unwind test by adding non-inline to expected function in a backtrace. - Use 'grep -c' where the longer form 'grep | wc -l' was being used. - Add getpid and execve benchmarks to 'perf bench syscall'. Intel PT: - Add support for synthesizing "cycle" events from Intel PT traces as we support "instruction" events when Intel PT CYC packets are available. This enables much more accurate profiles than when using the regular 'perf record -e cycles' (the default) when the workload lasts for very short periods (<10ms). - .plt symbol handling improvements, better handling IBT (in the past MPX) done in the context of decoding Intel PT processor traces, IFUNC symbols on x86_64, static executables, understanding .plt.got symbols on x86_64. - Add a 'perf test' to test symbol resolution, part of the .plt improvements series, this tests things like symbol size in contexts where only the symbol start is available (kallsyms), etc. - Better handle auxtrace/Intel PT data when using pipe mode (perf record sleep 1|perf report). - Fix symbol lookup with kcore with multiple segments match stext, getting the symbol resolution to just show DSOs as unknown. ARM: - Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace Macrocell v4). - Ensure ARM64 CoreSight timestamps don't go backwards. - Document that ARM64 SPE (Statistical Profiling Extension) is used with 'perf c2c/mem'. - Add raw decoding for ARM64 SPEv1.2 previous branch address. 
- Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1, TLB, cache, branch, PE utilization and instruction mix metrics. - Update decoder code for OpenCSD version 1.4, on ARM64 systems. - Fix command line auto-complete of CPU events on aarch64. Build: - Fix 'perf probe' and 'perf test' when libtraceevent isn't linked, as several tests use tracepoints, those should be skipped. - More fallout fixes for the removal of tools/lib/traceevent/. - Fix build error when linking with libpfm" * tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits) perf tests stat_all_metrics: Change true workload to sleep workload for system wide check perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc perf intel-pt: Synthesize cycle events perf c2c: Add report option to show false sharing in adjacent cachelines perf record: Fix segfault with --overwrite and --max-size perf stat: Avoid merging/aggregating metric counts twice perf tools: Fix perf tool build error in util/pfm.c perf tools: Fix auto-complete on aarch64 perf lock contention: Support old rw_semaphore type perf lock contention: Add -o/--lock-owner option perf lock contention: Fix to save callstack for the default modified perf test bpf: Skip test if kernel-debuginfo is not present perf probe: Update the exit error codes in function try_to_find_probe_trace_event perf script: Fix missing Retire Latency fields option documentation perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT perf test x86: Support the retire_lat (Retire Latency) sample_type check perf test bpf: Check for libtraceevent support perf script: Support Retire Latency perf report: Support Retire Latency perf lock contention: Support filters for different aggregation ... |
||
Ingo Molnar
|
585a78c1f7 |
Merge branch 'linus' into objtool/core, to pick up Xen dependencies
Pick up dependencies - freshly merged upstream via xen-next - before applying dependent objtool changes. Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Linus Torvalds
|
877934769e |
- Cache the AMD debug registers in per-CPU variables to avoid MSR writes
where possible, when supporting a debug registers swap feature for SEV-ES guests - Add support for AMD's version of eIBRS called Automatic IBRS which is a set-and-forget control of indirect branch restriction speculation resources on privilege change - Add support for a new x86 instruction - LKGS - Load kernel GS which is part of the FRED infrastructure - Reset SPEC_CTRL upon init to accommodate use cases like kexec which rediscover - Other smaller fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmP1RDIACgkQEsHwGGHe VUohBw//ZB9ZRqsrKdm6D9YaP2x4Zb+kqKqo6rjYeWaYqyPyCwDujPwh+pb3Oq1t aj62muDv1t/wEJc8mKNkfXkjEEtBVAOcpb5YIpKreoEvNKyevol83Ih0u5iJcTRE E5qf8HDS8b/JZrcazJJLl6WQmQNH5RiKSu5bbCpRhoeOcyo5pRYR5MztK9vNmAQk GMdwHsUSU+jN8uiE4HnpaOb/luhgFindRwZVTpdjJegQWLABS8cl3CKeTv4+PW45 isvv37XnQP248wsptIEVRHeG6g3g/HtvwRx7DikUw06QwUyUK7H9hJssOoSP8TL9 u4psRwfWnJ1OxU6klL+s0Ii+pjQ97wXmK/oqK7QkdUwhWqR/mQAW2e9kWHAngyDn A6mKbzSM6HFAeSXQpB9cMb6uvYRD44SngDFe3WXtEK8jiiQ70ikUm4E28I5KJOPg s+RyioHk0NFRHYSOOBqNG1NKz6ED7L3GbgbbzxkgMh21AAyI3X351t+PtGoLV5ew eqOsM7lbg9Scg1LvPk1JcoALS8USWqgar397rz9qGUs+OkPWBtEBCmTdMz/Eb+2t g/WHdLS5/ajSs5gNhT99W3DeqZMPDEkgBRSeyBBmY3CUD3gBL2wXEktRXv504zBR RC4oyUPX3c9E2ib6GATLE3kBLbcz9hTWbMxF+X3lLJvTVd/Qc2o= =v/ZC -----END PGP SIGNATURE----- Merge tag 'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Cache the AMD debug registers in per-CPU variables to avoid MSR writes where possible, when supporting a debug registers swap feature for SEV-ES guests - Add support for AMD's version of eIBRS called Automatic IBRS which is a set-and-forget control of indirect branch restriction speculation resources on privilege change - Add support for a new x86 instruction - LKGS - Load kernel GS which is part of the FRED infrastructure - Reset SPEC_CTRL upon init to accommodate use cases like kexec which rediscover - Other smaller fixes and cleanups * tag
'x86_cpu_for_v6.3_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd: Cache debug register values in percpu variables KVM: x86: Propagate the AMD Automatic IBRS feature to the guest x86/cpu: Support AMD Automatic IBRS x86/cpu, kvm: Add the SMM_CTL MSR not present feature x86/cpu, kvm: Add the Null Selector Clears Base feature x86/cpu, kvm: Move X86_FEATURE_LFENCE_RDTSC to its native leaf x86/cpu, kvm: Add the NO_NESTED_DATA_BP feature KVM: x86: Move open-coded CPUID leaf 0x80000021 EAX bit propagation code x86/cpu, kvm: Add support for CPUID_80000021_EAX x86/gsseg: Add the new <asm/gsseg.h> header to <asm/asm-prototypes.h> x86/gsseg: Use the LKGS instruction if available for load_gs_index() x86/gsseg: Move load_gs_index() to its own new header file x86/gsseg: Make asm_load_gs_index() take an u16 x86/opcode: Add the LKGS instruction to x86-opcode-map x86/cpufeature: Add the CPU feature bit for LKGS x86/bugs: Reset speculation control settings on init x86/cpu: Remove redundant extern x86_read_arch_cap_msr() |
||
Josh Poimboeuf
|
ffb1b4a410 |
x86/unwind/orc: Add 'signal' field to ORC metadata
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org |
||
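The 'signal' field described in ffb1b4a410 decides which address is used to look up the next ORC entry. The following is a minimal C sketch of that choice only. `orc_lookup_ip()` is a hypothetical helper name chosen for illustration and is not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): pick the address used to
 * find the next ORC entry.
 *
 * signal == true:  the IP comes from an interrupt or exception frame and
 *                  is taken literally.
 * signal == false: the IP is a call return address, which points just past
 *                  the call instruction, so it is decremented to stay
 *                  inside the calling instruction.
 */
static uint64_t orc_lookup_ip(uint64_t ip, bool signal)
{
	assert(ip != 0); /* a zero IP would wrap around when decremented */
	return signal ? ip : ip - 1;
}
```

With a kprobe (INT3) placed on a single-byte PUSH/POP, decrementing the IP would land in the previous instruction, whose stack layout differs; taking the IP literally avoids that.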
Tiezhu Yang
|
540f8b5640 |
perf bench syscall: Add execve syscall benchmark
This commit adds the execve syscall benchmark; more syscall benchmarks can be added in the future. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-5-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Tiezhu Yang
|
391f84e555 |
perf bench syscall: Add getpgid syscall benchmark
This commit adds a simple getpgid syscall benchmark; more syscall benchmarks can be added in the future. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-4-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Tiezhu Yang
|
3fe91f3262 |
perf bench syscall: Introduce bench_syscall_common()
The current code has only a basic syscall benchmark via getppid, which is not enough. Introduce bench_syscall_common() so that we can add more syscalls to benchmark. This is preparation for a later patch; no functional change. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
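The idea behind 3fe91f3262, one shared timing loop that several syscall benchmarks can reuse, might look roughly like the C sketch below. This is not the perf implementation: `enum bench_target` and `bench_syscall_sketch()` are hypothetical names, and the sketch assumes a Linux/POSIX system with `clock_gettime()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical selector; the real code may key on syscall numbers instead. */
enum bench_target { BENCH_GETPPID, BENCH_GETPGID };

/*
 * Shared loop: call the selected syscall wrapper `loops` times and return
 * the elapsed time in nanoseconds, or -1 for an unknown target.
 */
static int64_t bench_syscall_sketch(enum bench_target target, int loops)
{
	struct timespec start, stop;

	assert(loops > 0);
	if (target != BENCH_GETPPID && target != BENCH_GETPGID)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (int i = 0; i < loops; i++) {
		switch (target) {
		case BENCH_GETPPID:
			getppid();
			break;
		case BENCH_GETPGID:
			getpgid(0); /* 0 means "the calling process" */
			break;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &stop);

	return (int64_t)(stop.tv_sec - start.tv_sec) * 1000000000LL +
	       (stop.tv_nsec - start.tv_nsec);
}
```

In this shape, adding another benchmark is one more enum value and one more `case`, which matches the stated goal of making further syscalls easy to add.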
Tiezhu Yang
|
1bad502775 |
tools x86: Keep list sorted by number in unistd_{32,64}.h
It is better to keep the list sorted by number in unistd_{32,64}.h, so that more syscall numbers can be added in the proper position. This is preparation for a later patch; no functional change. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/1668052208-14047-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
8c51e8f4e9 |
tools headers arm64: Sync arm64's cputype.h with the kernel sources
To get the changes in: |
||
Arnaldo Carvalho de Melo
|
7f2d4cdd2f |
tools kvm headers arm64: Update KVM header from the kernel sources
To pick the changes from:
|
||
Arnaldo Carvalho de Melo
|
effa76856f |
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in: |
||
H. Peter Anvin (Intel)
|
5a91f12660 |
x86/opcode: Add the LKGS instruction to x86-opcode-map
Add the instruction opcode used by LKGS to x86-opcode-map. The opcode number is per the public FRED draft spec v3.0. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230112072032.35626-3-xin3.li@intel.com |
||
H. Peter Anvin (Intel)
|
660569472d |
x86/cpufeature: Add the CPU feature bit for LKGS
Add the CPU feature bit for LKGS (Load "Kernel" GS). The LKGS instruction is introduced with the Intel FRED (flexible return and event delivery) specification. To find the latest FRED spec, search with this pattern in most search engines: "site:intel.com FRED (flexible return and event delivery) specification". LKGS behaves like the MOV to GS instruction except that it loads the base address into the IA32_KERNEL_GS_BASE MSR instead of the GS segment’s descriptor cache, which is exactly what the Linux kernel does to load a user-level GS base. Thus, with LKGS, there is no need to SWAPGS away from the kernel GS base. [ mingo: Minor tweaks to the description. ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230112072032.35626-2-xin3.li@intel.com |
||
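The LKGS description in 660569472d can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions: the CPU is reduced to two values (the active GS base and the IA32_KERNEL_GS_BASE MSR), selectors and descriptor lookups are ignored, and all `toy_*` and `load_user_gs_*` names are hypothetical. It compares the SWAPGS / MOV to GS / SWAPGS sequence with a single LKGS-style write.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state: only the two GS base values matter in this model. */
struct toy_cpu {
	uint64_t gs_base;        /* active GS base (descriptor cache) */
	uint64_t kernel_gs_base; /* IA32_KERNEL_GS_BASE MSR */
};

/* SWAPGS: exchange the active GS base with the MSR value. */
static void toy_swapgs(struct toy_cpu *cpu)
{
	uint64_t tmp = cpu->gs_base;

	cpu->gs_base = cpu->kernel_gs_base;
	cpu->kernel_gs_base = tmp;
}

/* MOV to GS: in this model, simply replaces the active GS base. */
static void toy_mov_to_gs(struct toy_cpu *cpu, uint64_t base)
{
	cpu->gs_base = base;
}

/* LKGS: like MOV to GS, but the base goes to the MSR instead. */
static void toy_lkgs(struct toy_cpu *cpu, uint64_t base)
{
	cpu->kernel_gs_base = base;
}

/* Without LKGS: swap away from the kernel GS base, load, swap back. */
static void load_user_gs_legacy(struct toy_cpu *cpu, uint64_t user_base)
{
	assert(cpu != 0);
	toy_swapgs(cpu);
	toy_mov_to_gs(cpu, user_base);
	toy_swapgs(cpu);
}

/* With LKGS: the kernel GS base is never swapped out. */
static void load_user_gs_lkgs(struct toy_cpu *cpu, uint64_t user_base)
{
	assert(cpu != 0);
	toy_lkgs(cpu, user_base);
}
```

Starting from the same state, both paths leave the kernel's active GS base untouched and the new user base parked in the MSR; the LKGS path just never exposes a window in which the kernel GS base is swapped out.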
Linus Torvalds
|
d1ac1a2b14 |
perf tools fixes and improvements for v6.2: 2nd batch
- Don't stop building perf if python setuptools isn't installed, just disable the affected perf feature. - Remove explicit reference to python 2.x devel files, that warning is about python-devel, no matter what version, being unavailable and thus disabling the linking with libpython. - Don't use -Werror=switch-enum when building the python support that handles libtraceevent enumerations, as there is no good way to test if some specific enum entry is available with the libtraceevent installed on the system. - Introduce 'perf lock contention' --type-filter and --lock-filter, to filter by lock type and lock name: $ sudo ./perf lock record -a -- ./perf bench sched messaging $ sudo ./perf lock contention -E 5 -Y spinlock contended total wait max wait avg wait type caller 802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62 13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14 12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27 114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5 83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e $ sudo ./perf lock contention -l contended total wait max wait avg wait address symbol 57 1.11 ms 42.83 us 19.54 us ffff9f4140059000 15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock 1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state 1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0 $ sudo ./perf lock contention -L jiffies_lock,rcu_state contended total wait max wait avg wait type caller 15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93 1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb $ sudo ./perf lock contention -L ffff9f4140059000 contended total wait max wait avg wait type caller 38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50 11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39 8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5 - Fix splitting CC into compiler and options when checking if a option is present in clang to 
build the python binding, needed in systems such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c". - Refresh metrics and events for Intel systems: alderlake, alderlake-n, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding, meteorlake, nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp, westmereex. - Add vendor events files (JSON) for AMD Zen 4, from sections 2.1.15.4 "Core Performance Monitor Counters", 2.1.15.5 "L3 Cache Performance Monitor Counters" and Section 7.1 "Fabric Performance Monitor Counter (PMC) Events" in the Processor Programming Reference (PPR) for AMD Family 19h Model 11h Revision B1 processors. This constitutes events which capture op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, L3 cache activity and data bandwidth for various links and interfaces in the Data Fabric. - Also, from the same PPR are metrics taken from Section 2.1.15.2 "Performance Measurement", including pipeline utilization, which are new to Zen 4 processors and useful for finding performance bottlenecks by analyzing activity at different stages of the pipeline. - Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and 'srcfile' sort keys performance by postponing calling the external addr2line utility to the collapse phase of histogram bucketing. - Fix 'perf test' "all PMU test" to skip parametrized events, which require setup and are not supported by this test. - Update tools/ copies of kernel headers: features, disabled-features, fscrypt.h, i915_drm.h, msr-index.h, power pc syscall table and kvm.h. - Add .DELETE_ON_ERROR special Makefile target to clean up partially updated files on error.
- Simplify the mksyscalltbl script for arm64 by avoiding to run the host compiler to create the syscall table, do it all just with the shell script. - Further fixes to honour quiet mode (-q). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY6SJ+gAKCRCyPKLppCJ+ J5JSAQCSokw2lsIqelDfoBfOQcMwah4ogW1vuO5KiepHgGOjuwD/d+65IxFIRA/h tJjAtq4fReyi4u4eTc1aLgUwFh7V0ws= =rneN -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull more perf tools updates from Arnaldo Carvalho de Melo: "perf tools fixes and improvements: - Don't stop building perf if python setuptools isn't installed, just disable the affected perf feature. - Remove explicit reference to python 2.x devel files, that warning is about python-devel, no matter what version, being unavailable and thus disabling the linking with libpython. - Don't use -Werror=switch-enum when building the python support that handles libtraceevent enumerations, as there is no good way to test if some specific enum entry is available with the libtraceevent installed on the system. 
- Introduce 'perf lock contention' --type-filter and --lock-filter, to filter by lock type and lock name: $ sudo ./perf lock record -a -- ./perf bench sched messaging $ sudo ./perf lock contention -E 5 -Y spinlock contended total wait max wait avg wait type caller 802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62 13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14 12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27 114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5 83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e $ sudo ./perf lock contention -l contended total wait max wait avg wait address symbol 57 1.11 ms 42.83 us 19.54 us ffff9f4140059000 15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock 1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state 1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0 $ sudo ./perf lock contention -L jiffies_lock,rcu_state contended total wait max wait avg wait type caller 15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93 1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb $ sudo ./perf lock contention -L ffff9f4140059000 contended total wait max wait avg wait type caller 38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50 11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39 8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5 - Fix splitting CC into compiler and options when checking if an option is present in clang to build the python binding, needed in systems such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c". - Refresh metrics and events for Intel systems: alderlake, alderlake-n, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding, meteorlake, nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp, westmereex.
- Add vendor events files (JSON) for AMD Zen 4, from sections 2.1.15.4 "Core Performance Monitor Counters", 2.1.15.5 "L3 Cache Performance Monitor Counters" and Section 7.1 "Fabric Performance Monitor Counter (PMC) Events" in the Processor Programming Reference (PPR) for AMD Family 19h Model 11h Revision B1 processors. This constitutes events which capture op dispatch, execution and retirement, branch prediction, L1 and L2 cache activity, TLB activity, L3 cache activity and data bandwidth for various links and interfaces in the Data Fabric.
- Also, from the same PPR are metrics taken from Section 2.1.15.2 "Performance Measurement", including pipeline utilization, which are new to Zen 4 processors and useful for finding performance bottlenecks by analyzing activity at different stages of the pipeline.
- Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and 'srcfile' sort keys performance by postponing calling the external addr2line utility to the collapse phase of histogram bucketing.
- Fix 'perf test' "all PMU test" to skip parametrized events, which require setup and are not supported by this test.
- Update tools/ copies of kernel headers: features, disabled-features, fscrypt.h, i915_drm.h, msr-index.h, powerpc syscall table and kvm.h.
- Add .DELETE_ON_ERROR special Makefile target to clean up partially updated files on error.
- Simplify the mksyscalltbl script for arm64 by no longer running the host compiler to create the syscall table; it is all done by the shell script.
- Further fixes to honour quiet mode (-q)" * tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits) perf python: Fix splitting CC into compiler and options perf scripting python: Don't be strict at handling libtraceevent enumerations perf arm64: Simplify mksyscalltbl perf build: Remove explicit reference to python 2.x devel files perf vendor events amd: Add Zen 4 mapping perf vendor events amd: Add Zen 4 metrics perf vendor events amd: Add Zen 4 uncore events perf vendor events amd: Add Zen 4 core events perf vendor events intel: Refresh westmereex events perf vendor events intel: Refresh westmereep-sp events perf vendor events intel: Refresh westmereep-dp events perf vendor events intel: Refresh tigerlake metrics and events perf vendor events intel: Refresh snowridgex events perf vendor events intel: Refresh skylakex metrics and events perf vendor events intel: Refresh skylake metrics and events perf vendor events intel: Refresh silvermont events perf vendor events intel: Refresh sapphirerapids metrics and events perf vendor events intel: Refresh sandybridge metrics and events perf vendor events intel: Refresh nehalemex events perf vendor events intel: Refresh nehalemep events ... |
||
Arnaldo Carvalho de Melo
|
a66558dcb1 |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in: |
||
Linus Torvalds
|
35f79d0e2c |
parisc architecture fixes for kernel v6.2-rc1:
Fixes:
- Fix potential null-ptr-deref in start_task()
- Fix kgdb console on serial port
- Add missing FORCE prerequisites in Makefile
- Drop PMD_SHIFT from calculation in pgtable.h
Enhancements:
- Implement a wrapper to align madvise() MADV_* constants with other architectures
- If machine supports running MPE/XL, show the MPE model string
Cleanups:
- Drop duplicate kgdb console code
- Indenting fixes in setup_cmdline()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCY6B/cgAKCRD3ErUQojoP
X85pAQCC6YpSYON3KZRfABeiDTRCKcGm72p7JQRnyj88XCq6ZAEA40T2qpRpjoYi
NaXr28mxHFYh4Z0c5Y7K5EuFTT7gAA4=
=e2Jd
-----END PGP SIGNATURE-----
Merge tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"There is one notable patch, which allows the parisc kernel to use the same MADV_xxx constants as the other architectures going forward. With that change only alpha has one entry left (MADV_DONTNEED is 6 vs 4 on others) which is different. To prevent an ABI breakage, a wrapper is included which translates old MADV values to the new ones, so existing userspace isn't affected. The reason for that patch is that some applications wrongly used the standard MADV_xxx values even on some non-x86 platforms, and as such those programs failed to run correctly on parisc (examples are qemu-user, tor browser and boringssl).
Then the kgdb console and the LED code received some fixes, and some 0-day warnings are now gone. Finally, the very last compile warning which was visible during a kernel build is now fixed too (in the vDSO code).
The majority of the patches are tagged for stable series and in summary this patchset is quite small and drops more code than it adds:
Fixes:
- Fix potential null-ptr-deref in start_task()
- Fix kgdb console on serial port
- Add missing FORCE prerequisites in Makefile
- Drop PMD_SHIFT from calculation in pgtable.h
Enhancements:
- Implement a wrapper to align madvise() MADV_* constants with other architectures
- If machine supports running MPE/XL, show the MPE model string
Cleanups:
- Drop duplicate kgdb console code
- Indenting fixes in setup_cmdline()"
* tag 'parisc-for-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Show MPE/iX model string at bootup
parisc: Add missing FORCE prerequisites in Makefile
parisc: Move pdc_result struct to firmware.c
parisc: Drop locking in pdc console code
parisc: Drop duplicate kgdb_pdc console
parisc: Fix locking in pdc_iodc_print() firmware call
parisc: Drop PMD_SHIFT from calculation in pgtable.h
parisc: Align parisc MADV_XXX constants with all other architectures
parisc: led: Fix potential null-ptr-deref in start_task()
parisc: Fix inconsistent indenting in setup_cmdline() |
||
Arnaldo Carvalho de Melo
|
51c4f2bf53 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: |
||
Arnaldo Carvalho de Melo
|
0bc1d0e2c1 |
tools headers disabled-cpufeatures: Sync with the kernel sources
To pick the changes from:
|
||
Helge Deller
|
71bdea6f79 |
parisc: Align parisc MADV_XXX constants with all other architectures
Adjust some MADV_XXX constants to match the values they have on all other platforms. There is currently no reason for parisc to have its own numbering, and it requires workarounds in many userspace sources (e.g. glibc, qemu, ...), which are often forgotten and thus introduce bugs and different behaviour on parisc. A wrapper avoids an ABI breakage for existing userspace applications by translating any old values to the new ones, so this change allows us to move all programs over to the new ABI over time. Signed-off-by: Helge Deller <deller@gmx.de> |
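The translation such a wrapper performs can be sketched as below. This is a minimal illustration, not the code from the commit: the function name `sketch_translate_madv()`, the `SKETCH_OLD_*`/`SKETCH_NEW_*` macro names and all numeric values are assumptions chosen for the example.

```c
#include <assert.h>

/* Hypothetical legacy advice values (illustrative numbers only). */
#define SKETCH_OLD_MADV_MERGEABLE   65
#define SKETCH_OLD_MADV_UNMERGEABLE 66
#define SKETCH_OLD_MADV_HUGEPAGE    67
#define SKETCH_OLD_MADV_NOHUGEPAGE  68

/* Hypothetical values shared with the other architectures. */
#define SKETCH_NEW_MADV_MERGEABLE   12
#define SKETCH_NEW_MADV_UNMERGEABLE 13
#define SKETCH_NEW_MADV_HUGEPAGE    14
#define SKETCH_NEW_MADV_NOHUGEPAGE  15

/*
 * Map a legacy advice value to its new equivalent. Values that were
 * never renumbered pass through unchanged, so binaries built against
 * either numbering end up requesting the same advice.
 */
int sketch_translate_madv(int behavior)
{
	switch (behavior) {
	case SKETCH_OLD_MADV_MERGEABLE:   return SKETCH_NEW_MADV_MERGEABLE;
	case SKETCH_OLD_MADV_UNMERGEABLE: return SKETCH_NEW_MADV_UNMERGEABLE;
	case SKETCH_OLD_MADV_HUGEPAGE:    return SKETCH_NEW_MADV_HUGEPAGE;
	case SKETCH_OLD_MADV_NOHUGEPAGE:  return SKETCH_NEW_MADV_NOHUGEPAGE;
	default:                          return behavior;
	}
}

/* Usage example: old and new callers converge on the same value. */
void sketch_madv_selfcheck(void)
{
	assert(sketch_translate_madv(SKETCH_OLD_MADV_HUGEPAGE) ==
	       sketch_translate_madv(SKETCH_NEW_MADV_HUGEPAGE));
}
```

Doing the translation once at the syscall boundary is what lets old binaries keep working while new ones use the shared numbering.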
||
Linus Torvalds
|
8fa590bf34 |
ARM64:
* Enable the per-vcpu dirty-ring tracking mechanism, together with an option to keep the good old dirty log around for pages that are dirtied by something other than a vcpu. * Switch to the relaxed parallel fault handling, using RCU to delay page table reclaim and giving better performance under load. * Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option, which multi-process VMMs such as crosvm rely on (see merge commit |
||
Linus Torvalds
|
7a76117f9f |
platform-drivers-x86 for v6.2-1
Highlights: - Intel: - PMC: Add support for Meteor Lake - Intel On Demand: various updates - ideapad-laptop: - Add support for various Fn keys on new models - Fix touchpad on/off handling in a generic way to avoid having to add more and more quirks - android-x86-tablets: Add support for 2 more X86 Android tablet models - New Dell WMI DDV driver - Miscellaneous cleanups and small bugfixes The following is an automated git shortlog grouped by driver: ACPI: - battery: Pass battery hook pointer to hook callbacks ISST: - Fix typo in comments Move existing HP drivers to a new hp subdir: - Move existing HP drivers to a new hp subdir dell: - Add new dell-wmi-ddv driver dell-ddv: - Warn if ePPID has a suspicious length - Improve buffer handling huawei-wmi: - remove unnecessary member - fix return value calculation - do not hard-code sizes ideapad-laptop: - Make touchpad_ctrl_via_ec a module option - Stop writing VPCCMD_W_TOUCHPAD at probe time - Send KEY_TOUCHPAD_TOGGLE on some models - Only toggle ps2 aux port on/off on select models - Do not send KEY_TOUCHPAD* events on probe / resume - Refactor ideapad_sync_touchpad_state() - support for more special keys in WMI - Add new _CFG bit numbers for future use - Revert "check for touchpad support in _CFG" intel/pmc: - Relocate Alder Lake PCH support - Relocate Tiger Lake PCH support - Relocate Ice Lake PCH support - Relocate Cannon Lake Point PCH support - Relocate Sunrise Point PCH support - Move variable declarations and definitions to header and core.c - Replace all the reg_map with init functions intel/pmc/core: - Add Meteor Lake support to pmc core driver intel_scu_ipc: - fix possible name leak in __intel_scu_ipc_register() mxm-wmi: - fix memleak in mxm_wmi_call_mx[ds|mx]() platform/mellanox: - mlxbf-pmc: Fix event typo - Add BlueField-3 support in the tmfifo driver platform/x86/amd: - pmc: Add a workaround for an s0i3 issue on Cezanne platform/x86/amd/pmf: - pass the struct by reference platform/x86/dell: - alienware-wmi: 
Use sysfs_emit() instead of scnprintf() platform/x86/intel: - pmc: Fix repeated word in comment platform/x86/intel/hid: - Add module-params for 5 button array + SW_TABLET_MODE reporting platform/x86/intel/sdsi: - Add meter certificate support - Support different GUIDs - Hide attributes if hardware doesn't support - Add Intel On Demand text sony-laptop: - Convert to use sysfs_emit_at() API thinkpad_acpi: - use strstarts() - Fix max_brightness of thinklight tools/arch/x86: - intel_sdsi: Add support for reading meter certificates - intel_sdsi: Add support for new GUID - intel_sdsi: Read more On Demand registers - intel_sdsi: Add Intel On Demand text - intel_sdsi: Add support for reading state certificates uv_sysfs: - Use sysfs_emit() instead of scnprintf() wireless-hotkey: - use ACPI HID as phys x86-android-tablets: - Add Advantech MICA-071 extra button - Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data - Add Medion Lifetab S10346 data -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmOW+QgUHGhkZWdvZWRl QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yAPwf/dAYLHiC2ox5YlNTLX2DvU+jOpeBv W+EIx4oHQz1+O9jrWMLyvS9zTwTEAf6ANLiMP3damEvtJnB72ClgFITzlJAaB4zN yj0SdxoBRMt6zDL2QwMkwitvb5kJonLfO2H7NsMwA6f0KP1X8sio3oVRAMMVwlzz nwDKM/VBpuxmy+d880wRRoAkgRkTsPIOwBkYdo1525NU7kkTmtrMpgM+SXQsHTJn TB9uQnyuiq5/znh3k1Qn+OGwXQezmGz2Fb76IcW5RzUQDew6n6b3kzILee5ddynT Pa7/ibwpV+FtZjm2kS/l4tV+WPdA+s5TSWoq7Hz0jzBX9GdOORcMZmEneg== =z16d -----END PGP SIGNATURE----- Merge tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Hans de Goede: - Intel: - PMC: Add support for Meteor Lake - Intel On Demand: various updates - Ideapad-laptop: - Add support for various Fn keys on new models - Fix touchpad on/off handling in a generic way to avoid having to add more and more quirks - Android x86 tablets: - Add support for two more X86 Android tablet models - New Dell WMI DDV driver - Miscellaneous 
cleanups and small bugfixes * tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (52 commits) platform/mellanox: mlxbf-pmc: Fix event typo platform/x86: intel_scu_ipc: fix possible name leak in __intel_scu_ipc_register() platform/x86: sony-laptop: Convert to use sysfs_emit_at() API platform/x86/dell: alienware-wmi: Use sysfs_emit() instead of scnprintf() platform/x86: uv_sysfs: Use sysfs_emit() instead of scnprintf() platform/x86: mxm-wmi: fix memleak in mxm_wmi_call_mx[ds|mx]() platform/x86: x86-android-tablets: Add Advantech MICA-071 extra button platform/x86: x86-android-tablets: Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data platform/x86: x86-android-tablets: Add Medion Lifetab S10346 data platform/x86: wireless-hotkey: use ACPI HID as phys platform/x86/intel/hid: Add module-params for 5 button array + SW_TABLET_MODE reporting platform/x86: ideapad-laptop: Make touchpad_ctrl_via_ec a module option platform/x86: ideapad-laptop: Stop writing VPCCMD_W_TOUCHPAD at probe time platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models platform/x86: ideapad-laptop: Do not send KEY_TOUCHPAD* events on probe / resume platform/x86: ideapad-laptop: Refactor ideapad_sync_touchpad_state() tools/arch/x86: intel_sdsi: Add support for reading meter certificates tools/arch/x86: intel_sdsi: Add support for new GUID tools/arch/x86: intel_sdsi: Read more On Demand registers ... |
||
Sean Christopherson
|
bb056c0f08 |
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
Convert {clear,set}_bit() to atomics as KVM's ucall implementation relies on clear_bit() being atomic, they are defined in atomic.h, and the same helpers in the kernel proper are atomic. KVM's ucall infrastructure is the only user of clear_bit() in tools/, and there are no true set_bit() users. tools/testing/nvdimm/ does make heavy use of set_bit(), but that code builds into a kernel module of sorts, i.e. pulls in all of the kernel's headers and so is already getting the kernel's atomic set_bit(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
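What "atomic" means for these helpers can be sketched with C11 atomics. This is an illustration under assumptions, not the tools/ implementation: the `sketch_` names and the use of `<stdatomic.h>` are choices made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap starting at addr. */
static inline void sketch_set_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	atomic_fetch_or(&addr[nr / SKETCH_BITS_PER_LONG],
			1UL << (nr % SKETCH_BITS_PER_LONG));
}

/* Atomically clear bit nr in the bitmap starting at addr. */
static inline void sketch_clear_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	atomic_fetch_and(&addr[nr / SKETCH_BITS_PER_LONG],
			 ~(1UL << (nr % SKETCH_BITS_PER_LONG)));
}

/* Usage example: set a bit, clear it, and end up with an empty word. */
void sketch_bitops_selfcheck(void)
{
	_Atomic unsigned long map[1] = {0};

	sketch_set_bit(5, map);
	assert(atomic_load(&map[0]) == (1UL << 5));
	sketch_clear_bit(5, map);
	assert(atomic_load(&map[0]) == 0UL);
}
```

Each helper is a single read-modify-write on the containing word, so a concurrent update to another bit in the same word cannot be lost.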
||
Sean Christopherson
|
36293352ff |
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to match the kernel nomenclature where test_and_set_bit() is atomic, and __test_and_set_bit() provides the non-atomic variant. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221119013450.2643007-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
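The naming convention described here, an atomic test_and_set_bit() next to a non-atomic double-underscore variant, can be sketched as follows. The `demo_` names and the C11 `<stdatomic.h>` approach are assumptions for illustration, not the tools/ code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomic variant: one read-modify-write; returns the previous bit value. */
static inline bool demo_test_and_set_bit(unsigned long nr,
					 _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long old = atomic_fetch_or(&addr[nr / DEMO_BITS_PER_LONG], mask);

	return (old & mask) != 0;
}

/* Non-atomic variant: separate load and store; the caller must serialize. */
static inline bool demo_nonatomic_test_and_set_bit(unsigned long nr,
						   unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long *word = &addr[nr / DEMO_BITS_PER_LONG];
	unsigned long old = *word;

	*word = old | mask;
	return (old & mask) != 0;
}

/* Usage example: the first call claims the bit, the second sees it taken. */
void demo_test_and_set_selfcheck(void)
{
	_Atomic unsigned long map[1] = {0};

	assert(!demo_test_and_set_bit(3, map));
	assert(demo_test_and_set_bit(3, map));
}
```

Both variants return the same result when used from a single thread; only the atomic one stays correct when several threads race on the same word.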
||
Javier Martinez Canillas
|
66a9221d73 |
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
The documentation says that the ioctl has been deprecated, but it has actually been removed and the remaining references are just leftovers. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com> Message-Id: <20221202105011.185147-3-javierm@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
David E. Box
|
7fdc03a737 |
tools/arch/x86: intel_sdsi: Add support for reading meter certificates
Add option to read and decode On Demand meter certificates. Link: https://github.com/intel/intel-sdsi/blob/master/meter-certificate.rst Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-10-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
429e789c67 |
tools/arch/x86: intel_sdsi: Add support for new GUID
The structure and content of the On Demand registers is based on the GUID which is read from hardware through sysfs. Add support for decoding the registers of a new GUID 0xF210D9EF. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Link: https://lore.kernel.org/r/20221119002343.1281885-9-david.e.box@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
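Selecting a register layout from the GUID text can be sketched as below. Only the value 0xF210D9EF comes from the commit message; the helper names are hypothetical, and the text is assumed to be a hexadecimal string like one read from a sysfs attribute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* GUID named in the commit message. */
#define GUID_DEMO_NEW_LAYOUT 0xF210D9EFUL

/* Parse a hexadecimal GUID string, with or without a "0x" prefix. */
static bool guid_demo_parse(const char *text, unsigned long *guid)
{
	char *end = NULL;
	unsigned long value;

	errno = 0;
	value = strtoul(text, &end, 16);
	if (errno != 0 || end == text)
		return false;	/* overflow or no digits at all */

	*guid = value;
	return true;
}

/* Return true when the GUID text selects the new register layout. */
bool guid_demo_uses_new_layout(const char *text)
{
	unsigned long guid;

	return guid_demo_parse(text, &guid) && guid == GUID_DEMO_NEW_LAYOUT;
}

/* Usage example: a trailing newline, as sysfs reads often have, is fine. */
void guid_demo_selfcheck(void)
{
	assert(guid_demo_uses_new_layout("0xF210D9EF\n"));
}
```

A decoder would branch on this result before interpreting any register fields, since the same offsets can mean different things under a different GUID.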
||
David E. Box
|
a8041a89b7 |
tools/arch/x86: intel_sdsi: Read more On Demand registers
Add decoding of the following On Demand register fields: 1. NVRAM content authorization error status 2. Enabled features: telemetry and attestation 3. Key provisioning status 4. NVRAM update limit 5. PCU_CR3_CAPID_CFG Link: https://github.com/intel/intel-sdsi/blob/master/state-certificate-encoding.rst Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-8-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
334599bccb |
tools/arch/x86: intel_sdsi: Add Intel On Demand text
Intel Software Defined Silicon (SDSi) is now officially known as Intel On Demand. Change text in tool to indicate this. Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-7-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
David E. Box
|
3088258ea7 |
tools/arch/x86: intel_sdsi: Add support for reading state certificates
Add option to read and decode On Demand state certificates. Link: https://github.com/intel/intel-sdsi/blob/master/state-certificate-encoding.rst Signed-off-by: David E. Box <david.e.box@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20221119002343.1281885-6-david.e.box@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
Peter Gonda
|
cf4694be2b |
tools: Add atomic_test_and_set_bit()
Add x86 and generic implementations of atomic_test_and_set_bit() to allow
KVM selftests to atomically manage bitmaps.
Note, the generic version is taken from arch_test_and_set_bit() as of
commit
|
||
Borislav Petkov
|
2632daebaf |
x86/cpu: Restore AMD's DE_CFG MSR after resume
DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way they do S3.
Unify and correct naming while at it.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
74455fd7e4 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from:
|
||
Arnaldo Carvalho de Melo
|
4402e360d0 |
tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench'
We also need to add SYM_TYPED_FUNC_START() to util/include/linux/linkage.h
and update tools/perf/check_headers.sh to ignore the include cfi_types.h
line when checking if the kernel original files drifted from the copies
we carry.
This is to get the changes from:
|
||
Arnaldo Carvalho de Melo
|
ffc1df3dc9 |
tools headers arm64: Sync arm64's cputype.h with the kernel sources
To get the changes in:
|
||
Arnaldo Carvalho de Melo
|
a3a365655a |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in: |
||
Ravi Bangoria
|
160ae99365 |
perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
Although the new details added to this header are currently used only by the kernel, the tools copy needs to stay in sync with the kernel file to avoid tools/perf/check-headers.sh warnings. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-3-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
356edeca2e |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from:
|
||
Arnaldo Carvalho de Melo
|
dbcfe5ec3f |
tools kvm headers arm64: Update KVM header from the kernel sources
To pick the changes from:
|
||
Nick Desaulniers
|
a0a12c3ed0 |
asm goto: eradicate CC_HAS_ASM_GOTO
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637 Acked-by: Borislav Petkov <bp@suse.de> Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Arnaldo Carvalho de Melo
|
e5bc0deae5 |
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
To pick the changes in: |
||
Arnaldo Carvalho de Melo
|
eea085d114 |
tools headers UAPI: Sync KVM's vmx.h header with the kernel sources
To pick the changes in:
|
||
Arnaldo Carvalho de Melo
|
25f3089517 |
tools headers kvm s390: Sync headers with the kernel sources
To pick the changes in:
|
||
Arnaldo Carvalho de Melo
|
62ed93d199 |
tools headers cpufeatures: Sync with the kernel sources
To pick the changes from: |
||
Arnaldo Carvalho de Melo
|
7f7f86a7bd |
tools arch x86: Sync the msr-index.h copy with the kernel sources
To pick up the changes in: |
||
Linus Torvalds
|
5318b987fe |
More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET mispredictions when doing a VM Exit therefore an additional RSB, one-entry stuffing is needed. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLqsGsACgkQEsHwGGHe VUpXGg//ZEkxhf3Ri7X9PknAWNG6eIEqigKqWcdnOw+Oq/GMVb6q7JQsqowK7KBZ AKcY5c/KkljTJNohditnfSOePyCG5nDTPgfkjzIawnaVdyJWMRCz/L4X2cv6ykDl 2l2EvQm4Ro8XAogYhE7GzDg/osaVfx93OkLCQj278VrEMWgM/dN2RZLpn+qiIkNt DyFlQ7cr5UASh/svtKLko268oT4JwhQSbDHVFLMJ52VaLXX36yx4rValZHUKFdox ZDyj+kiszFHYGsI94KAD0dYx76p6mHnwRc4y/HkVcO8vTacQ2b9yFYBGTiQatITf 0Nk1RIm9m3rzoJ82r/U0xSIDwbIhZlOVNm2QtCPkXqJZZFhopYsZUnq2TXhSWk4x GQg/2dDY6gb/5MSdyLJmvrTUtzResVyb/hYL6SevOsIRnkwe35P6vDDyp15F3TYK YvidZSfEyjtdLISBknqYRQD964dgNZu9ewrj+WuJNJr+A2fUvBzUebXjxHREsugN jWp5GyuagEKTtneVCvjwnii+ptCm6yfzgZYLbHmmV+zhinyE9H1xiwVDvo5T7DDS ZJCBgoioqMhp5qR59pkWz/S5SNGui2rzEHbAh4grANy8R/X5ASRv7UHT9uAo6ve1 xpw6qnE37CLzuLhj8IOdrnzWwLiq7qZ/lYN7m+mCMVlwRWobbOo= =a8em -----END PGP SIGNATURE----- Merge tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 eIBRS fixes from Borislav Petkov: "More from the CPU vulnerability nightmares front: Intel eIBRS machines do not sufficiently mitigate against RET mispredictions when doing a VM Exit therefore an additional RSB, one-entry stuffing is needed" * tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Add LFENCE to RSB fill sequence x86/speculation: Add RSB VM Exit protections |
||
Linus Torvalds
|
48a577dc1b |
perf tools changes for v6.0: 1st batch
- Introduce 'perf lock contention' subtool, using new lock contention tracepoints and using BPF for in kernel aggregation and then userspace processing using the perf tooling infrastructure for resolving symbols, target specification, etc. Since the new lock contention tracepoints don't provide lock names, get up to 8 stack traces and display the first non-lock function symbol name as a caller: $ perf lock report -F acquired,contended,avg_wait,wait_total Name acquired contended avg wait total wait update_blocked_a... 40 40 3.61 us 144.45 us kernfs_fop_open+... 5 5 3.64 us 18.18 us _nohz_idle_balance 3 3 2.65 us 7.95 us tick_do_update_j... 1 1 6.04 us 6.04 us ep_scan_ready_list 1 1 3.93 us 3.93 us Supports the usual 'perf record' + 'perf report' workflow as well as a BCC/bpftrace like mode where you start the tool and then press control+C to get results: $ sudo perf lock contention -b ^C contended total wait max wait avg wait type caller 42 192.67 us 13.64 us 4.59 us spinlock queue_work_on+0x20 23 85.54 us 10.28 us 3.72 us spinlock worker_thread+0x14a 6 13.92 us 6.51 us 2.32 us mutex kernfs_iop_permission+0x30 3 11.59 us 10.04 us 3.86 us mutex kernfs_dop_revalidate+0x3c 1 7.52 us 7.52 us 7.52 us spinlock kthread+0x115 1 7.24 us 7.24 us 7.24 us rwlock:W sys_epoll_wait+0x148 2 7.08 us 3.99 us 3.54 us spinlock delayed_work_timer_fn+0x1b 1 6.41 us 6.41 us 6.41 us spinlock idle_balance+0xa06 2 2.50 us 1.83 us 1.25 us mutex kernfs_iop_lookup+0x2f 1 1.71 us 1.71 us 1.71 us mutex kernfs_iop_getattr+0x2c ... 
- Add new 'perf kwork' tool to trace time properties of kernel work (such as softirq, and workqueue), uses eBPF skeletons to collect info in kernel space, aggregating data that then gets processed by the userspace tool, e.g.: # perf kwork report Kwork Name | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end | ---------------------------------------------------------------------------------------------------- nvme0q5:130 | 004 | 1.101 ms | 49 | 0.051 ms | 26035.056403 s | 26035.056455 s | amdgpu:162 | 002 | 0.176 ms | 9 | 0.046 ms | 26035.268020 s | 26035.268066 s | nvme0q24:149 | 023 | 0.161 ms | 55 | 0.009 ms | 26035.655280 s | 26035.655288 s | nvme0q20:145 | 019 | 0.090 ms | 33 | 0.014 ms | 26035.939018 s | 26035.939032 s | nvme0q31:156 | 030 | 0.075 ms | 21 | 0.010 ms | 26035.052237 s | 26035.052247 s | nvme0q8:133 | 007 | 0.062 ms | 12 | 0.021 ms | 26035.416840 s | 26035.416861 s | nvme0q6:131 | 005 | 0.054 ms | 22 | 0.010 ms | 26035.199919 s | 26035.199929 s | nvme0q19:144 | 018 | 0.052 ms | 14 | 0.010 ms | 26035.110615 s | 26035.110625 s | nvme0q7:132 | 006 | 0.049 ms | 13 | 0.007 ms | 26035.125180 s | 26035.125187 s | nvme0q18:143 | 017 | 0.033 ms | 14 | 0.007 ms | 26035.169698 s | 26035.169705 s | nvme0q17:142 | 016 | 0.013 ms | 1 | 0.013 ms | 26035.565147 s | 26035.565160 s | enp5s0-rx-0:164 | 006 | 0.004 ms | 4 | 0.002 ms | 26035.928882 s | 26035.928884 s | enp5s0-tx-0:166 | 008 | 0.003 ms | 3 | 0.002 ms | 26035.870923 s | 26035.870925 s | -------------------------------------------------------------------------------------------------------- See commit log messages for more examples with extra options to limit the events time window, etc. - Add support for new AMD IBS (Instruction Based Sampling) features: With the DataSrc extensions, the source of data can be decoded among: - Local L3 or other L1/L2 in CCX. - A peer cache in a near CCX. - Data returned from DRAM. - A peer cache in a far CCX. 
- DRAM address map with "long latency" bit set. - Data returned from MMIO/Config/PCI/APIC. - Extension Memory (S-Link, GenZ, etc - identified by the CS target and/or address map at DF's choice). - Peer Agent Memory. - Support hardware tracing with Intel PT on guest machines, combining the traces with the ones in the host machine. - Add a "-m" option to 'perf buildid-list' to show kernel and modules build-ids, to display all of the information needed to do external symbolization of kernel stack traces, such as those collected by bpf_get_stackid(). - Add arch TSC frequency information to perf.data file headers. - Handle changes in the binutils disassembler function signatures in perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer). - Fix building the perf perl binding with the newest gcc in distros such as fedora rawhide, where some new warnings were breaking the build as perf uses -Werror. - Add 'perf test' entry for branch stack sampling. - Add ARM SPE system wide 'perf test' entry. - Add user space counter reading tests to 'perf test'. - Build with python3 by default, if available. - Add python converter script for the vendor JSON event files. - Update vendor JSON files for alderlake, bonnell, broadwell, broadwellde, broadwellx, cascadelakex, elkhartlake, goldmont, goldmontplus, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, knightslanding, nehalemep, nehalemex, sandybridge, sapphirerapids, silvermont, skylake, skylakex, snowridgex, tigerlake, westmereep-dp, westmereep-sp and westmereex. - Add vendor JSON File for Intel meteorlake. - Add Arm Cortex-A78C and X1C JSON vendor event files. - Add workaround to symbol address reading from ELF files without phdr, falling back to the previous equation. - Convert legacy map definition to BTF-defined in the perf BPF script test. - Rework prologue generation code to stop using libbpf deprecated APIs. - Add default hybrid events for 'perf stat' on x86. 
- Add topdown metrics in the default 'perf stat' on the hybrid machines (big/little cores). - Prefer sampled CPU when exporting JSON in 'perf data convert' - Fix ('perf stat CSV output linter') and ("Check branch stack sampling") 'perf test' entries on s390. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYuw6gwAKCRCyPKLppCJ+ J5+iAP0RL6sKMhzdkRjRYfG8CluJ401YaPHadzv5jxP8gOZz2gEAsuYDrMF9t1zB 4DqORfobdX9UQEJjP9oRltU73GM0swI= =2/M0 -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - Introduce 'perf lock contention' subtool, using new lock contention tracepoints and using BPF for in kernel aggregation and then userspace processing using the perf tooling infrastructure for resolving symbols, target specification, etc. Since the new lock contention tracepoints don't provide lock names, get up to 8 stack traces and display the first non-lock function symbol name as a caller: $ perf lock report -F acquired,contended,avg_wait,wait_total Name acquired contended avg wait total wait update_blocked_a... 40 40 3.61 us 144.45 us kernfs_fop_open+... 5 5 3.64 us 18.18 us _nohz_idle_balance 3 3 2.65 us 7.95 us tick_do_update_j... 
             1            1       6.04 us       6.04 us   ep_scan_ready_list
             1            1       3.93 us       3.93 us

  Supports the usual 'perf record' + 'perf report' workflow as well as a
  BCC/bpftrace like mode where you start the tool and then press
  control+C to get results:

    $ sudo perf lock contention -b
    ^C
     contended   total wait     max wait     avg wait         type   caller

            42    192.67 us     13.64 us      4.59 us     spinlock   queue_work_on+0x20
            23     85.54 us     10.28 us      3.72 us     spinlock   worker_thread+0x14a
             6     13.92 us      6.51 us      2.32 us        mutex   kernfs_iop_permission+0x30
             3     11.59 us     10.04 us      3.86 us        mutex   kernfs_dop_revalidate+0x3c
             1      7.52 us      7.52 us      7.52 us     spinlock   kthread+0x115
             1      7.24 us      7.24 us      7.24 us     rwlock:W   sys_epoll_wait+0x148
             2      7.08 us      3.99 us      3.54 us     spinlock   delayed_work_timer_fn+0x1b
             1      6.41 us      6.41 us      6.41 us     spinlock   idle_balance+0xa06
             2      2.50 us      1.83 us      1.25 us        mutex   kernfs_iop_lookup+0x2f
             1      1.71 us      1.71 us      1.71 us        mutex   kernfs_iop_getattr+0x2c
    ...

- Add new 'perf kwork' tool to trace time properties of kernel work
  (such as softirq, and workqueue), uses eBPF skeletons to collect info
  in kernel space, aggregating data that then gets processed by the
  userspace tool, e.g.:

    # perf kwork report

    Kwork Name      | Cpu | Total Runtime | Count | Max runtime | Max runtime start | Max runtime end |
    ---------------------------------------------------------------------------------------------------
    nvme0q5:130     | 004 |      1.101 ms |    49 |    0.051 ms |    26035.056403 s |  26035.056455 s |
    amdgpu:162      | 002 |      0.176 ms |     9 |    0.046 ms |    26035.268020 s |  26035.268066 s |
    nvme0q24:149    | 023 |      0.161 ms |    55 |    0.009 ms |    26035.655280 s |  26035.655288 s |
    nvme0q20:145    | 019 |      0.090 ms |    33 |    0.014 ms |    26035.939018 s |  26035.939032 s |
    nvme0q31:156    | 030 |      0.075 ms |    21 |    0.010 ms |    26035.052237 s |  26035.052247 s |
    nvme0q8:133     | 007 |      0.062 ms |    12 |    0.021 ms |    26035.416840 s |  26035.416861 s |
    nvme0q6:131     | 005 |      0.054 ms |    22 |    0.010 ms |    26035.199919 s |  26035.199929 s |
    nvme0q19:144    | 018 |      0.052 ms |    14 |    0.010 ms |    26035.110615 s |  26035.110625 s |
    nvme0q7:132     | 006 |      0.049 ms |    13 |    0.007 ms |    26035.125180 s |  26035.125187 s |
    nvme0q18:143    | 017 |      0.033 ms |    14 |    0.007 ms |    26035.169698 s |  26035.169705 s |
    nvme0q17:142    | 016 |      0.013 ms |     1 |    0.013 ms |    26035.565147 s |  26035.565160 s |
    enp5s0-rx-0:164 | 006 |      0.004 ms |     4 |    0.002 ms |    26035.928882 s |  26035.928884 s |
    enp5s0-tx-0:166 | 008 |      0.003 ms |     3 |    0.002 ms |    26035.870923 s |  26035.870925 s |
    ---------------------------------------------------------------------------------------------------

  See commit log messages for more examples with extra options to limit
  the events time window, etc.

- Add support for new AMD IBS (Instruction Based Sampling) features:

  With the DataSrc extensions, the source of data can be decoded among:

    - Local L3 or other L1/L2 in CCX.
    - A peer cache in a near CCX.
    - Data returned from DRAM.
    - A peer cache in a far CCX.
    - DRAM address map with "long latency" bit set.
    - Data returned from MMIO/Config/PCI/APIC.
    - Extension Memory (S-Link, GenZ, etc - identified by the CS target
      and/or address map at DF's choice).
    - Peer Agent Memory.

- Support hardware tracing with Intel PT on guest machines, combining
  the traces with the ones in the host machine.

- Add a "-m" option to 'perf buildid-list' to show kernel and modules
  build-ids, to display all of the information needed to do external
  symbolization of kernel stack traces, such as those collected by
  bpf_get_stackid().

- Add arch TSC frequency information to perf.data file headers.

- Handle changes in the binutils disassembler function signatures in
  perf, bpftool and bpf_jit_disasm (Acked by the bpftool maintainer).

- Fix building the perf perl binding with the newest gcc in distros
  such as fedora rawhide, where some new warnings were breaking the
  build as perf uses -Werror.

- Add 'perf test' entry for branch stack sampling.

- Add ARM SPE system wide 'perf test' entry.

- Add user space counter reading tests to 'perf test'.

- Build with python3 by default, if available.

- Add python converter script for the vendor JSON event files.

- Update vendor JSON files for most Intel cores.

- Add vendor JSON file for Intel meteorlake.

- Add Arm Cortex-A78C and X1C JSON vendor event files.

- Add workaround to symbol address reading from ELF files without phdr,
  falling back to the previous equation.

- Convert legacy map definition to BTF-defined in the perf BPF script
  test.

- Rework prologue generation code to stop using libbpf deprecated APIs.

- Add default hybrid events for 'perf stat' on x86.

- Add topdown metrics in the default 'perf stat' on the hybrid machines
  (big/little cores).

- Prefer sampled CPU when exporting JSON in 'perf data convert'.

- Fix ('perf stat CSV output linter') and ("Check branch stack
  sampling") 'perf test' entries on s390.

* tag 'perf-tools-for-v6.0-2022-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
  perf stat: Refactor __run_perf_stat() common code
  perf lock: Print the number of lost entries for BPF
  perf lock: Add --map-nr-entries option
  perf lock: Introduce struct lock_contention
  perf scripting python: Do not build fail on deprecation warnings
  genelf: Use HAVE_LIBCRYPTO_SUPPORT, not the never defined HAVE_LIBCRYPTO
  perf build: Suppress openssl v3 deprecation warnings in libcrypto feature test
  perf parse-events: Break out tracepoint and printing
  perf parse-events: Don't #define YY_EXTRA_TYPE
  tools bpftool: Don't display disassembler-four-args feature test
  tools bpftool: Fix compilation error with new binutils
  tools bpf_jit_disasm: Don't display disassembler-four-args feature test
  tools bpf_jit_disasm: Fix compilation error with new binutils
  tools perf: Fix compilation error with new binutils
  tools include: add dis-asm-compat.h to handle version differences
  tools build: Don't display disassembler-four-args feature test
  tools build: Add feature test for init_disassemble_info API changes
  perf test: Add ARM SPE system wide test
  perf tools: Rework prologue generation code
  perf bpf: Convert legacy map definition to BTF-defined
  ...
|
||
Daniel Sneddon
|
2b12993220 |
x86/speculation: Add RSB VM Exit protections
tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as
documented for RET instructions after VM exits. Mitigate it with a new
one-entry RSB stuffing mechanism and a new LFENCE.

== Background ==

Indirect Branch Restricted Speculation (IBRS) was designed to help
mitigate Branch Target Injection and Speculative Store Bypass, i.e.
Spectre, attacks. IBRS prevents software run in less privileged modes
from affecting branch prediction in more privileged modes. IBRS requires
the MSR to be written on every privilege level change.

To overcome some of the performance issues of IBRS, Enhanced IBRS was
introduced. eIBRS is an "always on" IBRS, in other words, just turn
it on once instead of writing the MSR on every privilege level change.
When eIBRS is enabled, more privileged modes should be protected from
less privileged modes, including protecting VMMs from guests.

== Problem ==

Here's a simplification of how guests are run on Linux' KVM:

    void run_kvm_guest(void)
    {
        // Prepare to run guest
        VMRESUME();
        // Clean up after guest runs
    }

The execution flow for that would look something like this to the
processor:

1. Host-side: call run_kvm_guest()
2. Host-side: VMRESUME
3. Guest runs, does "CALL guest_function"
4. VM exit, host runs again
5. Host might make some "cleanup" function calls
6. Host-side: RET from run_kvm_guest()

Now, when back on the host, there are a couple of possible scenarios of
post-guest activity the host needs to do before executing host code:

* on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not
  touched and Linux has to do a 32-entry stuffing.

* on eIBRS hardware, VM exit with IBRS enabled, or restoring the host
  IBRS=1 shortly after VM exit, has a documented side effect of flushing
  the RSB except in this PBRSB situation where the software needs to
  stuff the last RSB entry "by hand".

IOW, with eIBRS supported, host RET instructions should no longer be
influenced by guest behavior after the host retires a single CALL
instruction.

However, if the RET instructions are "unbalanced" with CALLs after a VM
exit as is the RET in #6, it might speculatively use the address for the
instruction after the CALL in #3 as an RSB prediction. This is a problem
since the (untrusted) guest controls this address.

Balanced CALL/RET instruction pairs such as in step #5 are not affected.

== Solution ==

The PBRSB issue affects a wide variety of Intel processors which
support eIBRS. But not all of them need mitigation. Today,
X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates
PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e.,
eIBRS systems which enable legacy IBRS explicitly.

However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT
and most of them need a new mitigation.

Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE
which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT.

The lighter-weight mitigation performs a CALL instruction which is
immediately followed by a speculative execution barrier (INT3). This
steers speculative execution to the barrier -- just like a retpoline
-- which ensures that speculation can never reach an unbalanced RET.
Then, ensure this CALL is retired before continuing execution with an
LFENCE.

In other words, the window of exposure is opened at VM exit where RET
behavior is troublesome. While the window is open, force RSB predictions
sampling for RET targets to a dead end at the INT3. Close the window
with the LFENCE.

There is a subset of eIBRS systems which are not vulnerable to PBRSB.
Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB.
Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO.

  [ bp: Massage, incorporate review comments from Andy Cooper. ]

Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
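The RSB reasoning in the commit message above can be sketched as a toy model. This is only an illustration under stated assumptions; it is not kernel code and does not describe real microarchitecture. All names (`toy_rsb`, `host_call()`, `guest_call()`, `vmexit_eibrs()`, `ret_predict()`, `unbalanced_ret_prediction()`) are hypothetical. The RSB is treated as a plain stack, every CALL is assumed to retire immediately (the property the LFENCE is there to guarantee in the real mitigation), and the VM-exit flush is assumed to leave exactly one guest-planted entry behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed depth for the toy model (the message mentions 32-entry stuffing). */
#define TOY_RSB_DEPTH 32

/* Where a predicted return address points in this toy model. */
enum toy_target {
	TOY_NONE,    /* RSB empty: no RSB-based prediction */
	TOY_HOST,    /* address planted by a host CALL */
	TOY_GUEST,   /* address planted by a guest CALL (untrusted) */
	TOY_BARRIER, /* address of the INT3 after the stuffing CALL */
};

struct toy_rsb {
	enum toy_target entry[TOY_RSB_DEPTH];
	size_t depth; /* number of valid entries; entry[depth - 1] is the top */
};

static void toy_push(struct toy_rsb *r, enum toy_target t)
{
	assert(r != NULL);
	if (r->depth == TOY_RSB_DEPTH) {
		/* Full: discard the oldest entry to make room. */
		for (size_t i = 1; i < TOY_RSB_DEPTH; i++)
			r->entry[i - 1] = r->entry[i];
		r->depth--;
	}
	r->entry[r->depth++] = t;
}

/* Guest CALL: plants a guest-controlled return address. */
static void guest_call(struct toy_rsb *r)
{
	toy_push(r, TOY_GUEST);
}

/*
 * Host CALL. Modeling assumption: once a host CALL has retired, leftover
 * guest entries no longer feed RET prediction, so drop them first.
 */
static void host_call(struct toy_rsb *r, enum toy_target ret_to)
{
	while (r->depth > 0 && r->entry[r->depth - 1] == TOY_GUEST)
		r->depth--;
	toy_push(r, ret_to);
}

/* VM exit on eIBRS: flush everything except the topmost entry (assumption). */
static void vmexit_eibrs(struct toy_rsb *r)
{
	if (r->depth > 1) {
		r->entry[0] = r->entry[r->depth - 1];
		r->depth = 1;
	}
}

/* RET: consume the top entry as the predicted return target. */
static enum toy_target ret_predict(struct toy_rsb *r)
{
	return r->depth ? r->entry[--r->depth] : TOY_NONE;
}

/* Steps 1-6 from the message, with no cleanup CALL before the final RET. */
enum toy_target unbalanced_ret_prediction(bool stuff_on_vmexit)
{
	struct toy_rsb r = { .depth = 0 };

	/* 1. Host calls run_kvm_guest(); 2. VMRESUME. */
	host_call(&r, TOY_HOST);
	/* 3. Guest does "CALL guest_function". */
	guest_call(&r);
	/* 4. VM exit, host runs again. */
	vmexit_eibrs(&r);
	/* Mitigation: CALL followed by INT3, then LFENCE. */
	if (stuff_on_vmexit)
		host_call(&r, TOY_BARRIER);
	/* 6. Unbalanced RET from run_kvm_guest(). */
	return ret_predict(&r);
}
```

In this model `unbalanced_ret_prediction(false)` yields `TOY_GUEST`, which corresponds to the problem case in the message, while `unbalanced_ret_prediction(true)` yields `TOY_BARRIER`, meaning the prediction dead-ends at the INT3. Step 5 (balanced cleanup calls) is left out on purpose, since the message states that balanced CALL/RET pairs are not affected.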