If a group has events which are from different hybrid PMUs,
shows a warning:
"WARNING: events in group from different hybrid PMUs!"
This is to remind the user not to put the core event and atom
event into one group.
Next, just disable grouping.
# perf stat -e "{cpu_core/cycles/,cpu_atom/cycles/}" -a -- sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
anon group { cpu_core/cycles/, cpu_atom/cycles/ }
Performance counter stats for 'system wide':
5,438,125 cpu_core/cycles/
3,914,586 cpu_atom/cycles/
1.004250966 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-17-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Previously if '-e' is not specified in perf stat, some software events
and hardware events are added to evlist by default.
Before:
# perf stat -a -- sleep 1
Performance counter stats for 'system wide':
24,044.40 msec cpu-clock # 23.946 CPUs utilized
99 context-switches # 4.117 /sec
24 cpu-migrations # 0.998 /sec
3 page-faults # 0.125 /sec
7,000,244 cycles # 0.000 GHz
2,955,024 instructions # 0.42 insn per cycle
608,941 branches # 25.326 K/sec
31,991 branch-misses # 5.25% of all branches
1.004106859 seconds time elapsed
Among the events, cycles, instructions, branches and branch-misses
are hardware events.
One hybrid platform, two hardware events are created for one
hardware event.
cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/
These events would be added to evlist on hybrid platform.
Since parse_events() has been supported to create two hardware events
for one event on hybrid platform, so we just use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to evlist.
After:
# perf stat -a -- sleep 1
Performance counter stats for 'system wide':
24,043.99 msec cpu-clock # 23.991 CPUs utilized
139 context-switches # 5.781 /sec
25 cpu-migrations # 1.040 /sec
6 page-faults # 0.250 /sec
10,381,751 cpu_core/cycles/ # 431.782 K/sec
1,264,216 cpu_atom/cycles/ # 52.579 K/sec
3,406,958 cpu_core/instructions/ # 141.697 K/sec
414,588 cpu_atom/instructions/ # 17.243 K/sec
705,149 cpu_core/branches/ # 29.327 K/sec
82,358 cpu_atom/branches/ # 3.425 K/sec
40,821 cpu_core/branch-misses/ # 1.698 K/sec
9,086 cpu_atom/branch-misses/ # 377.891 /sec
1.002228863 seconds time elapsed
We can see two events are created for one hardware event.
One TODO is, the shadow stats looks a bit different, now it's just
'M/sec'.
The perf_stat__update_shadow_stats and perf_stat__print_shadow_stats
need to be improved in future if we want to get the original shadow
stats.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-15-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On hybrid platform, user may want to enable events on one pmu.
Following syntax are supported:
cpu_core/<event>/
cpu_atom/<event>/
But the syntax doesn't work for cache event.
Before:
# perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
event syntax error: 'cpu_core/LLC-loads/'
\___ unknown term 'LLC-loads' for pmu 'cpu_core'
Cache events are a bit complex. We can't create aliases for them.
We use another solution. For example, if we use "cpu_core/LLC-loads/",
in parse_events_add_pmu(), term->config is "LLC-loads".
Then we create a new parser to scan "LLC-loads". The
parse_events_add_cache() would be called during parsing.
The parse_state->hybrid_pmu_name is used to identify the pmu
where the event should be enabled on.
After:
# perf stat -e cpu_core/LLC-loads/ -a -- sleep 1
Performance counter stats for 'system wide':
24,593 cpu_core/LLC-loads/
1.003911601 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-13-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The functions perf_pmu__is_hybrid and perf_pmu__find_hybrid_pmu
can be used to identify the hybrid platform and return the found
hybrid cpu pmu. All the detected hybrid pmus have been saved in
'perf_pmu__hybrid_pmus' list. So we just need to search this list.
perf_pmu__hybrid_type_to_pmu converts the user specified string
to hybrid pmu name. This is used to support the '--cputype' option
in next patches.
perf_pmu__has_hybrid checks the existing of hybrid pmu. Note that,
we have to define it in pmu.c (make pmu-hybrid.c no more symbol
dependency), otherwise perf test python would be failed.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-7-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We identify the cpu_core pmu and cpu_atom pmu by explicitly
checking following files:
For cpu_core, checks:
"/sys/bus/event_source/devices/cpu_core/cpus"
For cpu_atom, checks:
"/sys/bus/event_source/devices/cpu_atom/cpus"
If the 'cpus' file exists and it has data, the pmu exists.
But in order not to hardcode the "cpu_core" and "cpu_atom",
and make the code in a generic way.
So if the path "/sys/bus/event_source/devices/cpu_xxx/cpus" exists, the
hybrid pmu exists. All the detected hybrid pmus are linked to a global
list 'perf_pmu__hybrid_pmus' and then next we just need to iterate the
list to get all hybrid pmu by using perf_pmu__for_each_hybrid_pmu.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-6-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For some Intel platforms, such as Alderlake, which is a hybrid platform
and it consists of atom cpu and core cpu. Each cpu has dedicated event
list. Part of events are available on core cpu, part of events are
available on atom cpu.
The kernel exports new cpu pmus: cpu_core and cpu_atom. The event in
json is added with a new field "Unit" to indicate which pmu the event
is available on.
For example, one event in cache.json,
{
"BriefDescription": "Counts the number of load ops retired that",
"CollectPEBSRecord": "2",
"Counter": "0,1,2,3",
"EventCode": "0xd2",
"EventName": "MEM_LOAD_UOPS_RETIRED_MISC.MMIO",
"PEBScounters": "0,1,2,3",
"SampleAfterValue": "1000003",
"UMask": "0x80",
"Unit": "cpu_atom"
},
The unit "cpu_atom" indicates this event is only available on "cpu_atom".
In generated pmu-events.c, we can see:
{
.name = "mem_load_uops_retired_misc.mmio",
.event = "period=1000003,umask=0x80,event=0xd2",
.desc = "Counts the number of load ops retired that. Unit: cpu_atom ",
.topic = "cache",
.pmu = "cpu_atom",
},
But if without this patch, the "uncore_" prefix is added before "cpu_atom",
such as:
.pmu = "uncore_cpu_atom"
That would be a wrong pmu.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427070139.25256-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To make the output more readable, I think it's better to remove 0's in
the output. Also the dummy event has no event stats so it just wasts
the space. Let's use the --skip-empty option to suppress it.
$ perf report --stat --skip-empty
Aggregated stats:
TOTAL events: 16530
MMAP events: 226
COMM events: 1596
EXIT events: 2
THROTTLE events: 121
UNTHROTTLE events: 117
FORK events: 1595
SAMPLE events: 719
MMAP2 events: 12147
CGROUP events: 2
FINISHED_ROUND events: 2
THREAD_MAP events: 1
CPU_MAP events: 1
TIME_CONV events: 1
cycles stats:
SAMPLE events: 719
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210427013717.1651674-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, to use BPF to aggregate perf event counters, the user uses
--bpf-counters option. Enable "use bpf by default" events with a config
option, stat.bpf-counter-events. Events with name in the option will use
BPF.
This also enables mixed BPF event and regular event in the same sesssion.
For example:
perf config stat.bpf-counter-events=instructions
perf stat -e instructions,cs
The second command will use BPF for "instructions" but not "cs".
Signed-off-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/r/20210425214333.1090950-4-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf event updates from Ingo Molnar:
- Improve Intel uncore PMU support:
- Parse uncore 'discovery tables' - a new hardware capability
enumeration method introduced on the latest Intel platforms. This
table is in a well-defined PCI namespace location and is read via
MMIO. It is organized in an rbtree.
These uncore tables will allow the discovery of standard counter
blocks, but fancier counters still need to be enumerated
explicitly.
- Add Alder Lake support
- Improve IIO stacks to PMON mapping support on Skylake servers
- Add Intel Alder Lake PMU support - which requires the introduction of
'hybrid' CPUs and PMUs. Alder Lake is a mix of Golden Cove ('big')
and Gracemont ('small' - Atom derived) cores.
The CPU-side feature set is entirely symmetrical - but on the PMU
side there's core type dependent PMU functionality.
- Reduce data loss with CPU level hardware tracing on Intel PT / AUX
profiling, by fixing the AUX allocation watermark logic.
- Improve ring buffer allocation on NUMA systems
- Put 'struct perf_event' into their separate kmem_cache pool
- Add support for synchronous signals for select perf events. The
immediate motivation is to support low-overhead sampling-based race
detection for user-space code. The feature consists of the following
main changes:
- Add thread-only event inheritance via
perf_event_attr::inherit_thread, which limits inheritance of
events to CLONE_THREAD.
- Add the ability for events to not leak through exec(), via
perf_event_attr::remove_on_exec.
- Allow the generation of SIGTRAP via perf_event_attr::sigtrap,
extend siginfo with an u64 ::si_perf, and add the breakpoint
information to ::si_addr and ::si_perf if the event is
PERF_TYPE_BREAKPOINT.
The siginfo support is adequate for breakpoints right now - but the
new field can be used to introduce support for other types of
metadata passed over siginfo as well.
- Misc fixes, cleanups and smaller updates.
* tag 'perf-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
signal, perf: Add missing TRAP_PERF case in siginfo_layout()
signal, perf: Fix siginfo_t by avoiding u64 on 32-bit architectures
perf/x86: Allow for 8<num_fixed_counters<16
perf/x86/rapl: Add support for Intel Alder Lake
perf/x86/cstate: Add Alder Lake CPU support
perf/x86/msr: Add Alder Lake CPU support
perf/x86/intel/uncore: Add Alder Lake support
perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
perf/x86/intel: Add Alder Lake Hybrid support
perf/x86: Support filter_match callback
perf/x86/intel: Add attr_update for Hybrid PMUs
perf/x86: Add structures for the attributes of Hybrid PMUs
perf/x86: Register hybrid PMUs
perf/x86: Factor out x86_pmu_show_pmu_cap
perf/x86: Remove temporary pmu assignment in event_init
perf/x86/intel: Factor out intel_pmu_check_extra_regs
perf/x86/intel: Factor out intel_pmu_check_event_constraints
perf/x86/intel: Factor out intel_pmu_check_num_counters
perf/x86: Hybrid PMU support for extra_regs
perf/x86: Hybrid PMU support for event constraints
...
Pull objtool updates from Ingo Molnar:
- Standardize the crypto asm code so that it looks like compiler-
generated code to objtool - so that it can understand it. This
enables unwinding from crypto asm code - and also fixes the last
known remaining objtool warnings for LTO and more.
- x86 decoder fixes: clean up and fix the decoder, and also extend it a
bit
- Misc fixes and cleanups
* tag 'objtool-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/crypto: Enable objtool in crypto code
x86/crypto/sha512-ssse3: Standardize stack alignment prologue
x86/crypto/sha512-avx2: Standardize stack alignment prologue
x86/crypto/sha512-avx: Standardize stack alignment prologue
x86/crypto/sha256-avx2: Standardize stack alignment prologue
x86/crypto/sha1_avx2: Standardize stack alignment prologue
x86/crypto/sha_ni: Standardize stack alignment prologue
x86/crypto/crc32c-pcl-intel: Standardize jump table
x86/crypto/camellia-aesni-avx2: Unconditionally allocate stack buffer
x86/crypto/aesni-intel_avx: Standardize stack alignment prologue
x86/crypto/aesni-intel_avx: Fix register usage comments
x86/crypto/aesni-intel_avx: Remove unused macros
objtool: Support asm jump tables
objtool: Parse options from OBJTOOL_ARGS
objtool: Collate parse_options() users
objtool: Add --backup
objtool,x86: More ModRM sugar
objtool,x86: Rewrite ADD/SUB/AND
objtool,x86: Support %riz encodings
objtool,x86: Simplify register decode
...
Pull locking updates from Ingo Molnar:
- rtmutex cleanup & spring cleaning pass that removes ~400 lines of
code
- Futex simplifications & cleanups
- Add debugging to the CSD code, to help track down a tenacious race
(or hw problem)
- Add lockdep_assert_not_held(), to allow code to require a lock to not
be held, and propagate this into the ath10k driver
- Misc LKMM documentation updates
- Misc KCSAN updates: cleanups & documentation updates
- Misc fixes and cleanups
- Fix locktorture bugs with ww_mutexes
* tag 'locking-core-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
kcsan: Fix printk format string
static_call: Relax static_call_update() function argument type
static_call: Fix unused variable warn w/o MODULE
locking/rtmutex: Clean up signal handling in __rt_mutex_slowlock()
locking/rtmutex: Restrict the trylock WARN_ON() to debug
locking/rtmutex: Fix misleading comment in rt_mutex_postunlock()
locking/rtmutex: Consolidate the fast/slowpath invocation
locking/rtmutex: Make text section and inlining consistent
locking/rtmutex: Move debug functions as inlines into common header
locking/rtmutex: Decrapify __rt_mutex_init()
locking/rtmutex: Remove pointless CONFIG_RT_MUTEXES=n stubs
locking/rtmutex: Inline chainwalk depth check
locking/rtmutex: Move rt_mutex_debug_task_free() to rtmutex.c
locking/rtmutex: Remove empty and unused debug stubs
locking/rtmutex: Consolidate rt_mutex_init()
locking/rtmutex: Remove output from deadlock detector
locking/rtmutex: Remove rtmutex deadlock tester leftovers
locking/rtmutex: Remove rt_mutex_timed_lock()
MAINTAINERS: Add myself as futex reviewer
locking/mutex: Remove repeated declaration
...
Pull RCU updates from Ingo Molnar:
- Support for "N" as alias for last bit in bitmap parsing library (eg
using syntax like "nohz_full=2-N")
- kvfree_rcu updates
- mm_dump_obj() updates. (One of these is to mm, but was suggested by
Andrew Morton.)
- RCU callback offloading update
- Polling RCU grace-period interfaces
- Realtime-related RCU updates
- Tasks-RCU updates
- Torture-test updates
- Torture-test scripting updates
- Miscellaneous fixes
* tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
rcutorture: Test start_poll_synchronize_rcu() and poll_state_synchronize_rcu()
rcu: Provide polling interfaces for Tiny RCU grace periods
torture: Fix kvm.sh --datestamp regex check
torture: Consolidate qemu-cmd duration editing into kvm-transform.sh
torture: Print proper vmlinux path for kvm-again.sh runs
torture: Make TORTURE_TRUST_MAKE available in kvm-again.sh environment
torture: Make kvm-transform.sh update jitter commands
torture: Add --duration argument to kvm-again.sh
torture: Add kvm-again.sh to rerun a previous torture-test
torture: Create a "batches" file for build reuse
torture: De-capitalize TORTURE_SUITE
torture: Make upper-case-only no-dot no-slash scenario names official
torture: Rename SRCU-t and SRCU-u to avoid lowercase characters
torture: Remove no-mpstat error message
torture: Record kvm-test-1-run.sh and kvm-test-1-run-qemu.sh PIDs
torture: Record jitter start/stop commands
torture: Extract kvm-test-1-run-qemu.sh from kvm-test-1-run.sh
torture: Record TORTURE_KCONFIG_GDB_ARG in qemu-cmd
torture: Abstract jitter.sh start/stop into scripts
rcu: Provide polling interfaces for Tree RCU grace periods
...
Pull KUnit updates from Shuah Khan:
"Several fixes and a new feature to support failure from dynamic
analysis tools such as UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure (e.g.
super_operations, iommu_ops, etc.)"
* tag 'linux-kselftest-kunit-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: add tips for using current->kunit_test
kunit: fix -Wunused-function warning for __kunit_fail_current_test
kunit: support failure from dynamic analysis tools
kunit: tool: make --kunitconfig accept dirs, add lib/kunit fragment
kunit: make KUNIT_EXPECT_STREQ() quote values, don't print literals
kunit: Match parenthesis alignment to improve code readability
Pull Kselftest updates from Shuah Khan:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
* tag 'linux-kselftest-next-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests/resctrl: Change a few printed messages
Documentation: kselftest: fix path to test module files
selftests/resctrl: Create .gitignore to include resctrl_tests
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Skip the test if requested resctrl feature is not supported
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Check for resctrl mount point only if resctrl FS is supported
selftests/resctrl: Add config dependencies
selftests/resctrl: Fix a printed message
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Clean up resctrl features check
...
Pull x86 updates from Borislav Petkov:
- Turn the stack canary into a normal __percpu variable on 32-bit which
gets rid of the LAZY_GS stuff and a lot of code.
- Add an insn_decode() API which all users of the instruction decoder
should preferrably use. Its goal is to keep the details of the
instruction decoder away from its users and simplify and streamline
how one decodes insns in the kernel. Convert its users to it.
- kprobes improvements and fixes
- Set the maximum DIE per package variable on Hygon
- Rip out the dynamic NOP selection and simplify all the machinery
around selecting NOPs. Use the simplified NOPs in objtool now too.
- Add Xeon Sapphire Rapids to list of CPUs that support PPIN
- Simplify the retpolines by folding the entire thing into an
alternative now that objtool can handle alternatives with stack ops.
Then, have objtool rewrite the call to the retpoline with the
alternative which then will get patched at boot time.
- Document Intel uarch per models in intel-family.h
- Make Sub-NUMA Clustering topology the default and Cluster-on-Die the
exception on Intel.
* tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
x86, sched: Treat Intel SNC topology as default, COD as exception
x86/cpu: Comment Skylake server stepping too
x86/cpu: Resort and comment Intel models
objtool/x86: Rewrite retpoline thunk calls
objtool: Skip magical retpoline .altinstr_replacement
objtool: Cache instruction relocs
objtool: Keep track of retpoline call sites
objtool: Add elf_create_undef_symbol()
objtool: Extract elf_symbol_add()
objtool: Extract elf_strtab_concat()
objtool: Create reloc sections implicitly
objtool: Add elf_create_reloc() helper
objtool: Rework the elf_rebuild_reloc_section() logic
objtool: Fix static_call list generation
objtool: Handle per arch retpoline naming
objtool: Correctly handle retpoline thunk calls
x86/retpoline: Simplify retpolines
x86/alternatives: Optimize optimize_nops()
x86: Add insn_decode_kernel()
x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration
...
Similarly as b02709587e ("bpf: Fix propagation of 32-bit signed bounds
from 64-bit bounds."), we also need to fix the propagation of 32 bit
unsigned bounds from 64 bit counterparts. That is, really only set the
u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit
space. For example, the register with a umin_value == 1 does /not/ imply
that u32_min_value is also equal to 1, since umax_value could be much
larger than 32 bit subregister can hold, and thus u32_min_value is in
the interval [0,1] instead.
Before fix, invalid tracking result of R2_w=inv1:
[...]
5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
5: (35) if r2 >= 0x1 goto pc+1
[...] // goto path
7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
7: (b6) if w2 <= 0x1 goto pc+1
[...] // goto path
9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-9223372036854775807,smax_value=9223372032559808513,umin_value=1,umax_value=18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0
9: (bc) w2 = w2
10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0
[...]
After fix, correct tracking result of R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)):
[...]
5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
5: (35) if r2 >= 0x1 goto pc+1
[...] // goto path
7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
7: (b6) if w2 <= 0x1 goto pc+1
[...] // goto path
9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=9223372032559808513,umax_value=18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0
9: (bc) w2 = w2
10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0
[...]
Thus, same issue as in b02709587e holds for unsigned subregister tracking.
Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in
b02709587e to make them uniform again.
Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Manfred Paul (@_manfp)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Fix failed tests checks in core_reloc test runner, which allowed failing tests
to pass quietly. Also add extra check to make sure that expected to fail test cases with
invalid names are caught as test failure anyway, as this is not an expected
failure mode. Also fix mislabeled probed vs direct bitfield test cases.
Fixes: 124a892d1c ("selftests/bpf: Test TYPE_EXISTS and TYPE_SIZE CO-RE relocations")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-6-andrii@kernel.org
Negative field existence cases for have a broken assumption that FIELD_EXISTS
CO-RE relo will fail for fields that match the name but have incompatible type
signature. That's not how CO-RE relocations generally behave. Types and fields
that match by name but not by expected type are treated as non-matching
candidates and are skipped. Error later is reported if no matching candidate
was found. That's what happens for most relocations, but existence relocations
(FIELD_EXISTS and TYPE_EXISTS) are more permissive and they are designed to
return 0 or 1, depending if a match is found. This allows to handle
name-conflicting but incompatible types in BPF code easily. Combined with
___flavor suffixes, it's possible to handle pretty much any structural type
changes in kernel within the compiled once BPF source code.
So, long story short, negative field existence test cases are invalid in their
assumptions, so this patch reworks them into a single consolidated positive
case that doesn't match any of the fields.
Fixes: c7566a6969 ("selftests/bpf: Add field existence CO-RE relocs tests")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-5-andrii@kernel.org
Add BTF_KIND_FLOAT support when doing CO-RE field type compatibility check.
Without this, relocations against float/double fields will fail.
Also adjust one error message to emit instruction index instead of less
convenient instruction byte offset.
Fixes: 22541a9eeb ("libbpf: Add BTF_KIND_FLOAT support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20210426192949.416837-3-andrii@kernel.org