Pull perf fixes from Ingo Molnar:
"Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
driver fixes, plus miscellanous tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Reject non sampling events with precise_ip
perf/x86/intel: Account interrupts for PEBS errors
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
perf/core: Fix sys_perf_event_open() vs. hotplug
perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
perf/x86: Set pmu->module in Intel PMU modules
perf probe: Fix to probe on gcc generated symbols for offline kernel
perf probe: Fix --funcs to show correct symbols for offline module
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
tools lib traceevent: Fix prev/next_prio for deadline tasks
perf record: Fix --switch-output documentation and comment
perf record: Make __record_options static
tools lib subcmd: Add OPT_STRING_OPTARG_SET option
perf probe: Fix to get correct modname from elf header
samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
samples/bpf sock_example: Avoid getting ethhdr from two includes
perf sched timehist: Show total scheduling time
We set info.count to 1 in mtty_get_irq_info() so static checkers
complain that, "Why do we have impossible conditions?" The answer is
that it seems to be left over dead code that can be safely removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This is a sample driver for documentation so the impact is probably
pretty low. But we should check that bar_index is valid so we
don't write beyond the end of the mdev_state->region_info[] array.
Fixes: 9d1a546c53 ("docs: Sample driver to demonstrate how to use Mediated device framework.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The copy_to_user() function returns the number of bytes which it wasn't
able to copy but we want to return a negative error code.
Fixes: 9d1a546c53 ("docs: Sample driver to demonstrate how to use Mediated device framework.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Fixes:
- Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
- Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
- Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
- Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
- Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
- 'perf probe' fixes for cross arch probing (Masami Hiramatsu)
Improvement:
- Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYbS6KAAoJENZQFvNTUqpAWqkP/1xdvbDqeTA2YjcWG+ffQJye
LULSNQpRShvBGpB3zsOzOebOtw0R5oWb7YAp4JFZNyMIk5qcnS9gVwdw9Qx1lc/W
Gq8OuhI28Pr9AHE37eTJ1+12HI2aAOylwKFqUpEWMzoaUqBDRh5JV2MeVGVQXBRe
YBJ4Qwk1a7Ng74uzN/pdiC6V7pjddPEoLZEDG5p35vrgtNAY1JGlX5rvNR7CO6nN
72QHRnB4r1d9fgNGS5oO5zEU1q5+JN5phe+gDANdk/8pCqDegRjZpHnj5gcm71jr
mEvbstmlV1XPZSA9aFVu2L5LZuv6asS02HjrbGydqO2DUTpDgrynHsjAuqqaTnwk
kLGwA2I+Kyh8g155H+HrQANONe2TTOXRkBVWIoFBWkhd1JUS/p3GE2qvqx5N36sr
U26d9i/Bk/pbOYOdRUFMc4lckZY2HrrRWj8dVr06qLr2rlbYoLv8hpqvdyHr9MJS
Id2Yqqd1JVzBDU3/+GfdFDBjltVtpSG1mdc6LWy9IpXFMJoSOQNsrbjf+mgMadeV
EbtY8MdDZ525EB8niv9E9Vfd56LiWTKCjg9Mc758+U+Ns50YlQFsDq2hBO2+jQ/k
W3QNX+WJiLgxqQK8adveXoYgCSWPNVttIvg8Lj1KNk8Y//vUC1KSrvDfaKiROzLB
iF0u5mLRAQ+Pfec6X5PP
=o9KL
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-4.10-20170104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:
Fixes:
- Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
- Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
- Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
- Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
- Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
- Fix 'perf probe' for cross arch probing (Masami Hiramatsu)
Improvement:
- Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is just sample code. We forget to set the error codes in a couple
places.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Abstract access to mdev_device so that we can define which interfaces
are public rather than relying on comments in the structure.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Jike Song <jike.song@intel.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Rather than hoping for good behavior by marking some elements
internal, enforce it by making the entire structure private and
creating an accessor function for the one useful external field.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Jike Song <jike.song@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Add an mdev_ prefix so we're not poluting the namespace so much.
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Jike Song <jike.song@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
This sample driver was originally under Documentation/ and was moved
to samples, but build support was never adjusted for the new location.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid the following build failure on Alpine Linux 3.4, that has
clang-3.8 with the bpf target:
HOSTCC samples/bpf/sock_example.o
In file included from /usr/include/net/ethernet.h:10:0,
from /git/linux/samples/bpf/sock_example.h:7,
from /git/linux/samples/bpf/sock_example.c:30:
/usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
ethhdr'
struct ethhdr {
^
In file included from /git/linux/samples/bpf/sock_example.c:26:0:
./usr/include/linux/if_ether.h:144:8: note: originally defined here
struct ethhdr {
^
scripts/Makefile.host:124: recipe for target
'samples/bpf/sock_example.o' failed
make[2]: *** [samples/bpf/sock_example.o] Error 1
/git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed
So include net/if_ether.h for the needs of sock_example.h, using the
same include that sock_example.c uses.
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
plus on the tooling side there's a number of fixes and some late
updates"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf sched timehist: Fix invalid period calculation
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
perf sched timehist: Enlarge default 'comm_width'
perf sched timehist: Honour 'comm_width' when aligning the headers
perf/x86: Fix overlap counter scheduling bug
perf/x86/pebs: Fix handling of PEBS buffer overflows
samples/bpf: Move open_raw_sock to separate header
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
tools lib bpf: Add bpf_prog_{attach,detach}
samples/bpf: Switch over to libbpf
perf diff: Do not overwrite valid build id
perf annotate: Don't throw error for zero length symbols
perf bench futex: Fix lock-pi help string
perf trace: Check if MAP_32BIT is defined (again)
samples/bpf: Make perf_event_read() static
uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
samples/bpf: Make samples more libbpf-centric
tools lib bpf: Add flags to bpf_create_map()
tools lib bpf: use __u32 from linux/types.h
...
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.
Committer notes:
Testing it:
$ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
make[1]: Entering directory '/home/build/v4.9.0-rc8+'
CHK include/config/kernel.release
GEN ./Makefile
CHK include/generated/uapi/linux/version.h
Using /home/acme/git/linux as source for kernel
CHK include/generated/utsrelease.h
CHK include/generated/timeconst.h
CHK include/generated/bounds.h
CHK include/generated/asm-offsets.h
CALL /home/acme/git/linux/scripts/checksyscalls.sh
HOSTCC samples/bpf/test_verifier.o
HOSTCC samples/bpf/libbpf.o
HOSTCC samples/bpf/../../tools/lib/bpf/bpf.o
HOSTCC samples/bpf/test_maps.o
HOSTCC samples/bpf/sock_example.o
HOSTCC samples/bpf/bpf_load.o
<SNIP>
HOSTLD samples/bpf/trace_event
HOSTLD samples/bpf/sampleip
HOSTLD samples/bpf/tc_l2_redirect
make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
$
Also tested the offwaketime resulting from the rebuild, seems to work as
before.
Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Only one of the examples declare the bpf_insn bpf proggie as a const:
$ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
samples/bpf/fds_example.c: static const struct bpf_insn insns[] = {
samples/bpf/sock_example.c: struct bpf_insn prog[] = {
samples/bpf/test_cgrp2_attach2.c: struct bpf_insn prog[] = {
samples/bpf/test_cgrp2_attach.c: struct bpf_insn prog[] = {
samples/bpf/test_cgrp2_sock.c: struct bpf_insn prog[] = {
$
Which causes this warning:
[root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
<SNIP>
HOSTCC samples/bpf/fds_example.o
/git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
/git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
insns, insns_cnt, "GPL", 0,
^~~~~
In file included from /git/linux/samples/bpf/libbpf.h:5:0,
from /git/linux/samples/bpf/bpf_load.h:4,
from /git/linux/samples/bpf/fds_example.c:15:
/git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
^~~~~~~~~~~~~~~~
HOSTCC samples/bpf/sockex1_user.o
So just ditch that 'const' to reduce build noise, leaving changing the
bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
adequate.
Cc: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit d8c5b17f2b ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.
Committer notes:
Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
Link: 353e6f298c.patch
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid most of the libbpf library here.
Committer notes:
Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:
[root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
<SNIP>
[root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
[root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
make[1]: Entering directory '/tmp/build/linux'
CHK include/config/kernel.release
HOSTCC scripts/basic/fixdep
GEN ./Makefile
CHK include/generated/uapi/linux/version.h
Using /git/linux as source for kernel
CHK include/generated/utsrelease.h
HOSTCC scripts/basic/bin2c
HOSTCC arch/x86/tools/relocs_32.o
HOSTCC arch/x86/tools/relocs_64.o
LD samples/bpf/built-in.o
<SNIP>
HOSTCC samples/bpf/fds_example.o
HOSTCC samples/bpf/sockex1_user.o
/git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
/git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
insns, insns_cnt, "GPL", 0,
^~~~~
In file included from /git/linux/samples/bpf/libbpf.h:5:0,
from /git/linux/samples/bpf/bpf_load.h:4,
from /git/linux/samples/bpf/fds_example.c:15:
/git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
^~~~~~~~~~~~~~~~
HOSTCC samples/bpf/sockex2_user.o
<SNIP>
HOSTCC samples/bpf/xdp_tx_iptunnel_user.o
clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h \
-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
-Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
HOSTLD samples/bpf/tc_l2_redirect
<SNIP>
HOSTLD samples/bpf/lwt_len_hist
HOSTLD samples/bpf/xdp_tx_iptunnel
make[1]: Leaving directory '/tmp/build/linux'
[root@f5065a7d6272 linux]#
And then, in the host:
[root@jouet bpf]# mount | grep "docker.*devicemapper\/"
/dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
[root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
[root@jouet bpf]# file offwaketime
offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
[root@jouet bpf]# readelf -SW offwaketime
offwaketime offwaketime_kern.o offwaketime_user.o
[root@jouet bpf]# readelf -SW offwaketime_kern.o
There are 11 section headers, starting at offset 0x700:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000658 0000a8 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
[ 4] .relkprobe/try_to_wake_up REL 0000000000000000 0005a8 000020 10 10 3 8
[ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
[ 6] .reltracepoint/sched/sched_switch REL 0000000000000000 0005c8 000090 10 10 5 8
[ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
[ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
[ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
[10] .symtab SYMTAB 0000000000000000 000488 000120 18 1 4 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
[root@jouet bpf]# ./offwaketime | head -3
qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
[root@jouet bpf]#
Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: 5c40f54a52.patch
Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ I noticed
some warnings building samples/bpf/ on a Fedora Rawhide container, with
clang/llvm 3.9 I noticed this:
[root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
make[1]: Entering directory '/tmp/build/linux'
CHK include/config/kernel.release
GEN ./Makefile
CHK include/generated/uapi/linux/version.h
Using /git/linux as source for kernel
<SNIP>
HOSTCC samples/bpf/trace_output_user.o
/git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
prototype for 'perf_event_read' [-Wmissing-prototypes]
void perf_event_read(print_fn fn)
^~~~~~~~~~~~~~~
HOSTLD samples/bpf/trace_output
make[1]: Leaving directory '/tmp/build/linux'
Shut up the compiler by making that function static.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull kbuild updates from Michal Marek:
- prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
about missing CRCs (Nick Piggin)
- asm-exports fix for LTO (Nicolas Pitre)
- thin archives improvements (Nick Piggin)
- linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
Piggin)
- genksyms support for __builtin_va_list keyword
- misc minor fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
x86/kbuild: enable modversions for symbols exported from asm
kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
scripts/kallsyms: remove last remnants of --page-offset option
make use of make variable CURDIR instead of calling pwd
kbuild: cmd_export_list: tighten the sed script
kbuild: minor improvement for thin archives build
kbuild: modpost warn if export version crc is missing
kbuild: keep data tables through dead code elimination
kbuild: improve linker compatibility with lib-ksyms.o build
genksyms: Regenerate parser
kbuild/genksyms: handle va_list type
kbuild: thin archives for multi-y targets
kbuild: kallsyms allow 3-pass generation if symbols size has changed
o STM can hook into the function tracer
o Function filtering now supports more advance glob matching
o Ftrace selftests updates and added tests
o Softirq tag in traces now show only softirqs
o ARM nop added to non traced locations at compile time
o New trace_marker_raw file that allows for binary input
o Optimizations to the ring buffer
o Removal of kmap in trace_marker
o Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
o Other various fixes and clean ups
Note, there are two patches marked for stable. These were discovered
near the end of the 4.9 rc release cycle. By the time I had them tested
it was just a matter of days before 4.9 would be released, and I
figured I would just submit them in the merge window. They are old
bugs and not critical. Nothing non-root could abuse.
-----BEGIN PGP SIGNATURE-----
iQExBAABCAAbBQJYUrFHFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
2+AIAIr20kSQV/nA5htGAeCTobVk3WUxY6bvjd9mIJDKPP19akNLyREW0G3KnfCr
yhx4aFRZG98fRu/6F8qieRosyN36lADDVYHelMFHMpcTOpE2aZGjaaOuNGxOEA9v
FmMPTX+K3+dzKyFP4l68R3+5JuQ1/AqLTioTWeLW8IDQ2OOVsjD8+0BuXrNKMJDY
o6U4Hk5U/vn+zHc6BmgBzloAXemBd7iJ1t5V3FRRGvm8yv3HU85Twc5ofGeYTWvB
J8PboEywRlIzxg0Kd8mxnMI5PgaKZSEc2ub8E7cY/CZ5PYpDE2xDA2hJmJgfYp00
1VW+DHRpRZfElsCcya6S6P4bs5Y=
=MGZ/
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"This release has a few updates:
- STM can hook into the function tracer
- Function filtering now supports more advance glob matching
- Ftrace selftests updates and added tests
- Softirq tag in traces now show only softirqs
- ARM nop added to non traced locations at compile time
- New trace_marker_raw file that allows for binary input
- Optimizations to the ring buffer
- Removal of kmap in trace_marker
- Wakeup and irqsoff tracers now adhere to the set_graph_notrace file
- Other various fixes and clean ups"
* tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits)
selftests: ftrace: Shift down default message verbosity
kprobes/trace: Fix kprobe selftest for newer gcc
tracing/kprobes: Add a helper method to return number of probe hits
tracing/rb: Init the CPU mask on allocation
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too
fgraph: Handle a case where a tracer ignores set_graph_notrace
tracing: Replace kmap with copy_from_user() in trace_marker writing
ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it
tracing: Allow benchmark to be enabled at early_initcall()
tracing: Have system enable return error if one of the events fail
tracing: Do not start benchmark on boot up
tracing: Have the reg function allow to fail
ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline
ring-buffer: Froce rb_update_write_stamp() to be inlined
ring-buffer: Force inline of hotpath helper functions
tracing: Make __buffer_unlock_commit() always_inline
tracing: Make tracepoint_printk a static_key
ring-buffer: Always inline rb_event_data()
ring-buffer: Make rb_reserve_next_event() always inlined
...
Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allow the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.
Committer notes:
Testing it:
On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:
# make O=/tmp/build/linux/ headers_install
# make O=/tmp/build/linux -C samples/bpf/
Since I forgot to make it privileged, just tested it outside the
container, using what it generated:
# uname -a
Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
# cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
# ls -la offwaketime
-rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
# file offwaketime
offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
# readelf -SW offwaketime_kern.o | grep PROGBITS
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] kprobe/try_to_wake_up PROGBITS 0000000000000000 000040 0000d8 00 AX 0 0 8
[ 5] tracepoint/sched/sched_switch PROGBITS 0000000000000000 000118 000318 00 AX 0 0 8
[ 7] maps PROGBITS 0000000000000000 000430 000050 00 WA 0 0 4
[ 8] license PROGBITS 0000000000000000 000480 000004 00 WA 0 0 1
[ 9] version PROGBITS 0000000000000000 000484 000004 00 WA 0 0 4
# ./offwaketime | head -5
swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
#
Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull security subsystem updates from James Morris:
"Generally pretty quiet for this release. Highlights:
Yama:
- allow ptrace access for original parent after re-parenting
TPM:
- add documentation
- many bugfixes & cleanups
- define a generic open() method for ascii & bios measurements
Integrity:
- Harden against malformed xattrs
SELinux:
- bugfixes & cleanups
Smack:
- Remove unnecessary smack_known_invalid label
- Do not apply star label in smack_setprocattr hook
- parse mnt opts after privileges check (fixes unpriv DoS vuln)"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits)
Yama: allow access for the current ptrace parent
tpm: adjust return value of tpm_read_log
tpm: vtpm_proxy: conditionally call tpm_chip_unregister
tpm: Fix handling of missing event log
tpm: Check the bios_dir entry for NULL before accessing it
tpm: return -ENODEV if np is not set
tpm: cleanup of printk error messages
tpm: replace of_find_node_by_name() with dev of_node property
tpm: redefine read_log() to handle ACPI/OF at runtime
tpm: fix the missing .owner in tpm_bios_measurements_ops
tpm: have event log use the tpm_chip
tpm: drop tpm1_chip_register(/unregister)
tpm: replace dynamically allocated bios_dir with a static array
tpm: replace symbolic permission with octal for securityfs files
char: tpm: fix kerneldoc tpm2_unseal_trusted name typo
tpm_tis: Allow tpm_tis to be bound using DT
tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV
tpm: Only call pm_runtime_get_sync if device has a parent
tpm: define a generic open() method for ascii & bios measurements
Documentation: tpm: add the Physical TPM device tree binding documentation
...
- VFIO updates for v4.10 primarily include a new Mediated Device
interface, which essentially allows software defined devices to be
exposed to users through VFIO. The host vendor driver providing
this virtual device polices, or mediates user access to the device.
These devices often incorporate portions of real devices, for
instance the primary initial users of this interface expose vGPUs
which allow the user to map mediated devices, or mdevs, to a
portion of a physical GPU. QEMU composes these mdevs into PCI
representations using the existing VFIO user API. This enables
both Intel KVM-GT support, which is also expected to arrive into
Linux mainline during the v4.10 merge window, as well as NVIDIA
vGPU, and also Channel I/O devices (aka CCW devices) for s390
virtualization support. (Kirti Wankhede, Neo Jia)
- Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin)
- Fixes to VFIO capability chain handling (Eric Auger)
- Error handling fixes for fallout from mdev (Christophe JAILLET)
- Notifiers to expose struct kvm to mdev vendor drivers (Jike Song)
- type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJYSyCtAAoJECObm247sIsi+rIP/3Q/GE3zaDdz1iKQK/c/qhs6
0Pl45opAqw4wCJDCZIhRmoHmsCaT4KkeJKU1fiYc0mKJhW11HfA4DTFwzBqrHBj7
7wPjHTaWwlFRHCYVCWYEp5g9UASyD8ubWGyZKzqIXELFoAvwuBL3SULNj4neJKKR
rPcHTVxJ7laYIjHFzuNUi/MWEdjxPT9oJn8Bm9mhISwPglIMU9nkIR20ChaSeFJb
MiFqFW7BcvkVyqupjpksM9DodpNZu+3uSMVtgASNVNbilf0FXJr0d8RCbeSxTIfm
rEsZ5+0PrklhCtmRRl5EB+tNawgaism8wAF74KIO//76vE02Usrxb0b5mTIZ8TiN
6/Z+WID5D+ZRt8hp9hJIJmGE/sM/odH4r174dPaiEkMvOB9ksDIPkzgbtDbVY40c
DACb7/n3ZZA0an2Eq2HEx/BqTOvt9sgu367KVvhuoIArQcb5SM94GT03Dv+pKnax
Cxmro2oaWmAV3IS0vNzbCIddsFqlPjkFIYxjtzBy+bVLg2RN3STyaSL6cwJsydSU
KLcCPiYtovczKFj7RJlgVlqh5/8uZ7SEffTkIggehdnVPAfDlK9p9BYqLCgAoWpN
vwWidM3qOIjooRXQgxUwJgJsl4MLRMoA/gFP4iHbqOgIAGtUDRHuQ4muvkf+LLxg
wpgfXsBQNRuVcZHBUEVe
=gc6j
-----END PGP SIGNATURE-----
Merge tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- VFIO updates for v4.10 primarily include a new Mediated Device
interface, which essentially allows software defined devices to be
exposed to users through VFIO. The host vendor driver providing this
virtual device polices, or mediates user access to the device.
These devices often incorporate portions of real devices, for
instance the primary initial users of this interface expose vGPUs
which allow the user to map mediated devices, or mdevs, to a portion
of a physical GPU. QEMU composes these mdevs into PCI representations
using the existing VFIO user API. This enables both Intel KVM-GT
support, which is also expected to arrive into Linux mainline during
the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O
devices (aka CCW devices) for s390 virtualization support. (Kirti
Wankhede, Neo Jia)
- Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin)
- Fixes to VFIO capability chain handling (Eric Auger)
- Error handling fixes for fallout from mdev (Christophe JAILLET)
- Notifiers to expose struct kvm to mdev vendor drivers (Jike Song)
- type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia)
* tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages
vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP.
vfio iommu type1: WARN_ON if notifier block is not unregistered
kvm: set/clear kvm to/from vfio_group when group add/delete
vfio: support notifier chain in vfio_group
vfio: vfio_register_notifier: classify iommu notifier
vfio: Fix handling of error returned by 'vfio_group_get_from_dev()'
vfio: fix vfio_info_cap_add/shift
vfio/pci: Drop unnecessary pcibios_err_to_errno()
MAINTAINERS: Add entry VFIO based Mediated device drivers
docs: Sample driver to demonstrate how to use Mediated device framework.
docs: Sysfs ABI for mediated device framework
docs: Add Documentation for Mediated devices
vfio: Define device_api strings
vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare()
vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare()
vfio: Introduce vfio_set_irqs_validate_and_prepare()
vfio_pci: Update vfio_pci to use vfio_info_add_capability()
vfio: Introduce common function to add capabilities
vfio iommu: Add blocking notifier to notify DMA_UNMAP
...
make already provides the current working directory in a variable, so make
use of it instead of forking a shell. Also replace usage of PWD by
CURDIR. PWD is provided by most shells, but not all, so this makes the
build system more robust.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
Some tracepoints have a registration function that gets enabled when the
tracepoint is enabled. There may be cases that the registraction function
must fail (for example, can't allocate enough memory). In this case, the
tracepoint should also fail to register, otherwise the user would not know
why the tracepoint is not working.
Cc: David Howells <dhowells@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The XDP prog checks if the incoming packet matches any VIP:PORT
combination in the BPF hashmap. If it is, it will encapsulate
the packet with a IPv4/v6 header as instructed by the value of
the BPF hashmap and then XDP_TX it out.
The VIP:PORT -> IP-Encap-Info can be specified by the cmd args
of the user prog.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the sample program test_cgrp2_attach2. This program is
similar to test_cgrp2_attach, but it performs automated testing of the
cgroupv2 BPF attached filters. It runs the following checks:
* Simple filter attachment
* Application of filters to child cgroups
* Overriding filters on child cgroups
* Checking that this still works when the parent filter is removed
The filters that are used here are simply allow all / deny all filters, so
it isn't checking the actual functionality of the filters, but rather
the behaviour around detachment / attachment. If net_cls is enabled,
this test will fail.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies test_current_task_under_cgroup_user. The test has
several helpers around creating a temporary environment for cgroup
testing, and moving the current task around cgroups. This set of
helpers can then be used in other tests.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
silence some of the clang compiler warnings like:
include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false
arch/x86/include/asm/processor.h:491:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value
include/linux/cgroup-defs.h:326:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension
since they add too much noise to samples/bpf/ build.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Couple conflicts resolved here:
1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add examples preventing a process in a cgroup from opening a socket
based family, protocol and type.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for section names starting with cgroup/skb and cgroup/sock.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following build error:
HOSTCC samples/bpf/test_lru_dist.o
../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory
This is due to objtree != srctree.
Use srctree, since that's where bpf_util.h is located.
Fixes: e00c7b216f ("bpf: fix multiple issues in selftest suite and samples")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies test_cgrp2_attach to use getopt so we can use standard
command line parsing.
It also adds an option to run the program in detach only mode. This does
not attach a new filter at the cgroup, but only runs the detach command.
Lastly, it changes the attach code to not detach and then attach. It relies
on the 'hotswap' behaviour of CGroup BPF programs to be able to change
in-place. If detach-then-attach behaviour needs to be tested, the example
can be run in detach only mode prior to attachment.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The files "sampleip_kern.c" and "trace_event_kern.c" directly access
"ctx->regs.ip" which is not available on s390x. Fix this and use the
PT_REGS_IP() macro instead.
Also fix the macro for s390x and use "psw.addr" from "pt_regs".
Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) The test_lru_map and test_lru_dist fails building on my machine since
the sys/resource.h header is not included.
2) test_verifier fails in one test case where we try to call an invalid
function, since the verifier log output changed wrt printing function
names.
3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
retrieving the number of possible CPUs. This is broken at least in our
scenario and really just doesn't work.
glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
if that fails, depending on the config, it either tries to count CPUs
in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
If /proc/cpuinfo has some issue, it returns just 1 worst case. This
oddity is nothing new [1], but semantics/behaviour seems to be settled.
_SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
that fails it looks into /proc/stat for cpuX entries, and if also that
fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
unlikely all breaks down).
While that might match num_possible_cpus() from the kernel in some
cases, it's really not guaranteed with CPU hotplugging, and can result
in a buffer overflow since the array in user space could have too few
number of slots, and on perpcu map lookup, the kernel will write beyond
that memory of the value buffer.
William Tu reported such mismatches:
[...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
happens when CPU hotadd is enabled. For example, in Fusion when
setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
-smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
will be 2 [2]. [...]
Documentation/cputopology.txt says /sys/devices/system/cpu/possible
outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
own implementation. Later, we could add support to bpf(2) for passing
a mask via CPU_SET(3), for example, to just select a subset of CPUs.
BPF samples code needs this fix as well (at least so that people stop
copying this). Thus, define bpf_num_possible_cpus() once in selftests
and import it from there for the sample code to avoid duplicating it.
The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
After all three issues are fixed, the test suite runs fine again:
# make run_tests | grep self
selftests: test_verifier [PASS]
selftests: test_maps [PASS]
selftests: test_lru_map [PASS]
selftests: test_kmod.sh [PASS]
[1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
Fixes: 3059303f59 ("samples/bpf: update tracex[23] examples to use per-cpu maps")
Fixes: 86af8b4191 ("Add sample for adding simple drop program to link")
Fixes: df570f5772 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
Fixes: e155967179 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
Fixes: ebb676daa1 ("bpf: Print function name in addition to function id")
Fixes: 5db58faf98 ("bpf: Add tests for the LRU bpf_htab")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a simple userpace program to demonstrate the new API to attach eBPF
programs to cgroups. This is what it does:
* Create arraymap in kernel with 4 byte keys and 8 byte values
* Load eBPF program
The eBPF program accesses the map passed in to store two pieces of
information. The number of invocations of the program, which maps
to the number of packets received, is stored to key 0. Key 1 is
incremented on each iteration by the number of bytes stored in
the skb.
* Detach any eBPF program previously attached to the cgroup
* Attach the new program to the cgroup using BPF_PROG_ATTACH
* Once a second, read map[0] and map[1] to see how many bytes and
packets were seen on any socket of tasks in the given cgroup.
The program takes a cgroup path as 1st argument, and either "ingress"
or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
which will make the generated eBPF program return 0 instead of 1, so
the kernel will drop the packet.
libbpf gained two new wrappers for the new syscall commands.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
llvm can emit relocations into sections other than program code
(like debug info sections). Ignore them during parsing of elf file
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
since llvm commit "Do not expand UNDEF SDNode during insn selection lowering"
llvm will generate code that uses uninitialized registers for cases
where C code is actually uses uninitialized data.
So this sockex2 example is technically broken.
Fix it by initializing on the stack variable fully.
Also increase verifier buffer limit, since verifier output
may not fit in 64k for this sockex2 code depending on llvm version.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Sample driver creates mdev device that simulates serial port over PCI
card.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
This patch has some unit tests and a test_lru_dist.
The test_lru_dist reads in the numeric keys from a file.
The files used here are generated by a modified fio-genzipf tool
originated from the fio test suit. The sample data file can be
found here: https://github.com/iamkafai/bpf-lru
The zipf.* data files have 100k numeric keys and the key is also
ranged from 1 to 100k.
The test_lru_dist outputs the number of unique keys (nr_unique).
F.e. The following means, 61239 of them is unique out of 100k keys.
nr_misses means it cannot be found in the LRU map, so nr_misses
must be >= nr_unique. test_lru_dist also simulates a perfect LRU
map as a comparison:
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a1_01.out 4000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
....
test_parallel_lru_dist (map_type:9 map_flags:0x2):
task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a0_01.out 40000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
...
test_parallel_lru_dist (map_type:9 map_flags:0x2):
task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
LRU map has also been added to map_perf_test:
/* Global LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2934082 updates
4 cpus: 7391434 updates
8 cpus: 6500576 updates
/* Percpu LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
1 cpus: 2896553 updates
4 cpus: 9766395 updates
8 cpus: 17460553 updates
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test creates two netns, ns1 and ns2. The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.
The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].
At ns1, the VIPs are routed _via_ the host.
At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl. The test is configured in a way so
that both ingress and egress can be tested.
At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified. The return path is routed to the dev ipip/ip6tnl.
During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, the program size was incorrectly truncated to 8 bits,
resulting in broken labels in large programs. Also changes the jump
resolution loop to not rely on undefined behavior (making a pointer
point before the filter array).
Signed-off-by: Ricky Zhou <rickyz@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Either CAP_SYS_ADMIN or PR_SET_NO_NEW_PRIVS is required to enable
seccomp. This allows samples/seccomp/dropper to be run without
CAP_SYS_ADMIN.
Signed-off-by: Ricky Zhou <rickyz@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
In f6041c1d, a separate SAMPLES_SECCOMP option was added. This changed
hostprogs-y to hostprogs-m, so adjust it.
Signed-off-by: Ricky Zhou <rickyz@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>