Commit Graph

17084 Commits

Author SHA1 Message Date
David S. Miller
95337b9821 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Remove the broute pseudo hook, implement this from the bridge
   prerouting hook instead. Now broute becomes real table in ebtables,
   from Florian Westphal. This also includes a size reduction patch for the
   bridge control buffer area via squashing boolean into bitfields and
   a selftest.

2) Add OS passive fingerprint version matching, from Fernando Fernandez.

3) Support for gue encapsulation for IPVS, from Jacky Hu.

4) Add support for NAT to the inet family, from Florian Westphal.
   This includes support for masquerade, redirect and nat extensions.

5) Skip interface lookup in flowtable, use device in the dst object.

6) Add jiffies64_to_msecs() and use it, from Li RongQing.

7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King.

8) Statify several functions, patches from YueHaibing and Florian Westphal.

9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing.

10) Merge route extension to core, also from Florian.

11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian.

12) Merge ip/ip6 masquerade extensions, from Florian. This includes
    netdevice notifier unification.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15 12:07:35 -07:00
Jiri Pirko
38f58c9723 netdevsim: move sdev specific bpf debugfs files to sdev dir
Some netdevsim bpf debugfs files are per-sdev, yet they are defined per
netdevsim instance. Move them under sdev directory.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:49:54 -07:00
David Ahern
56490b623a selftests: Add debugging options to pmtu.sh
pmtu.sh script runs a number of tests and dumps a summary of pass/fail.
If a test fails, it is near impossible to debug why. For example:

    TEST: ipv6: PMTU exceptions                       [FAIL]

There are a lot of commands run behind the scenes for this test. Which
one is failing?

Add a VERBOSE option to show commands that are run and any output from
those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
fails allowing users to poke around with the setup in the failed state.

In the process, rename tracing to TRACING and move declaration to top
with the new variables.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 21:32:00 -07:00
David S. Miller
bb23581b9b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-04-12

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Improve BPF verifier scalability for large programs through two
   optimizations: i) remove verifier states that are not useful in pruning,
   ii) stop walking parentage chain once first LIVE_READ is seen. Combined
   gives approx 20x speedup. Increase limits for accepting large programs
   under root, and add various stress tests, from Alexei.

2) Implement global data support in BPF. This enables static global variables
   for .data, .rodata and .bss sections to be properly handled which allows
   for more natural program development. This also opens up the possibility
   to optimize program workflow by compiling ELFs only once and later only
   rewriting section data before reload, from Daniel and with test cases and
   libbpf refactoring from Joe.

3) Add config option to generate BTF type info for vmlinux as part of the
   kernel build process. DWARF debug info is converted via pahole to BTF.
   Latter relies on libbpf and makes use of BTF deduplication algorithm which
   results in 100x savings compared to DWARF data. Resulting .BTF section is
   typically about 2MB in size, from Andrii.

4) Add BPF verifier support for stack access with variable offset from
   helpers and add various test cases along with it, from Andrey.

5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
   so that L2 encapsulation can be used for tc tunnels, from Alan.

6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
   users can define a subset of allowed __sk_buff fields that get fed into
   the test program, from Stanislav.

7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
   various UBSAN warnings in bpftool, from Yonghong.

8) Generate a pkg-config file for libbpf, from Luca.

9) Dump program's BTF id in bpftool, from Prashant.

10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
    program, from Magnus.

11) kallsyms related fixes for the case when symbols are not present in
    BPF selftests and samples, from Daniel
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 17:00:05 -07:00
Florian Westphal
26f7fe4a5d selftests: netfilter: add ebtables broute test case
ebtables -t broute allows to redirect packets in a way that
they get pushed up the stack, even if the interface is part
of a bridge.

In case of IP packets to non-local address, this means
those IP packets are routed instead of bridged-forwarded, just
as if the bridge would not have existed.

Expected test output is:
PASS: netns connectivity: ns1 and ns2 can reach each other
PASS: ns1/ns2 connectivity with active broute rule
PASS: ns1/ns2 connectivity with active broute rule and bridge forward drop

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-12 01:45:58 +02:00
Daniel Borkmann
6b7a21140f tools: add smp_* barrier variants to include infrastructure
Add the definition for smp_rmb(), smp_wmb(), and smp_mb() to the
tools include infrastructure: this patch adds the implementation
for x86-64 and arm64, and have it fall back as currently is for
other archs which do not have it implemented at this point. The
x86-64 one uses lock + add combination for smp_mb() with address
below red zone.

This is on top of 09d62154f6 ("tools, perf: add and use optimized
ring_buffer_{read_head, write_tail} helpers"), which didn't touch
smp_* barrier implementations. Magnus recently rightfully reported
however that the latter on x86-64 still wrongly falls back to sfence,
lfence and mfence respectively, thus fix that for applications under
tools making use of these to avoid such ugly surprises. The main
header under tools (include/asm/barrier.h) will in that case not
select the fallback implementation.

Reported-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-11 14:45:50 -07:00
Alan Maguire
3ec61df82b selftests_bpf: add L2 encap to test_tc_tunnel
Update test_tc_tunnel to verify adding inner L2 header
encapsulation (an MPLS label or ethernet header) works.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 22:50:57 +02:00
Alan Maguire
1db04c300a bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2
Sync include/uapi/linux/bpf.h with tools/ equivalent to add
BPF_F_ADJ_ROOM_ENCAP_L2(len) macro.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 22:50:57 +02:00
Alan Maguire
166b5a7f2c selftests_bpf: extend test_tc_tunnel for UDP encap
commit 868d523535 ("bpf: add bpf_skb_adjust_room encap flags")
introduced support to bpf_skb_adjust_room for GSO-friendly GRE
and UDP encapsulation and later introduced associated test_tc_tunnel
tests.  Here those tests are extended to cover UDP encapsulation also.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 22:50:56 +02:00
Vlad Buslov
9e35552ae1 net: sched: flower: use correct ht function to prevent duplicates
Implementation of function rhashtable_insert_fast() check if its internal
helper function __rhashtable_insert_fast() returns non-NULL pointer and
seemingly return -EEXIST in such case. However, since
__rhashtable_insert_fast() is called with NULL key pointer, it never
actually checks for duplicates, which means that -EEXIST is never returned
to the user. Use rhashtable_lookup_insert_fast() hash table API instead. In
order to verify that it works as expected and prevent the problem from
happening in future, extend tc-tests with new test that verifies that no
new filters with existing key can be inserted to flower classifier.

Fixes: 1f17f7742e ("net: sched: flower: insert filter to ht before offloading it to hw")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11 11:33:06 -07:00
Stanislav Fomichev
3daf8e703e selftests: bpf: add selftest for __sk_buff context in BPF_PROG_TEST_RUN
Simple test that sets cb to {1,2,3,4,5} and priority to 6, runs bpf
program that fails if cb is not what we expect and increments cb[i] and
priority. When the test finishes, we check that cb is now {2,3,4,5,6}
and priority is 7.

We also test the sanity checks:
* ctx_in is provided, but ctx_size_in is zero (same for
  ctx_out/ctx_size_out)
* unexpected non-zero fields in __sk_buff return EINVAL

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 10:21:41 +02:00
Stanislav Fomichev
5e903c656b libbpf: add support for ctx_{size, }_{in, out} in BPF_PROG_TEST_RUN
Support recently introduced input/output context for test runs.
We extend only bpf_prog_test_run_xattr. bpf_prog_test_run is
unextendable and left as is.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 10:21:41 +02:00
Prashant Bhole
569b0c7773 tools/bpftool: show btf id in program information
Let's add a way to know whether a program has btf context.
Patch adds 'btf_id' in the output of program listing.
When btf_id is present, it means program has btf context.

Sample output:
user@test# bpftool prog list
25: xdp  name xdp_prog1  tag 539ec6ce11b52f98  gpl
	loaded_at 2019-04-10T11:44:20+0900  uid 0
	xlated 488B  not jited  memlock 4096B  map_ids 23
	btf_id 1

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 10:21:39 +02:00
Andrey Ignatov
d5adbdd77e libbpf: Fix build with gcc-8
Reported in [1].

With gcc 8.3.0 the following error is issued:

  cc -Ibpf@sta -I. -I.. -I.././include -I.././include/uapi
  -fdiagnostics-color=always -fsanitize=address,undefined -fno-omit-frame-pointer
  -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fPIC -g -O2
  -Werror -Wall -Wno-pointer-arith -Wno-sign-compare  -MD -MQ
  'bpf@sta/src_libbpf.c.o' -MF 'bpf@sta/src_libbpf.c.o.d' -o
  'bpf@sta/src_libbpf.c.o' -c ../src/libbpf.c
  ../src/libbpf.c: In function 'bpf_object__elf_collect':
  ../src/libbpf.c:947:18: error: 'map_def_sz' may be used uninitialized in this
  function [-Werror=maybe-uninitialized]
     if (map_def_sz <= sizeof(struct bpf_map_def)) {
         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ../src/libbpf.c:827:18: note: 'map_def_sz' was declared here
    int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
                    ^~~~~~~~~~

According to [2] -Wmaybe-uninitialized is enabled by -Wall.
Same error is generated by clang's -Wconditional-uninitialized.

[1] https://github.com/libbpf/libbpf/pull/29#issuecomment-481902601
[2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

Fixes: d859900c4c ("bpf, libbpf: support global data/bss/rodata sections")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-11 10:21:38 +02:00
Magnus Karlsson
50bd645b3a libbpf: fix crash in XDP socket part with new larger BPF_LOG_BUF_SIZE
In commit da11b41758 ("libbpf: teach libbpf about log_level bit 2"),
the BPF_LOG_BUF_SIZE was increased to 16M. The XDP socket part of
libbpf allocated the log_buf on the stack, but for the new 16M buffer
size this is not going to work. Change the code so it uses a 16K buffer
instead.

Fixes: da11b41758 ("libbpf: teach libbpf about log_level bit 2")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-10 09:51:50 +02:00
Yonghong Song
69a0f9ecef bpf, bpftool: fix a few ubsan warnings
The issue is reported at https://github.com/libbpf/libbpf/issues/28.

Basically, per C standard, for
  void *memcpy(void *dest, const void *src, size_t n)
if "dest" or "src" is NULL, regardless of whether "n" is 0 or not,
the result of memcpy is undefined. clang ubsan reported three such
instances in bpf.c with the following pattern:
  memcpy(dest, 0, 0).

Although in practice, no known compiler will cause issues when
copy size is 0. Let us still fix the issue to silence ubsan
warnings.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-10 09:46:51 +02:00
Daniel Borkmann
c861168b7c bpf, selftest: add test cases for BTF Var and DataSec
Extend test_btf with various positive and negative tests around
BTF verification of kind Var and DataSec. All passing as well:

  # ./test_btf
  [...]
  BTF raw test[4] (global data test #1): OK
  BTF raw test[5] (global data test #2): OK
  BTF raw test[6] (global data test #3): OK
  BTF raw test[7] (global data test #4, unsupported linkage): OK
  BTF raw test[8] (global data test #5, invalid var type): OK
  BTF raw test[9] (global data test #6, invalid var type (fwd type)): OK
  BTF raw test[10] (global data test #7, invalid var type (fwd type)): OK
  BTF raw test[11] (global data test #8, invalid var size): OK
  BTF raw test[12] (global data test #9, invalid var size): OK
  BTF raw test[13] (global data test #10, invalid var size): OK
  BTF raw test[14] (global data test #11, multiple section members): OK
  BTF raw test[15] (global data test #12, invalid offset): OK
  BTF raw test[16] (global data test #13, invalid offset): OK
  BTF raw test[17] (global data test #14, invalid offset): OK
  BTF raw test[18] (global data test #15, not var kind): OK
  BTF raw test[19] (global data test #16, invalid var referencing sec): OK
  BTF raw test[20] (global data test #17, invalid var referencing var): OK
  BTF raw test[21] (global data test #18, invalid var loop): OK
  BTF raw test[22] (global data test #19, invalid var referencing var): OK
  BTF raw test[23] (global data test #20, invalid ptr referencing var): OK
  BTF raw test[24] (global data test #21, var included in struct): OK
  BTF raw test[25] (global data test #22, array of var): OK
  [...]
  PASS:167 SKIP:0 FAIL:0

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Joe Stringer
b915ebe6d9 bpf, selftest: test global data/bss/rodata sections
Add tests for libbpf relocation of static variable references
into the .data, .rodata and .bss sections of the ELF, also add
read-only test for .rodata. All passing:

  # ./test_progs
  [...]
  test_global_data:PASS:load program 0 nsec
  test_global_data:PASS:pass global data run 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .data reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .data reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_string:PASS:relocate .rodata reference 925 nsec
  test_global_data_string:PASS:relocate .data reference 925 nsec
  test_global_data_string:PASS:relocate .bss reference 925 nsec
  test_global_data_string:PASS:relocate .data reference 925 nsec
  test_global_data_string:PASS:relocate .bss reference 925 nsec
  test_global_data_struct:PASS:relocate .rodata reference 925 nsec
  test_global_data_struct:PASS:relocate .bss reference 925 nsec
  test_global_data_struct:PASS:relocate .rodata reference 925 nsec
  test_global_data_struct:PASS:relocate .data reference 925 nsec
  test_global_data_rdonly:PASS:test .rodata read-only map 925 nsec
  [...]
  Summary: 229 PASSED, 0 FAILED

Note map helper signatures have been changed to avoid warnings
when passing in const data.

Joint work with Daniel Borkmann.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
fb2abb73e5 bpf, selftest: test {rd, wr}only flags and direct value access
Extend test_verifier with various test cases around the two kernel
extensions, that is, {rd,wr}only map support as well as direct map
value access. All passing, one skipped due to xskmap not present
on test machine:

  # ./test_verifier
  [...]
  #948/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
  #949/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
  #950/p XDP pkt read, pkt_data <= pkt_meta', good access OK
  #951/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
  #952/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
  Summary: 1410 PASSED, 1 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
817998afa0 bpf: bpftool support for dumping data/bss/rodata sections
Add the ability to bpftool to handle BTF Var and DataSec kinds
in order to dump them out of btf_dumper_type(). The value has a
single object with the section name, which itself holds an array
of variables it dumps. A single variable is an object by itself
printed along with its name. From there further type information
is dumped along with corresponding value information.

Example output from .rodata:

  # ./bpftool m d i 150
  [{
          "value": {
              ".rodata": [{
                      "load_static_data.bar": 18446744073709551615
                  },{
                      "num2": 24
                  },{
                      "num5": 43947
                  },{
                      "num6": 171
                  },{
                      "str0": [97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,0,0,0,0,0,0
                      ]
                  },{
                      "struct0": {
                          "a": 42,
                          "b": 4278120431,
                          "c": 1229782938247303441
                      }
                  },{
                      "struct2": {
                          "a": 0,
                          "b": 0,
                          "c": 0
                      }
                  }
              ]
          }
      }
  ]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
1713d68b3b bpf, libbpf: add support for BTF Var and DataSec
This adds libbpf support for BTF Var and DataSec kinds. Main point
here is that libbpf needs to do some preparatory work before the
whole BTF object can be loaded into the kernel, that is, fixing up
of DataSec size taken from the ELF section size and non-static
variable offset which needs to be taken from the ELF's string section.

Upstream LLVM doesn't fix these up since at time of BTF emission
it is too early in the compilation process thus this information
isn't available yet, hence loader needs to take care of it.

Note, deduplication handling has not been in the scope of this work
and needs to be addressed in a future commit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://reviews.llvm.org/D59441
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
d859900c4c bpf, libbpf: support global data/bss/rodata sections
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.

Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:

 1) In bpf_object__elf_collect() step we pick up ".data",
    ".rodata", and ".bss" section information.

 2) If present, in bpf_object__init_internal_map() we add
    maps to the obj's map array that corresponds to each
    of the present sections. Given section size and access
    properties can differ, a single entry array map is
    created with value size that is corresponding to the
    ELF section size of .data, .bss or .rodata. These
    internal maps are integrated into the normal map
    handling of libbpf such that when user traverses all
    obj maps, they can be differentiated from user-created
    ones via bpf_map__is_internal(). In later steps when
    we actually create these maps in the kernel via
    bpf_object__create_maps(), then for .data and .rodata
    sections their content is copied into the map through
    bpf_map_update_elem(). For .bss this is not necessary
    since array map is already zero-initialized by default.
    Additionally, for .rodata the map is frozen as read-only
    after setup, such that neither from program nor syscall
    side writes would be possible.

 3) In bpf_program__collect_reloc() step, we record the
    corresponding map, insn index, and relocation type for
    the global data.

 4) And last but not least in the actual relocation step in
    bpf_program__relocate(), we mark the ldimm64 instruction
    with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
    imm field the map's file descriptor is stored as similarly
    done as in BPF_PSEUDO_MAP_FD, and in the second imm field
    (as ldimm64 is 2-insn wide) we store the access offset
    into the section. Given these maps have only single element
    ldimm64's off remains zero in both parts.

 5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
    load will then store the actual target address in order
    to have a 'map-lookup'-free access. That is, the actual
    map value base address + offset. The destination register
    in the verifier will then be marked as PTR_TO_MAP_VALUE,
    containing the fixed offset as reg->off and backing BPF
    map as reg->map_ptr. Meaning, it's treated as any other
    normal map value from verification side, only with
    efficient, direct value access instead of actual call to
    map lookup helper as in the typical case.

Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.

Simple example dump of program using globals vars in each
section:

  # bpftool prog
  [...]
  6784: sched_cls  name load_static_dat  tag a7e1291567277844  gpl
        loaded_at 2019-03-11T15:39:34+0000  uid 0
        xlated 1776B  jited 993B  memlock 4096B  map_ids 2238,2237,2235,2236,2239,2240

  # bpftool map show id 2237
  2237: array  name test_glo.bss  flags 0x0
        key 4B  value 64B  max_entries 1  memlock 4096B
  # bpftool map show id 2235
  2235: array  name test_glo.data  flags 0x0
        key 4B  value 64B  max_entries 1  memlock 4096B
  # bpftool map show id 2236
  2236: array  name test_glo.rodata  flags 0x80
        key 4B  value 96B  max_entries 1  memlock 4096B

  # bpftool prog dump xlated id 6784
  int load_static_data(struct __sk_buff * skb):
  ; int load_static_data(struct __sk_buff *skb)
     0: (b7) r6 = 0
  ; test_reloc(number, 0, &num0);
     1: (63) *(u32 *)(r10 -4) = r6
     2: (bf) r2 = r10
  ; int load_static_data(struct __sk_buff *skb)
     3: (07) r2 += -4
  ; test_reloc(number, 0, &num0);
     4: (18) r1 = map[id:2238]
     6: (18) r3 = map[id:2237][0]+0    <-- direct addr in .bss area
     8: (b7) r4 = 0
     9: (85) call array_map_update_elem#100464
    10: (b7) r1 = 1
  ; test_reloc(number, 1, &num1);
  [...]
  ; test_reloc(string, 2, str2);
   120: (18) r8 = map[id:2237][0]+16   <-- same here at offset +16
   122: (18) r1 = map[id:2239]
   124: (18) r3 = map[id:2237][0]+16
   126: (b7) r4 = 0
   127: (85) call array_map_update_elem#100464
   128: (b7) r1 = 120
  ; str1[5] = 'x';
   129: (73) *(u8 *)(r9 +5) = r1
  ; test_reloc(string, 3, str1);
   130: (b7) r1 = 3
   131: (63) *(u32 *)(r10 -4) = r1
   132: (b7) r9 = 3
   133: (bf) r2 = r10
  ; int load_static_data(struct __sk_buff *skb)
   134: (07) r2 += -4
  ; test_reloc(string, 3, str1);
   135: (18) r1 = map[id:2239]
   137: (18) r3 = map[id:2235][0]+16   <-- direct addr in .data area
   139: (b7) r4 = 0
   140: (85) call array_map_update_elem#100464
   141: (b7) r1 = 111
  ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
   142: (73) *(u8 *)(r8 +6) = r1       <-- further access based on .bss data
   143: (b7) r1 = 108
   144: (73) *(u8 *)(r8 +5) = r1
  [...]

For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].

Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").

  [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
      http://vger.kernel.org/lpc-bpf2018.html#session-3

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Joe Stringer
f8c7a4d4dc bpf, libbpf: refactor relocation handling
Adjust the code for relocations slightly with no functional changes,
so that upcoming patches that will introduce support for relocations
into the .data, .rodata and .bss sections can be added independent
of these changes.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
c83fef6bc5 bpf: sync {btf, bpf}.h uapi header from tools infrastructure
Pull in latest changes from both headers, so we can make use of
them in libbpf.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:47 -07:00
Daniel Borkmann
d8eca5bbb2 bpf: implement lookup-free direct value access for maps
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value as a single BPF
ldimm64 instruction!

The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for ldimm64 instruction that indicates
that inside the first part of the double insns's imm field is a
file descriptor which the verifier then replaces as a full 64bit
address of the map into both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insns's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier will then
replace both imm parts with an address that points into the BPF
map value at the given value offset for maps that support this
operation. Currently supported is array map with single entry.
It is possible to support more than just single map element by
reusing both 16bit off fields of the insns as a map index, so
full array map lookup could be expressed that way. It hasn't
been implemented here due to lack of concrete use case, but
could easily be done so in future in a compatible way, since
both off fields right now have to be 0 and would correctly
denote a map index 0.

The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with
BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of
map pointer versus load of map's value at offset 0, and changing
BPF_PSEUDO_MAP_FD's encoding into off by one to differ between
regular map pointer and map value pointer would add unnecessary
complexity and increases barrier for debugability thus less
suitable. Using the second part of the imm field as an offset
into the value does /not/ come with limitations since maximum
possible value size is in u32 universe anyway.

This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.

The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-09 17:05:46 -07:00
David S. Miller
310655b07a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-04-08 23:39:36 -07:00
Linus Torvalds
869e3305f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.

 2) There have been many weird regressions in r8169 since we turned ASPM
    support on, some are still not understood nor completely resolved.
    Let's turn this back off for now. From Heiner Kallweit.

 3) Signess fixes for ethtool speed value handling, from Michael
    Zhivich.

 4) Handle timestamps properly in macb driver, from Paul Thomas.

 5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
    and we're holding a stale protocol header pointer". From Lorenzo
    Bianconi.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bnxt_en: Reset device on RX buffer errors.
  bnxt_en: Improve RX consumer index validity check.
  net: macb driver, check for SKBTX_HW_TSTAMP
  qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
  broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
  ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
  net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
  net: ip_gre: fix possible use-after-free in erspan_rcv
  r8169: disable ASPM again
  MAINTAINERS: ieee802154: update documentation file pattern
  net: vrf: Fix ping failed when vrf mtu is set to 0
  selftests: add a tc matchall test case
  nfc: nci: Potential off by one in ->pipes[] array
  NFC: nci: Add some bounds checking in nci_hci_cmd_received()
2019-04-08 17:10:46 -10:00
Tadeusz Struk
6da70580af selftests/tpm2: Open tpm dev in unbuffered mode
In order to have control over how many bytes are read or written
the device needs to be opened in unbuffered mode.

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-08 15:58:55 -07:00
Tadeusz Struk
f1a0ba6ccc selftests/tpm2: Extend tests to cover partial reads
Three new tests added:
1. Send get random cmd, read header in 1st read, read the rest in second
   read - expect success
2. Send get random cmd, read only part of the response, send another
   get random command, read the response - expect success
3. Send get random cmd followed by another get random cmd, without
   reading the first response - expect the second cmd to fail with -EBUSY

Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
2019-04-08 15:58:55 -07:00
David Ahern
228ddb3315 selftests: fib_tests: Add tests for ipv6 gateway with ipv4 route
Add tests for ipv6 gateway with ipv4 route. Tests include basic
single path with ping to verify connectivity and multipath.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08 15:22:41 -07:00
Florian Westphal
6978cdb129 kselftests: extend nft_nat with inet family based nat hooks
With older nft versions, this will cause:
[..]
PASS: ipv6 ping to ns1 was ip6 NATted to ns2
/dev/stdin:4:30-31: Error: syntax error, unexpected to, expecting newline or semicolon
                ip daddr 10.0.1.99 dnat ip to 10.0.2.99
                                           ^^
SKIP: inet nat tests
PASS: ip IP masquerade for ns2
[..]

as there is currently no way to detect if nft will be able to parse
the inet format.

redirect and masquerade tests need to be skipped in this case for inet
too because nft userspace has overzealous family check and rejects their
use in the inet family.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08 23:03:04 +02:00
Nicolas Dichtel
b959ecf8f9 selftests: add a tc matchall test case
This is a follow up of the commit 0db6f8befc ("net/sched: fix ->get
helper of the matchall cls").

To test it:
$ cd tools/testing/selftests/tc-testing
$ ln -s ../plugin-lib/nsPlugin.py plugins/20-nsPlugin.py
$ ./tdc.py -n -e 2638

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-07 19:31:49 -07:00
Andrey Ignatov
ff466b5805 libbpf: Ignore -Wformat-nonliteral warning
vsprintf() in __base_pr() uses nonliteral format string and it breaks
compilation for those who provide corresponding extra CFLAGS, e.g.:
https://github.com/libbpf/libbpf/issues/27

If libbpf is built with the flags from PR:

  libbpf.c:68:26: error: format string is not a string literal
  [-Werror,-Wformat-nonliteral]
          return vfprintf(stderr, format, args);
                                  ^~~~~~
  1 error generated.

Ignore this warning since the use case in libbpf.c is legit.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-04-06 23:13:54 -07:00
Nikolay Aleksandrov
f1054c65bc selftests: forwarding: test for bridge mcast traffic after report and leave
This test is split in two, the first part checks if a report creates a
corresponding mdb entry and if traffic is properly forwarded to it, and
the second part checks if the mdb entry is deleted after a leave and
if traffic is *not* forwarded to it. Since the mcast querier is enabled
we should see standard mcast snooping bridge behaviour.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-06 18:30:04 -07:00
David S. Miller
f83f715195 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment merge conflict in mlx5.

Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-05 14:14:19 -07:00
Andrey Ignatov
07f9196241 selftests/bpf: Test unbounded var_off stack access
Test the case when reg->smax_value is too small/big and can overflow,
and separately min and max values outside of stack bounds.

Example of output:
  # ./test_verifier
  #856/p indirect variable-offset stack access, unbounded OK
  #857/p indirect variable-offset stack access, max out of bound OK
  #858/p indirect variable-offset stack access, min out of bound OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-05 16:50:08 +02:00
Andrey Ignatov
2c6927dbdc selftests/bpf: Test indirect var_off stack access in unpriv mode
Test that verifier rejects indirect stack access with variable offset in
unprivileged mode and accepts same code in privileged mode.

Since pointer arithmetics is prohibited in unprivileged mode verifier
should reject the program even before it gets to helper call that uses
variable offset, at the time when that variable offset is trying to be
constructed.

Example of output:
  # ./test_verifier
  ...
  #859/u indirect variable-offset stack access, priv vs unpriv OK
  #859/p indirect variable-offset stack access, priv vs unpriv OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-05 16:50:08 +02:00
Andrey Ignatov
f68a5b4464 selftests/bpf: Test indirect var_off stack access in raw mode
Test that verifier rejects indirect access to uninitialized stack with
variable offset.

Example of output:
  # ./test_verifier
  ...
  #859/p indirect variable-offset stack access, uninitialized OK

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-05 16:50:07 +02:00
Linus Torvalds
0548740e53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Several hash table refcount fixes in batman-adv, from Sven
    Eckelmann.

 2) Use after free in bpf_evict_inode(), from Daniel Borkmann.

 3) Fix mdio bus registration in ixgbe, from Ivan Vecera.

 4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.

 5) ila rhashtable corruption fix from Herbert Xu.

 6) Don't allow upper-devices to be added to vrf devices, from Sabrina
    Dubroca.

 7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.

 8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
    from Alexander Lobakin.

 9) Missing IDR checks in mlx5 driver, from Aditya Pakki.

10) Fix false connection termination in ktls, from Jakub Kicinski.

11) Work around some ASPM issues with r8169 by disabling rx interrupt
    coalescing on certain chips. From Heiner Kallweit.

12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
    Abeni.

13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.

14) Various BPF flow dissector fixes from Stanislav Fomichev.

15) Divide by zero in act_sample, from Davide Caratti.

16) Fix bridging multicast regression introduced by rhashtable
    conversion, from Nikolay Aleksandrov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  ibmvnic: Fix completion structure initialization
  ipv6: sit: reset ip header pointer in ipip6_rcv
  net: bridge: always clear mcast matching struct on reports and leaves
  libcxgb: fix incorrect ppmax calculation
  vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
  sch_cake: Make sure we can write the IP header before changing DSCP bits
  sch_cake: Use tc_skb_protocol() helper for getting packet protocol
  tcp: Ensure DCTCP reacts to losses
  net/sched: act_sample: fix divide by zero in the traffic path
  net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
  net: hns: Fix sparse: some warnings in HNS drivers
  net: hns: Fix WARNING when remove HNS driver with SMMU enabled
  net: hns: fix ICMP6 neighbor solicitation messages discard problem
  net: hns: Fix probabilistic memory overwrite when HNS driver initialized
  net: hns: Use NAPI_POLL_WEIGHT for hns driver
  net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
  flow_dissector: rst'ify documentation
  ipv6: Fix dangling pointer when ipv6 fragment
  net-gro: Fix GRO flush when receiving a GSO packet.
  flow_dissector: document BPF flow dissector environment
  ...
2019-04-04 18:07:12 -10:00
David S. Miller
5ba5780117 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-04-04

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Batch of fixes to the existing BPF flow dissector API to support
   calling BPF programs from the eth_get_headlen context (support for
   latter is planned to be added in bpf-next), from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 13:30:55 -07:00
Rafael J. Wysocki
58b0cf8e24 Merge branch 'pm-tools'
* pm-tools:
  tools/power turbostat: update version number
  tools/power turbostat: Warn on bad ACPI LPIT data
  tools/power turbostat: Add checks for failure of fgets() and fscanf()
  tools/power turbostat: Also read package power on AMD F17h (Zen)
  tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
  tools/power turbostat: Do not display an error on systems without a cpufreq driver
  tools/power turbostat: Add Die column
  tools/power turbostat: Add Icelake support
  tools/power turbostat: Cleanup CNL-specific code
  tools/power turbostat: Cleanup CC3-skip code
  tools/power turbostat: Restore ability to execute in topology-order
2019-04-04 21:57:45 +02:00
Davide Caratti
fae2708174 net/sched: act_sample: fix divide by zero in the traffic path
the control path of 'sample' action does not validate the value of 'rate'
provided by the user, but then it uses it as divisor in the traffic path.
Validate it in tcf_sample_init(), and return -EINVAL with a proper extack
message in case that value is zero, to fix a splat with the script below:

 # tc f a dev test0 egress matchall action sample rate 0 group 1 index 2
 # tc -s a s action sample
 total acts 1

         action order 0: sample rate 1/0 group 1 pipe
          index 2 ref 1 bind 1 installed 19 sec used 19 sec
         Action statistics:
         Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
         backlog 0b 0p requeues 0
 # ping 192.0.2.1 -I test0 -c1 -q

 divide error: 0000 [#1] SMP PTI
 CPU: 1 PID: 6192 Comm: ping Not tainted 5.1.0-rc2.diag2+ #591
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:tcf_sample_act+0x9e/0x1e0 [act_sample]
 Code: 6a f1 85 c0 74 0d 80 3d 83 1a 00 00 00 0f 84 9c 00 00 00 4d 85 e4 0f 84 85 00 00 00 e8 9b d7 9c f1 44 8b 8b e0 00 00 00 31 d2 <41> f7 f1 85 d2 75 70 f6 85 83 00 00 00 10 48 8b 45 10 8b 88 08 01
 RSP: 0018:ffffae320190ba30 EFLAGS: 00010246
 RAX: 00000000b0677d21 RBX: ffff8af1ed9ec000 RCX: 0000000059a9fe49
 RDX: 0000000000000000 RSI: 000000000c7e33b7 RDI: ffff8af23daa0af0
 RBP: ffff8af1ee11b200 R08: 0000000074fcaf7e R09: 0000000000000000
 R10: 0000000000000050 R11: ffffffffb3088680 R12: ffff8af232307f80
 R13: 0000000000000003 R14: ffff8af1ed9ec000 R15: 0000000000000000
 FS:  00007fe9c6d2f740(0000) GS:ffff8af23da80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fff6772f000 CR3: 00000000746a2004 CR4: 00000000001606e0
 Call Trace:
  tcf_action_exec+0x7c/0x1c0
  tcf_classify+0x57/0x160
  __dev_queue_xmit+0x3dc/0xd10
  ip_finish_output2+0x257/0x6d0
  ip_output+0x75/0x280
  ip_send_skb+0x15/0x40
  raw_sendmsg+0xae3/0x1410
  sock_sendmsg+0x36/0x40
  __sys_sendto+0x10e/0x140
  __x64_sys_sendto+0x24/0x30
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [...]
  Kernel panic - not syncing: Fatal exception in interrupt

Add a TDC selftest to document that 'rate' is now being validated.

Reported-by: Matteo Croce <mcroce@redhat.com>
Fixes: 5c5670fae4 ("net/sched: Introduce sample tc action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Yotam Gigi <yotam.gi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-04 10:46:33 -07:00
Daniel T. Lee
e67b2c7154 samples, selftests/bpf: add NULL check for ksym_search
Since, ksym_search added with verification logic for symbols existence,
it could return NULL when the kernel symbols are not loaded.

This commit will add NULL check logic after ksym_search.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 16:43:47 +02:00
Daniel T. Lee
0979ff7992 selftests/bpf: ksym_search won't check symbols exists
Currently, ksym_search located at trace_helpers won't check symbols are
existing or not.

In ksym_search, when symbol is not found, it will return &syms[0](_stext).
But when the kernel symbols are not loaded, it will return NULL, which is
not a desired action.

This commit will add verification logic whether symbols are loaded prior
to the symbol search.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 16:43:46 +02:00
Alexei Starovoitov
8aa2d4b4b9 selftests/bpf: synthetic tests to push verifier limits
Add a test to generate 1m ld_imm64 insns to stress the verifier.

Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k
and jump_around_ld_abs from 4k to 5.5k.
Larger sizes are not possible due to 16-bit offset encoding
in jump instructions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 01:27:38 +02:00
Alexei Starovoitov
e5e7a8f2d8 selftests/bpf: add few verifier scale tests
Add 3 basic tests that stress verifier scalability.

test_verif_scale1.c calls non-inlined jhash() function 90 times on
different position in the packet.
This test simulates network packet parsing.
jhash function is ~140 instructions and main program is ~1200 insns.

test_verif_scale2.c force inlines jhash() function 90 times.
This program is ~15k instructions long.

test_verif_scale3.c calls non-inlined jhash() function 90 times on
But this time jhash has to process 32-bytes from the packet
instead of 14-bytes in tests 1 and 2.
jhash function is ~230 insns and main program is ~1200 insns.

$ test_progs -s
can be used to see verifier stats.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 01:27:38 +02:00
Alexei Starovoitov
da11b41758 libbpf: teach libbpf about log_level bit 2
Allow bpf_prog_load_xattr() to specify log_level for program loading.

Teach libbpf to accept log_level with bit 2 set.

Increase default BPF_LOG_BUF_SIZE from 256k to 16M.
There is no downside to increase it to a maximum allowed by old kernels.
Existing 256k limit caused ENOSPC errors and users were not able to see
verifier error which is printed at the end of the verifier log.

If ENOSPC is hit, double the verifier log and try again to capture
the verifier error.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-04 01:27:38 +02:00
Stanislav Fomichev
822fe61795 net/flow_dissector: pass flow_keys->n_proto to BPF programs
This is a preparation for the next commit that would prohibit access to
the most fields of __sk_buff from the BPF programs.

Instead of requiring BPF flow dissector programs to look into skb,
pass all input data in the flow_keys.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-03 16:49:48 +02:00
Stanislav Fomichev
2c3af7d901 selftests/bpf: fix vlan handling in flow dissector program
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.

Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.

Add simple test cases with VLAN tagged frames:
  * single vlan for ipv4
  * double vlan for ipv6

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-03 16:49:48 +02:00
Stanislav Fomichev
7596aa3ea8 selftests: bpf: remove duplicate .flags initialization in ctx_skb.c
verifier/ctx_skb.c:708:11: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
        .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-04-02 23:17:18 +02:00