Commit Graph

812468 Commits

Author SHA1 Message Date
Jiong Wang
ac7a1717a2 nfp: bpf: complete ALU32 logic shift supports
The following ALU32 logic shift supports are missing:

  BPF_ALU | BPF_LSH | BPF_X
  BPF_ALU | BPF_RSH | BPF_X
  BPF_ALU | BPF_RSH | BPF_K

For BPF_RSH | BPF_K, it could be implemented using NFP direct shift
instruction. For the other BPF_X shifts, NFP indirect shifts sequences need
to be used.

Separate code-gen hook is assigned to each instruction to make the
implementation clear.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-01 18:03:49 -08:00
Jiong Wang
db0a4b3b6b nfp: bpf: correct the behavior for shifts by zero
Shifts by zero do nothing, and should be treated as nops.

Even though compiler is not supposed to generate such instructions and
manual written assembly is unlikely to have them, but they are legal
instructions and have defined behavior.

This patch correct existing shifts code-gen to make sure they do nothing
when shift amount is zero except when the instruction is ALU32 for which
high bits need to be cleared.

For shift amount bigger than type size, already, NFP JIT back-end errors
out for immediate shift and only low 5 bits will be taken into account for
indirect shift which is the same as x86.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-01 18:03:49 -08:00
Stanislav Fomichev
2a11815409 selftests/bpf: remove generated verifier/tests.h on 'make clean'
'make clean' is supposed to remove generated files.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-01 15:52:53 -08:00
David S. Miller
d6b0a01faa Merge branch 'devlink-add-device-driver-information-API'
Jakub Kicinski says:

====================
devlink: add device (driver) information API

fw_version field in ethtool -i does not suit modern needs with 31
characters being quite limiting on more complex systems.  There is
also no distinction between the running and flashed versions of
the firmware.

Since the driver information pertains to the entire device, rather
than a particular netdev, it seems wise to move it do devlink, at
the same time fixing the aforementioned issues.

The new API allows exposing the device serial number and versions
of the components of the card - both hardware, firmware (running
and flashed).  Driver authors can choose descriptive identifiers
for the version fields.  A few version identifiers which seemed
relevant for most devices have been added to the global devlink
header.

Example:
$ devlink dev info pci/0000:05:00.0
pci/0000:05:00.0:
  driver nfp
  serial_number 16240145
  versions:
    fixed:
      board.id AMDA0099-0001
      board.rev 07
      board.vendor SMA
      board.model carbon
    running:
      fw.mgmt: 010156.010156.010156
      fw.cpld: 0x44
      fw.app: sriov-2.1.16
    stored:
      fw.mgmt: 010158.010158.010158
      fw.cpld: 0x44
      fw.app: sriov-2.1.20

Last patch also includes a compat code for ethtool.  If driver
reports no fw_version via the traditional ethtool API, ethtool
can call into devlink and try to cram as many versions as possible
into the 31 characters.

v4:
 - use IS_REACHABLE instead of IS_ENABLED in last patch.

v3 (Jiri):
 - rename various functions and attributes;
 - break out the version helpers per-type;
 - make the compat code parse a dump instead of special casing
   in each helper;
 - move generic version defines to a separate patch.

v2:
 - rebase.

this non-RFC, v3 some would say:
 - add three more versions in the NFP patches;
 - add last patch (ethool compat) - Andrew & Michal.

RFCv2:
 - use one driver op;
 - allow longer serial number;
 - wrap the skb into an opaque request struct;
 - add some common identifier into the devlink header.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:31 -08:00
Jakub Kicinski
ddb6e99e2d ethtool: add compat for devlink info
If driver did not fill the fw_version field, try to call into
the new devlink get_info op and collect the versions that way.
We assume ethtool was always reporting running versions.

v4:
 - use IS_REACHABLE() to avoid problems with DEVLINK=m (kbuildbot).
v3 (Jiri):
 - do a dump and then parse it instead of special handling;
 - concatenate all versions (well, all that fit :)).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:31 -08:00
Jakub Kicinski
7c908f467d nfp: devlink: report the running and flashed versions
Report versions of firmware components using the new NSP command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:31 -08:00
Jakub Kicinski
b96588400a nfp: nsp: add support for versions command
Retrieve the FW versions with the new command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:31 -08:00
Jakub Kicinski
937a3e2645 nfp: devlink: report fixed versions
Report information about the hardware.

RFCv2:
 - add defines for board IDs which are likely to be reusable for
   other drivers (Jiri).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:31 -08:00
Jakub Kicinski
4adba00839 nfp: devlink: report driver name and serial number
Report the basic info through new devlink info API.

RFCv2:
 - add driver name;
 - align serial to core changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:30 -08:00
Jakub Kicinski
785bd550c4 devlink: add generic info version names
Add defines and docs for generic info versions.

v3:
 - add docs;
 - separate patch (Jiri).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:30 -08:00
Jakub Kicinski
fc6fae7dd9 devlink: add version reporting to devlink info API
ethtool -i has a few fixed-size fields which can be used to report
firmware version and expansion ROM version. Unfortunately, modern
hardware has more firmware components. There is usually some
datapath microcode, management controller, PXE drivers, and a
CPLD load. Running ethtool -i on modern controllers reveals the
fact that vendors cram multiple values into firmware version field.

Here are some examples from systems I could lay my hands on quickly:

tg3:  "FFV20.2.17 bc 5720-v1.39"
i40e: "6.01 0x800034a4 1.1747.0"
nfp:  "0.0.3.5 0.25 sriov-2.1.16 nic"

Add a new devlink API to allow retrieving multiple versions, and
provide user-readable name for those versions.

While at it break down the versions into three categories:
 - fixed - this is the board/fixed component version, usually vendors
           report information like the board version in the PCI VPD,
           but it will benefit from naming and common API as well;
 - running - this is the running firmware version;
 - stored - this is firmware in the flash, after firmware update
            this value will reflect the flashed version, while the
            running version may only be updated after reboot.

v3:
 - add per-type helpers instead of using the special argument (Jiri).
RFCv2:
 - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now
   versions are mixed with other info attrs)l
 - have the driver report versions from the same callback as
   other info.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:30 -08:00
Jakub Kicinski
f9cf22882c devlink: add device information API
ethtool -i has served us well for a long time, but its showing
its limitations more and more. The device information should
also be reported per device not per-netdev.

Lay foundation for a simple devlink-based way of reading device
info. Add driver name and device serial number as initial pieces
of information exposed via this new API.

v3:
 - rename helpers (Jiri);
 - rename driver name attr (Jiri);
 - remove double spacing in commit message (Jiri).
RFC v2:
 - wrap the skb into an opaque structure (Jiri);
 - allow the serial number of be any length (Jiri & Andrew);
 - add driver name (Jonathan).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:30:30 -08:00
David S. Miller
26281e2c83 Merge branch 'selftests-Various-fixes'
Petr Machata says:

====================
selftests: Various fixes

This patch set contains various fixes whose common denominator is
improving quality of forwarding and mlxsw selftests.

Most of the fixes are improvements in determinism (such that timing and
latency don't impact the test performance). These were prompted by
regular runs of the test suite on a hardware emulator, the performance
of which is necessarily lower than that of the real device.

Patches #1 (from Ido), #2 and #3 make changes to ping limits.

Patches #4 and #5 add more sleep in places where things need more time
to finish.

Patches #6 and #7 fix two tests in the suite of mirror-to-gretap tests
where underlay involves a VLAN device over an 802.1q bridge.

Patches #8, #9 and #10 fix bugs in mirror-to-gretap test where underlay
involves a LAG device.

Patch #11 fixes a missed RET initialization in mirror-to-gretap flower
test.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:37 -08:00
Petr Machata
084fafe9ef selftests: forwarding: mirror_gre_flower: Fix test result handling
The global variable RET needs to be initialized before each call to
log_test. This test case sets it once before running the tests, but then
calls log_tests for every individual test. Thus a failure in one of the
tests causes spurious failures in follow-up tests as well.

Fix by moving the initialization of RET from test_all() to
full_test_span_gre_dir_acl(), a function that implements the test.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:37 -08:00
Petr Machata
2243cad9ff selftests: forwarding: mirror_gre_bridge_1q_lag: Ignore ARP
This test sets up mirroring such that it mirrors all overlay traffic.
That includes ARP, which causes occasional miscounts and spurious
failures. Ignore ARP explicitly to avoid these problems.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:37 -08:00
Petr Machata
ba22b65edc selftests: forwarding: mirror_gre_bridge_1q_lag: Enable forwarding
This test relies on routing in the primary traffic path, but neglects to
enable forwarding. Do so.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
a99dd629e8 selftests: forwarding: mirror_gre_bridge_1q_lag: Flush neighbors
After one LAG slave is downed and another upped, it takes a while for
the neighbor on a bridge to time out and get renegotiated. The test does
prompt update of FDB entries by arpinging. But because the neighbor
still references another address, offloading is not possible, and some
packets may end up not being mirrored.

To force the neighbor renegotiation, simply flush the neighbor table at
the bridge.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
ccdb66dd2f selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix roaming test
ARP or ND traffic can cause spurious migration of FDB back to $swp3.
Mirroring is then updated in accordance with the change, and mirrored
packets are seen at h3, causing a failure.

Detect the case of this spurious roaming, and retry the test.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
35036b0b09 selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix untagged test
The untagged egress test sets up mirroring to {,ip6}gretap such that the
underlay goes through a bridge. Then VLAN flags are manipulated to test
that the traffic leaves the bridge 802.1q-tagged or not, as appropriate.

However, when a neighbor expires at the time that the bridge VLAN is
configured as PVID and egress untagged, the following discovery process
can't finish, because the IP address on H3 is still at the VLAN-tagged
netdevice. This manifests by occasional failures where only several of
the 10 required packets get through.

Therefore, when reconfiguring the VLAN flags, move the IP address to the
appropriate device in the H3 VRF.

In addition to that, take this opportunity to embed an ASCII art diagram
to make the topology move obvious.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
db2c5bfcdf selftests: forwarding: mirror_lib: Wait for tardy mirrored packets
When running in an environment with poor performance (such as a
simulator), processing mirrored packets can take a while. Evaluating the
condition too soon leads to spurious "seen 9, expected 10" failures as
the last packet doesn't have enough time to get mirrored and the mirror
to arrive and bump the observed counters.

Wait for one ping interval before evaluating the test.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
3dc178a9ef selftests: forwarding: mirror_gre_changes: Fix TTL test
When running in a simulator, the TTL change takes a while to settle and
during this time the performance of the packet processing is lowered.
The resulting instability leads to ping sending more packets as it
assumes some have been dropped. This then leads to regular spurious
failures as more packets than expected are observed.

Sleep a bit to give the system time to stabilize.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
f3b05bb819 selftests: mlxsw: Update ping limits
The current ping intervals are too short for running mirroring tests in
simulator. This leads to ping sending a follow-up ping before the reply
arrives, thus sending more than the requested 10 ICMP requests. This
traffic is seen at the counters, and causes spurious failures.

Bump interval and timeout numbers 5x in mirroring tests to address the
spurious failures.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Petr Machata
0175cb5922 selftests: forwarding: mirror_lib: Update ping limits
The current ping intervals are too short for running mirroring tests in
simulator. This leads to ping sending a follow-up ping before the reply
arrives, thus sending more than the requested 10 ICMP requests. Those
are mirrored, and over a certain threshold the test case run is
considered a failure, because too much traffic is observed.

Bump interval and timeout numbers 5x in mirroring tests to address the
spurious failures.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Ido Schimmel
b6a4fd6800 selftests: forwarding: Make ping timeout configurable
The current timeout (2 seconds) proved to be too low for some (emulated)
systems where we run the tests.

Make the timeout configurable and default to 5 seconds.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:26:36 -08:00
Martin Kepplinger
3fc46fc9f6 ipconfig: add carrier_timeout kernel parameter
commit 3fb72f1e6e ("ipconfig wait for carrier") added a
"wait for carrier" policy, with a fixed worst case maximum wait
of two minutes.

Now make the wait for carrier timeout configurable on the kernel
commandline and use the 120s as the default.

The timeout messages introduced with
commit 5e404cd658 ("ipconfig: add informative timeout messages while
waiting for carrier") are done in a fixed interval of 20 seconds, just
like they were before (240/12).

Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:24:13 -08:00
Gustavo A. R. Silva
1f533ba6d5 ipv4: fib: use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva
ee69804714 nfp: use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva
6541d02590 tulip: eeprom: use struct_size() in kmalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva
c49f0ce0b6 cxgb4: smt: use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is
finding the size of a structure that has a zero-sized array at
the end, along with memory for some number of elements for that
array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva
3ebb18a48c cxgb4: sched: use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is
finding the size of a structure that has a zero-sized array at
the end, along with memory for some number of elements for that
array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:12:29 -08:00
Dave Watson
5b053e121f net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS
Currently we don't zerocopy if the crypto framework async bit is set.
However some crypto algorithms (such as x86 AESNI) support async,
but in the context of sendmsg, will never run asynchronously.  Instead,
check for actual EINPROGRESS return code before assuming algorithm is
async.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:05:07 -08:00
David S. Miller
665cf634e6 Merge branch 'tls-1.3-support'
Dave Watson says:

====================
net: tls: TLS 1.3 support

This patchset adds 256bit keys and TLS1.3 support to the kernel TLS
socket.

TLS 1.3 is requested by passing TLS_1_3_VERSION in the setsockopt
call, which changes the framing as required for TLS1.3.

256bit keys are requested by passing TLS_CIPHER_AES_GCM_256 in the
sockopt.  This is a fairly straightforward passthrough to the crypto
framework.

256bit keys work with both TLS 1.2 and TLS 1.3

TLS 1.3 requires a different AAD layout, necessitating some minor
refactoring.  It also moves the message type byte to the encrypted
portion of the message, instead of the cleartext header as it was in
TLS1.2.  This requires moving the control message handling to after
decryption, but is otherwise similar.

V1 -> V2

The first two patches were dropped, and sent separately, one as a
bugfix to the net tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:00:55 -08:00
Dave Watson
8debd67e79 net: tls: Add tests for TLS 1.3
Change most tests to TLS 1.3, while adding tests for previous TLS 1.2
behavior.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:00:55 -08:00
Dave Watson
130b392c6c net: tls: Add tls 1.3 support
TLS 1.3 has minor changes from TLS 1.2 at the record layer.

* Header now hardcodes the same version and application content type in
  the header.
* The real content type is appended after the data, before encryption (or
  after decryption).
* The IV is xored with the sequence number, instead of concatinating four
  bytes of IV with the explicit IV.
* Zero-padding:  No exlicit length is given, we search backwards from the
  end of the decrypted data for the first non-zero byte, which is the
  content type.  Currently recv supports reading zero-padding, but there
  is no way for send to add zero padding.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:00:55 -08:00
Dave Watson
fedf201e12 net: tls: Refactor control message handling on recv
For TLS 1.3, the control message is encrypted.  Handle control
message checks after decryption.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:00:55 -08:00
Dave Watson
a2ef9b6a22 net: tls: Refactor tls aad space size calculation
TLS 1.3 has a different AAD size, use a variable in the code to
make TLS 1.3 support easy.

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:00:55 -08:00
Dave Watson
fb99bce712 net: tls: Support 256 bit keys
Wire up support for 256 bit keys from the setsockopt to the crypto
framework

Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:00:55 -08:00
Daniel Borkmann
473c5daa86 Merge branch 'bpf-xdp-sample-libbpf'
Maciej Fijalkowski says:

====================
This patchset tries to address the situation where:
* user loads a particular xdp sample application that does stats polling
* user loads another sample application on the same interface
* then, user sends SIGINT/SIGTERM to the app that was attached as a first one
* second application ends up with an unloaded xdp program

1st patch contains a helper libbpf function for getting the map fd by a
given map name.
In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu which
was a blocker for converting this sample to libbpf usage.
3rd patch updates a bunch of xdp samples to make the use of libbpf.
Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset.
In patch 5 extack messages are added for cases where dev_change_xdp_fd returns
with an error so user has an idea what was the reason for not attaching the
xdp program onto interface.
Patch 6 makes the samples behavior similar to what iproute2 does when loading
xdp prog - the "force" flag is introduced.
Patch 7 introduces the libbpf function that will query the driver from
userspace about the currently attached xdp prog id.

Use it in samples that do polling by checking the prog id in signal handler
and comparing it with previously stored one which is the scope of patch 8.

Thanks!

v1->v2:
* add a libbpf helper for getting a prog via relative index
* include xdp_redirect_cpu into conversion

v2->v3: mostly addressing Daniel's/Jesper's comments
* get rid of the helper from v1->v2
* feed the xdp_redirect_cpu with program name instead of number

v3->v4:
* fix help message in xdp_sample_pkts

v4->v5:
* in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned
  with 0
* add extack messages in dev_change_xdp_fd
* check the return value of bpf_get_link_xdp_id when exiting from sample progs

v5->v6:
* rebase
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:53 +01:00
Maciej Fijalkowski
3b7a8ec2de samples/bpf: Check the prog id before exiting
Check the program id within the signal handler on polling xdp samples
that were previously converted to libbpf usage. Avoid the situation of
unloading the program that was not attached by sample that is exiting.
Handle also the case where bpf_get_link_xdp_id didn't exit with an error
but the xdp program was not found on an interface.

Reported-by: Michal Papaj <michal.papaj@intel.com>
Reported-by: Jakub Spizewski <jakub.spizewski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
50db9f0731 libbpf: Add a support for getting xdp prog id on ifindex
Since we have a dedicated netlink attributes for xdp setup on a
particular interface, it is now possible to retrieve the program id that
is currently attached to the interface. The use case is targeted for
sample xdp programs, which will store the program id just after loading
bpf program onto iface. On shutdown, the sample will make sure that it
can unload the program by querying again the iface and verifying that
both program id's matches.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
743e568c15 samples/bpf: Add a "force" flag to XDP samples
Make xdp samples consistent with iproute2 behavior and set the
XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on
interface. Provide an option for user to force the program loading,
which as a result will not include the mentioned flag in
bpf_set_link_xdp_fd call.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
01dde20ce0 xdp: Provide extack messages when prog attachment failed
In order to provide more meaningful messages to user when the process of
loading xdp program onto network interface failed, let's add extack
messages within dev_change_xdp_fd.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
6a5457618f samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4}
There is a common problem with xdp samples that happens when user wants
to run a particular sample and some bpf program is already loaded. The
default 64kb RLIMIT_MEMLOCK resource limit will cause a following error
(assuming that xdp sample that is failing was converted to libbpf
usage):

libbpf: Error in bpf_object__probe_name():Operation not permitted(1).
Couldn't load basic 'r0 = 0' BPF program.
libbpf: failed to load object './xdp_sample_pkts_kern.o'

Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK
to RLIM_INFINITY.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
bbaf6029c4 samples/bpf: Convert XDP samples to libbpf usage
Some of XDP samples that are attaching the bpf program to the interface
via libbpf's bpf_set_link_xdp_fd are still using the bpf_load.c for
loading and manipulating the ebpf program and maps. Convert them to do
this through libbpf usage and remove bpf_load from the picture.

While at it remove what looks like debug leftover in
xdp_redirect_map_user.c

In xdp_redirect_cpu, change the way that the program to be loaded onto
interface is chosen - user now needs to pass the program's section name
instead of the relative number. In case of typo print out the section
names to choose from.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Jesper Dangaard Brouer
7313798b14 samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe
The sample xdp_redirect_cpu is not using helper bpf_trace_printk.
Thus it makes no sense that the --debug option us reading
from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
Simply remove it.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:51 +01:00
Maciej Fijalkowski
f3cea32d56 libbpf: Add a helper for retrieving a map fd for a given name
XDP samples are mostly cooperating with eBPF maps through their file
descriptors. In case of a eBPF program that contains multiple maps it
might be tiresome to iterate through them and call bpf_map__fd for each
one. Add a helper mostly based on bpf_object__find_map_by_name, but
instead of returning the struct bpf_map pointer, return map fd.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 23:37:50 +01:00
Sandipan Das
6f20c71d85 bpf: powerpc64: add JIT support for bpf line info
This adds support for generating bpf line info for
JITed programs.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 21:03:34 +01:00
Daniel Borkmann
2863debfbc Merge branch 'bpf-spinlocks'
Alexei Starovoitov says:

====================
Many algorithms need to read and modify several variables atomically.
Until now it was hard to impossible to implement such algorithms in BPF.
Hence introduce support for bpf_spin_lock.

The api consists of 'struct bpf_spin_lock' that should be placed
inside hash/array/cgroup_local_storage element
and bpf_spin_lock/unlock() helper function.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

and BPF_F_LOCK flag for lookup/update bpf syscall commands that
allows user space to read/write map elements under lock.

Together these primitives allow race free access to map elements
from bpf programs and from user space.

Key restriction: root only.
Key requirement: maps must be annotated with BTF.

This concept was discussed at Linux Plumbers Conference 2018.
Thank you everyone who participated and helped to iron out details
of api and implementation.

Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array.
Patch 2: bpf_spin_lock in cgroup local storage.
Patches 3,4,5: tests
Patch 6: BPF_F_LOCK flag to lookup/update
Patches 7,8,9: tests

v6->v7:
- fixed this_cpu->__this_cpu per Peter's suggestion and added Ack.
- simplified bpf_spin_lock and load/store overlap check in the verifier
  as suggested by Andrii
- rebase

v5->v6:
- adopted arch_spinlock approach suggested by Peter
- switched to spin_lock_irqsave equivalent as the simplest way
  to avoid deadlocks in rare case of nested networking progs
  (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing
  the same map with bpf_spin_lock)
  bpf_spin_lock is only allowed in networking progs that don't
  have arbitrary entry points unlike tracing progs.
- rebase and split test_verifier tests

v4->v5:
- disallow bpf_spin_lock for tracing progs due to insufficient preemption checks
- socket filter progs cannot use bpf_spin_lock due to missing preempt_disable
- fix atomic_set_release. Spotted by Peter.
- fixed hash_of_maps

v3->v4:
- fix BPF_EXIST | BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks!
- rebase

v2->v3:
- fixed build on ia64 and archs where qspinlock is not supported
- fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin

v1->v2:
- addressed several issues spotted by Daniel and Martin in patch 1
- added test11 to patch 4 as suggested by Daniel
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 20:55:41 +01:00
Alexei Starovoitov
ba72a7b4ba selftests/bpf: test for BPF_F_LOCK
Add C based test that runs 4 bpf programs in parallel
that update the same hash and array maps.
And another 2 threads that read from these two maps
via lookup(key, value, BPF_F_LOCK) api
to make sure the user space sees consistent value in both
hash and array elements while user space races with kernel bpf progs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 20:55:39 +01:00
Alexei Starovoitov
df5d22facd libbpf: introduce bpf_map_lookup_elem_flags()
Introduce
int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
helper to lookup array/hash/cgroup_local_storage elements with BPF_F_LOCK flag.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-01 20:55:39 +01:00