linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-22 10:56:40 +00:00

Author	SHA1	Message	Date
Jakub Kicinski	4adba00839	nfp: devlink: report driver name and serial number Report the basic info through new devlink info API. RFCv2: - add driver name; - align serial to core changes. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:30:30 -08:00
Jakub Kicinski	785bd550c4	devlink: add generic info version names Add defines and docs for generic info versions. v3: - add docs; - separate patch (Jiri). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:30:30 -08:00
Jakub Kicinski	fc6fae7dd9	devlink: add version reporting to devlink info API ethtool -i has a few fixed-size fields which can be used to report firmware version and expansion ROM version. Unfortunately, modern hardware has more firmware components. There is usually some datapath microcode, management controller, PXE drivers, and a CPLD load. Running ethtool -i on modern controllers reveals the fact that vendors cram multiple values into firmware version field. Here are some examples from systems I could lay my hands on quickly: tg3: "FFV20.2.17 bc 5720-v1.39" i40e: "6.01 0x800034a4 1.1747.0" nfp: "0.0.3.5 0.25 sriov-2.1.16 nic" Add a new devlink API to allow retrieving multiple versions, and provide user-readable name for those versions. While at it break down the versions into three categories: - fixed - this is the board/fixed component version, usually vendors report information like the board version in the PCI VPD, but it will benefit from naming and common API as well; - running - this is the running firmware version; - stored - this is firmware in the flash, after firmware update this value will reflect the flashed version, while the running version may only be updated after reboot. v3: - add per-type helpers instead of using the special argument (Jiri). RFCv2: - remove the nesting in attr DEVLINK_ATTR_INFO_VERSIONS (now versions are mixed with other info attrs)l - have the driver report versions from the same callback as other info. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:30:30 -08:00
Jakub Kicinski	f9cf22882c	devlink: add device information API ethtool -i has served us well for a long time, but its showing its limitations more and more. The device information should also be reported per device not per-netdev. Lay foundation for a simple devlink-based way of reading device info. Add driver name and device serial number as initial pieces of information exposed via this new API. v3: - rename helpers (Jiri); - rename driver name attr (Jiri); - remove double spacing in commit message (Jiri). RFC v2: - wrap the skb into an opaque structure (Jiri); - allow the serial number of be any length (Jiri & Andrew); - add driver name (Jonathan). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:30:30 -08:00
David S. Miller	26281e2c83	Merge branch 'selftests-Various-fixes' Petr Machata says: ==================== selftests: Various fixes This patch set contains various fixes whose common denominator is improving quality of forwarding and mlxsw selftests. Most of the fixes are improvements in determinism (such that timing and latency don't impact the test performance). These were prompted by regular runs of the test suite on a hardware emulator, the performance of which is necessarily lower than that of the real device. Patches #1 (from Ido), #2 and #3 make changes to ping limits. Patches #4 and #5 add more sleep in places where things need more time to finish. Patches #6 and #7 fix two tests in the suite of mirror-to-gretap tests where underlay involves a VLAN device over an 802.1q bridge. Patches #8, #9 and #10 fix bugs in mirror-to-gretap test where underlay involves a LAG device. Patch #11 fixes a missed RET initialization in mirror-to-gretap flower test. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:37 -08:00
Petr Machata	084fafe9ef	selftests: forwarding: mirror_gre_flower: Fix test result handling The global variable RET needs to be initialized before each call to log_test. This test case sets it once before running the tests, but then calls log_tests for every individual test. Thus a failure in one of the tests causes spurious failures in follow-up tests as well. Fix by moving the initialization of RET from test_all() to full_test_span_gre_dir_acl(), a function that implements the test. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:37 -08:00
Petr Machata	2243cad9ff	selftests: forwarding: mirror_gre_bridge_1q_lag: Ignore ARP This test sets up mirroring such that it mirrors all overlay traffic. That includes ARP, which causes occasional miscounts and spurious failures. Ignore ARP explicitly to avoid these problems. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:37 -08:00
Petr Machata	ba22b65edc	selftests: forwarding: mirror_gre_bridge_1q_lag: Enable forwarding This test relies on routing in the primary traffic path, but neglects to enable forwarding. Do so. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	a99dd629e8	selftests: forwarding: mirror_gre_bridge_1q_lag: Flush neighbors After one LAG slave is downed and another upped, it takes a while for the neighbor on a bridge to time out and get renegotiated. The test does prompt update of FDB entries by arpinging. But because the neighbor still references another address, offloading is not possible, and some packets may end up not being mirrored. To force the neighbor renegotiation, simply flush the neighbor table at the bridge. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	ccdb66dd2f	selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix roaming test ARP or ND traffic can cause spurious migration of FDB back to $swp3. Mirroring is then updated in accordance with the change, and mirrored packets are seen at h3, causing a failure. Detect the case of this spurious roaming, and retry the test. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	35036b0b09	selftests: forwarding: mirror_gre_vlan_bridge_1q: Fix untagged test The untagged egress test sets up mirroring to {,ip6}gretap such that the underlay goes through a bridge. Then VLAN flags are manipulated to test that the traffic leaves the bridge 802.1q-tagged or not, as appropriate. However, when a neighbor expires at the time that the bridge VLAN is configured as PVID and egress untagged, the following discovery process can't finish, because the IP address on H3 is still at the VLAN-tagged netdevice. This manifests by occasional failures where only several of the 10 required packets get through. Therefore, when reconfiguring the VLAN flags, move the IP address to the appropriate device in the H3 VRF. In addition to that, take this opportunity to embed an ASCII art diagram to make the topology move obvious. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	db2c5bfcdf	selftests: forwarding: mirror_lib: Wait for tardy mirrored packets When running in an environment with poor performance (such as a simulator), processing mirrored packets can take a while. Evaluating the condition too soon leads to spurious "seen 9, expected 10" failures as the last packet doesn't have enough time to get mirrored and the mirror to arrive and bump the observed counters. Wait for one ping interval before evaluating the test. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	3dc178a9ef	selftests: forwarding: mirror_gre_changes: Fix TTL test When running in a simulator, the TTL change takes a while to settle and during this time the performance of the packet processing is lowered. The resulting instability leads to ping sending more packets as it assumes some have been dropped. This then leads to regular spurious failures as more packets than expected are observed. Sleep a bit to give the system time to stabilize. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	f3b05bb819	selftests: mlxsw: Update ping limits The current ping intervals are too short for running mirroring tests in simulator. This leads to ping sending a follow-up ping before the reply arrives, thus sending more than the requested 10 ICMP requests. This traffic is seen at the counters, and causes spurious failures. Bump interval and timeout numbers 5x in mirroring tests to address the spurious failures. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Petr Machata	0175cb5922	selftests: forwarding: mirror_lib: Update ping limits The current ping intervals are too short for running mirroring tests in simulator. This leads to ping sending a follow-up ping before the reply arrives, thus sending more than the requested 10 ICMP requests. Those are mirrored, and over a certain threshold the test case run is considered a failure, because too much traffic is observed. Bump interval and timeout numbers 5x in mirroring tests to address the spurious failures. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Ido Schimmel	b6a4fd6800	selftests: forwarding: Make ping timeout configurable The current timeout (2 seconds) proved to be too low for some (emulated) systems where we run the tests. Make the timeout configurable and default to 5 seconds. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:26:36 -08:00
Martin Kepplinger	3fc46fc9f6	ipconfig: add carrier_timeout kernel parameter commit `3fb72f1e6e` ("ipconfig wait for carrier") added a "wait for carrier" policy, with a fixed worst case maximum wait of two minutes. Now make the wait for carrier timeout configurable on the kernel commandline and use the 120s as the default. The timeout messages introduced with commit `5e404cd658` ("ipconfig: add informative timeout messages while waiting for carrier") are done in a fixed interval of 20 seconds, just like they were before (240/12). Signed-off-by: Martin Kepplinger <martin.kepplinger@ginzinger.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:24:13 -08:00
Gustavo A. R. Silva	1f533ba6d5	ipv4: fib: use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva	ee69804714	nfp: use struct_size() in kzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva	6541d02590	tulip: eeprom: use struct_size() in kmalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kmalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva	c49f0ce0b6	cxgb4: smt: use struct_size() in kvzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:12:29 -08:00
Gustavo A. R. Silva	3ebb18a48c	cxgb4: sched: use struct_size() in kvzalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:12:29 -08:00
Dave Watson	5b053e121f	net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS Currently we don't zerocopy if the crypto framework async bit is set. However some crypto algorithms (such as x86 AESNI) support async, but in the context of sendmsg, will never run asynchronously. Instead, check for actual EINPROGRESS return code before assuming algorithm is async. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:05:07 -08:00
David S. Miller	665cf634e6	Merge branch 'tls-1.3-support' Dave Watson says: ==================== net: tls: TLS 1.3 support This patchset adds 256bit keys and TLS1.3 support to the kernel TLS socket. TLS 1.3 is requested by passing TLS_1_3_VERSION in the setsockopt call, which changes the framing as required for TLS1.3. 256bit keys are requested by passing TLS_CIPHER_AES_GCM_256 in the sockopt. This is a fairly straightforward passthrough to the crypto framework. 256bit keys work with both TLS 1.2 and TLS 1.3 TLS 1.3 requires a different AAD layout, necessitating some minor refactoring. It also moves the message type byte to the encrypted portion of the message, instead of the cleartext header as it was in TLS1.2. This requires moving the control message handling to after decryption, but is otherwise similar. V1 -> V2 The first two patches were dropped, and sent separately, one as a bugfix to the net tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	8debd67e79	net: tls: Add tests for TLS 1.3 Change most tests to TLS 1.3, while adding tests for previous TLS 1.2 behavior. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	130b392c6c	net: tls: Add tls 1.3 support TLS 1.3 has minor changes from TLS 1.2 at the record layer. * Header now hardcodes the same version and application content type in the header. * The real content type is appended after the data, before encryption (or after decryption). * The IV is xored with the sequence number, instead of concatinating four bytes of IV with the explicit IV. * Zero-padding: No exlicit length is given, we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type. Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	fedf201e12	net: tls: Refactor control message handling on recv For TLS 1.3, the control message is encrypted. Handle control message checks after decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	a2ef9b6a22	net: tls: Refactor tls aad space size calculation TLS 1.3 has a different AAD size, use a variable in the code to make TLS 1.3 support easy. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	fb99bce712	net: tls: Support 256 bit keys Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Daniel Borkmann	473c5daa86	Merge branch 'bpf-xdp-sample-libbpf' Maciej Fijalkowski says: ==================== This patchset tries to address the situation where: * user loads a particular xdp sample application that does stats polling * user loads another sample application on the same interface * then, user sends SIGINT/SIGTERM to the app that was attached as a first one * second application ends up with an unloaded xdp program 1st patch contains a helper libbpf function for getting the map fd by a given map name. In patch 2 Jesper removes the read_trace_pipe usage from xdp_redirect_cpu which was a blocker for converting this sample to libbpf usage. 3rd patch updates a bunch of xdp samples to make the use of libbpf. Patch 4 adjusts RLIMIT_MEMLOCK for two samples touched in this patchset. In patch 5 extack messages are added for cases where dev_change_xdp_fd returns with an error so user has an idea what was the reason for not attaching the xdp program onto interface. Patch 6 makes the samples behavior similar to what iproute2 does when loading xdp prog - the "force" flag is introduced. Patch 7 introduces the libbpf function that will query the driver from userspace about the currently attached xdp prog id. Use it in samples that do polling by checking the prog id in signal handler and comparing it with previously stored one which is the scope of patch 8. Thanks! v1->v2: * add a libbpf helper for getting a prog via relative index * include xdp_redirect_cpu into conversion v2->v3: mostly addressing Daniel's/Jesper's comments * get rid of the helper from v1->v2 * feed the xdp_redirect_cpu with program name instead of number v3->v4: * fix help message in xdp_sample_pkts v4->v5: * in get_link_xdp_fd, assign prog_id only when libbpf_nl_get_link returned with 0 * add extack messages in dev_change_xdp_fd * check the return value of bpf_get_link_xdp_id when exiting from sample progs v5->v6: * rebase ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:53 +01:00
Maciej Fijalkowski	3b7a8ec2de	samples/bpf: Check the prog id before exiting Check the program id within the signal handler on polling xdp samples that were previously converted to libbpf usage. Avoid the situation of unloading the program that was not attached by sample that is exiting. Handle also the case where bpf_get_link_xdp_id didn't exit with an error but the xdp program was not found on an interface. Reported-by: Michal Papaj <michal.papaj@intel.com> Reported-by: Jakub Spizewski <jakub.spizewski@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Maciej Fijalkowski	50db9f0731	libbpf: Add a support for getting xdp prog id on ifindex Since we have a dedicated netlink attributes for xdp setup on a particular interface, it is now possible to retrieve the program id that is currently attached to the interface. The use case is targeted for sample xdp programs, which will store the program id just after loading bpf program onto iface. On shutdown, the sample will make sure that it can unload the program by querying again the iface and verifying that both program id's matches. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Maciej Fijalkowski	743e568c15	samples/bpf: Add a "force" flag to XDP samples Make xdp samples consistent with iproute2 behavior and set the XDP_FLAGS_UPDATE_IF_NOEXIST by default when setting the xdp program on interface. Provide an option for user to force the program loading, which as a result will not include the mentioned flag in bpf_set_link_xdp_fd call. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Maciej Fijalkowski	01dde20ce0	xdp: Provide extack messages when prog attachment failed In order to provide more meaningful messages to user when the process of loading xdp program onto network interface failed, let's add extack messages within dev_change_xdp_fd. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Maciej Fijalkowski	6a5457618f	samples/bpf: Extend RLIMIT_MEMLOCK for xdp_{sample_pkts, router_ipv4} There is a common problem with xdp samples that happens when user wants to run a particular sample and some bpf program is already loaded. The default 64kb RLIMIT_MEMLOCK resource limit will cause a following error (assuming that xdp sample that is failing was converted to libbpf usage): libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. libbpf: failed to load object './xdp_sample_pkts_kern.o' Fix it in xdp_sample_pkts and xdp_router_ipv4 by setting RLIMIT_MEMLOCK to RLIM_INFINITY. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Maciej Fijalkowski	bbaf6029c4	samples/bpf: Convert XDP samples to libbpf usage Some of XDP samples that are attaching the bpf program to the interface via libbpf's bpf_set_link_xdp_fd are still using the bpf_load.c for loading and manipulating the ebpf program and maps. Convert them to do this through libbpf usage and remove bpf_load from the picture. While at it remove what looks like debug leftover in xdp_redirect_map_user.c In xdp_redirect_cpu, change the way that the program to be loaded onto interface is chosen - user now needs to pass the program's section name instead of the relative number. In case of typo print out the section names to choose from. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Jesper Dangaard Brouer	7313798b14	samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe The sample xdp_redirect_cpu is not using helper bpf_trace_printk. Thus it makes no sense that the --debug option us reading from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe. Simply remove it. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:51 +01:00
Maciej Fijalkowski	f3cea32d56	libbpf: Add a helper for retrieving a map fd for a given name XDP samples are mostly cooperating with eBPF maps through their file descriptors. In case of a eBPF program that contains multiple maps it might be tiresome to iterate through them and call bpf_map__fd for each one. Add a helper mostly based on bpf_object__find_map_by_name, but instead of returning the struct bpf_map pointer, return map fd. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 23:37:50 +01:00
Sandipan Das	6f20c71d85	bpf: powerpc64: add JIT support for bpf line info This adds support for generating bpf line info for JITed programs. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 21:03:34 +01:00
Daniel Borkmann	2863debfbc	Merge branch 'bpf-spinlocks' Alexei Starovoitov says: ==================== Many algorithms need to read and modify several variables atomically. Until now it was hard to impossible to implement such algorithms in BPF. Hence introduce support for bpf_spin_lock. The api consists of 'struct bpf_spin_lock' that should be placed inside hash/array/cgroup_local_storage element and bpf_spin_lock/unlock() helper function. Example: struct hash_elem { int cnt; struct bpf_spin_lock lock; }; struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key); if (val) { bpf_spin_lock(&val->lock); val->cnt++; bpf_spin_unlock(&val->lock); } and BPF_F_LOCK flag for lookup/update bpf syscall commands that allows user space to read/write map elements under lock. Together these primitives allow race free access to map elements from bpf programs and from user space. Key restriction: root only. Key requirement: maps must be annotated with BTF. This concept was discussed at Linux Plumbers Conference 2018. Thank you everyone who participated and helped to iron out details of api and implementation. Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array. Patch 2: bpf_spin_lock in cgroup local storage. Patches 3,4,5: tests Patch 6: BPF_F_LOCK flag to lookup/update Patches 7,8,9: tests v6->v7: - fixed this_cpu->__this_cpu per Peter's suggestion and added Ack. - simplified bpf_spin_lock and load/store overlap check in the verifier as suggested by Andrii - rebase v5->v6: - adopted arch_spinlock approach suggested by Peter - switched to spin_lock_irqsave equivalent as the simplest way to avoid deadlocks in rare case of nested networking progs (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing the same map with bpf_spin_lock) bpf_spin_lock is only allowed in networking progs that don't have arbitrary entry points unlike tracing progs. - rebase and split test_verifier tests v4->v5: - disallow bpf_spin_lock for tracing progs due to insufficient preemption checks - socket filter progs cannot use bpf_spin_lock due to missing preempt_disable - fix atomic_set_release. Spotted by Peter. - fixed hash_of_maps v3->v4: - fix BPF_EXIST \| BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks! - rebase v2->v3: - fixed build on ia64 and archs where qspinlock is not supported - fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin v1->v2: - addressed several issues spotted by Daniel and Martin in patch 1 - added test11 to patch 4 as suggested by Daniel ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:41 +01:00
Alexei Starovoitov	ba72a7b4ba	selftests/bpf: test for BPF_F_LOCK Add C based test that runs 4 bpf programs in parallel that update the same hash and array maps. And another 2 threads that read from these two maps via lookup(key, value, BPF_F_LOCK) api to make sure the user space sees consistent value in both hash and array elements while user space races with kernel bpf progs. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	df5d22facd	libbpf: introduce bpf_map_lookup_elem_flags() Introduce int bpf_map_lookup_elem_flags(int fd, const void key, void value, __u64 flags) helper to lookup array/hash/cgroup_local_storage elements with BPF_F_LOCK flag. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	e44ac9a22b	tools/bpf: sync uapi/bpf.h add BPF_F_LOCK definition to tools/include/uapi/linux/bpf.h Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	96049f3afd	bpf: introduce BPF_F_LOCK flag Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands and for map_update() helper function. In all these cases take a lock of existing element (which was provided in BTF description) before copying (in or out) the rest of map value. Implementation details that are part of uapi: Array: The array map takes the element lock for lookup/update. Hash: hash map also takes the lock for lookup/update and tries to avoid the bucket lock. If old element exists it takes the element lock and updates the element in place. If element doesn't exist it allocates new one and inserts into hash table while holding the bucket lock. In rare case the hashmap has to take both the bucket lock and the element lock to update old value in place. Cgroup local storage: It is similar to array. update in place and lookup are done with lock taken. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	ab963beb9f	selftests/bpf: add bpf_spin_lock C test add bpf_spin_lock C based test that requires latest llvm with BTF support Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	b4d4556c32	selftests/bpf: add bpf_spin_lock verifier tests add bpf_spin_lock tests to test_verifier.c that don't require latest llvm with BTF support Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	7dac3ae42c	tools/bpf: sync include/uapi/linux/bpf.h sync bpf.h Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:39 +01:00
Alexei Starovoitov	e16d2f1ab9	bpf: add support for bpf_spin_lock to cgroup local storage Allow 'struct bpf_spin_lock' to reside inside cgroup local storage. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:38 +01:00
Alexei Starovoitov	d83525ca62	bpf: introduce bpf_spin_lock Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let bpf program serialize access to other variables. Example: struct hash_elem { int cnt; struct bpf_spin_lock lock; }; struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key); if (val) { bpf_spin_lock(&val->lock); val->cnt++; bpf_spin_unlock(&val->lock); } Restrictions and safety checks: - bpf_spin_lock is only allowed inside HASH and ARRAY maps. - BTF description of the map is mandatory for safety analysis. - bpf program can take one bpf_spin_lock at a time, since two or more can cause dead locks. - only one 'struct bpf_spin_lock' is allowed per map element. It drastically simplifies implementation yet allows bpf program to use any number of bpf_spin_locks. - when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed. - bpf program must bpf_spin_unlock() before return. - bpf program can access 'struct bpf_spin_lock' only via bpf_spin_lock()/bpf_spin_unlock() helpers. - load/store into 'struct bpf_spin_lock lock;' field is not allowed. - to use bpf_spin_lock() helper the BTF description of map value must be a struct and have 'struct bpf_spin_lock anyname;' field at the top level. Nested lock inside another struct is not allowed. - syscall map_lookup doesn't copy bpf_spin_lock field to user space. - syscall map_update and program map_update do not update bpf_spin_lock field. - bpf_spin_lock cannot be on the stack or inside networking packet. bpf_spin_lock can only be inside HASH or ARRAY map value. - bpf_spin_lock is available to root only and to all program types. - bpf_spin_lock is not allowed in inner maps of map-in-map. - ld_abs is not allowed inside spin_lock-ed region. - tracing progs and socket filter progs cannot use bpf_spin_lock due to insufficient preemption checks Implementation details: - cgroup-bpf class of programs can nest with xdp/tc programs. Hence bpf_spin_lock is equivalent to spin_lock_irqsave. Other solutions to avoid nested bpf_spin_lock are possible. Like making sure that all networking progs run with softirq disabled. spin_lock_irqsave is the simplest and doesn't add overhead to the programs that don't use it. - arch_spinlock_t is used when its implemented as queued_spin_lock - archs can force their own arch_spinlock_t - on architectures where queued_spin_lock is not available and sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used. - presence of bpf_spin_lock inside map value could have been indicated via extra flag during map_create, but specifying it via BTF is cleaner. It provides introspection for map key/value and reduces user mistakes. Next steps: - allow bpf_spin_lock in other map types (like cgroup local storage) - introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper to request kernel to grab bpf_spin_lock before rewriting the value. That will serialize access to map elements. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-02-01 20:55:38 +01:00
David S. Miller	d3a5fd3c98	This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Add DHCPACKs for DAT snooping, by Linus Luessing - Update copyright years for 2019, by Sven Eckelmann -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlxUKp8WHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoc1rD/oC39wZl/ZdubJogpGWI3G9SdU8 SxbjnaCVCkLMsLJH/wv/OM9OWwHa2OhvOzmMO5Zq47dy5SYXqBMkvv0QOMJcUQqz 1nPMssTVo9gQLrJiPiP+yXGZ5+sc4dsLCiykFsCs7YRhqeStL1QTZkAOEKfy0K+7 6G5OQsLT62aY+1OWLbb5oJaB9nlhUY+GyuQZ213jNuYxP7I6MyM9FfMHokASMLn8 H58fIvDyGkC0PXvYZiJedlnFBU92TPEnBAV/afJ8egcmYQw9jkWL3cbS5ZzqDG4m 49p9/Xmt2ARsf4UMDxQTEE3elw3tu1PZGPSecTmU+rRzHyYHIIYWnFgTZWmK7/zU TKQMlrPx4ky8HOyIY6/5AHNR7x5muchgxf0ft+4Jf0Bf+rGFIgdqfAIeQliUNAUc IW+HC0c1SEU/519a6z1V/ARrC6W4qk8aBZ0G4zyx+76KLvxlyvgjEo3XNasyNJnY GpHHhpyIeY5xeNOsGmoVrQraJRMqwr4jnWdcmf1LS8o9loB6X7bij8SRKUsEwNi2 AkIK2sojUf6c4YZW/GVqIgvPuGtZL/Sy9FHx7Ve5f1NRuxpcMStSVV+daYqHYEe6 72/WYF+oJc4fCR3zAXc67wVoyuPVPxOKpwh2uJUeVbvdvHZ0+dpOV740ktdFjIWQ 3XAGhl/dSrn7OXYJqA== =G6IW -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20190201' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - Add DHCPACKs for DAT snooping, by Linus Luessing - Update copyright years for 2019, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 11:04:13 -08:00

1 2 3 4 5 ...

812147 Commits