linux

Author	SHA1	Message	Date
Guillaume Nault	41fdfffd57	selftests: forwarding: Add MPLS L2VPN test Connect hosts H1 and H2 using two intermediate encapsulation routers (LER1 and LER2). These routers encapsulate traffic from the hosts, including the original Ethernet header, into MPLS. Use ping to test reachability between H1 and H2. Signed-off-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/625f5c1aafa3a8085f8d3e082d680a82e16ffbaa.1606918980.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 17:44:06 -08:00
Tom Rix	0911d463b3	net: bna: remove trailing semicolon in macro definition The macro use will already have a semicolon. Clean up escaped newlines. Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20201202163622.3733506-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 17:41:49 -08:00
Hoang Le	43fcd906d9	tipc: support 128bit node identity for peer removing We add the support to remove a specific node down with 128bit node identifier, as an alternative to legacy 32-bit node address. example: $tipc peer remove identiy <1001002\|16777777> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Link: https://lore.kernel.org/r/20201203035045.4564-1-hoang.h.le@dektech.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 17:40:27 -08:00
Simon Horman	7f356166ae	nfp: Replace zero-length array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members"[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Link: https://lore.kernel.org/r/20201204125601.24876-1-simon.horman@netronome.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 16:00:20 -08:00
Bongsu Jeon	4fb7b98c7b	nfc: s3fwrn5: skip the NFC bootloader mode If there isn't a proper NFC firmware image, Bootloader mode will be skipped. Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20201203225257.2446-1-bongsu.jeon@samsung.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 15:30:47 -08:00
Jakub Kicinski	43be3a3c65	Merge branch 'perf-optimizations-for-tcp-recv-zerocopy' Arjun Roy says: ==================== Perf. optimizations for TCP Recv. Zerocopy This patchset contains several optimizations for TCP Recv. Zerocopy. Summarized: 1. It is possible that a read payload is not exactly page aligned - that there may exist "straggler" bytes that we cannot map into the caller's address space cleanly. For this, we allow the caller to provide as argument a "hybrid copy buffer", turning getsockopt(TCP_ZEROCOPY_RECEIVE) into a "hybrid" operation that allows the caller to avoid a subsequent recvmsg() call to read the stragglers. 2. Similarly, for "small" read payloads that are either below the size of a page, or small enough that remapping pages is not a performance win - we allow the user to short-circuit the remapping operations entirely and simply copy into the buffer provided. Some of the patches in the middle of this set are refactors to support this "short-circuiting" optimization. 3. We allow the user to provide a hint that performing a page zap operation (and the accompanying TLB shootdown) may not be necessary, for the provided region that the kernel will attempt to map pages into. This allows us to avoid this expensive operation while holding the socket lock, which provides a significant performance advantage. With all of these changes combined, "medium" sized receive traffic (multiple tens to few hundreds of KB) see significant efficiency gains when using TCP receive zerocopy instead of regular recvmsg(). For example, with RPC-style traffic with 32KB messages, there is a roughly 15% efficiency improvement when using zerocopy. Without these changes, there is a roughly 60-70% efficiency reduction with such messages when employing zerocopy. ==================== Link: https://lore.kernel.org/r/20201202225349.935284-1-arjunroy.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:55 -08:00
Arjun Roy	94ab9eb9b2	net-zerocopy: Defer vm zap unless actually needed. Zapping pages is required only if we are calling vm_insert_page into a region where pages had previously been mapped. Receive zerocopy allows reusing such regions, and hitherto called zap_page_range() before calling vm_insert_page() in that range. zap_page_range() can also be triggered from userspace with madvise(MADV_DONTNEED). If userspace is configured to call this before reusing a segment, or if there was nothing mapped at this virtual address to begin with, we can avoid calling zap_page_range() under the socket lock. That said, if userspace does not do that, then we are still responsible for calling zap_page_range(). This patch adds a flag that the user can use to hint to the kernel that a zap is not required. If the flag is not set, or if an older user application does not have a flags field at all, then the kernel calls zap_page_range as before. Also, if the flag is set but a zap is still required, the kernel performs that zap as necessary. Thus incorrectly indicating that a zap can be avoided does not change the correctness of operation. It also increases the batchsize for vm_insert_pages and prefetches the page struct for the batch since we're about to bump the refcount. An alternative mechanism could be to not have a flag, assume by default a zap is not needed, and fall back to zapping if needed. However, this would harm performance for older applications for which a zap is necessary, and thus we implement it with an explicit flag so newer applications can opt in. When using RPC-style traffic with medium sized (tens of KB) RPCs, this change yields an efficency improvement of about 30% for QPS/CPU usage. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	0c3936d32f	net-zerocopy: Set zerocopy hint when data is copied Set zerocopy hint, event when falling back to copy, so that the pending data can be efficiently received using zerocopy when possible. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	f21a3c4803	net-zerocopy: Introduce short-circuit small reads. Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, or inq is generally small enough that it is cheaper to copy rather than remap pages. In these cases, we may want to either return early (inq=0) or attempt to use the provided copy buffer to simply copy the received data. This allows us to save both system call overhead and the latency of acquiring mmap_sem in read mode for cases where it would be useless to do so. This patchset enables this behaviour by: 1. Returning quickly if inq is 0. 2. Attempting to perform a regular copy if a hybrid copybuffer is provided and it is large enough to absorb all available bytes. 3. Return quickly if no such buffer was provided and there are less than PAGE_SIZE bytes available. For small RPC ping-pong workloads, normally we would have 1 getsockopt(), 1 recvmsg() and 1 sendmsg() call per RPC. With this change, we remove the recvmsg() call entirely, reducing the syscall overhead by about 33%. In testing with small (hundreds of bytes) RPC traffic, this yields a syscall reduction of about 33% and an efficiency gain of about 3-5% when defined as QPS/CPU Util. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	936ced4157	net-zerocopy: Fast return if inq < PAGE_SIZE Sometimes, we may call tcp receive zerocopy when inq is 0, or inq < PAGE_SIZE, in which case we cannot remap pages. In this case, simply return the appropriate hint for regular copying without taking mmap_sem. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:53 -08:00
Arjun Roy	98917cf0d6	net-zerocopy: Refactor frag-is-remappable test. Refactor frag-is-remappable test for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Arjun Roy	7fba5309ef	net-zerocopy: Refactor skb frag fast-forward op. Refactor skb frag fast-forwarding for tcp receive zerocopy. This is part of a patch set that introduces short-circuited hybrid copies for small receive operations, which results in roughly 33% fewer syscalls for small RPC scenarios. skb_advance_to_frag(), given a skb and an offset into the skb, iterates from the first frag for the skb until we're at the frag specified by the offset. Assuming the offset provided refers to how many bytes in the skb are already read, the returned frag points to the next frag we may read from, while offset_frag is set to the number of bytes from this frag that we have already read. If frag is not null and offset_frag is equal to 0, then we may be able to map this frag's page into the process address space with vm_insert_page(). However, if offset_frag is not equal to 0, then we cannot do so. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Arjun Roy	2cd8116184	net-tcp: Introduce tcp_recvmsg_locked(). Refactor tcp_recvmsg() by splitting it into locked and unlocked portions. Callers already holding the socket lock and not using ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked(). This is in preparation for a short-circuit copy performed by TCP receive zerocopy for small (< PAGE_SIZE, or otherwise requested by the user) reads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Arjun Roy	18fb76ed53	net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy. When TCP receive zerocopy does not successfully map the entire requested space, it outputs a 'hint' that the caller should recvmsg(). Augment zerocopy to accept a user buffer that it tries to copy this hint into - if it is possible to copy the entire hint, it will do so. This elides a recvmsg() call for received traffic that isn't exactly page-aligned in size. This was tested with RPC-style traffic of arbitrary sizes. Normally, each received message required at least one getsockopt() call, and one recvmsg() call for the remaining unaligned data. With this change, almost all of the recvmsg() calls are eliminated, leading to a savings of about 25%-50% in number of system calls for RPC-style workloads. Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:40:52 -08:00
Jakub Kicinski	4be986c824	Merge branch 'seg6-add-support-for-srv6-end-dt4-dt6-behavior' Andrea Mayer says: ==================== seg6: add support for SRv6 End.DT4/DT6 behavior This patchset provides support for the SRv6 End.DT4 and End.DT6 (VRF mode) behaviors. The SRv6 End.DT4 behavior is used to implement multi-tenant IPv4 L3 VPNs. It decapsulates the received packets and performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. The SRv6 End.DT4 behavior is defined in the SRv6 Network Programming [1]. The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior which allows us to set up IPv6 L3 VPNs over SRv6 networks. This new implementation of DT6 is based on the same VRF infrastructure already exploited for implementing the SRv6 End.DT4 behavior. The aim of the new SRv6 End.DT6 in VRF mode consists in simplifying the construction of IPv6 L3 VPN services in the multi-tenant environment. Currently, the two SRv6 End.DT6 implementations (legacy and VRF mode) coexist seamlessly and can be chosen according to the context and the user preferences. - Patch 1 is needed to solve a pre-existing issue with tunneled packets when a sniffer is attached; - Patch 2 improves the management of the seg6local attributes used by the SRv6 behaviors; - Patch 3 adds support for optional attributes in SRv6 behaviors; - Patch 4 introduces two callbacks used for customizing the creation/destruction of a SRv6 behavior; - Patch 5 is the core patch that adds support for the SRv6 End.DT4 behavior; - Patch 6 introduces the VRF support for SRv6 End.DT6 behavior; - Patch 7 adds the selftest for SRv6 End.DT4 behavior; - Patch 8 adds the selftest for SRv6 End.DT6 (VRF mode) behavior. Regarding iproute2, the support for the new "vrftable" attribute, required by both SRv6 End.DT4 and End.DT6 (VRF mode) behaviors, is provided in a different patchset that will follow shortly. I would like to thank David Ahern for his support during the development of this patchset. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming ==================== Link: https://lore.kernel.org/r/20201202130517.4967-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:53 -08:00
Andrea Mayer	2bc035538e	selftests: add selftest for the SRv6 End.DT6 (VRF) behavior this selftest is designed for evaluating the new SRv6 End.DT6 (VRF) behavior used, in this example, for implementing IPv6 L3 VPN use cases. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:51 -08:00
Andrea Mayer	2195444e09	selftests: add selftest for the SRv6 End.DT4 behavior this selftest is designed for evaluating the new SRv6 End.DT4 behavior used, in this example, for implementing IPv4 L3 VPN use cases. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:51 -08:00
Andrea Mayer	20a081b798	seg6: add VRF support for SRv6 End.DT6 behavior SRv6 End.DT6 is defined in the SRv6 Network Programming [1]. The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior which permits IPv6 L3 VPNs over SRv6 networks. This implementation is not particularly suitable in contexts where we need to deploy IPv6 L3 VPNs among different tenants which share the same network address schemes. The underlying problem lies in the fact that the current version of DT6 (called legacy DT6 from now on) needs a complex configuration to be applied on routers which requires ad-hoc routes and routing policy rules to ensure the correct isolation of tenants. Consequently, a new implementation of DT6 has been introduced with the aim of simplifying the construction of IPv6 L3 VPN services in the multi-tenant environment using SRv6 networks. To accomplish this task, we reused the same VRF infrastructure and SRv6 core components already exploited for implementing the SRv6 End.DT4 behavior. Currently the two End.DT6 implementations coexist seamlessly and can be used depending on the context and the user preferences. So, in order to support both versions of DT6 a new attribute (vrftable) has been introduced which allows us to differentiate the implementation of the behavior to be used. A SRv6 End.DT6 legacy behavior is still instantiated using a command like the following one: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 table 100 dev eth0 While to instantiate the SRv6 End.DT6 in VRF mode, the command is still pretty straight forward: $ ip -6 route add 2001:db8::1 encap seg6local action End.DT6 vrftable 100 dev eth0. Obviously as in the case of SRv6 End.DT4, the VRF strict_mode parameter must be set (net.vrf.strict_mode=1) and the VRF associated with table 100 must exist. Please note that the instances of SRv6 End.DT6 legacy and End.DT6 VRF mode can coexist in the same system/configuration without problems. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:51 -08:00
Andrea Mayer	664d6f8686	seg6: add support for the SRv6 End.DT4 behavior SRv6 End.DT4 is defined in the SRv6 Network Programming [1]. The SRv6 End.DT4 is used to implement IPv4 L3VPN use-cases in multi-tenants environments. It decapsulates the received packets and it performs IPv4 routing lookup in the routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device in order to force the routing lookup into the associated routing table. To make the End.DT4 work properly, it must be guaranteed that the routing table used for routing lookup operations is bound to one and only one VRF during the tunnel creation. Such constraint has to be enforced by enabling the VRF strict_mode sysctl parameter, i.e: $ sysctl -wq net.vrf.strict_mode=1. At JANOG44, LINE corporation presented their multi-tenant DC architecture using SRv6 [2]. In the slides, they reported that the Linux kernel is missing the support of SRv6 End.DT4 behavior. The SRv6 End.DT4 behavior can be instantiated using a command similar to the following: $ ip route add 2001:db8::1 encap seg6local action End.DT4 vrftable 100 dev eth0 We introduce the "vrftable" extension in iproute2 in a following patch. [1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming [2] https://speakerdeck.com/line_developers/line-data-center-networking-with-srv6 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	cfdf64a034	seg6: add callbacks for customizing the creation/destruction of a behavior We introduce two callbacks used for customizing the creation/destruction of a SRv6 behavior. Such callbacks are defined in the new struct seg6_local_lwtunnel_ops and hereafter we provide a brief description of them: - build_state(...): used for calling the custom constructor of the behavior during its initialization phase and after all the attributes have been parsed successfully; - destroy_state(...): used for calling the custom destructor of the behavior before it is completely destroyed. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	0a3021f1d4	seg6: add support for optional attributes in SRv6 behaviors Before this patch, each SRv6 behavior specifies a set of required attributes that must be provided by the userspace application when such behavior is going to be instantiated. If at least one of the required attributes is not provided, the creation of the behavior fails. The SRv6 behavior framework lacks a way to manage optional attributes. By definition, an optional attribute for a SRv6 behavior consists of an attribute which may or may not be provided by the userspace. Therefore, if an optional attribute is missing (and thus not supplied by the user) the creation of the behavior goes ahead without any issue. This patch explicitly differentiates the required attributes from the optional attributes. In particular, each behavior can declare a set of required attributes and a set of optional ones. The semantic of the required attributes remains totally unaffected by this patch. The introduction of the optional attributes does NOT impact on the backward compatibility of the existing SRv6 behaviors. It is essential to note that if an (optional or required) attribute is supplied to a SRv6 behavior which does not expect it, the behavior simply discards such attribute without generating any error or warning. This operating mode remained unchanged both before and after the introduction of the optional attributes extension. The optional attributes are one of the key components used to implement the SRv6 End.DT6 behavior based on the Virtual Routing and Forwarding (VRF) framework. The optional attributes make possible the coexistence of the already existing SRv6 End.DT6 implementation with the new SRv6 End.DT6 VRF-based implementation without breaking any backward compatibility. Further details on the SRv6 End.DT6 behavior (VRF mode) are reported in subsequent patches. From the userspace point of view, the support for optional attributes DO NOT require any changes to the userspace applications, i.e: iproute2 unless new attributes (required or optional) are needed. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	964adce526	seg6: improve management of behavior attributes Depending on the attribute (i.e.: SEG6_LOCAL_SRH, SEG6_LOCAL_TABLE, etc), the parse() callback performs some validity checks on the provided input and updates the tunnel state (slwt) with the result of the parsing operation. However, an attribute may also need to reserve some additional resources (i.e.: memory or setting up an eBPF program) in the parse() callback to complete the parsing operation. The parse() callbacks are invoked by the parse_nla_action() for each attribute belonging to a specific behavior. Given a behavior with N attributes, if the parsing of the i-th attribute fails, the parse_nla_action() returns immediately with an error. Nonetheless, the resources acquired during the parsing of the i-1 attributes are not freed by the parse_nla_action(). Attributes which acquire resources must release them in an explicit way in both the seg6_local_{build/destroy}_state(). However, adding a new attribute of this type requires changes to seg6_local_{build/destroy}_state() to release the resources correctly. The seg6local infrastructure still lacks a simple and structured way to release the resources acquired in the parse() operations. We introduced a new callback in the struct seg6_action_param named destroy(). This callback releases any resource which may have been acquired in the parse() counterpart. Each attribute may or may not implement the destroy() callback depending on whether it needs to free some acquired resources. The destroy() callback comes with several of advantages: 1) we can have many attributes as we want for a given behavior with no need to explicitly free the taken resources; 2) As in case of the seg6_local_build_state(), the seg6_local_destroy_state() does not need to handle the release of resources directly. Indeed, it calls the destroy_attrs() function which is in charge of calling the destroy() callback for every set attribute. We do not need to patch seg6_local_{build/destroy}_state() anymore as we add new attributes; 3) the code is more readable and better structured. Indeed, all the information needed to handle a given attribute are contained in only one place; 4) it facilitates the integration with new features introduced in further patches. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Andrea Mayer	0489390882	vrf: add mac header for tunneled packets when sniffer is attached Before this patch, a sniffer attached to a VRF used as the receiving interface of L3 tunneled packets detects them as malformed packets and it complains about that (i.e.: tcpdump shows bogus packets). The reason is that a tunneled L3 packet does not carry any L2 information and when the VRF is set as the receiving interface of a decapsulated L3 packet, no mac header is currently set or valid. Therefore, the purpose of this patch consists of adding a MAC header to any packet which is directly received on the VRF interface ONLY IF: i) a sniffer is attached on the VRF and ii) the mac header is not set. In this case, the mac address of the VRF is copied in both the destination and the source address of the ethernet header. The protocol type is set either to IPv4 or IPv6, depending on which L3 packet is received. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 13:30:50 -08:00
Jakub Kicinski	846c3c9cfe	wireless-drivers-next patches for v5.11 First set of patches for v5.11. rtw88 getting improvements to work better with Bluetooth and other driver also getting some new features. mhi-ath11k-immutable branch was pulled from mhi tree to avoid conflicts with mhi tree. Major changes: rtw88 * major bluetooth co-existance improvements wilc1000 * Wi-Fi Multimedia (WMM) support ath11k * Fast Initial Link Setup (FILS) discovery and unsolicited broadcast probe response support * qcom,ath11k-calibration-variant Device Tree setting * cold boot calibration support * new DFS region: JP wnc36xx * enable connection monitoring and keepalive in firmware ath10k * firmware IRAM recovery feature mhi * merge mhi-ath11k-immutable branch to make MHI API change go smoothly -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJfyTQyAAoJEG4XJFUm622bCdcIAIyVnqdW7pnoDmWIyQmAEnD9 vGARkzghPHXnufpOzohyDdxT12X9klhrxSVIgzEgH1/pl3i1PpnF6KXyGFCC44Lw wrLXhQygPzmIW1IZtJJE3G72WExXoRjWx6LD1I7C7oEIduqFixXADmK2tKzFp795 Jxum+sOeT6+Dk1OvO/fIroBHX73mRE9zAuiTIMpt2G1j8uXs9QVfcTbTrUshLASN 0sX9J6JutltBuM4G7+bFpVzKnLnlQ7ebUaF6nvTCQsgHWZwkS7yAubSWX9sFohbR UXgQHNE83s/esOg7nBxAfqTKP8mbxsobmxZtxE5GR5vFY5FJDxqP9Zc2KzPp39w= =CbX/ -----END PGP SIGNATURE----- Merge tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.11 First set of patches for v5.11. rtw88 getting improvements to work better with Bluetooth and other driver also getting some new features. mhi-ath11k-immutable branch was pulled from mhi tree to avoid conflicts with mhi tree. Major changes: rtw88 * major bluetooth co-existance improvements wilc1000 * Wi-Fi Multimedia (WMM) support ath11k * Fast Initial Link Setup (FILS) discovery and unsolicited broadcast probe response support * qcom,ath11k-calibration-variant Device Tree setting * cold boot calibration support * new DFS region: JP wnc36xx * enable connection monitoring and keepalive in firmware ath10k * firmware IRAM recovery feature mhi * merge mhi-ath11k-immutable branch to make MHI API change go smoothly * tag 'wireless-drivers-next-2020-12-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next: (180 commits) wl1251: remove trailing semicolon in macro definition airo: remove trailing semicolon in macro definition wilc1000: added queue support for WMM wilc1000: call complete() for failure in wilc_wlan_txq_add_cfg_pkt() wilc1000: free resource in wilc_wlan_txq_add_mgmt_pkt() for failure path wilc1000: free resource in wilc_wlan_txq_add_net_pkt() for failure path wilc1000: added 'ndo_set_mac_address' callback support brcmfmac: expose firmware config files through modinfo wlcore: Switch to using the new API kobj_to_dev() rtw88: coex: add feature to enhance HID coexistence performance rtw88: coex: upgrade coexistence A2DP mechanism rtw88: coex: add action for coexistence in hardware initial rtw88: coex: add function to avoid cck lock rtw88: coex: change the coexistence mechanism for WLAN connected rtw88: coex: change the coexistence mechanism for HID rtw88: coex: update AFH information while in free-run mode rtw88: coex: update the mechanism for A2DP + PAN rtw88: coex: add debug message rtw88: coex: run coexistence when WLAN entering/leaving LPS Revert "rtl8xxxu: Add Buffalo WI-U3-866D to list of supported devices" ... ==================== Link: https://lore.kernel.org/r/20201203185732.9CFA5C433ED@smtp.codeaurora.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 10:56:37 -08:00
Anders Roxell	fdd8b8249e	dpaa_eth: fix build errorr in dpaa_fq_init When building FSL_DPAA_ETH the following build error shows up: /tmp/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c: In function ‘dpaa_fq_init’: /tmp/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:1135:9: error: too few arguments to function ‘xdp_rxq_info_reg’ 1135 \| err = xdp_rxq_info_reg(&dpaa_fq->xdp_rxq, dpaa_fq->net_dev, \| ^~~~~~~~~~~~~~~~ Commit `b02e5a0ebb` ("xsk: Propagate napi_id to XDP socket Rx path") added an extra argument to function xdp_rxq_info_reg and commit `d57e57d0cd` ("dpaa_eth: add XDP_TX support") didn't know about that extra argument. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/r/20201203144343.790719-1-anders.roxell@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 10:23:02 -08:00
Jakub Kicinski	a1dd1d8697	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 07:48:12 -08:00
Andrii Nakryiko	eceae70bde	selftests/bpf: Fix invalid use of strncat in test_sockmap strncat()'s third argument is how many bytes will be added in addition to already existing bytes in destination. Plus extra zero byte will be added after that. So existing use in test_sockmap has many opportunities to overflow the string and cause memory corruptions. And in this case, GCC complains for a good reason. Fixes: `16962b2404` ("bpf: sockmap, add selftests") Fixes: `73563aa3d9` ("selftests/bpf: test_sockmap, print additional test options") Fixes: `1ade9abadf` ("bpf: test_sockmap, add options for msg_pop_data() helper") Fixes: `463bac5f1c` ("bpf, selftests: Add test for ktls with skb bpf ingress policy") Fixes: `e9dd904708` ("bpf: add tls support for testing in test_sockmap") Fixes: `753fb2ee09` ("bpf: sockmap, add msg_peek tests to test_sockmap") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203235440.2302137-2-andrii@kernel.org	2020-12-03 18:07:05 -08:00
Andrii Nakryiko	3015b500ae	libbpf: Use memcpy instead of strncpy to please GCC Some versions of GCC are really nit-picky about strncpy() use. Use memcpy(), as they are pretty much equivalent for the case of fixed length strings. Fixes: `e459f49b43` ("libbpf: Separate XDP program load with xsk socket creation") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203235440.2302137-1-andrii@kernel.org	2020-12-03 18:07:05 -08:00
Alexei Starovoitov	8158c5fd61	Merge branch 'Support BTF-powered BPF tracing programs for kernel modules' Andrii Nakryiko says: ==================== This patch sets extends kernel and libbpf with support for attaching BTF-powered raw tracepoint (tp_btf) and tracing (fentry/fexit/fmod_ret/lsm) BPF programs to BPF hooks defined in kernel modules. As part of that, libbpf now supports performing CO-RE relocations against types in kernel module BTFs, in addition to existing vmlinux BTF support. Kernel UAPI for BPF_PROG_LOAD now allows to specify kernel module (or vmlinux) BTF object FD in attach_btf_obj_fd field, aliased to attach_prog_fd. This is used to identify which BTF object needs to be used for finding BTF type by provided attach_btf_id. This patch set also sets up a convenient and fully-controlled custom kernel module (called "bpf_testmod"), that is a predictable playground for all the BPF selftests, that rely on module BTFs. Currently pahole doesn't generate BTF_KIND_FUNC info for ftrace-able static functions in kernel modules, so expose traced function in bpf_sidecar.ko. Once pahole is enhanced, we can go back to static function. From end user perspective there are no extra actions that need to happen. Libbpf will continue searching across all kernel module BTFs, if desired attach BTF type is not found in vmlinux. That way it doesn't matter if BPF hook that user is trying to attach to is built into vmlinux image or is loaded in kernel module. v5->v6: - move btf_put() back to syscall.c (kernel test robot); - added close(fd) in patch #5 (John); v4->v5: - use FD to specify BTF object (Alexei); - move prog->aux->attach_btf putting into bpf_prog_free() for consistency with putting prog->aux->dst_prog; - fix BTF FD leak(s) in libbpf; v3->v4: - merge together patch sets [0] and [1]; - avoid increasing bpf_reg_state by reordering fields (Alexei); - preserve btf_data_size in struct module; v2->v3: - fix subtle uninitialized variable use in BTF ID iteration code; v1->v2: - module_put() inside preempt_disable() region (Alexei); - bpf_sidecar -> bpf_testmod rename (Alexei); - test_progs more relaxed handling of bpf_testmod; - test_progs marks skipped sub-tests properly as SKIP now. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=393677&state=* [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=393679&state=* ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-12-03 17:38:42 -08:00
Andrii Nakryiko	1e38abefcf	selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module Add new selftest checking attachment of fentry/fexit/fmod_ret (and raw tracepoint ones for completeness) BPF programs to kernel module function. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-15-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	bc9ed69c79	selftests/bpf: Add tp_btf CO-RE reloc test for modules Add another CO-RE relocation test for kernel module relocations. This time for tp_btf with direct memory reads. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-14-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	91abb4a6d7	libbpf: Support attachment of BPF tracing programs to kernel modules Teach libbpf to search for BTF types in kernel modules for tracing BPF programs. This allows attachment of raw_tp/fentry/fexit/fmod_ret/etc BPF program types to tracepoints and functions in kernel modules. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-13-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	6aef10a481	libbpf: Factor out low-level BPF program loading helper Refactor low-level API for BPF program loading to not rely on public API types. This allows painless extension without constant efforts to cleverly not break backwards compatibility. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-12-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	290248a5b7	bpf: Allow to specify kernel module BTFs when attaching BPF programs Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-11-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	22dc4a0f5e	bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF type ID, rather than BTF object's ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-10-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	6bcd39d366	selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF Add a self-tests validating libbpf is able to perform CO-RE relocations against the type defined in kernel module BTF. if bpf_testmod.o is not supported by the kernel (e.g., due to version mismatch), skip tests, instead of failing. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-9-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	5ed31472b9	selftests/bpf: Add support for marking sub-tests as skipped Previously skipped sub-tests would be counted as passing with ":OK" appened in the log. Change that to be accounted as ":SKIP". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-8-andrii@kernel.org	2020-12-03 17:38:21 -08:00
Andrii Nakryiko	9f7fa22589	selftests/bpf: Add bpf_testmod kernel module for testing Add bpf_testmod module, which is conceptually out-of-tree module and provides ways for selftests/bpf to test various kernel module-related functionality: raw tracepoint, fentry/fexit/fmod_ret, etc. This module will be auto-loaded by test_progs test runner and expected by some of selftests to be present and loaded. Pahole currently isn't able to generate BTF for static functions in kernel modules, so make sure traced function is global. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201203204634.1325171-7-andrii@kernel.org	2020-12-03 17:38:20 -08:00
Andrii Nakryiko	4f33a53d56	libbpf: Add kernel module BTF support for CO-RE relocations Teach libbpf to search for candidate types for CO-RE relocations across kernel modules BTFs, in addition to vmlinux BTF. If at least one candidate type is found in vmlinux BTF, kernel module BTFs are not iterated. If vmlinux BTF has no matching candidates, then find all kernel module BTFs and search for all matching candidates across all of them. Kernel's support for module BTFs are inferred from the support for BTF name pointer in BPF UAPI. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-6-andrii@kernel.org	2020-12-03 17:38:20 -08:00
Andrii Nakryiko	0f7515ca7c	libbpf: Refactor CO-RE relocs to not assume a single BTF object Refactor CO-RE relocation candidate search to not expect a single BTF, rather return all candidate types with their corresponding BTF objects. This will allow to extend CO-RE relocations to accommodate kernel module BTFs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201203204634.1325171-5-andrii@kernel.org	2020-12-03 17:38:20 -08:00
Andrii Nakryiko	a19f93cfaf	libbpf: Add internal helper to load BTF data by FD Add a btf_get_from_fd() helper, which constructs struct btf from in-kernel BTF data by FD. This is used for loading module BTFs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-4-andrii@kernel.org	2020-12-03 17:38:20 -08:00
Andrii Nakryiko	2fe8890848	bpf: Keep module's btf_data_size intact after load Having real btf_data_size stored in struct module is benefitial to quickly determine which kernel modules have associated BTF object and which don't. There is no harm in keeping this info, as opposed to keeping invalid pointer. Fixes: `607c543f93` ("bpf: Sanitize BTF data pointer after module is loaded") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-3-andrii@kernel.org	2020-12-03 17:38:20 -08:00
Andrii Nakryiko	12cc126df8	bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() __module_address() needs to be called with preemption disabled or with module_mutex taken. preempt_disable() is enough for read-only uses, which is what this fix does. Also, module_put() does internal check for NULL, so drop it as well. Fixes: `a38d1107f9` ("bpf: support raw tracepoints in modules") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201203204634.1325171-2-andrii@kernel.org	2020-12-03 17:38:20 -08:00
Alexei Starovoitov	cadd64807c	Merge branch 'Add support to set window_clamp from bpf setsockops' Prankur gupta says: ==================== This patch contains support to set tcp window_field field from bpf setsockops. v2: Used TCP_WINDOW_CLAMP setsockopt logic for bpf_setsockopt (review comment addressed) v3: Created a common function for duplicated code (review comment addressed) v4: Removing logic to pass struct sock and struct tcp_sock together (review comment addressed) ==================== Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-12-03 17:25:24 -08:00
Prankur gupta	55144f31f0	selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP Adding selftests for new added functionality to set TCP_WINDOW_CLAMP from bpf setsockopt. Signed-off-by: Prankur gupta <prankgup@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202213152.435886-3-prankgup@fb.com	2020-12-03 17:23:24 -08:00
Prankur gupta	cb81110997	bpf: Adds support for setting window clamp Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_WINDOW_CLAMP, which sets the maximum receiver window size. It will be useful for limiting receiver window based on RTT. Signed-off-by: Prankur gupta <prankgup@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201202213152.435886-2-prankgup@fb.com	2020-12-03 17:23:24 -08:00
Jakub Kicinski	55fd59b003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 15:44:09 -08:00
Linus Torvalds	bbe2ba04c5	Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can. Current release - regressions: - mt76: usb: fix crash on device removal Current release - always broken: - xsk: Fix umem cleanup from wrong context in socket destruct Previous release - regressions: - net: ip6_gre: set dev->hard_header_len when using header_ops - ipv4: Fix TOS mask in inet_rtm_getroute() - net, xsk: Avoid taking multiple skbuff references Previous release - always broken: - net/x25: prevent a couple of overflows - netfilter: ipset: prevent uninit-value in hash_ip6_add - geneve: pull IP header before ECN decapsulation - mpls: ensure LSE is pullable in TC and openvswitch paths - vxlan: respect needed_headroom of lower device - batman-adv: Consider fragmentation for needed packet headroom - can: drivers: don't count arbitration loss as an error - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal - inet_ecn: Fix endianness of checksum update when setting ECT(1) - ibmvnic: fix various corner cases around reset handling - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering - net/mlx5: Enforce HW TX csum offload with kTLS Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl/JS3sACgkQMUZtbf5S Irs7QA/9ELcJ2gklCJwrlVGXNhUddGpZH9OX2K3WL/c1ZzgARt3e0jkO88lY25Tk tXTRTelx7xzHUNBmXJhBx1Wj8H+S/5A1FLMdl3ZqkeFrvrYIUxSvnbRoFB0CALrV OXYtsd7P86BHrT5hQNGte9V5JV5LpYAUvH6+QSD7mWOzul0gtIcKEJ7claypYuRT hm+wt2ENSRU3bNNwOVG8SoA1CEFFXePfyqEr6cBTs+1/OyzYV4880LvJXVdwwOx0 DogwsPt5L53Y2uoOaFKVRr2SUVzOi9Y79FAX3rfqIqoi89xcbK6ihHsb4ldGxkAy ILZEU/Y4lB6YsdtJjGGrB7cPhiWOl0AzPYgmOczWHw/5LMzgWKEt6H/JvkjGSlQJ pXixi6/cmsQOS6o5ydQT9Iu5qLMOOduv2mmQmOPJHkq8/SgiYTuTUiJkXgL8pPv+ Mq4Qm4JL+6aB2WL0NNzlqjVnIbFQmmGdrYGWdQnSeTN6X4T/uFQIz4fSQlQmFils qw1MBLZfhgjc4npfC0j5LdcABhC0BwEGelTJBKnc6+MbZlDTv2NdzP7wldzpjalR /a0/hLHsDMCkft92BQ3jp0C1LSikSYAhBPRJLSQiQbxzBv5JnDr6S5WpBTtBoDKT LdEqlS+mo0GwRK3pm2vSHQ4iVJY9v0PV0SbeJXH/SlJGYieUqJc= =HskU -----END PGP SIGNATURE----- Merge tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can. Current release - regressions: - mt76: usb: fix crash on device removal Current release - always broken: - xsk: Fix umem cleanup from wrong context in socket destruct Previous release - regressions: - net: ip6_gre: set dev->hard_header_len when using header_ops - ipv4: Fix TOS mask in inet_rtm_getroute() - net, xsk: Avoid taking multiple skbuff references Previous release - always broken: - net/x25: prevent a couple of overflows - netfilter: ipset: prevent uninit-value in hash_ip6_add - geneve: pull IP header before ECN decapsulation - mpls: ensure LSE is pullable in TC and openvswitch paths - vxlan: respect needed_headroom of lower device - batman-adv: Consider fragmentation for needed packet headroom - can: drivers: don't count arbitration loss as an error - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal - inet_ecn: Fix endianness of checksum update when setting ECT(1) - ibmvnic: fix various corner cases around reset handling - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering - net/mlx5: Enforce HW TX csum offload with kTLS" * tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled net/mlx5: Fix wrong address reclaim when command interface is down net/sched: act_mpls: ensure LSE is pullable before reading it net: openvswitch: ensure LSE is pullable before reading it net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl net: mvpp2: Fix error return code in mvpp2_open() chelsio/chtls: fix a double free in chtls_setkey() rtw88: debug: Fix uninitialized memory in debugfs code vxlan: fix error return code in __vxlan_dev_create() net: pasemi: fix error return code in pasemi_mac_open() cxgb3: fix error return code in t3_sge_alloc_qset() net/x25: prevent a couple of overflows dpaa_eth: copy timestamp fields to new skb in A-050385 workaround net: ip6_gre: set dev->hard_header_len when using header_ops mt76: usb: fix crash on device removal iwlwifi: pcie: add some missing entries for AX210 iwlwifi: pcie: invert values of NO_160 device config entries iwlwifi: pcie: add one missing entry for AX210 ...	2020-12-03 13:10:11 -08:00
Jakub Kicinski	a4390e966f	Merge branch 'mptcp-reject-invalid-mp_join-requests-right-away' Florian Westphal says: ==================== mptcp: reject invalid mp_join requests right away At the moment MPTCP can detect an invalid join request (invalid token, max number of subflows reached, and so on) right away but cannot reject the connection until the 3WHS has completed. Instead the connection will complete and the subflow is reset afterwards. To send the reset most information is already available, but we don't have good spot where the reset could be sent: 1. The ->init_req callback is too early and also doesn't allow to return an error that could be used to inform the TCP stack that the SYN should be dropped. 2. The ->route_req callback lacks the skb needed to send a reset. 3. The ->send_synack callback is the best fit from the available hooks, but its called after the request socket has been inserted into the queue already. This means we'd have to remove it again right away. From a technical point of view, the second hook would be best: 1. Its before insertion into listener queue. 2. If it returns NULL TCP will drop the packet for us. Problem is that we'd have to pass the skb to the function just for MPTCP. Paolo suggested to merge init_req and route_req callbacks instead: This makes all info available to MPTCP -- a return value of NULL drops the packet and MPTCP can send the reset if needed. Because 'route_req' has a 'const struct sock *', this means either removal of const qualifier, or a bit of code churn to pass 'const' in security land. This does the latter; I did not find any spots that need write access to struct sock. To recap, the two alternatives are: 1. Solve it entirely in MPTCP: use the ->send_synack callback to unlink the request socket from the listener & drop it. 2. Avoid 'security' churn by removing the const qualifier. ==================== Link: https://lore.kernel.org/r/20201130153631.21872-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 12:56:05 -08:00
Florian Westphal	3ecfbe3e82	mptcp: emit tcp reset when a join request fails RFC 8684 says: If the token is unknown or the host wants to refuse subflow establishment (for example, due to a limit on the number of subflows it will permit), the receiver will send back a reset (RST) signal, analogous to an unknown port in TCP, containing an MP_TCPRST option (Section 3.6) with an "MPTCP specific error" reason code. mptcp-next doesn't support MP_TCPRST yet, this can be added in another change. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 12:56:03 -08:00

1 2 3 4 5 ...

969797 Commits