linux

Author	SHA1	Message	Date
Hao Luo	eaa6bcb71e	bpf: Introduce bpf_per_cpu_ptr() Add bpf_per_cpu_ptr() to help bpf programs access percpu vars. bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel except that it may return NULL. This happens when the cpu parameter is out of range. So the caller must check the returned value. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com	2020-10-02 15:00:49 -07:00
Hao Luo	2c2f6abeff	selftests/bpf: Ksyms_btf to test typed ksyms Selftests for typed ksyms. Tests two types of ksyms: one is a struct, the other is a plain int. This tests two paths in the kernel. Struct ksyms will be converted into PTR_TO_BTF_ID by the verifier while int typed ksyms will be converted into PTR_TO_MEM. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-4-haoluo@google.com	2020-10-02 14:59:25 -07:00
Hao Luo	d370bbe121	bpf/libbpf: BTF support for typed ksyms If a ksym is defined with a type, libbpf will try to find the ksym's btf information from kernel btf. If a valid btf entry for the ksym is found, libbpf can pass in the found btf id to the verifier, which validates the ksym's type and value. Typeless ksyms (i.e. those defined as 'void') will not have such btf_id, but it has the symbol's address (read from kallsyms) and its value is treated as a raw pointer. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-3-haoluo@google.com	2020-10-02 14:59:25 -07:00
Hao Luo	4976b718c3	bpf: Introduce pseudo_btf_id Pseudo_btf_id is a type of ld_imm insn that associates a btf_id to a ksym so that further dereferences on the ksym can use the BTF info to validate accesses. Internally, when seeing a pseudo_btf_id ld insn, the verifier reads the btf_id stored in the insn[0]'s imm field and marks the dst_reg as PTR_TO_BTF_ID. The btf_id points to a VAR_KIND, which is encoded in btf_vminux by pahole. If the VAR is not of a struct type, the dst reg will be marked as PTR_TO_MEM instead of PTR_TO_BTF_ID and the mem_size is resolved to the size of the VAR's type. >From the VAR btf_id, the verifier can also read the address of the ksym's corresponding kernel var from kallsyms and use that to fill dst_reg. Therefore, the proper functionality of pseudo_btf_id depends on (1) kallsyms and (2) the encoding of kernel global VARs in pahole, which should be available since pahole v1.18. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-2-haoluo@google.com	2020-10-02 14:59:25 -07:00
Alexei Starovoitov	440c5752a3	Merge branch 'Do not limit cb_flags when creating child sk' Martin KaFai says: ==================== This set fixes an issue that the bpf_skops_init_child() unnecessarily limited the child sk from inheriting all bpf_sock_ops_cb_flags of the listen sk. It also adds a test to check that. ==================== Tested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-10-02 11:34:55 -07:00
Martin KaFai Lau	96d46c5085	bpf: selftest: Ensure the child sk inherited all bpf_sock_ops_cb_flags This patch adds a test to ensure the child sk inherited everything from the bpf_sock_ops_cb_flags of the listen sk: 1. Sets one more cb_flags (BPF_SOCK_OPS_STATE_CB_FLAG) to the listen sk in test_tcp_hdr_options.c 2. Saves the skops->bpf_sock_ops_cb_flags when handling the newly established passive connection 3. CHECK() it is the same as the listen sk This also covers the fastopen case as the existing test_tcp_hdr_options.c does. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201002013454.2542367-1-kafai@fb.com	2020-10-02 11:34:48 -07:00
Martin KaFai Lau	82f45c6c4a	bpf: tcp: Do not limit cb_flags when creating child sk from listen sk The commit `0813a84156` ("bpf: tcp: Allow bpf prog to write and parse TCP header option") unnecessarily introduced bpf_skops_init_child() which limited the child sk from inheriting all bpf_sock_ops_cb_flags of the listen sk. That breaks existing user expectation. This patch removes the bpf_skops_init_child() and just allows sock_copy() to do its job to copy everything from listen sk to the child sk. Fixes: `0813a84156` ("bpf: tcp: Allow bpf prog to write and parse TCP header option") Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201002013448.2542025-1-kafai@fb.com	2020-10-02 11:34:48 -07:00
Stanislav Fomichev	48ca6243c6	selftests/bpf: Properly initialize linfo in sockmap_basic When using -Werror=missing-braces, compiler complains about missing braces. Let's use use ={} initialization which should do the job: tools/testing/selftests/bpf/prog_tests/sockmap_basic.c: In function 'test_sockmap_iter': tools/testing/selftests/bpf/prog_tests/sockmap_basic.c:181:8: error: missing braces around initializer [-Werror=missing-braces] union bpf_iter_link_info linfo = {0}; ^ tools/testing/selftests/bpf/prog_tests/sockmap_basic.c:181:8: error: (near initialization for 'linfo.map') [-Werror=missing-braces] tools/testing/selftests/bpf/prog_tests/sockmap_basic.c: At top level: Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201002000451.1794044-1-sdf@google.com	2020-10-02 16:47:32 +02:00
Stanislav Fomichev	cffcdbff70	selftests/bpf: Initialize duration in xdp_noinline.c Fixes clang error: tools/testing/selftests/bpf/prog_tests/xdp_noinline.c:35:6: error: variable 'duration' is uninitialized when used here [-Werror,-Wuninitialized] if (CHECK(!skel, "skel_open_and_load", "failed\n")) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201001225440.1373233-1-sdf@google.com	2020-10-02 16:46:20 +02:00
Armin Wolf	360f898746	lib8390: Use netif_msg_init to initialize msg_enable bits Use netif_msg_init() to process param settings and use only the proper initialized value of ei_local->msg_level for later processing; Signed-off-by: Armin Wolf <W_Armin@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 19:08:46 -07:00
Willy Liu	7a333af6b1	net: phy: realtek: Modify 2.5G PHY name to RTL8226 Realtek single-chip Ethernet PHY solutions can be separated as below: 10M/100Mbps: RTL8201X 1Gbps: RTL8211X 2.5Gbps: RTL8226/RTL8221X RTL8226 is the first version for realtek that compatible 2.5Gbps single PHY. Since RTL8226 is single port only, realtek changes its name to RTL8221B from the second version. PHY ID for RTL8226 is 0x001cc800 and RTL8226B/RTL8221B is 0x001cc840. RTL8125 is not a single PHY solution, it integrates PHY/MAC/PCIE bus controller and embedded memory. Signed-off-by: Willy Liu <willy.liu@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:49:47 -07:00
Jing Xiangfeng	f1638a4c79	caif_virtio: Remove redundant initialization of variable err After commit `a8c7687bf2` ("caif_virtio: Check that vringh_config is not null"), the variable err is being initialized with '-EINVAL' that is meaningless. So remove it. Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:46:16 -07:00
Ye Bin	000fe2685b	net-sysfs: Fix inconsistent of format with argument type in net-sysfs.c Fix follow warnings: [net/core/net-sysfs.c:1161]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. [net/core/net-sysfs.c:1162]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:45:23 -07:00
Ye Bin	32be425b45	pktgen: Fix inconsistent of format with argument type in pktgen.c Fix follow warnings: [net/core/pktgen.c:925]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:942]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:962]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:984]: (warning) %u in format string (no. 1) requires 'unsigned int' but the argument type is 'signed int'. [net/core/pktgen.c:1149]: (warning) %d in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:45:23 -07:00
Xie He	8306266c1d	drivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values The fr_hard_header function is used to prepend the header to skbs before transmission. It is used in 3 situations: 1) When a control packet is generated internally in this driver; 2) When a user sends an skb on an Ethernet-emulating PVC device; 3) When a user sends an skb on a normal PVC device. These 3 situations need to be handled differently by fr_hard_header. Different headers should be prepended to the skb in different situations. Currently fr_hard_header distinguishes these 3 situations using skb->protocol. For situation 1 and 2, a special skb->protocol value will be assigned before calling fr_hard_header, so that it can recognize these 2 situations. All skb->protocol values other than these special ones are treated by fr_hard_header as situation 3. However, it is possible that in situation 3, the user sends an skb with one of the special skb->protocol values. In this case, fr_hard_header would incorrectly treat it as situation 1 or 2. This patch tries to solve this issue by using skb->dev instead of skb->protocol to distinguish between these 3 situations. For situation 1, skb->dev would be NULL; for situation 2, skb->dev->type would be ARPHRD_ETHER; and for situation 3, skb->dev->type would be ARPHRD_DLCI. This way fr_hard_header would be able to distinguish these 3 situations correctly regardless what skb->protocol value the user tries to use in situation 3. Cc: Krzysztof Halasa <khc@pm.waw.pl> Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 18:38:18 -07:00
David S. Miller	23a1f682a9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-10-01 The following pull-request contains BPF updates for your net-next tree. We've added 90 non-merge commits during the last 8 day(s) which contain a total of 103 files changed, 7662 insertions(+), 1894 deletions(-). Note that once bpf(/net) tree gets merged into net-next, there will be a small merge conflict in tools/lib/bpf/btf.c between commit `1245008122` ("libbpf: Fix native endian assumption when parsing BTF") from the bpf tree and the commit `3289959b97` ("libbpf: Support BTF loading and raw data output in both endianness") from the bpf-next tree. Correct resolution would be to stick with bpf-next, it should look like: [...] /* check BTF magic / if (fread(&magic, 1, sizeof(magic), f) < sizeof(magic)) { err = -EIO; goto err_out; } if (magic != BTF_MAGIC && magic != bswap_16(BTF_MAGIC)) { / definitely not a raw BTF / err = -EPROTO; goto err_out; } / get file size */ [...] The main changes are: 1) Add bpf_snprintf_btf() and bpf_seq_printf_btf() helpers to support displaying BTF-based kernel data structures out of BPF programs, from Alan Maguire. 2) Speed up RCU tasks trace grace periods by a factor of 50 & fix a few race conditions exposed by it. It was discussed to take these via BPF and networking tree to get better testing exposure, from Paul E. McKenney. 3) Support multi-attach for freplace programs, needed for incremental attachment of multiple XDP progs using libxdp dispatcher model, from Toke Høiland-Jørgensen. 4) libbpf support for appending new BTF types at the end of BTF object, allowing intrusive changes of prog's BTF (useful for future linking), from Andrii Nakryiko. 5) Several BPF helper improvements e.g. avoid atomic op in cookie generator and add a redirect helper into neighboring subsys, from Daniel Borkmann. 6) Allow map updates on sockmaps from bpf_iter context in order to migrate sockmaps from one to another, from Lorenz Bauer. 7) Fix 32 bit to 64 bit assignment from latest alu32 bounds tracking which caused a verifier issue due to type downgrade to scalar, from John Fastabend. 8) Follow-up on tail-call support in BPF subprogs which optimizes x64 JIT prologue and epilogue sections, from Maciej Fijalkowski. 9) Add an option to perf RB map to improve sharing of event entries by avoiding remove- on-close behavior. Also, add BPF_PROG_TEST_RUN for raw_tracepoint, from Song Liu. 10) Fix a crash in AF_XDP's socket_release when memory allocation for UMEMs fails, from Magnus Karlsson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 14:29:01 -07:00
David S. Miller	7c89d9d9f9	Merge branch 'net-ravb-Add-support-for-explicit-internal-clock-delay-c onfiguration' Geert Uytterhoeven says: ==================== net/ravb: Add support for explicit internal clock delay configuration Some Renesas EtherAVB variants support internal clock delay configuration, which can add larger delays than the delays that are typically supported by the PHY (using an "rgmii-id" PHY mode, and/or "[rt]xc-skew-ps" properties). Historically, the EtherAVB driver configured these delays based on the "rgmii-id" PHY mode. This caused issues with PHY drivers that implement PHY internal delays properly[1]. Hence a backwards-compatible workaround was added by masking the PHY mode[2]. This patch series implements the next step of the plan outlined in [3], and adds proper support for explicit configuration of the MAC internal clock delays using new "[rt]x-internal-delay-ps" properties. If none of these properties is present, the driver falls back to the old handling. This can be considered the MAC counterpart of commit `9150069bf5` ("dt-bindings: net: Add tx and rx internal delays"), which applies to the PHY. Note that unlike commit `92252eec91` ("net: phy: Add a helper to return the index for of the internal delay"), no helpers are provided to parse the DT properties, as so far there is a single user only, which supports only zero or a single fixed value. Of course such helpers can be added later, when the need arises, or when deemed useful otherwise. This series consists of 3 parts: 1. DT binding updates documenting the new properties, for both the generic ethernet-controller and the EtherAVB-specific bindings, 2. Conversion to json-schema of the Renesas EtherAVB DT bindings. Technically, the conversion is independent of all of the above. I included it in this series, as it shows how all sanity checks on "[rt]x-internal-delay-ps" values are implemented as DT binding checks, 3. EtherAVB driver update implementing support for the new properties. Given Rob has provided his acks for the DT binding updates, all of this can be merged through net-next. Changes compared to v3[4]: - Add Reviewed-by, - Drop the DT updates, as they will be merged through renesas-devel and arm-soc, and have a hard dependency on this series. Changes compared to v2[5]: - Update recently added board DTS files, - Add Reviewed-by. Changes compared to v1[6]: - Added "[PATCH 1/7] dt-bindings: net: ethernet-controller: Add internal delay properties", - Replace "renesas,[rt]xc-delay-ps" by "[rt]x-internal-delay-ps", - Incorporated EtherAVB DT binding conversion to json-schema, - Add Reviewed-by. Impacted, tested: - Salvator-X(S) with R-Car H3 ES1.0 and ES2.0, M3-W, and M3-N. Not impacted, tested: - Ebisu with R-Car E3. Impacted, not tested: - Salvator-X(S) with other SoC variants, - ULCB with R-Car H3/M3-W/M3-N variants, - V3MSK and Eagle with R-Car V3M, - Draak with R-Car V3H, - HiHope RZ/G2[MN] with RZ/G2M or RZ/G2N, - Beacon EmbeddedWorks RZ/G2M Development Kit. To ease testing, I have pushed this series and the DT updates to the topic/ravb-internal-clock-delays-v4 branch of my renesas-drivers repository at git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git. Thanks for applying! References: [1] Commit `bcf3440c6d` ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY") [2] Commit `9b23203c32` ("ravb: Mask PHY mode to avoid inserting delays twice"). https://lore.kernel.org/r/20200529122540.31368-1-geert+renesas@glider.be/ [3] https://lore.kernel.org/r/CAMuHMdU+MR-2tr3-pH55G0GqPG9HwH3XUd=8HZxprFDMGQeWUw@mail.gmail.com/ [4] https://lore.kernel.org/linux-devicetree/20200819134344.27813-1-geert+renesas@glider.be/ [5] https://lore.kernel.org/linux-devicetree/20200706143529.18306-1-geert+renesas@glider.be/ [6] https://lore.kernel.org/linux-devicetree/20200619191554.24942-1-geert+renesas@glider.be/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:53:31 -07:00
Geert Uytterhoeven	a6f51f2efa	ravb: Add support for explicit internal clock delay configuration Some EtherAVB variants support internal clock delay configuration, which can add larger delays than the delays that are typically supported by the PHY (using an "rgmii-id" PHY mode, and/or "[rt]xc-skew-ps" properties). Historically, the EtherAVB driver configured these delays based on the "rgmii-id" PHY mode. This caused issues with PHY drivers that implement PHY internal delays properly[1]. Hence a backwards-compatible workaround was added by masking the PHY mode[2]. Add proper support for explicit configuration of the MAC internal clock delays using the new "[rt]x-internal-delay-ps" properties. Fall back to the old handling if none of these properties is present. [1] Commit `bcf3440c6d` ("net: phy: micrel: add phy-mode support for the KSZ9031 PHY") [2] Commit `9b23203c32` ("ravb: Mask PHY mode to avoid inserting delays twice"). Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:53:31 -07:00
Geert Uytterhoeven	ce19a9eb53	ravb: Split delay handling in parsing and applying Currently, full delay handling is done in both the probe and resume paths. Split it in two parts, so the resume path doesn't have to redo the parsing part over and over again. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:53:31 -07:00
Geert Uytterhoeven	d7adf63311	dt-bindings: net: renesas,etheravb: Convert to json-schema Convert the Renesas Ethernet AVB (EthernetAVB-IF) Device Tree binding documentation to json-schema. Add missing properties. Update the example to match reality. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:53:30 -07:00
Geert Uytterhoeven	57197b66d0	dt-bindings: net: renesas,ravb: Document internal clock delay properties Some EtherAVB variants support internal clock delay configuration, which can add larger delays than the delays that are typically supported by the PHY (using an "rgmii-*id" PHY mode, and/or "[rt]xc-skew-ps" properties). Add properties for configuring the internal MAC delays. These properties are mandatory, even when specified as zero, to distinguish between old and new DTBs. Update the (bogus) example accordingly. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@gmail.com> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:53:30 -07:00
Geert Uytterhoeven	0024bad1f4	dt-bindings: net: ethernet-controller: Add internal delay properties Internal Receive and Transmit Clock Delays are a common setting for RGMII capable devices. While these delays are typically applied by the PHY, some MACs support configuring internal clock delay settings, too. Hence add standardized properties to configure this. This is the MAC counterpart of commit `9150069bf5` ("dt-bindings: net: Add tx and rx internal delays"), which applies to the PHY. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:53:30 -07:00
David S. Miller	87d5034d07	mlx5-updates-2020-09-30 Updates and cleanups for mlx5 driver: 1) From Ariel, Dan Carpenter and Gostavo, Fixes to the previous mlx5 Connection track series. 2) From Yevgeny, trivial cleanups for Software steering 3) From Hamdan, Support for Flow source hint in software steering and E-Switch 4) From Parav and Sunil, Small and trivial E-Switch updates and cleanups in preparation for mlx5 Sub-functions support -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl91WncACgkQSD+KveBX +j4cOQgAuuHPFMht6x3/L1uTuZm3QTam2FmvaNDn1dcZG1oqq3ZILCh9zY4bYQ8x PGcVhtEp8IbTOhgoUM40X3LOfzIBdOK2WfOh54LB6vXX7yvPBk3u1491vrF38eqQ 7JDaJwQpJKir0oqnumQOWHr0vffiTzTZSBu9huUgxBXzzA0HVpm6Z1Kh5G4dm+Ot xI5+Qb4BqssiiSw+NbyQcE8k2/7MgaQh/I6lGjVeN0uxiZCdtz9/sFcfkEsTmp/E lFdi4rdlHF+ejZmhEzobdxQmFo5IEiOKb1MoRTOWAqjQgIHwsnsHfat5O3wn2N7k FNb8lZIKBrrgT8XDbzgQ6T4Im2wLQw== =hFxp -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-09-30 Updates and cleanups for mlx5 driver: 1) From Ariel, Dan Carpenter and Gostavo, Fixes to the previous mlx5 Connection track series. 2) From Yevgeny, trivial cleanups for Software steering 3) From Hamdan, Support for Flow source hint in software steering and E-Switch 4) From Parav and Sunil, Small and trivial E-Switch updates and cleanups in preparation for mlx5 Sub-functions support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-01 12:24:52 -07:00
Alexei Starovoitov	6208689fb3	Merge branch 'introduce BPF_F_PRESERVE_ELEMS' Song Liu says: ==================== This set introduces BPF_F_PRESERVE_ELEMS to perf event array for better sharing of perf event. By default, perf event array removes the perf event when the map fd used to add the event is closed. With BPF_F_PRESERVE_ELEMS set, however, the perf event will stay in the array until it is removed, or the map is closed. --- Changes v3 => v5: 1. Clean up in selftest. (Alexei) Changes v2 => v3: 1. Move perf_event_fd_array_map_free() to avoid unnecessary forward declaration. (Daniel) Changes v1 => v2: 1. Rename the flag as BPF_F_PRESERVE_ELEMS. (Alexei, Daniel) 2. Simplify the code and selftest. (Daniel, Alexei) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-30 23:21:17 -07:00
Song Liu	d6b4206841	selftests/bpf: Add tests for BPF_F_PRESERVE_ELEMS Add tests for perf event array with and without BPF_F_PRESERVE_ELEMS. Add a perf event to array via fd mfd. Without BPF_F_PRESERVE_ELEMS, the perf event is removed when mfd is closed. With BPF_F_PRESERVE_ELEMS, the perf event is removed when the map is freed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200930224927.1936644-3-songliubraving@fb.com	2020-09-30 23:21:06 -07:00
Song Liu	792caccc45	bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array Currently, perf event in perf event array is removed from the array when the map fd used to add the event is closed. This behavior makes it difficult to the share perf events with perf event array. Introduce perf event map that keeps the perf event open with a new flag BPF_F_PRESERVE_ELEMS. With this flag set, perf events in the array are not removed when the original map fd is closed. Instead, the perf event will stay in the map until 1) it is explicitly removed from the array; or 2) the array is freed. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200930224927.1936644-2-songliubraving@fb.com	2020-09-30 23:18:12 -07:00
Gustavo A. R. Silva	ff7ea04ad5	net/mlx5e: Fix potential null pointer dereference Calls to kzalloc() and kvzalloc() should be null-checked in order to avoid any potential failures. In this case, a potential null pointer dereference. Fix this by adding null checks for _parse_attr_ and _flow_ right after allocation. Addresses-Coverity-ID: 1497154 ("Dereference before null check") Fixes: `c620b77215` ("net/mlx5: Refactor tc flow attributes structure") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:31 -07:00
Dan Carpenter	7b2b16ee54	net/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get() This code frees "shared_counter" and then dereferences on the next line to get the error code. Fixes: `1edae2335a` ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:31 -07:00
Ariel Levkovich	5efbe61788	net/mlx5: Fix dereference on pointer attr after null check When removing a flow from the slow path fdb, a flow attr struct is allocated for the rule removal process. If the allocation fails the code prints a warning message but continues with the removal flow which include dereferencing a pointer which could be null. Fix this by exiting the function in case the attr allocation failed. Fixes: `c620b77215` ("net/mlx5: Refactor tc flow attributes structure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:31 -07:00
Parav Pandit	7be3412a76	net/mlx5: Use dma device access helper Use the PCI device directly for dma accesses as non PCI device unlikely support IOMMU and dma mappings. Introduce and use helper routine to access DMA device. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:30 -07:00
Hamdan Igbaria	036e19b90f	net/mlx5: E-Switch, Support flow source for local vport Set flow source as hint for local vport. Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:30 -07:00
Parav Pandit	c7eddc6092	net/mlx5: E-switch, Move devlink eswitch ports closer to eswitch Currently devlink eswitch ports are registered and unregistered by the representor layer. However it is better to register them at eswitch layer so that in future user initiated command port add and delete commands can also register/unregister devlink ports without depending on representor layer. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:29 -07:00
Parav Pandit	38679b5a0d	net/mlx5: E-switch, Use helper function to load unload representor To register and unregister devlink ports when loading/unload representors, refactor the code to helper functions. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:29 -07:00
Parav Pandit	2c40db2f1d	net/mlx5: E-switch, Add helper to check egress ACL need Currently only VF vports need egress ACL table. Add a generic helper to check whether a vport need egress ACL table or not. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:29 -07:00
sunils	7cd7becddd	net/mlx5: E-switch, Use PF num in metadata reg c0 Currently only 256 vports can be supported as only 8 bits are reserved for them and 8 bits are reserved for vhca_ids in metadata reg c0. To support more than 256 vports, replace vhca_id with a unique shorter 4-bit PF number which covers upto 16 PF's. Use remaining 12 bits for vports ranging 1-4095. This will continue to generate unique metadata even if multiple PCI devices have same switch_id. Signed-off-by: sunils <sunils@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:28 -07:00
Hamdan Igbaria	0172391967	net/mlx5: DR, Add support for rule creation with flow source hint Skip the rule according to flow arrival source, in case of RX and the source is local port skip and in case of TX and the source is uplink skip, we get this info according to the flow source hint we get from upper layers when creating the rule. This is needed because for example in case of FDB table which has a TX and RX tables and we are inserting a rule with an encap action which is only a TX action, in this case rule will fail on RX, so we can rely on the flow source hint and skip RX in such case. Until now we relied on metadata regc_0 that upper layer mapped the port in the regc_0, but the problem is that upper layer did not always use regc_0 for port mapping, so now we added support to flow source hint which upper layers will pass to SW steering when creating a rule. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Hamdan Igbaria <hamdani@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:28 -07:00
Yevgeny Kliteynik	e6b69bf379	net/mlx5: DR, Call ste_builder directly with tag pointer Instead of getting the tag in each function, call the builder directly with the tag. This will allow to use the same function for building the tag and the bitmask. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:28 -07:00
Yevgeny Kliteynik	92b4b88531	net/mlx5: DR, Remove unneeded local variable The misc3 variable is used only once and can be dropped. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:27 -07:00
Yevgeny Kliteynik	e6422d1da0	net/mlx5: DR, Remove unneeded vlan check from L2 builder When we create a matcher we check that all fields are consumed. There is no need for this specific check. This keeps the STE builder functions simple and clean. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:27 -07:00
Yevgeny Kliteynik	38a5c59d7e	net/mlx5: DR, Remove unneeded check from source port builder Mask validity for ste builders is checked by mlx5dr_ste_build_pre_check during matcher creation. It already checks the mask value of source_vport, so removing this duplicated check. Also, moving there the check of source_eswitch_owner_vhca_id mask. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:27 -07:00
Yevgeny Kliteynik	97ffd895fe	net/mlx5: DR, Replace the check for valid STE entry Validity check is done by reading the next lu_type from the STE, this check can be replaced by checking the refcount. This will make the check independent on internal STE structure. Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>	2020-09-30 21:26:26 -07:00
Jean-Philippe Brucker	3effc06a4d	selftests/bpf: Fix alignment of .BTF_ids Fix a build failure on arm64, due to missing alignment information for the .BTF_ids section: resolve_btfids.test.o: in function `test_resolve_btfids': tools/testing/selftests/bpf/prog_tests/resolve_btfids.c:140:(.text+0x29c): relocation truncated to fit: R_AARCH64_LDST32_ABS_LO12_NC against `.BTF_ids' ld: tools/testing/selftests/bpf/prog_tests/resolve_btfids.c:140: warning: one possible cause of this error is that the symbol is being referenced in the indicated code as if it had a larger alignment than was declared where it was defined In vmlinux, the .BTF_ids section is aligned to 4 bytes by vmlinux.lds.h. In test_progs however, .BTF_ids doesn't have alignment constraints. The arm64 linker expects the btf_id_set.cnt symbol, a u32, to be naturally aligned but finds it misaligned and cannot apply the relocation. Enforce alignment of .BTF_ids to 4 bytes. Fixes: `cd04b04de1` ("selftests/bpf: Add set test to resolve_btfids") Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/bpf/20200930093559.2120126-1-jean-philippe@linaro.org	2020-09-30 18:11:40 -07:00
David S. Miller	f2e834694b	Merge branch 'drop_monitor-Convert-to-use-devlink-tracepoint' Ido Schimmel says: ==================== drop_monitor: Convert to use devlink tracepoint Drop monitor is able to monitor both software and hardware originated drops. Software drops are monitored by having drop monitor register its probe on the 'kfree_skb' tracepoint. Hardware originated drops are monitored by having devlink call into drop monitor whenever it receives a dropped packet from the underlying hardware. This patch set converts drop monitor to monitor both software and hardware originated drops in the same way - by registering its probe on the relevant tracepoint. In addition to drop monitor being more consistent, it is now also possible to build drop monitor as module instead of as a builtin and still monitor hardware originated drops. Initially, CONFIG_NET_DEVLINK implied CONFIG_NET_DROP_MONITOR, but after commit `def2fbffe6` ("kconfig: allow symbols implied by y to become m") we can have CONFIG_NET_DEVLINK=y and CONFIG_NET_DROP_MONITOR=m and hardware originated drops will not be monitored. Patch set overview: Patch #1 adds a tracepoint in devlink for trap reports. Patch #2 prepares probe functions in drop monitor for the new tracepoint. Patch #3 converts drop monitor to use the new tracepoint. Patches #4-#6 perform cleanups after the conversion. Patch #7 adds a test case for drop monitor. Both software originated drops and hardware originated drops (using netdevsim) are tested. Tested: \| CONFIG_NET_DEVLINK \| CONFIG_NET_DROP_MONITOR \| Build \| SW drops \| HW drops \| \| -------------------\|-------------------------\|-------\|----------\|----------\| \| y \| y \| v \| v \| v \| \| y \| m \| v \| v \| v \| \| y \| n \| v \| x \| x \| \| n \| y \| v \| v \| x \| \| n \| m \| v \| v \| x \| \| n \| n \| v \| x \| x \| ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:27 -07:00
Ido Schimmel	b7cc6d3c5c	selftests: net: Add drop monitor test Test that drop monitor correctly captures both software and hardware originated packet drops. # ./drop_monitor_tests.sh Software drops test TEST: Capturing active software drops [ OK ] TEST: Capturing inactive software drops [ OK ] Hardware drops test TEST: Capturing active hardware drops [ OK ] TEST: Capturing inactive hardware drops [ OK ] Tests passed: 4 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	93e155967c	drop_monitor: Filter control packets in drop monitor Previously, devlink called into drop monitor in order to report hardware originated drops / exceptions. devlink intentionally filtered control packets and did not pass them to drop monitor as they were not dropped by the underlying hardware. Now drop monitor registers its probe on a generic 'devlink_trap_report' tracepoint and should therefore perform this filtering itself instead of having devlink do that. Add the trap type as metadata and have drop monitor ignore control packets. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	a848c05f4b	drop_monitor: Remove duplicate struct 'struct net_dm_hw_metadata' is a duplicate of 'struct devlink_trap_metadata'. Remove the former and simplify the code. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	de9cbb81bd	drop_monitor: Remove no longer used functions The old probe functions that were invoked by drop monitor code are no longer called and can thus be removed. They were replaced by actual probe functions that are registered on the recently introduced 'devlink_trap_report' tracepoint. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	8ee2267ad3	drop_monitor: Convert to using devlink tracepoint Convert drop monitor to use the recently introduced 'devlink_trap_report' tracepoint instead of having devlink call into drop monitor. This is both consistent with software originated drops ('kfree_skb' tracepoint) and also allows drop monitor to be built as a module and still report hardware originated drops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	5855357cd4	drop_monitor: Prepare probe functions for devlink tracepoint Drop monitor supports two alerting modes: Summary and packet. Prepare a probe function for each, so that they could be later registered on the devlink tracepoint by calling register_trace_devlink_trap_report(), based on the configured alerting mode. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00
Ido Schimmel	5b88823bfe	devlink: Add a tracepoint for trap reports Add a tracepoint for trap reports so that drop monitor could register its probe on it. Use trace_devlink_trap_report_enabled() to avoid wasting cycles setting the trap metadata if the tracepoint is not enabled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-30 18:01:26 -07:00

1 2 3 4 5 ...

952976 Commits