linux

Author	SHA1	Message	Date
Hayes Wang	252df8b866	r8152: replace array with linking list for rx information The original method uses an array to store the rx information. The new one uses a list to link each rx structure. Then, it is possible to increase/decrease the number of rx structure dynamically. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:08 -07:00
Hayes Wang	ec5791c202	r8152: separate the rx buffer size The different chips may accept different rx buffer sizes. The RTL8152 supports 16K bytes, and RTL8153 support 32K bytes. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 18:12:08 -07:00
Jakub Kicinski	e070ca371f	Merge branch 'net-phy-let-phy_speed_down-up-support-speeds-1Gbps' Heiner says: ==================== So far phy_speed_down/up can be used up to 1Gbps only. Remove this restriction and add needed helpers to phy-core.c v2: - remove unused parameter in patch 1 - rename __phy_speed_down to phy_speed_down_core in patch 2 ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:16:11 -07:00
Heiner Kallweit	65b27995a4	net: phy: let phy_speed_down/up support speeds >1Gbps So far phy_speed_down/up can be used up to 1Gbps only. Remove this restriction by using new helper __phy_speed_down. New member adv_old in struct phy_device is used by phy_speed_up to restore the advertised modes before calling phy_speed_down. Don't simply advertise what is supported because a user may have intentionally removed modes from advertisement. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Heiner Kallweit	331c56ac73	net: phy: add phy_speed_down_core and phy_resolve_min_speed phy_speed_down_core provides most of the functionality for phy_speed_down. It makes use of new helper phy_resolve_min_speed that is based on the sorting of the settings[] array. In certain cases it may be helpful to be able to exclude legacy half duplex modes, therefore prepare phy_resolve_min_speed() for it. v2: - rename __phy_speed_down to phy_speed_down_core Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Heiner Kallweit	7b261e0ef5	net: phy: add __set_linkmode_max_speed We will need the functionality of __set_linkmode_max_speed also for linkmode bitmaps other than phydev->supported. Therefore split it. v2: - remove unused parameter from __set_linkmode_max_speed Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 17:14:06 -07:00
Vlad Buslov	043b8413e8	net: devlink: remove redundant rtnl lock assert It is enough for caller of devlink_compat_switch_id_get() to hold the net device to guarantee that devlink port is not destroyed concurrently. Remove rtnl lock assertion and modify comment to warn user that they must hold either rtnl lock or reference to net device. This is necessary to accommodate future implementation of rtnl-unlocked TC offloads driver callbacks. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 16:47:11 -07:00
Jakub Kicinski	708852dcac	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your net-next tree. There is a small merge conflict in libbpf (Cc Andrii so he's in the loop as well): for (i = 1; i <= btf__get_nr_types(btf); i++) { t = (struct btf_type )btf__type_by_id(btf, i); if (!has_datasec && btf_is_var(t)) { / replace VAR with INT / t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); <<<<<<< HEAD / * using size = 1 is the safest choice, 4 will be too * big and cause kernel BTF validation failure if * original variable took less than 4 bytes / t->size = 1; (int )(t+1) = BTF_INT_ENC(0, 0, 8); } else if (!has_datasec && kind == BTF_KIND_DATASEC) { ======= t->size = sizeof(int); (int )(t + 1) = BTF_INT_ENC(0, 0, 32); } else if (!has_datasec && btf_is_datasec(t)) { >>>>>>> `72ef80b5ee` / replace DATASEC with STRUCT / Conflict is between the two commits `1d4126c4e1` ("libbpf: sanitize VAR to conservative 1-byte INT") and `b03bc6853c` ("libbpf: convert libbpf code to use new btf helpers"), so we need to pick the sanitation fixup as well as use the new btf_is_datasec() helper and the whitespace cleanup. Looks like the following: [...] if (!has_datasec && btf_is_var(t)) { / replace VAR with INT / t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0); / * using size = 1 is the safest choice, 4 will be too * big and cause kernel BTF validation failure if * original variable took less than 4 bytes / t->size = 1; (int )(t + 1) = BTF_INT_ENC(0, 0, 8); } else if (!has_datasec && btf_is_datasec(t)) { / replace DATASEC with STRUCT */ [...] The main changes are: 1) Addition of core parts of compile once - run everywhere (co-re) effort, that is, relocation of fields offsets in libbpf as well as exposure of kernel's own BTF via sysfs and loading through libbpf, from Andrii. More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2 and http://vger.kernel.org/lpc-bpf2018.html#session-2 2) Enable passing input flags to the BPF flow dissector to customize parsing and allowing it to stop early similar to the C based one, from Stanislav. 3) Add a BPF helper function that allows generating SYN cookies from XDP and tc BPF, from Petar. 4) Add devmap hash-based map type for more flexibility in device lookup for redirects, from Toke. 5) Improvements to XDP forwarding sample code now utilizing recently enabled devmap lookups, from Jesper. 6) Add support for reporting the effective cgroup progs in bpftool, from Jakub and Takshak. 7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter. 8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan. 9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei. 10) Add perf event output helper also for other skb-based program types, from Allan. 11) Fix a co-re related compilation error in selftests, from Yonghong. ==================== Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 16:24:57 -07:00
YueHaibing	a9a9676016	net: hns3: Make hclge_func_reset_sync_vf static Fix sparse warning: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:3190:5: warning: symbol 'hclge_func_reset_sync_vf' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 16:18:12 -07:00
Jiri Pirko	92b4982228	devlink: send notifications for deleted snapshots on region destroy Currently the notifications for deleted snapshots are sent only in case user deletes a snapshot manually. Send the notifications in case region is destroyed too. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>	2019-08-13 14:55:46 -07:00
Daniel Borkmann	72ef80b5ee	Merge branch 'bpf-libbpf-read-sysfs-btf' Andrii Nakryiko says: ==================== Now that kernel's BTF is exposed through sysfs at well-known location, attempt to load it first as a target BTF for the purpose of BPF CO-RE relocations. Patch #1 is a follow-up patch to rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux. Patch #2 adds ability to load raw BTF contents from sysfs and expands the list of locations libbpf attempts to load vmlinux BTF from. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-13 23:19:42 +02:00
Andrii Nakryiko	a1916a153c	libbpf: attempt to load kernel BTF from sysfs first Add support for loading kernel BTF from sysfs (/sys/kernel/btf/vmlinux) as a target BTF. Also extend the list of on disk search paths for vmlinux ELF image with entries that perf is searching for. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-13 23:19:42 +02:00
Andrii Nakryiko	7fd785685e	btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux Expose kernel's BTF under the name vmlinux to be more uniform with using kernel module names as file names in the future. Fixes: `341dfcf8d7` ("btf: expose BTF info through sysfs") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-13 23:19:42 +02:00
Petar Penkov	9840a4ffcf	selftests/bpf: fix race in flow dissector tests Since the "last_dissection" map holds only the flow keys for the most recent packet, there is a small race in the skb-less flow dissector tests if a new packet comes between transmitting the test packet, and reading its keys from the map. If this happens, the test packet keys will be overwritten and the test will fail. Changing the "last_dissection" map to a hash map, keyed on the source/dest port pair resolves this issue. Additionally, let's clear the last test results from the map between tests to prevent previous test cases from interfering with the following test cases. Fixes: `0905beec9f` ("selftests/bpf: run flow dissector tests in skb-less mode") Signed-off-by: Petar Penkov <ppenkov@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-13 16:31:10 +02:00
Peter Wu	d66fa3c70e	tools: bpftool: add feature check for zlib bpftool requires libelf, and zlib for decompressing /proc/config.gz. zlib is a transitive dependency via libelf, and became mandatory since elfutils 0.165 (Jan 2016). The feature check of libelf is already done in the elfdep target of tools/lib/bpf/Makefile, pulled in by bpftool via a dependency on libbpf.a. Add a similar feature check for zlib. Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Peter Wu <peter@lekensteyn.nl> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-13 16:22:07 +02:00
Andrii Nakryiko	341dfcf8d7	btf: expose BTF info through sysfs Make .BTF section allocated and expose its contents through sysfs. /sys/kernel/btf directory is created to contain all the BTFs present inside kernel. Currently there is only kernel's main BTF, represented as /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported, each module will expose its BTF as /sys/kernel/btf/<module-name> file. Current approach relies on a few pieces coming together: 1. pahole is used to take almost final vmlinux image (modulo .BTF and kallsyms) and generate .BTF section by converting DWARF info into BTF. This section is not allocated and not mapped to any segment, though, so is not yet accessible from inside kernel at runtime. 2. objcopy dumps .BTF contents into binary file and subsequently convert binary file into linkable object file with automatically generated symbols _binary__btf_kernel_bin_start and _binary__btf_kernel_bin_end, pointing to start and end, respectively, of BTF raw data. 3. final vmlinux image is generated by linking this object file (and kallsyms, if necessary). sysfs_btf.c then creates /sys/kernel/btf/kernel file and exposes embedded BTF contents through it. This allows, e.g., libbpf and bpftool access BTF info at well-known location, without resorting to searching for vmlinux image on disk (location of which is not standardized and vmlinux image might not be even available in some scenarios, e.g., inside qemu during testing). Alternative approach using .incbin assembler directive to embed BTF contents directly was attempted but didn't work, because sysfs_proc.o is not re-compiled during link-vmlinux.sh stage. This is required, though, to update embedded BTF data (initially empty data is embedded, then pahole generates BTF info and we need to regenerate sysfs_btf.o with updated contents, but it's too late at that point). If BTF couldn't be generated due to missing or too old pahole, sysfs_btf.c handles that gracefully by detecting that _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating /sys/kernel/btf at all. v2->v3: - added Documentation/ABI/testing/sysfs-kernel-btf (Greg K-H); - created proper kobject (btf_kobj) for btf directory (Greg K-H); - undo v2 change of reusing vmlinux, as it causes extra kallsyms pass due to initially missing __binary__btf_kernel_bin_{start/end} symbols; v1->v2: - allow kallsyms stage to re-use vmlinux generated by gen_btf(); Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-13 16:14:15 +02:00
Peter Wu	a664a83457	tools: bpftool: fix reading from /proc/config.gz /proc/config has never existed as far as I can see, but /proc/config.gz is present on Arch Linux. Add support for decompressing config.gz using zlib which is a mandatory dependency of libelf anyway. Replace existing stdio functions with gzFile operations since the latter transparently handles uncompressed and gzip-compressed files. Cc: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-08-12 11:07:16 +02:00
Greg Kroah-Hartman	53f6f39178	caif: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Richard Fontana <rfontana@redhat.com> Cc: Steve Winslow <swinslow@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:31:25 -07:00
Greg Kroah-Hartman	6f20a697e4	xen-netback: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Wei Liu <wei.liu@kernel.org> Cc: Paul Durrant <paul.durrant@citrix.com> Cc: xen-devel@lists.xenproject.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:30:06 -07:00
David S. Miller	a858390177	Merge branch 'net-dsa-mv88e6xxx-prepare-Wait-Bit-operation' Vivien Didelot says: ==================== net: dsa: mv88e6xxx: prepare Wait Bit operation The Remote Management Interface has its own implementation of a Wait Bit operation, which requires a bit number and a value to wait for. In order to prepare the introduction of this implementation, rework the code waiting for bits and masks in mv88e6xxx to match this signature. This has the benefit to unify the implementation of wait routines while removing obsolete wait and update functions and also reducing the code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	eede236112	net: dsa: mv88e6xxx: add delay in direct SMI wait The mv88e6xxx_smi_direct_wait routine is used to wait on indirect registers access. It is of no exception and must delay between read attempts, like other wait routines. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	1c6463b6fc	net: dsa: mv88e6xxx: fix SMI bit checking The current mv88e6xxx_smi_direct_wait function is only used to check the 16th bit of the (16-bit) SMI Command register. But the bit shift operation is not enough if we eventually use this function to check other bits, thus replace it with a mask. Fixes: `e7ba0fad9c` ("net: dsa: mv88e6xxx: refine SMI support") Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	2ad4da776b	net: dsa: mv88e6xxx: remove wait and update routines Now that we have proper Wait Bit and Wait Mask routines, remove the unused mv88e6xxx_wait routine and its Global 1 and Global 2 variants. The indirect tables such as the Device Mapping Table or Priority Override Table make use of an Update bit to distinguish reading (0) from writing (1) operations. After a write operation occurs, the bit self clears right away so there's no need to wait on it. Thus keep things simple and remove the mv88e6xxx_update helper as well. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	28ae1e9662	net: dsa: mv88e6xxx: wait for AVB Busy bit The AVB is not an indirect table using an Update bit, but a unit using a Busy bit. This means that we must ensure that this bit is cleared before setting it and wait until it gets cleared again after writing an operation. Reflect that. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	19fb7f69da	net: dsa: mv88e6xxx: introduce wait bit routine Many portions of the driver need to wait until a given bit is set or cleared. Some busses even have a specific implementation for this operation. In preparation for such variant, implement a generic Wait Bit routine that can be used by the driver core functions. This allows us to get rid of the custom implementations we may find in the driver. Note that for the EEPROM bits, BUSY and RUNNING bits are independent, thus it is more efficient to wait independently for each bit instead of waiting for their mask. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	683f2244c5	net: dsa: mv88e6xxx: introduce wait mask routine The current mv88e6xxx_wait routine is used to wait for a given mask to be cleared to zero. However in some cases, the driver may have to wait for a given mask to be of a certain non-zero value. Thus provide a generic wait mask routine that will be used to implement the current mv88e6xxx_wait function, and use it to wait for 88E6185 PPU states. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Vivien Didelot	929938536f	net: dsa: mv88e6xxx: wait for 88E6185 PPU disabled The PPU state of 88E6185 can be either "Disabled at Reset" or "Disabled after Initialization". Because we intentionally clear the PPU Enabled bit before checking its state, it is safe to wait for the MV88E6185_G1_STS_PPU_STATE_DISABLED state explicitly instead of waiting for any state different than MV88E6185_G1_STS_PPU_STATE_POLLING. Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:27:15 -07:00
Heiner Kallweit	eb2e7f0922	r8169: inline rtl8169_free_rx_databuff rtl8169_free_rx_databuff is used in only one place, so let's inline it. We can improve the loop because rtl8169_init_ring zero's RX_databuff before calling rtl8169_rx_fill, and rtl8169_rx_fill fills Rx_databuff starting from index 0. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:25:30 -07:00
David S. Miller	d35bbe84c1	Merge branch 'realtek-phy-next' Heiner Kallweit says: ==================== net: phy: realtek: add support for integrated 2.5Gbps PHY in RTL8125 This series adds support for the integrated 2.5Gbps PHY in RTL8125. First three patches add necessary functionality to phylib. Changes in v2: - added patch 1 - changed patch 4 to use a fake PHY ID that is injected by the network driver. This allows to use a dedicated PHY driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Heiner Kallweit	087f5b8758	net: phy: realtek: add support for the 2.5Gbps PHY in RTL8125 This adds support for the integrated 2.5Gbps PHY in Realtek RTL8125. Advertisement of 2.5Gbps mode is done via a vendor-specific register. Same applies to reading NBase-T link partner advertisement. Unfortunately this 2.5Gbps PHY shares the PHY ID with the integrated 1Gbps PHY's in other Realtek network chips and so far no method is known to differentiate them. As a workaround use a dedicated fake PHY ID that is set by the network driver by intercepting the MDIO PHY ID read. v2: - Create dedicated PHY driver and use a fake PHY ID that is injected by the network driver. Suggested by Andrew Lunn. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Heiner Kallweit	bf22b343ca	net: phy: add phy_modify_paged_changed Add helper function phy_modify_paged_changed, behavios is the same as for phy_modify_changed. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Heiner Kallweit	f4069cd7fa	net: phy: prepare phylib to deal with PHY's extending Clause 22 The integrated PHY in 2.5Gbps chip RTL8125 is the first (known to me) PHY that uses standard Clause 22 for all modes up to 1Gbps and adds 2.5Gbps control using vendor-specific registers. To use phylib for the standard part little extensions are needed: - Move most of genphy_config_aneg to a new function __genphy_config_aneg that takes a parameter whether restarting auto-negotiation is needed (depending on whether content of vendor-specific advertisement register changed). - Don't clear phydev->lp_advertising in genphy_read_status so that we can set non-C22 mode flags before. Basically both changes mimic the behavior of the equivalent Clause 45 functions. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Heiner Kallweit	3eef868932	net: phy: simplify genphy_config_advert by using the linkmode_adv_to_xxx_t functions Using linkmode_adv_to_mii_adv_t and linkmode_adv_to_mii_ctrl1000_t allows to simplify the code. In addition avoiding the conversion to the legacy u32 advertisement format allows to remove the warning. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:24:32 -07:00
Jiri Pirko	150e8f8a1b	netdevsim: register couple of devlink params Register couple of devlink params, one generic, one driver-specific. Make the values available over debugfs. Example: $ echo "111" > /sys/bus/netdevsim/new_device $ devlink dev param netdevsim/netdevsim111: name max_macs type generic values: cmode driverinit value 32 name test1 type driver-specific values: cmode driverinit value true $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs 32 $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1 Y $ devlink dev param set netdevsim/netdevsim111 name max_macs cmode driverinit value 16 $ devlink dev param set netdevsim/netdevsim111 name test1 cmode driverinit value false $ devlink dev reload netdevsim/netdevsim111 $ cat /sys/kernel/debug/netdevsim/netdevsim111/max_macs 16 $ cat /sys/kernel/debug/netdevsim/netdevsim111/test1 Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 21:20:25 -07:00
David S. Miller	6e5ee48339	Merge branch 'drop_monitor-Capture-dropped-packets-and-metadata' Ido Schimmel says: ==================== drop_monitor: Capture dropped packets and metadata So far drop monitor supported only one mode of operation in which a summary of recent packet drops is periodically sent to user space as a netlink event. The event only includes the drop location (program counter) and number of drops in the last interval. While this mode of operation allows one to understand if the system is dropping packets, it is not sufficient if a more detailed analysis is required. Both the packet itself and related metadata are missing. This patchset extends drop monitor with another mode of operation where the packet - potentially truncated - and metadata (e.g., drop location, timestamp, netdev) are sent to user space as a netlink event. Thanks to the extensible nature of netlink, more metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is called, the dropped skbs are cloned and queued on per-CPU skb drop list. The list is then processed in process context (using a workqueue), where the netlink messages are allocated, prepared and finally sent to user space. A follow-up patchset will integrate drop monitor with devlink and allow the latter to call into drop monitor to report hardware drops. In the future, XDP drops can be added as well, thereby making drop monitor the go-to netlink channel for diagnosing all packet drops. Example usage with patched dropwatch [1] can be found here [2]. Example dissection of drop monitor netlink events with patched wireshark [3] can be found here [4]. I will submit both changes upstream after the kernel changes are accepted. Another change worth making is adding a dropmon pseudo interface to libpcap, similar to the nflog interface [5]. This will allow users to specifically listen on dropmon traffic instead of capturing all netlink packets via the nlmon netdev. Patches #1-#5 prepare the code towards the actual changes in later patches. Patch #6 adds another mode of operation to drop monitor in which the dropped packet itself is notified to user space along with metadata. Patch #7 allows users to truncate reported packets to a specific length, in case only the headers are of interest. The original length of the packet is added as metadata to the netlink notification. Patch #8 allows user to query the current configuration of drop monitor (e.g., alert mode, truncation length). Patches #9-#10 allow users to tune the length of the per-CPU skb drop list according to their needs. Changes since v1 [6]: * Add skb protocol as metadata. This allows user space to correctly dissect the packet instead of blindly assuming it is an Ethernet packet Changes since RFC [7]: * Limit the length of the per-CPU skb drop list and make it configurable * Do not use the hysteresis timer in packet alert mode * Introduce alert mode operations in a separate patch and only then introduce the new alert mode * Use 'skb->skb_iif' instead of 'skb->dev' because the latter is inside a union with 'dev_scratch' and therefore not guaranteed to point to a valid netdev * Return '-EBUSY' instead of '-EOPNOTSUPP' when trying to configure drop monitor while it is monitoring * Did not change schedule_work() in favor of schedule_work_on() as I did not observe a change in number of tail drops [1] https://github.com/idosch/dropwatch/tree/packet-mode [2] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile1-txt [3] https://github.com/idosch/wireshark/tree/drop-monitor-v2 [4] https://gist.github.com/idosch/3d524b887e16bc11b4b19e25c23dcc23#file-gistfile2-txt [5] https://github.com/the-tcpdump-group/libpcap/blob/master/pcap-netfilter-linux.c [6] https://patchwork.ozlabs.org/cover/1143443/ [7] https://patchwork.ozlabs.org/cover/1135226/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:31 -07:00
Ido Schimmel	e9feb58020	drop_monitor: Expose tail drop counter Previous patch made the length of the per-CPU skb drop list configurable. Expose a counter that shows how many packets could not be enqueued to this list. This allows users determine the desired queue length. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	30328d46af	drop_monitor: Make drop queue length configurable In packet alert mode, each CPU holds a list of dropped skbs that need to be processed in process context and sent to user space. To avoid exhausting the system's memory the maximum length of this queue is currently set to 1000. Allow users to tune the length of this queue according to their needs. The configured length is reported to user space when drop monitor configuration is queried. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	444be061d0	drop_monitor: Add a command to query current configuration Users should be able to query the current configuration of drop monitor before they start using it. Add a command to query the existing configuration which currently consists of alert mode and packet truncation length. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	57986617a7	drop_monitor: Allow truncation of dropped packets When sending dropped packets to user space it is not always necessary to copy the entire packet as usually only the headers are of interest. Allow user to specify the truncation length and add the original length of the packet as additional metadata to the netlink message. By default no truncation is performed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	ca30707dee	drop_monitor: Add packet alert mode So far drop monitor supported only one alert mode in which a summary of locations in which packets were recently dropped was sent to user space. This alert mode is sufficient in order to understand that packets were dropped, but lacks information to perform a more detailed analysis. Add a new alert mode in which the dropped packet itself is passed to user space along with metadata: The drop location (as program counter and resolved symbol), ingress netdevice and drop timestamp. More metadata can be added in the future. To avoid performing expensive operations in the context in which kfree_skb() is invoked (can be hard IRQ), the dropped skb is cloned and queued on per-CPU skb drop list. Then, in process context the netlink message is allocated, prepared and finally sent to user space. The per-CPU skb drop list is limited to 1000 skbs to prevent exhausting the system's memory. Subsequent patches will make this limit configurable and also add a counter that indicates how many skbs were tail dropped. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	28315f7999	drop_monitor: Add alert mode operations The next patch is going to add another alert mode in which the dropped packet is notified to user space, instead of only a summary of recent drops. Abstract the differences between the modes by adding alert mode operations. The operations are selected based on the currently configured mode and associated with the probes and the work item just before tracing starts. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	c5ab9b1c41	drop_monitor: Require CAP_NET_ADMIN for drop monitor configuration Currently, the configure command does not do anything but return an error. Subsequent patches will enable the command to change various configuration options such as alert mode and packet truncation. Similar to other netlink-based configuration channels, make sure only users with the CAP_NET_ADMIN capability set can execute this command. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	44075f5637	drop_monitor: Reset per-CPU data before starting to trace The function reset_per_cpu_data() allocates and prepares a new skb for the summary netlink alert message ('NET_DM_CMD_ALERT'). The new skb is stored in the per-CPU 'data' variable and the old is returned. The function is invoked during module initialization and from the workqueue, before an alert is sent. This means that it is possible to receive an alert with stale data, if we stopped tracing when the hysteresis timer ('data->send_timer') was pending. Instead of invoking the function during module initialization, invoke it just before we start tracing and ensure we get a fresh skb. This also allows us to remove the calls to initialize the timer and the work item from the module initialization path, since both could have been triggered by the error paths of reset_per_cpu_data(). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	70c69274f3	drop_monitor: Initialize timer and work item upon tracing enable The timer and work item are currently initialized once during module init, but subsequent patches will need to associate different functions with the work item, based on the configured alert mode. Allow subsequent patches to make that change by initializing and de-initializing these objects during tracing enable and disable. This also guarantees that once the request to disable tracing returns, no more netlink notifications will be generated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
Ido Schimmel	7c747838a5	drop_monitor: Split tracing enable / disable to different functions Subsequent patches will need to enable / disable tracing based on the configured alerting mode. Reduce the nesting level and prepare for the introduction of this functionality by splitting the tracing enable / disable operations into two different functions. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-11 10:53:30 -07:00
David S. Miller	2cc2743d8f	Merge branch 'Networking-driver-debugfs-cleanups' Greg Kroah-Hartman says: ==================== Networking driver debugfs cleanups There is no need to test the result of any debugfs call anymore. The debugfs core warns the user if something fails, and the return value of a debugfs call can always be fed back into another debugfs call with no problems. Also, debugfs is for debugging, so if there are problems with debugfs (i.e. the system is out of memory) the rest of the kernel should not change behavior, so testing for debugfs calls is pointless and not the goal of debugfs at all. This series cleans up a lot of networking drivers and some wimax code that was calling debugfs and trying to do something with the return value that it didn't need to. Removing this logic makes the code smaller, easier to understand, and use less run-time memory in some cases, all good things. The series is against net-next, and have no dependancies between any of them if they want to go through any random tree/order. Or, if wanted, I can take them through my driver-core tree where other debugfs cleanups are being slowly fed during major merge windows. v3: fix build warning in i2400m, I thought I had caught them all :( add acks from some reviewers v2: fix up build warnings, it's as if I never even built these. Ugh, so sorry for wasting people's time with the v1 series. I need to stop relying on 0-day as it isn't working well anymore :( ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:49 -07:00
Greg Kroah-Hartman	7e174a49bb	ieee802154: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Alexander Aring <alex.aring@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Harry Morris <h.morris@cascoda.com> Cc: linux-wpan@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	35dc61ebfc	ixgbe: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	43c4eb0381	i40e: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00
Greg Kroah-Hartman	ecc5570751	fm10k: no need to check return value of debugfs_create functions When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-08-10 15:25:48 -07:00

1 2 3 4 5 ...

856914 Commits