linux

Author	SHA1	Message	Date
Nogah Frankel	2b77958bf4	mlxsw: resources: Add max cpu policers resource Add a new resource to the resources query: max cpu policers, which tells us how many policers can be used to limit the data rate to the CPU port. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	117b0dad2d	mlxsw: Create a different trap group list for each device Trap groups can be used to control the traps' priority, both in terms of which trap "wins" if a packet matches two traps (priority) and in terms of which trap group's packets will be scheduled to the CPU first (tc). They can also have rate limiters (policers) set on them (these will be added in the next patches). Currently, we support two trap groups. In Spectrum we want a finer resolution, so every protocol / flow will have a different trap group and we can control its parameters separately. Once the policers are implemented, this will also allow us to limit the rate of each protocol by itself. This patch changes the trap group list to include: * the emad trap group, which is shared by all the devices. * Switchx2's trap groups, which are a copy of the current trap groups. * Spectrum's new trap groups, in order to match the above guidelines. (Switchib uses only the emad trap group, so it requires no changes.) This patch also includes new configuration for Spectrum's trap groups, with primary priority order within them. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	616d8040e5	mlxsw: spectrum: Add BGP trap Add a trap for the BGP protocol, which was previously trapped by the generic IP2ME trap. This trap will allow us to have better control (over priority and rate) of the traffic. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	579c82e4c5	mlxsw: Change trap groups setting Trap groups have many options, which we currently set to default values. In the next patches we will use many of them with non-default values. Some of these options have no default value, so this patch makes them parameters of the trap group set function. Others almost always use the same values, so the set function will use these default values. In the rare cases where other values are needed, they can be set directly (using the macros for fields in registers). Parameters without a default value: TC - the traffic class for packets that hit this trap group (old default is the max tc). priority - if one packet hits multiple trap groups, the group with the higher priority will "catch" it (old default is 0). policer - rate-limiting policer (old default is disabled). Default parameters: swid - switch id, relevant for the emad trap only, ignored on Spectrum (new default is 0). rdq - CPU receive descriptor queue (new default is identical to the trap group id). Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
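The split between "parameters without a default" and "default parameters" can be sketched as a pack function that takes the former as arguments and fills in the latter. This is a hypothetical in-memory model, not the mlxsw HTGT register layout; `struct trap_group_cfg` and `trap_group_pack()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the trap group register; field names
 * follow the commit message, not the real mlxsw register layout. */
struct trap_group_cfg {
	uint8_t group_id;
	uint8_t swid;        /* defaulted: 0 */
	uint8_t rdq;         /* defaulted: same as the trap group id */
	uint8_t tc;          /* no default: caller chooses */
	uint8_t priority;    /* no default: higher value "catches" the packet */
	bool policer_enable; /* no default: caller chooses */
	uint16_t policer_id; /* meaningful only when policer_enable is set */
};

/* Options without a sensible default are arguments; the rest get
 * defaults that a caller may later override field by field. */
static void trap_group_pack(struct trap_group_cfg *cfg, uint8_t group_id,
			    bool policer_enable, uint16_t policer_id,
			    uint8_t priority, uint8_t tc)
{
	cfg->group_id = group_id;
	cfg->swid = 0;
	cfg->rdq = group_id;
	cfg->tc = tc;
	cfg->priority = priority;
	cfg->policer_enable = policer_enable;
	cfg->policer_id = policer_enable ? policer_id : 0;
}
```

A rare caller that needs a different rdq would call `trap_group_pack()` and then assign `cfg.rdq` directly, mirroring the "set directly using the field macros" escape hatch described above.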
Nogah Frankel	23432cb86a	mlxsw: resources: Add max trap groups resource Add the max number of trap groups to resource query. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	9d87fceac6	mlxsw: core: Change emad trap group settings Currently, the emad trap init is done in the core. In the future we will want to make some changes to the trap groups, according to device type. This commit creates a driver function to create the trap group for the emad, so later it can be changed by devices. It also changes the emad registration to use the new generic functions. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	0fb78a4e9c	mlxsw: Add option to choose trap group Currently, we set the trap group to a predetermined option, based on whether it is an rx or event trap. This commit adds the possibility to choose the trap group, so it can be set to different values in the following patches. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	d570b7ee4e	mlxsw: Change trap set function Change the trap setting function so that instead of determining the trap group by trap id, it gets it as a parameter (so later we can have different trap groups for Spectrum and Switchx2). Add an "is_ctrl" parameter to the trap setting function. It controls whether the trapped packets wait in a designated control buffer or in their default one. This parameter is ignored by Switchx2 and Switchib. Add these parameters to the traps array in Spectrum, Switchx2 and Switchib. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	85d5c9cd90	mlxsw: switchib: Use generic listener struct for events Change the event handling in Switchib to be compatible with Spectrum and Switchx2. Use the generic listener struct for the events. Init and fini them in a loop (and not by calling each event by its name). Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	6bf08b53ed	mlxsw: switchx2: Use generic listener struct for events Change the events to use the generic listener struct. Merge the event list into the trap list, so the same functions will handle both. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	4544913ed7	mlxsw: spectrum: Use generic listener struct for events Change the events to use the generic listener struct. Merge the event list into the trap list, so the same functions will handle both. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	fb9012d93f	mlxsw: core: Introduce generic macro for event Create a macro for creating the generic listener struct for events, similar to the one for rx traps. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	2332d8c7df	mlxsw: switchx2: Use generic listener struct for rx traps Reorganize the traps to use the new generic listener struct and functions. Use macros to shorten the traps list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	14eeda99c4	mlxsw: spectrum: Use generic listener struct for rx traps Replace the old rx listener struct definitions with the generic ones. Use the new generic registering / unregistering functions for them. Add some macros to organize the trap list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	b63da93de8	mlxsw: core: Expose generic macros for rx trap In Spectrum, there is a macro to arrange the traps list. This macro is useful for everyone who is using rx traps. Create a similar macro in core.h for creating the generic listener struct for rx traps. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Nogah Frankel	0791051c43	mlxsw: core: Create a generic function to register / unregister traps We have 2 types of HW traps to handle: rx traps and events. The registration workflow for both is very similar, so it only makes sense to create one function to handle both. This patch creates a struct to hold the data for both cases. It also creates registration and unregistration functions that take this generic struct as input. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
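A single listener type covering both rx traps and events, registered by one loop, might look like the sketch below. Everything here is hypothetical: `struct listener`, the toy `registered[]` table and the rollback-on-failure loop illustrate the idea and are not the mlxsw core definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical generic listener: one struct for rx traps and events,
 * distinguished by is_event; only the callback type differs. */
struct listener {
	unsigned int trap_id;
	bool is_event;
	union {
		void (*rx_func)(void *pkt, void *priv);
		void (*event_func)(const char *payload, void *priv);
	} u;
};

#define MAX_TRAP_ID 256

/* Toy table standing in for the device's registration state. */
static const struct listener *registered[MAX_TRAP_ID];

static int listener_register(const struct listener *l)
{
	if (l->trap_id >= MAX_TRAP_ID || registered[l->trap_id])
		return -1;
	registered[l->trap_id] = l; /* same path for traps and events */
	return 0;
}

static void listener_unregister(const struct listener *l)
{
	if (l->trap_id < MAX_TRAP_ID && registered[l->trap_id] == l)
		registered[l->trap_id] = NULL;
}

/* Register an array of listeners, undoing earlier ones on failure. */
static int listeners_register(const struct listener *ls, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (listener_register(&ls[i]) != 0) {
			while (i--)
				listener_unregister(&ls[i]);
			return -1;
		}
	}
	return 0;
}
```

Keeping traps and events in one array is what lets the later Spectrum and Switchx2 patches "merge the event list into the trap list, so the same functions will handle both".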
Nogah Frankel	ee4a60d898	mlxsw: spectrum: Remove unused traps Since commit `99724c18fc` ("mlxsw: spectrum: Introduce support for router interfaces") we no longer rely on flooding traffic to the CPU in order to trap packets intended for the host itself. Therefore, the FDB MC trap can be removed. Remove traps for protocols that are not supported yet. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 21:22:14 -05:00
Dan Carpenter	eafa6abd99	net/mlx5: remove a duplicate condition We verified that MLX5_FLOW_CONTEXT_ACTION_COUNT was set on the first line of the function so we don't need to check again here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 20:28:28 -05:00
David S. Miller	0ebc5b62a3	Merge branch 'thunderx-new-features' Sunil Goutham says: ==================== net: thunderx: Support for 80xx, RED, PFC, etc. This patch series adds support for SLM modules present on 80xx silicon, enables random early discard, backpressure generation and PFC, and makes some ethtool changes to display supported link modes, etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 20:21:24 -05:00
Sunil Goutham	430da20808	net: thunderx: Pause frame support Enable pause frames on both the Rx and Tx side, configure the pause interval, etc. Support for enabling/disabling pause frames on Rx/Tx via ethtool has also been added. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 20:21:17 -05:00
Sunil Goutham	d5b2d7a718	net: thunderx: Configure RED and backpressure levels This patch enables moving average calculation of Rx packet resources and configures RED and backpressure levels for both CQ and RBDR. Also initialize SQ's CQ_LIMIT properly. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 20:21:17 -05:00
Thanneeru Srinivasulu	1cc702591b	net: thunderx: Add ethtool support for supported ports and link modes. Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 20:21:17 -05:00
Sunil Goutham	5271156b1a	net: thunderx: 80xx BGX0 configuration changes On 80xx only one lane of DLM0 and DLM1 (of BGX0) can be used, so even though the LMAC count may be 2, LMAC1 should use the SerDes lane of DLM1. Since 80xx cannot be distinguished from 81xx (their PCI device IDs are the same), this patch adds this configuration support by relying on how the firmware configures the LMACs. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 20:21:17 -05:00
Woojung Huh	a7dac9f9c1	phy: fix error case of phy_led_triggers_(un)register When phy_init_hw() fails in phy_attach_direct(): - phy_detach() calls phy_led_triggers_unregister() without a previous call to phy_led_triggers_register(); - phy_led_triggers_register() is still called, causing a memory leak. Fixes: `2e0bc452f4` ("net: phy: leds: add support for led triggers on phy link state change") Signed-off-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 19:57:57 -05:00
Eric Dumazet	f52dffe049	net: properly flush delay-freed skbs Typical NAPI drivers use napi_consume_skb(skb) at TX completion time. This puts the skb in a special percpu queue, napi_alloc_cache, to get bulk frees. It turns out the queue is not flushed and hits the NAPI_SKB_CACHE_SIZE limit quite often, with skbs that were queued hundreds of usec earlier. I measured that one flush can take ~6000 nsec. __kfree_skb_flush() can be called from two places right now: 1) From net_tx_action(), but only for skbs that were queued to sd->completion_queue. -> Irrelevant for NAPI drivers in normal operation. 2) From net_rx_action(), but only under high stress or if RPS/RFS has a pending action. This patch changes net_rx_action() to perform the flush in all cases, after more urgent operations have happened (like kicking remote CPUs for RPS/RFS). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 19:37:49 -05:00
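The difference between "flush only when the cache is full" and "also flush at the end of every poll round" can be modelled in user space. This is a hypothetical sketch: `struct skb_cache`, `cache_consume()` and `cache_flush()` are illustrative names, the cache size of 64 is an assumption for illustration, and `cache_bulk_free()` only counts objects where the kernel would call kmem_cache_free_bulk().

```c
#include <assert.h>
#include <stddef.h>

#define SKB_CACHE_SIZE 64 /* assumed size, for illustration only */

/* Hypothetical model of a per-CPU deferred-free cache. */
struct skb_cache {
	void *slots[SKB_CACHE_SIZE];
	size_t count;
	size_t bulk_frees; /* number of bulk free operations */
	size_t freed;      /* objects handed to the bulk free */
};

/* Stand-in for the real bulk free: just accounts and empties. */
static void cache_bulk_free(struct skb_cache *c)
{
	if (c->count == 0)
		return;
	c->freed += c->count;
	c->bulk_frees++;
	c->count = 0;
}

/* TX completion path: defer the free, flush only when full. */
static void cache_consume(struct skb_cache *c, void *skb)
{
	c->slots[c->count++] = skb;
	if (c->count == SKB_CACHE_SIZE)
		cache_bulk_free(c);
}

/* End-of-poll flush: without it, a few queued objects can sit in the
 * cache until enough later completions arrive to fill it. */
static void cache_flush(struct skb_cache *c)
{
	cache_bulk_free(c);
}
```

With only three completions and no end-of-round flush, those three objects would stay queued until 61 more arrived; calling `cache_flush()` every round bounds that latency while still batching frees.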
David S. Miller	ca89fa77b4	Merge branch 'cgroup-bpf' Daniel Mack says: ==================== Add eBPF hooks for cgroups This is v9 of the patch set to allow eBPF programs for network filtering and accounting to be attached to cgroups, so that they apply to all sockets of all tasks placed in that cgroup. The logic can also be extended to other cgroup-based eBPF logic. Again, only minor details are updated in this version. Changes from v8: * Move the egress hooks into ip_finish_output() and ip6_finish_output() so they run after the netfilter hooks. For IPv4 multicast, add a new ip_mc_finish_output() callback that is invoked on success by netfilter, and call the eBPF program from there. Changes from v7: * Replace the static inline function cgroup_bpf_run_filter() with two specific macros for ingress and egress. This addresses David Miller's concern regarding skb->sk vs. sk in the egress path. Thanks a lot to Daniel Borkmann and Alexei Starovoitov for the suggestions. Changes from v6: * Rebased to 4.9-rc2 * Add EXPORT_SYMBOL(__cgroup_bpf_run_filter). The kbuild test robot now succeeds in building this version of the patch set. * Switch from bpf_prog_run_save_cb() to bpf_prog_run_clear_cb() to not tamper with the contents of skb->cb[]. Pointed out by Daniel Borkmann. * Use sk_to_full_sk() in the egress path, as suggested by Daniel Borkmann. * Renamed BPF_PROG_TYPE_CGROUP_SOCKET to BPF_PROG_TYPE_CGROUP_SKB, as requested by David Ahern. * Added Alexei's Acked-by tags. Changes from v5: * The eBPF programs now operate on L3 rather than on L2 of the packets, and the egress hooks were moved from __dev_queue_xmit() to ip_output(). For BPF_PROG_TYPE_CGROUP_SOCKET, disallow direct access to the skb through BPF_LD_[ABS|IND] instructions, but hook up the bpf_skb_load_bytes() access helper instead. Thanks to Daniel Borkmann for the help. Changes from v4: * Plug an skb leak when dropping packets due to eBPF verdicts in __dev_queue_xmit(). Spotted by Daniel Borkmann. 
* Check for sk_fullsock(sk) in __cgroup_bpf_run_filter() so we don't operate on timewait or request sockets. Suggested by Daniel Borkmann. * Add missing @parent parameter in kerneldoc of __cgroup_bpf_update(). Spotted by Rami Rosen. * Include linux/jump_label.h from bpf-cgroup.h to fix a kbuild error. Changes from v3: * Dropped the _FILTER suffix from BPF_PROG_TYPE_CGROUP_SOCKET_FILTER, renamed BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS to BPF_CGROUP_INET_{IN,E}GRESS and alias BPF_MAX_ATTACH_TYPE to __BPF_MAX_ATTACH_TYPE, as suggested by Daniel Borkmann. * Dropped the attach_flags member from the anonymous struct for BPF attach operations in union bpf_attr. They can be added later on via CHECK_ATTR. Requested by Daniel Borkmann and Alexei. * Release old_prog at the end of __cgroup_bpf_update rather than at the beginning to fix a race gap between program updates and their users. Spotted by Daniel Borkmann. * Plugged an skb leak when dropping packets on the egress path. Spotted by Daniel Borkmann. * Add cgroups@vger.kernel.org to the loop, as suggested by Rami Rosen. * Some minor coding style adaptations not worth mentioning in particular. Changes from v2: * Fixed the RCU locking details Tejun pointed out. * Assert bpf_attr.flags == 0 in BPF_PROG_DETACH syscall handler. Changes from v1: * Moved all bpf specific cgroup code into its own file, and stub out related functions for !CONFIG_CGROUP_BPF as static inline nops. This way, the call sites are not cluttered with #ifdef guards while the feature remains compile-time configurable. * Implemented the new scheme proposed by Tejun. Per cgroup, store one set of pointers that are pinned to the cgroup, and one for the programs that are effective. When a program is attached or detached, the change is propagated to all the cgroup's descendants. If a subcgroup has its own pinned program, skip the whole subbranch in order to allow delegation models. * The hookup for egress packets is now done from __dev_queue_xmit(). 
* A static key is now used in both the ingress and egress fast paths to keep performance penalties close to zero if the feature is not in use. * Overall cleanup to make the accessors use the program arrays. This should make it much easier to add new program types, which will then automatically follow the pinned vs. effective logic. * Fixed locking issues, as pointed out by Eric Dumazet and Alexei Starovoitov. Changes to the program array are now done with xchg() and are protected by cgroup_mutex. * eBPF programs are now expected to return 1 to let the packet pass, not >= 0. Pointed out by Alexei. * Operation is now limited to INET sockets, so local AF_UNIX sockets are not affected. The enum members are renamed accordingly. In case other socket families should be supported, this can be extended in the future. * The sample program learned to support both ingress and egress, and can now optionally make the eBPF program drop packets by making it return 0. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:26:12 -05:00
Daniel Mack	d8c5b17f2b	samples: bpf: add userspace example for attaching eBPF programs to cgroups Add a simple userspace program to demonstrate the new API to attach eBPF programs to cgroups. This is what it does: * Create an arraymap in the kernel with 4 byte keys and 8 byte values * Load the eBPF program The eBPF program accesses the map passed in to store two pieces of information. The number of invocations of the program, which maps to the number of packets received, is stored to key 0. Key 1 is incremented on each iteration by the number of bytes stored in the skb. * Detach any eBPF program previously attached to the cgroup * Attach the new program to the cgroup using BPF_PROG_ATTACH * Once a second, read map[0] and map[1] to see how many packets and bytes were seen on any socket of tasks in the given cgroup. The program takes a cgroup path as the first argument, and either "ingress" or "egress" as the second. Optionally, "drop" can be passed as the third argument, which will make the generated eBPF program return 0 instead of 1, so the kernel will drop the packet. libbpf gained two new wrappers for the new syscall commands. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:26:04 -05:00
Daniel Mack	33b486793c	net: ipv4, ipv6: run cgroup eBPF egress programs If the cgroup associated with the sending socket has eBPF programs installed, run them from ip_output(), ip6_output() and ip_mc_output(). From the mentioned functions we have two socket contexts as per `7026b1ddb6` ("netfilter: Pass socket pointer down through okfn()."). We explicitly need to use sk instead of skb->sk here, since otherwise the same program would run multiple times on egress when encap devices are involved, which is not desired in our case. eBPF programs used in this context are expected to either return 1 to let the packet pass, or any other value to drop it. The programs have access to the skb through bpf_skb_load_bytes(), and the payload starts at the network headers (L3). Note that cgroup_bpf_run_filter() is stubbed out as a static inline nop for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the feature is unused. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:26:04 -05:00
Daniel Mack	c11cd3a6ec	net: filter: run cgroup eBPF ingress programs If the cgroup associated with the receiving socket has eBPF programs installed, run them from sk_filter_trim_cap(). eBPF programs used in this context are expected to either return 1 to let the packet pass, or any other value to drop it. The programs have access to the skb through bpf_skb_load_bytes(), and the payload starts at the network headers (L3). Note that cgroup_bpf_run_filter() is stubbed out as a static inline nop for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the feature is unused. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:26:04 -05:00
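The verdict convention ("return 1 to pass, anything else drops"; the v1 notes say it was changed from ">= 0") is easy to get wrong, so here is a small user-space model of it. `run_cgroup_filter()` and the three sample programs are hypothetical; reporting a drop as `-EPERM` is an assumption about how the hook signals it, not a quote of the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an attached program: returns a verdict. */
typedef unsigned int (*cg_prog_fn)(const void *skb);

static unsigned int prog_pass(const void *skb) { (void)skb; return 1; }
static unsigned int prog_drop(const void *skb) { (void)skb; return 0; }
static unsigned int prog_two(const void *skb)  { (void)skb; return 2; }

/* Only a verdict of exactly 1 lets the packet through; 0, 2 or any
 * other value drops it. No attached program means the packet passes.
 * Returning -EPERM for a drop is an assumption of this sketch. */
static int run_cgroup_filter(cg_prog_fn prog, const void *skb)
{
	if (!prog)
		return 0;
	return prog(skb) == 1 ? 0 : -EPERM;
}
```

Note that `prog_two` is dropped: under the older ">= 0 passes" rule it would have been accepted.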
Daniel Mack	f432455148	bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands Extend the bpf(2) syscall with two new commands, BPF_PROG_ATTACH and BPF_PROG_DETACH, which allow attaching eBPF programs to, and detaching them from, a target. On the API level, the target could be anything that has an fd in userspace, hence the field in union bpf_attr is named 'target_fd'. When called with BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS, the target is expected to be a valid file descriptor of a cgroup v2 directory which has the bpf controller enabled. These are the only use-cases implemented by this patch at this point, but more can be added. If a program of the given type already exists in the given cgroup, the program is swapped atomically, so userspace does not have to drop an existing program first before installing a new one, which would otherwise leave a gap in which no program is attached. For more information on the propagation logic to subcgroups, please refer to the bpf cgroup controller implementation. The API is guarded by CAP_NET_ADMIN. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:26:04 -05:00
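The "swapped atomically, no gap" property can be shown with a single atomic exchange on the attachment slot: readers always see either the old or the new program, never NULL. This is a user-space sketch with C11 atomics; the merge notes say the kernel side uses xchg() under cgroup_mutex and releases the old program only at the end of the update. `struct attach_slot` and `slot_replace()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded eBPF program. */
struct prog {
	int id;
};

/* One attachment slot per (cgroup, attach type). */
struct attach_slot {
	_Atomic(struct prog *) prog;
};

/* Install new_prog in one atomic step and return the previous program,
 * which the caller should release only once no reader can still be
 * using it (the kernel defers this; the sketch leaves it to the caller). */
static struct prog *slot_replace(struct attach_slot *slot,
				 struct prog *new_prog)
{
	return atomic_exchange(&slot->prog, new_prog);
}
```

Detaching is the same operation with `new_prog == NULL`.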
Daniel Mack	3007098494	cgroup: add support for eBPF programs This patch adds two sets of eBPF program pointers to struct cgroup: one for programs that are directly pinned to a cgroup, and one for programs that are effective for it. To illustrate the logic behind that, assume the following example cgroup hierarchy: A - B - C, where B also has a child D, which in turn has a child E. If only B has a program attached, it will be effective for B, C, D and E. If D then attaches a program itself, that will be effective for both D and E, and the program in B will only affect B and C. Only one program of a given type is effective for a cgroup. Attaching and detaching programs will be done through the bpf(2) syscall. For now, ingress and egress inet socket filtering are the only supported use-cases. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:25:52 -05:00
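The pinned vs. effective rule reduces to "the effective program is the one pinned at the nearest cgroup on the path to the root, including the cgroup itself". The kernel precomputes effective pointers and propagates them on attach/detach; the hypothetical sketch below computes the same answer lazily by walking parents, using the A/B/C/D/E example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cgroup node: only the pinned program is stored. */
struct cg {
	struct cg *parent;
	int pinned; /* program id attached directly here, 0 if none */
};

/* Effective program: the nearest pinned one on the way to the root.
 * A pinned program in a subtree therefore shadows any ancestor's. */
static int effective_prog(const struct cg *c)
{
	for (; c; c = c->parent)
		if (c->pinned)
			return c->pinned;
	return 0;
}
```

Precomputing, as the patch does, keeps the packet path at a single pointer load instead of a walk up the hierarchy.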
Daniel Mack	0e33661de4	bpf: add new prog type for cgroup socket filtering This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that it does not allow BPF_LD_[ABS|IND] instructions and hooks up the bpf_skb_load_bytes() helper. Programs of this type will be attached to cgroups for network filtering and accounting. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:25:52 -05:00
Colin Ian King	619228d86b	cxgb4: fix memory leak on txq_info Currently if txq_info->uldtxq cannot be allocated then txq_info->txq is being kfree'd (which is redundant because it is NULL) instead of txq_info. Fix this by instead kfree'ing txq_info. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-25 16:09:50 -05:00
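The bug pattern is a two-level allocation whose error path frees the (NULL) inner pointer instead of the outer object. The user-space sketch below shows the corrected shape with calloc()/free() in place of the kernel allocators; `struct txq_info`, `struct txq` and both functions are hypothetical simplifications, not the cxgb4 definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-queue element. */
struct txq {
	int id;
};

/* Hypothetical container with an inner array, as in the fix. */
struct txq_info {
	struct txq *uldtxq;
	int ntxq;
};

static struct txq_info *txq_info_alloc(int ntxq)
{
	struct txq_info *txq_info;

	assert(ntxq > 0);
	txq_info = calloc(1, sizeof(*txq_info));
	if (!txq_info)
		return NULL;

	txq_info->uldtxq = calloc((size_t)ntxq, sizeof(*txq_info->uldtxq));
	if (!txq_info->uldtxq) {
		/* Free the container: freeing the NULL member here would
		 * be a no-op and leak txq_info. */
		free(txq_info);
		return NULL;
	}
	txq_info->ntxq = ntxq;
	return txq_info;
}

static void txq_info_free(struct txq_info *txq_info)
{
	if (!txq_info)
		return;
	free(txq_info->uldtxq);
	free(txq_info);
}
```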
Jason Wang	436accebb5	tuntap: remove unnecessary sk_receive_queue length check during xmit After commit `1576d98605` ("tun: switch to use skb array for tx"), sk_receive_queue is no longer used. So remove the unnecessary sk_receive_queue length check during xmit. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:06:56 -05:00
Alexei Starovoitov	db6a71dd9a	samples/bpf: fix bpf loader llvm can emit relocations into sections other than program code (like debug info sections). Ignore them while parsing the ELF file. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:04:52 -05:00
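One way to "ignore relocations into non-code sections" is to look at the section a SHT_REL section patches (found via its sh_info index) and only process it when that target is executable program bits. The helper below is a hypothetical sketch of such a check using the standard `<elf.h>` types; it is not a copy of the bpf_load.c change.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/* Return true only for relocation sections whose target section
 * (the caller looks it up through rel->sh_info) holds executable code.
 * Relocations against debug info or other data sections are skipped. */
static bool rel_targets_code(const Elf64_Shdr *rel, const Elf64_Shdr *target)
{
	if (rel->sh_type != SHT_REL)
		return false;
	return target->sh_type == SHT_PROGBITS &&
	       (target->sh_flags & SHF_EXECINSTR);
}
```

A `.debug_info` section is SHT_PROGBITS without SHF_EXECINSTR, so its relocations are skipped while those for program sections are still applied.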
Alexei Starovoitov	d2b024d32d	samples/bpf: fix sockex2 example Since llvm commit "Do not expand UNDEF SDNode during insn selection lowering", llvm will generate code that uses uninitialized registers in cases where the C code actually uses uninitialized data. So this sockex2 example is technically broken. Fix it by fully initializing the on-stack variable. Also increase the verifier buffer limit, since verifier output may not fit in 64k for this sockex2 code, depending on the llvm version. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:04:52 -05:00
Eric Dumazet	e3f42f8453	mlx4: reorganize struct mlx4_en_tx_ring The goal is to reorganize this critical structure to increase performance. ndo_start_xmit() should only dirty one cache line, and access as few cache lines as possible. Add an sp_ (Slow Path) prefix to fields that are not used in the fast path, to make clear what is going on. After this patch pahole reports something much better, as all fields needed by ndo_start_xmit() are packed into two cache lines instead of seven or eight:
struct mlx4_en_tx_ring {
  u32 last_nr_txbb; /* 0 0x4 */
  u32 cons; /* 0x4 0x4 */
  long unsigned int wake_queue; /* 0x8 0x8 */
  struct netdev_queue * tx_queue; /* 0x10 0x8 */
  u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x18 0x8 */
  struct mlx4_en_rx_ring * recycle_ring; /* 0x20 0x8 */
  /* XXX 24 bytes hole, try to pack */
  /* --- cacheline 1 boundary (64 bytes) --- */
  u32 prod; /* 0x40 0x4 */
  unsigned int tx_dropped; /* 0x44 0x4 */
  long unsigned int bytes; /* 0x48 0x8 */
  long unsigned int packets; /* 0x50 0x8 */
  long unsigned int tx_csum; /* 0x58 0x8 */
  long unsigned int tso_packets; /* 0x60 0x8 */
  long unsigned int xmit_more; /* 0x68 0x8 */
  struct mlx4_bf bf; /* 0x70 0x18 */
  /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
  __be32 doorbell_qpn; /* 0x88 0x4 */
  __be32 mr_key; /* 0x8c 0x4 */
  u32 size; /* 0x90 0x4 */
  u32 size_mask; /* 0x94 0x4 */
  u32 full_size; /* 0x98 0x4 */
  u32 buf_size; /* 0x9c 0x4 */
  void * buf; /* 0xa0 0x8 */
  struct mlx4_en_tx_info * tx_info; /* 0xa8 0x8 */
  int qpn; /* 0xb0 0x4 */
  u8 queue_index; /* 0xb4 0x1 */
  bool bf_enabled; /* 0xb5 0x1 */
  bool bf_alloced; /* 0xb6 0x1 */
  u8 hwtstamp_tx_type; /* 0xb7 0x1 */
  u8 * bounce_buf; /* 0xb8 0x8 */
  /* --- cacheline 3 boundary (192 bytes) --- */
  long unsigned int queue_stopped; /* 0xc0 0x8 */
  struct mlx4_hwq_resources sp_wqres; /* 0xc8 0x58 */
  /* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
  struct mlx4_qp sp_qp; /* 0x120 0x30 */
  /* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
  struct mlx4_qp_context sp_context; /* 0x150 0xf8 */
  /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
  cpumask_t sp_affinity_mask; /* 0x248 0x20 */
  enum mlx4_qp_state sp_qp_state; /* 0x268 0x4 */
  u16 sp_stride; /* 0x26c 0x2 */
  u16 sp_cqn; /* 0x26e 0x2 */
  /* size: 640, cachelines: 10, members: 36 */
  /* sum members: 600, holes: 1, sum holes: 24 */
  /* padding: 16 */
};
Instead of this silly placement:
struct mlx4_en_tx_ring {
  u32 last_nr_txbb; /* 0 0x4 */
  u32 cons; /* 0x4 0x4 */
  long unsigned int wake_queue; /* 0x8 0x8 */
  /* XXX 48 bytes hole, try to pack */
  /* --- cacheline 1 boundary (64 bytes) --- */
  u32 prod; /* 0x40 0x4 */
  /* XXX 4 bytes hole, try to pack */
  long unsigned int bytes; /* 0x48 0x8 */
  long unsigned int packets; /* 0x50 0x8 */
  long unsigned int tx_csum; /* 0x58 0x8 */
  long unsigned int tso_packets; /* 0x60 0x8 */
  long unsigned int xmit_more; /* 0x68 0x8 */
  unsigned int tx_dropped; /* 0x70 0x4 */
  /* XXX 4 bytes hole, try to pack */
  struct mlx4_bf bf; /* 0x78 0x18 */
  /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
  long unsigned int queue_stopped; /* 0x90 0x8 */
  cpumask_t affinity_mask; /* 0x98 0x10 */
  struct mlx4_qp qp; /* 0xa8 0x30 */
  /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
  struct mlx4_hwq_resources wqres; /* 0xd8 0x58 */
  /* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
  u32 size; /* 0x130 0x4 */
  u32 size_mask; /* 0x134 0x4 */
  u16 stride; /* 0x138 0x2 */
  /* XXX 2 bytes hole, try to pack */
  u32 full_size; /* 0x13c 0x4 */
  /* --- cacheline 5 boundary (320 bytes) --- */
  u16 cqn; /* 0x140 0x2 */
  /* XXX 2 bytes hole, try to pack */
  u32 buf_size; /* 0x144 0x4 */
  __be32 doorbell_qpn; /* 0x148 0x4 */
  __be32 mr_key; /* 0x14c 0x4 */
  void * buf; /* 0x150 0x8 */
  struct mlx4_en_tx_info * tx_info; /* 0x158 0x8 */
  struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */
  u32 (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168 0x8 */
  u8 * bounce_buf; /* 0x170 0x8 */
  struct mlx4_qp_context context; /* 0x178 0xf8 */
  /* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
  int qpn; /* 0x270 0x4 */
  enum mlx4_qp_state qp_state; /* 0x274 0x4 */
  u8 queue_index; /* 0x278 0x1 */
  bool bf_enabled; /* 0x279 0x1 */
  bool bf_alloced; /* 0x27a 0x1 */
  /* XXX 5 bytes hole, try to pack */
  /* --- cacheline 10 boundary (640 bytes) --- */
  struct netdev_queue * tx_queue; /* 0x280 0x8 */
  int hwtstamp_tx_type; /* 0x288 0x4 */
  /* size: 704, cachelines: 11, members: 36 */
  /* sum members: 587, holes: 6, sum holes: 65 */
  /* padding: 52 */
};
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:03:37 -05:00
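The same "hot fields first, sp_ slow-path fields after" idea can be pinned down at compile time with offsetof() and _Static_assert, so a later field addition that pushes a hot field past the first cache line breaks the build instead of silently costing an extra line. `struct tx_ring` below is a hypothetical, much smaller layout, and the 64-byte line size is an assumption; pahole remains the tool for inspecting the real struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical ring: transmit hot-path fields first, slow-path
 * configuration (sp_ prefix) starting on its own cache line. */
struct tx_ring {
	/* hot: touched on every transmit */
	uint32_t prod;
	uint32_t tx_dropped;
	unsigned long bytes;
	unsigned long packets;
	void *buf;
	uint32_t size_mask;
	uint32_t doorbell_qpn;
	/* slow path: setup and teardown only */
	_Alignas(CACHE_LINE) unsigned char sp_context[248];
	uint16_t sp_stride;
	uint16_t sp_cqn;
};

/* Build fails if the last hot field spills past the first line. */
_Static_assert(offsetof(struct tx_ring, doorbell_qpn) + sizeof(uint32_t)
		       <= CACHE_LINE,
	       "hot tx fields must fit in one cache line");
```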
Florian Fainelli	4b65246b42	ethtool: Protect {get, set}_phy_tunable with PHY device mutex PHY drivers should be able to rely on the caller of {get,set}_tunable having acquired the PHY device mutex, in order to serialize both against concurrent calls of these functions and against PHY state machine changes. All ethtool PHY-level functions do this except {get,set}_tunable, so we make them consistent here as well. We need to update the Microsemi PHY driver in the same commit to avoid introducing either deadlocks or a lack of proper locking. Fixes: `968ad9da7e` ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE") Fixes: `310d9ad57a` ("net: phy: Add downshift get/set support in Microsemi PHYs driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:02:32 -05:00
David S. Miller	fab96ec867	Merge branch 'mlx5-next' Saeed Mahameed says: ==================== Mellanox 100G mlx5 SRIOV switchdev update This series from Roi and Or further enhances the new SRIOV switchdev mode. Roi's patches allow users to configure through devlink the level of inline headers that the VF should be setting in order for the eswitch HW to do proper matching. We also enforce that the matching required for offloaded TC rules is aligned with that level on the PF driver. Or's patches allow the user to control the VF operational link state through admin directives on the mlx5 VF rep link. Also in this series is an implementation of HW and SW counters for the mlx5 VF rep, which is aligned with the design set by commit `a5ea31f573` 'Merge branch net-offloaded-stats'. v1 --> v2: * constified the net-device param of the get offloaded stats ndo in mlxsw (pointed out by 0-day screaming at us...) * added Or's Reviewed-by tags for Roi's patches This series was generated against commit `e796f49d82` ("net: ieee802154: constify ieee802154_ops structures") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:23 -05:00
Roi Dayan	de0af0bf64	net/mlx5e: Enforce min inline mode when offloading flows A flow should be offloaded only if the matches are allowed according to min inline mode. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
Roi Dayan	bffaa91658	net/mlx5: E-Switch, Add control for inline mode Implement devlink show and set of HW inline-mode. The supported modes: none, link, network, transport. We currently support one mode for all vports so set is done on all vports. When eswitch is first initialized the inline-mode is queried from the FW. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
Roi Dayan	34e4e99078	net/mlx5: Enable to query min inline for a specific vport Also move the inline capabilities enum to a shared header vport.h Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
Roi Dayan	59bfde01fa	devlink: Add E-Switch inline mode control Some HWs need the VF driver to put part of the packet headers on the TX descriptor so the e-switch can do proper matching and steering. The supported modes: none, link, network, transport. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
Or Gerlitz	20a1ea6747	net/mlx5e: Support VF vport link state control for SRIOV switchdev mode Reflect the administrative link changes done on the VF representor to the VF e-switch vport. This means that doing ip link set down/up commands on the VF rep will modify the e-switch vport state which in turn will make proper VF drivers set their carrier accordingly. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
Or Gerlitz	370bad0f9a	net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode Switchdev driver net-device port statistics should follow the model introduced in commit `a5ea31f573` 'Merge branch net-offloaded-stats'. For VF reps we return the SRIOV eswitch vport stats as the usual ones and SW stats if asked. For the PF, if we're in the switchdev mode, we return the uplink stats and SW stats if asked, otherwise as before. The uplink stats are implemented using the PPCNT 802_3 counters which are already being read/cached by the driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
Or Gerlitz	3df5b3c675	net: Add net-device param to the get offloaded stats ndo Some drivers would need to check a few internal matters for that. To be used in a downstream mlx5 commit. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 16:01:14 -05:00
David S. Miller	ac32378f3e	Merge branch 'phy-broadcom-wirespeed-downshift-support' Florian Fainelli says: ==================== net: phy: broadcom: Wirespeed/downshift support This patch series adds support for the Broadcom Wirespeed, aka downshift, feature utilizing the recently added ethtool PHY tunables. Tested with two Gigabit link partners with a 4-wire cable having only 2 pairs connected. Last patch in the series is a fix that was required for testing, which should make it to -stable, which I can submit separately against net if you prefer David. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 15:46:03 -05:00
Florian Fainelli	30ce0de435	net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change In case the link changes and EEE is enabled or disabled, always try to re-negotiate this with the link partner. Fixes: `450b05c15f` ("net: dsa: bcm_sf2: add support for controlling EEE") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 15:45:53 -05:00
Florian Fainelli	db88816ba2	net: phy: bcm7xxx: Add support for downshift/Wirespeed Add support for configuring the downshift/Wirespeed enable/disable toggles and specify a link retry value ranging from 1 to 9. Since the integrated BCM7xxx have issues when wirespeed is enabled and EEE is also enabled, we do disable EEE if wirespeed is enabled. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 15:45:53 -05:00
Florian Fainelli	99cec8a4dd	net: phy: broadcom: Allow enabling or disabling of EEE In preparation for adding support for Wirespeed/downshift, we need to change bcm_phy_eee_enable() to allow enabling or disabling EEE, so make the function take an extra enable/disable boolean parameter and rename it to illustrate it sets EEE, not necessarily just enables it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-11-24 15:45:53 -05:00

