linux

Author	SHA1	Message	Date
Ido Schimmel	2db9937804	mlxsw: spectrum_router: Direct macvlans' MACs to router An IP packet received on a netdev with a macvlan upper whose MAC matches the packet's destination MAC will be re-injected to the Rx path as if it was received by the macvlan, and perform an L3 lookup. Reflect this functionality to the ASIC by programming FDB entries that will direct MACs of macvlan uppers to the router. In a similar fashion to router interfaces (RIFs) that are programmed upon the addition of the first IP address on an interface and destroyed upon the removal of the last IP address, the FDB entries for the macvlan are added and destroyed based on the addition of the first and removal of the last IP address on the macvlan. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:26 -07:00
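The first-address/last-address rule described in the commit above can be sketched as a small counter model. This is only an illustration in Python, not the driver code: the class `MacvlanFdbTracker` and its methods are hypothetical names, and the real driver reacts to kernel address notifications rather than explicit calls.

```python
from collections import defaultdict


class MacvlanFdbTracker:
    """Toy model: keep an FDB entry while a macvlan has at least one IP."""

    def __init__(self):
        # Hypothetical bookkeeping: macvlan name -> number of IP addresses.
        self.addr_count = defaultdict(int)
        # MAC addresses currently directed to the router.
        self.fdb = set()

    def add_ip(self, macvlan, mac):
        self.addr_count[macvlan] += 1
        if self.addr_count[macvlan] == 1:
            # First address on the macvlan: program the FDB entry.
            self.fdb.add(mac)

    def del_ip(self, macvlan, mac):
        if self.addr_count[macvlan] == 0:
            raise ValueError("no address to remove")
        self.addr_count[macvlan] -= 1
        if self.addr_count[macvlan] == 0:
            # Last address removed: destroy the FDB entry.
            self.fdb.discard(mac)
```

Adding a second address leaves the entry untouched; only the transition between zero and one address changes what is programmed.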
Ido Schimmel	c55161852f	mlxsw: spectrum: Enable macvlan upper devices In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to be directed to the router we need to enable macvlan uppers on top of mlxsw netdevs. Allow macvlan upper devices on top of mlxsw netdevs and sanitize configurations that can't work. For example, a macvlan can't be enslaved to a bridge as without ACLs the device doesn't take the destination MAC into account when classifying a packet to a bridge instance (i.e., a FID). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:23:25 -07:00
Yafang Shao	ff0432e5a8	tcp: remove redundant rcv_nxt update tcp_rcv_nxt_update() is already executed in tcp_data_queue(). This line is redundant. See below, tcp_queue_rcv tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq); tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-14 11:21:40 -07:00
kbuild test robot	9cee8c4375	net: mvpp2: mvpp2_cls_flow_get() can be static Fixes: `f9358e12a0` ("net: mvpp2: split ingress traffic into multiple flows") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-13 20:21:56 -07:00
Linus Walleij	6eb9c9dafd	of: mdio: Support fixed links in of_phy_get_and_connect() By a simple extension of of_phy_get_and_connect() drivers that have a fixed link on e.g. RGMII can also support fixed links, so in addition to: ethernet-port { phy-mode = "rgmii"; phy-handle = <&foo>; }; This setup with a fixed-link node and no phy-handle will now also work just fine: ethernet-port { phy-mode = "rgmii"; fixed-link { speed = <1000>; full-duplex; pause; }; }; This is very helpful for connecting random ethernet ports to e.g. DSA switches that typically reside on fixed links. The phy-mode is still there as the fixed link in this case is still an RGMII link. Tested on the Cortina Gemini driver with the Vitesse DSA router chip on a fixed 1Gbit link. Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-13 18:25:14 -07:00
Vlad Buslov	01683a1469	net: sched: refactor flower walk to iterate over idr Extend struct tcf_walker with additional 'cookie' field. It is intended to be used by classifier walk implementations to continue iteration directly from particular filter, instead of iterating 'skip' number of times. Change flower walk implementation to save filter handle in 'cookie'. Each time flower walk is called, it looks up filter with saved handle directly with idr, instead of iterating over filter linked list 'skip' number of times. This change improves complexity of dumping flower classifier from quadratic to linearithmic. (assuming idr lookup has logarithmic complexity) Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reported-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-13 18:24:27 -07:00
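The complexity claim in the commit above can be illustrated with a small model that compares the two resume strategies. This is a sketch under assumptions: a sorted Python list with `bisect` stands in for the idr, and the function names `dump_with_skip` and `dump_with_cookie` are hypothetical.

```python
from bisect import bisect_right


def dump_with_skip(filters, skip, batch):
    """Old style: re-walk the list, stepping over entries already dumped."""
    steps = 0
    out = []
    for i, handle in enumerate(filters):
        steps += 1
        if i < skip:
            continue
        out.append(handle)
        if len(out) == batch:
            break
    return out, steps


def dump_with_cookie(sorted_handles, cookie, batch):
    """New style: resume right after the saved handle via an ordered lookup."""
    # bisect is O(log n) and stands in for the idr lookup here.
    start = bisect_right(sorted_handles, cookie)
    out = sorted_handles[start:start + batch]
    new_cookie = out[-1] if out else cookie
    return out, new_cookie
```

With 100 filters dumped in batches of 10, the skip variant visits 10 + 20 + ... + 100 = 550 entries in total, while the cookie variant performs one logarithmic lookup per batch.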
Nikolay Aleksandrov	c921c2077b	net: ipmr: add support for passing full packet on wrong vif This patch adds support for IGMPMSG_WRVIFWHOLE which is used to pass full packet and real vif id when the incoming interface is wrong. While the RP and FHR are setting up state we need to be sending the registers encapsulated with all the data inside otherwise we lose it. The RP then decapsulates it and forwards it to the interested parties. Currently with WRONGVIF we can only be sending empty register packets and will lose that data. This behaviour can be enabled by using MRT_PIM with val == IGMPMSG_WRVIFWHOLE. This doesn't prevent IGMPMSG_WRONGVIF from happening, it happens in addition to it, also it is controlled by the same throttling parameters as WRONGVIF (i.e. 1 packet per 3 seconds currently). Both messages are generated to keep backwards compatibility and avoid breaking someone who was enabling MRT_PIM with val == 4, since any positive val is accepted and treated the same. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-13 14:21:16 -07:00
Linus Walleij	430ac34de9	net: gemini: Indicate that we can handle jumboframes The hardware supposedly handles frames up to 10236 bytes and implements .ndo_change_mtu() so accept 10236 minus the ethernet header for a VLAN tagged frame on the netdevices. Use ETH_MIN_MTU as minimum MTU. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:39:15 -07:00
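The MTU bounds in the commit above work out as follows. This is a small arithmetic sketch: the 10236-byte figure comes from the commit message, the header sizes are the standard Ethernet and 802.1Q values, and it assumes nothing else (such as the FCS) is subtracted.

```python
ETH_HLEN = 14         # destination MAC + source MAC + EtherType
VLAN_HLEN = 4         # one 802.1Q tag
ETH_MIN_MTU = 68      # minimum MTU used as the lower bound
HW_MAX_FRAME = 10236  # frame size quoted in the commit message

# Largest payload that still fits once a VLAN-tagged Ethernet header is added.
max_mtu = HW_MAX_FRAME - (ETH_HLEN + VLAN_HLEN)


def mtu_is_valid(mtu):
    """Return True when the requested MTU lies inside the advertised range."""
    return ETH_MIN_MTU <= mtu <= max_mtu
```

Under these assumptions the upper bound is 10218 bytes, so a 9000-byte jumbo MTU is accepted and 10219 is rejected.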
Linus Walleij	06d5151312	net: gemini: Move main init to port The initialization sequence for the ethernet, setting up interrupt routing and such things, need to be done after both the ports are clocked and reset. Before this the config will not "take". Move the initialization to the port probe function and keep track of init status in the state. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:39:15 -07:00
Linus Walleij	60cc7767b9	net: gemini: Allow multiple ports to instantiate The code was not tested with two ports actually in use at the same time. (I blame this on lack of actual hardware using that feature.) Now after locating a system using both ports, add necessary fix to make both ports come up. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:39:15 -07:00
Linus Walleij	9ab5c929e6	net: gemini: Improve connection prints Switch over to using a module parameter and debug prints that can be controlled by this or ethtool like everyone else. Depromote all other prints to debug messages. The phy_print_status() was already in place, albeit never really used because the debuglevel hiding it had to be set up using ethtool. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:39:15 -07:00
Linus Walleij	cedca41801	net: gemini: Look up L3 maxlen from table The code to calculate the hardware register enumerator for the maximum L3 length isn't entirely simple to read. Use the existing defines and rewrite the function into a table look-up. Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:39:15 -07:00
David S. Miller	750c721ee0	Merge branch 'devlink-Add-support-for-region-access' Alex Vesker says: ==================== devlink: Add support for region access This is a proposal which will allow access to driver defined address regions using devlink. Each device can create its supported address regions and register them. A device which exposes a region will allow access to it using devlink. The suggested implementation will allow exposing regions to the user, reading and dumping snapshots taken from different regions. A snapshot represents a memory image of a region taken by the driver. If a device collects a snapshot of an address region it can be later exposed using devlink region read or dump commands. This functionality allows for future analyses on the snapshots to be done. The major benefit of this support is not only to provide access to internal address regions which were inaccessible to the user but also to provide an additional way to debug complex error states using the region snapshots. 
Implemented commands: $ devlink region help $ devlink region show [ DEV/REGION ] $ devlink region del DEV/REGION snapshot SNAPSHOT_ID $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ] $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length Show all of the exposed regions with region sizes: $ devlink region show pci/0000:00:05.0/cr-space: size 1048576 snapshot [1 2] pci/0000:00:05.0/fw-health: size 64 snapshot [1 2] Delete a snapshot using: $ devlink region del pci/0000:00:05.0/cr-space snapshot 1 Dump a snapshot: $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 Read a specific part of a snapshot: $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 For more information you can check devlink-region.8 man page Future: There is a plan to extend the support to include a write command as well as performing read and dump live region v1->v2: -Add a parameter to enable devlink region snapshot -Allocate snapshot memory using kvmalloc -Introduce destructor function devlink_snapshot_data_dest_t to avoid double allocation v2->v3: -Fix incorrect comment in devlink.h for DEVLINK_ATTR_REGION_SIZE from u32 to u64 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:14 -07:00
Alex Vesker	3c641ba4a8	net/mlx4_core: Use devlink region_snapshot parameter This parameter enables capturing region snapshot of the crspace during critical errors. The default value of this parameter is disabled, it can be enabled using devlink param commands. It is possible to configure during runtime and also driver init. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	f6a69885f2	devlink: Add generic parameters region_snapshot region_snapshot - When set enables capturing region snapshots Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	bedc989b0c	net/mlx4_core: Add Crdump FW snapshot support Crdump allows the driver to create a snapshot of the FW PCI crspace and health buffer during a critical FW issue. In case of a FW command timeout, FW getting stuck or a non zero value on the catastrophic buffer, a snapshot will be taken. The snapshot is exposed using devlink, cr-space, fw-health address regions are registered on init and snapshots are attached once a new snapshot is collected by the driver. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	523f9eb1ef	net/mlx4_core: Add health buffer address capability Health buffer address is a 32 bit PCI address offset provided by the FW. This offset is used for reading FW health debug data located on the shared CR space. Cr space is accessible in both driver and FW and allows for different queries and configurations. Health buffer size is always 64B of readable data followed by a lock which is used to block volatile CR space access. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	4e54795a27	devlink: Add support for region snapshot read command Add support for DEVLINK_CMD_REGION_READ_GET used for both reading and dumping region data. Read allows reading from a region specific address for given length. Dump allows reading the full region. If only snapshot ID is provided a snapshot dump will be done. If snapshot ID, Address and Length are provided a snapshot read will be done. This is used for both snapshot access and will be used in the same way to access current data on the region. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	866319bb94	devlink: Add support for region snapshot delete command Add support for DEVLINK_CMD_REGION_DEL used for deleting a snapshot from a region. The snapshot ID is required. Also added notification support for NEW and DEL of snapshots. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	a006d467fb	devlink: Extend the support querying for region snapshot IDs Extend the support for DEVLINK_CMD_REGION_GET command to also return the IDs of the snapshot currently present on the region. Each reply will include a nested snapshots attribute that can contain multiple snapshot attributes each with an ID. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	d8db7ea55f	devlink: Add support for region get command Add support for DEVLINK_CMD_REGION_GET command which is used for querying for the supported DEV/REGION values of devlink devices. The support is both for doit and dumpit. Reply includes: BUS_NAME, DEVICE_NAME, REGION_NAME, REGION_SIZE Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	d7e5272282	devlink: Add support for creating region snapshots Each device address region can store multiple snapshots, each snapshot is identified using a different numerical ID. This ID is used when deleting a snapshot or showing an address region specific snapshot. This patch exposes a callback to add a new snapshot to an address region. The snapshot will be deleted using the destructor function when destroying a region or when a snapshot delete command is received from the devlink user tool. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:13 -07:00
Alex Vesker	ccadfa444b	devlink: Add callback to query for snapshot id before snapshot create To restrict the driver with the snapshot ID selection a new callback is introduced for the driver to get the snapshot ID before creating a new snapshot. This will also allow giving the same ID for multiple snapshots taken of different regions at the same time. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:12 -07:00
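The ID-before-create flow described above can be modelled roughly as follows. This is a sketch, not the devlink API: `DevlinkModel` and its method names are hypothetical, and the kernel's actual ID allocation policy is not stated in the commit message.

```python
import itertools


class DevlinkModel:
    """Toy model of regions holding snapshots keyed by a shared ID."""

    def __init__(self):
        self._next_id = itertools.count(1)  # assumption: IDs simply count up
        self.regions = {}                   # region name -> {snapshot id: data}

    def region_create(self, name):
        self.regions[name] = {}

    def snapshot_id_get(self):
        # The driver asks for an ID first, then reuses it across regions.
        return next(self._next_id)

    def snapshot_create(self, region, snapshot_id, data):
        snaps = self.regions[region]
        if snapshot_id in snaps:
            raise ValueError("snapshot id already used in this region")
        snaps[snapshot_id] = data

    def snapshot_del(self, region, snapshot_id):
        del self.regions[region][snapshot_id]
```

Taking one ID and attaching it to both a `cr-space` and a `fw-health` snapshot matches the `snapshot [1 2]` listing shown for both regions in the cover letter above.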
Alex Vesker	b16ebe925a	devlink: Add support for creating and destroying regions This allows a device to register its supported address regions. Each address region can be accessed directly for example reading the snapshots taken of this address space. Drivers are not limited in the name selection for different regions. An example of a region-name can be: pci cr-space, register-space. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:37:12 -07:00
David S. Miller	23c9ef2b6e	Merge branch 'mvpp2-add-RSS-support' Maxime Chevallier says: ==================== net: mvpp2: add RSS support This series adds support for RSS on PPv2. There already was some code to handle the RSS tables, but the driver was missing all the classification steps required to actually use these tables. RSS is used through the classifier, using at least 2 lookups : - One using the C2 engine, a TCAM engine that match the packet based on some header extracted fields, assigns the default rx queue for that packet and tag it for RSS - One using the C3Hx engine, which computes the hash that's used to perform the lookup in the RSS table. Since RSS spreads the load across CPUs, we need to make sure that packets from the same flow are always assigned the same rx queue, to prevent re-ordering. This series therefore adds a classification step based on the Header Parser, that separate ingress traffic into 52 flows, based on some L2, L3 and L4 parameters. Patches 1 and 2 fix some header issues, from the driver splitting Patches 3 to 7 make sure the correct receive queue setup is used for RSS Patches 8 to 14 deal with the way we handle the RSS tables Patch 15 implement basic classifier configuration, by using it to assign the default receive queue Patch 16 implement the ingress traffic splitting into multiple flows Patch 17 adds RSS support, by using the needed classification steps Patch 18 adds the required ethtool ops to configure the flow hash parameters This was tested on MacchiatoBin, giving some nice performance improvements using ip forwarding (going from 5Gbps to 9.6Gbps total throughput). RSS is disabled by default. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	436d4fdb20	net: mvpp2: allow setting RSS flow hash parameters with ethtool This commit allows setting the RSS hash generation parameters from ethtool. When setting parameters for a given flow type from ethtool (e.g. tcp4), all the corresponding flows in the flow table are updated, according to the supported hash parameters. For example, when configuring TCP over IPv4 hash parameters to be src/dst IP + src/dst port ("ethtool -N eth0 rx-flow-hash tcp4 sdfn"), we only set the "src/dst port" hash parameters on the non-fragmented TCP over IPv4 flows. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
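The `sdfn` string in the ethtool command above is a set of one-letter field selectors. The sketch below decodes them and mimics the rule that port fields are only applied to non-fragmented flows. The letter meanings follow the ethtool manual page; the function names `parse_rx_flow_hash` and `letters_for_flow` are hypothetical.

```python
# One-letter selectors accepted by "ethtool -N DEV rx-flow-hash".
RXH_FIELDS = {
    "m": "L2 destination MAC",
    "v": "VLAN tag",
    "t": "L3 protocol",
    "s": "source IP",
    "d": "destination IP",
    "f": "L4 bytes 0-1 (source port)",
    "n": "L4 bytes 2-3 (destination port)",
    "r": "discard",
}
L4_LETTERS = {"f", "n"}


def parse_rx_flow_hash(spec):
    """Translate a selector string such as 'sdfn' into field names."""
    unknown = [c for c in spec if c not in RXH_FIELDS]
    if unknown:
        raise ValueError("unknown selector(s): " + "".join(unknown))
    return [RXH_FIELDS[c] for c in spec]


def letters_for_flow(spec, fragmented):
    """Drop the L4 port selectors for fragmented flows, keep them otherwise."""
    return "".join(c for c in spec if not (fragmented and c in L4_LETTERS))
```

For `sdfn`, a fragmented TCP over IPv4 flow ends up hashing on `sd` only, while the non-fragmented flow keeps all four selectors.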
Maxime Chevallier	d33ec45250	net: mvpp2: add an RSS classification step for each flow One of the classification actions that can be performed is to compute a hash of the packet header based on some header fields, and look up an RSS table based on this hash to determine the final RxQ. This is done by adding one lookup entry per flow per port, so that we can configure the hash generation parameters for each flow and each port. There are 2 possible engines that can be used for RSS hash generation : - C3HA, that generates a hash based on up to 4 header-extracted fields - C3HB, that does the same as C3HA, but also includes L4 info in the hash There are a lot of fields that can be extracted from the header. For now, we only use the ones that we can configure using ethtool : - DST MAC address - L3 info - Source IP - Destination IP - Source port - Destination port The C3HB engine is selected when we use L4 fields (src/dst port). [Diagram: ingress packet -> Header parser (TCAM + SRAM) -> flow id -> decoding table (TCP IPv4 w/ VLAN, not frag; TCP IPv4 w/o VLAN, not frag; TCP IPv4 w/ VLAN, frag; etc.) -> flow table (flows 0 to 51, each with a C2 lookup and one C3 lookup per port) -> Classifier (C2 engine, C3H engines) -> RSS table] The C2 engine also gains the role of enabling and disabling the RSS table lookup for this packet. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	f9358e12a0	net: mvpp2: split ingress traffic into multiple flows The PPv2 classifier allows to perform classification operations on each ingress packet, based on the flow the packet is assigned to. The current code uses only 1 flow per port, and the only classification action consists of assigning the rx queue to the packet, depending on the port. In preparation for adding RSS support, we have to split all incoming traffic into different flows. Since RSS assigns a rx queue depending on the hash of some header fields, we have to make sure that the hash is generated in a consistent way for all packets in the same flow. What we call a "flow" is actually a set of attributes attached to a packet that depends on various L2/L3/L4 info. This patch introduces 52 flows, which are a combination of various L2, L3 and L4 attributes : - Whether or not the packet has a VLAN tag - Whether the packet is IPv4, IPv6 or something else - Whether the packet is TCP, UDP or something else - Whether or not the packet is fragmented at L3 level. The flow is associated to a packet by the Header Parser. Each flow corresponds to an entry in the decoding table. This entry then points to the sequence of classification lookups to be performed by the classifier, represented in the flow table. For now, the only lookup we perform is a C2 lookup to set the default rx queue. [Diagram: ingress packet -> Header parser (TCAM + SRAM) -> flow id -> decoding table (TCP IPv4 w/ VLAN, not frag; TCP IPv4 w/o VLAN, not frag; TCP IPv4 w/ VLAN, frag; etc.) -> flow table (flows 0 to 51, each a C2 lookup) -> Classifier (C2 engine) -> RxQ] Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	b1a962c62c	net: mvpp2: use classifier to assign default rx queue The PPv2 Controller has a classifier, that can perform multiple lookup operations for each packet, using different engines. One of these engines is the C2 engine, which performs TCAM based lookups on data extracted from the packet header. When a packet matches an entry, the engine sets various attributes, used to perform classification operations. One of these attributes is the rx queue in which the packet should be sent. The current code uses the lookup_id table (also called decoding table) to assign the rx queue. However, this only works if we use one entry per port in the decoding table, which won't be the case once we add RSS lookups. This patch uses the C2 engine to assign the rx queue to each packet. The C2 engine is used through the flow table, which dictates what classification operations are done for a given flow. Right now, we have one flow per port, which contains every ingress packet for this port. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:49 -07:00
Maxime Chevallier	e6e21c0242	net: mvpp2: rename per-port RSS init function mvpp22_init_rss function configures the RSS parameters for each port, so rename it accordingly. Since this function relies on classifier configuration, move its call right after the classifier config. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	2a2f467daf	net: mvpp2: make sure we don't spread load on disabled CPUs When filling the RSS table, we have to make sure that the rx queue is attached to an online CPU. This patch is not a full support for cpu_hotplug, but rather a way to make sure that we don't break network on systems booted with the maxcpus parameter. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Antoine Tenart	662ae3fe65	net: mvpp2: improve the distribution of packets on CPUs when using RSS This patch adds an extra indirection when setting the indirection table into the RSS hardware table to improve the packets distribution across CPUs. For example, if 2 queues are used on a multi-core system this new indirection will choose two queues on two different CPUs instead of the two first queues which are on the same first CPU. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
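One way to picture the extra indirection described above is a remapping from a logical queue index to a hardware queue on a different CPU. The sketch below is only an assumption-laden illustration: it supposes a CPU-major queue layout (queue `q` belongs to CPU `q // queues_per_cpu`), and the `spread` formula is hypothetical, not the driver's.

```python
def cpu_of(queue, queues_per_cpu):
    """CPU owning a queue under the assumed CPU-major numbering."""
    return queue // queues_per_cpu


def spread(index, ncpus, queues_per_cpu):
    """Map a logical queue index so that consecutive indexes hit different CPUs."""
    cpu = index % ncpus
    slot = (index // ncpus) % queues_per_cpu
    return cpu * queues_per_cpu + slot
```

With 4 CPUs and 4 queues per CPU, the first two logical indexes map to queues 0 and 4, which sit on CPUs 0 and 1, whereas using queues 0 and 1 directly would keep both on CPU 0.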
Antoine Tenart	8179642b52	net: mvpp2: RSS indirection table support This patch adds the RSS indirection table support, allowing to use the ethtool -x and -X options to dump and set this table. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> [Maxime: Small warning fixes, use one table per port] Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	a27a254c26	net: mvpp2: use one RSS table per port PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table to be used is chosen according to the default rx queue that would be used for the packet. This default rx queue is set in the Lookup_id Table (also called Decoding Table), and is equal to the port->first_rxq. Since the Classifier itself isn't active at any time for the moment, this doesn't have a direct effect, the default rx queue at the moment is the one where all packets end up. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
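The two-stage lookup described above (default rx queue selects a table, hash selects an entry) can be sketched as below. The table count and size come from the commit message; the class `RssHw`, its methods, and the use of `hash % 32` as the entry index are assumptions made for illustration.

```python
NUM_RSS_TABLES = 8       # from the commit message
RSS_TABLE_ENTRIES = 32   # from the commit message


class RssHw:
    """Toy model of RXQ2RSS_TABLE plus the RSS tables it points to."""

    def __init__(self):
        self.tables = [[0] * RSS_TABLE_ENTRIES for _ in range(NUM_RSS_TABLES)]
        self.rxq2rss = {}  # default rx queue -> RSS table index

    def bind_port(self, first_rxq, table_idx, queues):
        # One table per port, filled round-robin with the port's rx queues.
        self.rxq2rss[first_rxq] = table_idx
        self.tables[table_idx] = [
            queues[i % len(queues)] for i in range(RSS_TABLE_ENTRIES)
        ]

    def select_rxq(self, default_rxq, pkt_hash):
        table = self.tables[self.rxq2rss[default_rxq]]
        # Assumption: the hash is reduced modulo the table size.
        return table[pkt_hash % RSS_TABLE_ENTRIES]
```

Two ports bound to different tables can then spread the same hash value over their own, disjoint sets of rx queues.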
Maxime Chevallier	4b86097be7	net: mvpp2: fix RSS register definitions There is no RSS_TABLE register in PPv2 Controller. The register 0x1510 which was specified is actually named "RSS_HASH_SEL", but isn't used by this driver at all. Based on how this register was used, it should have been the RXQ2RSS_TABLE register, which allows to select the RSS table that will be used for the incoming packet. The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE register. Since RSS tables are actually not used by the driver for now, this commit does not fix a runtime bug. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Antoine Tenart	132baa0378	net: mvpp2: fix a typo in the RSS code Cosmetic patch fixing a typo in one of the RSS comments. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	f8c6ba8424	net: mvpp2: use only one rx queue per port per CPU The number of receive queue per port is : - MVPP2_DEFAULT_RXQ if in single queue mode - MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode with MVPP2_DEFAULT_RXQ = 4. However, we don't use the extra rx queues at the moment, we really only need one per port per CPU, until some more advanced classification rules are implemented. Suggested-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	790d32c6d3	net: mvpp2: fix hardcoded number of rx queues There's a dedicated #define that indicates the number of rx queues per port per cpu, this commit removes a hardcoded use of that value. This doesn't fix any runtime bugs since the hardcoded value matches the expected value. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Yan Markman	4c4a5686c4	net: mvpp2: use RSS only when using multi-queue mode Since RSS only applies when we have per-cpu rx queues, it should only be enabled when the driver is configured to make use of multi-queue mode. Signed-off-by: Yan Markman <ymarkman@marvell.com> [Maxime: Commit message] Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	3f6aaf7289	net: mvpp2: make multi queue mode the default mode The multi queue mode is needed to have RSS available, and offers some nice advantages, being able to have one rx queue vector per CPU. This mode has been usable through the use of a module parameter, this commit makes it the default value. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	1e27a628e3	net: mvpp2: make sure we use single queue mode on PPv2.1 The PPv2 driver defines 2 "queue_modes" : - QDIST_SINGLE_MODE, where each port shares one rx queue vector between all CPUs - QDIST_MULTI_MODE, where each port has one rx queue vector per CPU. Multi queue mode isn't available on PPv2.1, make sure we fall back to single mode when running on this revision. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:48 -07:00
Maxime Chevallier	0ad2f53906	net: mvpp2: define the number of RSS entries per table in mvpp2.h The size of the RSS indirection tables should be defined in mvpp2.h, so that we can use it in all files of the PPv2 driver. This commit moves the define in mvpp2.h, and adds the missing #include in mvpp2_cls.h. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:47 -07:00
Maxime Chevallier	53a40025c0	net: mvpp2: fix include guards in mvpp2_prs.h Include guards should be put before #includes. This doesn't fix any bug, but prevents future compilation issues when adding new files in the mvpp2 driver. The Header Parser init function needs the platform_device definition, and with the fixed include guards we need to add the missing include. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:30:47 -07:00
Prashant Bhole	68d2f84a13	net: gro: properly remove skb from list Following crash occurs in validate_xmit_skb_list() when same skb is iterated multiple times in the loop and consume_skb() is called. The root cause is calling list_del_init(&skb->list) and not clearing skb->next in `d4546c2509`. list_del_init(&skb->list) sets skb->next to point to skb itself. skb->next needs to be cleared because other parts of network stack uses another kind of SKB lists. validate_xmit_skb_list() uses such list. A similar type of bugfix was reported by Jesper Dangaard Brouer. https://patchwork.ozlabs.org/patch/942541/ This patch clears skb->next and changes list_del_init() to list_del() so that list->prev will maintain the list poison. [ 148.185511] ================================================================== [ 148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0 [ 148.190158] Read of size 8 at addr ffff8801e52eefc0 by task swapper/1/0 [ 148.192940] [ 148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25 [ 148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 [ 148.199129] Call Trace: [ 148.200565] <IRQ> [ 148.201911] dump_stack+0xc6/0x14c [ 148.203572] ? dump_stack_print_info.cold.1+0x2f/0x2f [ 148.205083] ? kmsg_dump_rewind_nolock+0x59/0x59 [ 148.206307] ? validate_xmit_skb+0x2c6/0x560 [ 148.207432] ? debug_show_held_locks+0x30/0x30 [ 148.208571] ? validate_xmit_skb_list+0x4b/0xa0 [ 148.211144] print_address_description+0x6c/0x23c [ 148.212601] ? validate_xmit_skb_list+0x4b/0xa0 [ 148.213782] kasan_report.cold.6+0x241/0x2fd [ 148.214958] validate_xmit_skb_list+0x4b/0xa0 [ 148.216494] sch_direct_xmit+0x1b0/0x680 [ 148.217601] ? dev_watchdog+0x4e0/0x4e0 [ 148.218675] ? do_raw_spin_trylock+0x10/0x120 [ 148.219818] ? do_raw_spin_lock+0xe0/0xe0 [ 148.221032] __dev_queue_xmit+0x1167/0x1810 [ 148.222155] ? sched_clock+0x5/0x10 [...] 
[ 148.474257] Allocated by task 0: [ 148.475363] kasan_kmalloc+0xbf/0xe0 [ 148.476503] kmem_cache_alloc+0xb4/0x1b0 [ 148.477654] __build_skb+0x91/0x250 [ 148.478677] build_skb+0x67/0x180 [ 148.479657] e1000_clean_rx_irq+0x542/0x8a0 [ 148.480757] e1000_clean+0x652/0xd10 [ 148.481772] net_rx_action+0x4ea/0xc20 [ 148.482808] __do_softirq+0x1f9/0x574 [ 148.483831] [ 148.484575] Freed by task 0: [ 148.485504] __kasan_slab_free+0x12e/0x180 [ 148.486589] kmem_cache_free+0xb4/0x240 [ 148.487634] kfree_skbmem+0xed/0x150 [ 148.488648] consume_skb+0x146/0x250 [ 148.489665] validate_xmit_skb+0x2b7/0x560 [ 148.490754] validate_xmit_skb_list+0x70/0xa0 [ 148.491897] sch_direct_xmit+0x1b0/0x680 [ 148.493949] __dev_queue_xmit+0x1167/0x1810 [ 148.495103] br_dev_queue_push_xmit+0xce/0x250 [ 148.496196] br_forward_finish+0x276/0x280 [ 148.497234] __br_forward+0x44f/0x520 [ 148.498260] br_forward+0x19f/0x1b0 [ 148.499264] br_handle_frame_finish+0x65e/0x980 [ 148.500398] NF_HOOK.constprop.10+0x290/0x2a0 [ 148.501522] br_handle_frame+0x417/0x640 [ 148.502582] __netif_receive_skb_core+0xaac/0x18f0 [ 148.503753] __netif_receive_skb_one_core+0x98/0x120 [ 148.504958] netif_receive_skb_internal+0xe3/0x330 [ 148.506154] napi_gro_complete+0x190/0x2a0 [ 148.507243] dev_gro_receive+0x9f7/0x1100 [ 148.508316] napi_gro_receive+0xcb/0x260 [ 148.509387] e1000_clean_rx_irq+0x2fc/0x8a0 [ 148.510501] e1000_clean+0x652/0xd10 [ 148.511523] net_rx_action+0x4ea/0xc20 [ 148.512566] __do_softirq+0x1f9/0x574 [ 148.513598] [ 148.514346] The buggy address belongs to the object at ffff8801e52eefc0 [ 148.514346] which belongs to the cache skbuff_head_cache of size 232 [ 148.517047] The buggy address is located 0 bytes inside of [ 148.517047] 232-byte region [ffff8801e52eefc0, ffff8801e52ef0a8) [ 148.519549] The buggy address belongs to the page: [ 148.520726] page:ffffea000794bb00 count:1 mapcount:0 mapping:ffff880106f4dfc0 index:0xffff8801e52ee840 compound_mapcount: 0 [ 148.524325] flags: 
0x17ffffc0008100(slab|head) [ 148.525481] raw: 0017ffffc0008100 ffff880106b938d0 ffff880106b938d0 ffff880106f4dfc0 [ 148.527503] raw: ffff8801e52ee840 0000000000190011 00000001ffffffff 0000000000000000 [ 148.529547] page dumped because: kasan: bad access detected Fixes: `d4546c2509` ("net: Convert GRO SKB handling to list_head.") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reported-by: Tyler Hicks <tyhicks@canonical.com> Tested-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 17:00:35 -07:00
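Note on 68d2f84a13: the failure mode described in that commit message can be reproduced in a small userspace sketch. This is not the kernel code. `struct fake_skb`, `list_unlink()`, `remove_with_del_init()`, `remove_and_clear_next()`, `walk_len()` and `FAKE_POISON_PREV` are hypothetical names made up for illustration, and the sketch assumes, as the message implies, that the `next` pointer and the `list_head` share storage. It only shows why a self-pointing `next` breaks a walker that expects a NULL-terminated chain.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical skb: 'next' and 'list' alias the same storage (assumption). */
struct fake_skb {
	union {
		struct fake_skb *next;   /* used by NULL-terminated chains */
		struct list_head list;   /* used by list_head based lists  */
	};
	int id;
};

/* Hypothetical poison marker for a stale 'prev' pointer. */
#define FAKE_POISON_PREV ((struct list_head *)0x122)

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Detach an entry from its neighbours without touching its own pointers. */
static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Problematic pattern: re-initialising makes 'next' point at the skb itself. */
static void remove_with_del_init(struct fake_skb *skb)
{
	list_unlink(&skb->list);
	list_init(&skb->list);
}

/* Pattern described as the fix: poison 'prev', then clear 'next'. */
static void remove_and_clear_next(struct fake_skb *skb)
{
	list_unlink(&skb->list);
	skb->list.prev = FAKE_POISON_PREV;
	skb->next = NULL;
}

/* Walk a NULL-terminated chain, giving up after 'limit' steps. */
static int walk_len(const struct fake_skb *skb, int limit)
{
	int n = 0;

	while (skb && n < limit) {
		n++;
		skb = skb->next;
	}
	return n;
}
```

With these helpers, an skb removed via `remove_with_del_init()` makes `walk_len()` revisit the same entry until it hits its limit, whereas one removed via `remove_and_clear_next()` is seen exactly once.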
David S. Miller	c8c81de96b	Merge branch 's390-qeth-updates' Julian Wiedmann says: ==================== s390/qeth: updates 2018-07-11 please apply this first batch of qeth patches for net-next. It brings the usual cleanups, and some performance improvements to the transmit paths. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:40 -07:00
Julian Wiedmann	fb321f25e5	s390/qeth: speed-up IPv4 OSA xmit Move the xmit of offload-eligible (ie IPv4) traffic on OSA over to the new, copy-free path. As with L2, we'll need to preserve the skb_orphan() behaviour of the old code path until TX completion is sufficiently fast. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:40 -07:00
Julian Wiedmann	a647a02512	s390/qeth: speed-up L3 IQD xmit This implements a new xmit path for L3 HiperSockets, which carves the HW header from skb headroom instead of allocating it from the hdr cache. It also adds NETIF_F_SG support. The delta in qeth_l3_xmit() is all just removal of IQD-specific code and some minor consolidation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:40 -07:00
Julian Wiedmann	ea1d4a0c7f	s390/qeth: add a L3 xmit wrapper In preparation for future work, move the high-level xmit work into a separate wrapper. This matches the L2 xmit code. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	371a1e7a07	s390/qeth: increase GSO max size for eligible L3 devices When a L3 device doesn't offer TSO, allow the stack to build full-size GSO skbs. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00
Julian Wiedmann	09960b3a0a	s390/qeth: clean up exported symbols Remove some redundant EXPORTs. While at it, also move some L2-only prototypes into the proper header file. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 16:42:39 -07:00

767900 Commits