Commit Graph

996619 Commits

Author SHA1 Message Date
Petr Machata
283a72a559 nexthop: Add implementation of resilient next-hop groups
At this moment, there is only one type of next-hop group: an mpath group,
which implements the hash-threshold algorithm.

To select a next hop, hash-threshold algorithm first assigns a range of
hashes to each next hop in the group, and then selects the next hop by
comparing the SKB hash with the individual ranges. When a next hop is
removed from the group, the ranges are recomputed, which leads to
reassignment of parts of hash space from one next hop to another. While
there will usually be some overlap between the previous and the new
distribution, some traffic flows change the next hop that they resolve to.
That causes problems, e.g. established TCP connections are reset because
the traffic is now forwarded to a server that is not familiar with the
connection.
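
As an illustration of why removal reshuffles flows, consider this minimal
userspace model of hash-threshold selection (the structure, names and
weighting scheme are illustrative assumptions, not the kernel code):

  #include <stdint.h>

  struct nh { int id; uint32_t weight; };

  /* Pick a next hop by comparing the SKB hash against per-next-hop ranges
   * whose sizes are proportional to the weights (assumed non-zero). When a
   * next hop is removed, 'total' changes, every range shifts, and some
   * flows end up resolving to a different next hop. */
  static int hash_threshold_select(const struct nh *nhs, int n, uint32_t hash)
  {
          uint64_t total = 0, upper = 0;
          int i;

          for (i = 0; i < n; i++)
                  total += nhs[i].weight;

          for (i = 0; i < n; i++) {
                  upper += ((uint64_t)nhs[i].weight << 32) / total;
                  if (hash < upper || i == n - 1)
                          return nhs[i].id;
          }
          return -1;
  }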

Resilient hashing is a technique to address the above problem. A resilient
next-hop group has another layer of indirection between the group itself
and its constituent next hops: a hash table. The selection algorithm uses a
straightforward modulo operation to choose a hash bucket, and then reads
the next hop that this bucket contains, and forwards traffic there.
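
Continuing the illustrative model above, resilient selection is only a
modulo and a table read (again a sketch, not the actual data structures):

  #include <stdint.h>

  struct res_table {
          uint32_t num_buckets;
          int *bucket_nh;         /* next hop id stored in each bucket */
  };

  static int resilient_select(const struct res_table *tbl, uint32_t hash)
  {
          /* The hash picks a bucket; the bucket, not a hash range,
           * determines the next hop, so bucket-to-next-hop assignments
           * can later be changed one bucket at a time. */
          return tbl->bucket_nh[hash % tbl->num_buckets];
  }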

This indirection brings an important feature. In the hash-threshold
algorithm, the range of hashes associated with a next hop must be
contiguous. With a hash table, the mapping between the hash table buckets and
the individual next hops is arbitrary. Therefore, when a next hop is deleted,
the buckets that held it are simply reassigned to other next hops. When
weights of next hops in a group are altered, it may be possible to choose a
subset of buckets that are currently not used for forwarding traffic, and
use those to satisfy the new next-hop distribution demands, keeping the
"busy" buckets intact. This way, established flows ideally keep being
forwarded to the same endpoints through the same paths as before the
next-hop group change.

In a nutshell, the algorithm works as follows. Each next hop has a number
of buckets that it wants to have, according to its weight and the number of
buckets in the hash table. In case of an event that might cause bucket
allocation change, the numbers for individual next hops are updated,
similarly to how ranges are updated for mpath group next hops. Following
that, a new "upkeep" algorithm runs: for each idle bucket that belongs to a
next hop currently occupying more buckets than it wants (an "overweight"
next hop), it migrates the bucket to one of the next hops that has fewer
buckets than it wants (an "underweight" next hop). If, after this, there
are still underweight next hops, another upkeep run is scheduled for a
future time.
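
A rough model of a single upkeep pass might look like the following sketch,
under the assumption that per-next-hop "want" counts were already
recomputed; none of these names come from the actual patch:

  #include <stdbool.h>

  struct bucket { int nh; bool idle; };

  /* Move idle buckets away from overweight next hops (have > want)
   * towards underweight ones; report whether the table is balanced. */
  static bool upkeep_pass(struct bucket *b, int nbuckets,
                          int *have, const int *want, int nhs)
  {
          bool balanced = true;
          int i, to;

          for (i = 0; i < nbuckets; i++) {
                  int from = b[i].nh;

                  if (!b[i].idle || have[from] <= want[from])
                          continue;
                  for (to = 0; to < nhs; to++)
                          if (have[to] < want[to])
                                  break;
                  if (to == nhs)
                          break;          /* nobody is underweight anymore */
                  b[i].nh = to;           /* migrate the bucket */
                  have[from]--;
                  have[to]++;
          }
          for (to = 0; to < nhs; to++)
                  if (have[to] < want[to])
                          balanced = false;   /* schedule another upkeep run */
          return balanced;
  }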

Chances are there are not enough "idle" buckets to satisfy the new demands.
The algorithm has knobs to select both what it means for a bucket to be
idle, and whether and when to forcefully migrate buckets if the number of
idle buckets remains insufficient.

There are three users of the resilient data structures.

- The forwarding code accesses them under RCU, and does not modify them
  except for updating the time a selected bucket was last used.

- Netlink code, running under RTNL, which may modify the data.

- The delayed upkeep code, which may modify the data. This runs unlocked,
  and mutual exclusion between the RTNL code and the delayed upkeep is
  maintained by canceling the delayed work synchronously before the RTNL
  code touches anything. Later it restarts the delayed work if necessary.
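
The exclusion between the RTNL path and the unlocked upkeep can be pictured
roughly as below; the function and field names are placeholders, not the
actual nexthop code:

  /* RTNL-side modification, mutually exclusive with the delayed upkeep work. */
  static void res_table_modify_under_rtnl(struct res_table *tbl)
  {
          ASSERT_RTNL();

          /* Wait for in-flight upkeep to finish and keep it from running. */
          cancel_delayed_work_sync(&tbl->upkeep_dw);

          /* ... modify buckets / next-hop assignments under RTNL ... */

          /* Restart upkeep if there is still rebalancing left to do. */
          if (!tbl->balanced)
                  schedule_delayed_work(&tbl->upkeep_dw, tbl->upkeep_interval);
  }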

The RTNL code has to implement next-hop group replacement, next hop
removal, etc. For removal, the mpath code uses a neat trick of having a
backup next hop group structure, doing the necessary changes offline, and
then RCU-swapping them in. However, the hash tables for resilient hashing
are about an order of magnitude larger than the groups themselves (the size
might be e.g. 4K entries), and it was felt that keeping two of them is an
overkill. Both the primary next-hop group and the spare therefore use the
same resilient table, and writers are careful to keep all references valid
for the forwarding code. The hash table references next-hop group entries
from the next-hop group that is currently in the primary role (i.e. not
spare). During the transition from primary to spare, the table references a
mix of both the primary group and the spare. When a next hop is deleted,
the corresponding buckets are not set to NULL, but instead marked as empty,
so that the pointer is valid and can be used by the forwarding code. The
buckets are then migrated to a new next-hop group entry during upkeep. The
only times that the hash table is invalid are the very beginning and the very
end of its lifetime. Between those points, it is always kept valid.
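
The shape of a bucket that keeps the forwarding-side pointer always valid
might be sketched like this (field names are assumptions, not the actual
definitions):

  struct res_bucket {
          struct nh_grp_entry __rcu *nh_entry;  /* never NULL while the table is live */
          unsigned long used_time;              /* last time forwarding hit this bucket */
          bool occupied;                        /* "empty" is a flag, not a NULL pointer */
  };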

This patch introduces the core support code itself. It does not handle
notifications towards drivers, which are kept as if the group were an mpath
one. It does not handle netlink either. The only bit currently exposed to
user space is the new next-hop group type, and that is currently bounced.
There is therefore no way to actually access this code.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:12:59 -08:00
Ido Schimmel
710ec56223 nexthop: Add netlink defines and enumerators for resilient NH groups
- RTM_NEWNEXTHOP et al. that handle resilient groups will have a new nested
  attribute, NHA_RES_GROUP, whose elements are attributes NHA_RES_GROUP_*
  (see the sketch after this list).

- RTM_NEWNEXTHOPBUCKET et al. is a suite of new messages that will
  currently serve only for dumping of individual buckets of resilient next
  hop groups. For nexthop group buckets, these messages will carry a nested
  attribute NHA_RES_BUCKET, whose elements are attributes NHA_RES_BUCKET_*.

  There are several reasons why a new suite of messages is created for
  nexthop buckets instead of overloading the information on the existing
  RTM_{NEW,DEL,GET}NEXTHOP messages.

  First, a nexthop group can contain a large number of nexthop buckets (4k
  is not unheard of). This imposes limits on the amount of information that
  can be encoded for each nexthop bucket, given that a netlink message is
  limited to 64k bytes.

  Second, while RTM_NEWNEXTHOPBUCKET is only used for notifications at
  this point, in the future it can be extended to provide user space with
  control over nexthop buckets configuration.

- The new group type is NEXTHOP_GRP_TYPE_RES. Note that nexthop code is
  adjusted to bounce groups with that type for now.
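
The nesting described above could be pictured as follows. Only
NHA_RES_GROUP, NHA_RES_BUCKET and NEXTHOP_GRP_TYPE_RES are taken from this
patch; the individual NHA_RES_GROUP_* / NHA_RES_BUCKET_* members shown are
illustrative guesses at typical contents, not the exact uapi:

  /* Nested under NHA_RES_GROUP in RTM_{NEW,GET}NEXTHOP (members illustrative). */
  enum {
          NHA_RES_GROUP_UNSPEC,
          NHA_RES_GROUP_BUCKETS,          /* number of hash table buckets */
          NHA_RES_GROUP_IDLE_TIMER,       /* when a bucket counts as idle */
          NHA_RES_GROUP_UNBALANCED_TIMER, /* when to force bucket migration */
          __NHA_RES_GROUP_MAX,
  };

  /* Nested under NHA_RES_BUCKET in RTM_*NEXTHOPBUCKET (members illustrative). */
  enum {
          NHA_RES_BUCKET_UNSPEC,
          NHA_RES_BUCKET_INDEX,           /* bucket index in the hash table */
          NHA_RES_BUCKET_IDLE_TIME,       /* time since the bucket was last used */
          NHA_RES_BUCKET_NH_ID,           /* nexthop currently held by the bucket */
          __NHA_RES_BUCKET_MAX,
  };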

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:12:59 -08:00
Petr Machata
90e1a9e213 nexthop: Add a dedicated flag for multipath next-hop groups
With the introduction of resilient nexthop groups, there will be two types
of multipath groups: the current hash-threshold "mpath" ones, and resilient
groups. Both are multipath, but to determine that fact, the system needs to
consider two flags. This might prove costly in the datapath. Therefore,
introduce a new flag that should be set for next-hop groups that have more
than one nexthop and should be considered multipath.
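
The intended datapath use is a single-bit test along these lines (a sketch;
the field name follows the description above and is an assumption):

  /* True for both hash-threshold and resilient groups with more than one
   * next hop, so the fast path checks one bit instead of two group types. */
  static inline bool nh_group_is_multipath(const struct nh_group *nhg)
  {
          return nhg->is_multipath;
  }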

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:12:59 -08:00
Petr Machata
96a856256a nexthop: __nh_notifier_single_info_init(): Make nh_info an argument
The cited function currently uses rtnl_dereference() to get nh_info from a
handed-in nexthop. However, under the resilient hashing scheme, this
function will not always be called under RTNL, sometimes the mutual
exclusion will be achieved differently. Therefore move the nh_info
extraction from the function to its callers to make it possible to use a
different synchronization guarantee.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:12:59 -08:00
Petr Machata
597f48e46b nexthop: Pass nh_config to replace_nexthop()
Currently, replace assumes that the new group that is given is a
fully-formed object. But mpath groups really only have one attribute, and
that is the constituent next hop configuration. This may not be universally
true. From the usability perspective, it is desirable to allow the replace
operation to adjust just the constituent next hop configuration and leave
the group attributes as such intact.

But the object that keeps track of whether an attribute was or was not
given is the nh_config object, not the next hop or next-hop group. To allow
(selective) attribute updates during NH group replacement, propagate `cfg'
to replace_nexthop() and further to replace_nexthop_grp().

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:12:59 -08:00
David S. Miller
1d5d0a0786 Merge branch 'seg6-next'
Julien Massonneau says:

====================
SRv6: SRH processing improvements

Add support for IPv4 decapsulation in ipv6_srh_rcv() and
ignore routing headers with Segments Left equal to 0 for
seg6local actions that don't perform decapsulation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:09:21 -08:00
Julien Massonneau
fbbc5bc2ab seg6: ignore routing header with segments left equal to 0
When there are two segment routing headers, after an End.B6 action
for example, the second SRH will never be handled by an action; the packet
will be dropped when the first SRH has Segments Left equal to 0.
For actions that don't perform decapsulation (currently: End, End.X,
End.T, End.B6, End.B6.Encaps), this patch adds the IP6_FH_F_SKIP_RH flag
to the arguments passed to ipv6_find_hdr().
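
A sketch of the lookup this enables (an illustrative helper, not the actual
seg6local code): with IP6_FH_F_SKIP_RH, ipv6_find_hdr() skips routing
headers whose Segments Left is 0, so the next SRH, if any, is found instead.

  static struct ipv6_sr_hdr *find_active_srh(struct sk_buff *skb)
  {
          unsigned int off = 0;
          int flags = IP6_FH_F_SKIP_RH;

          if (ipv6_find_hdr(skb, &off, NEXTHDR_ROUTING, NULL, &flags) < 0)
                  return NULL;
          return (struct ipv6_sr_hdr *)(skb_network_header(skb) + off);
  }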

Signed-off-by: Julien Massonneau <julien.massonneau@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:09:21 -08:00
Julien Massonneau
ee90c6ba34 seg6: add support for IPv4 decapsulation in ipv6_srh_rcv()
As specified in IETF RFC 8754, section 4.3.1.2, if the upper layer
header is IPv4 or IPv6, perform IPv6 decapsulation and resubmit the
decapsulated packet to the IPv4 or IPv6 module.
Only IPv6 decapsulation was implemented. This patch adds support for IPv4
decapsulation.

Link: https://tools.ietf.org/html/rfc8754#section-4.3.1.2
Signed-off-by: Julien Massonneau <julien.massonneau@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:09:21 -08:00
David S. Miller
6c6095214a Merge branch 'hns3-next'
Huazhong Tan says:

====================
net: hns3: two updates for -next

This series includes two updates for the HNS3 ethernet driver.
====================

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:01:10 -08:00
Yufeng Mo
e8194f3262 net: hns3: use pause capability queried from firmware
For maintainability and compatibility, add support to use pause
capability queried from firmware, and add debugfs support to dump
this capability.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:01:10 -08:00
Yufeng Mo
433ccce835 net: hns3: use FEC capability queried from firmware
For maintainability and compatibility, add support to use FEC
capability queried from firmware.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 16:01:10 -08:00
Jiapeng Chong
c53d21af67 netdevsim: fib: Remove redundant code
Fix the following coccicheck warnings:

./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return
"0" on line 889.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 14:32:48 -08:00
Florian Fainelli
b0bade515d net: phy: Expose phydev::dev_flags through sysfs
phydev::dev_flags contains a bitmask of configuration bits requested by
the consumer of a PHY device (Ethernet MAC or switch) towards the PHY
driver. Since these flags are often used for requesting LED or other
types of configuration, being able to quickly audit them without
instrumenting the kernel is useful.
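
The attribute presumably ends up looking something like the sketch below;
the attribute name and formatting are assumptions, not necessarily what the
patch adds:

  static ssize_t phy_dev_flags_show(struct device *dev,
                                    struct device_attribute *attr, char *buf)
  {
          struct phy_device *phydev = to_phy_device(dev);

          return sprintf(buf, "0x%08x\n", phydev->dev_flags);
  }
  static DEVICE_ATTR_RO(phy_dev_flags);

With something like this in place, a plain cat of the PHY's sysfs directory
shows the flags without any kernel instrumentation.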

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 12:47:27 -08:00
Florian Fainelli
ee47ed08d7 net: dsa: b53: Add debug prints in b53_vlan_enable()
Having dynamic debug prints in b53_vlan_enable() has been helpful to
uncover a recent bug; update the function to indicate the port being
configured (or -1 for initial setup) and include the global VLAN enabled
and VLAN filtering enable status.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 12:33:30 -08:00
Bhaskar Chowdhury
34bb975126 net: fddi: skfp: Mundane typo fixes throughout the file smt.h
A few spelling fixes throughout the file.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:42:22 -08:00
Shubhankar Kuranagatti
6b9c8f46af net: ipv4: route.c: fix space before tab
The extra space before a tab has been removed.

Signed-off-by: Shubhankar Kuranagatti <shubhankarvk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:37:19 -08:00
David S. Miller
f2050d9139 Merge branch 'ionic-next'
Shannon Nelson says:

====================
ionic Rx updates

The ionic driver's Rx path is due for an overhaul in order to
better use memory buffers and to clean up the data structures.

The first two patches convert the driver to using page sharing
between buffers so as to lessen the page alloc and free overhead.

The remaining patches clean up the structs and fastpath code for
better efficiency.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
Shannon Nelson
a25edab93b ionic: simplify use of completion types
Make better use of our struct types and type checking by passing
the actual Rx or Tx completion type rather than a generic void
pointer type.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
Shannon Nelson
55eda6bbe0 ionic: rebuild debugfs on qcq swap
A reconfigure of each queue requires a rebuild of the matching
debugfs information.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
Shannon Nelson
89e572e736 ionic: simplify rx skb alloc
Remove an unnecessary layer over rx skb allocation.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
Shannon Nelson
f37bc3462e ionic: optimize fastpath struct usage
Clean up a couple of struct uses to make for better fast path
access.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
Shannon Nelson
4b0a7539a3 ionic: implement Rx page reuse
Rework the Rx buffer allocations to use pages twice when using
normal MTU in order to cut down on buffer allocation and mapping
overhead.

Instead of tracking individual pages, where we may have
wasted half the space when using the standard 1500 MTU, we track
buffers which use half pages, so we can use the second half
of the page rather than allocate and map a new page once the
first buffer has been used.
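
A rough model of the half-page scheme (names and layout are assumptions for
illustration, not the driver's actual structures):

  struct rx_buf {
          struct page *page;
          unsigned int page_offset;       /* 0 or PAGE_SIZE / 2 */
  };

  /* Try to reuse the same page for the next buffer; only when both halves
   * have been handed out does the caller allocate and map a fresh page. */
  static bool rx_buf_recycle(struct rx_buf *buf)
  {
          if (buf->page_offset == 0) {
                  buf->page_offset = PAGE_SIZE / 2;
                  return true;
          }
          return false;
  }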

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
Shannon Nelson
2b5720f269 ionic: move rx_page_alloc and free
Move ionic_rx_page_alloc() and ionic_rx_page_free() to earlier
in the file to make the next patch easier to review.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 15:34:28 -08:00
David S. Miller
eeada4105d Merge branch 'dpaa2-switch-next'
Ioana Ciornei says:

====================
dpaa2-switch: CPU terminated traffic and move out of staging

This patch set adds support for Rx/Tx capabilities on DPAA2 switch port
interfaces as well as fixing up some major blunders in how we take care
of the switching domains. The last patch actually moves the driver out
of staging now that the minimum requirements are met.

I am sending this directly towards the net-next tree so that I can use
the rest of the development cycle adding new features on top of the
current driver without worrying about merge conflicts between the
staging and net-next tree.

The control interface is comprised of 3 queues in total: Rx, Rx error
and Tx confirmation. In this patch set we only enable Rx and Tx conf.
All switch ports share the same queues when frames are redirected to the
CPU. Information regarding the ingress switch port is passed through
frame metadata - the flow context field of the descriptor.

NAPI instances are also shared between switch net_devices and are
enabled when .dev_open() was called on at least one of the switch ports
and disabled when no switch port is up anymore.

Since the last version of this feature was submitted to the list, I
reworked how the switching and flooding domains are taken care of by the
driver, thus the switch is now able to also add the control port (the
queues that the CPU can dequeue from) into the flooding domains of a
port (broadcast, unknown unicast etc). With this, we are able to receive
and send traffic from the switch interfaces.

Also, the capability to properly partition the DPSW object into multiple
switching domains was added so that, when not under a bridge, the ports
are not actually capable of switching between them. This is possible by
adding a private FDB table per switch interface. When multiple switch
interfaces are under the same bridge, they will all use the same FDB
table.

Another thing that is fixed in this patch set is how the driver handles
VLAN awareness. The DPAA2 switch is not capable of running VLAN-unaware,
but this was not reflected in how the driver responded to requests to
change the VLAN awareness. In the last patch, this is fixed by
describing the switch interfaces as Rx VLAN filtering "on [fixed]" and
declining any request to join a VLAN-unaware bridge.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
f48298d3fb staging: dpaa2-switch: move the driver out of staging
Now that the dpaa2-switch driver has basic I/O capabilities on the
switch port net_devices and multiple bridging domains are supported,
move the driver out of staging.

The dpaa2-switch driver is placed right next to the dpaa2-eth driver
since, in the near future, they will be sharing most of the data path.
I didn't implement code reuse in this patch series because I wanted to
keep it as small as possible.

Also, the README is removed from staging with the intention to add
proper rst documentation afterwards that actually matches what is supported
by the driver.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
1c4928fc29 staging: dpaa2-switch: prevent joining a bridge while VLAN uppers are present
Each time a switch port joins a bridge, it will start to use a FDB table
common with all the other switch ports that are under the same bridge.
This means that any VLAN added prior to a bridge join will retain its
previous FDB table destination. With this patch, I choose to restrict
when a switch port can change its upper device (either join or leave)
so that the driver does not have to delete all the previously installed
VLANs from the previous FDB and add them into the new one.

Thus, in the PRECHANGEUPPER notification we check if there are any VLAN
type upper devices and, if that's true, deny the CHANGEUPPER.
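
The check can be sketched roughly as below (illustrative, not the exact
driver code; the helper name is made up):

  static bool port_has_vlan_upper(struct net_device *netdev)
  {
          struct net_device *upper;
          struct list_head *iter;

          netdev_for_each_upper_dev_rcu(netdev, upper, iter)
                  if (is_vlan_dev(upper))
                          return true;
          return false;
  }

  /* In the PRECHANGEUPPER handler: refuse the join/leave if this returns true. */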

This way, the user is not restricted in the topology but rather in the
order in which the setup is done: they must first create the bridging
domain layout and after that add the VLAN devices if necessary. The
teardown is similar: the VLAN devices will need to be destroyed prior
to a change in the bridging layout.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
685b480145 staging: dpaa2-switch: add fast-ageing on bridge leave
Upon leaving a bridge, any MAC addresses learnt on the switch port prior
to this point have to be removed so that we preserve the bridging domain
configuration.

Restructure the dpaa2_switch_port_fdb_dump() function in order to have a
common dpaa2_switch_fdb_iterate() function between the FDB dump callback
and the fast age procedure. To accomplish this, add a new callback -
dpaa2_switch_fdb_cb_t - which will be called on each MAC addr and,
depending on the situation, will either dump the FDB entry into a
netlink message or will delete the address from the FDB table, in case
of the fast-age.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
d671407fcc staging: dpaa2-switch: accept only vlan-aware upper devices
The DPAA2 Switch is not capable of handling traffic in a VLAN-unaware
fashion, thus the previous handling of both the accepted upper devices
and the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING flag was wrong.

Fix this by checking if the bridge that we are joining is indeed VLAN
aware; if not, return an error. Also, the RX VLAN filtering feature is
defined as 'on [fixed]' and the .ndo_vlan_rx_add_vid() and
.ndo_vlan_rx_kill_vid() callbacks are implemented just by recreating a
switchdev_obj_port_vlan object and then calling the same functions used
on the switchdev notifier path.
In addition, changing the vlan_filtering flag to 0 on a bridge under
which a DPAA2 switch interface is present is not supported, thus
rejected when SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING is received with
such a request.

This patch is also adding the use of the switchdev_handle_port_attr_set
function so that we can iterate through all the lower devices of the
bridge that the notification was received on and actually catch if the
user is trying to change the vlan_filtering state. Since on a VLAN
filtering change the net_device is the bridge, we also move the
dpaa2_switch_port_dev_check call so that we do not return NOTIFY_DONE
right away.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
16abb6ad6a staging: dpaa2-switch: move the notifier register to module_init()
Move the notifier block registration into the module_init() step, instead of
object probe, so that all DPSW devices probed by the dpaa2-switch driver
can use the same notifiers.

This will enable us to have a more straightforward approach in
determining if an event is intended for an object managed by this driver
or not. Previously, the dpaa2_switch_port_dev_check() function was
forced to also check the notifier block besides the net_device_ops
structure to determine if the event is for us or not.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
539dda3c5d staging: dpaa2-switch: properly setup switching domains
Until now, the DPAA2 switch was not capable of properly setting up its
switching domains depending on the existence, or lack thereof, of an
upper bridge device. This meant that all switch ports of a DPSW object
were switching by default even though they were not under the same
bridge device.

Another issue was the inability to actually add the CPU in the flooding
domains (broadcast, unknown unicast etc) of a particular switch port.
This meant that a simple ping on a switch interface was not possible
since no broadcast ARP frame would actually reach the CPU queues.

This patch tries to fix exactly these problems by:

* Creating and managing a FDB table for each flooding domain. This means
  that when a switch interface is not bridged it will use its own FDB
  table. While in bridged mode, all DPAA2 switch interfaces under the
  same upper will use the same FDB table, thus leveraging the same FDB
  entries.

* Adding a new MC firmware command - dpsw_set_egress_flood() - through
  which the driver can setup the flooding domains as needed. For
  example, when the switch interface is standalone, thus not in a
  bridge with any other DPAA2 switch port, it will setup its broadcast
  and unknown unicast flooding domains to only include the control
  interface (the queues that reach the CPU and the driver can dequeue
  from). This flooding domain changes when the interface joins a bridge
  and is configured to include, beside the control interface, all other
  DPAA2 switch interfaces.

We impose a minimum limit of FDB tables available equal to the number of
switch interfaces so that we guarantee that, in the maximal
configuration (all interfaces standalone), each switch port will
have a private FDB table. At the same time, we only probe DPSW objects
that have the flooding and broadcast replicators configured to be per
FDB (DPSW_*_PER_FDB). Without this, the dpaa2-switch driver would not
be able to configure multiple switching domains.

At probe time, an FDB table will be allocated for each port. At a bridge
join event, the switch port will either continue to use the current FDB
table (if it's the first dpaa2-switch port to join that bridge) or will
switch to use the FDB table associated with a port that is already
under the bridge. If an FDB switch is necessary, the private FDB table
which was previously used will be returned to the pool of unused FDBs.

Upon a bridge leave, the switch port needs a private FDB table, thus it
will search for and take the first unused FDB table. This way, all the other
ports remaining under the bridge will continue to use the same FDB
table.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
613c0a5810 staging: dpaa2-switch: enable the control interface
Enable the CTRL_IF of the switch object, now that all the pieces are in
place (buffer and queue management, interrupts, NAPI instances etc).

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
7fd94d86b7 staging: dpaa2-switch: add .ndo_start_xmit() callback
Implement the .ndo_start_xmit() callback for the switch port interfaces.
For each of the switch ports, gather the corresponding queue
destination ID (QDID) necessary for Tx enqueueing.

We'll reserve 64 bytes for software annotations, where we keep a skb
backpointer used on the Tx confirmation side for releasing the allocated
memory. At the moment, we only support linear skbs.

Also, add support for the Tx confirmation path which for the most part
shares the code path with the normal Rx queue.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
0b1b713704 staging: dpaa2-switch: handle Rx path on control interface
The dpaa2-ethsw supports only one Rx queue that is shared by all switch
ports. This means that information about which port was the ingress port
for a specific frame needs to be passed in metadata. In our case, the
Flow Context (FLC) field from the frame descriptor holds this
information. Besides the interface ID of the ingress port we also
receive the virtual QDID of the port. Below is a visual description of
the 64 bits of FLC.

63           47           31           15           0
+---------------------------------------------------+
|            |            |            |            |
|  RESERVED  |    IF_ID   |  RESERVED  |  IF QDID   |
|            |            |            |            |
+---------------------------------------------------+
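
Given the layout above, extracting the two fields amounts to simple shifts
and masks (a sketch; the helper names are made up):

  static inline u16 flc_if_id(u64 flc)
  {
          return (flc >> 32) & 0xffff;    /* bits 47..32 */
  }

  static inline u16 flc_if_qdid(u64 flc)
  {
          return flc & 0xffff;            /* bits 15..0 */
  }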

Because all switch ports share the same Rx and Tx conf queues, NAPI
management takes into account whether at least one switch interface is
open before enabling the NAPI instance.

The Rx path is, for the most part, common for both Rx and Tx conf, with
the mention that each of them has its own function for consuming a frame
descriptor. Dequeueing from an FQ, consuming the dequeued store, and the
NAPI poll function are common to both queues.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
04abc97d3e staging: dpaa2-switch: setup dpio
Setup interrupts on the control interface queues. We do not force an
exact affinity between the interrupts received from a specific queue and
a cpu.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
2877e4f7e1 staging: dpaa2-switch: setup buffer pool and RX path rings
Allocate and setup a buffer pool, needed on the Rx path of the control
interface. Also, define the Rx buffer size seen by the WRIOP from the
PAGE_SIZE buffers seeded.

Also, create the needed Rx rings for both frame queues used on the
control interface. On the Rx path, when a pull-dequeue operation is
performed on a software portal, available frame descriptors are put in a
ring - a DMA memory storage - for further usage.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:36 -08:00
Ioana Ciornei
26d419f36a staging: dpaa2-switch: get control interface attributes
Introduce a new structure to hold all necessary info related to an RX
queue for the control interface and populate the FQ IDs.
We only have one Rx queue and one Tx confirmation queue on the control
interface, both shared by all the switch ports.

Also, increase the minimum version of the object supported by the driver
since, for basic switch driver support, we'll need some ABIs
added in the latest version of the firmware.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:35 -08:00
Ioana Ciornei
5dda9a7921 staging: dpaa2-switch: remove obsolete .ndo_fdb_{add|del} callbacks
Since the dpaa2-switch already listens for SWITCHDEV_FDB_ADD_TO_DEVICE /
SWITCHDEV_FDB_DEL_TO_DEVICE events emitted by the bridge, we don't need
the bridge bypass operations, and now is a good time to delete them. All
'bridge fdb' commands need the 'master' flag specified now.

In fact, having the obsolete .ndo_fdb_{add|del} callbacks would even
complicate the bridge leave/join procedures without any real benefit.
Every FDB entry is installed in an FDB ID as far as the hardware is
concerned, and the dpaa2-switch ports change their FDB ID when they join
or leave a bridge. So we would need to manually delete these FDB entries
when the FDB ID changes. That's because, unlike FDB entries added
through switchdev, where the bridge automatically deletes those on
leave, there isn't anybody who will remove the static FDB entries
installed via the bridge bypass operations upon a change in the upper
device.

Note that we still need .ndo_fdb_dump though. The dpaa2-switch does not
emit any interrupts when a new address is learnt, so we cannot keep the
bridge FDB in sync with the hardware FDB. Therefore, we need this
callback to get a chance to print the FDB entries that were dynamically
learnt by our hardware.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:35 -08:00
Ioana Ciornei
282d47de29 staging: dpaa2-switch: fix up initial forwarding configuration done by firmware
By default, the DPSW object is configured with VLAN ID 1 in the VLAN
table, which all ports are member of. This entry in the VLAN table
selects the same FDB ID for all ports, meaning that forwarding between
ports is permitted. This is unlike the switchdev model, where each port
should operate as standalone by default.

To make the switch operate in standalone ports mode, we need the VLAN
table to select a unique FDB ID for each port. In order to do that, we
need to simply delete the VLAN 1 created automatically by firmware, and
let dpaa2_switch_port_init take over by re-adding VLAN ID 1, but
pointing towards a unique FDB ID.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:35 -08:00
Ioana Ciornei
93a4d0ab1e staging: dpaa2-switch: remove broken learning and flooding support
This patch is removing the current configuration of learning and
flooding states per switch port because they are essentially broken in
terms of integration with the switchdev APIs and the bridge
understanding of these states.

First of all, the learning state is a per switch port configuration
while the dpaa2-switch driver was using it to configure the entire
bridging domain. This is broken since the software learning state could
be out of sync with the hardware state when ports from the same bridging
domain are configured by the user with different learning parameters.

The BR_FLOOD flag has been misinterpreted as well. Instead of denoting
whether unicast traffic for which there is no FDB entry will be flooded
towards a given port, the dpaa2-switch used the flag to configure
whether or not a frame with an unknown destination received on a given
port should be flooded or not. In summary, it was used as an ingress
setting instead of an egress one.

Also, remove the unnecessary call to dpsw_if_set_broadcast() and the API
definition. The HW default lets all switch ports flood broadcast
traffic, thus there is no need to call the API again.

Instead of trying to patch things up, just remove the support for the
moment so that we'll add it back cleanly once the driver is out of
staging.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:30:35 -08:00
David S. Miller
157611c895 Merge branch 'enetc-cleanups'
Vladimir Oltean says:

====================
Refactoring/cleanup for NXP ENETC

This series performs the following:
- makes the API for Control Buffer Descriptor Rings in enetc_cbdr.c a
  bit more tightly knit.
- moves more logic into enetc_rxbd_next to make the callers simpler
- moves more logic into enetc_refill_rx_ring to make the callers simpler
- removes forward declarations
- simplifies the probe path to unify probing for used and unused PFs.

Nothing radical.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
7a5222cb7a net: enetc: make enetc_refill_rx_ring update the consumer index
Since commit fd5736bf9f ("enetc: Workaround for MDIO register access
issue"), enetc_refill_rx_ring no longer updates the RX BD ring's
consumer index, that is left to be done by the caller. This has led to
bugs such as the ones found in 96a5223b91 ("net: enetc: remove bogus
write to SIRXIDR from enetc_setup_rxbdr") and 3a5d12c9be ("net: enetc:
keep RX ring consumer index in sync with hardware"), so it is desirable
that we move back the update of the consumer index into enetc_refill_rx_ring.

The trouble with that is the different MDIO locking context for the two
callers of enetc_refill_rx_ring:

- enetc_clean_rx_ring runs under enetc_lock_mdio()
- enetc_setup_rxbdr runs outside enetc_lock_mdio()

Simplify the callers of enetc_refill_rx_ring by making enetc_setup_rxbdr
explicitly take enetc_lock_mdio() around the call. It will be the only
place that needs to ensure the hot accessors can be used.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
0486185ee2 net: enetc: remove forward declaration for enetc_map_tx_buffs
There is no reason why this forward declaration exists other than
poor ordering of the functions.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
8580b3c3d7 net: enetc: remove forward-declarations of enetc_clean_{rx,tx}_ring
This patch moves the NAPI enetc_poll after enetc_clean_rx_ring such that
we can delete the forward declarations.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
7f071a450b net: enetc: use enum enetc_active_offloads
The active_offloads variable of enetc_ndev_priv has an enum type, use it.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
c027aa9201 net: enetc: simplify callers of enetc_rxbd_next
When we iterate through the BDs in the RX ring, the software producer
index (which is already passed by value to enetc_rxbd_next) lags behind,
and we end up with this funny looking "++i == rx_ring->bd_count" check
so that we drag it after us.

Let's pass the software producer index "i" by reference, so that
enetc_rxbd_next can increment it by itself (mod rx_ring->bd_count),
especially since enetc_rxbd_next has to increment the index anyway.
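
The shape of the change is roughly the following (simplified sketch, not
the exact enetc code):

  /* Before: callers advanced "i" themselves with the awkward
   *         "++i == rx_ring->bd_count" check after every descriptor.
   * After:  the helper advances the index in place, modulo the ring size. */
  static union enetc_rx_bd *enetc_rxbd_next(struct enetc_bdr *rx_ring, int *i)
  {
          if (++(*i) == rx_ring->bd_count)
                  *i = 0;
          return ENETC_RXBD(*rx_ring, *i);  /* ring accessor; name assumed */
  }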

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
4b47c0b81f net: enetc: don't initialize unused ports from a separate code path
Since commit 3222b5b613 ("net: enetc: initialize RFS/RSS memories for
unused ports too") there is a requirement to initialize the memories of
unused PFs too, which has left the probe path in a bit of a rough shape,
because we basically have a minimal initialization path for unused PFs
which is separate from the main initialization path.

Now that initializing a control BD ring is as simple as calling
enetc_setup_cbdr, let's move that outside of enetc_alloc_si_resources
(unused PFs don't need classification rules, so no point in allocating
them just to free them later).

But enetc_alloc_si_resources is called both for PFs and for VFs, so now
that enetc_setup_cbdr is no longer called from this common function, it
means that the VF probe path needs to explicitly call enetc_setup_cbdr
too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
5b4daa7f12 net: enetc: pass bd_count as an argument to enetc_setup_cbdr
It makes no sense from an API perspective to first initialize some
portion of struct enetc_cbdr outside enetc_setup_cbdr, then leave that
function to initialize the rest. enetc_setup_cbdr should be able to
perform all initialization given a zero-initialized struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
0bfde022b3 net: enetc: squash clear_cbdr and free_cbdr into teardown_cbdr
All call sites call enetc_clear_cbdr and enetc_free_cbdr one after
another, so let's combine the two functions into a single method named
enetc_teardown_cbdr which does both, and in the same order.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
27f9025d49 net: enetc: save the mode register address inside struct enetc_cbdr
enetc_clear_cbdr depends on struct enetc_hw because it must disable the
ring through a register write. We'd like to remove that dependency, so
let's do what's already done with the producer and consumer indices,
which is to save the iomem address in a variable kept in struct enetc_cbdr.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00
Vladimir Oltean
24be14e326 net: enetc: squash enetc_alloc_cbdr and enetc_setup_cbdr
enetc_alloc_cbdr and enetc_setup_cbdr are always called one after
another, so we can simplify the callers and make enetc_setup_cbdr do
everything that's needed.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10 13:14:15 -08:00