diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 616f89267d23..da82cd75a4f6 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -114,11 +114,11 @@ would be sub-port 0 on port 1 on switch 1.
 Switch ID
 ^^^^^^^^^
 
-The switchdev driver must implement the switchdev op switchdev_port_attr_get for
-SWITCHDEV_ATTR_PORT_PARENT_ID for each port netdev, returning the same physical ID
-for each port of a switch. The ID must be unique between switches on the same
-system. The ID does not need to be unique between switches on different
-systems.
+The switchdev driver must implement the switchdev op switchdev_port_attr_get
+for SWITCHDEV_ATTR_PORT_PARENT_ID for each port netdev, returning the same
+physical ID for each port of a switch. The ID must be unique between switches
+on the same system. The ID does not need to be unique between switches on
+different systems.
 
 The switch ID is used to locate ports on a switch and to know if aggregated
 ports belong to the same switch.
@@ -142,7 +142,7 @@ The port netdevs representing the physical switch ports can be organized into
 higher-level switching constructs. The default construct is a standalone
 router port, used to offload L3 forwarding. Two or more ports can be bonded
 together to form a LAG. Two or more ports (or LAGs) can be bridged to bridge
-to L2 networks. VLANs can be applied to sub-divide L2 networks. L2-over-L3
+L2 networks. VLANs can be applied to sub-divide L2 networks. L2-over-L3
 tunnels can be built on ports. These constructs are built using standard Linux
 tools such as the bridge driver, the bonding/team drivers, and netlink-based
 tools such as iproute2.
@@ -177,6 +177,10 @@ entries are installed, for example, using iproute2 bridge cmd:
 
 	bridge fdb add ADDR dev DEV [vlan VID] [self]
 
+The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
+ops, and handle add/delete/dump of SWITCHDEV_OBJ_PORT_FDB object using
+switchdev_port_obj_xxx ops.
+
 XXX: what should be done if offloading this rule to hardware fails (for example,
 due to full capacity in hardware tables) ?
 
@@ -194,11 +198,11 @@ in turn, will notify the bridge driver using the switchdev notifier call:
 
 	err = call_switchdev_notifiers(val, dev, info);
 
-Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when forgetting, and
-info points to a struct switchdev_notifier_fdb_info. On SWITCHDEV_FDB_ADD, the bridge
-driver will install the FDB entry into the bridge's FDB and mark the entry as
-NTF_EXT_LEARNED. The iproute2 bridge command will label these entries
-"offload":
+Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
+forgetting, and info points to a struct switchdev_notifier_fdb_info. On
+SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
+bridge's FDB and mark the entry as NTF_EXT_LEARNED. The iproute2 bridge
+command will label these entries "offload":
 
 	$ bridge fdb
 	52:54:00:12:35:01 dev sw1p1 master br0 permanent
@@ -229,18 +233,18 @@ the bridge's FDB. It's possible, but not optimal, to enable learning on the
 device port and on the bridge port, and disable learning_sync.
 
 To support learning and learning_sync port attributes, the driver implements
-switchdev op switchdev_port_attr_get/set for SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS. The driver
-should initialize the attributes to the hardware defaults.
+switchdev op switchdev_port_attr_get/set for SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS.
+The driver should initialize the attributes to the hardware defaults.
 
 FDB Ageing
 ^^^^^^^^^^
 
 There are two FDB ageing models supported: 1) ageing by the device, and 2)
 ageing by the kernel. Ageing by the device is preferred if many FDB entries
-are supported. The driver calls call_switchdev_notifiers(SWITCHDEV_FDB_DEL, ...) to
-age out the FDB entry. In this model, ageing by the kernel should be turned
-off. XXX: how to turn off ageing in kernel on a per-port basis or otherwise
-prevent the kernel from ageing out the FDB entry?
+are supported. The driver calls call_switchdev_notifiers(SWITCHDEV_FDB_DEL,
+...) to age out the FDB entry. In this model, ageing by the kernel should be
+turned off. XXX: how to turn off ageing in kernel on a per-port basis or
+otherwise prevent the kernel from ageing out the FDB entry?
 
 In the kernel ageing model, the standard bridge ageing mechanism is used to age
 out stale FDB entries. To keep an FDB entry "alive", the driver should refresh
@@ -262,8 +266,8 @@ STP State Change on Port
 
 Internally or with a third-party STP protocol implementation (e.g. mstpd), the
 bridge driver maintains the STP state for ports, and will notify the switch
-driver of STP state change on a port using the switchdev op switchdev_attr_port_set for
-SWITCHDEV_ATTR_PORT_STP_UPDATE.
+driver of STP state change on a port using the switchdev op
+switchdev_attr_port_set for SWITCHDEV_ATTR_PORT_STP_UPDATE.
 
 State is one of BR_STATE_*. The switch driver can use STP state updates to
 update ingress packet filter list for the port. For example, if port is
@@ -296,33 +300,38 @@ IGMP Snooping
 
 XXX: complete this section
 
-L3 routing
-----------
+L3 Routing Offload
+------------------
 
 Offloading L3 routing requires that device be programmed with FIB entries from
 the kernel, with the device doing the FIB lookup and forwarding. The device
 does a longest prefix match (LPM) on FIB entries matching route prefix and
-forwards the packet to the matching FIB entry's nexthop(s) egress ports. To
-program the device, the switchdev driver is called with add/delete ops for IPv4
-and IPv6 FIB entries. For IPv4, the driver implements switchdev ops:
+forwards the packet to the matching FIB entry's nexthop(s) egress ports.
 
-	int (*switchdev_fib_ipv4_add)(struct net_device *dev,
-				      __be32 dst, int dst_len,
-				      struct fib_info *fi,
-				      u8 tos, u8 type,
-				      u32 nlflags, u32 tb_id);
+To program the device, the driver implements support for
+SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops.
+switchdev_port_obj_add is used for both adding a new FIB entry to the device,
+or modifying an existing entry on the device.
 
-	int (*switchdev_fib_ipv4_del)(struct net_device *dev,
-				      __be32 dst, int dst_len,
-				      struct fib_info *fi,
-				      u8 tos, u8 type,
-				      u32 tb_id);
+XXX: Currently, only SWITCHDEV_OBJ_IPV4_FIB objects are supported.
 
-to add/delete IPv4 dst/dest_len prefix on table tb_id. The *fi structure holds
-details on the route and route's nexthops. *dev is one of the port netdevs
-mentioned in the routes next hop list. If the output port netdevs referenced
-in the route's nexthop list don't all have the same switch ID, the driver is
-not called to add/delete the FIB entry.
+SWITCHDEV_OBJ_IPV4_FIB object passes:
+
+	struct switchdev_obj_ipv4_fib {		/* IPV4_FIB */
+		u32 dst;
+		int dst_len;
+		struct fib_info *fi;
+		u8 tos;
+		u8 type;
+		u32 nlflags;
+		u32 tb_id;
+	} ipv4_fib;
+
+to add/modify/delete IPv4 dst/dest_len prefix on table tb_id. The *fi
+structure holds details on the route and route's nexthops. *dev is one of the
+port netdevs mentioned in the routes next hop list. If the output port netdevs
+referenced in the route's nexthop list don't all have the same switch ID, the
+driver is not called to add/modify/delete the FIB entry.
 
 Routes offloaded to the device are labeled with "offload" in the ip route
 listing:
@@ -340,7 +349,7 @@ listing:
 	12.0.0.4 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
 	192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15
 
-XXX: add/del IPv6 FIB API
+XXX: add/mod/del IPv6 FIB API
 
 Nexthop Resolution
 ^^^^^^^^^^^^^^^^^^
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index ac853acbe211..e008057dab46 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -803,7 +803,7 @@ static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
 }
 
 /**
- *	switchdev_fib_ipv4_add - Add IPv4 route entry to switch
+ *	switchdev_fib_ipv4_add - Add/modify switch IPv4 route entry
  *
  *	@dst: route's IPv4 destination address
  *	@dst_len: destination address length (prefix length)
@@ -813,7 +813,7 @@ static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
  *	@nlflags: netlink flags passed in (NLM_F_*)
  *	@tb_id: route table ID
  *
- *	Add IPv4 route entry to switch device.
+ *	Add/modify switch IPv4 route entry.
  */
 int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
 			   u8 tos, u8 type, u32 nlflags, u32 tb_id)