mirror of
https://github.com/torvalds/linux.git
synced 2024-12-03 17:41:22 +00:00
d2afc2cd7f
Add some features that are not appropriate for the existing section to the "Others" part of the bridge document. Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
336 lines
15 KiB
ReStructuredText
336 lines
15 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0
|
|
|
|
=================
|
|
Ethernet Bridging
|
|
=================
|
|
|
|
Introduction
|
|
============
|
|
|
|
The IEEE 802.1Q-2022 (Bridges and Bridged Networks) standard defines the
|
|
operation of bridges in computer networks. A bridge, in the context of this
|
|
standard, is a device that connects two or more network segments and operates
|
|
at the data link layer (Layer 2) of the OSI (Open Systems Interconnection)
|
|
model. The purpose of a bridge is to filter and forward frames between
|
|
different segments based on the destination MAC (Media Access Control) address.
|
|
|
|
Bridge kAPI
|
|
===========
|
|
|
|
Here are some core structures of bridge code. Note that the kAPI is *unstable*,
|
|
and can be changed at any time.
|
|
|
|
.. kernel-doc:: net/bridge/br_private.h
|
|
:identifiers: net_bridge_vlan
|
|
|
|
Bridge uAPI
|
|
===========
|
|
|
|
Modern Linux bridge uAPI is accessed via Netlink interface. You can find
|
|
below files where the bridge and bridge port netlink attributes are defined.
|
|
|
|
Bridge netlink attributes
|
|
-------------------------
|
|
|
|
.. kernel-doc:: include/uapi/linux/if_link.h
|
|
:doc: Bridge enum definition
|
|
|
|
Bridge port netlink attributes
|
|
------------------------------
|
|
|
|
.. kernel-doc:: include/uapi/linux/if_link.h
|
|
:doc: Bridge port enum definition
|
|
|
|
Bridge sysfs
|
|
------------
|
|
|
|
The sysfs interface is deprecated and should not be extended if new
|
|
options are added.
|
|
|
|
STP
|
|
===
|
|
|
|
The STP (Spanning Tree Protocol) implementation in the Linux bridge driver
|
|
is a critical feature that helps prevent loops and broadcast storms in
|
|
Ethernet networks by identifying and disabling redundant links. In a Linux
|
|
bridge context, STP is crucial for network stability and availability.
|
|
|
|
STP is a Layer 2 protocol that operates at the Data Link Layer of the OSI
|
|
model. It was originally developed as IEEE 802.1D and has since evolved into
|
|
multiple versions, including Rapid Spanning Tree Protocol (RSTP) and
|
|
`Multiple Spanning Tree Protocol (MSTP)
|
|
<https://lore.kernel.org/netdev/20220316150857.2442916-1-tobias@waldekranz.com/>`_.
|
|
|
|
The 802.1D-2004 removed the original Spanning Tree Protocol, instead
|
|
incorporating the Rapid Spanning Tree Protocol (RSTP). By 2014, all the
|
|
functionality defined by IEEE 802.1D has been incorporated into either
|
|
IEEE 802.1Q (Bridges and Bridged Networks) or IEEE 802.1AC (MAC Service
|
|
Definition). 802.1D has been officially withdrawn in 2022.
|
|
|
|
Bridge Ports and STP States
|
|
---------------------------
|
|
|
|
In the context of STP, bridge ports can be in one of the following states:
|
|
* Blocking: The port is disabled for data traffic and only listens for
|
|
BPDUs (Bridge Protocol Data Units) from other devices to determine the
|
|
network topology.
|
|
* Listening: The port begins to participate in the STP process and listens
|
|
for BPDUs.
|
|
* Learning: The port continues to listen for BPDUs and begins to learn MAC
|
|
addresses from incoming frames but does not forward data frames.
|
|
* Forwarding: The port is fully operational and forwards both BPDUs and
|
|
data frames.
|
|
* Disabled: The port is administratively disabled and does not participate
|
|
in the STP process. The data frames forwarding are also disabled.
|
|
|
|
Root Bridge and Convergence
|
|
---------------------------
|
|
|
|
In the context of networking and Ethernet bridging in Linux, the root bridge
|
|
is a designated switch in a bridged network that serves as a reference point
|
|
for the spanning tree algorithm to create a loop-free topology.
|
|
|
|
Here's how the STP works and root bridge is chosen:
|
|
1. Bridge Priority: Each bridge running a spanning tree protocol, has a
|
|
configurable Bridge Priority value. The lower the value, the higher the
|
|
priority. By default, the Bridge Priority is set to a standard value
|
|
(e.g., 32768).
|
|
2. Bridge ID: The Bridge ID is composed of two components: Bridge Priority
|
|
and the MAC address of the bridge. It uniquely identifies each bridge
|
|
in the network. The Bridge ID is used to compare the priorities of
|
|
different bridges.
|
|
3. Bridge Election: When the network starts, all bridges initially assume
|
|
that they are the root bridge. They start advertising Bridge Protocol
|
|
Data Units (BPDU) to their neighbors, containing their Bridge ID and
|
|
other information.
|
|
4. BPDU Comparison: Bridges exchange BPDUs to determine the root bridge.
|
|
Each bridge examines the received BPDUs, including the Bridge Priority
|
|
and Bridge ID, to determine if it should adjust its own priorities.
|
|
The bridge with the lowest Bridge ID will become the root bridge.
|
|
5. Root Bridge Announcement: Once the root bridge is determined, it sends
|
|
BPDUs with information about the root bridge to all other bridges in the
|
|
network. This information is used by other bridges to calculate the
|
|
shortest path to the root bridge and, in doing so, create a loop-free
|
|
topology.
|
|
6. Forwarding Ports: After the root bridge is selected and the spanning tree
|
|
topology is established, each bridge determines which of its ports should
|
|
be in the forwarding state (used for data traffic) and which should be in
|
|
the blocking state (used to prevent loops). The root bridge's ports are
|
|
all in the forwarding state. while other bridges have some ports in the
|
|
blocking state to avoid loops.
|
|
7. Root Ports: After the root bridge is selected and the spanning tree
|
|
topology is established, each non-root bridge processes incoming
|
|
BPDUs and determines which of its ports provides the shortest path to the
|
|
root bridge based on the information in the received BPDUs. This port is
|
|
designated as the root port. And it is in the Forwarding state, allowing
|
|
it to actively forward network traffic.
|
|
8. Designated ports: A designated port is the port through which the non-root
|
|
bridge will forward traffic towards the designated segment. Designated ports
|
|
are placed in the Forwarding state. All other ports on the non-root
|
|
bridge that are not designated for specific segments are placed in the
|
|
Blocking state to prevent network loops.
|
|
|
|
STP ensures network convergence by calculating the shortest path and disabling
|
|
redundant links. When network topology changes occur (e.g., a link failure),
|
|
STP recalculates the network topology to restore connectivity while avoiding loops.
|
|
|
|
Proper configuration of STP parameters, such as the bridge priority, can
|
|
influence network performance, path selection and which bridge becomes the
|
|
Root Bridge.
|
|
|
|
User space STP helper
|
|
---------------------
|
|
|
|
The user space STP helper *bridge-stp* is a program to control whether to use
|
|
user mode spanning tree. The ``/sbin/bridge-stp <bridge> <start|stop>`` is
|
|
called by the kernel when STP is enabled/disabled on a bridge
|
|
(via ``brctl stp <bridge> <on|off>`` or ``ip link set <bridge> type bridge
|
|
stp_state <0|1>``). The kernel enables user_stp mode if that command returns
|
|
0, or enables kernel_stp mode if that command returns any other value.
|
|
|
|
VLAN
|
|
====
|
|
|
|
A LAN (Local Area Network) is a network that covers a small geographic area,
|
|
typically within a single building or a campus. LANs are used to connect
|
|
computers, servers, printers, and other networked devices within a localized
|
|
area. LANs can be wired (using Ethernet cables) or wireless (using Wi-Fi).
|
|
|
|
A VLAN (Virtual Local Area Network) is a logical segmentation of a physical
|
|
network into multiple isolated broadcast domains. VLANs are used to divide
|
|
a single physical LAN into multiple virtual LANs, allowing different groups of
|
|
devices to communicate as if they were on separate physical networks.
|
|
|
|
Typically there are two VLAN implementations, IEEE 802.1Q and IEEE 802.1ad
|
|
(also known as QinQ). IEEE 802.1Q is a standard for VLAN tagging in Ethernet
|
|
networks. It allows network administrators to create logical VLANs on a
|
|
physical network and tag Ethernet frames with VLAN information, which is
|
|
called *VLAN-tagged frames*. IEEE 802.1ad, commonly known as QinQ or Double
|
|
VLAN, is an extension of the IEEE 802.1Q standard. QinQ allows for the
|
|
stacking of multiple VLAN tags within a single Ethernet frame. The Linux
|
|
bridge supports both the IEEE 802.1Q and `802.1AD
|
|
<https://lore.kernel.org/netdev/1402401565-15423-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/>`_
|
|
protocol for VLAN tagging.
|
|
|
|
`VLAN filtering <https://lore.kernel.org/netdev/1360792820-14116-1-git-send-email-vyasevic@redhat.com/>`_
|
|
on a bridge is disabled by default. After enabling VLAN filtering on a bridge,
|
|
it will start forwarding frames to appropriate destinations based on their
|
|
destination MAC address and VLAN tag (both must match).
|
|
|
|
Multicast
|
|
=========
|
|
|
|
The Linux bridge driver has multicast support allowing it to process Internet
|
|
Group Management Protocol (IGMP) or Multicast Listener Discovery (MLD)
|
|
messages, and to efficiently forward multicast data packets. The bridge
|
|
driver supports IGMPv2/IGMPv3 and MLDv1/MLDv2.
|
|
|
|
Multicast snooping
|
|
------------------
|
|
|
|
Multicast snooping is a networking technology that allows network switches
|
|
to intelligently manage multicast traffic within a local area network (LAN).
|
|
|
|
The switch maintains a multicast group table, which records the association
|
|
between multicast group addresses and the ports where hosts have joined these
|
|
groups. The group table is dynamically updated based on the IGMP/MLD messages
|
|
received. With the multicast group information gathered through snooping, the
|
|
switch optimizes the forwarding of multicast traffic. Instead of blindly
|
|
broadcasting the multicast traffic to all ports, it sends the multicast
|
|
traffic based on the destination MAC address only to ports which have
|
|
subscribed the respective destination multicast group.
|
|
|
|
When created, the Linux bridge devices have multicast snooping enabled by
|
|
default. It maintains a Multicast forwarding database (MDB) which keeps track
|
|
of port and group relationships.
|
|
|
|
IGMPv3/MLDv2 EHT support
|
|
------------------------
|
|
|
|
The Linux bridge supports IGMPv3/MLDv2 EHT (Explicit Host Tracking), which
|
|
was added by `474ddb37fa3a ("net: bridge: multicast: add EHT allow/block handling")
|
|
<https://lore.kernel.org/netdev/20210120145203.1109140-1-razor@blackwall.org/>`_
|
|
|
|
The explicit host tracking enables the device to keep track of each
|
|
individual host that is joined to a particular group or channel. The main
|
|
benefit of the explicit host tracking in IGMP is to allow minimal leave
|
|
latencies when a host leaves a multicast group or channel.
|
|
|
|
The length of time between a host wanting to leave and a device stopping
|
|
traffic forwarding is called the IGMP leave latency. A device configured
|
|
with IGMPv3 or MLDv2 and explicit tracking can immediately stop forwarding
|
|
traffic if the last host to request to receive traffic from the device
|
|
indicates that it no longer wants to receive traffic. The leave latency
|
|
is thus bound only by the packet transmission latencies in the multiaccess
|
|
network and the processing time in the device.
|
|
|
|
Other multicast features
|
|
------------------------
|
|
|
|
The Linux bridge also supports `per-VLAN multicast snooping
|
|
<https://lore.kernel.org/netdev/20210719170637.435541-1-razor@blackwall.org/>`_,
|
|
which is disabled by default but can be enabled. And `Multicast Router Discovery
|
|
<https://lore.kernel.org/netdev/20190121062628.2710-1-linus.luessing@c0d3.blue/>`_,
|
|
which help identify the location of multicast routers.
|
|
|
|
Switchdev
|
|
=========
|
|
|
|
Linux Bridge Switchdev is a feature in the Linux kernel that extends the
|
|
capabilities of the traditional Linux bridge to work more efficiently with
|
|
hardware switches that support switchdev. With Linux Bridge Switchdev, certain
|
|
networking functions like forwarding, filtering, and learning of Ethernet
|
|
frames can be offloaded to a hardware switch. This offloading reduces the
|
|
burden on the Linux kernel and CPU, leading to improved network performance
|
|
and lower latency.
|
|
|
|
To use Linux Bridge Switchdev, you need hardware switches that support the
|
|
switchdev interface. This means that the switch hardware needs to have the
|
|
necessary drivers and functionality to work in conjunction with the Linux
|
|
kernel.
|
|
|
|
Please see the :ref:`switchdev` document for more details.
|
|
|
|
Netfilter
|
|
=========
|
|
|
|
The bridge netfilter module is a legacy feature that allows to filter bridged
|
|
packets with iptables and ip6tables. Its use is discouraged. Users should
|
|
consider using nftables for packet filtering.
|
|
|
|
The older ebtables tool is more feature-limited compared to nftables, but
|
|
just like nftables it doesn't need this module either to function.
|
|
|
|
The br_netfilter module intercepts packets entering the bridge, performs
|
|
minimal sanity tests on ipv4 and ipv6 packets and then pretends that
|
|
these packets are being routed, not bridged. br_netfilter then calls
|
|
the ip and ipv6 netfilter hooks from the bridge layer, i.e. ip(6)tables
|
|
rulesets will also see these packets.
|
|
|
|
br_netfilter is also the reason for the iptables *physdev* match:
|
|
This match is the only way to reliably tell routed and bridged packets
|
|
apart in an iptables ruleset.
|
|
|
|
Note that ebtables and nftables will work fine without the br_netfilter module.
|
|
iptables/ip6tables/arptables do not work for bridged traffic because they
|
|
plug in the routing stack. nftables rules in ip/ip6/inet/arp families won't
|
|
see traffic that is forwarded by a bridge either, but that's very much how it
|
|
should be.
|
|
|
|
Historically the feature set of ebtables was very limited (it still is),
|
|
this module was added to pretend packets are routed and invoke the ipv4/ipv6
|
|
netfilter hooks from the bridge so users had access to the more feature-rich
|
|
iptables matching capabilities (including conntrack). nftables doesn't have
|
|
this limitation, pretty much all features work regardless of the protocol family.
|
|
|
|
So, br_netfilter is only needed if users, for some reason, need to use
|
|
ip(6)tables to filter packets forwarded by the bridge, or NAT bridged
|
|
traffic. For pure link layer filtering, this module isn't needed.
|
|
|
|
Other Features
|
|
==============
|
|
|
|
The Linux bridge also supports `IEEE 802.11 Proxy ARP
|
|
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=958501163ddd6ea22a98f94fa0e7ce6d4734e5c4>`_,
|
|
`Media Redundancy Protocol (MRP)
|
|
<https://lore.kernel.org/netdev/20200426132208.3232-1-horatiu.vultur@microchip.com/>`_,
|
|
`Media Redundancy Protocol (MRP) LC mode
|
|
<https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com>`_,
|
|
`IEEE 802.1X port authentication
|
|
<https://lore.kernel.org/netdev/20220218155148.2329797-1-schultz.hans+netdev@gmail.com/>`_,
|
|
and `MAC Authentication Bypass (MAB)
|
|
<https://lore.kernel.org/netdev/20221101193922.2125323-2-idosch@nvidia.com/>`_.
|
|
|
|
FAQ
|
|
===
|
|
|
|
What does a bridge do?
|
|
----------------------
|
|
|
|
A bridge transparently forwards traffic between multiple network interfaces.
|
|
In plain English this means that a bridge connects two or more physical
|
|
Ethernet networks, to form one larger (logical) Ethernet network.
|
|
|
|
Is it L3 protocol independent?
|
|
------------------------------
|
|
|
|
Yes. The bridge sees all frames, but it *uses* only L2 headers/information.
|
|
As such, the bridging functionality is protocol independent, and there should
|
|
be no trouble forwarding IPX, NetBEUI, IP, IPv6, etc.
|
|
|
|
Contact Info
|
|
============
|
|
|
|
The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and
|
|
Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements
|
|
are discussed on the linux-netdev mailing list netdev@vger.kernel.org and
|
|
bridge@lists.linux-foundation.org.
|
|
|
|
The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev
|
|
|
|
External Links
|
|
==============
|
|
|
|
The old Documentation for Linux bridging is on:
|
|
https://wiki.linuxfoundation.org/networking/bridge
|