The Spanning Tree Protocol (802.1d)
The spanning tree protocol is a distributed standard switch used to reduce the network topology to a spanning tree by eliminating all cycles. Explore these examples to see how this technology processes frames in the datalink layer.
The Spanning TreeProtocol (STP), proposed in [Perlman1985], is a distributed protocol that is used by switches to reduce the network topology to a spanning tree, so that there are no cycles in the topology. For example, consider the network shown in the figure below. In this figure, each bold line corresponds to an Ethernet to which two Ethernet switches are attached. This network contains several cycles that must be broken to allow Ethernet switches that are using the MAC address learning algorithm to exchange frames.
In this network, the STP will compute the following spanning tree. Switch1 will be the root of the tree. All the interfaces of Switch1, Switch2 and Switch7 are part of the spanning tree. Only the interface connected to LANB will be active on Switch9. LANH will only be served by Switch7 and the port of Switch44 on LANG will be disabled. A frame originating on LANB and destined for LANA will be forwarded by Switch7 on LANC, then by Switch1 on LANE, then by Switch44 on LANF and eventually by Switch2 on LANA.
Switches running the Spanning TreeProtocol exchange BPDUs. These BPDUs are always sent as frames with the destination MAC address as the ALL_BRIDGES reserved multicast MAC address. Each switch has a unique 64 bit identifier. To ensure uniqueness, the lower 48 bits of the identifier are set to the unique MAC address allocated to the switch by its manufacturer. The high order 16 bits of the switch identifier can be configured by the network administrator to influence the topology of the spanning tree. The default value for these high order bits is 32768.
Figure 6.27: Spanning tree computed in a switched Ethernet network
The switches exchange BPDUs to build the spanning tree. Intuitively, the spanning tree is built by first selecting the switch with the smallest identifier as the root of the tree. The branches of the spanning tree are then composed of the shortest paths that allow all of the switches that compose the network to be reached. The BPDUs exchanged by the switches contain the following information:
- the identifier of the root switch (R)
- the cost of the shortest path between the switch that sent the BPDU and the root switch (c)
- the identifier of the switch that sent the BPDU (T )
- the number of the switch port over which the BPDU was sent (p)
We will use the notation <R,c,T,p> to represent a BPDU whose root identifier is R, cost is c and that was sent on the port p of switch T. The construction of the spanning tree depends on an ordering relationship among the BPDUs. This ordering relationship could be implemented by the python function below.
# returns True if bpdu b1 is better than bpdu b2
def better( b1, b2):
return ( (b1.R < b2.R) or
( (b1.R==b2.R) and (b1.c<b2.c) ) or
( (b1.R==b2.R) and (b1.c==b2.c) and (b1.T<b2.T) ) or
( (b1.R==b2.R) and (b1.c==b2.c) and (b1.T==b2.T) and (b1.p<b2.p) ) )
In addition to the identifier discussed above, the network administrator can also configure a cost to be associated to each switch port. Usually, the cost of a port depends on its bandwidth and the [802.1d] standard recommends the values below. Of course, the network administrator may choose other values. We will use the notation cost[p] to indicate the cost associated to port p in this section.
Bandwidth |
Cost |
10 Mbps 100 Mbps 1 Gbps 10 Gbps 100 Gbps |
2000000 200000 20000 2000 200 |
The Spanning TreeProtocol uses its own terminology that we illustrate in the figure above. A switch port can be in three different states: Root, Designated and Blocked. All the ports of the root switch are in the Designated state. The state of the ports on the other switches is determined based on the BPDU received on each port.
The Spanning Tree Protocol uses the ordering relationship to build the spanning tree. Each switch listens to BPDUs on its ports. When BPDU=<R,c,T,p> is received on port q, the switch computes the port’s priority vector: V[q]=<R,c+cost[q],T,p,q> , where cost[q] is the cost associated to the port over which the BPDU was received. The switch stores in a table the last priority vector received on each port. The switch then compares its own identifier with the smallest root identifier stored in this table. If its own identifier is smaller, then the switch is the root of the spanning tree and is, by definition, at a distance 0 of the root. The BPDU of the switch is then <R,0,R,p>, where R is the switch identifier and p will be set to the port number over which the BPDU is sent. Otherwise, the switch chooses the best priority vector from its table, bv=<R,c,T,p>. The port over which this best priority vector was learned is the switch port that is closest to the root switch. This port becomes the Root port of the switch. There is only one Root port per switch. The switch can then compute its BPDU as BPDU=<R,c,S,p>, where R is the root identifier, c the cost of the best priority vector, S the identifier of the switch and p will be replaced by the number of the port over which the BPDU will be sent. The switch can then determine the state of all its ports by comparing its own BPDU with the priority vector received on each port. If the switch’s BPDU is better than the priority vector of this port, the port becomes a Designated port. Otherwise, the port becomes a Blocked port.
The state of each port is important when considering the transmission of BPDUs. The root switch regularly sends its own BPDU over all of its (Designated) ports. This BPDU is received on the Root port of all the switches that are directly connected to the root switch. Each of these switches computes its own BPDU and sends this BPDU over all its Designated ports. These BPDUs are then received on the Root port of downstream switches, which then compute their own BPDU, etc. When the network topology is stable, switches send their own BPDU on all their Designated ports, once they receive a BPDU on their Root port. No BPDU is sent on a Blocked port. Switches listen for BPDUs on their Blocked and Designated ports, but no BPDU should be received over these ports when the topology is stable. The utilisation of the ports for both BPDUs and data frames is summarised in the table below.
Port state |
Receives BPDUs |
Sends BPDU |
Handles data frames |
Blocked Root Designated |
yes yes yes |
no no yes |
no yes yes |
To illustrate the operation of the Spanning Tree Protocol, let us consider the simple network topology in the figure below.
Figure 6.28: A simple Spanning tree computed in a switched Ethernet network
Assume that Switch4 is the first to boot. It sends its own BPDU=<4,0,4,?> on its two ports. When Switch1 boots, it sends BPDU=<1,0,1,1>. This BPDU is received by Switch4, which updates its table and computes a new BPDU=<1,3,4,?>. Port 1 of Switch4 becomes the Root port while its second port is still in the Designated state.
Assume now that Switch9 boots and immediately receives Switch1 ‘s BPDU on port 1. Switch9 computes its own BPDU=<1,1,9,?> and port 1 becomes the Root port of this switch. This BPDU is sent on port 2 of Switch9 and reaches Switch4. Switch4 compares the priority vector built from this BPDU (i.e. <1,2,9,2>) and notices that it is better than Switch4 ‘s BPDU=<1,3,4,2>. Thus, port 2 becomes a Blocked port on Switch4.
During the computation of the spanning tree, switches discard all received data frames, as at that time the network topology is not guaranteed to be loop-free. Once that topology has been stable for some time, the switches again start to use the MAC learning algorithm to forward data frames. Only the Root and Designated ports are used to forward data frames. Switches discard all the data frames received on their Blocked ports and never forward frames on these ports.
Switches, ports and links can fail in a switched Ethernet network. When a failure occurs, the switches must be able to recompute the spanning tree to recover from the failure. The Spanning Tree Protocol relies on regular transmissions of the BPDUs to detect these failures. A BPDU contains two additional fields: the Age of the BPDU and the Maximum Age. The Age contains the amount of time that has passed since the root switch initially originated the BPDU. The root switch sends its BPDU with an Age of zero and each switch that computes its own BPDU increments its Age by one. The Age of the BPDUs stored on a switch’s table is also incremented every second. A BPDU expires when its Age reaches the Maximum Age. When the network is stable, this does not happen as BPDU s are regularly sent by the root switch and downstream switches. However, if the root fails or the network becomes partitioned, BPDU will expire and switches will recompute their own BPDU and restart the Spanning Tree Protcol. Once a topology change has been detected, the forwarding of the data frames stops as the topology is not guaranteed to be loop-free. Additional details about the reaction to failures may be found in [802.1d].
Source: Olivier Bonaventure, https://s3.amazonaws.com/saylordotorg-resources/wwwresources/site/wp-content/uploads/2012/02/Computer-Networking-Principles-Bonaventure-1-30-31-OTC1.pdf
This work is licensed under a Creative Commons Attribution 3.0 License.