In the process of testing the LBT-RM reliable multicast protocol used in the 29West LBM messaging product, we noticed that some Ethernet switches and NICs now implement a form of flow control that can cause unexpected results. This is commonly seen in mixed speed networks (e.g. 10/100 Mbps or 100/1000 Mbps). We have seen apparently strange results such as two machines being able to send TCP at 100 Mbps but being limited to 10 Mbps for multicast using LBT-RM.
The cause of this seems to be Ethernet switches and NICs that implement the IEEE 802.3x Ethernet flow control standard. It seems to be most commonly implemented in desktop-grade switches. Infrastructure-grade equipment makers seem to have identified the potential harm that Ethernet flow control can cause and decided to implement it carefully if at all.
The intent of Ethernet flow control is to prevent loss in switches by providing back pressure to the sending NIC on ports that are going too fast to avoid loss. While this sounds like a good idea on the surface, higher-layer protocols like TCP were designed to rely on loss as a signal that they should send more slowly. Hence, at the very least, Ethernet flow control duplicates the flow control mechanism already built into TCP. At worst, it unnecessarily slows senders when there's no need.
Problems can happen with multicast on mixed-speed networks if a switch with Ethernet flow control applies back pressure on a multicast sender when it tries to go faster than the slowest active port on the switch. Consider the scenario where two machines X and Y are connected to a switch at 100 Mbps while a third machine Z is connected to the same switch at 10 Mbps. X can send TCP traffic to Y at 100 Mbps since the switch sees no congestion delivering data to Y's switch port. But if X starts sending multicast, the switch tries to forward the multicast to Z's switch port as well as to Y's. Since Z is limited to 10 Mbps, the switch senses congestion and prevents X from sending faster than 10 Mbps.
We have tested switches that behave this way even if there are no multicast receivers on the slow ports. That is, these switches did not have an IGMP snooping feature or an equivalent that would prevent multicast traffic from being forwarded out ports where there were no interested receivers. It seems best to avoid such switches when using LBT-RM. Any production system with switches like this may be vulnerable to the problem where the network runs fine until one day somebody plugs an old laptop into a conference room jack and inadvertently limits all the multicast sources to 10 Mbps.
It may also be possible to configure hosts to ignore flow control signals from switches. For example, on Linux, the /sbin/ethtool command may be able to configure a host to ignore pause signals that are sent from a switch sensing congestion. Module configuration parameters or registry settings might also be appropriate.
Copyright 2004 - 2010 29West, Inc.