Any application seeking to deliver the same data stream to a group of receivers faces challenges in dealing with slow receivers. Group members that can keep up with the sender may be inconvenienced or perhaps even harmed by those that can't keep up. At the very least, slow receivers cause the sender to use memory for buffering that could perhaps be put to other uses. Buffering adds latency, at least for the slow receiver and perhaps for all receivers. Note that rate control issues are present for groups using either multicast or unicast addressing.
Often, the whole group suffers due to the problems of one member. In extreme cases, the throughput for the group falls to zero because resources that the group shares are dedicated to the needs of one or a few members. Examples of shared resources include the sender's CPU, memory, and network bandwidth. This phenomenon is sometimes called the "crybaby receiver problem" because the cries (e.g. retransmission requests) from one receiver dominate the attention of the parent (the sender).
The chance of encountering a crybaby receiver problem increases as the number of receivers in the group increases. Odds are that at least one receiver will be having difficulty keeping up if the group is large enough.
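The growth of this risk with group size can be sketched numerically. The following is only an illustration: it assumes that each receiver independently has the same probability `p_slow` of falling behind, which is a simplification not made in the text above, and the function name is hypothetical.

```python
def prob_any_slow(n_receivers: int, p_slow: float) -> float:
    """Probability that at least one of n receivers is slow.

    Assumes receivers fall behind independently with the same
    probability p_slow (an illustrative assumption only).
    """
    if n_receivers < 0:
        raise ValueError("n_receivers must be non-negative")
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    # Complement of "every receiver keeps up".
    return 1.0 - (1.0 - p_slow) ** n_receivers


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(prob_any_slow(n, 0.01), 3))
```

Under these assumptions, a 1% chance per receiver already gives a group of 100 receivers roughly a 63% chance of containing at least one slow member.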
As long as all receivers in the group are best served by the sender running at the same speed, there is no conflict within the group. Scenarios such as crybaby receivers often present a conflict between what's best for a few members and what's best for the majority. Robust systems will have policies for dealing with the apparent conflict and consistently resolving it.
There are three rate control policies that a sender can use for dealing with such conflict within a group. Two are extremes and the third is a middle ground.
Extreme 1: The sender slows down to the rate of the slowest receiver. If the sender cannot control the rate at which new data arrives for transmission to the group, it must either buffer the data until the slowest receiver is ready or drop it. All receivers in the group then experience latency or lost data.
Extreme 2: The sender sends as fast as is convenient for it. This is often the rate at which new data arrives or is generated. This rate may be too fast for even the fastest receivers. All receivers that can't keep up with it will experience lost data. This policy is often called "uncontrolled" since no mechanism regulates the rate used by the sender.
Middle ground: The sender operates within a set of boundaries established by a system administrator or architect. The sender's goal is to minimize data loss and latency in the receiver group while staying within the configured limits.
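The three policies can be compared with a small sketch. Everything here is hypothetical and chosen for illustration: the policy names, the `RateLimits` type, and in particular the interpretation of the middle ground as "follow the slowest receiver, but never leave a configured minimum and maximum rate". Real messaging systems may define their boundaries differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RateLimits:
    """Hypothetical administrator-configured boundaries (messages/sec)."""
    min_rate: float
    max_rate: float


def sender_rate(policy: str, arrival_rate: float,
                receiver_rates: List[float],
                limits: Optional[RateLimits] = None) -> float:
    """Return the rate the sender would use under each policy."""
    slowest = min(receiver_rates)
    if policy == "extreme1":
        # Slow down to the slowest receiver.
        return min(arrival_rate, slowest)
    if policy == "extreme2":
        # Uncontrolled: send at whatever rate data arrives.
        return arrival_rate
    if policy == "middle":
        if limits is None:
            raise ValueError("middle ground policy needs configured limits")
        # Follow the slowest receiver, but only within the boundaries.
        bounded = max(limits.min_rate, min(slowest, limits.max_rate))
        return min(arrival_rate, bounded)
    raise ValueError(f"unknown policy: {policy}")


def receivers_losing_data(rate: float, receiver_rates: List[float]) -> List[int]:
    """Indexes of receivers that cannot keep up with the given rate."""
    return [i for i, r in enumerate(receiver_rates) if r < rate]


if __name__ == "__main__":
    receivers = [120.0, 90.0, 20.0]
    limits = RateLimits(min_rate=60.0, max_rate=110.0)
    for policy in ("extreme1", "extreme2", "middle"):
        rate = sender_rate(policy, 100.0, receivers, limits)
        print(policy, rate, receivers_losing_data(rate, receivers))
```

In this example the middle ground sacrifices only the slowest receiver, while Extreme 1 holds the whole group to that receiver's rate and Extreme 2 causes loss at two of the three receivers.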
The extreme policies have potentially dire consequences for many applications. For example, neither is ideal for market data and other types of latency-sensitive data.
Extreme 2 is the policy most often used. Successful use of it requires that networks and receivers be provisioned to keep up with the fastest rate that might be convenient for the sender. This policy often leaves little bandwidth for TCP traffic and can be vulnerable to "NAK storms" and other maladies which can destabilize the entire network.
Extreme 1 is appropriate only for transactional applications where it's more important for the group to stay in sync than for the group to have low latency.
The middle ground policy is ideal for many latency-sensitive applications such as transport of financial market data. It allows for low-latency reliable delivery while maintaining the stability of the network. No amount of overload can cause a "NAK storm" or other network outage when the policy is established with knowledge of the capabilities of the network.
The need to establish a group rate control policy is often not apparent to those accustomed to dealing with two-party communication (e.g. TCP). When there are only two parties communicating, one policy is commonly used: the sender goes as fast as it can without going faster than the receiver or being unfair to others on the network. (See Section 2 for details.) This is the only sensible policy for applications that can withstand some latency and cannot withstand data loss. With two-party communication using TCP, it's the only policy choice you have. However, group communication with one sender and many receivers opens up all of the policy possibilities mentioned above.
Some messaging systems support only one policy and hence require no policy configuration. Others may allow a choice of policy. Any middle ground policy will need to be configured to establish the boundaries within which it should operate.
The best results are obtained when the specific needs of a messaging application have been considered and the messaging system has been configured to reflect them. A messaging system has no means to automatically select from among the available group rate control policies. Human judgment is required.
A group rate control policy should be chosen to match the needs of the application serving the group. The choice is often made by weighing the benefits of low latency against reliable delivery. These benefits have to be considered for individuals within the group and for the group as a whole.
In some applications, members of the group benefit from the presence of other members. In these applications, the group benefit from reliable reception among all receivers may outweigh the pain of added latency or limited group throughput. Consider a group of servers that share the load from a common set of clients. Assume that the servers have to stay synchronized with a stream of messages to be able to answer client queries. If one server leaves the group because it lost some messages, the clients it was serving would move to the remaining servers. This could lead to a domino effect: a traffic burst causes the slowest server to drop from the group, which increases the load on the remaining servers and causes them to fail in turn. Clearly in a situation like this, it's better for the whole group of servers to slow down a bit during a traffic peak so that even the slowest among them can keep up without loss. The appropriate group rate control policy for such an application is Extreme 1 (see Section 3.2.1).
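The domino effect described above can be illustrated with a toy model. The model is an assumption made for this sketch: client load is split evenly across the surviving servers, and a server fails as soon as its share exceeds its capacity. The function name and the numbers are hypothetical.

```python
from typing import List


def surviving_servers(capacities: List[float], total_load: float) -> List[float]:
    """Return the capacities of servers still alive once failures settle.

    Toy model: load is shared evenly among live servers, and the
    weakest server fails whenever its share exceeds its capacity.
    """
    alive = sorted(capacities)
    while alive:
        share = total_load / len(alive)
        if alive[0] >= share:
            # The weakest server copes, so every server copes.
            return alive
        # The weakest server drops out; its clients move to the rest.
        alive.pop(0)
    return []


if __name__ == "__main__":
    servers = [30.0, 35.0, 40.0, 50.0]
    print(surviving_servers(servers, 120.0))  # normal load
    print(surviving_servers(servers, 130.0))  # modest traffic burst
```

In this model, a load of 120 is carried by all four servers, while a burst to 130 first overloads the weakest server and then, through redistribution, each of the others in turn.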
Other applications see no incremental benefit for the group if all members experience reliable reception. Consider a group of independent traders who all subscribe to a market data stream. Traders who can keep up with the rate convenient for the sender do not want the sender to slow down for those who can't. Extreme 2 (see Section 3.2.2) is probably the appropriate policy for an application like this. However, care must be taken to prevent the sender from going faster than even the fastest trader as that often leads to NAK storms from which there is no recovery.
29West recommends a careful analysis of the policies that are best for the group using an application and for individual members of the group. The rate control policy best suited to your application will generally emerge from such an analysis. We have found that it is possible to build stable, low-latency messaging systems with careful network design and a messaging layer like LBM that supports a middle ground policy through the use of rate controls.
Once you've chosen a group rate control policy appropriate for your application, it's important to choose a transport protocol that can implement your chosen policy. Some transport protocols offer only the extreme policies while others allow parameters to be set to implement a middle ground policy.
UDP provides no rate control at all, so it follows group policy Extreme 2 above (see Section 3.2.2).
TCP doesn't operate naturally as a group communication protocol since it only supports unicast addressing. However, when TCP is used to send copies of the same data stream to more than one receiver, all of the group rate control issues discussed above are present. TCP's inherent flow control feature follows group policy Extreme 1 described above (see Section 3.2.1). If the sender is willing to use non-blocking I/O and manage buffering, then Extreme 2 and middle ground policies can be implemented to some degree. For example, LBM supports middle ground policies over TCP with its latency-bounded TCP feature.
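One way a sender might manage buffering itself is to keep a bounded queue per receiver and discard the oldest data when a receiver falls too far behind. The sketch below is an assumption made for illustration and is not a description of how LBM's latency-bounded TCP works. The class name is hypothetical, and the socket layer is left out: `drain` stands in for however many messages a non-blocking socket would accept at that moment.

```python
from collections import deque
from typing import Any, List


class BoundedReceiverQueue:
    """Per-receiver send buffer that bounds latency by dropping old data."""

    def __init__(self, max_messages: int) -> None:
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self._queue: deque = deque()
        self._max = max_messages
        self.dropped = 0

    def enqueue(self, message: Any) -> None:
        """Queue a message, discarding the oldest one if the buffer is full."""
        if len(self._queue) >= self._max:
            self._queue.popleft()
            self.dropped += 1
        self._queue.append(message)

    def drain(self, writable_count: int) -> List[Any]:
        """Remove and return up to writable_count messages, oldest first."""
        sent = []
        while self._queue and len(sent) < writable_count:
            sent.append(self._queue.popleft())
        return sent


if __name__ == "__main__":
    fast, slow = BoundedReceiverQueue(3), BoundedReceiverQueue(3)
    for msg in range(6):
        fast.enqueue(msg)
        slow.enqueue(msg)
        fast.drain(1)   # the fast receiver accepts each message
        slow.drain(0)   # the slow receiver's socket is never writable
    print("fast dropped:", fast.dropped, "slow dropped:", slow.dropped)
```

Because each receiver has its own queue, the slow receiver loses data without delaying the fast one, and the queue limit caps how stale any delivered message can be.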
Some reliable multicast transport protocols provide no rate control at all (e.g. TIBCO Rendezvous), some provide only a fixed maximum rate limit (e.g. PGM), and some provide separate rate controls for initial data transmission and retransmission (e.g. LBT-RM from 29West). See Section 15 for details.
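The idea of separate rate controls for initial transmission and retransmission can be sketched with two independent token buckets. This is a generic illustration, not the LBT-RM implementation or its configuration interface: the class names, the parameters, and the choice of token buckets are all assumptions. Time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Simple token bucket: `rate_bps` bits per second, bounded burst."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate = rate_bps
        self.capacity = burst_bits
        self.tokens = burst_bits
        self.last = 0.0

    def try_consume(self, bits: float, now: float) -> bool:
        """Refill for elapsed time, then spend tokens if enough remain."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if bits <= self.tokens:
            self.tokens -= bits
            return True
        return False


class DualRateSender:
    """Hypothetical sender with separate data and retransmission limits."""

    def __init__(self, data_bps: float, retransmit_bps: float,
                 burst_seconds: float = 0.1) -> None:
        self.data = TokenBucket(data_bps, data_bps * burst_seconds)
        self.retx = TokenBucket(retransmit_bps, retransmit_bps * burst_seconds)

    def may_send_data(self, bits: float, now: float) -> bool:
        return self.data.try_consume(bits, now)

    def may_retransmit(self, bits: float, now: float) -> bool:
        return self.retx.try_consume(bits, now)


if __name__ == "__main__":
    sender = DualRateSender(data_bps=1_000_000, retransmit_bps=100_000)
    print(sender.may_retransmit(8000, now=0.0))  # first repair fits
    print(sender.may_retransmit(8000, now=0.0))  # retransmit budget spent
    print(sender.may_send_data(8000, now=0.0))   # new data is unaffected
```

Keeping the two budgets separate means that a burst of retransmission requests from struggling receivers cannot consume the bandwidth reserved for new data.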
Copyright 2004 - 2010 29West, Inc.