18. Latency from Interrupt Coalescing

Many 29West customers using LBM are concerned about latency. While helping them troubleshoot latency problems, we have sometimes found that a significant cause was interrupt coalescing in Gigabit Ethernet NIC hardware. Fortunately, interrupt coalescing behavior is configurable and can generally be tuned to the needs of a particular application.

As mentioned in Section 17.5, interrupt coalescing represents a trade-off between latency and throughput. Coalescing interrupts always adds latency to arriving messages, but the resulting efficiency gains may be worthwhile where high throughput matters more than low latency.

The default for some NICs or drivers is an "adaptive" or "dynamic" interrupt coalescing setting that seems to significantly favor high throughput over low latency. The advice in this section is generally aimed toward changing the default to favor low latency, perhaps at the expense of high throughput.

The details of configuring interrupt coalescing behavior will vary depending on the operating system and perhaps even the type of NIC in use. We have had specific experience with Linux using Intel and Broadcom NICs, and with Windows using Intel NICs.

18.1. Linux Interrupt Coalescing

On Linux, the conventional way to configure interrupt coalescing seems to be the ethtool command (ethtool -c displays the current settings; ethtool -C changes them). However, some NIC drivers require configuration by other means. In our experience, when ethtool -c eth0 did not work, another method was available.
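When ethtool -c does work, its output can be checked programmatically. The Python sketch below parses the "key: value" lines that ethtool -c prints and flags adaptive RX coalescing if it is not off. The sample text is illustrative, not captured from a real NIC, and the exact output layout is an assumption that may vary across ethtool versions and drivers; parse_ethtool_coalesce and read_coalesce are hypothetical helper names.

```python
import subprocess
from typing import Dict


def parse_ethtool_coalesce(text: str) -> Dict[str, str]:
    """Parse the "key: value" lines printed by `ethtool -c` into a dict.

    The combined "Adaptive RX: ...  TX: ..." line is split into separate
    "adaptive-rx" and "adaptive-tx" entries.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Coalesce parameters"):
            continue  # skip blank lines and the per-interface header line
        if line.startswith("Adaptive RX:"):
            rx_part, _, tx_part = line[len("Adaptive RX:"):].partition("TX:")
            settings["adaptive-rx"] = rx_part.strip()
            if tx_part:
                settings["adaptive-tx"] = tx_part.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            settings[key.strip()] = value.strip()
    return settings


def read_coalesce(iface: str) -> Dict[str, str]:
    """Run `ethtool -c IFACE` and parse its output (raises if ethtool fails)."""
    result = subprocess.run(["ethtool", "-c", iface],
                            capture_output=True, text=True, check=True)
    return parse_ethtool_coalesce(result.stdout)


# Illustrative output only (assumed layout); real output varies by driver.
SAMPLE = """Coalesce parameters for eth0:
Adaptive RX: on  TX: off
rx-usecs: 3
rx-frames: 0
"""

settings = parse_ethtool_coalesce(SAMPLE)
if settings.get("adaptive-rx") != "off":
    print("adaptive-rx is not off; receive latency may suffer")
```

On a live system, read_coalesce("eth0") would replace the sample text; if ethtool reports an error for the interface, that is a hint that the driver is configured some other way.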

For example, we have seen that with some Intel NICs on Linux, interrupt coalescing is controlled through the module options that modprobe passes to the loadable driver. /etc/modprobe.conf might look like this:

options e1000 InterruptThrottleRate=8000

Some default configurations set InterruptThrottleRate to 1, which selects dynamic interrupt coalescing. We have seen significant reductions in latency by changing this to a fixed value (e.g., 8,000 interrupts per second), as suggested in Intel Application Note AP-450. A value of 8,000 limits the interrupt delay added to receive latency to 1 second / 8,000 = 125 μs in the worst case.
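The 125 μs figure is simply the reciprocal of the interrupt rate. A small Python sketch of that arithmetic follows, assuming the parameter is a fixed cap in interrupts per second; the 20,000 value is an arbitrary comparison point, not a recommendation, and the function name is hypothetical.

```python
def worst_case_interrupt_delay_us(interrupts_per_second: int) -> float:
    """Longest gap between interrupts, in microseconds, when the NIC is
    capped at a fixed number of interrupts per second."""
    # Assumption: the value is a fixed rate. Values that select a mode
    # (such as 1 for dynamic coalescing) are not rates and are not
    # detected here.
    if interrupts_per_second <= 0:
        raise ValueError("interrupts_per_second must be positive")
    return 1_000_000 / interrupts_per_second


for rate in (8000, 20000):
    print(f"InterruptThrottleRate={rate}: {worst_case_interrupt_delay_us(rate)} us")
```

Doubling the rate halves the worst-case delay, at the cost of more interrupts per second for the host to service.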

For NICs and drivers configured with ethtool, make sure the adaptive-rx parameter is off for the lowest possible latency. Also check the rx-usecs and rx-frames parameters, including their -irq, -low, and -high variants.
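A minimal Python sketch that assembles such an ethtool -C command for a given interface. It turns adaptive-rx off and uses rx-usecs 0 with rx-frames 1 (one interrupt per received packet) as illustrative defaults; those values are assumptions, not recommendations. It also refuses to set both values to 0, following the parameter tables in Section 18.2. The helper name low_latency_coalesce_cmd is hypothetical, and the driver may reject values or parameters it does not support.

```python
import shlex
from typing import List


def low_latency_coalesce_cmd(iface: str, rx_usecs: int = 0,
                             rx_frames: int = 1) -> List[str]:
    """Build an `ethtool -C` command that turns adaptive RX coalescing off
    and sets fixed rx-usecs / rx-frames values."""
    if rx_usecs < 0 or rx_frames < 0:
        raise ValueError("coalescing values must be non-negative")
    if rx_usecs == 0 and rx_frames == 0:
        # With both at 0, no RX interrupts would be generated.
        raise ValueError("rx-usecs and rx-frames must not both be 0")
    return ["ethtool", "-C", iface,
            "adaptive-rx", "off",
            "rx-usecs", str(rx_usecs),
            "rx-frames", str(rx_frames)]


cmd = low_latency_coalesce_cmd("eth0")
print(shlex.join(cmd))
# To apply (requires root): subprocess.run(cmd, check=True)
```

Printing the command with shlex.join before running it makes it easy to review, and afterward ethtool -c shows which values the driver actually accepted.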

18.2. Linux "ethtool" Command Parameters

The Linux "ethtool -C" command accepts a wide array of parameters related to interrupt coalescing.

Note that even though ethtool supports these parameters, the NIC driver itself may not. Consult the ethtool man page ("man ethtool") and your NIC documentation to determine exactly which parameters are available.

The tables below are organized by type of parameter: RX Parameters, TX Parameters, and Other Parameters.

RX Parameters

Parameter | Definition
rx-usecs | Maximum number of microseconds to delay an RX interrupt after receiving a packet. If 0, only rx-frames is used. Do not set both rx-usecs and rx-frames to 0, as no RX interrupts would then be generated.
rx-usecs-low | Same as rx-usecs, but used when the packet rate is below pkt-rate-low (see below).
rx-usecs-high | Same as rx-usecs, but used when the packet rate is above pkt-rate-high (see below).
rx-usecs-irq | Maximum number of microseconds to delay an RX interrupt after receiving a packet while the host is also servicing an IRQ. Some NIC drivers may not support this feature.
rx-frames | Maximum number of packets to receive before generating an RX interrupt. If 0, only rx-usecs is used. Do not set both rx-usecs and rx-frames to 0, as no RX interrupts would then be generated.
rx-frames-low | Same as rx-frames, but used when the packet rate is below pkt-rate-low (see below).
rx-frames-high | Same as rx-frames, but used when the packet rate is above pkt-rate-high (see below).
rx-frames-irq | Maximum number of packets to receive before generating an RX interrupt while the host is also servicing an IRQ. Some NIC drivers may not support this feature.

TX Parameters

Parameter | Definition
tx-usecs | Maximum number of microseconds to delay a TX interrupt after sending a packet. If 0, only tx-frames is used. Do not set both tx-usecs and tx-frames to 0, as no TX interrupts would then be generated.
tx-usecs-low | Same as tx-usecs, but used when the packet rate is below pkt-rate-low (see below).
tx-usecs-high | Same as tx-usecs, but used when the packet rate is above pkt-rate-high (see below).
tx-usecs-irq | Maximum number of microseconds to delay a TX interrupt after sending a packet while the host is also servicing an IRQ. Some NIC drivers may not support this feature.
tx-frames | Maximum number of packets to send before generating a TX interrupt. If 0, only tx-usecs is used. Do not set both tx-usecs and tx-frames to 0, as no TX interrupts would then be generated.
tx-frames-low | Same as tx-frames, but used when the packet rate is below pkt-rate-low (see below).
tx-frames-high | Same as tx-frames, but used when the packet rate is above pkt-rate-high (see below).
tx-frames-irq | Maximum number of packets to send before generating a TX interrupt while the host is also servicing an IRQ. Some NIC drivers may not support this feature.

Other Parameters

Parameter | Definition
adaptive-rx | An algorithm that improves RX latency at low packet rates and throughput at high packet rates. Some NIC drivers do not support this feature.
adaptive-tx | An algorithm that improves TX latency at low packet rates and throughput at high packet rates. Some NIC drivers do not support this feature.
pkt-rate-low | Rate, in packets per second, below which rx-usecs-low, rx-frames-low, tx-usecs-low, and tx-frames-low are used instead of the normal *-usecs and *-frames parameters.
pkt-rate-high | Rate, in packets per second, above which rx-usecs-high, rx-frames-high, tx-usecs-high, and tx-frames-high are used instead of the normal *-usecs and *-frames parameters. Between pkt-rate-low and pkt-rate-high, the normal parameters are used.
sample-interval | Interval, in seconds, over which the packet rate is sampled for adaptive coalescing. Must be nonzero.
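The interplay of pkt-rate-low, pkt-rate-high, and sample-interval described above can be modeled in a few lines of Python. This is a sketch of the table's description, not of any driver's implementation: the strict comparisons at the thresholds, the threshold values in the example, and the helper names are all assumptions.

```python
from typing import List

BASE_PARAMS = ("rx-usecs", "rx-frames", "tx-usecs", "tx-frames")


def measured_rate(packets: int, sample_interval_s: float) -> float:
    """Packets per second observed over one sampling interval."""
    if sample_interval_s <= 0:
        raise ValueError("sample-interval must be positive (non-zero)")
    return packets / sample_interval_s


def active_params(pkt_rate: float, pkt_rate_low: float,
                  pkt_rate_high: float) -> List[str]:
    """Names of the coalescing parameters that apply at this packet rate."""
    if pkt_rate < pkt_rate_low:
        suffix = "-low"
    elif pkt_rate > pkt_rate_high:
        suffix = "-high"
    else:
        suffix = ""  # between the thresholds: the normal parameters
    return [name + suffix for name in BASE_PARAMS]


rate = measured_rate(packets=20000, sample_interval_s=2)
print(rate, active_params(rate, pkt_rate_low=1000, pkt_rate_high=50000))
```

For latency-sensitive traffic that arrives in sparse bursts, this model suggests paying particular attention to the -low values, since a quiet sampling interval would select them.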

18.3. Windows Interrupt Coalescing

We have less experience configuring interrupt coalescing on Windows than on Linux. For a given type of NIC, we would expect the relevant parameter names to be similar to those given for Linux above.

18.4. Windows NIC Loss Avoidance and Detection

We know of a few ways to avoid and detect loss at the NIC level, depending on the brand of NIC in your Windows machine. The sections below cover Intel NICs and Broadcom NICs on Windows.

18.4.1. Windows and Intel NICs

If you have Intel NICs, you might have success with what one customer did: they avoided UDP loss by increasing the Receive Descriptors setting. It defaulted to 256, and this customer's NIC model allowed a maximum of 2048. (We recommend using the maximum Receive Descriptors value for your NIC model, whatever it is.) See Figure 3.

Figure 3. Windows Intel NIC Receive Descriptors Parameter Setting

Increasing the Receive Descriptors parameter doesn't change interrupt coalescing settings. It simply increases the size of the ring buffer the NIC uses for receiving, which lets the system tolerate more interrupt-servicing latency before packets are lost.
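A rough way to see what a larger ring buys: at a steady packet rate, an empty ring of N descriptors fills in about N divided by the rate. The Python sketch below assumes one packet per descriptor and uses a hypothetical rate of 100,000 packets per second; ring_headroom_ms is a hypothetical name.

```python
def ring_headroom_ms(descriptors: int, packets_per_second: float) -> float:
    """Approximate time, in milliseconds, that an empty receive ring can
    absorb packets at a steady rate before it fills if interrupts are not
    serviced. Assumes one packet per descriptor."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return descriptors * 1000.0 / packets_per_second


for n in (256, 2048):
    print(f"{n} descriptors: {ring_headroom_ms(n, 100_000):.2f} ms at 100,000 pkt/s")
```

Under those assumptions, going from 256 to 2048 descriptors raises the headroom from about 2.6 ms to about 20 ms.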

Note: As shown in the screen shot, each Receive Descriptor requires 2 KB of memory, so our customer's increase to 2048 descriptors means that each Intel NIC would allocate 2048 × 2 KB = 4 MB of physical memory for its receive ring buffer.
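The ring's memory footprint is the descriptor count times the per-descriptor size. This Python sketch takes the 2 KB per descriptor shown in the screen shot as 2,048 bytes (an assumption about the unit) and compares the default and maximum descriptor counts from the customer's case; ring_memory_bytes is a hypothetical name.

```python
def ring_memory_bytes(descriptors: int, bytes_per_descriptor: int = 2048) -> int:
    """Physical memory for a receive ring: descriptors times per-descriptor size."""
    return descriptors * bytes_per_descriptor


for n in (256, 2048):
    print(f"{n} descriptors: {ring_memory_bytes(n) / (1024 * 1024):g} MB")
```

Even at the maximum setting, the per-NIC cost stays in the low megabytes, which is usually a modest price for the extra headroom.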



18.4.2. Windows and Broadcom NICs

With Broadcom NICs, we recommend that you download and install the BACS tool to detect loss, and possibly to adjust tuning options on the NIC to help address it.

First, obtain BACS (Broadcom Advanced Control Suite) from the installation CD that came with your NIC or from your hardware vendor. You might find Broadcom's NIC FAQ page useful.

After installing BACS, run a workload on your machine that drives the network traffic you want to diagnose. Then click Start-->Control Panel, find Broadcom Control Suite in the list, and double-click it to open the Broadcom Advanced Control Suite dialog.

In this dialog, first check the list of Network Interfaces and make sure the Broadcom NIC is highlighted. Then click the Statistics tab in the upper right and scroll down the list to the statistic called "Out of Recv. Buffer". This value is the number of packets dropped because the NIC ran short of receive buffer space. If it is nonzero, you may wish to increase the size of the NIC's receive buffers.

Copyright 2004 - 2010 29West, Inc.