Many 29West customers using LBM are concerned about latency. We have helped them troubleshoot latency problems and have sometimes found a significant cause to be interrupt coalescing in Gigabit Ethernet NIC hardware. Fortunately, the behavior of interrupt coalescing is configurable and can generally be adjusted to the particular needs of an application.
As mentioned in Section 17.5, interrupt coalescing represents a trade-off between latency and throughput. Coalescing interrupts always adds latency to arriving messages, but the resulting efficiency gains may be worthwhile where high throughput matters more than low latency.
The default for some NICs or drivers is an "adaptive" or "dynamic" interrupt coalescing setting that seems to significantly favor high throughput over low latency. The advice in this section is generally aimed toward changing the default to favor low latency, perhaps at the expense of high throughput.
The details of configuring interrupt coalescing behavior will vary depending on the operating system and perhaps even the type of NIC in use. We have had specific experience with Linux using Intel and Broadcom NICs, and with Windows using Intel NICs.
On Linux, the conventional way to configure interrupt coalescing seems to be the ethtool command. However, some NIC drivers seem to require configuration in other ways. In our experience, if ethtool -c eth0 did not work, then another method was available.
For example, we have seen that with some Intel NICs on Linux, interrupt coalescing is controlled through the modprobe method of configuring loadable drivers. /etc/modprobe.conf might look like this:
options e1000 InterruptThrottleRate=8000
Some default configurations set InterruptThrottleRate to 1, which selects dynamic interrupt coalescing. We have seen significant reductions in latency by changing this to a fixed value (e.g. 8,000) as suggested in Intel Application Note AP-450. Using a value of 8,000 would limit receive latency to 1 second/8000 = 125 μs worst case.
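The arithmetic above generalizes to any fixed interrupt rate. The following is a minimal sketch of it; the function name `itr_worst_case_usecs` is our own illustration and is not part of any driver or tool.

```shell
#!/bin/sh
# Worst-case delay added by a fixed interrupt rate: 1 second / rate.
# itr_worst_case_usecs is a hypothetical helper name used only for illustration.
itr_worst_case_usecs() {
    rate="$1"                      # interrupts per second, e.g. 8000
    echo $(( 1000000 / rate ))     # result in microseconds (integer division)
}

itr_worst_case_usecs 8000          # prints 125
```

A higher fixed rate lowers the worst-case delay at the cost of more interrupts per second for the host to service.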
For NICs and drivers configured with ethtool, make sure the adaptive-rx parameter is off for the lowest possible latency. Also check the settings of all of the rx-usecs and rx-frames parameters.
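As a rough sketch of what this might look like in practice, the helper below prints the ethtool commands that would display the current settings and then turn adaptive RX coalescing off with a fixed delay. The function name, the interface name and the rx-usecs value are assumptions for illustration; whether a given driver accepts these parameters has to be checked against the NIC documentation. By default the helper only prints the commands; setting DRY_RUN=0 would run them (typically as root).

```shell
#!/bin/sh
# Sketch: print (or, with DRY_RUN=0, run) ethtool commands that favor low RX latency.
# low_latency_coalesce is a hypothetical helper; interface and value are examples only.
low_latency_coalesce() {
    iface="$1"       # network interface, e.g. eth0
    rx_usecs="$2"    # fixed RX interrupt delay in microseconds

    run="echo"       # dry run by default: only print the commands
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""       # actually execute ethtool
    fi

    # Lowercase -c shows the current coalescing settings.
    $run ethtool -c "$iface"
    # Uppercase -C changes them: adaptive RX off, fixed delay instead.
    $run ethtool -C "$iface" adaptive-rx off rx-usecs "$rx_usecs"
}
```

For example, `low_latency_coalesce eth0 10` prints the two commands for eth0 without changing anything, which makes it easy to review them before applying.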
The Linux "ethtool -C" command provides a wide array of different types of parameters that can be configured in various ways to set values related to interrupt coalescing.
Please note that even though "ethtool" provides support for these parameters, the NIC driver itself may not. Use the "man ethtool" page along with the NIC documentation to research the exact parameters available in more detail.
The tables below are organized by type of parameter: RX Parameters, TX Parameters, and Other Parameters.
RX Parameters
| Parameter | Definition |
|---|---|
| rx-usecs | Maximum number of microseconds to delay an RX interrupt after receiving a packet. If 0, only rx-max-frames is used. Do not set both rx-usecs and rx-max-frames to 0 as this would cause no RX interrupts to be generated. |
| rx-usecs-low | Same as rx-usecs, but used in concert with pkt-rate-low (see below). |
| rx-usecs-high | Same as rx-usecs, but used in concert with pkt-rate-high (see below). |
| rx-usecs-irq | Maximum number of microseconds to delay an RX interrupt after receiving a packet while an IRQ is also being serviced by the host. Some NIC drivers may not support this feature. |
| rx-max-frames | Maximum number of packets to delay an RX interrupt after receiving a packet. If 0, only rx-usecs is used. Do not set both rx-usecs and rx-max-frames to 0 as this would cause no RX interrupts to be generated. |
| rx-max-frames-low | Same as rx-max-frames, but used in concert with pkt-rate-low (see below). |
| rx-max-frames-high | Same as rx-max-frames, but used in concert with pkt-rate-high (see below). |
| rx-max-frames-irq | Maximum number of packets to delay an RX interrupt after receiving a packet while an IRQ is also being serviced by the host. Some NIC drivers may not support this feature. |
TX Parameters
| Parameter | Definition |
|---|---|
| tx-usecs | Maximum number of microseconds to delay a TX interrupt after sending a packet. If 0, only tx-max-frames is used. Do not set both tx-usecs and tx-max-frames to 0 as this would cause no TX interrupts to be generated. |
| tx-usecs-low | Same as tx-usecs, but used in concert with pkt-rate-low (see below). |
| tx-usecs-high | Same as tx-usecs, but used in concert with pkt-rate-high (see below). |
| tx-usecs-irq | Maximum number of microseconds to delay a TX interrupt after sending a packet while an IRQ is also being serviced by the host. Some NICs may not support this feature. |
| tx-max-frames | Maximum number of packets to delay a TX interrupt after sending a packet. If 0, only tx-usecs is used. Do not set both tx-usecs and tx-max-frames to 0 as this would cause no TX interrupts to be generated. |
| tx-max-frames-low | Same as tx-max-frames, but used in concert with pkt-rate-low (see below). |
| tx-max-frames-high | Same as tx-max-frames, but used in concert with pkt-rate-high (see below). |
| tx-max-frames-irq | Maximum number of packets to delay a TX interrupt after sending a packet while an IRQ is also being serviced by the host. Some NICs may not support this feature. |
Other Parameters
| Parameter | Definition |
|---|---|
| adaptive-rx | An algorithm to improve rx latency under low packet rates and improve throughput under high packet rates. Some NIC drivers do not support this feature. |
| adaptive-tx | An algorithm to improve tx latency under low packet rates and improve throughput under high packet rates. Some NIC drivers do not support this feature. |
| pkt-rate-low | Rate of packets per second below which a different set of *-usecs and *-max-frames parameters are used: rx-usecs-low, rx-max-frames-low, tx-usecs-low, and tx-max-frames-low. |
| pkt-rate-high | Rate of packets per second above which a different set of *-usecs and *-max-frames parameters are used: rx-usecs-high, rx-max-frames-high, tx-usecs-high, and tx-max-frames-high. |
| sample-interval | Number of seconds to use as packet sampling rate for adaptive coalescing. Must be non-zero. |
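To make the role of pkt-rate-low and pkt-rate-high concrete, here is a small sketch of the selection the tables describe: below the low threshold the *-low values apply, above the high threshold the *-high values apply, and otherwise the plain values apply. The function name and the handling of rates exactly equal to a threshold are our assumptions; actual drivers may implement adaptive coalescing differently.

```shell
#!/bin/sh
# Sketch: which set of *-usecs / *-max-frames parameters applies at a packet rate.
# coalesce_param_set is a hypothetical helper; boundary handling is an assumption.
coalesce_param_set() {
    rate="$1"    # measured packets per second
    low="$2"     # pkt-rate-low
    high="$3"    # pkt-rate-high

    if [ "$rate" -lt "$low" ]; then
        echo "low"       # the *-low parameters are used
    elif [ "$rate" -gt "$high" ]; then
        echo "high"      # the *-high parameters are used
    else
        echo "normal"    # the plain parameters are used
    fi
}
```

For example, with pkt-rate-low set to 1000 and pkt-rate-high set to 50000, a measured rate of 20000 packets per second would select the plain rx-usecs and tx-usecs values.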
We haven't had as much experience configuring interrupt coalescing on Windows as we have had on Linux. For a given type of NIC, we'd expect the relevant parameter names to be similar to those given for Linux above.
We know of a few ways to avoid and detect loss at the NIC level, depending on the brand of NIC in your Windows machine. Below, we cover Intel NICs and Broadcom NICs on Windows.
If you have Intel NICs, you might have success doing what one customer did: avoid UDP loss by increasing the Receive Descriptors. It defaulted to 256 and this customer's NIC model allowed a maximum of 2048. (We recommend that you use the maximum Receive Descriptors value for your NIC model, whatever it is.) See Figure 3.
Increasing the Receive Descriptors parameter doesn't change interrupt coalescing settings. It simply increases the size of the ring buffer the NIC uses for receiving. This allows for more interrupt servicing latency before loss.
Note: As shown in the screen shot, each Receive Descriptor requires 2 KB of memory. Our customer's increase to 2048 means that each Intel NIC would allocate 4 MB (2048 × 2 KB) of physical memory for its receive ring buffer.
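The memory cost of a given Receive Descriptors setting can be estimated with simple arithmetic. The sketch below assumes the 2 KB per descriptor figure from the note; the function name `rx_ring_kb` is our own, and other NIC models may use a different per-descriptor size.

```shell
#!/bin/sh
# Sketch: receive ring buffer size in KB for a given number of Receive Descriptors.
# rx_ring_kb is a hypothetical helper; 2 KB per descriptor is taken from the note.
rx_ring_kb() {
    descriptors="$1"
    kb_per_descriptor="${2:-2}"    # default: 2 KB per descriptor
    echo $(( descriptors * kb_per_descriptor ))
}

rx_ring_kb 256     # prints 512  (the default setting)
rx_ring_kb 2048    # prints 4096 (the maximum on this customer's NIC)
```

Multiply the result by the number of NICs in the machine to estimate the total physical memory set aside for receive rings.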
With Broadcom NICs, we recommend that you download and install the BACS tool to detect loss, and possibly to adjust tuning options on the NIC to help address it.
First, obtain BACS (Broadcom Advanced Control Suite) from the install CD that came with your NIC, or from your hardware vendor. You might find this Broadcom NICs FAQ page useful.
After downloading and installing it, start BACS. Then run a workload on your machine that will drive the network traffic you are interested in diagnosing. Then click Start-->Control Panel, and look in the list for Broadcom Control Suite. Double click on it to open the Broadcom Advanced Control Suite dialog.
In this dialog, first check the list of Network Interfaces and make sure the Broadcom NIC is highlighted. Then, over on the upper right, click the Statistics tab, and scroll down the list of statistics to a statistic called "Out of Recv. Buffer". This value is the number of packets dropped due to a shortage of buffer space for the NIC. If the value is nonzero, you may wish to increase the size of your buffers.
Copyright 2004 - 2010 29West, Inc.