High-Performance TCP

Since 1994, research has shown that aggregate network traffic can be characterized as bursty, or more specifically, self-similar or fractal. However, there has been only limited work on understanding why the traffic behaves this way. While heavy-tailed distributions in file sizes, packet inter-arrival times, and transfer durations may contribute to the self-similarity, we have found that its primary source is the protocol stack itself, namely TCP. (More specifically, the self-similarity is due to a particular implementation of TCP - TCP Reno, a ubiquitously deployed transport protocol found in virtually all modern operating systems.)

Based on our network traffic measurements, we are investigating alternative implementations of TCP, e.g., TCP Vegas, that do not modulate traffic as adversely as TCP Reno does and can therefore perform better over the wide-area network (WAN) in support of high-performance computational grids. Intuitively, TCP Reno's fundamental problem is that it repeatedly induces packet loss as it probes for available bandwidth via an "additive increase, multiplicative decrease" (AIMD) congestion-control mechanism. Although this approach works well over current local-area networks (LANs) and even some WANs, it fails miserably as the bandwidth-delay product of a network connection becomes large, e.g., 100 Mb in the near future, when a WAN connection runs over 1-Gb/s links with a round-trip time of 100 ms (1 Gb/s x 0.1 s = 100 Mb).
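To illustrate why AIMD probing scales poorly with the bandwidth-delay product, the following is a minimal sketch of an idealized AIMD rule. It is not TCP Reno's actual implementation: it ignores slow start, timeouts, and fast recovery, and it assumes a loss occurs exactly when the window exceeds the path capacity. The function names and the 1500-byte segment size are assumptions made for illustration only.

```python
def aimd_trace(capacity_segments, rounds, start=1):
    """Return the congestion window (in segments) at each RTT under idealized AIMD.

    Assumption: one loss is detected in any RTT where the window exceeds
    capacity_segments; otherwise the window grows by one segment per RTT.
    """
    cwnd = start
    trace = []
    for _ in range(rounds):
        trace.append(cwnd)
        if cwnd > capacity_segments:
            cwnd = max(1, cwnd // 2)  # loss: multiplicative decrease (halve)
        else:
            cwnd += 1                 # no loss: additive increase (one segment)
    return trace


def recovery_rtts(capacity_segments):
    """RTTs needed to climb from half the capacity back to the full capacity."""
    return capacity_segments - capacity_segments // 2


# Example: a 12.5 MB bandwidth-delay product with an assumed 1500-byte segment.
SEGMENT_BYTES = 1500                       # hypothetical segment size
capacity = 12_500_000 // SEGMENT_BYTES     # 8333 segments
rtts = recovery_rtts(capacity)             # 4167 RTTs
seconds = rtts * 0.100                     # roughly 417 s at a 100 ms RTT
```

Under these assumptions, a single halving of the window on such a path takes several minutes of additive increase to undo, whereas on a LAN with a capacity of a few dozen segments the same recovery takes only a handful of round trips.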

In addition to the above congestion-control problems, the TCP flow-control windows are static and generally set to 32 or 64 KB by default. Consequently, a WAN connection with a bandwidth-delay product of 100 Mb (12.5 MB) and no competing traffic can utilize at most about 0.5% (64 KB / 12.5 MB) of the available bandwidth! Thus, in parallel with our congestion-control research, we have also developed and implemented a technique called "Dynamic Right-Sizing" of flow-control windows, which delivers dramatically better throughput than a default configuration over a WAN while remaining TCP-friendly. For more information, see the Dynamic Right-Sizing web page.
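The arithmetic behind the 0.5% figure can be checked with a short sketch. The function names are hypothetical, and the calculation assumes a sender that can keep at most one flow-control window of data in flight per round trip, with no loss and no competing traffic.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to keep the link full (bits/s * s / 8)."""
    return bandwidth_bps * rtt_s / 8


def max_utilization(window_bytes, bandwidth_bps, rtt_s):
    """Upper bound on link utilization when at most one window is sent per RTT."""
    bdp = bandwidth_delay_product_bytes(bandwidth_bps, rtt_s)
    return min(1.0, window_bytes / bdp)


# 1-Gb/s link with a 100 ms round-trip time, as in the text above.
bdp = bandwidth_delay_product_bytes(1e9, 0.100)   # 12.5e6 bytes, i.e. 12.5 MB
# 64 KB window, taken here as 64 * 1024 bytes (an assumption about the unit).
util = max_utilization(64 * 1024, 1e9, 0.100)     # a little over 0.005, i.e. about 0.5%
```

With a 32 KB window the bound halves again, which is why a fixed default window cannot fill a path of this size regardless of how the congestion window behaves.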

Publication List