Addressing Endpoint-Induced Congestion With Duplicate Acknowledgment
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Published: | ProQuest Dissertations & Theses |
| Online Access: | Citation/Abstract, Full Text - PDF |
Abstract:

The explosive growth of network link bandwidth in modern data centers, now reaching 400 Gbps in recent hardware, has dramatically outpaced improvements in host CPU and memory performance. This widening gap means data can arrive at a receiver far faster than it can be processed, creating endpoint-induced congestion: a new type of congestion that occurs at the host NIC or memory when incoming packets overwhelm the receiver's processing capacity.

To cope with these ultra-high network speeds, transport-layer functionality is increasingly offloaded onto Network Interface Cards (NICs). Technologies such as Remote Direct Memory Access (RDMA) and TCP offload engines handle packet processing, acknowledgment generation, and even congestion control directly in NIC hardware. This offloading, while beneficial, complicates the detection and proactive management of host-side congestion because the transport logic is decoupled from the host system.

Conventional fabric congestion control algorithms, when offloaded directly to NIC hardware, inherently lose visibility into host-side congestion: acknowledgments (ACKs) are sent immediately upon packet arrival at the NIC rather than after a successful DMA transfer into host memory, as would occur in CPU-based networking. Consequently, these algorithms are ill-suited to endpoint-induced congestion, leading to severe packet loss rates, up to 36% in unicast scenarios and up to 10% per sender under typical incast conditions. Our evaluations reveal frequent cyclic buffer overflows and retransmissions when receiver processing significantly lags network speeds, motivating the development of a novel congestion control mechanism explicitly designed for host-induced congestion.
For example, standard additive-increase, multiplicative-decrease (AIMD) schemes and even enhanced data center transports (e.g., DCTCP with ECN marking) lack the fine-grained, proactive feedback needed to handle host congestion. Consequently, in NIC-offloaded systems with a slow destination host, these algorithms often overshoot the receiver's capacity, causing excessive packet loss and latency spikes. Our evaluations of such legacy approaches show cyclic buffer overflows and retransmissions when receiver processing rates significantly lag link rates. This shortcoming motivates a new congestion control mechanism explicitly designed for transport-offloaded data center systems facing host-induced congestion.

This dissertation proposes a novel end-to-end congestion control protocol that uses duplicate acknowledgments (DACKs) as an early, explicit signal of host congestion. In contrast to conventional TCP, where duplicate ACKs indicate packet loss, here the receiver deliberately sends duplicate ACKs (without advancing its acknowledged sequence number) to proactively flag that it is overwhelmed. By repurposing duplicate ACKs as a form of negative feedback, the protocol enables tight coordination between sender and receiver. Upon receiving these duplicate ACK signals, the sender promptly slows its sending rate to match the receiver's available processing bandwidth before any buffer overflow or packet drop occurs. Once the receiver catches up and clears its NIC buffer, it resumes sending normal cumulative ACKs with no congestion markings, allowing the sender to cautiously ramp back up. This closed-loop design aligns the sender's transmission rate with the receiver's actual consumption rate in real time, preventing runaway packet bursts, buffer overflows, and packet losses.

We validate the proposed protocol through a comprehensive simulation-based evaluation.
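The closed-loop exchange described above can be sketched in Python. This is a minimal illustration only: the class names, the buffer-occupancy threshold, and the halving/additive rate constants are assumptions made for this sketch, not parameters taken from the dissertation.

```python
# Illustrative sketch of the duplicate-ACK (DACK) congestion signal.
# All thresholds and rate-adjustment constants below are assumptions
# for illustration; the dissertation's actual algorithm may differ.

class Receiver:
    """Receiver NIC model: signals host congestion by repeating the last ACK."""
    def __init__(self, buffer_capacity, dack_threshold):
        self.buffer_capacity = buffer_capacity
        self.dack_threshold = dack_threshold  # occupancy level that triggers DACKs
        self.occupancy = 0                    # packets queued in the NIC buffer
        self.last_acked = 0

    def on_packet(self, seq):
        if self.occupancy >= self.buffer_capacity:
            return None                       # buffer overflow: packet dropped
        self.occupancy += 1
        if self.occupancy >= self.dack_threshold:
            return ("DACK", self.last_acked)  # duplicate ACK: do NOT advance
        self.last_acked = seq
        return ("ACK", seq)                   # normal cumulative ACK

    def drain(self, n):
        """Host DMA/processing removes up to n packets from the NIC buffer."""
        self.occupancy = max(0, self.occupancy - n)


class Sender:
    """Sender cuts its rate on a DACK and cautiously ramps up on normal ACKs."""
    def __init__(self, line_rate):
        self.line_rate = line_rate
        self.rate = line_rate

    def on_ack(self, kind):
        if kind == "DACK":
            self.rate = max(1.0, self.rate * 0.5)             # back off toward receiver pace
        else:
            self.rate = min(self.line_rate, self.rate + 1.0)  # resume ramp-up
```

In this sketch, a DACK repeats the last acknowledged sequence number while the packet is still accepted into the NIC buffer, so the signal carries pure congestion feedback rather than a loss indication.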
We develop a custom packet-level simulator that models NIC offload architectures and host processing constraints in detail. This environment allows us to simulate realistic data center traffic patterns, including incast workloads with many senders converging on one receiver and scenarios where a host's processing capacity fluctuates over time. We benchmark our mechanism against traditional TCP-style AIMD and an ECN-based host congestion control (HostCC) across these scenarios, collecting key performance metrics (throughput, packet loss, and latency) to assess how effectively the protocol mitigates endpoint congestion.

Our simulation results demonstrate that our congestion control scheme dramatically outperforms existing approaches under host congestion. In a persistent host-overload scenario, our protocol sustains 0% packet loss regardless of NIC buffer size, whereas AIMD and ECN-based schemes suffer continual buffer overflows unless large NIC buffers are provisioned. Under 8-to-1 incast and all-to-all traffic scenarios with multiple high-speed senders, legacy protocols experience packet drop rates of up to 10% per sender, while our approach maintains a 0% drop rate with low latency. The scheme also adapts rapidly to dynamic changes in receiver load: as the receiver's processing rate rises or falls, the sender seamlessly adjusts its pace, staying in lock-step without the scattered transmission rates seen in conventional protocols.

Overall, by virtually eliminating host-induced packet loss and minimizing latency jitter, the proposed duplicate-ACK mechanism helps transport-offloaded systems better manage the host congestion challenges posed by modern high-speed networks. This research provides a new theoretical framework for endpoint congestion control that holistically coordinates network and endpoint resources, charting a path toward more robust data center transport protocols.
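The qualitative contrast reported above can be illustrated with a toy closed-loop model. This is emphatically not the dissertation's packet-level simulator: the tick-based dynamics and every constant (buffer capacity, drain rate, DACK threshold, line rate) are assumptions chosen only to show how DACK-style backoff avoids overflow when the host drains slower than the link.

```python
# Toy model: a fixed-rate sender vs. a DACK-guided sender feeding a NIC buffer
# that the host drains more slowly than the link rate. All constants are
# illustrative assumptions, not parameters from the dissertation's simulator.

def run(ticks, line_rate, drain, capacity, threshold, use_dack):
    occupancy, rate, drops = 0, float(line_rate), 0
    for _ in range(ticks):
        for _ in range(int(rate)):              # sender transmits at its current rate
            if occupancy >= capacity:
                drops += 1                      # NIC buffer overflow: packet lost
            else:
                occupancy += 1
        congested = occupancy >= threshold      # receiver's DACK trigger condition
        occupancy = max(0, occupancy - drain)   # host drains the buffer each tick
        if use_dack and congested:
            rate = max(1.0, rate * 0.5)         # DACK feedback: back off
        else:
            rate = min(float(line_rate), rate + 1.0)  # ramp up cautiously
    return drops

fixed_drops = run(100, line_rate=10, drain=2, capacity=32, threshold=8, use_dack=False)
dack_drops = run(100, line_rate=10, drain=2, capacity=32, threshold=8, use_dack=True)
```

In this toy setting the fixed-rate sender steadily overfills the 32-packet buffer and accumulates drops, while the DACK-guided sender settles into an oscillation around the threshold and loses nothing, mirroring the 0%-loss behavior the abstract reports.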
| ISBN: | 9798290652191 |
|---|---|
| Source: | ProQuest Dissertations & Theses Global |