"VTrace: Automatic Diagnostic System for Persistent Packet Loss in Cloud-Scale Overlay Network", a paper on cloud network fault localization by the research group of Professor CHENG Peng and CHEN Jiming, has been accepted by SIGCOMM 2020. It is the first paper accepted to the SIGCOMM main conference with Zhejiang University as the first author's affiliation, and also the first accepted paper on cloud networking from mainland China.
The work was strongly supported by the Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies (AZFT). FANG Chongrong and LIU Haoyu, doctoral students at Zhejiang University, completed VTrace through long-term close cooperation with Alibaba Cloud.
Abstract:
Persistent packet loss in the cloud-scale overlay network severely compromises tenant experiences. Cloud providers are keen to automatically and quickly determine the root cause of such problems. However, existing work is either designed for the physical network or insufficient to identify the concrete cause of packet loss. In this paper, we propose to record and analyze the on-site forwarding condition of packets during packet-level tracing. The cloud-scale overlay network poses great challenges to achieving this goal because of its high network complexity, its multi-tenant nature, and the diversity of root causes. To address these challenges, we present VTrace, an automatic diagnostic system for persistent packet loss over the cloud-scale overlay network. Utilizing the "fast path-slow path" structure of virtual forwarding devices (VFDs), e.g., vSwitches, VTrace installs several "coloring, matching and logging" rules in VFDs to selectively track the packets of interest and inspect them in depth. The detailed forwarding situation at each hop is logged and then assembled for analysis with an efficient path reconstruction scheme. Experiments demonstrate VTrace's low overhead and quick responsiveness. We also share our experience of how VTrace has efficiently resolved persistent packet loss issues during more than 20 months of deployment in Alibaba Cloud.
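To make the "coloring, matching and logging" idea more concrete, the following Python sketch simulates it on a toy forwarding path. It is a minimal illustration under stated assumptions, not VTrace's implementation. The classes `FlowKey`, `Packet` and `VFD` and the functions `send` and `reconstruct` are hypothetical. A boolean field stands in for the color mark, and the slow-path in-depth inspection that real VFDs perform is reduced to appending a log record. Reconstruction here only groups records by packet ID and orders them by hop sequence. The paper's actual path reconstruction scheme may work differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FlowKey:
    """5-tuple identifying the tenant flow under diagnosis."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


@dataclass
class Packet:
    key: FlowKey
    packet_id: int          # hypothetical per-packet identifier
    colored: bool = False   # hypothetical "color" mark carried with the packet


@dataclass
class LogRecord:
    packet_id: int
    hop: str       # name of the VFD that logged the packet
    seq: int       # logical time standing in for a per-hop timestamp
    verdict: str   # "forwarded" or a drop reason


class VFD:
    """Toy virtual forwarding device (e.g., a vSwitch); purely illustrative."""

    def __init__(self, name: str, drop_reason: Optional[str] = None):
        self.name = name
        self.drop_reason = drop_reason  # simulated on-site forwarding condition

    def process(self, pkt: Packet, seq: int, log: List[LogRecord]) -> bool:
        verdict = self.drop_reason or "forwarded"
        # Matching + logging: only colored packets are recorded, so traffic
        # that is not being traced generates no log entries.
        if pkt.colored:
            log.append(LogRecord(pkt.packet_id, self.name, seq, verdict))
        return self.drop_reason is None  # False means the packet was dropped here


def send(pkt: Packet, path: List[VFD], watched: Set[FlowKey],
         log: List[LogRecord]) -> bool:
    # Coloring at ingress: only this step checks the flow-level rule;
    # downstream hops match on the color alone.
    if pkt.key in watched:
        pkt.colored = True
    for seq, vfd in enumerate(path):
        if not vfd.process(pkt, seq, log):
            return False
    return True


def reconstruct(log: List[LogRecord]) -> Dict[int, Tuple[List[str], str]]:
    """Group per-hop records by packet, order them, and report path + last verdict."""
    by_packet: Dict[int, List[LogRecord]] = defaultdict(list)
    for rec in log:
        by_packet[rec.packet_id].append(rec)
    result: Dict[int, Tuple[List[str], str]] = {}
    for pid, recs in by_packet.items():
        recs.sort(key=lambda r: r.seq)
        result[pid] = ([r.hop for r in recs], recs[-1].verdict)
    return result


if __name__ == "__main__":
    flow = FlowKey("10.0.0.1", "10.0.0.2", 6, 40000, 80)
    other = FlowKey("10.0.0.3", "10.0.0.4", 6, 40001, 80)
    path = [VFD("vswitch-A"), VFD("gateway"), VFD("vswitch-B", drop_reason="acl-deny")]
    log: List[LogRecord] = []
    send(Packet(flow, 1), path, {flow}, log)
    send(Packet(other, 2), path, {flow}, log)
    print(reconstruct(log))
    # {1: (['vswitch-A', 'gateway', 'vswitch-B'], 'acl-deny')}
```

In this toy run, only packet 1 belongs to the watched flow, so it is the only packet logged. Its reconstructed path shows it was last seen at `vswitch-B` with the drop reason `acl-deny`. Packet 2 is dropped at the same hop but leaves no records, because it was never colored. This reflects why selective coloring keeps tracing overhead low.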
Reporter: REN Tong
Editor: WANG Jing