Full Publications

2024

DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models
Zili Zhang, Yinmin Zhong, Ranchen Ming, Hanpeng Hu, Jianjian Sun, Zheng Ge, Yibo Zhu, Xin Jin
In Preprint.
[PDF] [Slides]

RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion
Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin
In Preprint.
[PDF] [Slides]

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin
In Preprint.
[PDF] [Slides]

Fast Distributed Inference Serving for Large Language Models
Bingyang Wu*, Yinmin Zhong*, Zili Zhang*, Gang Huang, Xuanzhe Liu, Xin Jin
(* Equal contribution)
In Preprint.
[PDF] [Slides]

dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
Bingyang Wu, Ruidong Zhu, Zili Zhang, Peng Sun, Xuanzhe Liu, Xin Jin
USENIX Symposium on Operating Systems Design and Implementation (OSDI 2024), Santa Clara, July 10–12, 2024.
[PDF] [Slides]

Jolteon: Unleashing the Promise of Serverless for Serverless Workflows
Zili Zhang, Chao Jin, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), Santa Clara, April 16–18, 2024.
[PDF] [Slides]

Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining
Zili Zhang, Fangyue Liu, Gang Huang, Xuanzhe Liu, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024), Santa Clara, April 16–18, 2024.
[PDF] [Slides]

2023

Ditto: Efficient Serverless Analytics with Elastic Parallelism
Chao Jin, Zili Zhang, Xingyu Xiang, Songyun Zou, Gang Huang, Xuanzhe Liu, Xin Jin
ACM Special Interest Group on Data Communication (SIGCOMM 2023), New York City, September 10–14, 2023.
[PDF] [Slides]

Fast, Approximate Vector Queries on Very Large Unstructured Datasets
Zili Zhang, Chao Jin, Linpeng Tang, Xuanzhe Liu, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023), Boston, April 17–19, 2023.
[PDF] [Slides]

Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, Xin Jin
USENIX Symposium on Networked Systems Design and Implementation (NSDI 2023), Boston, April 17–19, 2023.
[PDF] [Slides]

Rise of Distributed Deep Learning Training in the Big Model Era: From A Software Engineering Perspective
Xuanzhe Liu, Diandian Gu, Zhenpeng Chen, Jinfeng Wen, Zili Zhang, Yun Ma, Haoyu Wang, Xin Jin
ACM Transactions on Software Engineering and Methodology (TOSEM 2023), 2023.
[PDF] [Slides]

2022

Optimizing Half Precision Winograd Convolution on ARM Many-Core Processors
Dedong Xie, Zhen Jia, Zili Zhang, Xin Jin
ACM Asia-Pacific Workshop on Systems (APSys 2022), online, August 23–24, 2022.
[PDF] [Slides]