Ph.D. Candidate · Future Network Laboratory · USTC

Tao Zhang

Efficient LLM Serving · AI Infrastructure · Multi-Agent Systems

I study how modern AI systems can serve large language models faster and more efficiently across heterogeneous GPU clusters, disaggregated inference pipelines, networked workloads, and collaborative agents.

LLM Serving · AI Infrastructure · Disaggregated Systems · KV Cache Reuse · Multi-Agent Communication · Multimodal Efficiency
8 First/co-first papers
1 Oral paper
2 SCI Q1 journal papers
2026: CVPR · ACL · EMNLP · MM

Serving systems for modern AI workloads

My research centers on efficient inference serving and AI infrastructure: resource scheduling for disaggregated LLM serving, KV-cache optimization for RAG, MoE training systems, multimodal token pruning, and communication-efficient multi-agent collaboration.

  • DisHelis: Deployment and resource allocation for heterogeneous disaggregated LLM serving.
  • SpecCache: Speculative KV cache reuse for efficient RAG serving.
  • LatCom: Latent compression for efficient multi-agent collaboration.

First-author and co-first-author work

  • Ph.D. Candidate: University of Science and Technology of China, Institute of Advanced Technology and Future Network Laboratory, 2023.09 - Present
  • B.Eng.: Chongqing University of Posts and Telecommunications, School of Communication and Information Engineering, 2019.09 - 2023.06
  • National Scholarship: University of Science and Technology of China, 2025
  • Graduate Academic First-Class Scholarship: University of Science and Technology of China, 2023 and 2024
  • Outstanding Graduate: Chongqing Municipality, 2023