Publications

Journal Articles and Manuscripts

Multi-Timescale Joint Optimization of Task Scheduling, Instance Switching, and Resource Scaling for Disaggregated LLM Serving

First author · IEEE Transactions on Cognitive Communications and Networking · SCI Zone 2 · 2026

DisHelis: Optimizing Deployment of Disaggregated LLMs Inference Serving over Heterogeneous Environments via Hierarchical Max-Flow

First author · IEEE Transactions on Cognitive Communications and Networking · SCI Zone 1 · 2026

FAESR: Fine-Grained Rate Adaptation for Energy-Aware Super Resolution in Mobile Panoramic Video Streaming

First author · IEEE Transactions on Cognitive Communications and Networking · SCI Zone 1 · 2025

Conference Papers and Submissions

LatCom: Latent Compression for Efficient Multi-Agent Collaboration

Co-first author · EMNLP 2026 · Poster · 2026

GSTEP: Global Spatio-Temporal Density-Driven Visual Token Pruning for Efficient Video Large Language Models

Co-first author · ACM Multimedia 2026 · Poster · 2026

SAVP: Scene-Aware Vision Token Pruning for Efficient Video Large Language Models

Co-first author · EMNLP 2026 · Poster · 2026

SpecCache: Speculative KV Cache Reuse for Efficient RAG Serving

Co-first author · ACL 2026 · Oral · 2026

HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models

Co-first author · CVPR 2026 · Poster · 2026