Multi-Timescale Joint Optimization of Task Scheduling, Instance Switching, and Resource Scaling for Disaggregated LLM Serving · First author · IEEE Transactions on Cognitive Communications and Networking · SCI Zone 2 · 2026
DisHelis: Optimizing Deployment of Disaggregated LLMs Inference Serving over Heterogeneous Environments via Hierarchical Max-Flow · First author · IEEE Transactions on Cognitive Communications and Networking · SCI Zone 1 · 2026
FAESR: Fine-Grained Rate Adaptation for Energy-Aware Super Resolution in Mobile Panoramic Video Streaming · First author · IEEE Transactions on Cognitive Communications and Networking · SCI Zone 1 · 2025
LatCom: Latent Compression for Efficient Multi-Agent Collaboration · Co-first author · EMNLP 2026 · Poster · 2026
GSTEP: Global Spatio-Temporal Density-Driven Visual Token Pruning for Efficient Video Large Language Models · Co-first author · ACM Multimedia 2026 · Poster · 2026
SAVP: Scene-Aware Vision Token Pruning for Efficient Video Large Language Models · Co-first author · EMNLP 2026 · Poster · 2026
SpecCache: Speculative KV Cache Reuse for Efficient RAG Serving · Co-first author · ACL 2026 · Oral · 2026
HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models · Co-first author · CVPR 2026 · Poster · 2026