Xiangbo Gao

Generative AI researcher working on video generation, controllable video editing, and world models.

Profile

About me

Generative AI | Video Generation | Controllable Video Editing | World Models | PhD Student @ TAMU | MS @ UMich | BS @ UCI

  • Location: Texas A&M University, College Station, TX
  • Age: 26
  • Nationality: China
  • Interests: Snowboarding, Skiing, Rock climbing, Badminton, Tennis
  • Study: Texas A&M University, College Station, TX

Academic path

Education

Ph.D. in Computer Science

Texas A&M University

2025.1 - Present

M.S. in Robotics

University of Michigan, Ann Arbor

2023.9 - 2024.12

B.S. in Computer Science | B.S. in Mathematics

University of California, Irvine

2018.9 - 2023.3

Industry experience

Employment

Research Intern

Adobe Research

2026.5 - Present

Autonomous Driving Algorithms Research Intern

DiDi Global Inc., USA

2025.9 - 2026.5

Perception Research Intern

Anhui Cowa ROBOT Co., Ltd, Shanghai, China

2023.4 - 2023.7

Publications

Selected

LangCoop: Collaborative Driving with Language

Xiangbo Gao, Runsheng Xu, Jiachen Li, Ziran Wang, Zhiwen Fan, Zhengzhong Tu

CVPR 2025

Multi-agent collaboration enhances autonomous driving by enabling connected vehicles to share information, but current communication methods suffer from bandwidth, heterogeneity, and information loss issues. We propose LangCoop, a language-driven collaboration framework that uses natural language as a compact, expressive medium for inter-agent communication. Featuring M3CoT for structured reasoning and LangPack for efficient message encoding, LangCoop achieves a 96% reduction in bandwidth while maintaining strong closed-loop driving performance in CARLA simulations.

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

Shuo Xing, Hongyuan Hua, Xiangbo Gao, Shenzhe Zhu, Renjie Li, Kexin Tian, Xiaopeng Li, Heng Huang, Tianbao Yang, Zhangyang Wang, Yang Zhou, Huaxiu Yao, Zhengzhong Tu

TMLR 2026

AutoTrust is a benchmark for assessing the trustworthiness of large vision language models for autonomous driving (DriveVLMs). It aims to support public safety by evaluating whether DriveVLMs behave reliably across key trustworthiness dimensions.

STAMP: Scalable Task- And Model-agnostic Collaborative Perception

Xiangbo Gao, Runsheng Xu, Jiachen Li, Ziran Wang, Zhiwen Fan, Zhengzhong Tu

ICLR 2025

STAMP is a new framework for multi-agent collaborative perception in autonomous driving that enables diverse vehicles to share sensor data efficiently. Using adapter-reverter pairs to convert between agent-specific and shared feature formats in Bird's Eye View (BEV), it achieves better accuracy than existing methods while reducing computational costs and maintaining security across heterogeneous systems.

MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection

Xiangbo Gao, Asiegbu Miracle Kanu-Asiegbu, Xiaoxiao Du

ITSC 2024

MambaST is a new framework for pedestrian detection that combines RGB and thermal camera data while leveraging temporal information. It uses a novel Multi-head Hierarchical Patching and Aggregation structure with state space models to efficiently process multi-spectral data, achieving better results on small-scale detection while being more computationally efficient than transformer-based approaches.

Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator

Xiangbo Gao, Cheng Luo, Qinliang Lin, Weicheng Xie, Minmin Liu, Linlin Shen, Keerthy Kusumam, Siyang Song

ICASSP 2024

PQ-GAN is a novel scale-free generator for adversarial attacks that works on images of any size. Unlike previous methods limited to local or fixed-scale attacks, it demonstrates superior transferability, defense resistance, and visual quality compared with other attack methods on the ImageNet and Cityscapes datasets.

Sample Hardness Based Gradient Loss for Long-Tailed Cervical Cell Detection

Minmin Liu, Xuechen Li, Xiangbo Gao, Junliang Chen, Linlin Shen, Huisi Wu

MICCAI 2022

A new Grad-Libra Loss method improves cancer cell detection in imbalanced cervical cancer datasets by adjusting for both sample difficulty and category distribution, achieving 7.8% better accuracy than standard approaches.

Preprints

AirV2X: Unified Air-Ground Vehicle-to-Everything Collaboration

Xiangbo Gao, Yuheng Wu, Fengze Yang, Xuewen Luo, Keshu Wu, Xinghao Chen, Yuping Wang, Chenxi Liu, Yang Zhou, Zhengzhong Tu

arXiv 2025

While multi-vehicle collaboration improves safety and efficiency, traditional infrastructure-based V2X systems face high deployment costs and poor coverage in rural areas. To address this, we introduce AirV2X-Perception, a large-scale dataset that uses UAVs as flexible, low-cost perception units providing dynamic, occlusion-free bird’s-eye views. Spanning 6.73 hours of diverse driving scenarios, the dataset enables standardized development and evaluation of Vehicle-to-Drone (V2D) algorithms for aerial-assisted autonomous driving.

SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving

Xiangbo Gao, Tzu-Hsiang Lin, Ruojing Song, Yuheng Wu, Kuan-Ru Huang, Zicheng Jin, Fangzhou Lin, Shinan Liu, Zhengzhong Tu

arXiv 2025

Collaborative driving systems utilize vehicle-to-everything (V2X) communication to enhance safety and efficiency, but traditional approaches face bandwidth, semantic, and interoperability limitations. Emerging language-driven V2X frameworks offer richer semantics and reasoning capabilities yet introduce new vulnerabilities such as message loss and semantic manipulation. To address these, we propose SafeCoop, an agentic defense pipeline that safeguards language-based collaboration through semantic firewalls, consistency checks, and multi-source consensus, achieving significant safety gains in closed-loop evaluations.

More Publications

Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen

Z Li, X Cao, X Gao, K Tian, K Wu, M Anis, H Zhang, K Long, J Jiang, X Li, ...

arXiv preprint arXiv:2505.21743, 2025

The pulse of motion: Measuring physical frame rate from visual dynamics

X Gao, M Wu, S Yang, J Yu, P Taghavi, F Lin, Z Tu

arXiv preprint arXiv:2603.14375, 2026

Pisco: Precise video instance insertion with sparse control

X Gao, R Li, X Chen, Y Wu, S Feng, Q Yin, Z Tu

arXiv preprint arXiv:2602.08277, 2026

Position: Human-Centric AI Requires a Minimum Viable Level of Human Understanding

F Lin, Q Ge, L Xu, P Li, X Gao, S Xing, K Yamada, Z Zhang, H Zhang, Z Tu

arXiv preprint arXiv:2602.00854, 2026

DRAMA-X: A Fine-grained Intent Prediction and Risk Reasoning Benchmark For Driving

M Godbole, X Gao, Z Tu

ICCV Workshop 2025

Automated Vehicles Should be Connected with Natural Language

X Gao, K Wu, H Zhang, K Tian, Y Zhou, Z Tu

arXiv preprint arXiv:2507.01059, 2025

Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World

J Kim, W Chen, D Soleymanzadeh, Y Ding, X Gao, Z Tu, R Zhang, F Fei, ...

arXiv preprint arXiv:2602.04056, 2026

TimePre: Bridging Accuracy, Efficiency, and Stability in Probabilistic Time-Series Forecasting

L Jiang, L Xu, P Li, Q Ge, D Zhuang, S Xing, W Chen, X Gao, TH Chen, ...

arXiv preprint arXiv:2511.18539, 2025

SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation

J Yu, X Gao, P Verlani, A Gadde, Y Wang, B Adsumilli, Z Tu

arXiv preprint arXiv:2603.16864, 2026

VEFX-Bench: A Holistic Benchmark for Generic Video Editing and Visual Effects

X Gao, S Jiang, B Liu, X Chen, M Yang, S Yang, M Wu, J Yu, Q Zheng, ...

arXiv preprint arXiv:2604.16272, 2026

Physics-Aware Video Instance Removal Benchmark

Z Li, X Chen, L Jiang, D Hou, F Lin, K Yamada, X Gao, Z Tu

arXiv preprint arXiv:2604.05898, 2026

FORGE-Tree: Diffusion-Forcing Tree Search for Long-Horizon Robot Manipulation

Y Huang, S Liu, S Liu, Q Xu, M Wu, X Gao, Z Tu

arXiv preprint arXiv:2510.21744, 2025

Professional Services

Conference and Journal Paper Reviewing

  • CV & ML: ICCV, CVPR, ICLR, NeurIPS, T-PAMI
  • Robotics: RA-L
  • Transportation: TRBAM