Diandian Guo

I am a Ph.D. student at The Chinese University of Hong Kong, advised by Prof. Pheng-Ann Heng. My research focuses on computer vision for surgical intelligence, medical AI, and AI-assisted AR/XR systems.

Previously, I received my M.S. in Elektromobilität from the University of Stuttgart, where I worked with Prof. Bin Yang on deep learning. Before that, I received my B.S. from Jilin University.

Email CV Google Scholar GitHub

News

2026: SurgLQA was accepted to MICCAI 2026 (early accept).
2026: SurgClean was accepted to CVPR 2026.
2026: BlooDet was accepted to CVPR 2026.
2025: PmNet was accepted to AAAI 2025.
2025: S²Former-OR was accepted to IEEE Transactions on Medical Imaging.
2024: TriTemp-OR was accepted to MICCAI 2024.
2024: VPSeg was accepted to CVPR 2024 as a Highlight.

Research

I work on surgical scene understanding, long-horizon surgical video reasoning, operating-room scene graph generation, surgical image restoration, and robust visual perception. Selected publications are listed below. For a full list, please see Google Scholar. * indicates equal contribution.

SurgLQA: Scalable Long-Horizon Surgical Video Question Answering

Diandian Guo, Xikai Yang, Ruiyang Li, Jialun Pei, Pheng-Ann Heng

MICCAI, 2026 (Early Accept)

paper code

Long-horizon surgical VideoQA with temporally faithful representations and adaptive inference-time reasoning.

Benchmarking Endoscopic Surgical Image Restoration and Beyond

Jialun Pei, Diandian Guo, Donghui Yang, Zhixi Li, Yuxin Feng, Long Ma, Bo Du, Pheng-Ann Heng

CVPR, 2026

paper code

SurgClean, a real-world benchmark for endoscopic image desmoking, defogging, and desplashing.

Synergistic Bleeding Region and Point Detection in Surgical Videos

Jialun Pei, Zhangjun Zhou, Diandian Guo, Zhixi Li, Jing Qin, Bo Du, Pheng-Ann Heng

CVPR, 2026

paper

BlooDet jointly detects bleeding regions and bleeding points in laparoscopic surgical videos.

Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resection with Pringle Maneuver

Diandian Guo, Weixin Si, Zhixi Li, Jialun Pei, Pheng-Ann Heng

AAAI, 2025

paper arXiv code

PmNet models short- and long-range surgical temporal cues for liver resection workflow monitoring.

S²Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR

Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin, Pheng-Ann Heng

IEEE Transactions on Medical Imaging, 2025

paper code

Single-stage bi-modal transformer for 2D-3D operating-room scene graph generation.

Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms

Diandian Guo*, Manxi Lin*, Jialun Pei, He Tang, Yueming Jin, Pheng-Ann Heng

MICCAI, 2024

paper code

Tri-modal temporal modeling and medical-LLM knowledge transfer for OR scene graph generation.

Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

Diandian Guo*, Deng-Ping Fan, Tongyu Lu*, Christos Sakaridis, Luc Van Gool

CVPR, 2024 (Highlight)

paper code

Video semantic segmentation for driving scenes using vanishing-point-guided temporal correspondence.