We gratefully acknowledge the 4D-OR benchmark (Özsoy et al., 2022), which has significantly facilitated our work.
Building on 4D-OR, we have reformatted the dataset to be more user-friendly.
Reformatting of SGG Annotations
Departing from the original 4D-OR format, we standardized the SGG annotations to align with popular open-world SGG datasets such as Visual Genome (Krishna et al., 2017) and Open Images (Kuznetsova et al., 2020). This makes the data easier to access, process, and apply (a loading sketch follows the list below).
Reformatted annotations for object detections in ORs
Reformatted annotations for triplet relations in ORs
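For reference, here is a minimal Python sketch for inspecting the reformatted annotations. It assumes train.json follows the COCO detection schema (images / annotations / categories) and that rel.json maps image ids to [subject_idx, object_idx, predicate_id] triplets, possibly nested under split keys, as in RelTR-style Visual Genome releases; verify the keys against the actual files.

```python
import json

# Assumptions (verify against the actual files): train.json follows the
# COCO detection schema, and rel.json maps image ids to
# [subject_idx, object_idx, predicate_id] triplets, possibly nested
# under split keys ("train" / "val" / "test").

with open("S2Former-OR/data/train.json") as f:
    coco = json.load(f)
with open("S2Former-OR/data/rel.json") as f:
    rel = json.load(f)

rel_train = rel.get("train", rel)  # unwrap the split key if present
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

# Group object annotations by image so triplet indices can be resolved.
objects_by_image = {}
for ann in coco["annotations"]:
    objects_by_image.setdefault(ann["image_id"], []).append(ann)

image_id = coco["images"][0]["id"]
objects = objects_by_image.get(image_id, [])
for subj_idx, obj_idx, pred_id in rel_train.get(str(image_id), []):
    subj = id_to_name[objects[subj_idx]["category_id"]]
    obj = id_to_name[objects[obj_idx]["category_id"]]
    print(f"({subj}, predicate_{pred_id}, {obj})")
```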
Reformatting of Overall Structure
The original 4D-OR dataset was organized by surgical segment, mixing data from different modalities within each segment folder. To facilitate processing, we reorganized the dataset by modality, separating the input data into distinct modalities (e.g., 2D multi-view images, 3D point clouds, textual annotations). This reorganization enables more efficient multimodal processing (see the sketch after the list below).
Reformatted multi-view 2D image inputs
Reformatted 3D point cloud inputs
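The sketch below illustrates how the modality-wise layout lets each modality be loaded independently per frame. The per-frame naming scheme (scene/camera/frame identifiers, file extensions) is hypothetical; adapt it to the actual contents of images/ and points/.

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical per-frame naming scheme -- adapt to the actual contents
# of images/ and points/ after unzipping.
DATA_ROOT = Path("S2Former-OR/data")

def load_frame(scene: str, frame: str, num_views: int = 6):
    """Load all camera views and the point cloud for one OR frame."""
    views = [
        Image.open(DATA_ROOT / "images" / scene / f"cam_{v}" / f"{frame}.jpg")
        for v in range(1, num_views + 1)
    ]
    # Assumption: each frame's point cloud is a NumPy-loadable array,
    # e.g. shape (N, 6) holding XYZ coordinates plus RGB colors.
    points = np.load(DATA_ROOT / "points" / scene / f"{frame}.npy")
    return views, points
```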
Training and Inference
Download the processed, reformatted 4D-OR data provided below. The data folder should be organized as follows; a small script to verify the layout is given after the tree:
S2Former-OR/data/
├── images/      # unzip 4d_or_images_multiview_reltrformat.zip
├── points/      # unzip points.zip
├── infer/       # unzip infer.zip
├── train.json   # from reltr_annotations_8.3.zip
├── val.json     # from reltr_annotations_8.3.zip
├── test.json    # from reltr_annotations_8.3.zip
└── rel.json     # from reltr_annotations_8.3.zip
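Before training, it may help to confirm that the unzipped data matches this layout; the paths below mirror the tree exactly.

```python
from pathlib import Path

# Check that every entry from the layout above exists before training.
DATA_ROOT = Path("S2Former-OR/data")
expected = [
    "images", "points", "infer",
    "train.json", "val.json", "test.json", "rel.json",
]

missing = [name for name in expected if not (DATA_ROOT / name).exists()]
if missing:
    raise FileNotFoundError(f"Missing under {DATA_ROOT}: {missing}")
print("Data folder layout looks complete.")
```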
After that, you can use the scripts in S2Former-OR and TriTemp-OR for training and inference. Note that evaluation on the test set requires you to first run the provided inference script and then upload your predictions here.
BibTeX
@article{s2former2024,
  author={Jialun Pei and Diandian Guo and Jingyang Zhang and Manxi Lin and Yueming Jin and Pheng Ann Heng},
  title={S2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR},
  journal={IEEE Transactions on Medical Imaging},
  year={2024}
}
@inproceedings{tritemp2024,
  author={Diandian Guo and Manxi Lin and Jialun Pei and He Tang and Yueming Jin and Pheng Ann Heng},
  title={Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  year={2024}
}